Skip to main content

Delta Generation with Kettle

In one of my current project I have to do lots delta generation to figure out if any data changed and be able to work differently with the data depending if it's similar, new, changed, or deleted. I came up with the following transformation:



The file works as following

  • The current and old files are CSV inputs and both have the same format. 
  • The "Merge Rows" does the main delta generation. 
  • In "Filter rows" I take out the identical rows because they are not important to me.
  • Kettle uses long descriptions (deleted, new, updated) but I need I,U,D for my system to be compatible with another data source involved, so I map these values.
  • The delta finally gets saved in a text file and a table. One would be enough, but I put the text file in an archive in case I need it again. The table gets truncated every time the transformation runs and is used for the data load in the next step.
Do you have questions or do you see any possible improvements? Feel free to post your comments!

Comments

rpbouman said…
Hi!

nice. Just curious, is this transformation capable of comparing arbitrary tables? I needed that, but could not find a way to do it.

(I did solve my problem in the end - a job that generates transformations like the one you show here)
Ben said…
Thanks!
I'm not 100% if it will work. I only used it for files and an almost similar transformation for a table/file combination.

Popular posts from this blog

Dynamic cell references in spreadsheets with Google Docs

During my former internships in a consulting company I had to work A LOT with Microsoft Excel and often had to use dynamic cell references over multiple worksheets. Recently I started using the spreadsheets of Google Docs to track my bank account balance and to figure out where all my money goes. I decided to have one sheet for every month + one sheet of the month I want to analyze. But how do I dynamically change the reference to the sheet (the monthly sheet) I want to analyze without editing every single formula? Here is my solution : Create the target sheets and your overview sheet - I gave my sheets the names Month + Year (July 09) In the overview choose one cell that you want to contain the reference sheet and enter the sheet name - cell D24 in the example - Using the month names you might have to write 'July 09 , otherwise Google will think it's a date. In the overview you can now dynamically reference to a detail sheet using the following formula: =INDIRECT("'

Pentaho Data Integration - Multi-part Form submission with file upload using the User Defined Java Class Step

I recently needed to use Pentaho Data Integration (PDI) to send a file to a server for processing using HTTP Post. I spent several hours trying to use the existing steps HTTP Post, HTTP Client & Rest Client but I couldn't get it to work. After some more research I came across the issue PDI-10120 - Support for Multi-part Form Submittal In Web Service Steps  and I thought I was out of luck. I previously wrote a small Java client for a similar use case and remembered the PDI has a step called User Defined Java Class  (UDJC). After reading this great tutorial I created the following basic transaction. I have a dataset with the URL and the full file path and use the UDJC to make the HTTP call. HTTP Post using User Defined Java Class The Java class handles the actual HTTP Post. It uses 2 input variables, the URL (url) which is used for the call and the file name (longFileName). The HTTP call then contains the file (line 30) and the file name (line 31). I included some basi

Products you don't expect to be 'Made in China' - Del Monte fruit cups

Since I moved to Canada back in March I have started to realize how many products are actually made in China. Back in Germany you could also buy lots of stuff from China but you mostly had the choice between German or Europe products and Chinese products. When I went to Food Basics in Oakville a couple weeks ago to get some apples I stood in front of a huge tray of Chinese apples! Aren't there enough apples in Ontario, Canada or the US? Even Mexico would probably be closer than China. Another day my wife bought Del Monte fruit cups in the grocery store. I checked the label when I was going to eat it and i decided to leave it in the fridge. First of all it is 'Made in China' (again I guess no other country in this world has fruit) and second it contains artificial flavor. How bad must the fruit inside be that you need artificial flavor (and does anybody in China controls how it is made)? For my part I'll check the labels more closely whenever I buy any kind of product