In one of my current projects I have to do a lot of delta generation to figure out whether any data changed, so that I can handle the data differently depending on whether it is identical, new, changed, or deleted. I came up with the following transformation:
The file works as follows:
- The current and old files are CSV inputs and both have the same format.
- The "Merge Rows" step does the main delta generation.
- In "Filter rows" I take out the identical rows because they are not important to me.
- Kettle flags the rows with long descriptions (deleted, new, changed), but I need I, U, D for my system to be compatible with another data source involved, so I map these values.
- The delta finally gets saved in a text file and a table. One would be enough, but I put the text file in an archive in case I need it again. The table gets truncated every time the transformation runs and is used for the data load in the next step.
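For readers who want to see the logic outside of Kettle, the steps above can be roughly sketched in Python. This is only an illustration of the idea, not how the "Merge Rows" step is implemented. The function names, the `id` key column, and the `flag` output column are assumptions made up for this example, and the flag-to-code mapping (new to I, changed to U, deleted to D) is my reading of the mapping described above.

```python
import csv
import io

# Assumed mapping from the long row flags to the short codes.
FLAG_TO_CODE = {"new": "I", "changed": "U", "deleted": "D"}


def read_rows(csv_text, key_field):
    """Parse CSV text into a dict of rows keyed by the key column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_field]: row for row in reader}


def generate_delta(old_text, new_text, key_field="id"):
    """Compare two CSV inputs with the same format and return the delta.

    Identical rows are dropped; every other row gets a "flag" column
    holding I (new), U (changed), or D (deleted).
    """
    old_rows = read_rows(old_text, key_field)
    new_rows = read_rows(new_text, key_field)
    delta = []
    # Walk over every key that appears in either input, in sorted order.
    for key in sorted(old_rows.keys() | new_rows.keys()):
        old = old_rows.get(key)
        new = new_rows.get(key)
        if old is None:
            flag, row = "new", new
        elif new is None:
            flag, row = "deleted", old
        elif old != new:
            flag, row = "changed", new
        else:
            continue  # identical rows are filtered out
        delta.append({**row, "flag": FLAG_TO_CODE[flag]})
    return delta
```

Calling `generate_delta(old_csv_text, new_csv_text)` returns a list of dicts that could then be written to a text file or loaded into a table, as in the last step above.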
Comments
Nice. Just curious: is this transformation capable of comparing arbitrary tables? I needed that, but could not find a way to do it.
(I did solve my problem in the end - a job that generates transformations like the one you show here)
I'm not 100% sure it will work. I have only used it for files, and a very similar transformation for a table/file combination.