Execute initialization code only once
Suppose it is necessary to do some kind of sophisticated custom lookup. For argument’s sake, let’s look at a transformation step that is supposed to pick a string from a fixed set of ca. 1000 strings based on a modulo calculation of an input ID. This is what a first attempt might look like:
var values = [ "AARON", "ABHI", "AG",--- snip ---"ZIGZAGGLE", "ZOGLA", "ZONGKER", "ZSOLT", "ZUKAS", "ZVEROVICH" ];
var output_value = values[id % values.length];
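The catch is that this script runs for every row, so the values array is rebuilt each time. The initialization code belongs in a script tab of its own. All script tabs also appear in the left-hand menu, where you can right-click them. The context menu lets you give the newly created tab a more meaningful name.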
Once the initialization code has been moved into the new initialization tab, Kettle must be told to execute the code in this tab exactly once, just before the first row comes in. This is done by right-clicking the initialization tab and marking it as a “Start Script”.
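The split between the two tabs can be sketched roughly as follows. This is plain JavaScript that mimics the behaviour outside Kettle: the function names startScript and transformRow, the initCount counter and the shortened name list are my own illustrative assumptions, not Kettle APIs. In Kettle itself, the body of the first function would live in the Start Script tab and the body of the second in the regular transform tab.

```javascript
// Illustrative stand-in for the Start Script tab: runs once, before the first row.
var values;
var initCount = 0; // only here to show how often initialization runs

function startScript() {
    // Shortened list; the real one holds about 1000 strings.
    values = ["AARON", "ABHI", "AG", "ZSOLT", "ZUKAS", "ZVEROVICH"];
    initCount++;
}

// Illustrative stand-in for the transform tab: runs once per row.
function transformRow(id) {
    return values[id % values.length];
}

// Simulate what Kettle does: one initialization, then many rows.
startScript();
var results = [];
for (var id = 0; id < 10; id++) {
    results.push(transformRow(id));
}
```

However many rows pass through, the array is built a single time and the per-row work shrinks to one modulo and one array access.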
Unfortunately, in Kettle it is not possible to see at a glance whether a tab has been marked as special. When in doubt, you have to click through all your tabs and assign the appropriate script types.
Edit: Samatar fixed this little flaw in Kettle 4.1; see the comments section.
Is it really faster now?
The new step uses the initialization tab and avoids recreating the values array on each row. The unoptimized transformation processes 1,000,000 rows in ca. 19 seconds on my laptop. The optimized version pumps the same number of rows through in under 3 seconds. Download the example transformations if you like.
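The effect is easy to reproduce outside Kettle. What follows is a rough sketch in plain JavaScript, not the benchmark behind the numbers above: the 1000-entry array is filled with placeholder strings, and the row count and all function names are assumptions of mine. Absolute timings depend on the JavaScript engine and will not match Kettle's.

```javascript
// Builds a placeholder list of 1000 strings (stand-in for the real name list).
function buildValues() {
    var list = [];
    for (var i = 0; i < 1000; i++) {
        list.push("NAME_" + i);
    }
    return list;
}

// Unoptimized: the array is recreated for every row.
function lookupRebuilding(id) {
    var values = buildValues();
    return values[id % values.length];
}

// Optimized: the array is created once and reused.
var sharedValues = buildValues();
function lookupShared(id) {
    return sharedValues[id % sharedValues.length];
}

// Times a lookup function over a number of rows; returns elapsed milliseconds.
function timeRows(lookup, rowCount) {
    var start = Date.now();
    for (var id = 0; id < rowCount; id++) {
        lookup(id);
    }
    return Date.now() - start;
}

var ROWS = 20000; // kept small so the sketch finishes quickly
var slowMs = timeRows(lookupRebuilding, ROWS);
var fastMs = timeRows(lookupShared, ROWS);
console.log("rebuilding per row: " + slowMs + " ms, shared: " + fastMs + " ms");
```

Both lookup functions return the same string for a given ID; only the amount of work per row differs.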
How is a script tab with no type useful?
A script tab that is not marked as special will not be executed by Kettle.
This is handy if you want to keep a code scrapbook around. But you can also execute a typeless tab’s script dynamically from your initialization script, which helps if you have several initialization subtasks that you want to keep in separate tabs. The bundled function LoadScriptFromTab executes a script tab dynamically; it is listed in the “Special functions” section. Which brings me to the next point.
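To make the idea of separate initialization subtasks a bit more concrete, here is a minimal sketch. It assumes LoadScriptFromTab takes the tab name as a string. Because that function only exists inside Kettle, the sketch defines a stand-in with the same name so it can run anywhere; the tab names init_names and init_suffixes, the scope object and lookupRow are hypothetical.

```javascript
// Stand-in for Kettle's typeless script tabs: tab name -> script body.
// In Kettle these would be separate tabs; the names are hypothetical.
var tabs = {
    "init_names": function (scope) {
        scope.names = ["AARON", "ABHI", "AG"];
    },
    "init_suffixes": function (scope) {
        scope.suffixes = ["_A", "_B"];
    }
};

// Shared state that the tabs populate (a simplification of the shared script scope).
var scope = {};

// Stand-in so the sketch runs outside Kettle. Inside Kettle, the bundled
// LoadScriptFromTab would be called instead and this definition dropped.
function LoadScriptFromTab(tabName) {
    tabs[tabName](scope);
}

// Start Script: run each initialization subtask exactly once.
LoadScriptFromTab("init_names");
LoadScriptFromTab("init_suffixes");

// Per-row work can now rely on everything the subtasks set up.
function lookupRow(id) {
    return scope.names[id % scope.names.length] +
           scope.suffixes[id % scope.suffixes.length];
}
```

In a real step the two LoadScriptFromTab calls would sit in the Start Script tab, and each subtask could be edited in its own tab without touching the others.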