Using Named Parameters in Kettle

Kettle’s named parameters often enable elegant solutions to ETL requirements. This post introduces Kettle’s named parameters feature.

It is safe to think of named parameters as variables whose initial values are assigned before the ETL process starts. In fact, they are available as variables while the ETL process executes. Their values can be specified in the launch dialog inside Spoon or on the command line when running Kitchen or Pan.

How to define named parameters?

Named parameters are defined in the job or transformation settings dialog. Use the parameters tab to specify them. Give each parameter a name and a default value, which is used if the parameter is not explicitly given another value when the process starts.

How to specify the parameter values?

When launching a job or transformation from the Spoon application, the launch dialog allows you to enter a value for each named parameter.

When launching from the command line, you’ll have to specify the values as command line arguments. The exact syntax differs slightly depending on the operating system and shell you are using, but the general idea is to pass in a -param:name=value argument for each named parameter.

Example: launch job.kjb with named parameters files.dir=/opt/files and max.date=2010-06-02

Linux: ./kitchen.sh -file:job.kjb -param:files.dir=/opt/files -param:max.date=2010-06-02
Windows: Kitchen.bat -file:job.kjb "-param:files.dir=/opt/files" "-param:max.date=2010-06-02"

Please observe the usual escaping rules of your shell when passing in parameter values. The whole -param:name=value block must appear as a single argument to the shell.
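For example, if a value contains spaces, quote the whole block so the shell keeps it together (the paths shown here are hypothetical):

Linux: ./kitchen.sh -file:job.kjb "-param:files.dir=/opt/my files"
Windows: Kitchen.bat -file:job.kjb "-param:files.dir=C:\My Files"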

How to pass parameters to sub jobs and transformations?

Often a master job is used for control flow: it coordinates invocations of sub jobs and transformations. Parameters already present in the master job are passed on to the sub jobs and transformations if you check their “pass all parameter values down” checkbox. You can specify additional parameters manually by entering the parameter names and values in the parameters box.

Remember that the sub job or transformation must know about the parameters you are trying to pass in. They must be defined in its settings dialog, or they will not be available in the sub job or transformation, no matter what you pass in from outside. By the way: using this feature together with the “execute for every input row” feature enables an interesting looping technique.

How to access parameter values?

Parameters are available as variables. Each named parameter is mapped to a variable during runtime. If you have a parameter named foo you can reference it using ${foo} in any place where variables are supported.
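For example, assuming the parameters from the kitchen example above, a hypothetical Text file input step could read its source file from:

${files.dir}/import_${max.date}.csv

which resolves to /opt/files/import_2010-06-02.csv at runtime.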

How to access parameters in scripts?

Sometimes it is convenient to derive new variables from incoming parameters. For example, an incoming parameter representing a date can be used to calculate variables for the paths and names of the files you want to import. Scripting access to variables works slightly differently depending on whether the context is a job or a transformation.

Accessing variables in jobs

In Kettle jobs the JavaScript job entry can be used to read and set variables. Use the parent_job.getVariable() and parent_job.setVariable() methods to retrieve and set variable values. Please note that Kettle variables always hold string values. If you need to do any arithmetic, convert the string value to an appropriate type first.
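As a minimal sketch of such a conversion (the batch.size and next.batch.size names are made up for illustration):

// Kettle variables hold strings: parse before doing arithmetic
var batchSize = parseInt(parent_job.getVariable("batch.size"), 10);
parent_job.setVariable("next.batch.size", "" + (batchSize + 1));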

Have a look at the example job which accepts an incoming date in yyyy-MM-dd format on the parameter run.date and calculates the previous week’s date from that, saving it in the last.week variable.

// prepare
var format = new java.text.SimpleDateFormat("yyyy-MM-dd");
var cal = java.util.Calendar.getInstance();

// get variable and subtract seven days
var runDateString = parent_job.getVariable("run.date");
cal.setTime(format.parse(runDateString));
cal.add(java.util.Calendar.DAY_OF_MONTH, -7);

// set new kettle variable
parent_job.setVariable("last.week", format.format(cal.getTime()));

// the job entry uses the script's result: true marks the entry as successful
true
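Launching the example from the command line follows the same pattern shown earlier (the job file name is assumed):

Linux: ./kitchen.sh -file:job.kjb -param:run.date=2010-06-02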

In case you are wondering how Java code ends up in the JavaScript job entry, check out this post.

Accessing variables in transformations

If you are working with the JavaScript step in a transformation, you can use the predefined functions getVariable() and setVariable() to access variables. You can view samples of how they are used by right-clicking them in the Special Functions section of the Transform Functions tree on the left.
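As a minimal sketch of both functions (based on the step's built-in samples; the variable names are made up): getVariable() takes a default value as its second argument, and setVariable() takes a scope code as its third, where "s" stands for the system-wide scope:

// read files.dir, falling back to /tmp when the variable is not set
var filesDir = getVariable("files.dir", "/tmp");

// publish out.dir in the system-wide scope "s"
setVariable("out.dir", filesDir + "/out", "s");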

In the context of transformations, special caution is required: because transformations execute their steps in parallel, it is impossible to tell exactly when a variable is set. It is therefore illegal to set and read the same variable within the same transformation. If you intend to create or manipulate a variable within a transformation, make sure you use its value further down in the control flow of an enclosing job.

Have a look at the example illustrating the technique. A master job runs a transformation which generates random numbers between 1 and 1000. The transformation saves the number closest to 500 in the variable best.match. When the transformation completes, the job logs which number was closest.

The following script is used to check for the best match and set the best.match variable.

// this script runs once per input row; re-declaring bestMatch with a
// plain "var" does not reset a value assigned on a previous row
var bestMatch;

// "rand" is the incoming random-number field; normalize it to 1..1000
var randValue = abs(rand) % 1000 + 1;

// remember the value closest to 500; scope "s" sets the variable
// system-wide, so the enclosing job can read it afterwards
if (bestMatch == null){
	bestMatch = randValue;
	setVariable("best.match", ""+bestMatch, "s");
}
else if (abs(bestMatch-500) > abs(randValue-500)){
	bestMatch = randValue;
	setVariable("best.match", ""+bestMatch, "s");
}

Conclusion

Named parameters are defined and given default values in the settings dialog of a job or transformation. They offer an elegant way to pass information into and out of jobs and transformations. Named parameters are conveniently available as regular variables, so they can be used in any context that supports variable substitution. They can also be accessed and manipulated through Kettle’s scripting facilities, making them a versatile tool for many ETL needs.

Comments and corrections are welcome :-)

Cheers

Slawo

25 comments to Using Named Parameters in Kettle

  • I was having problems accessing parameters inside a sub-transformation (from a JavaScript step), started searching the Pentaho forum / wiki, was about to give up and ask on IRC… Then thought “maybe Slawomir has a post about this”. Turns out I was right. Problem solved.

    Great post, thanks!

  • Slawomir Chodnicki

    Thanks Etienne, good to hear from you again. :)
    Next post is going to be about the Kettle-Cookbook, I guess. Check it out: http://code.google.com/p/kettle-cookbook/

    Cheers

    Slawo

  • Anil YV

    I am using Kettle variables in my job (defined in kettle.properties). When I run the job through Spoon, it works fine. But when I run it through the Java API, it says it can’t find the variable. Please help.

  • Slawomir Chodnicki

    Hi Anil,

    Kettle looks for the kettle.properties file in “user.dir”/.kettle/kettle.properties. If the KETTLE_HOME system property is set, Kettle will look for the kettle.properties file here: KETTLE_HOME/.kettle/kettle.properties

    If you’ve never set KETTLE_HOME manually, chances are that the Java program is run under a different user than Spoon. You can make sure by checking the “user.dir” system property inside Spoon and inside your program.
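    A minimal sketch of that check, runnable in a JavaScript job entry (same Java-from-JavaScript trick as in the post):

    // print the directory Kettle derives the kettle.properties location from
    java.lang.System.out.println(java.lang.System.getProperty("user.dir"));

    true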

    Cheers

    Slawo

  • anonim

    I have read your article several times and still cannot get parameter passing to work. I have a job containing a few transformations which load data from CSV files into a MySQL db. I have specified the parameter DATA_DIR (the data files directory) in both the job and all transformations. When I execute the job in Spoon, I get the error “No file(s) specified! Stop processing”, which means that the parameter specified in the job is not passed down to the transformation. Interestingly, DATA_DIR has the correct default value in both the transformation and the job which calls it.
    The parameter DATA_DIR has a value of ${DATA_DIR} in the job entry window above.
    Any ideas?

  • Slawomir Chodnicki

    Hey anonim,

    please post your test transformation to the forum at http://forums.pentaho.com/forumdisplay.php?135-Pentaho-Data-Integration-Kettle
    I’ll have a look, and others may be even quicker to help ;)

    Cheers
    Slawo

  • Anup Cherian

    I only started using Kettle and Spoon yesterday. I see the question “where is kettle.properties?” asked a few times, but I haven’t found an answer yet. Is this file created automatically?

  • Slawomir Chodnicki

    Hey Anup,

    the kettle.properties file is located in [home dir of user running kettle]/.kettle/kettle.properties

    On a Mac: /Users/slawo/.kettle/kettle.properties
    On Linux: /home/slawo/.kettle/kettle.properties
    On Windows XP: C:\Documents and Settings\slawo\.kettle\kettle.properties

    On other systems (especially other flavors of Windows) just locate the user’s home dir and look into its .kettle subfolder.

    Cheers
    Slawo

  • Thank you for this article, very helpful. However, I am not sure if this is right:

    > Remember that the sub job or transformation must know about the parameters you are
    > trying to pass in. They must be defined in the settings dialog or they will not be
    > available in the sub job or transformation, no matter what you care to pass in from
    > outside.

    In my case the parameter was successfully passed only after I deleted its definition from the transformation settings. With the parameter defined, my transformation used the default value from the settings.

    Another problem I see when using named parameters in Table input select statements is that the statement does not parse in design mode, so there is no way to get its fields in the following steps.

    /Robert

  • Slawomir Chodnicki

    Hey Robert,

    love the Scott Tiger thing ;)

    Your observations are correct. Here’s what’s happening in detail: any parameter you define in a job essentially becomes a variable, and may thus end up being visible in child processes (jobs and transformations) if you work with them in the same Spoon session. Passing parameters explicitly requires that they be defined in every job/transformation that needs them, so each one can be developed and tested stand-alone, without relying on some other part of the solution to set the values it needs. To develop stand-alone, you’d usually declare the parameters and set their default values to whatever is useful for development. Before going to production it is sometimes useful to replace these with sensible defaults, or even with values that deliberately cause errors, so you can verify that parameter passing works as expected.

    In design mode Spoon likes to update the values of parameters whenever the transformation or job is saved. So if you’d like to include a dynamic part in the query, just set the default value to something reasonable, save the transformation, and keep working. The Table input step will take the parameter’s value into account (if you have the corresponding checkbox on, that is).

    Cheers
    Slawo

  • Thilan

    HI.

    Is there any performance issue when running Kettle transformations on Linux (RHEL 5.2)?
    Thilan

  • Slawomir Chodnicki

    Hey Thilan,

    I don’t know about RHEL 5.2 specifically, but so far every popular flavor of Linux I’ve heard of has worked just fine :)

    Cheers
    Slawo

  • mark

    Hi,
    Is there any way the Switch / Case step can be implemented at the job level? I assume one can do it with JavaScript but don’t know how.
    Thanks in advance
    Mark

  • Slawomir Chodnicki

    Hey Mark,

    at the job level each entry can have two different results: it may succeed or fail. The semantics of this vary with the entry in question. For example, “Table exists” will yield a positive result when the table is there and a negative one when it is not. There is currently no way for an entry to generate more than two distinct results. If you need a multi-way switch, most people seem to implement that with a sequence of binary yes/no entries. If it’s a custom evaluation (of a variable, for example), I often see a JavaScript job entry that evaluates to true or false, thus generating the appropriate result, as in the sketch below.
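    As a rough sketch of that last pattern (a hypothetical env variable is assumed), the JavaScript job entry would contain a single expression whose result decides success or failure:

    // true (success) when env is "production", false (failure) otherwise
    parent_job.getVariable("env") == "production"

    Chaining two or three such entries yields multi-way behavior.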

    Hope this makes sense.

    Cheers
    Slawo

  • mark

    Hey Slawomir,
    Thanks much for the info. I have been using multiple Simple Eval entries to implement a multi-way switch. It significantly decreases the readability of the ETL plan at the job level, especially if one has more than 10 eval steps. I hope someone implements a multi-way switch in the next release.
    Keep up the good work
    Mark

  • Slawomir Chodnicki

    Hey mark,

    Filing a JIRA case has helped more often than hoping ;)

    Cheers
    Slawo

  • Manish

    Hi Slawomir,

    Thanks, man. You’re really doing a great job for the whole Pentaho developer community. I needed a “how to” on the JavaScript transformation, and this post made my life really easy.

  • Hello
    I use named parameters with values in jobs and transformations, and afterwards I save these jobs and transformations in a repository.
    Could you please tell me if, in the new version of Pentaho ETL, I can save the named parameters (with their corresponding values) in the repository along with the jobs and transformations that use them?

    Many of my jobs use different values for the same named parameters across different repositories, so changing the values of named parameters in kettle.properties all the time is not a suitable option, due to the large number of repositories and environments (DEV, TEST, ...).

    Thank you

  • sanal

    Hi Slawo

    How can we get values from our own custom properties file and use them in Kettle steps like Add constants?

    Regards

    sanal

  • gusg3

    Hello !

    I have created a master job that calls sub transformations that run in cluster mode. I have configured it to pass all parameters to those sub transformations, but I can’t see the variable values in the clustered transformations. I tried defining the parameters explicitly, but it didn’t work.

    Has anyone had the same problem ?

    Thanks in advance.

  • Akshay

    What are the second arguments to the getVariable() and setVariable() functions?

  • Akshay

    Understood. Just right-click a function in the left-hand menu of the script editor to see a sample.
