Today I would like to talk about the “User Defined Java Class” (a.k.a. UDJC) step introduced in Kettle 4.0. This step is incredibly versatile: it allows you to put arbitrary processing code into the ETL, and because that code is compiled Java, you do not pay the performance penalty usually associated with scripting. This article shows how to use the step in different scenarios, explaining each of the step's features with a short example. The sample transformations for this article are available in the downloads section.
How does the UDJC step work?
Technically, the user defined java class step works by extending the class org.pentaho.di.trans.steps.userdefinedjavaclass.TransformClassBase, which you should check out to get an idea of what’s at your fingertips here. To look at the class you need the Kettle sources: get a source package from SourceForge, or check out the source from SVN. Information about obtaining Kettle sources from SVN is available here.
The Java code you put into the UDJC step is compiled at transformation runtime in the context of a class derived from TransformClassBase. The TransformClassBase class is a generic step plugin class with convenience methods added on top. In your custom code you are free to use and override inherited fields and methods as you see fit. You can declare additional fields as you like, and even use import statements at the beginning of your code. The following imports are done automatically for you:
import org.pentaho.di.trans.steps.userdefinedjavaclass.*;
import org.pentaho.di.trans.step.*;
import org.pentaho.di.core.row.*;
import org.pentaho.di.core.*;
import org.pentaho.di.core.exception.*;
If you’re already somewhat familiar with Kettle internals and wonder how to conveniently access one thing or another from your code, check out the Code Snippets section on the left side of the step dialog. The samples there are going to make your day.
The following sections show how the UDJC step can be used in different scenarios.
A Simple Field Transformation
The first example performs a trivial operation: uppercasing a string field. Its purpose is to show how to set up the step for processing rows, and how to access input and output fields. If you’ve developed plugins for Kettle before, this will look very familiar. Suppose the row stream contains a field named “testfield”, and the step defines a String output field named “uppercase”. The following code uppercases the test field and writes the result to the output field.

The processRow() function is called by Kettle to tell the step to try and process another input row. It is supposed to return true if the step is ready to process another row, and false if there is nothing more to do.
The getRow() function fetches the next row from any input steps. It is a blocking call. It waits for the previous step to provide a row, if necessary. It eventually returns an Object array representing the incoming row or null to indicate that there are no more rows to process.
What follows is a short (and useless-looking) snippet involving the boolean field called “first”. It is a convenience flag provided by the base class to enable special processing when the first row comes in, which is useful if there’s some preparation you’d like to execute only once. Feel free to omit the block that sets it to false if you’re not using it.
The call to createOutputRow() ensures that the row array is big enough to hold all output fields added by the step.
The get() method is a helper allowing name based access to input and output fields of the step. You need to specify the field type (In, Out, Info) and the name of the field to get an instance of org.pentaho.di.trans.steps.userdefinedjavaclass.FieldHelper, which allows convenient access to the field’s data.
After the output field is set on the row, a call to putRow() passes the row on to the next step(s).
This short sample shows all you need to know to do fast custom computation on incoming fields. The sample transformation for this code is uppercase.ktr.
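To make the contract of processRow(), getRow(), first, createOutputRow() and putRow() concrete outside of Kettle, here is a minimal plain-Java model of it. It is a sketch under stated assumptions: MiniStepBase, UppercaseProcessor and UppercaseDemo are hypothetical names, a BlockingQueue with an end-of-stream sentinel stands in for Kettle's row sets, and no Kettle classes are used, so this is not code you can paste into the UDJC step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the step base class; NOT Kettle's TransformClassBase.
abstract class MiniStepBase {
    // Sentinel placed on the queue after the last row; getRow() turns it into null.
    static final Object[] END = new Object[0];

    protected boolean first = true;
    private final BlockingQueue<Object[]> input;
    final List<Object[]> output = new ArrayList<>();

    MiniStepBase(BlockingQueue<Object[]> input) {
        this.input = input;
    }

    // Blocks until the upstream producer supplies a row; returns null at end of stream.
    protected Object[] getRow() throws InterruptedException {
        Object[] row = input.take();
        return row == END ? null : row;
    }

    // Returns the row itself if it is big enough, otherwise a copy padded with nulls.
    protected Object[] createOutputRow(Object[] row, int size) {
        return row.length >= size ? row : Arrays.copyOf(row, size);
    }

    // Hands the row to the (simulated) next step.
    protected void putRow(Object[] row) {
        output.add(row);
    }

    public abstract boolean processRow() throws InterruptedException;
}

// Mirrors the uppercase sample: field 0 plays "testfield", field 1 plays "uppercase".
class UppercaseProcessor extends MiniStepBase {
    UppercaseProcessor(BlockingQueue<Object[]> input) {
        super(input);
    }

    @Override
    public boolean processRow() throws InterruptedException {
        Object[] r = getRow();
        if (r == null) {
            return false; // no more rows: tell the caller we are done
        }
        if (first) {
            first = false; // one-time preparation would go here
        }
        r = createOutputRow(r, 2);
        r[1] = ((String) r[0]).toUpperCase();
        putRow(r);
        return true;
    }
}

class UppercaseDemo {
    static List<Object[]> run(List<String> values) {
        BlockingQueue<Object[]> queue = new LinkedBlockingQueue<>();
        // A producer thread plays the role of the previous step.
        Thread upstream = new Thread(() -> {
            for (String v : values) {
                queue.add(new Object[] {v});
            }
            queue.add(MiniStepBase.END);
        });
        upstream.start();
        UppercaseProcessor step = new UppercaseProcessor(queue);
        try {
            while (step.processRow()) {
                // keep calling processRow() until it returns false
            }
            upstream.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return step.output;
    }

    public static void main(String[] args) {
        for (Object[] row : run(Arrays.asList("kettle", "udjc"))) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

The driver keeps calling processRow() until it returns false, which mirrors the pull loop described above. The producer thread illustrates why getRow() is described as blocking: take() waits until the simulated previous step has put a row on the queue.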
Using Step Parameters
Suppose you have a nice piece of code, and you’d like it to become more generic. Step parameters may be a useful tool in this context. As an example I’d like to provide a regular expression and a field name as parameters. The step should check whether the specified field matches the regex and output a 1 or 0 to a result field.
The getParameter() method provides access to the parameters defined in the UI. Please note that the step parameter values may contain Kettle variables. Putting variables into parameters is a great way of making variable usage explicit. It certainly beats searching through the code manually to find out which variables are used by the step.
The sample transformation for this is parameters.ktr.
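The parameter mechanism can be modeled in plain Java as well. This sketch assumes that parameter values may reference variables written as ${NAME} and resolves them explicitly with a hypothetical substitute() helper; inside Kettle you would rely on Kettle's own variable handling rather than writing this yourself. RegexFlagStep is likewise a hypothetical name, not a Kettle class.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch; the names below are hypothetical and not part of the Kettle API.
class RegexFlagStep {
    // Matches ${NAME} placeholders; a simplified model of variable substitution.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    private final Pattern pattern;
    private final String testField;

    // Resolves ${NAME} references; unknown names are left as they are.
    static String substitute(String text, Map<String, String> vars) {
        Matcher m = VAR.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            m.appendReplacement(sb, Matcher.quoteReplacement(value != null ? value : m.group()));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Done once, like the "first" block: resolve the parameters, then compile the regex.
    RegexFlagStep(Map<String, String> params, Map<String, String> vars) {
        pattern = Pattern.compile(substitute(params.get("regex"), vars));
        testField = substitute(params.get("test_field"), vars);
    }

    // 1 if the whole field value matches, 0 otherwise (matches(), not find()).
    long result(Map<String, String> row) {
        return pattern.matcher(row.get(testField)).matches() ? 1L : 0L;
    }
}
```

Like the step code, the sketch uses Matcher.matches(), which requires the whole field value to match; use find() instead if a match anywhere inside the value should count.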
Working with Info Steps
Sometimes it’s necessary to combine the input of multiple steps, possibly assigning different roles to them. A stream lookup step is a classic example. This is where info steps come into play: they are input steps that are explicitly read from, and their rows are not returned by calls to getRow(). It’s easy to use info steps with a user defined java class step: just attach them to the step and define them as info steps in the UDJC step UI. Reading rows from an info step is as easy as calling getRowFrom().
The sample transformation uses an info step to receive a list of regular expressions. It tests a field from the main stream for a match. If any of the regular expressions matches, the result field gets a 1. If none match it’s a 0. An additional output field captures which regular expression matched.
The call to findInfoRowSet() finds the row set to read from based on the info step name defined in the UDJC step UI. Reading from an info row set is no different from reading from the main input row set: you just need to specify the row set explicitly and call getRowFrom().
The example transformation for using info steps is info_steps.ktr.
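The info-step pattern boils down to this: drain the info stream once, then check every main-stream row against what was collected. Here is a plain-Java sketch of that logic, with the info rows passed in as a list of regex strings; InfoRegexMatcher is a hypothetical name, not a Kettle class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java sketch of the info_steps sample logic; no Kettle classes involved.
class InfoRegexMatcher {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<String> expressions = new ArrayList<>();

    // Drains the "info stream" once, compiling every regex (like the first-row block).
    InfoRegexMatcher(List<String> infoRows) {
        for (String regex : infoRows) {
            expressions.add(regex);
            patterns.add(Pattern.compile(regex));
        }
    }

    // Returns {result, matched_by}: {1L, regex} for the first full match, else {0L, null}.
    Object[] check(String value) {
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(value).matches()) {
                return new Object[] {1L, expressions.get(i)};
            }
        }
        return new Object[] {0L, null};
    }
}
```

Because the loop stops at the first hit, the order of the rows coming from the info step decides which expression ends up in matched_by.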
Working with Target Steps
It is possible to direct rows to different target steps using the user defined java class step. Normally, a call to putRow() passes a row on to the next step(s), and Kettle takes care of the rest. If you’d like to direct rows to specific steps instead, define all possible target steps and call putRowTo(), specifying the output row set explicitly. The following sample distributes rows randomly to two different target steps.
IMPORTANT NOTE: due to bug PDI-4712 (affects 4.0.0 and 4.0.1) you need to have at least as many info steps defined as you have target steps. Just define dummies to be info steps. They serve no purpose except ensuring that the step dialog does not blow up.
The method findTargetRowSet() finds the correct target row set by the name specified in the UDJC step UI. The returned row set can be written to by calling putRowTo().
The example transformation for this is target_steps.ktr.
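The routing decision itself is just a weighted coin flip per row. This plain-Java sketch uses a seeded java.util.Random instead of the sample's Math.random() so that runs are reproducible; RandomRouter and its two lists are hypothetical stand-ins for the two target row sets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Plain-Java sketch of the target_steps sample; the lists stand in for target row sets.
class RandomRouter {
    final List<Integer> lowProbability = new ArrayList<>();
    final List<Integer> highProbability = new ArrayList<>();
    private final Random random;

    RandomRouter(long seed) {
        this.random = new Random(seed); // seeded so that runs are reproducible
    }

    // Sends each row to exactly one target: about 35% low, about 65% high.
    void route(int row) {
        if (random.nextDouble() < 0.35) {
            lowProbability.add(row);
        } else {
            highProbability.add(row);
        }
    }
}
```

Each row goes to exactly one target, so over a large number of rows roughly 35% of them should end up on the low_probability branch.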
Error Handling
The UDJC step supports Kettle's error handling feature. To enable it, drag an outbound hop to the step that receives the error rows, then right-click the UDJC step and select "Defined Error Handling". Now you can configure the error step to receive the bad rows, and enter a few options and the names of the fields that hold extended error information. Diverting error rows from within the UDJC step is done by calling putError(), supplying additional information about the error(s) encountered. To demonstrate, the example transformation does a simple division. If the denominator is 0, the row is put to the error stream.
The demo transformation for error handling is error_handling.ktr.
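The error-handling logic can be sketched in plain Java by collecting good rows and error rows in two separate lists. The sketch assumes both fields arrive as Long values, as getInteger() returns them in the step code; DivisionStep, ErrorRow and the NULL_VALUE code for null inputs are hypothetical additions, since the step sample only checks for a zero denominator.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of diverting bad rows to an error list; names are hypothetical.
class DivisionStep {
    // Carries the same details the sample hands to putError().
    static final class ErrorRow {
        final Object[] row;
        final String description;
        final String fieldNames;
        final String errorCode;

        ErrorRow(Object[] row, String description, String fieldNames, String errorCode) {
            this.row = row;
            this.description = description;
            this.fieldNames = fieldNames;
            this.errorCode = errorCode;
        }
    }

    // Each good row: {numerator, denominator, integer_division, remainder}.
    final List<long[]> output = new ArrayList<>();
    final List<ErrorRow> errors = new ArrayList<>();

    void process(Long numerator, Long denominator) {
        Object[] row = {numerator, denominator};
        if (numerator == null || denominator == null) {
            // Hypothetical extra check: unboxing a null Long would throw a NullPointerException.
            errors.add(new ErrorRow(row, "Numerator and denominator must not be null",
                    "numerator,denominator", "NULL_VALUE"));
            return;
        }
        if (denominator == 0L) {
            errors.add(new ErrorRow(row, "Denominator must be different from 0", "denominator", "DIV_0"));
            return; // continue with the next row, as the sample does
        }
        long n = numerator;
        long d = denominator;
        output.add(new long[] {n, d, n / d, n % d});
    }
}
```

Note that Java's integer division truncates toward zero and the remainder takes the sign of the numerator, so -7 / 2 yields -3 with remainder -1.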
Accessing Database Connections
If the Java step is supposed to do something with a database, you should probably use Kettle's facilities for obtaining a database connection. The following example uses the Kettle database connection named "TestDB". The incoming rows have a "table_name" field. The step checks whether the table exists and writes the result to an output field.
If you're planning to do non-trivial work with your databases in a user defined java class you should probably become familiar with the java package org.pentaho.di.core.database. Check out the source of existing DB related steps for examples on how to use classes from the database package.
In this sample, the init() and dispose() methods of the step are overridden to create the database connection and to disconnect upon completion. The call to init() happens at transformation initialization time, before the first call to processRow(). The dispose() method is called once the transformation is finished. If there's any overarching initialization and clean-up code in your steps, consider putting it into init() and dispose() respectively.
The example transformation for this is db_access.ktr.
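The init()/processRow()/dispose() life cycle from the database sample can be modeled without a real database. In this sketch FakeConnection is a hypothetical stand-in for Kettle's Database class, and the table list is fetched once on first use, as in the sample's first-row block. Matching table names case-insensitively is an assumption of this sketch, not something the sample does: some databases report unquoted identifiers in upper case, so an exact contains() check can miss tables.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a database connection; NOT org.pentaho.di.core.database.Database.
class FakeConnection implements AutoCloseable {
    private final List<String> tables;
    boolean closed = false;

    FakeConnection(String... tables) {
        this.tables = Arrays.asList(tables);
    }

    List<String> getTableNames() {
        if (closed) {
            throw new IllegalStateException("connection is closed");
        }
        return tables;
    }

    @Override
    public void close() {
        closed = true;
    }
}

class TableExistsStep {
    private FakeConnection db;
    private Set<String> existingTables;

    // init(): acquire the resource once, before any row is processed.
    boolean init(FakeConnection connection) {
        db = connection;
        return true;
    }

    // processRow() logic: the table list is fetched on first use only, then reused.
    long tableExists(String tableName) {
        if (existingTables == null) {
            // Case-insensitive on purpose: a design choice of this sketch, not of the sample.
            existingTables = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
            existingTables.addAll(db.getTableNames());
        }
        return existingTables.contains(tableName) ? 1L : 0L;
    }

    // dispose(): null-safe cleanup, mirroring the sample's "if (db != null)" guard.
    void dispose() {
        if (db != null) {
            db.close();
            db = null;
        }
    }
}
```

Because dispose() clears its reference after closing, calling it a second time is harmless.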
Implementing an Input Step
It is also possible to create a user defined java class that serves as an input step. In this case it generates rows of its own instead of processing rows coming in from other steps. As an example, I'd like to create a step that generates a row for each Java system property.

In this code the step is not calling getRow() to get incoming rows, but initializes a list of properties on the first call to processRow(). The properties are written to the output stream one by one. Since there is no incoming row, the step creates one by calling RowDataUtil.allocateRowData() first. It then sets the field values and passes the row on to the next step.
The sample transformation for this is input_step.ktr.
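The input-step pattern, lazy initialization on the first call plus an index that advances by one row per processRow() call, looks like this in plain Java. PropertyRowGenerator is a hypothetical name; it takes a Properties object instead of reading System.getProperties() directly, and it uses stringPropertyNames() and getProperty() rather than propertyNames() and get(), so keys and values are Strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Plain-Java sketch of a row-generating step; no Kettle classes involved.
class PropertyRowGenerator {
    private final Properties props;
    private List<String> keys; // filled on the first call, like the "first" block
    private int idx = 0;
    final List<Object[]> output = new ArrayList<>();

    PropertyRowGenerator(Properties props) {
        this.props = props;
    }

    // Emits one {key, value} row per call; returns false once every key has been emitted.
    boolean processRow() {
        if (keys == null) {
            keys = new ArrayList<>(props.stringPropertyNames());
            Collections.sort(keys); // sorted only to make the output order predictable
        }
        if (idx >= keys.size()) {
            return false;
        }
        String key = keys.get(idx++);
        output.add(new Object[] {key, props.getProperty(key)});
        return true;
    }
}
```

The sorting exists only to make the output order predictable; the step sample emits the keys in whatever order the properties enumerate them.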
Downloads
Download the example transformations to follow along with the samples: udjc_samples.zip
All transformations were created using Kettle 4.0. Enjoy
Conclusion
This article explains how the user defined java class step can be used in different roles and scenarios. If you find that you need custom processing but the JavaScript step is not giving you the necessary performance or flexibility, you may consider using the user defined java class step instead. Be sure to also check out the samples folder that comes with Kettle. There are a few nice samples for the user defined java class step.
Happy Coding
Slawo
Step code for “A Simple Field Transformation” (uppercase.ktr):
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
Object[] r = getRow();
if (r == null) {
setOutputDone();
return false;
}
if (first){
first = false;
}
r = createOutputRow(r, data.outputRowMeta.size());
// Get the value from an input field
String test_value = get(Fields.In, "testfield").getString(r);
// play around with it
String uppercase_value = test_value.toUpperCase();
// Set a value in a new output field
get(Fields.Out, "uppercase").setValue(r, uppercase_value);
// Send the row on to the next step.
putRow(data.outputRowMeta, r);
return true;
}
Step code for “Using Step Parameters” (parameters.ktr):
import java.util.regex.Pattern;
private Pattern p = null;
private FieldHelper fieldToTest = null;
private FieldHelper outputField = null;
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
Object[] r = getRow();
if (r == null) {
setOutputDone();
return false;
}
// prepare regex and field helpers
if (first){
first = false;
String regexString = getParameter("regex");
p = Pattern.compile(regexString);
fieldToTest = get(Fields.In, getParameter("test_field"));
outputField = get(Fields.Out, "result");
}
r = createOutputRow(r, data.outputRowMeta.size());
// Get the value from an input field
String test_value = fieldToTest.getString(r);
// test for match and write result
if (p.matcher(test_value).matches()){
outputField.setValue(r, Long.valueOf(1));
}
else{
outputField.setValue(r, Long.valueOf(0));
}
// Send the row on to the next step.
putRow(data.outputRowMeta, r);
return true;
}
Step code for “Working with Info Steps” (info_steps.ktr):
import java.util.regex.Pattern;
import java.util.*;
private FieldHelper resultField = null;
private FieldHelper matchField = null;
private FieldHelper outputField = null;
private FieldHelper inputField = null;
private ArrayList patterns = new ArrayList(20);
private ArrayList expressions = new ArrayList(20);
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
Object[] r = getRow();
if (r == null) {
setOutputDone();
return false;
}
// prepare regex and field helpers
if (first){
first = false;
// get the input and output fields
resultField = get(Fields.Out, "result");
matchField = get(Fields.Out, "matched_by");
inputField = get(Fields.In, "value");
// get all rows from the info stream and compile the regex field to patterns
FieldHelper regexField = get(Fields.Info, "regex");
RowSet infoStream = findInfoRowSet("expressions");
Object[] infoRow = null;
while((infoRow = getRowFrom(infoStream)) != null){
String regexString = regexField.getString(infoRow);
expressions.add(regexString);
patterns.add(Pattern.compile(regexString));
}
}
// get the value of the field to check
String value = inputField.getString(r);
// check if any pattern matches
int matchFound = 0;
String matchExpression = null;
for(int i=0;i<patterns.size();i++){
if (((Pattern) patterns.get(i)).matcher(value).matches()){
matchFound = 1;
matchExpression = (String)expressions.get(i);
break;
}
}
// write result to stream
r = createOutputRow(r, data.outputRowMeta.size());
resultField.setValue(r, Long.valueOf(matchFound));
matchField.setValue(r, matchExpression);
// Send the row on to the next step.
putRow(data.outputRowMeta, r);
return true;
}
Step code for “Working with Target Steps” (target_steps.ktr):
import java.util.regex.Pattern;
import java.util.*;
private RowSet lowProbStream = null;
private RowSet highProbStream = null;
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
Object[] r = getRow();
if (r == null) {
setOutputDone();
return false;
}
// look up the target row sets by the names defined in the step UI
if (first){
first = false;
lowProbStream = findTargetRowSet("low_probability");
highProbStream = findTargetRowSet("high_probability");
}
// Send the row on to the next step.
if (Math.random() < 0.35){
putRowTo(data.outputRowMeta, r, lowProbStream);
}
else{
putRowTo(data.outputRowMeta, r, highProbStream);
}
return true;
}
Step code for “Error Handling” (error_handling.ktr):
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
Object[] r = getRow();
if (r == null) {
setOutputDone();
return false;
}
if (first){
first = false;
}
r = createOutputRow(r, data.outputRowMeta.size());
// Get the value from an input field
Long numerator = get(Fields.In, "numerator").getInteger(r);
Long denominator = get(Fields.In, "denominator").getInteger(r);
// avoid dividing by 0
if (denominator == 0){
// putError is declared as follows:
// public void putError(RowMetaInterface rowMeta, Object[] row, long nrErrors, String errorDescriptions, String fieldNames, String errorCodes)
putError(data.outputRowMeta, r, 1, "Denominator must be different from 0", "denominator", "DIV_0");
// get on with the next line
return true;
}
long integer_division = numerator / denominator;
long remainder = numerator % denominator;
// write output fields
get(Fields.Out, "integer_division").setValue(r, Long.valueOf(integer_division));
get(Fields.Out, "remainder").setValue(r, Long.valueOf(remainder));
// Send the row on to the next step.
putRow(data.outputRowMeta, r);
return true;
}
Step code for “Accessing Database Connections” (db_access.ktr):
import org.pentaho.di.core.database.Database;
import java.util.List;
import java.util.Arrays;
private Database db = null;
private FieldHelper outputField = null;
private FieldHelper tableField = null;
private List existingTables = null;
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
Object[] r = getRow();
if (r == null) {
setOutputDone();
return false;
}
if (first){
first = false;
existingTables = Arrays.asList(db.getTablenames());
tableField = get(Fields.In, "table_name");
outputField = get(Fields.Out, "table_exists");
}
r = createOutputRow(r, data.outputRowMeta.size());
if (existingTables.contains(tableField.getString(r))){
outputField.setValue(r, Long.valueOf(1));
}
else{
outputField.setValue(r, Long.valueOf(0));
}
// Send the row on to the next step.
putRow(data.outputRowMeta, r);
return true;
}
public boolean init(StepMetaInterface stepMetaInterface, StepDataInterface stepDataInterface)
{
if (parent.initImpl(stepMetaInterface, stepDataInterface)){
try{
db = new Database(this.parent, getTransMeta().findDatabase("TestDB"));
db.shareVariablesWith(this.parent);
db.connect();
return true;
}
catch(KettleDatabaseException e){
logError("Error connecting to TestDB: "+ e.getMessage());
setErrors(1);
stopAll();
}
}
return false;
}
public void dispose(StepMetaInterface smi, StepDataInterface sdi)
{
if (db != null) {
db.disconnect();
}
parent.disposeImpl(smi, sdi);
}
Step code for the input step example (input_step.ktr):
import java.util.*;
private ArrayList keys = null;
private int idx = 0;
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
if (first){
first = false;
// get the system property names, output is done one at a time later
keys = Collections.list(System.getProperties().propertyNames());
idx = 0;
}
if (idx >= keys.size()) {
setOutputDone();
return false;
}
// create a row
Object[] r = RowDataUtil.allocateRowData(data.outputRowMeta.size());
// Set key and value in a new output row
get(Fields.Out, "key").setValue(r, keys.get(idx));
get(Fields.Out, "value").setValue(r, System.getProperties().get(keys.get(idx)));
idx++;
// Send the row on to the next step.
putRow(data.outputRowMeta, r);
return true;
}




Great article, thank you. Are there special consideration when a UDJC is used in a Mapping (sub-transformation)? My stand alone versions work as expected, but in a sub transformation they fail to generate output.
I was asked to take a look at this problem earlier today and I replied on both the bug and the forum with some solutions that might help you.
How to pass filename parameter to Java class dynamically, for a kettle transformation as an input.
how to import parameters from kettle.properties file
Hi Sam,
you can access all Variables (also those that come from kettle.properties) by calling getVariable(String variableName) in the user defined java class step. This and other useful methods are showcased in the “Code Snippets” section on the left side of the step dialog.
Cheers
Slawo
I am having trouble with UDJC setVariable() and getVariable(). I have already posted one thread in Pentaho forum, here is the link for details:
http://forums.pentaho.com/showthread.php?80981-setVariable()-Error-in-UDJC.&p=253141#post253141
Please let me know what is wrong with my code.
Mike.
Hey Mike,
I just answered your forum post.
Cheers
Slawo
Hey, I just found your Field Transformation example and tried to test the following code snippet to get the value from an input field, but unfortunately I can’t use the get(Fields.In, "testfield").getString(r); method. It doesn’t exist. What did I do wrong? Just for info, I am a newbie in Pentaho Java plugin development, but I need to program a small Java plugin for my bachelor thesis.
Thx for helping
Kind regards
Daniel
Hey Daniel,
please post your existing transformation on the Pentaho forums. I’ll gladly have a look and help out. There are many knowledgeable people reading the forums, so chances are good that somebody even beats me to it.
Cheers
Slawo
This is my java class code,
//################code
import java.util.HashMap;
import java.util.Iterator;
import java.util.TreeMap;
import java.util.Map.Entry;
private FieldHelper outputField = null;
private FieldHelper inputField1 = null;
private FieldHelper inputField2 = null;
private FieldHelper inputField3 = null;
private FieldHelper inputField4 = null;
private FieldHelper startIp = null;
private FieldHelper endIp = null;
private FieldHelper areaId = null;
private FieldHelper subAreaId = null;
private FieldHelper ispId = null;
private TreeMap ipInt2AreaIsp;
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
Object[] r = getRow();
if (r == null) {
setOutputDone();
return false;
}
if (first){
first = false;
// get the input and output fields
outputField = get(Fields.Out, "areaId");
inputField1 = get(Fields.In, "int1"); //########line 61
inputField2 = get(Fields.In, "int2");
inputField3 = get(Fields.In, "int3");
inputField4 = get(Fields.In, "int4");
startIp = get(Fields.Info, "Field_000");
endIp = get(Fields.Info, "Field_001");
areaId = get(Fields.Info, "Field_002");
subAreaId = get(Fields.Info, "Field_003");
ispId = get(Fields.Info, "Field_004");
RowSet infoStream = findInfoRowSet("ip_table");
ipInt2AreaIsp=new TreeMap();
Object[] infoRow = null;
while((infoRow = getRowFrom(infoStream)) != null){
long field1=startIp.getInteger(infoRow);
long field3=areaId.getInteger(infoRow);
ipInt2AreaIsp.put(field1,field3);
}
}
long ipInt=0;
long ipInt1=inputField1.getInteger(r).longValue();
long ipInt2=inputField2.getInteger(r).longValue();
long ipInt3=inputField3.getInteger(r).longValue();
long ipInt4=inputField4.getInteger(r).longValue();
ipInt = ipInt1<<24+ipInt2<<16+ipInt3<<8+ipInt4;
Entry entry=ipInt2AreaIsp.floorEntry(ipInt);
long area=-1;
if(entry!=null)
area=((Long)entry.getValue()).longValue();
// write result to stream
r = createOutputRow(r, data.outputRowMeta.size());
outputField.setValue(r, Long.valueOf(area));
// Send the row on to the next step.
putRow(data.outputRowMeta, r);
return true;
}
//#####################end
I encountered errors like this
//###############error
Unexpected conversion error while converting value [int1 Integer] to an Integer
[B cannot be cast to java.lang.Long
org.pentaho.di.trans.steps.userdefinedjavaclass.FieldHelper.getInteger(FieldHelper.java:69)
Processor.processRow(Processor.java:61) //line 61 is tagged in the code
//####################end
I do not know what is wrong with it. Please help me. Thanks!
Hey there,
it’s difficult to say without the transformation to look at. My first guess would be that maybe the fields you’re reading from are coming from a CSV file, and lazy conversion is enabled on them, in which case they would be represented internally by a byte array. If that’s the case, try disabling lazy conversion and see what happens. If there’s another problem, I’d recommend posting a short transformation that shows the issue on the Pentaho Forums. You’d have to register first, but the community is really good at helping with issues like this.
Cheers
Slawo
Thank you very much! The problem is solved. Your guess is exactly the point.
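Separately from the lazy-conversion problem, the line ipInt = ipInt1<<24+ipInt2<<16+ipInt3<<8+ipInt4; in the code above does not compute what it seems to: in Java, + binds more tightly than <<, so the octets are added to the shift distances instead of being combined into one number. Here is a plain-Java sketch of the conversion with explicit parentheses, plus the TreeMap.floorEntry() range lookup that the code relies on. IpRangeLookup is a hypothetical name; the sketch assumes non-overlapping ranges and also checks the range end, which the posted code reads into endIp but never uses.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of an IP-range lookup; no Kettle classes involved.
class IpRangeLookup {
    // Parentheses matter: without them, a << 24 + b parses as a << (24 + b).
    static long ipToLong(long a, long b, long c, long d) {
        return (a << 24) | (b << 16) | (c << 8) | d;
    }

    // Range start -> {range end, area id}; ranges are assumed not to overlap.
    private final TreeMap<Long, long[]> ranges = new TreeMap<>();

    void addRange(long startIp, long endIp, long areaId) {
        ranges.put(startIp, new long[] {endIp, areaId});
    }

    // Area of the range containing ip, or -1 if no range contains it.
    long areaFor(long ip) {
        Map.Entry<Long, long[]> entry = ranges.floorEntry(ip);
        if (entry == null || ip > entry.getValue()[0]) {
            return -1L;
        }
        return entry.getValue()[1];
    }
}
```

For example, ipToLong(192, 168, 1, 1) yields 3232235777, the usual numeric form of 192.168.1.1.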
Hey, I am using pentaho 3.2 version.
In my transformation , I am reading a CSV of multiple columns (lets say,some headers like COLUMN1, COLUMN2 etc)..
I am reading another CSV which describes the removal rules for rows..(like Remove if COLUMN1 is BLANK) – So i have written code to read this CSV and stored the column values inside an array. (So the columns used in removal rules will be stored in array called ‘fieldArray’)
In the code, I am checking if(fieldArray[i]==null) { then skip record }, but here fieldArray[i] holds the STRING “COLUMN1”, “COLUMN2” etc. Actually I need to check the value of COLUMN1 in each row here.
Also I have tried to access the field using
var idx = getInputRowMeta().indexOfValue("COLUMN1"); – it is returning an index like 52, 53..
But when i am using getRow() and row , its says its not defined..
Could you please tell me a better approach to solve this out.Please..!!!
Hi,
I am using pentaho 4.1.0 version.
I am unable to run the db_access.ktr example,could you please tell me where you are specifying the database details.
How can I access the DB data and do calculations with the retrieved data using Pentaho 4.1.0? Could you please help me out with this?
Hi Ramya,
you seem to want to do some calculations on data that comes from some database. This is basic Kettle functionality. The Java step is for cases that need special consideration, like connecting to some unsupported data source, or doing complex calculations using some third party library for example.
For basic Kettle usage I recommend looking at these:
http://www.amazon.com/Pentaho-3-2-Data-Integration-Beginners/dp/1847199542/ref=sr_1_4?ie=UTF8&qid=1314876836&sr=8-4
http://www.amazon.com/Pentaho-Data-Integration-4-Cookbook/dp/1849515247/ref=sr_1_3?ie=UTF8&qid=1314876836&sr=8-3
http://www.amazon.com/Pentaho-Kettle-Solutions-Building-Integration/dp/0470635177/ref=sr_1_1?ie=UTF8&qid=1314876836&sr=8-1
Cheers
Slawo
Hi Slawo ,
My issue resolved ..I have declared the array in java script step and the array was reset for each row coming in .. I have added a ‘start script’ to my script and moved all my declarations to this ..
guys,
I’m doomed by a problem. I need to design an ETL kind of feed to extract the href links from an HTML page as a string and get a part of the resulting string. Is there a way to do that in Kettle? I could have done that using a script, but I am exploring a way to do this in Kettle.
I need this kinda urgently.. Pls help me if you guys know the solution..
-Skumar
Hi skumar,
if you need some quick and dirty Kettle solution you'd probably want to read the file contents into a field, split the string on ‘ If you want to reach a bigger audience, I'd suggest getting an account on the Kettle forum.
Cheers
Slawo
Hi,
Can we import an external jar in kettle?
Hi there,
usually it’s sufficient to put it into the libext folder.
Cheers
Slawo
Hi, these examples are very good and informative. I am a newbie to Pentaho and have a doubt. I am just extracting fields from one table and loading them into another table (both are in MySQL). The issue is that the empty fields from the Table Input step are treated as NULL fields by the Table Output step, and Kettle throws an error saying “Column x cannot be null”. I am trying to include a plugin called IF FIELD IS NULL, but could not get it working. Please help me on this.
I’ve run info_steps.ktr under Kettle 4.2.1 and it runs fine. However, if I open the ‘User Defined Java Expression’ and click the ‘Test Class’ button, I get the following error:
2012/01/05 15:35:34 – Spoon – The transformation has finished!!
2012/01/05 15:47:06 – test values – PREVIEW – Dispatching started for transformation [test values - PREVIEW]
2012/01/05 15:47:06 – ## TEST DATA ##.0 – Finished processing (I=0, O=0, R=0, W=10, U=0, E=0)
2012/01/05 15:47:06 – test values.0 – ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) : Unexpected error
2012/01/05 15:47:06 – test values.0 – ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) : org.pentaho.di.core.exception.KettleStepException:
2012/01/05 15:47:06 – test values.0 – ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) : Unable to find Info field helper for field name ‘regex’
2012/01/05 15:47:06 – test values.0 – ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) :
2012/01/05 15:47:06 – test values.0 – ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) : at org.pentaho.di.trans.steps.userdefinedjavaclass.TransformClassBase.get(TransformClassBase.java:730)
2012/01/05 15:47:06 – test values.0 – ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) : at Processor.processRow(Processor.java:29)
2012/01/05 15:47:06 – test values.0 – ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) : at org.pentaho.di.trans.steps.userdefinedjavaclass.UserDefinedJavaClass.processRow(UserDefinedJavaClass.java:1182)
2012/01/05 15:47:06 – test values.0 – ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) : at org.pentaho.di.trans.step.RunThread.run(RunThread.java:40)
2012/01/05 15:47:06 – test values.0 – ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) : at java.lang.Thread.run(Thread.java:619)
2012/01/05 15:47:06 – test values.0 – Finished processing (I=0, O=0, R=1, W=0, U=0, E=1)
2012/01/05 15:47:06 – test values – PREVIEW – test values – PREVIEW
2012/01/05 15:47:06 – test values – PREVIEW – test values – PREVIEW
This, in turn, is hiding other errors I’ve made. Is there a way around this error?
Thanks,
Hal
How can i get the excel file at run time from user.
1) I need to ask user which file he want to select for excel input.As we have in html.by selecting file appropriate data will be populated.
2) the sheet name and field are the same as we define in excel input.
Hi,
Thanks for the excellent examples above! Definitely very helpful.
I think I might have hit a bug/issue. I am using PDI 4.2.0. I am trying to use a similar transform as info_steps.ktr . In the info step, I am passing a list of field names. The data input stream on the left is coming from a database query (Table Input component). The query is complex and so it doesn’t immediately provide the data. It takes a minute to start streaming data from the query.
In this scenario, I noticed that the getRow() call at the top is actually getting the first row from the Info step! I realized this by putting in logDebug statements and printing out the data returned from getRow(). My code to read the Info step is within the “if (first) { … }” conditional so it can’t be run before the getRow() call. I verified that I had specified the Info step correctly and also ran the Test Class to verify my code.
I stared at the PDI source code for getRow() and it seems to me that this is indeed happening; it doesn’t seem to make a distinction between the data step and the info step for input. In your example, this issue wasn’t hit probably because the data from data step is immediately available from the first getRow() call.
I am looking for a workaround e.g. how can I override getRow() to get data from non-Info steps only? Or any other suggested workarounds would be fine too.
Thank you.
Hi There,
definitely. You hit a bug that I also stumbled upon some time ago. It’s documented as http://jira.pentaho.com/browse/PDI-5115
Just get a jira account and vote for it
Cheers
Slawo
I am using a java program to call a kettle transformation, not that i want to use the UDJC, is there a way to use a third party class in the UDJC as libext does not come into my scenario.
I understand that third party class in the UDJC will get loaded if it is in libext of kettle, but i am using a java program to run the kettle script, any ideas or pointers would be appreciable , Thanks
Hey Sharath,
you just need to make sure that the classes you need are visible to the Kettle classes. In most scenarios it’s sufficient to make sure they are in the classpath. If you have a more sophisticated setup, you need to make sure that the classes are visible by the class loader that is used for the PDI classes.
Cheers
Slawo
Hi Slawomir,
I have to use the “user-defined java class” step to encrypt email data. We already have the Java program written for it. I need to embed this program in Kettle, where I will pass the email IDs row by row and the result should be the encrypted form. The original program is:
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.spec.KeySpec;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.DESedeKeySpec;
import org.apache.commons.codec.binary.Base64;
import sun.misc.BASE64Decoder;
import sun.misc.BASE64Encoder;
public class EncryptionUtils {
public static final String ENCRYPTION_SCHEME = "DESede";
public static final String ENCRYPTION_KEY = "ABC ENCRYPTION KEYX";
private static final String UNICODE_FORMAT = "UTF8";
private static KeySpec keySpec;
private static SecretKeyFactory keyFactory;
private static Cipher cipher;
private static final char[] HEX = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };
static {
try {
byte[] keyAsBytes = ENCRYPTION_KEY.getBytes(UNICODE_FORMAT);
keySpec = new DESedeKeySpec(keyAsBytes);
keyFactory = SecretKeyFactory.getInstance(ENCRYPTION_SCHEME);
cipher = Cipher.getInstance(ENCRYPTION_SCHEME);
} catch (Exception e) {
e.printStackTrace();
}
}
public static String encrypt(String unencryptedString) {
if (unencryptedString == null || unencryptedString.trim().length() == 0)
return unencryptedString;
try {
SecretKey key = keyFactory.generateSecret(keySpec);
cipher.init(Cipher.ENCRYPT_MODE, key);
byte[] cleartext = unencryptedString.getBytes(UNICODE_FORMAT);
byte[] ciphertext = cipher.doFinal(cleartext);
BASE64Encoder base64encoder = new BASE64Encoder();
return base64encoder.encode(ciphertext);
} catch (Exception e) {
// Unable to encrypt
return unencryptedString;
}
}
public static String encrypt(int unencryptedInt) {
return encrypt(String.valueOf(unencryptedInt));
}
public static String decrypt(String encryptedString) {
if (encryptedString == null || encryptedString.trim().length() <= 0)
return encryptedString;
try {
SecretKey key = keyFactory.generateSecret(keySpec);
cipher.init(Cipher.DECRYPT_MODE, key);
BASE64Decoder base64decoder = new BASE64Decoder();
byte[] cleartext = base64decoder.decodeBuffer(encryptedString);
byte[] ciphertext = cipher.doFinal(cleartext);
return new String(ciphertext, UNICODE_FORMAT);
} catch (Exception e) {
// Unable to decrypt
return encryptedString;
}
}
public static String md5Encode(String input, String salt) {
return md5Encode(mergePasswordAndSalt(input, salt));
}
private static String mergePasswordAndSalt(String input, String salt) {
if (input == null) {
input = "";
}
if (salt == null || salt.length() == 0) {
return input;
} else {
return input + "{" + salt + "}";
}
}
public static String md5Encode(String input) {
MessageDigest messageDigest;
try {
messageDigest = MessageDigest.getInstance("MD5");
} catch (NoSuchAlgorithmException e1) {
throw new IllegalArgumentException("No such algorithm MD5");
}
byte[] digest;
try {
digest = messageDigest.digest(input.getBytes("UTF-8"));
} catch (UnsupportedEncodingException e) {
throw new IllegalStateException("UTF-8 not supported!");
}
return hexEncode(digest);
}
public static String hexEncode(byte[] bytes) {
final int nBytes = bytes.length;
char[] result = new char[2 * nBytes];
int j = 0;
for (int i = 0; i < nBytes; i++) {
// Top 4
result[j++] = HEX[(0xF0 & bytes[i]) >>> 4];
// Bottom 4
result[j++] = HEX[(0x0F & bytes[i])];
}
return new String(result);
}
public static String base64UrlDecode(String input) {
Base64 decoder = new Base64(true);
byte[] decodedBytes = decoder.decode(input);
return new String(decodedBytes);
}
public static void main(String[] args) {
System.out.println(md5Encode("15356343949d8bf39f43ea56aafcc"));
}
public static String getSecureSuborderCode(String suborderCode, String mobile) {
return md5Encode(suborderCode, mobile);
}
}
I am supposed to call the encrypt(String) function to get the result. I have tried to put this in the step as follows:
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.spec.KeySpec;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.DESedeKeySpec;
import org.apache.commons.codec.binary.Base64;
import sun.misc.BASE64Decoder;
import sun.misc.BASE64Encoder;
public class EncryptionUtils {
public static final String ENCRYPTION_SCHEME = "DESede";
public static final String ENCRYPTION_KEY = "SNAPDEAL ENCRYPTION KEYX";
private static final String UNICODE_FORMAT = "UTF8";
private static KeySpec keySpec;
private static SecretKeyFactory keyFactory;
private static Cipher cipher;
private static final char[] HEX = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };
static {
try {
byte[] keyAsBytes = ENCRYPTION_KEY.getBytes(UNICODE_FORMAT);
keySpec = new DESedeKeySpec(keyAsBytes);
keyFactory = SecretKeyFactory.getInstance(ENCRYPTION_SCHEME);
cipher = Cipher.getInstance(ENCRYPTION_SCHEME);
} catch (Exception e) {
e.printStackTrace();
}
}
public static String encrypt(String unencryptedString) {
if (unencryptedString == null || unencryptedString.trim().length() == 0)
return unencryptedString;
try {
SecretKey key = keyFactory.generateSecret(keySpec);
cipher.init(Cipher.ENCRYPT_MODE, key);
byte[] cleartext = unencryptedString.getBytes(UNICODE_FORMAT);
byte[] ciphertext = cipher.doFinal(cleartext);
BASE64Encoder base64encoder = new BASE64Encoder();
return base64encoder.encode(ciphertext);
} catch (Exception e) {
// Unable to encrypt
return unencryptedString;
}
}
public static String encrypt(int unencryptedInt) {
return encrypt(String.valueOf(unencryptedInt));
}
public static String decrypt(String encryptedString) {
if (encryptedString == null || encryptedString.trim().length() <= 0)
return encryptedString;
try {
SecretKey key = keyFactory.generateSecret(keySpec);
cipher.init(Cipher.DECRYPT_MODE, key);
BASE64Decoder base64decoder = new BASE64Decoder();
byte[] cleartext = base64decoder.decodeBuffer(encryptedString);
byte[] ciphertext = cipher.doFinal(cleartext);
return new String(ciphertext, UNICODE_FORMAT);
} catch (Exception e) {
// Unable to decrypt
return encryptedString;
}
}
public static String md5Encode(String input, String salt) {
return md5Encode(mergePasswordAndSalt(input, salt));
}
private static String mergePasswordAndSalt(String input, String salt) {
if (input == null) {
input = "";
}
if (salt == null || salt.length() == 0) {
return input;
} else {
return input + "{" + salt + "}";
}
}
public static String md5Encode(String input) {
MessageDigest messageDigest;
try {
messageDigest = MessageDigest.getInstance("MD5");
} catch (NoSuchAlgorithmException e1) {
throw new IllegalArgumentException("No such algorithm MD5");
}
byte[] digest;
try {
digest = messageDigest.digest(input.getBytes("UTF-8"));
} catch (UnsupportedEncodingException e) {
throw new IllegalStateException("UTF-8 not supported!");
}
return hexEncode(digest);
}
public static String hexEncode(byte[] bytes) {
final int nBytes = bytes.length;
char[] result = new char[2 * nBytes];
int j = 0;
for (int i = 0; i < nBytes; i++) {
// Top 4
result[j++] = HEX[(0xF0 & bytes[i]) >>> 4];
// Bottom 4
result[j++] = HEX[(0x0F & bytes[i])];
}
return new String(result);
}
public static String base64UrlDecode(String input) {
Base64 decoder = new Base64(true);
byte[] decodedBytes = decoder.decode(input);
return new String(decodedBytes);
}
public static void main(String[] args) {
System.out.println(md5Encode("15356343949d8bf39f43ea56aafcc"));
}
public static String getSecureSuborderCode(String suborderCode, String mobile) {
return md5Encode(suborderCode, mobile);
}
}
private final HashMap fieldsToUpdate = new HashMap();
private int rowsLeftForGenerateMode = -1;
private int outputRowSize = 0;
private String encypt_email;
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
// First, get a row from the default input hop
Object[] r = getRow();
// If the row object is null, we are done processing.
if (r == null && !first) {
setOutputDone();
return false;
}
// If the global “first” flag is true, perform some initialization that can only happen
// once we have read the first row of input data
if (first) {
first = false;
// Set up the list of fields that will be available after this step
// Normally, this is simpler, but in the HelloWorld sample, I don’t know if
// there is an input step connected or not.
if (r == null) {
rowsLeftForGenerateMode = 100;
}
outputRowSize = data.outputRowMeta.size();
// Again, an extra complicated block of code to make up for the fact that I don’t know
// how you are connecting this sample to an existing transformation.
for (int i = 0; i 0);
}
}
When I click "Test class", I get this error: No applicable constructor/method found for actual parameters "org.pentaho.di.core.row.ValueMetaInterface"; candidates are: "java.lang.String Processor$EncryptionUtils.encrypt(java.lang.String)", "java.lang.String Processor$EncryptionUtils.encrypt(int)". Please help.
Thanks,
Ramya T
Dear Ramya,
first thing that comes to mind is making the encryption method static and putting a jar containing the class into Kettle's libext folder. Having done that, you should be able to simply call the encryption function from the UDJC, UDJE or JavaScript steps, whichever is most convenient. If there is a problem with the code, I'd suggest isolating the issue in a smaller code sample and asking for advice on the Kettle forum and on a forum specific to Java development.
Best
Slawo
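Following the suggestion above, a static utility packaged as a jar could look roughly like the sketch below. It is an assumption-laden variant of the EncryptionUtils class from the question, not a drop-in replacement: it assumes Java 8 or later (java.util.Base64 instead of the internal sun.misc encoders), uses a hypothetical 24-byte key (DESedeKeySpec needs at least 24 bytes of key material), and throws IllegalStateException instead of silently returning the input when something fails. As for the error message: it says a ValueMetaInterface, i.e. a field's metadata, was passed to encrypt(), while the method needs the field's value as a String, obtained with something like get(Fields.In, "email").getString(r), where "email" stands for your input field name.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.DESedeKeySpec;

/** Sketch of a static, dependency-free encryption helper (hypothetical name and key). */
final class EmailCrypto {
    private static final String SCHEME = "DESede";
    // Hypothetical key: exactly 24 ASCII bytes, the minimum DESedeKeySpec accepts
    private static final byte[] KEY_BYTES = "EXAMPLE-KEY-24-BYTES-XYZ".getBytes(StandardCharsets.UTF_8);
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    private EmailCrypto() {}

    private static SecretKey key() throws GeneralSecurityException {
        return SecretKeyFactory.getInstance(SCHEME).generateSecret(new DESedeKeySpec(KEY_BYTES));
    }

    /** Encrypts and Base64-encodes; null or blank input is returned unchanged. */
    static String encrypt(String plain) {
        if (plain == null || plain.trim().isEmpty()) {
            return plain;
        }
        try {
            // A fresh Cipher per call: Cipher objects are not thread-safe,
            // and several copies of a step may run in parallel threads.
            Cipher cipher = Cipher.getInstance(SCHEME);
            cipher.init(Cipher.ENCRYPT_MODE, key());
            byte[] encrypted = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(encrypted);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    /** Reverses encrypt(); null or blank input is returned unchanged. */
    static String decrypt(String encoded) {
        if (encoded == null || encoded.trim().isEmpty()) {
            return encoded;
        }
        try {
            Cipher cipher = Cipher.getInstance(SCHEME);
            cipher.init(Cipher.DECRYPT_MODE, key());
            byte[] decrypted = cipher.doFinal(Base64.getDecoder().decode(encoded));
            return new String(decrypted, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }

    /** MD5 digest as lowercase hex, two characters per byte. */
    static String md5Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(input.getBytes(StandardCharsets.UTF_8));
            char[] out = new char[digest.length * 2];
            for (int i = 0; i < digest.length; i++) {
                out[2 * i] = HEX[(digest[i] & 0xF0) >>> 4]; // high nibble
                out[2 * i + 1] = HEX[digest[i] & 0x0F];     // low nibble
            }
            return new String(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Like the original, Cipher.getInstance("DESede") relies on the provider's default mode and padding, so whatever key and settings you pick have to match on the side that decrypts the values.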
Hi Slawo, your articles are precise and clear. Great job. I am new to Pentaho, and I am already a fan of yours.
A couple of questions on databases:
1. Where is the API documentation for org.pentaho.di.core.database? I looked in http://javadoc.pentaho.com/kettle/ but found nothing there.
2. How can I get a java.sql.Connection object from the Pentaho Database object? When performance matters, I am sure this would be a very useful object for optimization.
3. Are there any articles on database performance optimization from a Pentaho point of view, especially when dealing with voluminous tables? For example, we cannot load some tables into an ArrayList for lookups.
Hey Nili,
Glad to hear you like the blog. I should be writing more again.
1. If you want the code, you'd best check out Kettle trunk from svn://svn.pentaho.org/svnkettleroot
2. I think there is a simple accessor there; just check out the Database class.
3. I think it's best if you play around with PDI and the code base a bit. PDI processes rows in a multithreaded pipe, so voluminous tables as such don't pose a problem. Experiment a bit and you'll get the hang of it.
Questions around PDI are best posted at http://forums.pentaho.com/forumdisplay.php?135-Pentaho-Data-Integration-Kettle
There are a lot of knowledgeable folks on the forum.
Cheers
Slawo
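The "multithreaded pipe" point can be illustrated with a toy model in plain Java: each step runs in its own thread, and the hop between two steps is a bounded buffer (a row set, in PDI terms), so only a bounded number of rows sit in memory at any time, however many rows flow through. This is a simplified illustration, not PDI's implementation; the class name, buffer size and row contents are arbitrary assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Toy two-step pipeline: a producer and a consumer thread joined by a bounded buffer. */
final class PipeDemo {
    // Sentinel row marking the end of the stream (compared by reference)
    private static final Object[] END = new Object[0];

    private PipeDemo() {}

    /** Streams rowCount rows through a buffer holding at most bufferSize rows; returns their sum. */
    static long run(int rowCount, int bufferSize) {
        BlockingQueue<Object[]> hop = new ArrayBlockingQueue<>(bufferSize);
        long[] sum = new long[1];

        Thread producer = new Thread(() -> {
            try {
                for (long i = 1; i <= rowCount; i++) {
                    hop.put(new Object[] {i}); // blocks while the buffer is full
                }
                hop.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                // take() blocks while the buffer is empty
                for (Object[] row = hop.take(); row != END; row = hop.take()) {
                    sum[0] += (Long) row[0];
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join(); // join() makes the consumer's writes to sum[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the pipeline", e);
        }
        return sum[0];
    }
}
```

Shrinking bufferSize does not change the result, only how far the producer can run ahead of the consumer, which is why the memory needed does not grow with the table size.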
I have a simple transformation that concatenates a few strings into a new string.
It's working fine.
But when I use the same class in my existing transformation with 100 more fields, it throws exceptions.
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
Object[] r = getRow();
if (r == null)
{setOutputDone();return false;}
if (first)
{first = false;}
Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());
String value1 = get(Fields.In, "first").getString(r);
String value2 = get(Fields.In, "second").getString(r);
Long value3 = get(Fields.In, "start").getInteger(r);
Long value4 = get(Fields.In, "stop").getInteger(r);
String concat = value1 + "@" + value2 + "@" + value3 + "@" + value4;
get(Fields.Out, "first_C").setValue(outputRow, concat);
get(Fields.Out, "from").setValue(outputRow, value3);
get(Fields.Out, "to").setValue(outputRow, value4);
get(Fields.Out, "second_C").setValue(outputRow, concat);
putRow(data.outputRowMeta, outputRow); return true;
}
2012/03/15 11:14:55 – UDJC.0 – ERROR (version 4.1.0.1, build 15865 from 2011-10-12 10.14.21 by buildguy) : Unexpected error :
2012/03/15 11:14:55 – UDJC.0 – ERROR (version 4.1.0.1, build 15865 from 2011-10-12 10.14.21 by buildguy) : org.pentaho.di.core.exception.KettleValueException:
2012/03/15 11:14:55 – UDJC.0 – ERROR (version 4.1.0.1, build 15865 from 2011-10-12 10.14.21 by buildguy) : Unexpected conversion error while converting value [stop Integer] to an Integer
2012/03/15 11:14:55 – UDJC.0 – ERROR (version 4.1.0.1, build 15865 from 2011-10-12 10.14.21 by buildguy) : java.lang.String cannot be cast to java.lang.Long
2012/03/15 11:14:55 – UDJC.0 – ERROR (version 4.1.0.1, build 15865 from 2011-10-12 10.14.21 by buildguy) : org.pentaho.di.core.row.ValueMeta.getInteger(ValueMeta.java:1505)
2012/03/15 11:14:55 – UDJC.0 – ERROR (version 4.1.0.1, build 15865 from 2011-10-12 10.14.21 by buildguy) : org.pentaho.di.trans.steps.userdefinedjavaclass.FieldHelper.getInteger(FieldHelper.java:69)
2012/03/15 11:14:55 – UDJC.0 – ERROR (version 4.1.0.1, build 15865 from 2011-10-12 10.14.21 by buildguy) : Processor.processRow(Processor.java:18)
2012/03/15 11:14:55 – UDJC.0 – ERROR (version 4.1.0.1, build 15865 from 2011-10-12 10.14.21 by buildguy) : org.pentaho.di.trans.steps.userdefinedjavaclass.UserDefinedJavaClass.processRow(UserDefinedJavaClass.java:1182)
2012/03/15 11:14:55 – UDJC.0 – ERROR (version 4.1.0.1, build 15865 from 2011-10-12 10.14.21 by buildguy) : org.pentaho.di.trans.step.RunThread.run(RunThread.java:40)
2012/03/15 11:14:55 – UDJC.0 – ERROR (version 4.1.0.1, build 15865 from 2011-10-12 10.14.21 by buildguy) : java.lang.Thread.run(Unknown Source)
Hi Sid,
my guess would be that your big transformation with 100+ fields already has a field named "first_C", "from", "to" or "second_C", and the UDJC is putting values into that field. You should make sure your field names are unique.
Cheers
Slawo
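A quick way to test the duplicate-name theory is to list the field names that occur more than once. The sketch below is a hypothetical helper, not part of Kettle; it compares names case-insensitively, which errs on the side of reporting too much rather than too little.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: reports field names that appear more than once. */
final class FieldNameCheck {
    private FieldNameCheck() {}

    static Set<String> duplicates(List<String> fieldNames) {
        Set<String> seen = new HashSet<>();
        // LinkedHashSet keeps the duplicates in the order they were found
        Set<String> dupes = new LinkedHashSet<>();
        for (String name : fieldNames) {
            // Case-insensitive key, so "to" and "To" count as the same name
            String key = name.toLowerCase(Locale.ROOT);
            if (!seen.add(key)) { // add() returns false if the key was already present
                dupes.add(name);
            }
        }
        return dupes;
    }
}
```

Inside a UDJC you could, for instance, pass java.util.Arrays.asList(data.outputRowMeta.getFieldNames()), assuming your Kettle version's RowMetaInterface offers getFieldNames().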
What is this error?
2012/03/26 12:26:48 – User Defined Java Class.0 – ERROR (version 4.1.0.1, build 15865 from 2011-10-12 10.14.21 by buildguy) : Error initializing UserDefinedJavaClass:
2012/03/26 12:26:48 – User Defined Java Class.0 – ERROR (version 4.1.0.1, build 15865 from 2011-10-12 10.14.21 by buildguy) : org.pentaho.di.core.exception.KettleException:
2012/03/26 12:26:48 – User Defined Java Class.0 – ERROR (version 4.1.0.1, build 15865 from 2011-10-12 10.14.21 by buildguy) : null
Sid,
with no additional information it is really hard to tell. There should be more in the Kettle log or console output.
Cheers
Slawo
If you are using getRow() to get the next input row, Kettle sometimes confuses input and info steps (see http://jira.pentaho.com/browse/PDI-5115). A workaround is to use something like
Object[] r = getRowFrom(findInputRowSet("load data"));
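Why reading from a named row set helps can be seen in a simplified round-robin model: a naive reader that cycles over every input hop, info hops included, hands you lookup rows mixed in with data rows, whereas reading from one named hop cannot. The class below is a toy, not Kettle's code; the step names and rows are made up, and Kettle's real getRowFrom() and findInputRowSet() work on RowSet objects rather than strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (not Kettle's code) of a step that has several named input hops. */
final class ToyStepInputs {
    private final Map<String, Deque<String>> hops = new HashMap<>();
    private final List<String> order = new ArrayList<>();
    private int next = 0;

    void addHop(String fromStep, String... rows) {
        hops.put(fromStep, new ArrayDeque<>(Arrays.asList(rows)));
        order.add(fromStep);
    }

    /** Naive reader: round-robin over ALL hops, so info rows leak into the data stream. */
    String getAnyRow() {
        for (int tries = 0; tries < order.size(); tries++) {
            Deque<String> hop = hops.get(order.get(next));
            next = (next + 1) % order.size();
            if (!hop.isEmpty()) {
                return hop.poll();
            }
        }
        return null; // every hop is empty
    }

    /** Targeted reader, in the spirit of getRowFrom(findInputRowSet(name)). */
    String getRowFrom(String fromStep) {
        Deque<String> hop = hops.get(fromStep);
        if (hop == null) {
            throw new IllegalArgumentException("No input hop from step [" + fromStep + "]");
        }
        return hop.poll(); // null once this hop is exhausted
    }
}
```

In a real UDJC, the info rows would then be read separately from their own row set, e.g. with another findInputRowSet() call using the info step's name.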
Hi,
I have created a simple transformation which gives me the file name from a field.
I have written a step which reads the file name and uses the file for loading.
I have written the following code to get the file name from the previous row, but I am getting
The source step to read from [TextFileOutput] couldn't be found.
String stepName = meta.getAcceptingStepName();
RowSet rowSet = findInputRowSet(meta.getAcceptingStepName());
When debugging, I get an empty row set.
Please help me, I have deadlines to meet.