The User Defined Java Class Step

Today I would like to talk about the “User Defined Java Class” (UDJC) step introduced in Kettle 4.0. This step is incredibly versatile. It allows you to put arbitrary processing code into the ETL without a performance penalty. This article shows how to use the step in different scenarios, explaining each of . . . → Read More: The User Defined Java Class Step

Writing custom output formats in Pentaho Kettle

Sometimes an ETL process needs to generate files in very specific, non-row-based formats. These can be standard files like EDIFACT record files, or files in some ancient format that you need to feed to a legacy system. In this post I would like to show some techniques to create those files using Pentaho Kettle . . . → Read More: Writing custom output formats in Pentaho Kettle

Developing a custom Kettle Plugin: A Simple Transformation Step

This article shows how to develop a simple plugin which provides a custom transformation step for Kettle 4.0. The transformation step should accept any row stream and append a string field at the end, filling it with a fixed value. The user should be able to define the name of the added field. For starters, that should be enough. Keeping the step functionality at a minimum allows me to explain how the plugin interfaces with Kettle with as little distraction as possible. . . . → Read More: Developing a custom Kettle Plugin: A Simple Transformation Step

Squeezing the most out of the JavaScript Step in Pentaho Kettle

If you read the articles about using Java in Kettle and manually generating rows, you have seen repeated reminders that scripted code runs slower than a compiled plugin. But if for some reason you are stuck with a scripted solution, it is good to know a few facts about the JavaScript step that . . . → Read More: Squeezing the most out of the JavaScript Step in Pentaho Kettle

Generating Rows using JavaScript in Pentaho Kettle

If you have read the article about using Java code in Kettle, you might be wondering whether it is possible to generate rows using the Modified JavaScript Value step in a Kettle transformation. In other words, can you use JavaScript to create a step that generates rows, just as the Table Input step and the Excel Input step do? Well, it is possible. For purposes of illustration, let’s create a short example that outputs all of our Java system properties as rows. Download the example transformation if you like.

First things first. The Modified JavaScript Value step is not an input step as such, and it will not execute without receiving some input first. Fortunately, it is easy to create a single empty row using the Generate Rows input step: just leave all fields empty and limit the output to one row. Once this is connected to a JavaScript step, the script will start executing.
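To make the goal a bit more concrete, here is a minimal sketch of the "one row per system property" idea. Note that it is plain Java rather than the JavaScript used in the article, and it does not touch the Kettle API at all. The class name SystemPropertyRows and the method buildRows are hypothetical names chosen for illustration; the only thing borrowed from Kettle is the convention of representing a row as an Object[].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of Kettle.
class SystemPropertyRows {

    // Builds one row per property. Each row is an Object[] of {name, value}.
    static List<Object[]> buildRows(Properties props) {
        List<Object[]> rows = new ArrayList<>();
        // Sort the property names so the row order is stable between runs.
        for (String name : new TreeSet<>(props.stringPropertyNames())) {
            rows.add(new Object[] { name, props.getProperty(name) });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Print every Java system property as a "row".
        for (Object[] row : buildRows(System.getProperties())) {
            System.out.println(row[0] + " = " + row[1]);
        }
    }
}
```

Inside a transformation, the loop body would hand each row to the next step instead of collecting it in a list; how that is done in the JavaScript step is what the full article covers.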

Continue reading Generating Rows using JavaScript in Pentaho Kettle

Using Java in Pentaho Kettle

Sometimes it would be nice to access a Java library directly from Kettle. You might find it useful for validation, lookup or custom cryptography support, just to give a few examples. Sometimes even basic access to data is not as straightforward as getting a file dump or using a database connection. Some data sources might be encapsulated in an application, and the only way to get your hands on the data is to use a custom Java client. This article explains how you can directly utilize your Java classes in Kettle a.k.a. PDI.

NOTE: If you’re using Kettle 4.0 or later, you also have the option to use the new User Defined Java Class step.
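As an illustration of the kind of small Java class you might want to call from a transformation, here is a minimal sketch of a hashing helper. It uses only the standard java.security.MessageDigest API and has no Kettle dependency. The class name Md5Helper and the method md5Hex are hypothetical, and MD5 is picked only as a simple example of "cryptography support"; the class used in the full article may look different.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of Kettle.
class Md5Helper {

    // Returns the MD5 digest of the input as a 32-character lowercase hex string.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            // %032x left-pads with zeros so leading zero bytes are not lost.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform, so this is unexpected.
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("hello")); // prints 5d41402abc4b2a76b9719d911017c592
    }
}
```

A static method that takes and returns plain strings keeps the class easy to call from a scripting step, since no object setup is needed on the Kettle side.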

Continue reading Using Java in Pentaho Kettle