Building a detailed Date Dimension with Pentaho Kettle

teaser

Having rich date dimensions in a data warehouse often enables sophisticated business relevant analytical queries. This post shows a way to generate a detailed date dimension table that includes fixed date and variable date holidays, working days, special events and week of year information using the Kettle ETL tool, also known as Pentaho PDI. . . . → Read More: Building a detailed Date Dimension with Pentaho Kettle

Accessing Previous Row Values in Kettle

stock_transformation

Many people resort to JavaScript hacking when faced with the requirement to access a previous row’s value in Kettle. In most cases I find that to be unnecessary. And while there isn’t anything bad about the JavaScript step as such, it has the problem of a somewhat lower performance and it also adds complexity to the . . . → Read More: Accessing Previous Row Values in Kettle

GeoIP lookup using MaxMind’s Country Database and Kettle

geoip_screenshot

Reliable location information is a valuable asset when looking at internet traffic. Among other uses it can be utilized for fraud prevention or help in estimating foreign market potential. This article explains how you can lookup location information for an IP address using Kettle and MaxMind’s free GeoIP database.

Edit: As Daniel Einspanjer points out, there’s a . . . → Read More: GeoIP lookup using MaxMind’s Country Database and Kettle

Using Named Parameters in Kettle

settings dialog

Kettle’s named parameters often enable very elegant solutions for ETL requirements. This post gives an introduction to the named parameters feature of Kettle.

It is safe to think of named parameters as variables with their initial values assigned before the ETL process starts. In fact, they are available as variables when the ETL process executes. Their values can be specified in the launch dialog inside spoon or on the command line when running kitchen or pan. Continue reading Using Named Parameters in Kettle

Error Handling in the JavaScript Step

error_handling

If you are familiar with the error handling feature of Kettle transformation steps, you might be wondering how it can be utilized in a JavaScript step. The user interface mechanics are the same as with any other step that supports the error handling feature. It can be enabled and configured by right clicking the JavaScript step . . . → Read More: Error Handling in the JavaScript Step

Custom Database Lookups using the JavaScript Step and fireToDb

fire2db

Suppose you need to do a non-trivial lookup operation.  For arguments sake, let’s assume you need to check the database for a set of regular expressions, and you need to determine for a row field how many of those expressions match. In fact, a similar scenario has been brought up in the Pentaho forums in this post. One way to solve this is to use a JavaScript step. When initialized, it queries the database for the regular expressions. For each passing row it then loops over the expressions and counts how many match, writing the result to a row field. Sounds easy enough, but how do you query the database using JavaScript? . . . → Read More: Custom Database Lookups using the JavaScript Step and fireToDb

Accessing Kettle Internals with JavaScript

dbinfo

On rare occasions an ETL requirement comes up that is not easily implemented using the Kettle standard steps. Suppose for example that each record must be tagged with information about the database it is coming from. The source database is defined in the DB connection settings, but how to extract its current settings (i.e. type, host, port, dbname)? . . . → Read More: Accessing Kettle Internals with JavaScript

Squeezing the most out of the JavaScript Step in Pentaho Kettle

set script type

If you read the articles about using Java in Kettle and manually generating rows, you have seen constant reminders about the fact that scripted code runs slower than a compiled plugin. But if for some reason you are stuck with a scripted solution, it is good to know a few facts about the JavaScript step that . . . → Read More: Squeezing the most out of the JavaScript Step in Pentaho Kettle

Generating Rows using JavaScript in Pentaho Kettle

Generate Rows Preview

If you have read the article about using Java code in Kettle, you might be wondering whether it is possible to generate rows using the Modified JavaScript Value step in a Kettle transformation. In other words, whether you could use JavaScript to create a step, which generates rows just as the table input step and the Excel input step do.  Well, it is possible. For purposes of illustration let’s create a short example that will output all of our Java system properties as rows. Download the example transformation if you like.

First things first. The Modified JavaScript Value step is not an input step as such, and will not execute without receiving some input first. Fortunately it is easy to create a single empty row using the Generate Rows input step. Just leave all fields empty and limit the output to only one row. If we connect this to a JavaScript step, it will start executing.

Continue reading Generating Rows using JavaScript in Pentaho Kettle

Using Java in Pentaho Kettle

MD5 transformation screenshot

Sometimes it would be nice to access a Java library directly from Kettle. You might find it useful for  validation, lookup or custom cryptography support, just to give a few examples. Sometimes even basic access to data is not as straightforward as getting a file dump or a using a database connection. Some data sources might be encapsulated in an application, and the only way to get your hands on it, is using a custom Java client. This article explains how you can directly utilize your Java classes in Kettle a.k.a. PDI.

NOTE: If you’re using Kettle 4.0 or later, you also have the option to use the new User Defined Java Class step.

Continue reading Using Java in Pentaho Kettle