By Slawomir Chodnicki, on October 13th, 2010
 Whenever you’re working on an ETL project and the responsibilities of each job, transformation, and execution step are becoming clearly defined, it might be a good idea to write down some documentation for the ETL solution. If that sounds like something that may be useful (or even required by your client or boss) but you just . . . → Read More: Documentation for Kettle ETL
By Slawomir Chodnicki, on September 1st, 2010
Having rich date dimensions in a data warehouse often enables sophisticated, business-relevant analytical queries. This post shows a way to generate a detailed date dimension table that includes fixed-date and variable-date holidays, working days, special events and week-of-year information using the Kettle ETL tool, also known as Pentaho PDI. . . . → Read More: Building a detailed Date Dimension with Pentaho Kettle
By Slawomir Chodnicki, on August 31st, 2010
Many people resort to JavaScript hacking when faced with the requirement to access a previous row’s value in Kettle. In most cases I find that to be unnecessary. And while there isn’t anything bad about the JavaScript step as such, it performs somewhat worse and it also adds complexity to the . . . → Read More: Accessing Previous Row Values in Kettle
By Slawomir Chodnicki, on August 26th, 2010
 When doing ETL work, sometimes you get to work with data inputs with little or no consistency guarantees. If you choose to do the data validation in Kettle, there are a few options. Among other things you may choose to verify data using the validator step, flag the rows and fields based on some calculation and . . . → Read More: Data Validation and Monitoring with Pentaho Kettle
By Slawomir Chodnicki, on July 27th, 2010
 Date dimensions are among the most important dimensions of many Mondrian cubes. The usefulness of a cube often depends on the way the date dimension has been modeled. This post shows how to create a basic date dimension and how it can be augmented with properties to suit specific analysis needs. If at some point you . . . → Read More: A Simple Date Dimension for Mondrian Cubes
By Slawomir Chodnicki, on July 21st, 2010
Reliable location information is a valuable asset when looking at internet traffic. Among other uses, it can help prevent fraud or estimate foreign market potential. This article explains how you can look up location information for an IP address using Kettle and MaxMind’s free GeoIP database.
Edit: As Daniel Einspanjer points out, there’s a . . . → Read More: GeoIP lookup using MaxMind’s Country Database and Kettle
By Slawomir Chodnicki, on July 14th, 2010
Kettle processes sometimes need to upload files to remote machines. Uploading is usually not much of an issue, since Kettle provides several steps to upload files using different transmission protocols. The upload steps have their limitations, however, when it comes to uploading an entire folder structure. None of the built-in steps accepts a directory that it . . . → Read More: Bulk Uploads with Kettle
By Slawomir Chodnicki, on July 9th, 2010
Recently a question about sub-transformations appeared on the Kettle forum, so I thought I’d honor the occasion and write a small tutorial on how to use them. They are a nice feature for reusing whole transformations, so if you find yourself copying and pasting the same steps into multiple transformations, mappings a.k.a. sub-transformations might be a . . . → Read More: Sub-Transformations a.k.a Mappings
By Slawomir Chodnicki, on July 7th, 2010
The previous posts on Kettle plugin development focus on transformation steps. It is also possible to extend Kettle with custom job entries. This post introduces a plugin that provides a job entry which can trigger a report on JasperServer 3.7 Community Edition. Scheduling reports can be a tricky thing. If you keep your reports on JasperServer, . . . → Read More: Developing a Custom Kettle Plugin: Triggering a Report on JasperServer
By Slawomir Chodnicki, on July 3rd, 2010
Sometimes a Kettle job needs to be executed on a tight schedule, every few minutes for example. Occasionally it is undesirable to have multiple instances of the job run in parallel. This can happen when a run takes longer than usual and the next run starts before the current one finishes. This post shows a . . . → Read More: Prevent running multiple instances of a Kettle Job