Write ETL that writes ETL – Creating Crosstabs with Kettle

write_etl

In this post I’d like to demonstrate a technique for creating dynamic Kettle transformations. What do I mean by dynamic in this case? Imagine a transformation dynamically creating or changing another transformation before it is executed. Why would anyone want to do this? In this post I’ll take up the example of crating crosstabs in Kettle, . . . → Read More: Write ETL that writes ETL – Creating Crosstabs with Kettle

Releasing the Ruby Scripting Step for Kettle

ruby_featured_image

In this post I would like to announce the release of the Ruby Scripting Step v1.0 for the community edition of Kettle 4.x. Grab it from the downloads section of the github page.

The plugin allows to easily put ruby into your ETL. It is bundled with many samples so it should be easy to get . . . → Read More: Releasing the Ruby Scripting Step for Kettle

Taking LucidDB for an OLAP Test-Drive

_lucid_db_logo_PMS_2010

This article shows how to quickly load a simple Mondrian cube into LucidDB and how to view the cube in several OLAP front-ends. If you would like to try the column-oriented DB for your OLAP, this post may help to get up and running.

Getting LucidDB

Get your copy of LucidDB from luciddb.org and set it up . . . → Read More: Taking LucidDB for an OLAP Test-Drive

Dynamic SQL Queries in PDI a.k.a. Kettle

featured

When doing ETL work every now and then the exact SQL query you want to execute depends on some input parameters determined at runtime. This requirement comes up most frequently when SELECTing data. This article shows the techniques you can employ with the “Table Input” step in PDI to make it execute dynamic or parametrized queries. . . . → Read More: Dynamic SQL Queries in PDI a.k.a. Kettle

Using an on-demand in-memory SQL database in PDI

h2_logo

Anybody who finds themselves working on a client’s environment will usually face the fact that access to databases is restricted to what’s absolutely required to get the job done. The source files and target systems will be available, but creating helper tables or databases may be completely out of the question, or it may involve overcoming . . . → Read More: Using an on-demand in-memory SQL database in PDI

Processing the void – detecting and handling empty row streams in PDI

detect_empty_stream

ETL processes sometimes need to generate data, even if there’s no input. This may be a bit puzzling at times, since an usual ETL row stream produces nothing if there’s no input. In some scenarios an ETL process is supposed to generate some sort of aggregation, which implies it should report a value of 0 even . . . → Read More: Processing the void – detecting and handling empty row streams in PDI

Tracking Transformation Progress becomes easier in PDI

transformation_progress

Tracking the progress of a transformation in PDI Spoon usually involves closely observing the numbers displayed in the step metrics tab. The step metrics tab dispays information about each step’s processed rows and input/output buffers, which makes it an important tool for understanding step performance at a glance as well as understanding the progress of a . . . → Read More: Tracking Transformation Progress becomes easier in PDI

Analyzing Hierarchical Data Using Bridge Tables

branch_select

Relational applications often model generic hierarchies of variable depth (tree-like structures) by maintaining a parent id that points to the immediate parent of the record. This approach is called the “Adjacency List” model. This article covers how to effectively analyze such hierarchies using a bridge table and how to create a bridge table using Kettle a.k.a. . . . → Read More: Analyzing Hierarchical Data Using Bridge Tables

Using the Excel Writer Step

transformation

The Excel Writer plugin offers support for Excel template files that can be filled in a variety of ways using Kettle a.k.a. PDI. In this post I would like to show how to fill an Excel report template file that has pre-styled cells, formulas and charts on different sheets. The entire report is filled within a . . . → Read More: Using the Excel Writer Step

Dealing with comma separated fields in Kettle

transformation

Every now and then a field pops up that has comma-separated values in it. It is often used when it seems less convenient to properly model a 1:n relationship. In most cases the field contains a set of IDs or some subset of a fixed set of values. Working with these fields can be tricky. This post shows how to access individual values of comma separated fields in Kettle a.k.a. PDI without resorting to custom parsing with JavaScript or Java code. It also shows how to effectively create those fields, should you ever need to. Continue reading Dealing with comma separated fields in Kettle