By Slawomir Chodnicki, on October 12th, 2011
 This article introduces clustering concepts supported by Kettle a.k.a. PDI. If you need to replicate data to several physical databases, or would like to learn about scale-out options for record processing, this article may be for you. As usual, the downloads section has the demo transformations for this article. . . . → Read More: Clustering in Kettle
By Slawomir Chodnicki, on September 9th, 2011
 This article introduces partitioning concepts supported by Kettle a.k.a. PDI. If you need to partition records over several tables, or would like to learn about increasing the parallelism of your transformations, this article may be for you. . . . → Read More: Partitioning in Kettle
By Slawomir Chodnicki, on May 14th, 2011
 Regular expressions are a very useful tool for a variety of string-related tasks. In Kettle they are frequently used for extraction and manipulation tasks, as well as for specifying groups of file names. This post gives an introduction to regular expressions in general as well as some applications within Kettle a.k.a. PDI. Since the built-in . . . → Read More: An Introduction to Regular Expressions
By Slawomir Chodnicki, on March 12th, 2011
 This article shows how to quickly load a simple Mondrian cube into LucidDB and how to view the cube in several OLAP front-ends. If you would like to try the column-oriented DB for your OLAP, this post may help you get up and running.
Getting LucidDB
Get your copy of LucidDB from luciddb.org and set it up . . . → Read More: Taking LucidDB for an OLAP Test-Drive
By Slawomir Chodnicki, on December 21st, 2010

Relational applications often model generic hierarchies of variable depth (tree-like structures) by maintaining a parent id that points to the immediate parent of the record. This approach is called the “Adjacency List” model. This article covers how to effectively analyze such hierarchies using a bridge table and how to create a bridge table using Kettle a.k.a. . . . → Read More: Analyzing Hierarchical Data Using Bridge Tables
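To make the idea concrete, here is a minimal Python sketch of how a bridge table might be derived from an adjacency list. It is not the Kettle transformation from the article: the function name `build_bridge_rows`, the `(ancestor, descendant, distance)` row layout, and the inclusion of zero-distance self rows are all assumptions made for illustration.

```python
def build_bridge_rows(parent_of):
    """Derive bridge rows from an adjacency list.

    parent_of maps each node id to its immediate parent id (None for a root).
    Returns a sorted list of (ancestor, descendant, distance) tuples, including
    a zero-distance row linking every node to itself (an assumed convention).
    """
    rows = []
    for node in parent_of:
        ancestor, distance = node, 0
        seen = set()
        # Walk up the parent chain, emitting one row per ancestor.
        while ancestor is not None:
            if ancestor in seen:
                raise ValueError(f"cycle detected at node {ancestor!r}")
            seen.add(ancestor)
            rows.append((ancestor, node, distance))
            ancestor = parent_of.get(ancestor)
            distance += 1
    return sorted(rows)


# Hypothetical hierarchy: 1 is the root, 2 and 4 are its children, 3 is a child of 2.
example = {1: None, 2: 1, 3: 2, 4: 1}
for row in build_bridge_rows(example):
    print(row)
```

With rows like these, a question such as "all descendants of node 1" becomes a simple filter on the ancestor column, with no recursive parent lookups.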
By Slawomir Chodnicki, on December 2nd, 2010
 The Excel Writer plugin offers support for Excel template files that can be filled in a variety of ways using Kettle a.k.a. PDI. In this post I would like to show how to fill an Excel report template file that has pre-styled cells, formulas and charts on different sheets. The entire report is filled within a . . . → Read More: Using the Excel Writer Step
By Slawomir Chodnicki, on November 25th, 2010
 Every now and then a field pops up that has comma-separated values in it. Such fields are often used when it seems less convenient to properly model a 1:n relationship. In most cases the field contains a set of IDs or some subset of a fixed set of values. Working with these fields can be tricky. This post shows how to access individual values of comma-separated fields in Kettle a.k.a. PDI without resorting to custom parsing with JavaScript or Java code. It also shows how to effectively create those fields, should you ever need to. Continue reading Dealing with comma separated fields in Kettle
By Slawomir Chodnicki, on October 2nd, 2010

Today I would like to talk about the “User Defined Java Class” a.k.a. UDJC step introduced in Kettle 4.0. This step is incredibly versatile. It allows you to put arbitrary processing code into the ETL without a performance penalty. This article shows how to use the step in different scenarios, explaining each of . . . → Read More: The User Defined Java Class Step
By Slawomir Chodnicki, on September 1st, 2010
 Having rich date dimensions in a data warehouse often enables sophisticated, business-relevant analytical queries. This post shows a way to generate a detailed date dimension table that includes fixed-date and variable-date holidays, working days, special events and week-of-year information using the Kettle ETL tool, also known as Pentaho PDI. . . . → Read More: Building a detailed Date Dimension with Pentaho Kettle
By Slawomir Chodnicki, on September 1st, 2010
 When looking at time series that exhibit strong deviations, it is sometimes hard to get the general picture of the development. A practical approach is to smooth out the values by calculating a sliding average. To illustrate, I’d like to look at the example cube from the article about date dimensions, and select the sales per . . . → Read More: Using Sliding Averages in MDX
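As a language-neutral illustration of the smoothing idea, the following Python sketch computes a trailing sliding average. It is not the MDX from the article: the function name `sliding_average`, the choice of a trailing window, and the sample figures are assumptions for illustration only.

```python
from collections import deque


def sliding_average(values, window):
    """Return the trailing moving average of `values`.

    Each output element is the mean of the current value and up to
    `window - 1` preceding values, so early elements use a shorter window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)
    running_sum = 0.0
    averages = []
    for value in values:
        if len(recent) == window:
            # The oldest value is about to fall out of the window.
            running_sum -= recent[0]
        recent.append(value)
        running_sum += value
        averages.append(running_sum / len(recent))
    return averages


# Hypothetical sales figures with one strong spike.
sales = [10, 20, 60, 20, 10]
print(sliding_average(sales, 3))
```

A larger window smooths more aggressively but reacts more slowly to genuine changes in the series.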