Performance is often a critical concern in ETL implementations, especially when a job runs frequently or a series of jobs must complete within a fixed time window. This article focuses on harnessing the multi-threaded nature of Kettle transformations to optimize their performance.
Suppose each step in the transformation is already set up for best . . . → Read More: Multi-Threading in Kettle Transformations
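As background for the excerpt above (this is not taken from the article itself): Kettle runs each step of a transformation in its own thread and passes rows between steps through bounded buffers, and a step can be started in several copies so that a slow step gets more than one thread. The plain-Java sketch below illustrates that model with `BlockingQueue`s. The class `ThreadedPipeline`, its doubling "calculation" step and the buffer sizes are hypothetical stand-ins, not Kettle code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class ThreadedPipeline {
    // Marker row signalling "no more rows" (compared by identity).
    private static final Object[] END = new Object[0];

    // Generator -> `copies` calculation threads -> collector, linked by bounded buffers.
    static List<Integer> run(int rowCount, int copies, int bufferSize) throws InterruptedException {
        if (copies < 1) throw new IllegalArgumentException("need at least one copy");
        BlockingQueue<Object[]> toCalc = new ArrayBlockingQueue<>(bufferSize);
        BlockingQueue<Object[]> toCollector = new ArrayBlockingQueue<>(bufferSize);

        // Step 1: emit rows, then one END marker per copy of the next step.
        Thread generator = new Thread(() -> {
            try {
                for (int i = 0; i < rowCount; i++) toCalc.put(new Object[] {i});
                for (int c = 0; c < copies; c++) toCalc.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        generator.start();

        // Step 2: the "slow" step, started in several copies sharing one input buffer.
        List<Thread> calcCopies = new ArrayList<>();
        for (int c = 0; c < copies; c++) {
            Thread t = new Thread(() -> {
                try {
                    for (Object[] row = toCalc.take(); row != END; row = toCalc.take()) {
                        toCollector.put(new Object[] {(Integer) row[0] * 2});
                    }
                    toCollector.put(END); // tell the collector this copy is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            calcCopies.add(t);
            t.start();
        }

        // Step 3: collect on the calling thread until every copy has finished.
        List<Integer> results = new ArrayList<>();
        for (int finished = 0; finished < copies; ) {
            Object[] row = toCollector.take();
            if (row == END) finished++;
            else results.add((Integer) row[0]);
        }
        generator.join();
        for (Thread t : calcCopies) t.join();
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> out = run(100, 4, 10);
        Collections.sort(out); // copies may emit rows out of order
        System.out.println(out.size() + " rows, first=" + out.get(0) + ", last=" + out.get(out.size() - 1));
        // prints: 100 rows, first=0, last=198
    }
}
```

Because the copies of the middle step pull rows from a shared buffer, rows can leave the step in a different order than they arrived, which is why `main` sorts the result before printing it.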
This article shows how to develop a simple plugin that provides a custom transformation step for Kettle 4.0. The step should accept any row stream and append a string field to the end of each row, filling it with a fixed value; the user should be able to choose the name of the added field. That is enough for a start. Keeping the step's functionality to a minimum lets me explain how the plugin interfaces with Kettle with as few distractions as possible. . . . → Read More: Developing a custom Kettle Plugin: A Simple Transformation Step
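A minimal sketch of the behaviour described above, in plain Java and without any Kettle classes: the hypothetical `AddConstantField` class declares the extra output field (whose name the user chooses) and appends the fixed value to every row. In an actual Kettle plugin this work is split between the step's metadata class and the step class; that wiring is what the article covers and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the step's row logic; no Kettle plugin interfaces are modelled.
final class AddConstantField {
    private final String fieldName; // chosen by the user
    private final String value;     // the fixed value written to every row

    AddConstantField(String fieldName, String value) {
        if (fieldName == null || fieldName.isEmpty()) {
            throw new IllegalArgumentException("field name required");
        }
        this.fieldName = fieldName;
        this.value = value;
    }

    // Output layout: the incoming field names followed by the new field.
    List<String> outputFields(List<String> inputFields) {
        List<String> out = new ArrayList<>(inputFields);
        out.add(fieldName);
        return out;
    }

    // Copies the incoming row and puts the fixed value in the extra last slot.
    Object[] processRow(Object[] row) {
        Object[] out = Arrays.copyOf(row, row.length + 1);
        out[row.length] = value;
        return out;
    }

    public static void main(String[] args) {
        AddConstantField step = new AddConstantField("source", "crm");
        System.out.println(step.outputFields(List.of("id", "name")));                 // [id, name, source]
        System.out.println(Arrays.toString(step.processRow(new Object[] {42, "Alice"}))); // [42, Alice, crm]
    }
}
```

The step never inspects the incoming fields, which is what lets it accept any row stream.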
For most ETL processes, it is desirable to keep running time to a minimum. A common technique for cutting overall processing time is to identify steps that do not depend on each other and let them run in parallel threads. This post explains how to set up jobs for parallel processing in Pentaho Kettle. Download the example job file. . . . → Read More: Parallel processing in Kettle Jobs
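The idea of running independent entries in parallel and waiting for all of them before a dependent entry starts can be sketched in plain Java with an `ExecutorService`. This is an analogy to what a Kettle job does, not its implementation; `ParallelJob`, `entry()` and the entry names are hypothetical, and `Thread.sleep` stands in for real work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelJob {
    // Hypothetical job entry: sleeps to simulate work, then reports its name.
    static Callable<String> entry(String name, long millis) {
        return () -> {
            Thread.sleep(millis);
            return name;
        };
    }

    // Runs the independent entries concurrently, waits for all of them,
    // then runs the entry that depends on their results.
    static List<String> run(List<Callable<String>> independent, Callable<String> dependent) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, independent.size()));
        try {
            List<String> finished = new ArrayList<>();
            // invokeAll blocks until every task is done; futures keep the input order.
            for (Future<String> f : pool.invokeAll(independent)) {
                finished.add(f.get()); // rethrows (wrapped) if an entry failed
            }
            finished.add(dependent.call());
            return finished;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        List<String> done = run(
                List.of(entry("load customers", 300), entry("load products", 300), entry("load regions", 300)),
                () -> "build fact table");
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " after " + elapsedMs + " ms");
    }
}
```

Because the three 300 ms entries run on separate threads, the elapsed time should be close to 300 ms rather than the roughly 900 ms a sequential run would take; the dependent entry only starts after `invokeAll` has returned.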