Dealing with Proxy Problems in ETL Processes


Sometimes your ETL processes need to access systems outside your network. Suppose your ETL process needs to download a ZIP file from a business partner over the internet using SFTP. If your company has a proxy infrastructure that prohibits direct access to the internet, you might run into trouble with ETL tools that do not support proxies. Pentaho Kettle 3.2.0, for example, does not support a generic proxy configuration in its SFTP transfer job entry.

Depending on the platform you are using, you might be able to use a generic proxy (a SOCKS5 proxy, for example) transparently. For Java-based tools you might specify the proxy settings on the command line using -DproxySet=true -DproxyHost=xxxx -DproxyPort=xxxx. But sometimes none of these techniques work. When looking for workarounds, you might consider using the classic Unix tool expect in combination with the tsocks library to do the download. Both are available as standard packages for Linux/Unix systems, including Mac OS X. On most Unix derivatives you are likely to find these tools in your package manager; on Mac OS X you can use MacPorts to install them.
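To make the tsocks idea a bit more concrete, here is a minimal Python sketch of how an ETL job might wrap an sftp download in tsocks. It is an illustration under assumptions, not the script from this post: it assumes key-based SSH authentication (so expect is not needed to answer a password prompt), that tsocks is installed and its configuration file (typically /etc/tsocks.conf) already points at your SOCKS server, and the user name, host name and paths are made up.

```python
import shutil
import subprocess


def build_tsocks_sftp_command(user, host, remote_path, local_dir):
    """Return the argument list for an sftp download wrapped in tsocks."""
    # "tsocks <program> [args]" preloads the tsocks library, so the TCP
    # connections opened by <program> are redirected to the SOCKS server
    # named in the tsocks configuration file.
    return ["tsocks", "sftp", f"{user}@{host}:{remote_path}", local_dir]


def download(user, host, remote_path, local_dir):
    """Run the wrapped download; raises CalledProcessError if sftp fails."""
    if shutil.which("tsocks") is None:
        raise RuntimeError("tsocks is not installed or not on PATH")
    cmd = build_tsocks_sftp_command(user, host, remote_path, local_dir)
    # Assumption: key-based authentication, so sftp never prompts.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = build_tsocks_sftp_command(
        "etl", "sftp.partner.example", "/outgoing/data.zip", "/tmp"
    )
    print(" ".join(cmd))
```

An ETL tool that can run a shell or script step could call something like download() before the step that unpacks the ZIP file. If the partner only allows password logins, this is where expect would come in to drive the prompt.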
