Edi2Xml Plugin

Interpreting edifact text data can be a daunting task. The official standard is a moving target, as new directory versions appear every six months. In addition, the standard allows for quite a few degrees of freedom: field definitions allow the content or data format to be “agreed upon by the participants of the data exchange”. This essentially means that everybody can roll their own, which makes it hard to come up with a generic interpretation of the fields encoded in edifact messages.

This article introduces the Edi2Xml PDI plugin. It parses edifact text and converts it to an XML structure, which is more accessible and can easily be processed by the GetDataFromXML step using XPath. You can get the plugin from the Download section below.

Note: the plugin has been accepted into the PDI trunk code branch and will be available as a standard step in PDI 4.4. It is already available in the trunk CI builds.

A Sample Problem

The following piece of edifact data serves as a sample. It shows an edifact message encoding an order with four line items:

UNA:+.? '
  UNB+UNOC:4+STYLUSSTUDIO:1+DATADIRECT:1+20051107:1159+6002'
  UNH+SSDD1+ORDERS:D:03B:UN:EAN008'
  BGM+220+BKOD99+9'
  DTM+137:20051107:102'
  NAD+BY+5412345000176::9'
  NAD+SU+4012345000094::9'
  LIN+1+1+0764569104:IB'
  QTY+1:25'
  FTX+AFM+1++XPath 2.0 Programmer?'s Reference'
  LIN+2+1+0596520123:IB'
  QTY+1:6'
  FTX+AFM+1++Version Control with Git'
  LIN+3+1+1884777716:IB'
  QTY+1:16'
  FTX+AFM+1++Server-Based Java Programming'
  LIN+4+1+0596006756:IB'
  QTY+1:10'
  FTX+AFM+1++Enterprise Service Bus'
  UNS+S'
  CNT+2:4'
  UNT+22+SSDD1'
  UNZ+1+6002'

I am assuming a certain familiarity with edifact messages. This particular message is based on the D.03B directory of the standard, as declared by ORDERS:D:03B:UN in the UNH segment.

Let me jump directly to the XML that the Edi2Xml step converts this message to:

<?xml version="1.0" encoding="UTF-8"?>
<edifact>
	<UNB>
		<element>
			<value>UNOC</value>
			<value>4</value>
		</element>
		<element>
			<value>STYLUSSTUDIO</value>
			<value>1</value>
		</element>
		<element>
			<value>DATADIRECT</value>
			<value>1</value>
		</element>
		<element>
			<value>20051107</value>
			<value>1159</value>
		</element>
		<element>
			<value>6002</value>
		</element>
	</UNB>
	<UNH>
		<element>
			<value>SSDD1</value>
		</element>
		<element>
			<value>ORDERS</value>
			<value>D</value>
			<value>03B</value>
			<value>UN</value>
			<value>EAN008</value>
		</element>
	</UNH>
	<BGM>
		<element>
			<value>220</value>
		</element>
		<element>
			<value>BKOD99</value>
		</element>
		<element>
			<value>9</value>
		</element>
	</BGM>
	<DTM>
		<element>
			<value>137</value>
			<value>20051107</value>
			<value>102</value>
		</element>
	</DTM>
	<NAD>
		<element>
			<value>BY</value>
		</element>
		<element>
			<value>5412345000176</value>
			<value></value>
			<value>9</value>
		</element>
	</NAD>
	<NAD>
		<element>
			<value>SU</value>
		</element>
		<element>
			<value>4012345000094</value>
			<value></value>
			<value>9</value>
		</element>
	</NAD>
	<LIN>
		<element>
			<value>1</value>
		</element>
		<element>
			<value>1</value>
		</element>
		<element>
			<value>0764569104</value>
			<value>IB</value>
		</element>
	</LIN>
	<QTY>
		<element>
			<value>1</value>
			<value>25</value>
		</element>
	</QTY>
	<FTX>
		<element>
			<value>AFM</value>
		</element>
		<element>
			<value>1</value>
		</element>
		<element>
			<value></value>
		</element>
		<element>
			<value>XPath 2.0 Programmer&apos;s Reference</value>
		</element>
	</FTX>
	<LIN>
		<element>
			<value>2</value>
		</element>
		<element>
			<value>1</value>
		</element>
		<element>
			<value>0596520123</value>
			<value>IB</value>
		</element>
	</LIN>
	<QTY>
		<element>
			<value>1</value>
			<value>6</value>
		</element>
	</QTY>
	<FTX>
		<element>
			<value>AFM</value>
		</element>
		<element>
			<value>1</value>
		</element>
		<element>
			<value></value>
		</element>
		<element>
			<value>Version Control with Git</value>
		</element>
	</FTX>
	<LIN>
		<element>
			<value>3</value>
		</element>
		<element>
			<value>1</value>
		</element>
		<element>
			<value>1884777716</value>
			<value>IB</value>
		</element>
	</LIN>
	<QTY>
		<element>
			<value>1</value>
			<value>16</value>
		</element>
	</QTY>
	<FTX>
		<element>
			<value>AFM</value>
		</element>
		<element>
			<value>1</value>
		</element>
		<element>
			<value></value>
		</element>
		<element>
			<value>Server-Based Java Programming</value>
		</element>
	</FTX>
	<LIN>
		<element>
			<value>4</value>
		</element>
		<element>
			<value>1</value>
		</element>
		<element>
			<value>0596006756</value>
			<value>IB</value>
		</element>
	</LIN>
	<QTY>
		<element>
			<value>1</value>
			<value>10</value>
		</element>
	</QTY>
	<FTX>
		<element>
			<value>AFM</value>
		</element>
		<element>
			<value>1</value>
		</element>
		<element>
			<value></value>
		</element>
		<element>
			<value>Enterprise Service Bus</value>
		</element>
	</FTX>
	<UNS>
		<element>
			<value>S</value>
		</element>
	</UNS>
	<CNT>
		<element>
			<value>2</value>
			<value>4</value>
		</element>
	</CNT>
	<UNT>
		<element>
			<value>22</value>
		</element>
		<element>
			<value>SSDD1</value>
		</element>
	</UNT>
	<UNZ>
		<element>
			<value>1</value>
		</element>
		<element>
			<value>6002</value>
		</element>
	</UNZ>
</edifact>

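Note how every edifact segment gets its own tag named after the segment (UNB, LIN, QTY and so on), each field (data element) within a segment gets an element tag, and each value within a field gets a value tag. Multivalued (composite) fields such as DTM+137:20051107:102 get multiple value tags, and empty fields or components, like the "::" in the NAD segments or the "++" in the FTX segments, yield empty value tags.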
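
To make the mapping concrete, here is a minimal Python sketch of the same segment/element/value conversion. It is not the plugin's code (Edi2Xml is a Java PDI step); it assumes the default separators of UNA:+.? ', skips the UNA header, and uses the standard library's xml.etree.ElementTree. The function names tokenize and edifact_to_xml are hypothetical.

```python
import xml.etree.ElementTree as ET


def tokenize(text, comp=":", elem="+", rel="?", term="'"):
    """Yield (tag, elements) per segment; each element is a list of values.

    Assumes the default separators of UNA:+.? ' (they can be overridden)."""
    text = text.lstrip()
    if text.startswith("UNA"):
        # The UNA service string advice is always 9 characters long.
        text = text[9:].lstrip()
    segment, element, value = [], [], []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == rel and i + 1 < len(text):
            # Release character: take the next character literally.
            value.append(text[i + 1])
            i += 2
            continue
        if ch == comp:
            # Component separator: next value within the same element.
            element.append("".join(value))
            value = []
        elif ch == elem:
            # Data element separator: next element within the segment.
            element.append("".join(value))
            segment.append(element)
            element, value = [], []
        elif ch == term:
            # Segment terminator: the first "element" holds the segment tag.
            element.append("".join(value))
            segment.append(element)
            yield segment[0][0], segment[1:]
            segment, element, value = [], [], []
            i += 1
            # Skip line breaks and indentation between segments.
            while i < len(text) and text[i].isspace():
                i += 1
            continue
        else:
            value.append(ch)
        i += 1
    # Trailing text without a segment terminator is ignored.


def edifact_to_xml(text):
    """Build <edifact><SEG><element><value/>...</element></SEG></edifact>."""
    root = ET.Element("edifact")
    for tag, elements in tokenize(text):
        segment = ET.SubElement(root, tag)
        for components in elements:
            element = ET.SubElement(segment, "element")
            for component in components:
                ET.SubElement(element, "value").text = component
    return root


if __name__ == "__main__":
    sample = "UNA:+.? '\n  UNH+SSDD1+ORDERS:D:03B:UN:EAN008'\n  DTM+137:20051107:102'"
    root = edifact_to_xml(sample)
    print(ET.tostring(root, encoding="unicode", short_empty_elements=False))
```

Passing short_empty_elements=False to ET.tostring serializes empty values as <value></value>, as in the output shown above. ElementTree writes the apostrophe in "Programmer's" literally rather than as &apos;, which is equivalent XML.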

Now that the heavy lifting of parsing edifact has been done, the GetDataFromXML step can be used to extract information about the order. In the sample transformation, GetDataFromXML is configured to extract one row per order line item.
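
GetDataFromXML is configured in the PDI dialog rather than in code. As a rough illustration of the kind of paths involved, the following Python sketch applies the limited XPath subset of xml.etree.ElementTree to a trimmed copy of the XML above. ITEMNUMBER matches the field name used in the sample transformation; ITEMID and IDTYPE are hypothetical field names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Edi2Xml output for the sample order (first two items).
ORDER_XML = """<edifact>
  <BGM><element><value>220</value></element>
       <element><value>BKOD99</value></element>
       <element><value>9</value></element></BGM>
  <DTM><element><value>137</value><value>20051107</value><value>102</value></element></DTM>
  <LIN><element><value>1</value></element>
       <element><value>1</value></element>
       <element><value>0764569104</value><value>IB</value></element></LIN>
  <QTY><element><value>1</value><value>25</value></element></QTY>
  <LIN><element><value>2</value></element>
       <element><value>1</value></element>
       <element><value>0596520123</value><value>IB</value></element></LIN>
  <QTY><element><value>1</value><value>6</value></element></QTY>
</edifact>"""

root = ET.fromstring(ORDER_XML)

# Header data: paths from the <edifact> root, one value per message.
order_number = root.findtext("BGM/element[2]/value")   # BKOD99
order_date = root.findtext("DTM/element[1]/value[2]")  # 20051107

# Line item data: one row per LIN node, with paths relative to that node,
# similar in spirit to looping GetDataFromXML over /edifact/LIN.
rows = [
    {
        "ITEMNUMBER": lin.findtext("element[1]/value"),
        "ITEMID": lin.findtext("element[3]/value[1]"),  # hypothetical name
        "IDTYPE": lin.findtext("element[3]/value[2]"),  # hypothetical name
    }
    for lin in root.findall("LIN")
]

print(order_number, order_date)
for row in rows:
    print(row)
```

Because QTY and FTX are siblings of LIN rather than its children, they cannot be reached by a path relative to the LIN node, which is the flat-structure issue addressed in the next paragraph.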

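The XPath expressions have to reflect the flat structure inherent to the original message: the QTY and FTX segments of a line item are not children of its LIN segment, but siblings that follow it. The token feature of GetDataFromXML, which lets an XPath expression refer to the value of another field, makes it easy to match segments that belong together. The ITEMNUMBER field in the sample configuration is used for exactly that.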
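
The token-based matching happens inside the GetDataFromXML step. For comparison, the following Python sketch solves the same matching problem with a different technique: it walks the segments in document order and attaches each QTY and FTX segment to the closest preceding LIN segment. The input is a trimmed copy of the XML above, and all output field names except ITEMNUMBER are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Edi2Xml output for the sample order (first two items).
ORDER_XML = """<edifact>
  <LIN><element><value>1</value></element><element><value>1</value></element>
       <element><value>0764569104</value><value>IB</value></element></LIN>
  <QTY><element><value>1</value><value>25</value></element></QTY>
  <FTX><element><value>AFM</value></element><element><value>1</value></element>
       <element><value></value></element>
       <element><value>XPath 2.0 Programmer's Reference</value></element></FTX>
  <LIN><element><value>2</value></element><element><value>1</value></element>
       <element><value>0596520123</value><value>IB</value></element></LIN>
  <QTY><element><value>1</value><value>6</value></element></QTY>
  <FTX><element><value>AFM</value></element><element><value>1</value></element>
       <element><value></value></element>
       <element><value>Version Control with Git</value></element></FTX>
  <UNS><element><value>S</value></element></UNS>
</edifact>"""


def line_items(root):
    """Group QTY and FTX segments with the closest preceding LIN segment."""
    items, current = [], None
    for segment in root:  # direct children, in document order
        if segment.tag == "LIN":
            current = {
                "ITEMNUMBER": segment.findtext("element[1]/value"),
                "ITEMID": segment.findtext("element[3]/value[1]"),
            }
            items.append(current)
        elif segment.tag in ("UNS", "UNT"):
            current = None  # summary section: no longer inside a line item
        elif current is not None and segment.tag == "QTY":
            current["QUANTITY"] = segment.findtext("element[1]/value[2]")
        elif current is not None and segment.tag == "FTX":
            current["DESCRIPTION"] = segment.findtext("element[4]/value")
    return items


for item in line_items(ET.fromstring(ORDER_XML)):
    print(item)
```

In full XPath 1.0 (which ElementTree does not implement), the same relationship can be expressed with the preceding-sibling axis: /edifact/QTY[preceding-sibling::LIN[1]/element[1]/value = '2'] selects the QTY segment of line item 2.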

The complete sample transformation extracting the data is included in the samples folder of the download.

Note: to simplify the XPath expressions for multi-message edifact transmissions, it may help to split the multi-message string into multiple rows on message boundaries (UNH to UNT) before converting them to XML.
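
A minimal Python sketch of such a split, assuming the default separators and the usual UNH/UNT message envelope. The function names and the sample interchange are hypothetical, and in PDI the split would be done with transformation steps rather than Python; the sketch only illustrates where the boundaries lie. Each message keeps the UNA header (if present) so it can still be parsed on its own, while the UNB/UNZ interchange envelope is dropped.

```python
def split_segments(text, rel="?", term="'"):
    """Split text into segments, keeping release sequences such as ?' intact."""
    segments, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == rel and i + 1 < len(text):
            current.append(text[i:i + 2])  # escaped character stays escaped
            i += 2
            continue
        current.append(ch)
        if ch == term:
            # Leading whitespace (line breaks between segments) is dropped.
            segments.append("".join(current).lstrip())
            current = []
        i += 1
    return segments


def split_messages(text):
    """Return one string per UNH...UNT message, prefixed with the UNA header if any."""
    text = text.lstrip()
    header = ""
    if text.startswith("UNA"):
        header, text = text[:9], text[9:]
    messages, current = [], None
    for segment in split_segments(text):
        if segment.startswith("UNH+"):
            current = [segment]
        elif current is not None:
            current.append(segment)
            if segment.startswith("UNT+"):
                messages.append(header + "".join(current))
                current = None
    return messages


interchange = (
    "UNA:+.? 'UNB+UNOC:4+SENDER:1+RECEIVER:1+20051107:1159+1'"
    "UNH+1+ORDERS:D:03B:UN'BGM+220+A1'UNT+3+1'"
    "UNH+2+ORDERS:D:03B:UN'BGM+220+B2'UNT+3+2'"
    "UNZ+2+1'"
)
for message in split_messages(interchange):
    print(message)
```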

Limitations: currently the plugin only supports the default UNA settings, namely UNA:+.? ' or UNA:+,? '. If the UNA header is missing, UNA:+.? ' is assumed.
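
The UNA service string advice is a fixed 9-character header, so reading its separators is a matter of indexing. The following Python sketch (hypothetical function names, not the plugin's code) extracts them, falls back to UNA:+.? ' when the header is missing, and checks the result against the two settings listed above.

```python
DEFAULT_UNA = "UNA:+.? '"
# The two settings listed as supported by the plugin.
SUPPORTED_UNA = {"UNA:+.? '", "UNA:+,? '"}


def read_una(text):
    """Return the 9-character UNA header, or the default when it is missing."""
    text = text.lstrip()
    return text[:9] if text.startswith("UNA") else DEFAULT_UNA


def separators(una):
    """Map the six service characters following 'UNA' to their roles."""
    return {
        "component": una[3],   # ':' in both supported settings
        "element": una[4],     # '+'
        "decimal": una[5],     # '.' or ','
        "release": una[6],     # '?'
        "reserved": una[7],    # ' ' in both supported settings
        "terminator": una[8],  # "'"
    }


def is_supported(text):
    """True if the (possibly defaulted) UNA header is one of the supported ones."""
    return read_una(text) in SUPPORTED_UNA


print(separators(read_una("UNB+UNOC:4+A:1+B:1+20051107:1159+1'")))
print(is_supported("UNA:+,? 'UNB+UNOC:4+A:1+B:1+20051107:1159+1'"))
```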


Download

Get the plugin for PDI 4.x here. To install, extract the zip file and copy the Edi2Xml folder to your data-integration/plugins/steps directory. Please check the samples folder in the zip file. It contains the demo transformation used in this article.

Cheers
Slawo