An EDI file may seem like a random jumble of characters at first glance. On closer examination, however, a well-thought-out schematic structure is revealed, which makes the processing of the message by computer programs possible in the first place. Behind an EDI file is a specific EDI standard that dictates how the file must be structured. Typical standard formats that can be used here are, for example, EDIFACT, XML, CSV files or flat file formats.
An EDI standard usually builds on the following principles:
The syntax rules define the allowed characters and the allowed order in which the individual characters may be used.
Within an EDI file most of the information is accurately identified using codes – for example, currency codes, country codes, but also codes for identifying a particular date format, etc.
The message design defines the structure of a particular message type. Message types include, for example, purchase orders, delivery notes, invoices, etc.
Identification of values in an EDI message
Depending on the standard used, there are three ways in which a value can be identified in an EDI file:
a) Implicitly by the position in the message. This technique is used in flat file formats and CSV. The exact position and semantics of a given value are defined in the accompanying documentation. For example: a line beginning with the characters 100 stands for the header line. The characters at positions 4 – 17 in a line beginning with 100 represent the GLN of the sender, etc.
b) Implicitly through the use of separators. Using a set of predefined separator characters, a message structure is explicitly defined. The message structure defines which building blocks are to be used and in which order the building blocks must be aligned to form a correct EDI message, e.g., an orders message. This technique is used in EDIFACT files.
c) Explicitly through the application of metadata. With the help of additional data, the meaning of the individual information fragments in a file is specified more precisely. This technique is used in XML, for example, where the actual information is enclosed by means of markup elements and attributes – e.g.:
<InvoiceRecipient id="IV4394">Recipient of invoice</InvoiceRecipient>
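The three identification techniques can be contrasted in a short Python sketch. It is illustrative only: the flat-file record layout and the function names are assumptions made for this example, and the separator-based function ignores escape handling.

```python
import xml.etree.ElementTree as ET


def sender_gln_from_flat_record(line: str) -> str:
    """a) Position-based: hypothetical layout, record type in 1-3, GLN in 4-17."""
    if not line.startswith("100"):
        raise ValueError("not a header record")
    # Positions are 1-based and inclusive, so 4-17 maps to the slice [3:17].
    return line[3:17].strip()


def split_by_separators(segment: str) -> list:
    """b) Separator-based: '+' separates data elements, ':' separates components."""
    return [element.split(":") for element in segment.rstrip("'").split("+")]


def invoice_recipient(xml_text: str) -> tuple:
    """c) Metadata-based: the markup itself names the value."""
    node = ET.fromstring(xml_text)
    return node.get("id"), node.text


print(sender_gln_from_flat_record("1008773456789012 ORDERS"))
print(split_by_separators("DTM+137:20180218:102'"))
print(invoice_recipient('<InvoiceRecipient id="IV4394">Recipient of invoice</InvoiceRecipient>'))
```

In variants a) and b) the meaning of a value can only be derived from documentation outside the file, whereas in variant c) the element name travels with the data.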
UN/EDIFACT – A standard of the United Nations
UN/EDIFACT is the abbreviation for United Nations Electronic Data Interchange for Administration, Commerce, and Transport. The standardization organization behind UN/EDIFACT is UN/CEFACT (United Nations Center for Trade Facilitation and Electronic Business). UN/EDIFACT is one of the most widely used EDI standards today alongside ANSI ASC X12. ANSI ASC X12 is very common in the North American market, whereas in Europe UN/EDIFACT prevails (or one of the various subdialects).
As the following figure shows, the UN/EDIFACT standard is based on four different pillars.
The syntax defines the exact rules for message building, as well as the characters used to separate individual message segments and elements.
A data element is the smallest unit in an EDIFACT file.
Groups of similar data elements form so-called segments.
Messages represent an ordered sequence of segments – for example, a DESADV file represents a delivery note.
Additionally, the EDIFACT standard defines message delivery requirements, for example the exact structure of a specific message exchange, which in turn may contain several EDIFACT files.
The exact structure of an EDIFACT message is defined in official standard documents, which are also available online. UN/CEFACT approves two EDIFACT standard versions per year, each marked with the year followed by “A” (for the first release of a year) and “B” (for the second release of a year).
D01B is therefore the second standard version from the year 2001.
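The naming scheme can be decoded mechanically. The following is a minimal sketch; the function name is hypothetical, and the rule that two-digit years from 90 upwards belong to the 1990s is an assumption made for this example.

```python
import re


def parse_directory_version(code: str) -> tuple:
    """Return (year, release_number) for a directory code such as 'D01B'."""
    match = re.fullmatch(r"D(\d{2})([AB])", code)
    if match is None:
        raise ValueError(f"unexpected directory code: {code!r}")
    two_digit_year = int(match.group(1))
    # Assumption: two-digit years from 90 upwards belong to the 1990s.
    century = 1900 if two_digit_year >= 90 else 2000
    release = 1 if match.group(2) == "A" else 2
    return century + two_digit_year, release


print(parse_directory_version("D01B"))
print(parse_directory_version("D96A"))
```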
In addition, separate subsets of the official UN/EDIFACT standard exist for the various industry sectors and domains. In the consumer goods sector the EANCOM subset is very common; it is also used for the example in this article. EANCOM is one of the most widely used EDIFACT subsets worldwide. The most frequently used EANCOM message types are ORDERS, DESADV and INVOIC.
Structure of an EDIFACT file
An EDIFACT file follows an exact hierarchy, which is denoted below.
The topmost unit of an EDIFACT message is the Interchange (UNB), which can be thought of as an envelope. The interchange defines the message recipient, the message sender, the message number, the message date, etc.
An interchange can in turn contain several individual Groups (UNG) representing message groups. Alternatively, an interchange can also contain individual messages (concrete messages). Mixing of individual messages and message groups within an interchange is not allowed.
A message itself is enclosed by a header (UNH) and a trailer segment (UNT). A message group is surrounded by a UNG and a UNE segment.
Within a message, there are several segments and segment groups, which represent individual related message parts (for example, information about the biller, a specific invoice line, etc.). A segment group is initiated by a so-called trigger segment.
Segments consist of data elements and composite data elements.
The smallest units of an EDIFACT file are simple data elements.
Simple Data Elements
Simple data elements form the basic building blocks of an EDIFACT file and represent – as the name already suggests – simple data values.
An example for a simple data element is a party name.
Description: Name of a party.
The abbreviation an..35 means that a maximum of 35 alphanumeric characters may be used for the party name.
Simple data element with code list
For a simple data element with code list no free text can be used, but a list of predefined values (= codes) must be used.
An example for a simple data element with code list is a coded document name.
Description: Code specifying the document name.
1 = Certificate of analysis: Certificate providing the values of an analysis.
2 = Certificate of conformity: Certificate certifying the conformity to predefined definitions.
3 = Certificate of quality: Certificate certifying the quality of goods, services etc.
4 = Test report: Report providing the results of a test session.
…
Composite Data Element
A composite data element consists of individual simple data elements and represents data with additional metadata (= additional data describing the actual data).
The individual components within a composite data element (typically simple data elements and simple data elements with code lists) are separated using the : character.
An example for a composite data element is a duty/tax or fee type.
|5153|Duty or tax or fee type name code|C|an..3|
|1131|Code list identification code|C|an..17|
|3055|Code list responsible agency code|C|an..3|
|5152|Duty or tax or fee type name|C|an..35|
The following example shows an exemplary duty/tax/fee type.
AAA:52:1:tax type xyz
AAA = Petroleum tax
52 = Value added tax identification
1 = CCC (Customs Co-operation Council)
tax type xyz = Free text description of tax
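As a minimal sketch, the composite value above could be mapped onto its component tags as follows. The function name is hypothetical, and the sketch assumes that the value contains no escaped separator characters.

```python
# Component tags and names as listed in the composite table above.
TAX_TYPE_COMPONENTS = (
    ("5153", "Duty or tax or fee type name code"),
    ("1131", "Code list identification code"),
    ("3055", "Code list responsible agency code"),
    ("5152", "Duty or tax or fee type name"),
)


def parse_composite(value: str, components=TAX_TYPE_COMPONENTS, separator: str = ":") -> dict:
    """Map each component of a composite value to its data element tag."""
    parts = value.split(separator)
    if len(parts) > len(components):
        raise ValueError("more components than the composite defines")
    # Trailing components may be omitted; zip stops at the shorter sequence.
    return {tag: part for (tag, _name), part in zip(components, parts)}


print(parse_composite("AAA:52:1:tax type xyz"))
```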
Segments consist of simple data elements and composite data elements and represent compound data, such as an address.
The individual data elements in a segment are separated by the + character. A segment starts with the three-character segment identifier and ends with the ' character.
The exact structure of a segment is described by means of so-called segment tables. The following segment table describes the structure of the LIN (line item) segment.
010  1082  LINE ITEM IDENTIFIER                           C  1  an..6
020  1229  ACTION REQUEST/NOTIFICATION DESCRIPTION CODE   C  1  an..3
030  C212  ITEM NUMBER IDENTIFICATION                     C  1
     7140  Item identifier                                C     an..35
     7143  Item type identification code                  C     an..3
     1131  Code list identification code                  C     an..17
     3055  Code list responsible agency code              C     an..3
040  C829  SUB-LINE INFORMATION                           C  1
     5495  Sub-line indicator code                        C     an..3
     1082  Line item identifier                           C     an..6
050  1222  CONFIGURATION LEVEL NUMBER                     C  1  n..2
060  7083  CONFIGURATION OPERATION CODE                   C  1  an..3
M indicates “mandatory”, i.e., the simple data element or composite data element must be specified.
C stands for “conditional” and means that the data element can optionally be specified. Values such as an..17 represent the maximum number of permitted alphanumeric characters.
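To illustrate how the positions of a segment table relate to a concrete segment, the following sketch splits a LIN segment taken from the sample dispatch advice later in this article. The function name is hypothetical, and escape handling is deliberately left out.

```python
def parse_segment(raw: str) -> tuple:
    """Split a segment into its tag and a list of data elements.

    Each data element is returned as a list of components, so a simple
    data element becomes a list with exactly one entry.
    """
    body = raw.rstrip("'")           # drop the segment terminator
    tag, *elements = body.split("+")
    return tag, [element.split(":") for element in elements]


# Positions as numbered in the LIN segment table above.
LIN_POSITIONS = ("010", "020", "030", "040", "050", "060")

tag, elements = parse_segment("LIN+1++4260304623843:EN'")
by_position = dict(zip(LIN_POSITIONS, elements))
print(tag, by_position)
```

The empty entry at position 020 shows how an omitted conditional data element still occupies its place between two + characters, while trailing positions (040 to 060) are simply left out.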
Assume we want to represent the following line item information using the LIN segment.
Line item number: 2
Type: article has been added
The resulting EDIFACT segment is:
Segment groups are used to aggregate several individual segments into groups of related segments.
For example, the following segment group allows contact details to be specified by combining the segments CTA (Contact information) and COM (Communication contact).
0220   ----- Segment group 5  ------------------ C   5----------+|
0230   CTA Contact information                   M   1          ||
0240   COM Communication contact                 C   5----------++
Some possible segment sequences would be for example:
CTA-CTA-CTA-COM-COM-CTA-COM
CTA
CTA-COM-CTA-CTA
...
As indicated by C 5, the segment group itself is optional and may occur up to 5 times. The segment group is initiated by a so-called trigger segment. It is the first segment within the segment group and usually has cardinality M 1 (that is, it must occur exactly once).
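The cardinality rules of this segment group can be expressed as a regular expression. The sketch below is one possible way to check a sequence of segment tags written with hyphens, as in the sequences above; the function name is hypothetical.

```python
import re

# One repetition of the group: exactly one CTA (M 1), then up to five COM (C 5).
_ONE_GROUP = r"CTA(?:-COM){0,5}"
# The group as a whole may occur up to five times (C 5).
_GROUP_PATTERN = re.compile(rf"{_ONE_GROUP}(?:-{_ONE_GROUP}){{0,4}}")


def is_valid_contact_sequence(sequence: str) -> bool:
    """Check a hyphen-separated tag sequence against the CTA/COM segment group."""
    if sequence == "":
        return True  # the group is conditional, so it may be absent entirely
    return _GROUP_PATTERN.fullmatch(sequence) is not None


for candidate in ("CTA-CTA-CTA-COM-COM-CTA-COM", "CTA", "CTA-COM-CTA-CTA", "COM-CTA"):
    print(candidate, is_valid_contact_sequence(candidate))
```

A sequence starting with COM is rejected because every repetition of the group has to begin with its trigger segment CTA.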
A message is a related sequence of segments and represents a concrete business document, for example a DESADV message (dispatch advice). The following section shows the first part of a DESADV definition.
0010   UNH Message header                        M   1
0020   BGM Beginning of message                  M   1
0030   DTM Date/time/period                      C   10
0040   ALI Additional information                C   5
0050   MEA Measurements                          C   5
0060   MOA Monetary amount                       C   5
0070   CUX Currencies                            C   9
0080   ----- Segment group 1  ------------------ C   10----------+
0090   RFF Reference                             M   1           |
0100   DTM Date/time/period                      C   1-----------+
0110   ----- Segment group 2  ------------------ C   99----------+
0120   NAD Name and address                      M   1           |
0130   LOC Place/location identification         C   10          |
0140   ----- Segment group 3  ------------------ C   10---------+|
0150   RFF Reference                             M   1          ||
0160   DTM Date/time/period                      C   1----------+|
0170   ----- Segment group 4  ------------------ C   10---------+|
0180   CTA Contact information                   M   1          ||
0190   COM Communication contact                 C   5----------++
0200   ----- Segment group 5  ------------------ C   10----------+
0210   TOD Terms of delivery or transport        M   1           |
0220   LOC Place/location identification         C   5           |
0230   FTX Free text                             C   5-----------+
...
EDIFACT sample dispatch advice
In the following EDIFACT dispatch advice example we will represent the delivery structure shown in the figure below.
Based on the previous concepts, the example now shows a concrete EDIFACT message for a dispatch advice. To increase readability, line breaks have been added after each segment. In a regular EDIFACT file no line breaks shall be used.
UNA:+.? '
UNB+UNOA:3+8773456789012:14+9123456789012:14+140218:1552+MSGNR4711++++++1'
UNH+1+DESADV:D:96A:UN:EAN005'
BGM+351+DOCNR4712+9'
DTM+137:20180218:102'
DTM+2:20180220:102'
NAD+SU+9983083940382::9'
NAD+BY+5332357469542::9'
NAD+DP+3839204835454::9'
CPS+1'
PAC+1++PK'
PCI+33E'
GIN+BJ+342603046212321014'
CPS+2+1'
PAC+11++CT'
PCI+33E'
GIN+BJ+342603046212341547'
LIN+1++4260304623843:EN'
QTY+12:110:PCE'
RFF+ON:8493848394:1'
CPS+3+1'
PAC+22++CT'
PCI+33E'
GIN+BJ+342603046212378547'
LIN+2++4260304622123:EN'
QTY+12:330:PCE'
RFF+ON:8493848394:2'
CPS+4+1'
PAC+45++CT'
PCI+33E'
GIN+BJ+342603046212332145'
LIN+3++4260304624412:EN'
QTY+12:450:PCE'
RFF+ON:8493848394:3'
CNT+2:3'
UNT+34+1'
UNZ+1+MSGNR4711'
In the following, we examine the structure of the individual segments in detail.
The UNA segment stands for the “Service String Advice” and describes the separators used in the message. Usually, the following separators are used (syntax version 3).
: Composite element delimiter
+ Data element delimiter
. Decimal mark (decimal point)
? Release character (escape character)
' Segment delimiter
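A parser would typically read these separators from the UNA segment instead of hard-coding them, and it has to respect the release character when splitting. The following sketch shows one way to do this; all names are hypothetical, and the FTX line is an invented example that is not part of the sample message.

```python
from typing import NamedTuple


class Separators(NamedTuple):
    component: str
    element: str
    decimal: str
    release: str
    terminator: str


def parse_una(una: str) -> Separators:
    """Read the service characters from a UNA segment such as "UNA:+.? '"."""
    if len(una) != 9 or not una.startswith("UNA"):
        raise ValueError("expected 'UNA' followed by six characters")
    # The fifth service character (index 7) is reserved and skipped here.
    return Separators(una[3], una[4], una[5], una[6], una[8])


def split_escaped(text: str, separator: str, release: str) -> list:
    """Split on separator, leaving escaped separators (and the escapes) in place."""
    parts, current, i = [], [], 0
    while i < len(text):
        char = text[i]
        if char == release and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape pair untouched
            i += 2
        elif char == separator:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(char)
            i += 1
    parts.append("".join(current))
    return parts


def unescape(value: str, release: str) -> str:
    """Remove release characters once all splitting is done."""
    out, i = [], 0
    while i < len(value):
        if value[i] == release and i + 1 < len(value):
            out.append(value[i + 1])
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


seps = parse_una("UNA:+.? '")
# Hypothetical free-text segment containing an escaped "+" character.
fields = split_escaped("FTX+AAI+++Price 5?+2 items", seps.element, seps.release)
print(fields, unescape(fields[-1], seps.release))
```

Splitting and unescaping are kept separate so that an escaped : survives the split at data element level and can still be told apart from a real component separator at the next level.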
The UNB segment represents the interchange header, and contains information about the message sender, the message recipient, the message date, and so on.
UNOA identifies the character set used. Examples of permitted character sets are:
UNOA = UN/ECE level A; complies with ISO 646 – also called International Alphabet No. 5 – except lowercase letters.
UNOB = UN/ECE level B; like UNOA but also lowercase letters.
UNOC = UN/ECE level C; complies with ISO8859-1
UNOD = UN/ECE level D; complies with ISO8859-2
UNOE = UN/ECE level E (Cyrillic)
UNOF = UN/ECE level F (Greek)
3 identifies the UN/EDIFACT syntax version. There are four different UN/EDIFACT syntax versions. Nowadays, syntax versions 3 and 4 are mostly used.
8773456789012:14 corresponds to the sender of the message.
9123456789012:14 corresponds to the recipient of the message.
14 indicates that the number is a GLN.
140218:1552 stands for February 18, 2014, 15:52.
MSGNR4711 is the unique number of the interchange. It is used in particular in the context of message routing for the unique identification of a message transmission.
1 indicates that the test indicator is set and that the given interchange is a test message. This information is also important for message routing, because the recipient can distinguish production messages from test messages and process them differently.
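The UNB fields described above can be extracted with a few lines of Python. This is a rough sketch with hypothetical names; it assumes the default separators, no escaped characters, and a two-digit year that Python's %y directive maps to the correct century.

```python
from datetime import datetime


def parse_unb(segment: str) -> dict:
    """Extract the UNB fields discussed in the text from an interchange header."""
    elements = [e.split(":") for e in segment.rstrip("'").split("+")]
    if elements[0] != ["UNB"]:
        raise ValueError("not a UNB segment")
    syntax, sender, recipient, prepared = elements[1:5]
    return {
        "charset": syntax[0],
        "syntax_version": int(syntax[1]),
        "sender": sender[0],
        "sender_qualifier": sender[1],
        "recipient": recipient[0],
        "recipient_qualifier": recipient[1],
        # YYMMDD:HHMM, e.g. 140218:1552
        "prepared_at": datetime.strptime(":".join(prepared), "%y%m%d:%H%M"),
        "interchange_reference": elements[5][0],
        # The test indicator is the eleventh data element after the tag.
        "is_test": len(elements) > 11 and elements[11] == ["1"],
    }


unb = parse_unb(
    "UNB+UNOA:3+8773456789012:14+9123456789012:14+140218:1552+MSGNR4711++++++1'"
)
print(unb["prepared_at"], unb["is_test"])
```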
The UNH segment represents the header of a document. The number 1 is the unique number of the document within the interchange. The number is assigned by the sender.
DESADV:D:96A:UN indicates that the document is a dispatch advice and that the document type is from UN/EDIFACT Directory D96A.
EAN005 indicates that it is an EANCOM document type and identifies the EANCOM version of the D96A EANCOM standard used.
The BGM (beginning of message) segment initiates the actual document.
351 corresponds to the concrete document subtype. Examples for permitted values are:
351 = Despatch advice
35E = Returns advice (EAN Code)
YA5 = Cross dock despatch advice (EAN Code)
DOCNR4712 is the unique number of the document given by the sender.
The DTM segment is used to specify date and time information.
The first part of this composite data element identifies the type of the date (date/time/period qualifier). For example:
137 = Document/message date/time. Date/time when a document/message has been issued.
2 = Delivery date/time, requested date on which buyer requests goods to be delivered
The second part represents the actual date value:
20180218, for example, stands for February 18, 2018.
The third part specifies the pattern for the date (date/time/period format qualifier).
102 corresponds to CCYYMMDD
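Because the format qualifier tells the recipient how to read the date value, a parser can look up the matching pattern. The sketch below uses hypothetical names; code 203 does not occur in the sample message and is included only to show how further formats could be added.

```python
from datetime import datetime

# Mapping from date/time/period format qualifier to a strptime pattern.
DATE_FORMATS = {
    "102": "%Y%m%d",      # CCYYMMDD, as used in the sample message
    "203": "%Y%m%d%H%M",  # CCYYMMDDHHMM, added here for illustration only
}


def parse_dtm(segment: str) -> tuple:
    """Return (date qualifier, datetime) for a DTM segment."""
    tag, composite = segment.rstrip("'").split("+")
    if tag != "DTM":
        raise ValueError("not a DTM segment")
    qualifier, value, format_code = composite.split(":")
    return qualifier, datetime.strptime(value, DATE_FORMATS[format_code])


print(parse_dtm("DTM+137:20180218:102'"))
print(parse_dtm("DTM+2:20180220:102'"))
```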
NAD+SU+9983083940382::9'
NAD+BY+5332357469542::9'
NAD+DP+3839204835454::9'
The NAD segment is used to indicate the names and addresses of the companies involved. Instead of names and addresses, however, in most cases the unique identification by means of numbers, such as the GLN (global location number), is used.
The first part of the composite data element identifies the concrete type of business partner (party qualifier). Examples for permitted values are:
BY = Buyer
DP = Delivery party
SU = Supplier
WH = Warehouse keeper
The second part represents the 13-digit GLN, and 9 indicates that the provided number is a GLN.
A dispatch advice is structured using the concept of consignment packing sequences (CPS). A CPS represents a specific layer in the hierarchy of a shipment, e.g., a pallet, a box, a carton, etc.
In CPS+1, the value 1 indicates the hierarchy level, which in this case is level 1. Further CPS sequences may then refer to the upper layer using the second data element. For example, CPS+2+1 indicates consignment packing sequence 2. The parent layer is consignment packing sequence 1.
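The parent references of the CPS segments are enough to rebuild the packing hierarchy. A minimal sketch with hypothetical function names, using the four CPS segments of the sample message:

```python
def cps_parents(segments: list) -> dict:
    """Map each consignment packing sequence number to its parent (or None)."""
    parents = {}
    for segment in segments:
        parts = segment.rstrip("'").split("+")
        if parts[0] != "CPS":
            continue  # ignore all other segments
        sequence = parts[1]
        parent = parts[2] if len(parts) > 2 and parts[2] else None
        parents[sequence] = parent
    return parents


def children_of(parents: dict, parent: str) -> list:
    """List all sequences that reference the given parent."""
    return [seq for seq, p in parents.items() if p == parent]


hierarchy = cps_parents(["CPS+1'", "PAC+1++PK'", "CPS+2+1'", "CPS+3+1'", "CPS+4+1'"])
print(hierarchy)
print(children_of(hierarchy, "1"))
```

In the sample message, sequence 1 has no parent and therefore forms the top layer, while sequences 2, 3 and 4 are packed directly beneath it.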
The PAC segment is used to specify the number and the type of packages. In the example above (PAC+1++PK), 1 package of type PK (package) is denoted. Other packaging types are for instance:
09 = Returnable pallet (EAN Code)
201 = Pallet ISO 1 – 1/1 EURO Pallet (EAN Code)
PK = Package
SL = Slipsheet
The PCI segment is used to specify markings, which are used to uniquely identify the packages in the shipment. In retail mostly SSCC (serial shipping container codes) are used.
33E indicates: marked with serial shipping container code.
GIN stands for goods identity number and is used to specify the code that is attached to the packaging.
The LIN segment represents a line item position in a dispatch advice. In LIN+1++4260304623843:EN, 1 is the line item number assigned by the sender. The following number, 4260304623843, represents the item number. The third part, EN, indicates the type of number; in this case a GTIN (global trade item number) was used.
The quantity segment is used for the definition of shipping quantities.
The first part of the composite data element indicates the type of quantity.
12 = Despatch quantity
21 = Ordered quantity
59 = Number of consumer units in the traded unit
The second part is used to provide the actual quantity, in this case 110 for the first line item.
The third part indicates the unit of measure in which the quantity is provided. Possible values are for instance:
PCE = Piece
KGM = Kilogram
PND = Pound
The RFF segment is used to provide reference numbers.
ON indicates a reference to an order message.
8493848394 is the referenced order number and 1 is the position number in the referenced order message.
The CNT segment is used to specify control values that can be used to check the integrity of the message upon receipt.
In the example above, 2 indicates that the following value represents the number of line items in the message. The control value 3 for the number of line items is correct, since the message actually contains three line items.
The UNT segment represents the message trailer.
The first number, 34, indicates the number of segments in the message from the UNH to the UNT segment (both included) and thus also serves as a control value. The number 1 must be the same message number as used in the UNH segment. This also serves to check the integrity of the EDI message.
The UNZ segment represents the interchange trailer and is the last segment in an EDI interchange.
The first number, 1, represents the number of messages contained in the interchange. The second entry, MSGNR4711, is the same interchange number as in the UNB segment and also serves to check the integrity of the message.
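The control values in the CNT, UNT and UNZ segments can be verified programmatically. The following sketch runs these checks against the sample dispatch advice from this article. The function names are hypothetical, and the code assumes a single message per interchange, the default separators, and no escaped characters.

```python
SAMPLE = """UNA:+.? '
UNB+UNOA:3+8773456789012:14+9123456789012:14+140218:1552+MSGNR4711++++++1'
UNH+1+DESADV:D:96A:UN:EAN005'
BGM+351+DOCNR4712+9'
DTM+137:20180218:102'
DTM+2:20180220:102'
NAD+SU+9983083940382::9'
NAD+BY+5332357469542::9'
NAD+DP+3839204835454::9'
CPS+1'
PAC+1++PK'
PCI+33E'
GIN+BJ+342603046212321014'
CPS+2+1'
PAC+11++CT'
PCI+33E'
GIN+BJ+342603046212341547'
LIN+1++4260304623843:EN'
QTY+12:110:PCE'
RFF+ON:8493848394:1'
CPS+3+1'
PAC+22++CT'
PCI+33E'
GIN+BJ+342603046212378547'
LIN+2++4260304622123:EN'
QTY+12:330:PCE'
RFF+ON:8493848394:2'
CPS+4+1'
PAC+45++CT'
PCI+33E'
GIN+BJ+342603046212332145'
LIN+3++4260304624412:EN'
QTY+12:450:PCE'
RFF+ON:8493848394:3'
CNT+2:3'
UNT+34+1'
UNZ+1+MSGNR4711'
""".replace("\n", "")  # a regular EDIFACT file contains no line breaks


def split_segments(interchange: str) -> list:
    """Return every segment after UNA as a list of its data elements."""
    body = interchange[9:] if interchange.startswith("UNA") else interchange
    return [raw.split("+") for raw in body.split("'") if raw]


def check_integrity(interchange: str) -> dict:
    """Compare the control values in CNT, UNT and UNZ with the actual content."""
    segments = split_segments(interchange)
    tags = [segment[0] for segment in segments]
    unh_index, unt_index = tags.index("UNH"), tags.index("UNT")
    unb, unz = segments[tags.index("UNB")], segments[tags.index("UNZ")]
    unh, unt = segments[unh_index], segments[unt_index]
    cnt_qualifier, cnt_value = segments[tags.index("CNT")][1].split(":")
    return {
        "unt_segment_count": int(unt[1]) == unt_index - unh_index + 1,
        "unt_reference": unt[2] == unh[1],
        "unz_message_count": int(unz[1]) == tags.count("UNH"),
        "unz_reference": unz[2] == unb[5],
        "cnt_line_items": cnt_qualifier == "2" and int(cnt_value) == tags.count("LIN"),
    }


print(check_integrity(SAMPLE))
```

Each entry in the result is True when the respective control value matches the message content, which is the case for all five checks on the sample message.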
Although incomprehensible at first glance, a closer look at an EDIFACT message shows the information hidden inside. In contrast to markup-based approaches such as XML, the size of an EDIFACT message is very small, since only coded information is transmitted and no space-intensive markup is used.
Although one might think that space does not matter with the storage capacities and Internet bandwidths available today, EDI message size is of major relevance. For example, Deutsche Telekom still bills X.400 traffic on a per-kilobyte basis.
EDIFACT and its subsets are the most widely used EDI data exchange formats of companies worldwide. As a supplier to large companies or as a buyer from large companies, one therefore often has no choice but to use EDIFACT.