EDI standards overview - structure of an EDIFACT file
Anyone who deals with electronic data interchange (EDI) today will sooner or later encounter EDIFACT. Although XML-based business document standards have spread rapidly since the introduction of XML in the late 1990s, EDIFACT remains the workhorse of electronic data interchange. This article analyses the schematic structure of an EDIFACT file.

EDI standards

An EDI file may seem like a random jumble of characters at first glance. On closer examination, however, a well-thought-out schematic structure is revealed, which makes the processing of the message by computer programs possible in the first place. Behind an EDI file is a specific EDI standard that dictates how the file must be structured. Typical standard formats that can be used here are, for example, EDIFACT, XML, CSV files or flat file formats.

An EDI standard usually builds on the following four principles:

Syntax rules

The syntax rules define the allowed characters and the allowed order in which the individual characters may be used.

Codes

Within an EDI file most of the information is accurately identified using codes - for example, currency codes, country codes, but also codes for identifying a particular date format, etc.

Message design

The message design defines the structure of a particular message type. Message types include, for example, purchase orders, delivery notes, invoices, etc.

Identification of values in an EDI message

Depending on the standard used, there are three ways in which a value can be identified in an EDI file:

a) Implicitly by the position in the message. This technique is used in flat file formats and CSV. The exact position and meaning of a given value are defined in the accompanying documentation. For example: a line beginning with the characters 100 is the header line, and the characters at positions 4 to 17 of such a line represent the GLN of the sender, etc.

b) Implicitly through the use of separators. A set of predefined separator characters divides the message into building blocks. The message structure defines which building blocks are to be used and in which order they must appear to form a correct EDI message, e.g., an orders message. This technique is used in EDIFACT files.

c) Explicitly through the application of metadata. With the help of additional data, the meaning of the individual information fragments in a file is specified more precisely. This technique is used in XML, for example, where the actual information is enclosed by means of markup elements and attributes - e.g.:

  <InvoiceRecipient id="IV4394">Recipient of invoice</InvoiceRecipient>
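The three approaches can be contrasted in a few lines of Python. This is only a sketch: the fixed-width record and its column layout are made up for illustration, and the separator handling is deliberately naive (it ignores escaped separators).

```python
import xml.etree.ElementTree as ET

# a) Position: hypothetical flat-file record with the record type in
#    columns 1-3 and the sender GLN in columns 4-17 (padded with a blank).
flat_line = "1008773456789012 "
record_type = flat_line[0:3]
sender_gln = flat_line[3:17].strip()

# b) Separators: an EDIFACT-style segment, split on + and then on :.
segment = "NAD+SU+9983083940382::9"
elements = [element.split(":") for element in segment.split("+")]

# c) Metadata: the XML element name and attribute describe the value.
xml_text = '<InvoiceRecipient id="IV4394">Recipient of invoice</InvoiceRecipient>'
node = ET.fromstring(xml_text)

print(record_type, sender_gln)
print(elements)
print(node.tag, node.attrib["id"], node.text)
```

In case a) the program has to know the column layout in advance, in case b) it has to know the order of the building blocks, and only in case c) does the data carry its own description.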

UN/EDIFACT - A standard of the United Nations

UN/EDIFACT is the abbreviation for United Nations Electronic Data Interchange for Administration, Commerce, and Transport. The standardization organization behind UN/EDIFACT is UN/CEFACT (United Nations Centre for Trade Facilitation and Electronic Business). UN/EDIFACT is one of the most widely used EDI standards today alongside ANSI ASC X12. ANSI ASC X12 is very common in the North American market, whereas in Europe UN/EDIFACT (or one of its various subsets) prevails.

As the following figure shows, the UN/EDIFACT standard is based on four different pillars.

Pillars of an EDI standard

Syntax

The syntax defines the exact rules for message building, as well as the characters used to separate individual message segments and elements.

Data elements

A data element is the smallest unit in an EDIFACT file.

Segments

Groups of similar data elements form so-called segments.

Messages

Messages represent an ordered sequence of segments - for example, a DESADV file represents a delivery note. Additionally, the EDIFACT standard defines message delivery requirements, for example the exact structure of a specific message exchange, which in turn may contain several EDIFACT files.

EDIFACT standards

The exact structure of an EDIFACT message is defined in official standard documents, which are also available online. UN/CEFACT approves two EDIFACT standard versions per year, each marked with the year followed by “A” (for the first release of a year) and “B” (for the second release of a year). D01B is therefore the second standard version from the year 2001.

In addition, separate subsets of the official UN/EDIFACT standard exist for various industry sectors and domains. In the consumer goods sector the EANCOM subset is very common, and it is also used for the example in this article. EANCOM is one of the most widely used EDIFACT subsets worldwide. The most frequently used EANCOM message types are ORDERS, DESADV and INVOIC.

Structure of an EDIFACT file

An EDIFACT file follows a strict hierarchy, which is shown below.

Structure of an EDIFACT file

The topmost unit of an EDIFACT message is the Interchange (UNB), which can be thought of as an envelope. The interchange defines the message recipient, the message sender, the message number, the message date, etc.

An interchange can in turn contain several individual Groups (UNG) representing message groups. Alternatively, an interchange can also contain individual messages (concrete messages). Mixing of individual messages and message groups within an interchange is not allowed.

A message itself is enclosed by a header (UNH) and a trailer segment (UNT). A message group is surrounded by an UNG and UNE segment.

Within a message, there are several segments and segment groups, which represent individual related message parts (for example, information about the biller, a specific invoice line, etc.). A segment group is initiated by a so-called trigger segment.

Segments consist of data elements and composite data elements.

Simple data elements are the smallest unit of an EDIFACT file.

Simple Data Elements

Simple data elements form the basic building blocks of an EDIFACT file and represent - as the name already suggests - simple data values.

An example of a simple data element is a party name.

3036 - Party Name

Description: Name of a party.

Representation: an..35

Example:

 John Doe

The abbreviation an..35 means that a maximum of 35 alphanumeric characters may be used for the party name.
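Such a representation can be checked programmatically. The following Python sketch is a simplification under stated assumptions: the function name is hypothetical, "an" accepts any character (a real check would depend on the character set of the interchange), and "n" accepts plain digits only, without sign or decimal mark.

```python
import re

def matches_representation(value: str, spec: str) -> bool:
    """Check a value against a representation such as 'an..35', 'n..2' or 'a3'."""
    parsed = re.fullmatch(r"(an|a|n)(\.\.)?(\d+)", spec)
    if parsed is None:
        raise ValueError(f"unsupported representation: {spec}")
    kind, variable, length = parsed.group(1), parsed.group(2), int(parsed.group(3))

    if variable:
        length_ok = 1 <= len(value) <= length   # 'an..35': up to 35 characters
    else:
        length_ok = len(value) == length        # 'an3': exactly 3 characters

    if kind == "n":
        type_ok = re.fullmatch(r"[0-9]+", value) is not None
    elif kind == "a":
        type_ok = value.isalpha()
    else:
        type_ok = True                          # simplification for 'an'
    return length_ok and type_ok

print(matches_representation("John Doe", "an..35"))
print(matches_representation("X" * 36, "an..35"))
```

The first call accepts the party name from the example, the second rejects a value that is one character too long.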

Simple data element with code list

A simple data element with a code list does not accept free text; instead, a value from a list of predefined values (= codes) must be used.

An example of a simple data element with a code list is a coded document name.

1001 Document name code

Description: Code specifying the document name.

Representation: an..3

Example:

1     Certificate of analysis
         Certificate providing the values of an analysis.

2     Certificate of conformity
         Certificate certifying the conformity to predefined
         definitions.

3     Certificate of quality
         Certificate certifying the quality of goods, services
         etc.

4     Test report
         Report providing the results of a test session.
…

Composite Data Element

A composite data element consists of individual simple data elements and represents data with additional metadata (= additional data describing the actual data).

The individual components within a composite data element (typically simple data elements and simple data elements with code lists) are separated using the : character.

An example of a composite data element is a duty/tax or fee type.

C241 - DUTY/TAX/FEE Type

Structure:

5153 Duty or tax or fee type name code C an..3
1131 Code list identification code C an..17
3055 Code list responsible agency code C an..3
5152 Duty or tax or fee type name C an..35

Example:

The following example shows an exemplary duty/tax/fee type.

AAA:52:1:tax type xyz

AAA = Petroleum tax
52 = Value added tax identification
1 = CCC (Customs Co-operation Council)
tax type xyz = Free text description of tax
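Because the components of a composite are identified purely by position, a parser has to pair each value with the component definition at the same index. A minimal Python sketch, assuming the : separator and ignoring escaped separators (the helper name is illustrative):

```python
# Component names of C241 in the order given by the structure above.
C241_COMPONENTS = [
    ("5153", "Duty or tax or fee type name code"),
    ("1131", "Code list identification code"),
    ("3055", "Code list responsible agency code"),
    ("5152", "Duty or tax or fee type name"),
]

def split_composite(value: str, components, separator: str = ":") -> dict:
    """Map each component value to its data element number by position."""
    parts = value.split(separator)
    if len(parts) > len(components):
        raise ValueError("more components than the composite defines")
    # Trailing components may be omitted; pad them with empty strings.
    parts += [""] * (len(components) - len(parts))
    return {number: part for (number, _name), part in zip(components, parts)}

print(split_composite("AAA:52:1:tax type xyz", C241_COMPONENTS))
```

A composite with omitted trailing components, such as AAA alone, yields empty strings for the remaining three entries.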

Segments

Segments consist of simple data elements and composite data elements and represent compound data, such as an address.

The individual data elements in a segment are separated by the + character. A segment starts with the three-letter segment identifier (segment tag) and ends with the ' character.

The exact structure of a segment is described by means of so-called segment tables. The following segment table describes the structure of the LIN (line item) segment.

  010    1082 LINE ITEM IDENTIFIER                       C    1 an..6

  020    1229 ACTION REQUEST/NOTIFICATION DESCRIPTION
              CODE                                       C    1 an..3

  030    C212 ITEM NUMBER IDENTIFICATION                 C    1
         7140  Item identifier                           C      an..35
         7143  Item type identification code             C      an..3
         1131  Code list identification code             C      an..17
         3055  Code list responsible agency code         C      an..3

  040    C829 SUB-LINE INFORMATION                       C    1
         5495  Sub-line indicator code                   C      an..3
         1082  Line item identifier                      C      an..6

  050    1222 CONFIGURATION LEVEL NUMBER                 C    1 n..2

  060    7083 CONFIGURATION OPERATION CODE               C    1 an..3

M indicates “mandatory”, i.e., the simple data element or composite data element must be specified. C stands for “conditional” and means that the data element can optionally be specified. The values an..3, an..17, etc. give the maximum number of permitted alphanumeric characters; n..2 allows up to two numeric characters.

Assume we want to represent the following line item information using the LIN segment.

Pineapples
Line item number: 2
GTIN: 9393398439325
Type: article has been added

The resulting EDIFACT segment is:

 LIN+2+1+9393398439325:EN'
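The way the segment is assembled from its data elements can be sketched in Python. The helper below is hypothetical and assumes the default separators; it does not escape separator characters that occur inside values.

```python
def build_segment(tag, elements):
    """Join data elements with '+' and components with ':', then close with "'"."""
    rendered = []
    for element in elements:
        if isinstance(element, (list, tuple)):
            # Composite: drop empty trailing components, keep inner gaps.
            components = list(element)
            while components and components[-1] == "":
                components.pop()
            rendered.append(":".join(components))
        else:
            rendered.append(element)
    # Empty trailing data elements can be omitted as well.
    while rendered and rendered[-1] == "":
        rendered.pop()
    return "+".join([tag] + rendered) + "'"

# 1082 = 2, 1229 = 1 (added), C212 = GTIN plus item type code EN
print(build_segment("LIN", ["2", "1", ("9393398439325", "EN", "", "")]))
```

An empty data element in the middle of a segment is kept as an empty position, which is why a LIN segment without action code reads LIN+1++... with two consecutive + characters.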

Segment groups

Segment groups are used to aggregate several individual segments into groups of related segments.

For example, the following segment group allows contact details to be specified by combining the segments CTA (Contact information) and COM (Communication contact).

  0220       ----- Segment group 5  ------------------ C   5----------+|
  0230   CTA Contact information                       M   1          ||
  0240   COM Communication contact                     C   5----------++

Some possible segment sequences would be for example:

 CTA-CTA-CTA-COM-COM-CTA-COM
 CTA
 CTA-COM-CTA-CTA
 ...

As indicated by C 5, the segment group itself is optional and may occur up to 5 times. The segment group is initiated by a so-called trigger segment: the first segment within the segment group, which usually has cardinality M 1 (that is, it must occur exactly once per repetition of the group).
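The cardinality rules of this segment group can be expressed as a small check. The Python sketch below is hand-written for this one group (CTA as trigger, up to 5 COM segments per repetition, up to 5 repetitions); a general validator would be driven by the message definition instead.

```python
def is_valid_contact_group(tags, max_groups=5, max_com=5):
    """Check a tag sequence against: group C 5, CTA M 1, COM C 5."""
    groups = 0
    com_in_group = 0
    for tag in tags:
        if tag == "CTA":            # trigger segment opens a new repetition
            groups += 1
            com_in_group = 0
            if groups > max_groups:
                return False
        elif tag == "COM":
            if groups == 0:         # COM without a preceding CTA
                return False
            com_in_group += 1
            if com_in_group > max_com:
                return False
        else:
            return False            # any other tag is outside this group
    return True

for sequence in ("CTA-CTA-CTA-COM-COM-CTA-COM", "CTA", "CTA-COM-CTA-CTA", "COM-CTA"):
    print(sequence, is_valid_contact_group(sequence.split("-")))
```

The three sequences listed above pass the check, whereas COM-CTA fails because the group cannot start without its trigger segment.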

Messages

A message is a related sequence of segments and represents a concrete business document - for example, a DESADV message (dispatch advice). The following section shows the first part of a DESADV definition.

  0010   UNH Message header                            M   1
  0020   BGM Beginning of message                      M   1
  0030   DTM Date/time/period                          C   10
  0040   ALI Additional information                    C   5
  0050   MEA Measurements                              C   5
  0060   MOA Monetary amount                           C   5
  0070   CUX Currencies                                C   9

  0080       ----- Segment group 1  ------------------ C   10----------+
  0090   RFF Reference                                 M   1           |
  0100   DTM Date/time/period                          C   1-----------+

  0110       ----- Segment group 2  ------------------ C   99----------+
  0120   NAD Name and address                          M   1           |
  0130   LOC Place/location identification             C   10          |
                                                                       |
  0140       ----- Segment group 3  ------------------ C   10---------+|
  0150   RFF Reference                                 M   1          ||
  0160   DTM Date/time/period                          C   1----------+|
                                                                       |
  0170       ----- Segment group 4  ------------------ C   10---------+|
  0180   CTA Contact information                       M   1          ||
  0190   COM Communication contact                     C   5----------++

  0200       ----- Segment group 5  ------------------ C   10----------+
  0210   TOD Terms of delivery or transport            M   1           |
  0220   LOC Place/location identification             C   5           |
  0230   FTX Free text                                 C   5-----------+
...

EDIFACT sample dispatch advice

In the following EDIFACT dispatch advice example we will represent the delivery structure shown in the figure below.

Sample delivery structure

Based on the previous concepts, the example now shows a concrete EDIFACT message for a dispatch advice. To increase readability, line breaks have been added after each segment. In a regular EDIFACT file no line breaks are used.

UNA:+.? '
UNB+UNOA:3+8773456789012:14+9123456789012:14+140218:1552+MSGNR4711++++++1'
UNH+1+DESADV:D:96A:UN:EAN005'
BGM+351+DOCNR4712+9'
DTM+137:20180218:102'
DTM+2:20180220:102'
NAD+SU+9983083940382::9'
NAD+BY+5332357469542::9'
NAD+DP+3839204835454::9'
CPS+1'
PAC+1++PK'
PCI+33E'
GIN+BJ+342603046212321014'
CPS+2+1'
PAC+11++CT'
PCI+33E'
GIN+BJ+342603046212341547'
LIN+1++4260304623843:EN'
QTY+12:110:PCE'
RFF+ON:8493848394:1'
CPS+3+1'
PAC+22++CT'
PCI+33E'
GIN+BJ+342603046212378547'
LIN+2++4260304622123:EN'
QTY+12:330:PCE'
RFF+ON:8493848394:2'
CPS+4+1'
PAC+45++CT'
PCI+33E'
GIN+BJ+342603046212332145'
LIN+3++4260304624412:EN'
QTY+12:450:PCE'
RFF+ON:8493848394:3'
CNT+2:3'
UNT+34+1'
UNZ+1+MSGNR4711'

In the following, we examine the structure of the individual segments in detail.

UNA segment

 UNA:+.? '

The UNA segment stands for the “Service String Advice” and describes the separators used in the message. Usually, the following separators are used (syntax version 3).

: Composite element delimiter
+ Data element delimiter
. Decimal mark (decimal point)
? Release character (escape character)
(space) Reserved character, remains empty
' Segment delimiter
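A parser typically reads the six characters after UNA before it touches anything else. The following Python sketch shows one way to do this; the names are illustrative, and falling back to the default separators listed above when UNA is missing is an assumption that holds for the common UNOA/UNOB case.

```python
from typing import NamedTuple

class Separators(NamedTuple):
    component: str
    element: str
    decimal: str
    release: str
    reserved: str
    segment: str

DEFAULT_SEPARATORS = Separators(":", "+", ".", "?", " ", "'")

def read_una(interchange: str) -> Separators:
    """Return the separators announced by UNA, or the defaults if UNA is absent."""
    if interchange.startswith("UNA") and len(interchange) >= 9:
        # The six characters after the tag are the separators, in fixed order.
        return Separators(*interchange[3:9])
    return DEFAULT_SEPARATORS

print(read_una("UNA:+.? 'UNB+UNOA:3"))
```

Reading the separators from UNA instead of hard-coding them matters as soon as a partner uses, for example, a comma as decimal mark.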

UNB segment

UNB+UNOA:3+8773456789012:14+9123456789012:14+140218:1552+MSGNR4711++++++1'

The UNB segment represents the interchange header, and contains information about the message sender, the message recipient, the message date, and so on.

UNOA identifies the character set used. Examples of permitted character sets are:

UNOA = UN/ECE level A; complies with ISO 646 - also called International Alphabet No. 5 - except lowercase letters.
UNOB = UN/ECE level B; like UNOA but also lowercase letters.
UNOC = UN/ECE level C; complies with ISO8859-1
UNOD = UN/ECE level D; complies with ISO8859-2
UNOE = UN/ECE level E (Cyrillic)
UNOF = UN/ECE level F (Greek)

3 identifies the UN/EDIFACT syntax version. There are four different UN/EDIFACT syntax versions; nowadays versions 3 and 4 are the ones mostly used.

8773456789012:14 corresponds to the sender of the message.

9123456789012:14 corresponds to the recipient of the message.

The identifier 14 indicates that the number is a GLN.

140218:1552 is the date and time of preparation in the format YYMMDD:HHMM, i.e., February 18, 2014, 15:52.

MSGNR4711 is the unique number of the interchange. It is used in particular in the context of message routing for the unique identification of a message transmission.

The last 1 indicates that the test indicator is set and that the given interchange is a test message. This information is also important for message routing, because it lets the recipient distinguish production messages from test messages and process them differently.
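The fields of the interchange header can be pulled apart with a few lines of Python. The sketch assumes the default separators and no escaped characters; the function name and dictionary keys are illustrative. The date is read as YYMMDD:HHMM, the layout used with syntax version 3.

```python
from datetime import datetime

def parse_unb(segment: str) -> dict:
    """Extract the fields discussed above from a UNB segment (default separators)."""
    elements = [e.split(":") for e in segment.rstrip("'").split("+")]
    # Pad to the 12 positions used here (tag plus eleven data elements).
    elements += [[""]] * (12 - len(elements))
    date_part, time_part = elements[4]
    return {
        "charset": elements[1][0],
        "syntax_version": elements[1][1],
        "sender": elements[2][0],
        "recipient": elements[3][0],
        "prepared": datetime.strptime(date_part + time_part, "%y%m%d%H%M"),
        "control_reference": elements[5][0],
        "test": elements[11][0] == "1",   # last position: test indicator
    }

header = parse_unb(
    "UNB+UNOA:3+8773456789012:14+9123456789012:14+140218:1552+MSGNR4711++++++1'"
)
print(header["sender"], header["recipient"], header["prepared"], header["test"])
```

The five empty positions between the interchange number and the test indicator show up as empty strings, which is how the parser knows that the 1 belongs to the last data element and not to an earlier one.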

UNH segment

UNH+1+DESADV:D:96A:UN:EAN005'

The UNH segment represents the header of a document. The number 1 is the unique number of the document within the interchange. The number is assigned by the sender.

DESADV:D:96A:UN indicates that the document is a dispatch advice and that the document type is from UN/EDIFACT Directory D96A. EAN005 indicates that it is an EANCOM document type and identifies the EANCOM version of the D96A EANCOM standard used.

BGM segment

BGM+351+DOCNR4712+9'

The BGM (beginning of message) segment initiates the actual document.

351 corresponds to the concrete document subtype. Examples of permitted values are:

351 = Despatch advice
35E = Returns advice (EAN Code)
YA5 = Cross dock despatch advice (EAN Code)

DOCNR4712 is the unique number of the document given by the sender.

DTM segment

DTM+137:20180218:102'
DTM+2:20180220:102'

The DTM segment is used to specify date and time information.

The first part of this composite data element identifies the type of the date (date/time/period qualifier). For example:

137 = Document/message date/time. Date/time when a document/message has been issued.
2 = Delivery date/time, requested date on which buyer requests goods to be delivered

The second part represents the actual date value:

20180218, for example, stands for February 18, 2018.

The third part specifies the pattern for the date (date/time/period format qualifier).

102 corresponds to CCYYMMDD
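Since the format qualifier tells the receiver how to read the value, a parser can translate it into a date pattern. A minimal Python sketch, assuming default separators; the mapping covers only two qualifiers and the function name is illustrative.

```python
from datetime import datetime

# Format qualifiers mapped to strptime patterns; only a small selection.
DTM_FORMATS = {
    "102": "%Y%m%d",        # CCYYMMDD
    "203": "%Y%m%d%H%M",    # CCYYMMDDHHMM
}

def parse_dtm(segment: str):
    """Return (qualifier, datetime) for a DTM segment with default separators."""
    tag, composite = segment.rstrip("'").split("+")
    if tag != "DTM":
        raise ValueError("not a DTM segment")
    qualifier, value, format_code = composite.split(":")
    return qualifier, datetime.strptime(value, DTM_FORMATS[format_code])

print(parse_dtm("DTM+137:20180218:102'"))
print(parse_dtm("DTM+2:20180220:102'"))
```

An unknown format qualifier raises a KeyError here, which is preferable to guessing a pattern and silently misreading the date.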

NAD segment

  NAD+SU+9983083940382::9'
  NAD+BY+5332357469542::9'
  NAD+DP+3839204835454::9'

The NAD segment is used to indicate the names and addresses of the companies involved. Instead of names and addresses, however, in most cases the unique identification by means of numbers, such as the GLN (global location number), is used.

The first data element identifies the concrete type of business partner (party qualifier). Examples of permitted values are:

BY = Buyer
DP = Delivery party
SU = Supplier
WH = Warehouse keeper

The second data element is a composite: its first component is the 13-digit GLN, and the trailing 9 (code list responsible agency EAN, today GS1) indicates that the provided number is a GLN.

CPS segment

CPS+1'

A dispatch advice is structured using the concept of consignment packing sequences. Thereby, a CPS represents a specific layer in the hierarchy of a shipment, e.g., a pallet, a box, a carton, etc.

The number 1 identifies the consignment packing sequence, which in this case is the top level of the hierarchy. Further CPS segments can then refer to their parent layer using the second data element. For example:

CPS+2+1'

indicates consignment packing sequence 2. The parent layer is consignment packing sequence 1.
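The parent references are enough to rebuild the packing hierarchy. The following Python sketch collects them into a mapping; it assumes default separators, and the function name is illustrative.

```python
def cps_parents(segments):
    """Map each CPS identifier to its parent identifier (None for the top level)."""
    parents = {}
    for segment in segments:
        elements = segment.rstrip("'").split("+")
        if elements[0] != "CPS":
            continue                      # only CPS segments carry the hierarchy
        parent = elements[2] if len(elements) > 2 and elements[2] else None
        parents[elements[1]] = parent
    return parents

sample = ["CPS+1'", "PAC+1++PK'", "CPS+2+1'", "CPS+3+1'", "CPS+4+1'"]
print(cps_parents(sample))
```

For the sample message this yields one top-level unit (1) with three children (2, 3 and 4).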

PAC segment

PAC+1++PK'

The PAC segment is used to specify the number and the type of packages. The example above denotes one package of type PK (package). Other packaging types are, for instance:

09 = Returnable pallet (EAN Code)
201 = Pallet ISO 1 - 1/1 EURO Pallet (EAN Code)
PK = Package
SL = Slipsheet

PCI segment

PCI+33E'

The PCI segment is used to specify markings, which are used to uniquely identify the packages in the shipment. In retail mostly SSCC (serial shipping container codes) are used.

The code 33E indicates: marked with serial shipping container code.

GIN segment

GIN+BJ+342603046212321014'

GIN stands for goods identity number and is used to specify the code that is attached to the packaging.

LIN segment

LIN+1++4260304623843:EN'

The LIN segment represents a line item position in a dispatch advice. Thereby, 1 is the line item number assigned by the sender.

The following number 4260304623843 represents the item number. The code EN after the colon indicates the type of number - in this case a GTIN (global trade item number) was used.

QTY segment

QTY+12:110:PCE'

The quantity segment is used for the definition of shipping quantities.

The first part of the composite data element indicates the type of quantity.

12 = Despatch quantity
21 = Ordered quantity
59 = Numbers of consumer units in the traded unit

The second part is used to provide the actual quantity - in this case 110.

The third part indicates the unit of measure in which the quantity is provided. Possible values are for instance:

PCE = Piece
KGM = Kilogram
PND = Pound

RFF segment

RFF+ON:8493848394:1'

The RFF segment is used to provide reference numbers.

ON indicates a reference to an order message. 8493848394 is the referenced order number and 1 is the position number in the referenced order message.

CNT segment

CNT+2:3'

The CNT segment is used to specify control values that can be used to check the integrity of the message upon receipt.

In the example above, 2 indicates that the following value represents the number of line items in the message. The control value 3 for the number of line items is correct, since the message actually contains three LIN segments.

UNT segment

UNT+34+1'

The UNT segment represents the message trailer.

The first number 34 indicates the number of segments in the message from the UNH to the UNT segment (both included) and thus also serves as a control value. The number 1 must be the same message number as used in the UNH segment. This also serves to check the integrity of the EDI message.

UNZ segment

UNZ+1+MSGNR4711'

The UNZ segment represents the interchange trailer and is the last segment in an EDI interchange.

The first number 1 represents the number of messages contained in the interchange. The second entry MSGNR4711 is the same interchange number as in the UNB segment and also serves to check the integrity of the message.
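The control values in CNT, UNT and UNZ can be verified mechanically on receipt. The Python sketch below runs these checks against the sample interchange from this article. It assumes the default separators, a single message per interchange and a CNT segment with qualifier 2; all function and key names are illustrative.

```python
SAMPLE = """UNA:+.? '
UNB+UNOA:3+8773456789012:14+9123456789012:14+140218:1552+MSGNR4711++++++1'
UNH+1+DESADV:D:96A:UN:EAN005'
BGM+351+DOCNR4712+9'
DTM+137:20180218:102'
DTM+2:20180220:102'
NAD+SU+9983083940382::9'
NAD+BY+5332357469542::9'
NAD+DP+3839204835454::9'
CPS+1'
PAC+1++PK'
PCI+33E'
GIN+BJ+342603046212321014'
CPS+2+1'
PAC+11++CT'
PCI+33E'
GIN+BJ+342603046212341547'
LIN+1++4260304623843:EN'
QTY+12:110:PCE'
RFF+ON:8493848394:1'
CPS+3+1'
PAC+22++CT'
PCI+33E'
GIN+BJ+342603046212378547'
LIN+2++4260304622123:EN'
QTY+12:330:PCE'
RFF+ON:8493848394:2'
CPS+4+1'
PAC+45++CT'
PCI+33E'
GIN+BJ+342603046212332145'
LIN+3++4260304624412:EN'
QTY+12:450:PCE'
RFF+ON:8493848394:3'
CNT+2:3'
UNT+34+1'
UNZ+1+MSGNR4711'
"""

def split_segments(text, terminator="'", release="?"):
    """Split an interchange into segments, honouring the release character."""
    segments, current, escaped = [], [], False
    for char in text:
        if escaped:                      # character after '?' is plain data
            current.append(char)
            escaped = False
        elif char == release:
            current.append(char)
            escaped = True
        elif char == terminator:
            segments.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    return segments

def check_interchange(text):
    """Compare the control values in CNT, UNT and UNZ with the actual content."""
    if text.startswith("UNA"):
        text = text[9:]                  # skip the service string advice
    segments = split_segments(text)
    tags = [segment[:3] for segment in segments]
    message = segments[tags.index("UNH"):tags.index("UNT") + 1]
    unh, unt = message[0].split("+"), message[-1].split("+")
    unb, unz = segments[0].split("+"), segments[-1].split("+")
    lin_count = sum(1 for segment in message if segment.startswith("LIN"))
    cnt = next(segment for segment in message if segment.startswith("CNT+2:"))
    return {
        "segment_count_ok": int(unt[1]) == len(message),
        "message_reference_ok": unt[2] == unh[1],
        "line_item_count_ok": int(cnt.split(":")[1]) == lin_count,
        "message_count_ok": int(unz[1]) == tags.count("UNH"),
        "interchange_reference_ok": unz[2] == unb[5],
    }

print(check_interchange(SAMPLE))
```

Splitting character by character rather than with a plain split on the segment delimiter keeps an escaped delimiter such as ?' inside its segment.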

Summary

Although incomprehensible at first glance, a closer look at an EDIFACT message shows the information hidden inside. In contrast to markup-based approaches such as XML, the size of an EDIFACT message is very small, since only coded information is transmitted and no space-intensive markup is used.

Although one might think that space does not matter with the storage capacities and Internet bandwidths available today, EDI message size is of major relevance. For example, Deutsche Telekom still bills X.400 traffic on a per-kilobyte basis.

EDIFACT and its subsets are the most widely used EDI data exchange formats of companies worldwide. As a supplier to large companies or as a buyer from large companies, one therefore often has no choice but to use EDIFACT.
