EDI standards
An EDI file may seem like a random jumble of characters at first glance. On closer examination, however, a well-thought-out schematic structure is revealed, which makes the processing of the message by computer programs possible in the first place. Behind an EDI file is a specific EDI standard that dictates how the file must be structured. Typical standard formats that can be used here are, for example, EDIFACT, XML, CSV files or flat file formats.
An EDI standard usually builds on the following four principles:
Syntax rules
The syntax rules define the allowed characters and the allowed order in which the individual characters may be used.
Codes
Within an EDI file, most of the information is accurately identified using codes, for example currency codes and country codes, but also codes for identifying a particular date format, etc.
Message design
The message design defines the structure of a particular message type. Message types include, for example, purchase orders, delivery notes, invoices, etc.
Identification of values in an EDI message
Depending on the standard used, there are three ways in which a value can be identified in an EDI file:
a) Implicitly by the position in the message. This technique is used in flat file formats and CSV. The exact position and semantics of a given value are defined in the accompanying documentation. For example, a line beginning with the characters 100 stands for the header line, and the characters in positions 4 to 17 of such a line represent the GLN of the sender.
b) Implicitly through the use of separators. A set of predefined separator characters marks where each building block of the message begins and ends. The message structure defines which building blocks are to be used and in which order they must be arranged to form a correct EDI message, e.g., an orders message. This technique is used in EDIFACT files.
c) Explicitly through the application of metadata. With the help of additional data, the meaning of the individual information fragments in a file is specified more precisely. This technique is used in XML, for example, where the actual information is enclosed by means of markup elements and attributes, e.g.:
<InvoiceRecipient id="IV4394">Recipient of invoice</InvoiceRecipient>
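To make variant a) more tangible, the following Python sketch reads the sender GLN from a fixed-width header line. The layout follows the example above (record type 100, sender field in positions 4 to 17). The sample line and the names parse_header_line, RECORD_TYPE and SENDER_FIELD are invented for illustration and do not come from any real flat file specification.

```python
# Hypothetical flat-file layout, taken from the example in the text:
# positions 1-3 hold the record type, positions 4-17 the sender's GLN.
RECORD_TYPE = slice(0, 3)    # positions 1-3 (1-based, inclusive)
SENDER_FIELD = slice(3, 17)  # positions 4-17 (1-based, inclusive)


def parse_header_line(line: str) -> dict:
    """Extract values purely by their position in the line."""
    if line[RECORD_TYPE] != "100":
        raise ValueError("not a header line")
    # Fixed-width fields are usually padded, so strip the padding.
    return {"record_type": "100", "sender_gln": line[SENDER_FIELD].strip()}


# Invented sample line: record type, then a 13-digit GLN padded to 14 characters.
sample = "1008773456789012 more header data"
print(parse_header_line(sample))
```

Nothing in the line itself says where a value starts or what it means, which is why this technique depends entirely on the accompanying documentation.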
UN/EDIFACT - A standard of the United Nations
UN/EDIFACT is the abbreviation for United Nations Electronic Data Interchange for Administration, Commerce, and Transport. The standardization organization behind UN/EDIFACT is UN/CEFACT (United Nations Centre for Trade Facilitation and Electronic Business). Alongside ANSI ASC X12, UN/EDIFACT is one of the most widely used EDI standards today. ANSI ASC X12 is very common in the North American market, whereas in Europe UN/EDIFACT (or one of its various subsets) prevails.
As the following figure shows, the UN/EDIFACT standard is based on four different pillars.
Syntax
The syntax defines the exact rules for message building, as well as the characters used to separate individual message segments and elements.
Data elements
A data element is the smallest unit in an EDIFACT file.
Segments
Groups of similar data elements form so-called segments.
Messages
Messages represent an ordered sequence of segments. For example, a DESADV file represents a delivery note.
Additionally, the EDIFACT standard defines message delivery requirements, for example the exact structure of a specific message exchange, which in turn may contain several EDIFACT files.
EDIFACT standards
The exact structure of an EDIFACT message is defined in official standard documents, which are also available online. UN/CEFACT approves two EDIFACT standard versions per year, each marked with the year followed by "A" (for the first release of a year) or "B" (for the second release of a year). D01B is therefore the second standard version from the year 2001.
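The naming rule can be sketched in a few lines of Python. The function name parse_directory_version is hypothetical, and the cut-off used to expand the two-digit year is an assumption made for this example only. The sketch accepts just the release letters A and B described above.

```python
import re


def parse_directory_version(version: str) -> tuple:
    """Split a directory label such as 'D01B' into (year, release number)."""
    match = re.fullmatch(r"D?(\d{2})([AB])", version.replace(".", ""))
    if match is None:
        raise ValueError(f"unexpected directory label: {version!r}")
    two_digit_year = int(match.group(1))
    # Assumption: two-digit years of 90 and above are read as 19xx,
    # everything else as 20xx.
    if two_digit_year >= 90:
        year = 1900 + two_digit_year
    else:
        year = 2000 + two_digit_year
    release = 1 if match.group(2) == "A" else 2
    return year, release


print(parse_directory_version("D01B"))  # (2001, 2)
print(parse_directory_version("D96A"))  # (1996, 1)
```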
In addition, separate subsets of the official UN/EDIFACT standard exist for the various industry sectors and domains. In the consumer goods sector the EANCOM subset is very common, and it is also used for the example in this article. EANCOM is the world's most widely used standard for electronic data interchange. The most frequently used EANCOM message types are ORDERS, DESADV and INVOIC.
Structure of an EDIFACT file
An EDIFACT file follows a strict hierarchy, which is described below.
The topmost unit of an EDIFACT message is the Interchange (UNB), which can be thought of as an envelope. The interchange defines the message recipient, the message sender, the message number, the message date, etc.
An interchange can in turn contain several individual Groups (UNG) representing message groups. Alternatively, an interchange can also contain individual messages (concrete messages). Mixing of individual messages and message groups within an interchange is not allowed.
A message itself is enclosed by a header (UNH) and a trailer segment (UNT). A message group is surrounded by a UNG and a UNE segment.
Within a message, there are several segments and segment groups, which represent individual related message parts (for example, information about the biller, a specific invoice line, etc.). A segment group is initiated by a so-called trigger segment.
Segments consist of data elements and composite data elements.
Simple data elements are the smallest unit of an EDIFACT file.
Simple Data Elements
Simple data elements form the basic building blocks of an EDIFACT file and represent, as the name suggests, simple data values.
An example of a simple data element is a party name.
3036 - Party Name
Description: Name of a party.
Representation: an..35
Example:
John Doe
The abbreviation an..35 means that a maximum of 35 alphanumeric characters may be used for the party name.
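A representation such as an..35 can be checked mechanically. The sketch below is a simplified reading of the notation, not the full rule set of the standard: it treats n as digits only, a as letters only, and an as any character. The function name matches_representation is hypothetical.

```python
import re


def matches_representation(value: str, representation: str) -> bool:
    """Check a value against a representation such as 'an..35' or 'n..2'."""
    spec = re.fullmatch(r"(an|a|n)(\.\.)?(\d+)", representation)
    if spec is None:
        raise ValueError(f"unsupported representation: {representation!r}")
    kind, variable, length = spec.group(1), spec.group(2), int(spec.group(3))
    # '..' means "up to" the given length; without it the length is exact.
    if variable:
        length_ok = 1 <= len(value) <= length
    else:
        length_ok = len(value) == length
    if kind == "n":
        type_ok = value.isdigit()  # simplification: digits only
    elif kind == "a":
        type_ok = value.isalpha()  # simplification: letters only
    else:
        type_ok = True             # 'an': no further restriction in this sketch
    return length_ok and type_ok


print(matches_representation("John Doe", "an..35"))  # True
print(matches_representation("123", "n..2"))         # False
```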
Simple data element with code list
For a simple data element with a code list, no free text can be used; instead, a value from a list of predefined values (= codes) must be used.
An example of a simple data element with a code list is a coded document name.
1001 Document name code
Description: Code specifying the document name.
Representation: an..3
Example:
1 Certificate of analysis
Certificate providing the values of an analysis.
2 Certificate of conformity
Certificate certifying the conformity to predefined
definitions.
3 Certificate of quality
Certificate certifying the quality of goods, services
etc.
4 Test report
Report providing the results of a test session.
...
Composite Data Element
A composite data element consists of individual simple data elements and represents data with additional metadata (= additional data describing the actual data).
The individual components within a composite data element (typically simple data elements and simple data elements with code lists) are separated using the : character.
An example of a composite data element is a duty/tax or fee type.
C241 - DUTY/TAX/FEE Type
Structure:
5153 | Duty or tax or fee type name code | C | an..3 |
1131 | Code list identification code | C | an..17 |
3055 | Code list responsible agency code | C | an..3 |
5152 | Duty or tax or fee type name | C | an..35 |
Example:
The following example shows a duty/tax/fee type.
AAA:52:1:tax type xyz
AAA = Petroleum tax
52 = Value added tax identification
1 = CCC (Customs Co-operation Council)
tax type xyz = Free text description of tax
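As a minimal sketch, the composite from the example can be taken apart by splitting on the : character and pairing each component with its element number from the structure above. The names parse_composite and C241_COMPONENTS are hypothetical, and the sketch ignores escaped separators.

```python
# Element numbers of the components of composite C241, in order.
C241_COMPONENTS = ["5153", "1131", "3055", "5152"]


def parse_composite(text: str, component_ids: list, separator: str = ":") -> dict:
    """Map the components of a composite data element to their element numbers."""
    values = text.split(separator)
    if len(values) > len(component_ids):
        raise ValueError("more components than the composite defines")
    # Trailing components may be omitted; missing ones are reported as None.
    values += [None] * (len(component_ids) - len(values))
    return dict(zip(component_ids, values))


print(parse_composite("AAA:52:1:tax type xyz", C241_COMPONENTS))
```

The meaning of each value comes only from its position between the separators, which is the "implicit" identification described earlier for EDIFACT.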
Segments
Segments consist of simple data elements and composite data elements and represent compound data, such as an address.
The individual data elements in a segment are separated by the + character. A segment starts with the three-letter segment identifier and ends with the ' character.
The exact structure of a segment is described by means of so-called segment tables. The following segment table describes the structure of the LIN (line item) segment.
010  1082  LINE ITEM IDENTIFIER                           C  1  an..6
020  1229  ACTION REQUEST/NOTIFICATION DESCRIPTION CODE   C  1  an..3
030  C212  ITEM NUMBER IDENTIFICATION                     C  1
     7140  Item identifier                                C     an..35
     7143  Item type identification code                  C     an..3
     1131  Code list identification code                  C     an..17
     3055  Code list responsible agency code              C     an..3
040  C829  SUB-LINE INFORMATION                           C  1
     5495  Sub-line indicator code                        C     an..3
     1082  Line item identifier                           C     an..6
050  1222  CONFIGURATION LEVEL NUMBER                     C  1  n..2
060  7083  CONFIGURATION OPERATION CODE                   C  1  an..3
M indicates "mandatory", i.e., the simple data element or composite data element must be specified. C stands for "conditional" and means that the data element can optionally be specified. Values such as an..3 and an..17 give the maximum number of permitted alphanumeric characters; n..2 stands for at most two numeric characters.
Assume we want to represent the following line item information using the LIN segment.
Pineapples
Line item number: 2
GTIN: 9393398439325
Type: article has been added
The resulting EDIFACT segment is:
LIN+2+1+9393398439325:EN'
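The assembly of this segment can be sketched as follows. The helper build_segment is hypothetical; it assumes the default separators and does not escape separator characters that occur inside values.

```python
def build_segment(tag: str, elements: list) -> str:
    """Join a tag and its data elements into one EDIFACT segment string."""
    parts = [tag]
    for element in elements:
        if isinstance(element, (list, tuple)):
            # Composite data element: components are joined with ':'.
            parts.append(":".join(element))
        else:
            parts.append(element)
    # Data elements are joined with '+', the segment ends with "'".
    return "+".join(parts) + "'"


# Line item 2, action code 1 (added), GTIN with qualifier EN.
segment = build_segment("LIN", ["2", "1", ("9393398439325", "EN")])
print(segment)  # LIN+2+1+9393398439325:EN'
```

An omitted conditional element is passed as an empty string, which produces two consecutive + characters in the output.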
Segment groups
Segment groups are used to aggregate several individual segments into groups of related segments.
For example, the following segment group allows contact details to be specified by combining the segments CTA (Contact information) and COM (Communication contact).
0220   ----- Segment group 5  ------------------ C   5----------+|
0230   CTA Contact information                  M   1          ||
0240   COM Communication contact                C   5----------++
Some possible segment sequences would be for example:
CTA-CTA-CTA-COM-COM-CTA-COM
CTA
CTA-COM-CTA-CTA
...
As indicated by C 5, the segment group itself is optional and may occur up to 5 times. The segment group is initiated by a so-called trigger segment. It is the first segment within the segment group and usually has cardinality M 1 (that is, it must occur exactly once per repetition of the group).
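The cardinalities of this segment group translate into a simple check. The following sketch is one possible way to validate a sequence of CTA and COM tags; the function name is_valid_contact_group is hypothetical, and the check covers only this one group, not a complete message.

```python
def is_valid_contact_group(tags: list, max_groups: int = 5, max_com: int = 5) -> bool:
    """Check a CTA/COM tag sequence against the cardinalities of the group."""
    groups = 0
    com_in_group = 0
    for tag in tags:
        if tag == "CTA":
            # The trigger segment opens a new repetition of the group.
            groups += 1
            com_in_group = 0
            if groups > max_groups:
                return False
        elif tag == "COM":
            # COM may only appear inside a group opened by CTA.
            if groups == 0:
                return False
            com_in_group += 1
            if com_in_group > max_com:
                return False
        else:
            return False
    # An empty sequence is fine, because the group itself is conditional.
    return True


print(is_valid_contact_group("CTA-CTA-CTA-COM-COM-CTA-COM".split("-")))  # True
print(is_valid_contact_group(["COM", "CTA"]))                            # False
```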
Please see our article for more on the EDIFACT PRI segment.
Messages
A message is a related sequence of segments and represents a concrete business document, for example a DESADV message (dispatch advice). The following section shows the first part of a DESADV definition.
0010   UNH Message header                       M   1
0020   BGM Beginning of message                 M   1
0030   DTM Date/time/period                     C   10
0040   ALI Additional information               C   5
0050   MEA Measurements                         C   5
0060   MOA Monetary amount                      C   5
0070   CUX Currencies                           C   9

0080   ----- Segment group 1  ------------------ C   10----------+
0090   RFF Reference                            M   1           |
0100   DTM Date/time/period                     C   1-----------+

0110   ----- Segment group 2  ------------------ C   99----------+
0120   NAD Name and address                     M   1           |
0130   LOC Place/location identification        C   10          |
                                                                |
0140   ----- Segment group 3  ------------------ C   10---------+|
0150   RFF Reference                            M   1          ||
0160   DTM Date/time/period                     C   1----------+|
                                                                |
0170   ----- Segment group 4  ------------------ C   10---------+|
0180   CTA Contact information                  M   1          ||
0190   COM Communication contact                C   5----------++

0200   ----- Segment group 5  ------------------ C   10----------+
0210   TOD Terms of delivery or transport       M   1           |
0220   LOC Place/location identification        C   5           |
0230   FTX Free text                            C   5-----------+
...
EDIFACT sample dispatch advice
In the following EDIFACT dispatch advice example we will represent the delivery structure shown in the figure below.
Based on the previous concepts, the example now shows a concrete EDIFACT message for a dispatch advice. To increase readability, line breaks have been added after each segment. In a regular EDIFACT file no line breaks shall be used.
UNA:+.? '
UNB+UNOA:3+8773456789012:14+9123456789012:14+140218:1552+MSGNR4711++++++1'
UNH+1+DESADV:D:96A:UN:EAN005'
BGM+351+DOCNR4712+9'
DTM+137:20180218:102'
DTM+2:20180220:102'
NAD+SU+9983083940382::9'
NAD+BY+5332357469542::9'
NAD+DP+3839204835454::9'
CPS+1'
PAC+1++PK'
PCI+33E'
GIN+BJ+342603046212321014'
CPS+2+1'
PAC+11++CT'
PCI+33E'
GIN+BJ+342603046212341547'
LIN+1++4260304623843:EN'
QTY+12:110:PCE'
RFF+ON:8493848394:1'
CPS+3+1'
PAC+22++CT'
PCI+33E'
GIN+BJ+342603046212378547'
LIN+2++4260304622123:EN'
QTY+12:330:PCE'
RFF+ON:8493848394:2'
CPS+4+1'
PAC+45++CT'
PCI+33E'
GIN+BJ+342603046212332145'
LIN+3++4260304624412:EN'
QTY+12:450:PCE'
RFF+ON:8493848394:3'
CNT+2:3'
UNT+34+1'
UNZ+1+MSGNR4711'
In the following, we examine the structure of the individual segments in detail.
UNA segment
UNA:+.? '
The UNA segment stands for "Service String Advice" and defines the separators used in the message. Usually, the following separators are used (syntax version 3):
: = Composite element delimiter
+ = Data element delimiter
. = Decimal mark
? = Release character (escape character)
(space) = Reserved, remains empty
' = Segment delimiter
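A parser would typically read these six characters first and use them for all further splitting. The sketch below shows one way to do that, including the release character, which turns the following separator into ordinary text. The function names and the FTX sample value are invented for illustration.

```python
def parse_una(una: str) -> dict:
    """Read the six service characters that follow the 'UNA' tag."""
    if len(una) != 9 or not una.startswith("UNA"):
        raise ValueError("expected 'UNA' followed by six characters")
    names = ["component", "element", "decimal", "release", "reserved", "terminator"]
    return dict(zip(names, una[3:]))


def split_escaped(text: str, separator: str, release: str) -> list:
    """Split on a separator, skipping separators preceded by the release character."""
    parts, current, i = [], "", 0
    while i < len(text):
        char = text[i]
        if char == release and i + 1 < len(text):
            # Keep the escape sequence intact so later splits still see it.
            current += char + text[i + 1]
            i += 2
        elif char == separator:
            parts.append(current)
            current = ""
            i += 1
        else:
            current += char
            i += 1
    parts.append(current)
    return parts


chars = parse_una("UNA:+.? '")
# Invented free-text element that contains an escaped '+'.
print(split_escaped("FTX+AAI+++10?+2 pieces", chars["element"], chars["release"]))
# ['FTX', 'AAI', '', '', '10?+2 pieces']
```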
UNB segment
UNB+UNOA:3+8773456789012:14+9123456789012:14+140218:1552+MSGNR4711++++++1'
The UNB segment represents the interchange header, and contains information about the message sender, the message recipient, the message date, and so on.
UNOA identifies the character set used. Examples of permitted character sets are:
UNOA = UN/ECE level A; complies with ISO 646 (also called International Alphabet No. 5), except lowercase letters.
UNOB = UN/ECE level B; like UNOA, but also includes lowercase letters.
UNOC = UN/ECE level C; complies with ISO 8859-1
UNOD = UN/ECE level D; complies with ISO 8859-2
UNOE = UN/ECE level E (Cyrillic)
UNOF = UN/ECE level F (Greek)
3 identifies the UN/EDIFACT syntax version. There are four different UN/EDIFACT syntax versions. Nowadays, versions 3 and 4 are mostly used.
8773456789012:14 corresponds to the sender of the message.
9123456789012:14 corresponds to the recipient of the message.
The identifier 14 indicates that the number is a GLN.
140218:1552 stands for February 18, 2014, 15:52.
MSGNR4711 is the unique number of the interchange. It is used in particular in message routing to uniquely identify a message transmission.
The last 1 indicates that the test indicator is set and that the given interchange is a test message. This information is also important for message routing, because it lets the recipient distinguish production messages from test messages and process them differently.
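The fields described above can be pulled out of the UNB segment with a few splits. This is a minimal sketch under several assumptions: default separators, no release characters, and the date format YYMMDD:HHMM used in the example. The function name parse_unb and the dictionary keys are hypothetical.

```python
from datetime import datetime


def parse_unb(segment: str) -> dict:
    """Pull the fields discussed above out of a UNB segment."""
    # Simplification: assumes default separators and no release characters.
    elements = segment.rstrip("'").split("+")
    if elements[0] != "UNB":
        raise ValueError("not a UNB segment")
    charset, syntax_version = elements[1].split(":")
    sender_id, sender_qualifier = elements[2].split(":")
    recipient_id, recipient_qualifier = elements[3].split(":")
    return {
        "charset": charset,
        "syntax_version": syntax_version,
        "sender": sender_id,
        "sender_qualifier": sender_qualifier,
        "recipient": recipient_id,
        "recipient_qualifier": recipient_qualifier,
        # Date and time as in the example: YYMMDD:HHMM.
        "prepared": datetime.strptime(elements[4], "%y%m%d:%H%M"),
        "control_reference": elements[5],
        # In the example the test indicator is the eleventh element after the tag.
        "is_test": len(elements) > 11 and elements[11] == "1",
    }


unb = parse_unb(
    "UNB+UNOA:3+8773456789012:14+9123456789012:14+140218:1552+MSGNR4711++++++1'"
)
print(unb["sender"], unb["prepared"], unb["is_test"])
```

A router could use is_test to keep test interchanges away from the production system.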
UNH segment
UNH+1+DESADV:D:96A:UN:EAN005'
The UNH segment represents the header of a document. The number 1 is the unique number of the document within the interchange. The number is assigned by the sender.
DESADV:D:96A:UN indicates that the document is a dispatch advice and that the document type is from UN/EDIFACT Directory D96A. EAN005 indicates that it is an EANCOM document type and identifies the EANCOM version of the D96A EANCOM standard used.
BGM segment
BGM+351+DOCNR4712+9'
The BGM (beginning of message) segment initiates the actual document.
351 corresponds to the concrete document subtype. Examples of permitted values are:
351 = Despatch advice
35E = Returns advice (EAN Code)
YA5 = Cross dock despatch advice (EAN Code)
DOCNR4712 is the unique number of the document assigned by the sender.
The final 9 is the message function code and marks the message as an original.
DTM segment
DTM+137:20180218:102'
DTM+2:20180220:102'
The DTM segment is used to specify date and time information.
The first part of this composite data element identifies the type of the date (date/time/period qualifier). For example:
137 = Document/message date/time. Date/time when a document/message has been issued.
2 = Delivery date/time, requested. Date on which the buyer requests the goods to be delivered.
The second part represents the actual date value: 20180218, for example, stands for February 18, 2018.
The third part specifies the pattern for the date (date/time/period format qualifier). 102 corresponds to CCYYMMDD.
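Because the format qualifier travels with the value, a parser can pick the matching date pattern at run time. The sketch below maps qualifier 102 to a Python strptime pattern. Qualifier 203 (CCYYMMDDHHMM) is not used in this article and is included only as a second example. The names DTM_FORMATS and parse_dtm are hypothetical.

```python
from datetime import datetime

# Format qualifiers mapped to strptime patterns. 102 is described above;
# 203 (CCYYMMDDHHMM) is added here only as a second example.
DTM_FORMATS = {"102": "%Y%m%d", "203": "%Y%m%d%H%M"}


def parse_dtm(segment: str) -> tuple:
    """Return (qualifier, datetime) for a segment such as DTM+137:20180218:102'."""
    # Simplification: assumes default separators and no release characters.
    tag, composite = segment.rstrip("'").split("+")
    if tag != "DTM":
        raise ValueError("not a DTM segment")
    qualifier, value, format_code = composite.split(":")
    if format_code not in DTM_FORMATS:
        raise ValueError(f"unsupported format qualifier: {format_code}")
    return qualifier, datetime.strptime(value, DTM_FORMATS[format_code])


print(parse_dtm("DTM+137:20180218:102'"))
print(parse_dtm("DTM+2:20180220:102'"))
```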
NAD segment
NAD+SU+9983083940382::9'
NAD+BY+5332357469542::9'
NAD+DP+3839204835454::9'
The NAD segment is used to indicate the names and addresses of the companies involved. Instead of names and addresses, however, in most cases the unique identification by means of numbers, such as the GLN (global location number), is used.
The first data element identifies the concrete type of business partner (party qualifier). Examples of permitted values are:
BY = Buyer
DP = Delivery party
SU = Supplier
WH = Warehouse keeper
The second part contains the 13-digit GLN, and the 9 indicates that the provided number is a GLN.
CPS segment
CPS+1'
A dispatch advice is structured using the concept of consignment packing sequences. A CPS represents a specific layer in the hierarchy of a shipment, e.g., a pallet, a box, a carton, etc.
The number 1 identifies the packing sequence, which in this case is the top level of the shipment. Further CPS segments refer to their parent layer in the second data element. For example:
CPS+2+1'
This segment indicates consignment packing sequence 2. The parent layer is consignment packing sequence 1.
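The parent references in the CPS segments are enough to rebuild the packaging hierarchy. The following sketch collects, for each packing sequence, the sequences that name it as their parent. The function name build_cps_tree is hypothetical, and the sketch assumes default separators.

```python
def build_cps_tree(segments: list) -> dict:
    """Map each CPS id to the ids of its direct children."""
    children = {}
    for segment in segments:
        elements = segment.rstrip("'").split("+")
        if elements[0] != "CPS":
            continue  # ignore all other segments
        cps_id = elements[1]
        # The second data element, if present, names the parent sequence.
        parent_id = elements[2] if len(elements) > 2 and elements[2] else None
        children.setdefault(cps_id, [])
        if parent_id is not None:
            children.setdefault(parent_id, []).append(cps_id)
    return children


# The four CPS segments of the sample dispatch advice.
tree = build_cps_tree(["CPS+1'", "CPS+2+1'", "CPS+3+1'", "CPS+4+1'"])
print(tree)  # {'1': ['2', '3', '4'], '2': [], '3': [], '4': []}
```

In the sample message this yields one top-level unit (sequence 1) that contains the three units 2, 3 and 4.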
PAC segment
PAC+1++PK'
The PAC segment is used to specify the number and the type of packages. In the example above, 1 package of type PK (package) is specified. Examples of packaging types are:
09 = Returnable pallet (EAN Code)
201 = Pallet ISO 1 - 1/1 EURO Pallet (EAN Code)
PK = Package
SL = Slipsheet
PCI segment
PCI+33E'
The PCI segment is used to specify markings that uniquely identify the packages in the shipment. In retail, SSCCs (serial shipping container codes) are mostly used.
The code 33E indicates: marked with serial shipping container code.
GIN segment
GIN+BJ+342603046212321014'
GIN stands for goods identity number and is used to specify the code that is attached to the packaging. The qualifier BJ indicates that the number is an SSCC.
LIN segment
LIN+1++4260304623843:EN'
The LIN segment represents a line item in a dispatch advice. Here, 1 is the line item number assigned by the sender.
The number 4260304623843 that follows is the item number. The qualifier EN after it indicates the type of number, in this case a GTIN (global trade item number).
QTY segment
QTY+12:110:PCE'
The quantity segment is used for the definition of shipping quantities.
The first part of the composite data element indicates the type of quantity.
12 = Despatch quantity
21 = Ordered quantity
59 = Number of consumer units in the traded unit
The second part provides the actual quantity, in this case 110.
The third part indicates the unit of measure in which the quantity is provided. Possible values are, for instance:
PCE = Piece
KGM = Kilogram
PND = Pound
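Decoding a QTY segment is mostly a matter of looking up the codes. The sketch below uses only the code values listed above. Codes it does not know are passed through unchanged. The names QUANTITY_TYPES, UNITS and parse_qty are hypothetical.

```python
from decimal import Decimal

# Code values as listed above; real code lists are much longer.
QUANTITY_TYPES = {
    "12": "Despatch quantity",
    "21": "Ordered quantity",
    "59": "Number of consumer units in the traded unit",
}
UNITS = {"PCE": "Piece", "KGM": "Kilogram", "PND": "Pound"}


def parse_qty(segment: str) -> dict:
    """Decode a segment such as QTY+12:110:PCE' into readable values."""
    # Simplification: assumes default separators and no release characters.
    tag, composite = segment.rstrip("'").split("+")
    if tag != "QTY":
        raise ValueError("not a QTY segment")
    parts = composite.split(":")
    qualifier, amount = parts[0], parts[1]
    unit = parts[2] if len(parts) > 2 else None  # the unit may be omitted
    return {
        "type": QUANTITY_TYPES.get(qualifier, qualifier),
        "amount": Decimal(amount),
        "unit": UNITS.get(unit, unit),
    }


print(parse_qty("QTY+12:110:PCE'"))
```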
RFF segment
RFF+ON:8493848394:1'
The RFF segment is used to provide reference numbers.
ON indicates a reference to an order message. 8493848394 is the referenced order number and 1 is the position number in the referenced order message.
CNT segment
CNT+2:3'
The CNT segment is used to specify control values that can be used to check the integrity of the message upon receipt.
In the example above, 2 indicates that the following value represents the number of line items in the message. The control value 3 is correct, since the message actually contains three line items.
UNT segment
UNT+34+1'
The UNT segment represents the message trailer.
The first number 34 indicates the number of segments in the message from the UNH to the UNT segment (both included) and thus also serves as a control value. The number 1 must be the same message number as used in the UNH segment. This also serves to check the integrity of the EDI message.
UNZ segment
UNZ+1+MSGNR4711'
The UNZ segment represents the interchange trailer and is the last segment in an EDI interchange.
The first number 1 represents the number of messages contained in the interchange. The second entry MSGNR4711 is the same interchange number as in the UNB segment and also serves to check the integrity of the message.
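The control values in the CNT, UNT and UNZ segments can be verified with a short script. The sketch below runs the checks on a shortened variant of the sample interchange with a single line item, so its counts differ from the full example. It assumes default separators, no release characters and exactly one message per interchange. The name check_interchange is hypothetical.

```python
# Shortened variant of the sample interchange (one line item only).
SAMPLE = (
    "UNB+UNOA:3+8773456789012:14+9123456789012:14+140218:1552+MSGNR4711++++++1'"
    "UNH+1+DESADV:D:96A:UN:EAN005'"
    "BGM+351+DOCNR4712+9'"
    "DTM+137:20180218:102'"
    "NAD+SU+9983083940382::9'"
    "CPS+1'"
    "LIN+1++4260304623843:EN'"
    "QTY+12:110:PCE'"
    "CNT+2:1'"
    "UNT+9+1'"
    "UNZ+1+MSGNR4711'"
)


def check_interchange(text: str) -> list:
    """Return a list of integrity problems; an empty list means all checks passed."""
    segments = [s.split("+") for s in text.split("'") if s]
    tags = [s[0] for s in segments]
    problems = []

    unb, unz = segments[tags.index("UNB")], segments[tags.index("UNZ")]
    start, end = tags.index("UNH"), tags.index("UNT")
    unh, unt = segments[start], segments[end]

    # UNT: segment count (UNH through UNT inclusive) and message reference.
    if int(unt[1]) != end - start + 1:
        problems.append("UNT segment count does not match")
    if unt[2] != unh[1]:
        problems.append("UNT message reference differs from UNH")

    # CNT with qualifier 2: number of LIN segments in the message.
    for segment in segments[start:end]:
        if segment[0] == "CNT":
            qualifier, value = segment[1].split(":")
            if qualifier == "2" and int(value) != tags[start:end].count("LIN"):
                problems.append("CNT line item count does not match")

    # UNZ: number of messages and interchange control reference.
    if int(unz[1]) != tags.count("UNH"):
        problems.append("UNZ message count does not match")
    if unz[2] != unb[5]:
        problems.append("UNZ control reference differs from UNB")
    return problems


print(check_interchange(SAMPLE))  # []
```

A receiving system would typically reject or quarantine an interchange for which this list is not empty.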
Summary
Although incomprehensible at first glance, a closer look at an EDIFACT message shows the information hidden inside. In contrast to markup-based approaches such as XML, the size of an EDIFACT message is very small, since only coded information is transmitted and no space-intensive markup is used.
Although one might think that space does not matter given the storage capacities and Internet bandwidths available today, EDI message size is still highly relevant. For example, Deutsche Telekom still bills X.400 traffic on a per-kilobyte basis.
EDIFACT and its subsets are the most widely used EDI data exchange formats among companies worldwide. As a supplier to large companies, or as a buyer purchasing from them, one therefore often has no choice but to use EDIFACT.
Questions?
Do you have any more questions about EDIFACT? Please do contact us or use our chat. We're more than happy to help!