XML (Extensible Markup Language) underpins much of today's structured data exchange. From XRechnung and ZUGFeRD to SOAP web services, RSS feeds, and Android app manifests, XML is everywhere in enterprise software development. Understanding XML is essential for any developer working with German e-invoicing, government procurement systems, or financial data interchange.
What Is XML?
XML is a markup language that encodes data in a hierarchical, text-based format that is both human-readable and machine-parseable. Unlike HTML — which describes how to display content — XML describes what data is and how it is structured. XML has no predefined tags; every tag you use describes the domain-specific meaning of the data it contains.
The W3C (World Wide Web Consortium) standardized XML 1.0 in 1998, and it has remained remarkably stable since. XML 1.1 relaxed some character restrictions for Unicode compliance, but XML 1.0 remains the dominant version used in production systems, including all XRechnung files.
The Anatomy of an XML Document
A well-formed XML document has the following structure:
- XML Declaration: An optional first line specifying the XML version and character encoding: <?xml version="1.0" encoding="UTF-8"?>. Always include this in XRechnung files.
- Root Element: Every XML document must have exactly one root element that contains all other elements. In XRechnung UBL, this is <Invoice>.
- Elements: Data containers defined by an opening tag (<ElementName>) and a closing tag (</ElementName>). Elements can be nested.
- Attributes: Key-value pairs inside opening tags that provide metadata: <amount currencyID="EUR">100.00</amount>.
- Text Content: The actual data value between opening and closing tags.
- Comments: Human-readable notes that are ignored by parsers: <!-- This is a comment -->.
- CDATA Sections: Sections where special characters do not need escaping: <![CDATA[<raw content here>]]>.
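The building blocks listed above can be seen in a short Python sketch using the standard library's xml.etree.ElementTree. The sample document and its element names (order, note) are made up for illustration and are not XRechnung.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!-- This is a comment -->
<order>
  <amount currencyID="EUR">100.00</amount>
  <note><![CDATA[Use <b>bold</b> & "quotes" freely]]></note>
</order>"""

root = ET.fromstring(SAMPLE)        # parses the bytes and returns the root element
amount = root.find("amount")        # first child element named "amount"
note = root.find("note")

print(root.tag)                     # name of the root element
print(amount.get("currencyID"))     # attribute value
print(amount.text)                  # text content
print(note.text)                    # CDATA content arrives as ordinary text
```

The comment is dropped by the default parser, and the CDATA section is delivered as plain text with its special characters intact.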
XML Rules: What Makes an XML Document Well-Formed?
A well-formed XML document must satisfy these rules:
- All elements must be properly closed: write <br /> or <br></br>, not <br>.
- Tags are case-sensitive: <Invoice> and <invoice> are different elements.
- Elements must be properly nested — a child element must close before its parent closes.
- There must be exactly one root element.
- Attribute values must be quoted (single or double quotes).
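- The characters <, >, &, ', and " are escaped as &lt;, &gt;, &amp;, &apos;, and &quot;. In text content, < and & must always be escaped; quotes only need escaping inside an attribute value delimited by the same quote character.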
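As a rough illustration of these rules, the following Python sketch feeds a few small fragments to the standard library parser and reports whether each one is well-formed. The helper name is_well_formed is hypothetical; escape and quoteattr come from xml.sax.saxutils.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def is_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as exc:
        return False, str(exc)

print(is_well_formed("<a><b/></a>"))      # properly closed and nested
print(is_well_formed("<a><br></a>"))      # <br> is never closed
print(is_well_formed("<a><b></a></b>"))   # child closes after its parent
print(is_well_formed("<a/><b/>"))         # two root elements
print(is_well_formed("<a x=1/>"))         # unquoted attribute value

# Escape values before writing them into a document.
print(escape("Müller & Söhne <GmbH>"))    # escapes &, < and >
print(quoteattr('5" pipe'))               # picks a safe quote style for attributes
```

Only the first fragment parses; the others fail with a parser message that names the position of the problem.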
XML Namespaces: Critical for XRechnung
XML namespaces prevent naming conflicts when combining XML vocabularies from different sources. Since XRechnung combines elements from multiple standards (UBL, CCTS, XSD), namespaces are essential. A namespace is declared with the xmlns attribute (for the default namespace) or with xmlns:prefix (for a prefixed namespace).
In XRechnung UBL, you will see declarations like xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" (the default namespace for Invoice elements) alongside xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" (for aggregate components like PostalAddress) and xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" (for basic components like Amount).
When you see cbc:Amount or cac:PostalAddress in an XRechnung file, the prefix (cbc, cac) refers to a namespace URI. The full element name in namespace-aware parsing is the URI + local name, making it globally unique regardless of prefix.
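To make the "URI + local name" point concrete, here is a minimal Python sketch using xml.etree.ElementTree. The two fragments are heavily trimmed and not valid XRechnung; the invoice number R-1, the prefixes basic and b, and the helper expanded_names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

INV = "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"

# Two trimmed fragments that differ only in the prefix bound to the CBC namespace.
doc_a = f'<Invoice xmlns="{INV}" xmlns:cbc="{CBC}"><cbc:ID>R-1</cbc:ID></Invoice>'
doc_b = f'<Invoice xmlns="{INV}" xmlns:basic="{CBC}"><basic:ID>R-1</basic:ID></Invoice>'

def expanded_names(doc):
    """Return the expanded names of the root element and its first child."""
    root = ET.fromstring(doc)
    return root.tag, root[0].tag

print(expanded_names(doc_a))   # tags are reported as {namespace-uri}localname
print(expanded_names(doc_b))   # same names, although the prefix differs

# Lookups use a prefix map chosen by the caller, independent of the file's prefixes.
ns = {"b": CBC}
print(ET.fromstring(doc_b).find("b:ID", ns).text)
```

Both documents produce identical expanded names, which is why code should match on the namespace URI rather than on the literal prefix found in a file.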
XPath: Querying XML Documents
XPath is the query language for XML, used extensively in XSLT transformations and KOSIT validation rules. KOSIT's Schematron business rules use XPath to verify XRechnung content. Understanding basic XPath helps when debugging validation errors:
- / selects the document root, the node that contains the root element; /Invoice selects the root element itself.
- //element selects all elements with that name anywhere in the document.
- @attribute selects an attribute.
- element[condition] filters elements by condition.
- text() selects the text content of an element.
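The expressions above can be tried out with Python's xml.etree.ElementTree, with one caveat: it implements only a subset of XPath. Paths must be relative to an element (so .// instead of //), and text() and bare @attribute selection are not supported, so the sketch reads values through .text and .get() instead. The fragment below is hypothetical and omits namespaces to keep the paths short.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice-like fragment without namespaces.
root = ET.fromstring(
    "<Invoice>"
    "<Line><ID>1</ID><Amount currencyID='EUR'>100.00</Amount></Line>"
    "<Line><ID>2</ID><Amount currencyID='EUR'>25.50</Amount></Line>"
    "<Total><Amount currencyID='EUR'>125.50</Amount></Total>"
    "</Invoice>"
)

# .//Amount: every Amount element anywhere below the root.
all_amounts = root.findall(".//Amount")

# ./Line/Amount: only Amount elements that are direct children of a Line.
line_amounts = root.findall("./Line/Amount")

# Line[ID='2']: filter Line elements by the text of their ID child.
line2_amount = root.find("./Line[ID='2']/Amount")

# [@currencyID='EUR']: filter by attribute value.
eur_amounts = root.findall(".//Amount[@currencyID='EUR']")

print(len(all_amounts), len(line_amounts), len(eur_amounts))
print(line2_amount.text, line2_amount.get("currencyID"))
```

For full XPath 1.0, including text() and attribute selection, a library such as lxml is the usual choice in Python.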
XML Schema (XSD) Validation
While well-formedness checks only confirm that the syntax is correct, XML Schema (XSD) validation ensures that your document conforms to a specific structural definition. XRechnung schemas define which elements are mandatory, their data types, and their allowed values. Failing XSD validation typically means a field has the wrong type (e.g., a text value where a date is expected), a mandatory element is missing, or elements appear in the wrong order.
Parsing XML in Common Programming Languages
Most modern programming languages offer XML parsing either in the standard library or through widely used third-party libraries:
- Python: xml.etree.ElementTree (built-in) or lxml for full XPath/XSLT support. Use lxml for XRechnung processing.
- JavaScript/Node.js: The fast-xml-parser library (used in our browser tools) or xml2js for simple cases. Use DOMParser in browser environments.
- Java: JAXB for binding, SAX for streaming, DOM for tree-based parsing.
- C#/.NET: System.Xml namespace with XmlDocument (DOM) or XmlReader (streaming).
- Go: encoding/xml (built-in) for basic parsing; github.com/lestrrat-go/libxml2 for full XPath support.
Common XML Pitfalls in E-Invoicing
- BOM (Byte Order Mark): Adding a UTF-8 BOM (\xEF\xBB\xBF) at the start of an XML file can break parsers that do not expect it. Avoid BOM in XRechnung files.
- Encoding mismatch: Declaring encoding='UTF-8' but writing non-UTF-8 bytes will cause a parse error.
- Whitespace sensitivity: Whitespace inside text content is preserved by parsers, so stray leading or trailing spaces become part of the value. Element and attribute names cannot contain spaces at all.
- Empty elements: <Element></Element> and <Element/> are semantically equivalent but some parsers or validators may prefer one form.
- Namespace prefix consistency: The prefix (cbc, cac) is arbitrary — only the namespace URI matters. You can rename prefixes freely as long as the URI remains the same.
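The first two pitfalls are easy to check for before a file leaves your system. The following Python sketch, using only the standard library, detects and strips a UTF-8 BOM and shows an encoding mismatch being rejected by the parser. The helper names and the sample document are hypothetical.

```python
import codecs
import xml.etree.ElementTree as ET

def has_utf8_bom(data: bytes) -> bool:
    """True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if present; otherwise return data unchanged."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data

def parses(data: bytes) -> bool:
    """True if the bytes parse as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

body = '<?xml version="1.0" encoding="UTF-8"?><Name>Müller</Name>'
good = body.encode("utf-8")
with_bom = codecs.BOM_UTF8 + good
mismatched = body.encode("latin-1")    # declared as UTF-8, written as Latin-1

print(has_utf8_bom(with_bom))              # BOM detected
print(strip_utf8_bom(with_bom) == good)    # stripping restores the clean bytes
print(parses(good))                        # declaration and bytes agree
print(parses(mismatched))                  # the ü byte is not valid UTF-8
```

Running a check like this on the raw bytes, rather than on a decoded string, matters because decoding can silently hide both problems.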
Frequently Asked Questions
Should I use XML or JSON for new projects?
For new web APIs and microservices, JSON is typically preferred due to its smaller size and native JavaScript support. For document exchange, invoicing, configuration, and any use case requiring schemas, namespaces, or transformation pipelines, XML remains the better choice. XRechnung will remain XML-based for the foreseeable future.
Why is XRechnung XML so verbose?
XRechnung uses the UBL or CII schemas, which are designed for global interoperability and include namespaces and component libraries that add verbosity. This is intentional — the verbosity makes the schema extensible and prevents ambiguity in international document exchange. Compression (gzip) reduces transmission overhead when size matters.