+++
date = "2015-09-08"
weight = 100
aliases = [
"/old-wiki/Guidelines/XML_parsing"
]
+++
# XML parsing
XML is used for a few formats within Apertis, although not as many as
[JSON]( {{< ref "json_parsing.md" >}} ). It is more commonly used by
technologies such as [GSettings]( {{< ref "gsettings.md" >}} ) and
[D-Bus]( {{< ref "d-bus_services.md" >}} ). In situations where it
is parsed by Apertis code, the XML typically comes from
untrusted sources (untrusted web APIs or user input), so it must be
validated extremely carefully to prevent exploits.
## Summary
- Use a standard library to parse XML, such as libxml2. ([Parsing
XML]( {{< ref "xml_parsing.md#parsing-xml" >}} ))
- Write an XML schema for each XML format in use. ([Schema
validation]( {{< ref "xml_parsing.md#schema-validation" >}} ))
- Use xmllint to validate XML documents. ([Schema
validation]( {{< ref "xml_parsing.md#schema-validation" >}} ))
## Parsing XML
XML should be parsed using a standard library, such as
[libxml2](http://xmlsoft.org/). The library takes care of checking the XML
for well-formedness and safely parsing the values it contains. The
output from libxml2 is a hierarchy of parsed XML elements, and the Apertis
code must extract the data it requires from this hierarchy. Navigating
this hierarchy is still security critical, as the parsed
XML document may not conform to the expected format (the *schema* for
that document). Strings should be checked to see if they’re empty or
invalid UTF-8; integer parsing should check for failure or unparsable
characters; and the parser should return an error if required elements
or expected attributes are missing.
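
The checks described above can be sketched as follows. This is a minimal illustration in Python, using the standard library's `xml.etree.ElementTree` rather than libxml2. The `<contact>` format, its `id` attribute and `<name>` child, the `FormatError` class, the `parse_contact` helper and the length limit are all hypothetical. The same checks apply when walking a libxml2 tree from C.

```python
import xml.etree.ElementTree as ET

# Hypothetical limit: nine decimal digits always fit in a 32-bit signed integer.
MAX_ID_DIGITS = 9


class FormatError(ValueError):
    """Raised when input is not a document in the expected (hypothetical) format."""


def parse_contact(data: bytes) -> dict:
    """Extract the ID and name from a hypothetical <contact> document."""
    try:
        # Well-formedness errors are reported by the library.
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        raise FormatError(f"not well-formed: {exc}") from exc

    # The document may be well-formed but still not be the expected format.
    if root.tag != "contact":
        raise FormatError(f"unexpected root element <{root.tag}>")

    # Required attribute: must be present and consist only of ASCII digits.
    raw_id = root.get("id")
    if raw_id is None:
        raise FormatError("missing 'id' attribute")
    if not (raw_id.isascii() and raw_id.isdigit()):
        raise FormatError(f"invalid 'id' attribute: {raw_id!r}")
    if len(raw_id) > MAX_ID_DIGITS:
        raise FormatError("'id' attribute out of range")
    contact_id = int(raw_id)

    # Required element: must be present and non-empty.
    name_elem = root.find("name")
    if name_elem is None:
        raise FormatError("missing <name> element")
    name = (name_elem.text or "").strip()
    if not name:
        raise FormatError("empty <name> element")

    return {"id": contact_id, "name": name}
```

Every failure path raises the same exception type, so a caller handling untrusted input only needs one `except FormatError` clause to reject a bad document.
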
## Schema validation
Ideally, all XML formats will have an accompanying [XML
schema](http://en.wikipedia.org/wiki/XML_schema) which describes the
expected structure of the XML files. If a schema exists for an XML
document which is stored in git (such as a
[`GtkBuilder`](https://developer.gnome.org/gtk3/stable/GtkBuilder.html)
UI definition), that document can be validated at compile time, which
can help catch problems without the need for runtime testing.

Schemas can be written in
[XSD](http://en.wikipedia.org/wiki/XML_Schema_%28W3C%29) or
[RelaxNG](http://en.wikipedia.org/wiki/RELAX_NG). The choice is largely a
matter of personal preference, as both are expressive enough for most formats.

One tool for this is [xmllint](http://xmlsoft.org/xmllint.html), which
validates XML documents against schemas (`--schema` for XSD,
`--relaxng` for RelaxNG). Given a schema called `schema.xsd` and an XML
document called `example.xml` (listed in the `xml_files` variable), the
following `Makefile.am` snippet will validate the document against the
schema when `make check` is run:
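
```
# XML documents to validate; the recipe line below must start with a tab.
xml_files = example.xml

check-local: check-xml
check-xml: schema.xsd $(xml_files)
	xmllint --noout --schema schema.xsd $(xml_files)
.PHONY: check-xml
```
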
Various existing autotools macros for systems which use XML, such as
[GSettings]( {{< ref "gsettings.md" >}} ), already automatically
validate the relevant XML files.
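
Outside of autotools, the same xmllint invocation can be run from a test script. The following is a minimal sketch under a few assumptions: `xmllint_command` and `validate` are hypothetical helper names, and the `--nonet` option (which stops xmllint fetching DTDs or entities over the network) is an addition not used in the `Makefile.am` snippet.

```python
import subprocess
from typing import List, Sequence


def xmllint_command(schema: str, documents: Sequence[str],
                    relaxng: bool = False) -> List[str]:
    """Build an xmllint command line validating documents against a schema."""
    # xmllint takes XSD schemas via --schema and RelaxNG schemas via --relaxng.
    schema_flag = "--relaxng" if relaxng else "--schema"
    # --noout suppresses the re-serialised document; --nonet forbids network access.
    return ["xmllint", "--noout", "--nonet", schema_flag, schema, *documents]


def validate(schema: str, documents: Sequence[str],
             relaxng: bool = False) -> bool:
    """Return True if xmllint exits with status 0 for the given documents."""
    result = subprocess.run(
        xmllint_command(schema, documents, relaxng),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0
```

Calling `validate("schema.xsd", ["example.xml"])` requires xmllint to be installed; `subprocess.run` raises `FileNotFoundError` if it is not. Passing the arguments as a list, rather than as a shell string, avoids quoting problems with unusual file names.
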
## External links
- [XML website](http://www.xml.com/)
- [XML Schema tutorial](http://www.w3schools.com/schema/)
- [libxml2 website](http://xmlsoft.org/)
- [libxml2 documentation](http://xmlsoft.org/html/libxml-lib.html)