---
title: XML
author: massifrg@gmail.com
---
# Pandoc XML format
This document describes Pandoc's `xml` format, a 1:1 equivalent
of the `native` and `json` formats.
Here's the xml version of the beginning of this document,
to give you a glimpse of the format:
```xml
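<Pandoc api-version="1,23,1">
<meta>
  <entry key="author">
    <MetaInlines>massifrg@gmail.com</MetaInlines>
  </entry>
  <entry key="title">
    <MetaInlines>XML</MetaInlines>
  </entry>
</meta>
<blocks>
  <Header id="pandoc-xml-format" level="1">Pandoc XML format</Header>
  <Para>This document describes Pandoc’s <Code>xml</Code> format, a 1:1 equivalent
of the <Code>native</Code> and <Code>json</Code> formats.</Para>
  ...
</blocks>
</Pandoc>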
```
## The tags
If you know [Pandoc types](https://hackage.haskell.org/package/pandoc-types-1.23.1/docs/Text-Pandoc-Definition.html), the XML conversion is fairly straightforward.
These are the main rules:
- `Str` inlines are usually converted to plain, UTF-8 text (see below for exceptions)
- `Space` inlines are usually converted to " " chars (see below for exceptions)
- every `Block` and `Inline` becomes an element with the same name and the same capitalization:
a `Para` Block becomes a `<Para>` element, an `Emph` Inline becomes an `<Emph>` element,
and so on;
- the root element is `<Pandoc>` and it has an `api-version` attribute, whose value
is a string of comma-separated integer numbers; it matches the `pandoc-api-version`
field of the `json` format;
- the root `<Pandoc>` element has only two children: `<meta>` and `<blocks>`
(lowercase, as in `json` format);
- blocks and inlines with an `Attr` are HTML-like, and they have:
  - the `id` attribute for the identifier
  - the `class` attribute, a string of space-separated classes
  - the other attributes of `Attr`, without any prefix (so no `data-` prefix, unlike HTML)
- attributes are in lower (kebab) case:
  - `level` in `Header`
  - `start`, `number-style`, `number-delim` in `OrderedList`;
    style and delimiter values are capitalized exactly as in `Text.Pandoc.Definition`
  - `format` in `RawBlock` and `RawInline`
  - `quote-type` in `Quoted` (values are `SingleQuote` and `DoubleQuote`)
  - `math-type` in `Math` (values are `InlineMath` and `DisplayMath`)
  - `title` and `src` in `Image` target
  - `title` and `href` in `Link` target
  - `alignment` and `col-width` in `ColSpec` (about `col-width` values, see below);
    alignment values are capitalized as in `Text.Pandoc.Definition`
  - `alignment`, `row-span` and `col-span` in `Cell`
  - `row-head-columns` in `TableBody`
  - `id`, `mode`, `note-num` and `hash` in `Citation` (about `Cite` elements, see below);
    `mode` values are capitalized as in `Text.Pandoc.Definition`
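
As a rough illustration of the root-element rules above, this Python sketch reads the `api-version` attribute and locates the two children. It assumes the root element is named `Pandoc` and its children `meta` and `blocks`; the sample document, its version value, and the helper name `read_root` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the api-version value and the content are illustrative only.
SAMPLE = """<Pandoc api-version="1,23,1">
<meta />
<blocks>
  <Para>Hello <Emph>world</Emph></Para>
</blocks>
</Pandoc>"""


def read_root(xml_text):
    """Return (api_version, meta_element, blocks_element) for a Pandoc XML string."""
    root = ET.fromstring(xml_text)
    if root.tag != "Pandoc":  # assumed root element name
        raise ValueError(f"unexpected root element: {root.tag}")
    # "1,23,1" -> [1, 23, 1], the same numbers as pandoc-api-version in json
    raw = root.get("api-version", "")
    api_version = [int(part) for part in raw.split(",") if part]
    children = {child.tag: child for child in root}
    return api_version, children["meta"], children["blocks"]


api_version, meta, blocks = read_root(SAMPLE)
print(api_version)
print([block.tag for block in blocks])
```
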
The classes of items with an `Attr` are put in a `class` attribute,
so that you can style the XML with CSS.
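
The `Attr` rules can be read back with a few lines of Python. This is only a sketch: `read_attr` and its `reserved` parameter are hypothetical names, and the caller is expected to list the element-specific attributes (such as `level` for a header) so they are not mistaken for generic key-value pairs.

```python
import xml.etree.ElementTree as ET


def read_attr(elem, reserved=()):
    """Rebuild a Pandoc-style Attr triple: (identifier, classes, key-value pairs)."""
    identifier = elem.get("id", "")
    classes = elem.get("class", "").split()  # space-separated classes
    skip = {"id", "class", *reserved}
    # Everything else is a plain Attr attribute, stored without any prefix.
    pairs = [(key, value) for key, value in elem.attrib.items() if key not in skip]
    return identifier, classes, pairs


# Hypothetical element, written by hand for this example.
header = ET.fromstring(
    '<Header id="intro" class="unnumbered wide" level="2" lang="en">Intro</Header>'
)
print(read_attr(header, reserved={"level"}))
```
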
## Str and Space elements
`Str` and `Space` usually result in text and normal " " spaces, but there are exceptions:
- `Str ""`, an empty string, is not suppressed; instead it is converted into a `<Str />` element;
- `Str "foo bar"`, a string containing a space, is converted into a `<Str>` element instead of plain text;
- consecutive `Str` inlines, as in `[ ..., Str "foo", Str "bar", ... ]`,
  are encoded as `foo` followed by a `<Str>` element, to keep their individuality;
- consecutive `Space` inlines, as in `[ ..., Space, Space, ... ]`,
  are encoded with explicit `<Space>` markup;
- `Space` inlines at the start or at the end of their container element
  are always encoded with a `<Space />` element, instead of just a " ".
These encodings are necessary to ensure 1:1 equivalence of the `xml` format with the AST,
or the `native` and `json` formats.
Since the ones above are corner cases, usually you should not see those `<Str />` and `<Space />`
elements in your documents.
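
The exceptions above amount to a small decision procedure. The Python sketch below only decides whether an inline needs an explicit element or can stay plain text; it does not produce XML. The tuple representation of inlines and the name `needs_element` are assumptions for illustration, and the sketch assumes that in a run of consecutive `Str` inlines only the first one stays plain text.

```python
def needs_element(inlines, i):
    """True if inlines[i] must be written as an element rather than plain text.

    Each inline is a (kind, text) tuple, e.g. ("Str", "foo") or ("Space", None).
    """
    kind, text = inlines[i]
    prev_kind = inlines[i - 1][0] if i > 0 else None
    next_kind = inlines[i + 1][0] if i + 1 < len(inlines) else None
    if kind == "Str":
        # Empty string, embedded space, or directly after another Str.
        return text == "" or " " in text or prev_kind == "Str"
    if kind == "Space":
        # Leading/trailing Space, or part of a run of consecutive Spaces.
        at_edge = prev_kind is None or next_kind is None
        return at_edge or prev_kind == "Space" or next_kind == "Space"
    return True  # every other inline always becomes an element


inlines = [
    ("Space", None), ("Str", "foo"), ("Str", "bar"), ("Space", None),
    ("Str", "a b"), ("Space", None), ("Space", None), ("Str", ""),
]
flags = [needs_element(inlines, i) for i in range(len(inlines))]
print(flags)
```
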
## Added tags
Some other elements have been introduced to better structure the resulting XML.
Since they are not Pandoc Blocks or Inlines, or they have no constructor or type
in Pandoc's Haskell code, they are kept lowercased.
### BulletList and OrderedList items
Items of those lists are embedded in `<item>` elements.
These snippets are from the `xml` version of `test/testsuite.native`:
```xml
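<BulletList>
  <item><Plain>asterisk 1</Plain></item>
  <item><Plain>asterisk 2</Plain></item>
  <item><Plain>asterisk 3</Plain></item>
</BulletList>
...
<OrderedList start="1" number-style="Decimal" number-delim="Period">
  <item><Plain>First</Plain></item>
  <item><Plain>Second</Plain></item>
  <item><Plain>Third</Plain></item>
</OrderedList>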
```
### DefinitionList items
Definition lists have `<item>` elements.
Each `<item>` has exactly one `<term>` child element
and one or more `<def>` child elements.
This snippet is from the `xml` version of `test/testsuite.native`:
```xml
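<DefinitionList>
  <item>
    <term>apple</term>
    <def><Plain>red fruit</Plain></def>
  </item>
  <item>
    <term>orange</term>
    <def><Plain>orange fruit</Plain></def>
  </item>
  <item>
    <term>banana</term>
    <def><Plain>yellow fruit</Plain></def>
  </item>
</DefinitionList>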
```
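
To show how the item structure can be consumed, here is a minimal Python sketch that turns a definition list into `(term, definitions)` pairs. It assumes the wrapper elements are named `item`, `term` and `def`; the sample XML is hand-written, and `read_definitions` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, not taken from the test suite.
SAMPLE = """<DefinitionList>
  <item>
    <term>apple</term>
    <def><Plain>red fruit</Plain></def>
  </item>
  <item>
    <term>orange</term>
    <def><Plain>orange fruit</Plain></def>
    <def><Plain>a colour</Plain></def>
  </item>
</DefinitionList>"""


def read_definitions(xml_text):
    """Return a list of (term_text, [definition_text, ...]) tuples."""
    root = ET.fromstring(xml_text)
    result = []
    for item in root.findall("item"):
        # One term per item, one or more definitions.
        term = "".join(item.find("term").itertext())
        defs = ["".join(d.itertext()).strip() for d in item.findall("def")]
        result.append((term, defs))
    return result


definitions = read_definitions(SAMPLE)
print(definitions)
```
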
### Figure and Table captions
Figures and tables have a `<Caption>` child element,
which in turn may optionally have a `<ShortCaption>` child element.
This snippet is from the `xml` version of `test/testsuite.native`:
```xml
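<Figure>
  <Caption>
    <Plain>lalune</Plain>
  </Caption>
  <Plain><Image src="lalune.jpg" title="Voyage dans la Lune">lalune</Image></Plain>
</Figure>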
```
### Tables
A `<Table>` element has:
- a `<Caption>` child element;
- a `<colspecs>` child element, whose children are empty
  `<ColSpec>` elements;
- a `<TableHead>` child element;
- one or more `<TableBody>` child elements, that in turn
  have two children: `<header>` and `<body>`, whose children
  are `<Row>` elements;
- a `<TableFoot>` child element.
This specification is debatable; I have these doubts:
- is it necessary to enclose the `<ColSpec>` elements in a `<colspecs>` element?
- to discriminate between header and data cells in table bodies,
  there are the `row-head-columns` attribute, and the `<header>` and `<body>` children
  of the `<TableBody>` element, but there's only one type of cell:
  every cell is a `<Cell>` element
- the specs are a tradeoff between consistency with pandoc types and CSS compatibility;
this way bodies' header rows are easily stylable with CSS, while header columns are not
The `ColWidthDefault` value becomes a "0" value for the attribute `col-width`;
this way it's type-consistent with non-zero values, but I'm still doubtful whether to
leave its value as a "ColWidthDefault" string.
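
A reader has to map the "0" width back to the default. The Python sketch below does that for a list of column specs; it assumes the container element is named `colspecs`, uses a hand-written sample, and represents `ColWidthDefault` as `None` (a choice made for this example only).

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the alignment names follow Text.Pandoc.Definition.
SAMPLE = """<colspecs>
  <ColSpec alignment="AlignLeft" col-width="0" />
  <ColSpec alignment="AlignRight" col-width="0.25" />
</colspecs>"""


def read_colspecs(xml_text):
    """Return [(alignment, width)] where width is None for ColWidthDefault."""
    specs = []
    for spec in ET.fromstring(xml_text).iter("ColSpec"):
        width = float(spec.get("col-width", "0"))
        # A zero width stands for ColWidthDefault.
        specs.append((spec.get("alignment"), None if width == 0 else width))
    return specs


colspecs = read_colspecs(SAMPLE)
print(colspecs)
```
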
Here's an example from the `xml` version of `test/tables/planets.native`:
```xml
Data about the planets of our solar system.
NameMass (10^24kg)
...
Terrestrial planetsMercury0.3304,87954273.74222.657.91670Closest to the Sun
...
```
### Metadata and MetaMap entries
Metadata entries are meta values (`MetaBool`, `MetaString`, `MetaInlines`, `MetaBlocks`,
`MetaList` and `MetaMap` elements) inside `<entry>` elements.
The `<meta>` and the `<MetaMap>` elements have the same child elements (`<entry>`),
which have a `key` attribute.
`<MetaInlines>`, `<MetaBlocks>`, `<MetaList>` and `<MetaMap>` elements
all have child elements.
`<MetaString>` elements have only text.
`<MetaBool>` elements are empty and encode either a true or a false value.
This snippet is from the `xml` version of `test/testsuite.native`:
```xml
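<meta>
  <entry key="author">
    <MetaList>
      <MetaInlines>John MacFarlane</MetaInlines>
      <MetaInlines>Anonymous</MetaInlines>
    </MetaList>
  </entry>
  <entry key="date">
    <MetaInlines>July 17, 2006</MetaInlines>
  </entry>
  <entry key="title">
    <MetaInlines>Pandoc Test Suite</MetaInlines>
  </entry>
</meta>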
```
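
Because the metadata container and map values share the same entry structure, one recursive function can walk both. The Python sketch below flattens metadata into plain dictionaries, lists and strings. It assumes the entry element is named `entry`, reduces inline and block content to its text, and deliberately leaves out `MetaBool`; the sample and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration.
SAMPLE = """<meta>
  <entry key="title"><MetaInlines>Pandoc <Emph>Test</Emph> Suite</MetaInlines></entry>
  <entry key="author">
    <MetaList>
      <MetaInlines>John MacFarlane</MetaInlines>
      <MetaInlines>Anonymous</MetaInlines>
    </MetaList>
  </entry>
  <entry key="options">
    <MetaMap>
      <entry key="lang"><MetaString>en</MetaString></entry>
    </MetaMap>
  </entry>
</meta>"""


def meta_value(elem):
    """Convert one meta value element into a Python value."""
    if elem.tag == "MetaMap":
        return entries(elem)
    if elem.tag == "MetaList":
        return [meta_value(child) for child in elem]
    if elem.tag in ("MetaString", "MetaInlines", "MetaBlocks"):
        return "".join(elem.itertext())  # lossy: keeps only the text
    raise ValueError(f"unsupported meta value: {elem.tag}")


def entries(container):
    """Map each entry key to the converted value of its single child."""
    return {e.get("key"): meta_value(e[0]) for e in container.findall("entry")}


metadata = entries(ET.fromstring(SAMPLE))
print(metadata)
```
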
### Cite elements
`Cite` inlines are modeled with `<Cite>` elements, whose first child
is a `<citations>` element, which has only `<Citation>` child elements.
`<Citation>` elements are empty, unless they have a prefix and/or a suffix.
Here's an example from the `xml` version of `test/markdown-citations.native`:
```xml
@item1 says blah.p. 30@item1 [p. 30] says blah.A citation group see chap. 3also p. 34-35[see @item1 chap. 3; also @пункт3 p. 34-35].
```
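
As a small sketch of how citations might be read back, the Python snippet below collects the `id` and `mode` of every citation in each `Cite`. It assumes the element names `Cite`, `citations` and `Citation`; the sample XML and its attribute values are hand-written for illustration and are not the output of the test file.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the mode names follow Text.Pandoc.Definition.
SAMPLE = """<Para><Cite><citations>
  <Citation id="item1" mode="AuthorInText" note-num="1" hash="0" />
  <Citation id="item2" mode="NormalCitation" note-num="1" hash="0" />
</citations>@item1 says blah.</Cite></Para>"""


def read_citations(xml_text):
    """Return one list of (id, mode) pairs per Cite element."""
    root = ET.fromstring(xml_text)
    found = []
    for cite in root.iter("Cite"):
        group = cite.find("citations")  # assumed to be the first child
        found.append([(c.get("id"), c.get("mode")) for c in group.findall("Citation")])
    return found


citations = read_citations(SAMPLE)
print(citations)
```
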