aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--doc/xml.md376
-rw-r--r--pandoc.cabal4
-rw-r--r--src/Text/Pandoc/Readers.hs3
-rw-r--r--src/Text/Pandoc/Readers/XML.hs540
-rw-r--r--src/Text/Pandoc/Writers.hs3
-rw-r--r--src/Text/Pandoc/Writers/XML.hs365
-rw-r--r--src/Text/Pandoc/XMLFormat.hs188
-rw-r--r--test/Tests/XML.hs28
-rw-r--r--test/test-pandoc.hs2
-rw-r--r--tools/pandoc-xml.dtd181
-rw-r--r--tools/pandoc-xml.rnc250
-rw-r--r--tools/pandoc-xml.rng913
-rw-r--r--tools/pandoc-xml.xsd602
13 files changed, 3455 insertions, 0 deletions
diff --git a/doc/xml.md b/doc/xml.md
new file mode 100644
index 000000000..df90b7eea
--- /dev/null
+++ b/doc/xml.md
@@ -0,0 +1,376 @@
+---
+title: XML
+---
+
+# Pandoc XML format
+
+This document describes Pandoc's `xml` format, a 1:1 equivalent
+of the `native` and `json` formats.
+
+Here's the xml version of the beginning of this document,
+to give you a glimpse of the format:
+
+```xml
+<?xml version='1.0' ?>
+<Pandoc api-version="1,23,1">
+<meta>
+ <entry key="author">
+ <MetaInlines>[email protected]</MetaInlines>
+ </entry>
+ <entry key="title">
+ <MetaInlines>XML</MetaInlines>
+ </entry>
+</meta>
+<blocks>
+ <Header id="pandoc-xml-format" level="1">Pandoc XML format</Header>
+ <Para>This document describes Pandoc’s <Code>xml</Code> format, a 1:1 equivalent<SoftBreak />of the <Code>native</Code> and <Code>json</Code> formats.</Para>
+ ...
+</blocks>
+</Pandoc>
+```
+
+## The tags
+
+If you know [Pandoc types](https://hackage.haskell.org/package/pandoc-types-1.23.1/docs/Text-Pandoc-Definition.html), the XML conversion is fairly straightforward.
+
+These are the main rules:
+
+- `Str` inlines are usually converted to plain, UTF-8 text (see below for exceptions)
+
+- `Space` inlines are usually converted to " " chars (see below for exceptions)
+
+- every `Block` and `Inline` becomes an element with the same name and the same capitalization:
+ a `Para` Block becomes a `<Para>` element, an `Emph` Inline becomes an `<Emph>` element,
+ and so on;
+
+- the root element is `<Pandoc>` and it has a `api-version` attribute, whose value
+ is a string of comma-separated integer numbers; it matches the `pandoc-api-version`
+ field of the `json` format;
+
+- the root `<Pandoc>` element has only two children: `<meta>` and `<blocks>`
+ (lowercase, as in `json` format);
+
+- blocks and inlines with an `Attr` are HTM-like, and they have:
+
+ - the `id` attribute for the identifier
+
+ - the `class` attribute, a string of space-separated classes
+
+ - the other attributes of `Attr`, without any prefix (so no `data-` prefix, instead of HTML)
+
+- attributes are in lower (kebab) case:
+
+ - `level` in Header
+
+ - `start`, `number-style`, `number-delim` in OrderedList;
+ style and delimiter values are capitalized exactly as in `Text.Pandoc.Definition`;
+
+ - `format` in `RawBlock` and RawInline
+
+ - `quote-type` in Quoted (values are `SingleQuote` and `DoubleQuote`)
+
+ - `math-type` in Math (values are `InlineMath` and `DisplayMath`)
+
+ - `title` and `src` in Image target
+
+ - `title` and `href` in Link target
+
+ - `alignment` and `col-width` in ColSpec (about `col-width` values, see below);
+ (alignment values are capitalized as in `Text.Pandoc.Definition`)
+
+ - `alignment`, `row-span` and `col-span` in Cell
+
+ - `row-head-columns` in TableBody
+
+ - `id`, `mode`, `note-num` and `hash` for Citation (about Cite elements, see below);
+ (`mode` values are capitalized as in `Text.Pandoc.Definition`)
+
+The classes of items with an `Attr` are put in a `class` attribute,
+so that you can style the XML with CSS.
+
+## Str and Space elements
+
+`Str` and `Space` usually result in text and normal " " spaces, but there are exceptions:
+
+- `Str ""`, an empty string, is not suppressed; instead it is converted into a `<Str />` element;
+
+- `Str "foo bar"`, a string containing a space, is converted as `<Str content="foo bar" />`;
+
+- consecutive `Str` inlines, as in `[ ..., Str "foo", Str "bar", ... ]`,
+ are encoded as `foo<Str content="bar" />` to keep their individuality;
+
+- consecutive `Space` inlines, as in `[ ..., Space, Space, ... ]`,
+ are encoded as `<Space count="2" />`
+
+- `Space` inlines at the start or at the end of their container element
+ are always encoded with a `<Space />` element, instead of just a " "
+
+These encodings are necessary to ensure 1:1 equivalence of the `xml` format with the AST,
+or the `native` and `json` formats.
+
+Since the ones above are corner cases, usually you should not see those `<Str />` and `<Space />`
+elements in your documents.
+
+## Added tags
+
+Some other elements have been introduced to better structure the resulting XML.
+
+Since they are not Pandoc Blocks or Inlines, or they have no constructor or type
+in Pandoc's haskell code, they are kept lowercased.
+
+### BulletList and OrderedList items
+
+Items of those lists are embedded in `<item>` elements.
+
+These snippets are from the `xml` version of `test/testsuite.native`:
+
+```xml
+<BulletList>
+ <item>
+ <Plain>asterisk 1</Plain>
+ </item>
+ <item>
+ <Plain>asterisk 2</Plain>
+ </item>
+ <item>
+ <Plain>asterisk 3</Plain>
+ </item>
+</BulletList>
+...
+<OrderedList start="1" number-style="Decimal" number-delim="Period">
+ <item>
+ <Plain>First</Plain>
+ </item>
+ <item>
+ <Plain>Second</Plain>
+ </item>
+ <item>
+ <Plain>Third</Plain>
+ </item>
+</OrderedList>
+```
+
+### DefinitionList items
+
+Definition lists have `<item>` elements.
+
+Each `<item>` term has only one `<term>` child element,
+and one or more `<def>` children elements.
+
+This snippet is from the `xml` version of `test/testsuite.native`:
+
+```xml
+<DefinitionList>
+ <item>
+ <term>apple</term>
+ <def>
+ <Plain>red fruit</Plain>
+ </def>
+ </item>
+ <item>
+ <term>orange</term>
+ <def>
+ <Plain>orange fruit</Plain>
+ </def>
+ </item>
+ <item>
+ <term>banana</term>
+ <def>
+ <Plain>yellow fruit</Plain>
+ </def>
+ </item>
+</DefinitionList>
+```
+
+### Figure and Table captions
+
+Figures and tables have a `<Caption>` child element,
+which in turn may optionally have a `<ShortCaption>` child element.
+
+This snippet is from the `xml` version of `test/testsuite.native`:
+
+```xml
+<Figure>
+ <Caption>
+ <Plain>lalune</Plain>
+ </Caption>
+ <Plain><Image src="lalune.jpg" title="Voyage dans la Lune">lalune</Image></Plain>
+</Figure>
+```
+
+### Tables
+
+A `<Table>` element has:
+
+- a `<Caption>` child element;
+
+- a `<colspecs>` child element, whose children are empty
+ `<ColSpec alignment="..." col-width="..." />` elements;
+
+- a `<TableHead>` child element;
+
+- one or more `<TableBody>` children elements, that in turn
+ have two children: `<header>` and `<body>`, whose children
+ are `<Row>` elements;
+
+- a `<TableFoot>` child element.
+
+This specification is debatable; I have these doubts:
+
+- is it necessary to enclose the `<ColSpec>` elements in a `<colspecs>` element?
+
+- to discriminate between header and data cells in table bodies,
+ there are the `row-head-columns` attribute, and the `<header>` and `<body>` children
+ of the `<TableBody>` element, but there's only one type of cell:
+ every cell is a `<Cell>` element
+
+- the specs are a tradeoff between consistency with pandoc types and CSS compatibility;
+ this way bodies' header rows are easily stylable with CSS, while header columns are not
+
+The `ColWidthDefault` value becomes a "0" value for the attribute `col-width`;
+this way it's type-consistent with non-zero values, but I'm still doubtful whether to
+leave its value as a "ColWidthDefault" string.
+
+Here's an example from the `xml` version of `test/tables/planets.native`:
+
+```xml
+<Table>
+ <Caption>
+ <Para>Data about the planets of our solar system.</Para>
+ </Caption>
+ <colspecs>
+ <ColSpec col-width="0" alignment="AlignCenter" />
+ <ColSpec col-width="0" alignment="AlignCenter" />
+ <ColSpec col-width="0" alignment="AlignDefault" />
+ <ColSpec col-width="0" alignment="AlignRight" />
+ <ColSpec col-width="0" alignment="AlignRight" />
+ <ColSpec col-width="0" alignment="AlignRight" />
+ <ColSpec col-width="0" alignment="AlignRight" />
+ <ColSpec col-width="0" alignment="AlignRight" />
+ <ColSpec col-width="0" alignment="AlignRight" />
+ <ColSpec col-width="0" alignment="AlignRight" />
+ <ColSpec col-width="0" alignment="AlignRight" />
+ <ColSpec col-width="0" alignment="AlignDefault" />
+ </colspecs>
+ <TableHead>
+ <Row>
+ <Cell col-span="2" row-span="1" alignment="AlignDefault" />
+ <Cell col-span="1" row-span="1" alignment="AlignDefault">
+ <Plain>Name</Plain>
+ </Cell>
+ <Cell col-span="1" row-span="1" alignment="AlignDefault">
+ <Plain>Mass (10^24kg)</Plain>
+ </Cell>
+ ...
+ </Row>
+ </TableHead>
+ <TableBody row-head-columns="3">
+ <header />
+ <body>
+ <Row>
+ <Cell col-span="2" row-span="4" alignment="AlignDefault">
+ <Plain>Terrestrial planets</Plain>
+ </Cell>
+ <Cell alignment="AlignDefault">
+ <Plain>Mercury</Plain>
+ </Cell>
+ <Cell alignment="AlignDefault">
+ <Plain>0.330</Plain>
+ </Cell>
+ <Cell alignment="AlignDefault">
+ <Plain>4,879</Plain>
+ </Cell>
+ <Cell alignment="AlignDefault">
+ <Plain>5427</Plain>
+ </Cell>
+ <Cell alignment="AlignDefault">
+ <Plain>3.7</Plain>
+ </Cell>
+ <Cell alignment="AlignDefault">
+ <Plain>4222.6</Plain>
+ </Cell>
+ <Cell alignment="AlignDefault">
+ <Plain>57.9</Plain>
+ </Cell>
+ <Cell alignment="AlignDefault">
+ <Plain>167</Plain>
+ </Cell>
+ <Cell alignment="AlignDefault">
+ <Plain>0</Plain>
+ </Cell>
+ <Cell alignment="AlignDefault">
+ <Plain>Closest to the Sun</Plain>
+ </Cell>
+ </Row>
+ ...
+ </body>
+ </TableBody>
+ <TableFoot />
+</Table>
+```
+
+### Metadata and MetaMap entries
+
+Metadata entries are meta values (`MetaBool`, `MetaString`, `MetaInlines`, `MetaBlocks`,
+`MetaList` and `MetaMap` elements) inside `<entry>` elements.
+
+The `<meta>` and the `<MetaMap>` elements have the same children elements (`<entry>`),
+which have a `key` attribute.
+
+`<MetaInlines>`, `<MetaBlocks>`, `<MetaList>` and `<MetaMap>` elements
+all have children elements.
+
+`<MetaString>` elements have only text.
+
+`<MetaBool>` elements are empty, they can be either `<MetaBool value="true" />`
+or `<MetaBool value="false" />`.
+
+This snippet is from the `xml` version of `test/testsuite.native`:
+
+```xml
+<meta>
+ <entry key="author">
+ <MetaList>
+ <MetaInlines>John MacFarlane</MetaInlines>
+ <MetaInlines>Anonymous</MetaInlines>
+ </MetaList>
+ </entry>
+ <entry key="date">
+ <MetaInlines>July 17, 2006</MetaInlines>
+ </entry>
+ <entry key="title">
+ <MetaInlines>Pandoc Test Suite</MetaInlines>
+ </entry>
+</meta>
+```
+
+### Cite elements
+
+`Cite` inlines are modeled with `<Cite>` elements, whose first child
+is a `<citations>` element, that have only `<Citation>` children elements.
+
+`<Citation>` elements are empty, unless they have a prefix and/or a suffix.
+
+Here's an example from the `xml` version of `test/markdown-citations.native`:
+
+```xml
+<Para><Cite><citations>
+ <Citation note-num="3" mode="AuthorInText" id="item1" hash="0" />
+</citations>@item1</Cite> says blah.</Para>
+<Para><Cite><citations>
+ <Citation note-num="4" mode="AuthorInText" id="item1" hash="0">
+ <suffix>p. 30</suffix>
+ </Citation>
+</citations>@item1 [p. 30]</Cite> says blah.</Para>
+<Para>A citation group <Cite><citations>
+ <Citation note-num="8" mode="NormalCitation" id="item1" hash="0">
+ <prefix>see</prefix>
+ <suffix> chap. 3</suffix>
+ </Citation>
+ <Citation note-num="8" mode="NormalCitation" id="пункт3" hash="0">
+ <prefix>also</prefix>
+ <suffix> p. 34-35</suffix>
+ </Citation>
+</citations>[see @item1 chap. 3; also @пункт3 p. 34-35]</Cite>.</Para>
+```
diff --git a/pandoc.cabal b/pandoc.cabal
index d045f49f7..610723036 100644
--- a/pandoc.cabal
+++ b/pandoc.cabal
@@ -614,6 +614,7 @@ library
Text.Pandoc.Readers.Pod,
Text.Pandoc.Writers,
Text.Pandoc.Writers.Native,
+ Text.Pandoc.Writers.XML,
Text.Pandoc.Writers.DocBook,
Text.Pandoc.Writers.JATS,
Text.Pandoc.Writers.OPML,
@@ -744,6 +745,7 @@ library
Text.Pandoc.Readers.Metadata,
Text.Pandoc.Readers.Roff,
Text.Pandoc.Readers.Roff.Escape,
+ Text.Pandoc.Readers.XML,
Text.Pandoc.Writers.Docx.OpenXML,
Text.Pandoc.Writers.Docx.StyleMap,
Text.Pandoc.Writers.Docx.Table,
@@ -769,6 +771,7 @@ library
Text.Pandoc.Char,
Text.Pandoc.TeX,
Text.Pandoc.URI,
+ Text.Pandoc.XMLFormat,
Text.Pandoc.CSS,
Text.Pandoc.CSV,
Text.Pandoc.RoffChar,
@@ -815,6 +818,7 @@ test-suite test-pandoc
Tests.Helpers
Tests.Shared
Tests.MediaBag
+ Tests.XML
Tests.Readers.LaTeX
Tests.Readers.HTML
Tests.Readers.JATS
diff --git a/src/Text/Pandoc/Readers.hs b/src/Text/Pandoc/Readers.hs
index 8f4800efb..12d1c6c95 100644
--- a/src/Text/Pandoc/Readers.hs
+++ b/src/Text/Pandoc/Readers.hs
@@ -65,6 +65,7 @@ module Text.Pandoc.Readers
, readTypst
, readDjot
, readPod
+ , readXML
-- * Miscellaneous
, getReader
, getDefaultExtensions
@@ -118,6 +119,7 @@ import Text.Pandoc.Readers.RIS
import Text.Pandoc.Readers.RTF
import Text.Pandoc.Readers.Typst
import Text.Pandoc.Readers.Djot
+import Text.Pandoc.Readers.XML
import qualified Text.Pandoc.UTF8 as UTF8
import Text.Pandoc.Sources (ToSources(..), sourcesToText)
@@ -174,6 +176,7 @@ readers = [("native" , TextReader readNative)
,("djot" , TextReader readDjot)
,("mdoc" , TextReader readMdoc)
,("pod" , TextReader readPod)
+ ,("xml" , TextReader readXML)
]
-- | Retrieve reader, extensions based on format spec (format+extensions).
diff --git a/src/Text/Pandoc/Readers/XML.hs b/src/Text/Pandoc/Readers/XML.hs
new file mode 100644
index 000000000..e3173f4c9
--- /dev/null
+++ b/src/Text/Pandoc/Readers/XML.hs
@@ -0,0 +1,540 @@
+{-# LANGUAGE OverloadedStrings #-}
+{-# LANGUAGE ScopedTypeVariables #-}
+
+-- |
+-- Module : Text.Pandoc.Readers.XML
+-- Copyright : Copyright (C) 2025- Massimiliano Farinella and John MacFarlane
+-- License : GNU GPL, version 2 or above
+--
+-- Maintainer : Massimiliano Farinella <[email protected]>
+-- Stability : WIP
+-- Portability : portable
+--
+-- Conversion of (Pandoc specific) xml to 'Pandoc' document.
+module Text.Pandoc.Readers.XML (readXML) where
+
+import Control.Monad (msum)
+import Control.Monad.Except (throwError)
+import Control.Monad.State.Strict (StateT (runStateT), modify)
+import Data.Char (isSpace)
+import Data.Default (Default (..))
+import qualified Data.List as L
+import qualified Data.Map as M
+import Data.Maybe (catMaybes, fromMaybe, mapMaybe)
+import qualified Data.Set as S (Set, fromList, member)
+import Data.Text (Text)
+import qualified Data.Text as T
+import Data.Text.Lazy (fromStrict)
+import Data.Version (Version, makeVersion)
+import Text.Pandoc.Builder
+import Text.Pandoc.Class.PandocMonad
+import Text.Pandoc.Error (PandocError (..))
+import Text.Pandoc.Options
+import Text.Pandoc.Parsing (ToSources, toSources)
+import Text.Pandoc.Sources (sourcesToText)
+import Text.Pandoc.Version (pandocVersion)
+import Text.Pandoc.XML (lookupEntity)
+import Text.Pandoc.XML.Light
+import Text.Pandoc.XMLFormat
+import Text.Read (readMaybe)
+
+-- TODO: use xmlPath state to give better context when an error occurs
+
+type XMLReader m = StateT XMLReaderState m
+
+data XMLReaderState = XMLReaderState
+ { xmlApiVersion :: Version,
+ xmlMeta :: Meta,
+ xmlContent :: [Content],
+ xmlPath :: [Text]
+ }
+ deriving (Show)
+
+instance Default XMLReaderState where
+ def =
+ XMLReaderState
+ { xmlApiVersion = pandocVersion,
+ xmlMeta = mempty,
+ xmlContent = [],
+ xmlPath = ["root"]
+ }
+
+readXML :: (PandocMonad m, ToSources a) => ReaderOptions -> a -> m Pandoc
+readXML _ inp = do
+ let sources = toSources inp
+ tree <-
+ either (throwError . PandocXMLError "") return $
+ parseXMLContents (fromStrict . sourcesToText $ sources)
+ (bs, st') <- flip runStateT (def {xmlContent = tree}) $ mapM parseBlock tree
+ let blockList = toList $ concatMany bs
+ return $ Pandoc (xmlMeta st') blockList
+
+concatMany :: [Many a] -> Many a
+concatMany = Many . mconcat . map unMany
+
+parseBlocks :: (PandocMonad m) => [Content] -> XMLReader m Blocks
+parseBlocks contents = concatMany <$> mapM parseBlock contents
+
+getBlocks :: (PandocMonad m) => Element -> XMLReader m Blocks
+getBlocks e = parseBlocks (elContent e)
+
+elementName :: Element -> Text
+elementName e = qName $ elName e
+
+attrValue :: Text -> Element -> Text
+attrValue attr =
+ fromMaybe "" . maybeAttrValue attr
+
+maybeAttrValue :: Text -> Element -> Maybe Text
+maybeAttrValue attr elt =
+ lookupAttrBy (\x -> qName x == attr) (elAttribs elt)
+
+parseBlock :: (PandocMonad m) => Content -> XMLReader m Blocks
+parseBlock (Text (CData CDataRaw _ _)) = return mempty -- DOCTYPE
+parseBlock (Text (CData _ s _)) =
+ if T.all isSpace s
+ then return mempty
+ else do
+ throwError $ PandocXMLError "" "non-space characters out of inline context"
+parseBlock (CRef x) = do
+ throwError $ PandocXMLError "" ("reference \"" <> x <> "\" out of inline context")
+parseBlock (Elem e) = do
+ let name = elementName e
+ in case (name) of
+ "Pandoc" -> parsePandoc
+ "?xml" -> return mempty
+ "blocks" -> getBlocks e
+ "meta" ->
+ let entry_els = childrenNamed tgNameMetaMapEntry e
+ in do
+ entries <- catMaybes <$> mapM parseMetaMapEntry entry_els
+ mapM_ (uncurry addMeta) entries
+ return mempty
+ "Para" -> para <$> getInlines (elContent e)
+ "Plain" -> do
+ ils <- getInlines (elContent e)
+ return $ singleton . Plain . toList $ ils
+ "Header" -> (headerWith attr level) <$> getInlines (elContent e)
+ where
+ level = textToInt (attrValue atNameLevel e) 1
+ attr = filterAttrAttributes [atNameLevel] $ attrFromElement e
+ "HorizontalRule" -> return horizontalRule
+ "BlockQuote" -> do
+ contents <- getBlocks e
+ return $ blockQuote contents
+ "Div" -> do
+ contents <- getBlocks e
+ return $ divWith (attrFromElement e) contents
+ "BulletList" -> do
+ items <- getListItems e
+ return $ bulletList items
+ "OrderedList" -> do
+ items <- getListItems e
+ return $ orderedListWith (getListAttributes e) items
+ "DefinitionList" -> do
+ let items_contents = getContentsOfElements (isElementNamed tgNameDefListItem) (elContent e)
+ items <- mapM parseDefinitionListItem items_contents
+ return $ definitionList items
+ "Figure" -> do
+ let attr = attrFromElement e
+ (maybe_caption_el, contents) = partitionFirstChildNamed "Caption" $ elContent e
+ figure_caption <- case (maybe_caption_el) of
+ Just (caption_el) -> parseCaption $ elContent caption_el
+ Nothing -> pure emptyCaption
+ blocks <- parseBlocks contents
+ return $ figureWith attr figure_caption blocks
+ "CodeBlock" -> do
+ let attr = attrFromElement e
+ return $ codeBlockWith attr $ strContentRecursive e
+ "RawBlock" -> do
+ let format = (attrValue atNameFormat e)
+ return $ rawBlock format $ strContentRecursive e
+ "LineBlock" -> do
+ lins <- mapM getInlines (contentsOfChildren tgNameLineItem (elContent e))
+ return $ lineBlock lins
+ "Table" -> do
+ -- TODO: check unexpected items
+ let attr = attrFromElement e
+ (maybe_caption_el, after_caption) = partitionFirstChildNamed "Caption" $ elContent e
+ children = elementsWithNames (S.fromList [tgNameColspecs, "TableHead", "TableBody", "TableFoot"]) after_caption
+ is_element tag el = tag == elementName el
+ colspecs <- getColspecs $ L.find (is_element tgNameColspecs) children
+ tbs <- getTableBodies $ filter (is_element "TableBody") children
+ th <- getTableHead $ L.find (is_element "TableHead") children
+ tf <- getTableFoot $ L.find (is_element "TableFoot") children
+ capt <- parseMaybeCaptionElement maybe_caption_el
+ case colspecs of
+ Nothing -> return mempty
+ Just cs -> return $ fromList [Table attr capt cs th tbs tf]
+ _ -> do
+ throwError $ PandocXMLError "" ("unexpected element \"" <> name <> "\" in blocks context")
+ where
+ parsePandoc = do
+ let version = maybeAttrValue atNameApiVersion e
+ apiversion = case (version) of
+ Just (v) -> makeVersion $ map (read . T.unpack) $ T.splitOn "," v
+ Nothing -> pandocVersion
+ in modify $ \st -> st {xmlApiVersion = apiversion}
+ getBlocks e
+
+getListItems :: (PandocMonad m) => Element -> XMLReader m [Blocks]
+getListItems e =
+ let items_els = childrenNamed tgNameListItem e
+ in do
+ mapM getBlocks items_els
+
+getContentsOfElements :: (Content -> Bool) -> [Content] -> [[Content]]
+getContentsOfElements filter_element contents = mapMaybe element_contents $ filter filter_element contents
+ where
+ element_contents :: Content -> Maybe [Content]
+ element_contents c = case (c) of
+ Elem e -> Just (elContent e)
+ _ -> Nothing
+
+strContentRecursive :: Element -> Text
+strContentRecursive =
+ strContent
+ . (\e' -> e' {elContent = map elementToStr $ elContent e'})
+
+elementToStr :: Content -> Content
+elementToStr (Elem e') = Text $ CData CDataText (strContentRecursive e') Nothing
+elementToStr x = x
+
+textToInt :: Text -> Int -> Int
+textToInt t deflt =
+ let safe_to_int :: Text -> Maybe Int
+ safe_to_int s = readMaybe $ T.unpack s
+ in case (safe_to_int t) of
+ Nothing -> deflt
+ Just (n) -> n
+
+parseInline :: (PandocMonad m) => Content -> XMLReader m Inlines
+parseInline (Text (CData _ s _)) =
+ return $ text s
+parseInline (CRef ref) =
+ return $
+ maybe (text $ T.toUpper ref) text $
+ lookupEntity ref
+parseInline (Elem e) =
+ let name = elementName e
+ in case (name) of
+ "Space" ->
+ let count = textToInt (attrValue atNameSpaceCount e) 1
+ in return $ fromList $ replicate count Space
+ "Str" -> return $ fromList [Str $ attrValue atNameStrContent e]
+ "Emph" -> innerInlines emph
+ "Strong" -> innerInlines strong
+ "Strikeout" -> innerInlines strikeout
+ "Subscript" -> innerInlines subscript
+ "Superscript" -> innerInlines superscript
+ "Underline" -> innerInlines underline
+ "SoftBreak" -> return softbreak
+ "LineBreak" -> return linebreak
+ "SmallCaps" -> innerInlines smallcaps
+ "Quoted" -> case (attrValue atNameQuoteType e) of
+ "SingleQuote" -> innerInlines singleQuoted
+ _ -> innerInlines doubleQuoted
+ "Math" -> case (attrValue atNameMathType e) of
+ "DisplayMath" -> pure $ displayMath $ strContentRecursive e
+ _ -> pure $ math $ strContentRecursive e
+ "Span" -> innerInlines $ spanWith (attrFromElement e)
+ "Code" -> do
+ let attr = attrFromElement e
+ return $ codeWith attr $ strContentRecursive e
+ "Link" -> innerInlines $ linkWith attr url title
+ where
+ url = attrValue atNameLinkUrl e
+ title = attrValue atNameTitle e
+ attr = filterAttrAttributes [atNameLinkUrl, atNameTitle] $ attrFromElement e
+ "Image" -> innerInlines $ imageWith attr url title
+ where
+ url = attrValue atNameImageUrl e
+ title = attrValue atNameTitle e
+ attr = filterAttrAttributes [atNameImageUrl, atNameTitle] $ attrFromElement e
+ "RawInline" -> do
+ let format = (attrValue atNameFormat e)
+ return $ rawInline format $ strContentRecursive e
+ "Note" -> do
+ contents <- getBlocks e
+ return $ note contents
+ "Cite" ->
+ let (maybe_citations_el, contents) = partitionFirstChildNamed tgNameCitations $ elContent e
+ in case (maybe_citations_el) of
+ Just citations_el -> do
+ citations <- parseCitations $ elContent citations_el
+ (innerInlines' contents) $ cite citations
+ Nothing -> getInlines contents
+ _ -> do
+ throwError $ PandocXMLError "" ("unexpected element \"" <> name <> "\" in inline context")
+ where
+ innerInlines' contents f =
+ f . concatMany
+ <$> mapM parseInline contents
+ innerInlines f = innerInlines' (elContent e) f
+
+getInlines :: (PandocMonad m) => [Content] -> XMLReader m Inlines
+getInlines contents = concatMany <$> mapM parseInline contents
+
+getListAttributes :: Element -> ListAttributes
+getListAttributes e = (start, style, delim)
+ where
+ start = textToInt (attrValue atNameStart e) 1
+ style = case (attrValue atNameNumberStyle e) of
+ "Example" -> Example
+ "Decimal" -> Decimal
+ "LowerRoman" -> LowerRoman
+ "UpperRoman" -> UpperRoman
+ "LowerAlpha" -> LowerAlpha
+ "UpperAlpha" -> UpperAlpha
+ _ -> DefaultStyle
+ delim = case (attrValue atNameNumberDelim e) of
+ "Period" -> Period
+ "OneParen" -> OneParen
+ "TwoParens" -> TwoParens
+ _ -> DefaultDelim
+
+contentsOfChildren :: Text -> [Content] -> [[Content]]
+contentsOfChildren tag contents = mapMaybe childrenElementWithTag contents
+ where
+ childrenElementWithTag :: Content -> Maybe [Content]
+ childrenElementWithTag c = case (c) of
+ (Elem e) -> if tag == elementName e then Just (elContent e) else Nothing
+ _ -> Nothing
+
+alignmentFromText :: Text -> Alignment
+alignmentFromText t = case t of
+ "AlignLeft" -> AlignLeft
+ "AlignRight" -> AlignRight
+ "AlignCenter" -> AlignCenter
+ _ -> AlignDefault
+
+getColWidth :: Text -> ColWidth
+getColWidth txt = case reads (T.unpack txt) of
+ [(value, "")] -> if value == 0.0 then ColWidthDefault else ColWidth value
+ _ -> ColWidthDefault
+
+getColspecs :: (PandocMonad m) => Maybe Element -> XMLReader m (Maybe [ColSpec])
+getColspecs Nothing = pure Nothing
+getColspecs (Just cs) = do
+ return $ Just $ map elementToColSpec (childrenNamed "ColSpec" cs)
+ where
+ elementToColSpec e = (alignmentFromText $ attrValue atNameAlignment e, getColWidth $ attrValue atNameColWidth e)
+
+getTableBody :: (PandocMonad m) => Element -> XMLReader m (Maybe TableBody)
+getTableBody body_el = do
+ let attr = filterAttrAttributes [atNameRowHeadColumns] $ attrFromElement body_el
+ bh = childrenNamed tgNameBodyHeader body_el
+ bb = childrenNamed tgNameBodyBody body_el
+ headcols = textToInt (attrValue atNameRowHeadColumns body_el) 0
+ hrows <- mconcat <$> mapM getRows bh
+ brows <- mconcat <$> mapM getRows bb
+ return $ Just $ TableBody attr (RowHeadColumns headcols) hrows brows
+
+getTableBodies :: (PandocMonad m) => [Element] -> XMLReader m [TableBody]
+getTableBodies body_elements = do
+ catMaybes <$> mapM getTableBody body_elements
+
+getTableHead :: (PandocMonad m) => Maybe Element -> XMLReader m TableHead
+getTableHead maybe_e = case maybe_e of
+ Just e -> do
+ let attr = attrFromElement e
+ rows <- getRows e
+ return $ TableHead attr rows
+ Nothing -> return $ TableHead nullAttr []
+
+getTableFoot :: (PandocMonad m) => Maybe Element -> XMLReader m TableFoot
+getTableFoot maybe_e = case maybe_e of
+ Just e -> do
+ let attr = attrFromElement e
+ rows <- getRows e
+ return $ TableFoot attr rows
+ Nothing -> return $ TableFoot nullAttr []
+
+getCell :: (PandocMonad m) => Element -> XMLReader m Cell
+getCell c = do
+ let alignment = alignmentFromText $ attrValue atNameAlignment c
+ rowspan = RowSpan $ textToInt (attrValue atNameRowspan c) 1
+ colspan = ColSpan $ textToInt (attrValue atNameColspan c) 1
+ attr = filterAttrAttributes [atNameAlignment, atNameRowspan, atNameColspan] $ attrFromElement c
+ blocks <- getBlocks c
+ return $ Cell attr alignment rowspan colspan (toList blocks)
+
+getRows :: (PandocMonad m) => Element -> XMLReader m [Row]
+getRows e = mapM getRow $ childrenNamed "Row" e
+ where
+ getRow r = do
+ cells <- mapM getCell (childrenNamed "Cell" r)
+ return $ Row (attrFromElement r) cells
+
+parseCitations :: (PandocMonad m) => [Content] -> XMLReader m [Citation]
+parseCitations contents = do
+ maybecitations <- mapM getCitation contents
+ return $ catMaybes maybecitations
+ where
+ getCitation :: (PandocMonad m) => Content -> XMLReader m (Maybe Citation)
+ getCitation content = case (content) of
+ (Elem e) ->
+ if qName (elName e) == "Citation"
+ then do
+ p <- inlinesOfChildrenNamed tgNameCitationPrefix e
+ s <- inlinesOfChildrenNamed tgNameCitationSuffix e
+ return $
+ Just
+ ( Citation
+ { citationId = attrValue "id" e,
+ citationPrefix = toList p,
+ citationSuffix = toList s,
+ citationMode = case (attrValue atNameCitationMode e) of
+ "AuthorInText" -> AuthorInText
+ "SuppressAuthor" -> SuppressAuthor
+ _ -> NormalCitation,
+ citationNoteNum = textToInt (attrValue atNameCitationNoteNum e) 0,
+ citationHash = textToInt (attrValue atNameCitationHash e) 0
+ }
+ )
+ else do
+ return Nothing
+ _ -> do
+ return Nothing
+ where
+ inlinesOfChildrenNamed tag e = getInlines $ concatMap (\e' -> elContent e') (childrenNamed tag e)
+
+parseMaybeCaptionElement :: (PandocMonad m) => Maybe Element -> XMLReader m Caption
+parseMaybeCaptionElement Nothing = pure emptyCaption
+parseMaybeCaptionElement (Just e) = parseCaption $ elContent e
+
+parseCaption :: (PandocMonad m) => [Content] -> XMLReader m Caption
+parseCaption contents =
+ let (maybe_shortcaption_el, caption_contents) = partitionFirstChildNamed tgNameShortCaption contents
+ in do
+ blocks <- parseBlocks caption_contents
+ case (maybe_shortcaption_el) of
+ Just shortcaption_el -> do
+ short_caption <- getInlines (elContent shortcaption_el)
+ return $ caption (Just $ toList short_caption) blocks
+ Nothing -> return $ caption Nothing blocks
+
+parseDefinitionListItem :: (PandocMonad m) => [Content] -> XMLReader m (Inlines, [Blocks])
+parseDefinitionListItem contents = do
+ let term_contents = getContentsOfElements (isElementNamed tgNameDefListTerm) contents
+ defs_elements = elementContents $ filter (isElementNamed tgNameDefListDef) contents
+ term_inlines <- getInlines (concat term_contents)
+ defs <- mapM getBlocks defs_elements
+ return (term_inlines, defs)
+
+elementContents :: [Content] -> [Element]
+elementContents contents = mapMaybe toElement contents
+ where
+ toElement :: Content -> Maybe Element
+ toElement (Elem e) = Just e
+ toElement _ = Nothing
+
+isElementNamed :: Text -> Content -> Bool
+isElementNamed t c = case (c) of
+ Elem e -> t == elementName e
+ _ -> False
+
+childrenNamed :: Text -> Element -> [Element]
+childrenNamed tag e = elementContents $ filter (isElementNamed tag) (elContent e)
+
+elementsWithNames :: S.Set Text -> [Content] -> [Element]
+elementsWithNames tags contents = mapMaybe isElementWithNameInSet contents
+ where
+ isElementWithNameInSet c = case (c) of
+ Elem el ->
+ if (elementName el) `S.member` tags
+ then Just el
+ else Nothing
+ _ -> Nothing
+
+partitionFirstChildNamed :: Text -> [Content] -> (Maybe Element, [Content])
+partitionFirstChildNamed tag contents = case (contents) of
+ (Text (CData _ s _) : rest) ->
+ if T.all isSpace s
+ then partitionFirstChildNamed tag rest
+ else (Nothing, contents)
+ (Elem e : rest) ->
+ if tag == elementName e
+ then (Just e, rest)
+ else (Nothing, contents)
+ _ -> (Nothing, contents)
+
+type PandocAttr = (Text, [Text], [(Text, Text)])
+
+filterAttributes :: S.Set Text -> [(Text, Text)] -> [(Text, Text)]
+filterAttributes to_be_removed a = filter keep_attr a
+ where
+ keep_attr (k, _) = not (k `S.member` to_be_removed)
+
+filterAttrAttributes :: [Text] -> PandocAttr -> PandocAttr
+filterAttrAttributes to_be_removed (idn, classes, a) = (idn, classes, filtered)
+ where
+ filtered = filterAttributes (S.fromList to_be_removed) a
+
+attrFromElement :: Element -> PandocAttr
+attrFromElement e = filterAttrAttributes ["id", "class"] (idn, classes, attributes)
+ where
+ idn = attrValue "id" e
+ classes = T.words $ attrValue "class" e
+ attributes = map (\a -> (qName $ attrKey a, attrVal a)) $ elAttribs e
+
+addMeta :: (PandocMonad m) => (ToMetaValue a) => Text -> a -> XMLReader m ()
+addMeta field val = modify (setMeta field val)
+
+instance HasMeta XMLReaderState where
+ setMeta field v s = s {xmlMeta = setMeta field v (xmlMeta s)}
+
+ deleteMeta field s = s {xmlMeta = deleteMeta field (xmlMeta s)}
+
+parseMetaMapEntry :: (PandocMonad m) => Element -> XMLReader m (Maybe (Text, MetaValue))
+parseMetaMapEntry e =
+ let key = attrValue atNameMetaMapEntryKey e
+ in case (key) of
+ "" -> pure Nothing
+ k -> do
+ maybe_value <- parseMetaMapEntryContents $ elContent e
+ case (maybe_value) of
+ Nothing -> return Nothing
+ Just v -> return $ Just (k, v)
+
+parseMetaMapEntryContents :: (PandocMonad m) => [Content] -> XMLReader m (Maybe MetaValue)
+parseMetaMapEntryContents cs = msum <$> mapM parseMeta cs
+
+parseMeta :: (PandocMonad m) => Content -> XMLReader m (Maybe MetaValue)
+parseMeta (Text (CData CDataRaw _ _)) = return Nothing
+parseMeta (Text (CData _ s _)) =
+ if T.all isSpace s
+ then return Nothing
+ else do
+ throwError $ PandocXMLError "" "non-space characters out of inline context in metadata"
+parseMeta (CRef x) =
+ throwError $ PandocXMLError "" ("reference \"" <> x <> "\" out of inline context")
+parseMeta (Elem e) = do
+ let name = elementName e
+ in case (name) of
+ "MetaBool" -> case (attrValue atNameMetaBoolValue e) of
+ "true" -> return $ Just $ MetaBool True
+ _ -> return $ Just $ MetaBool False
+ "MetaString" -> pure Nothing
+ "MetaInlines" -> do
+ inlines <- getInlines (elContent e)
+ return $ Just $ MetaInlines $ toList inlines
+ "MetaBlocks" -> do
+ blocks <- getBlocks e
+ return $ Just $ MetaBlocks $ toList blocks
+ "MetaList" -> do
+ maybe_items <- mapM parseMeta $ elContent e
+ let items = catMaybes maybe_items
+ in -- TODO: report empty MetaList?
+ return $ Just $ MetaList items
+ "MetaMap" ->
+ let entry_els = childrenNamed tgNameMetaMapEntry e
+ in do
+ entries <- catMaybes <$> mapM parseMetaMapEntry entry_els
+ if null entries
+ then
+ -- TODO: report empty MetaMap
+ return Nothing
+ else return $ Just $ MetaMap $ M.fromList entries
+ _ -> do
+ throwError $ PandocXMLError "" ("unexpected element \"" <> name <> "\" in metadata")
diff --git a/src/Text/Pandoc/Writers.hs b/src/Text/Pandoc/Writers.hs
index 3e1974ad6..164bf6d0f 100644
--- a/src/Text/Pandoc/Writers.hs
+++ b/src/Text/Pandoc/Writers.hs
@@ -76,6 +76,7 @@ module Text.Pandoc.Writers
, writeTexinfo
, writeTextile
, writeTypst
+ , writeXML
, writeXWiki
, writeZimWiki
, getWriter
@@ -128,6 +129,7 @@ import Text.Pandoc.Writers.TEI
import Text.Pandoc.Writers.Texinfo
import Text.Pandoc.Writers.Textile
import Text.Pandoc.Writers.Typst
+import Text.Pandoc.Writers.XML
import Text.Pandoc.Writers.XWiki
import Text.Pandoc.Writers.ZimWiki
@@ -203,6 +205,7 @@ writers = [
,("chunkedhtml" , ByteStringWriter writeChunkedHTML)
,("djot" , TextWriter writeDjot)
,("ansi" , TextWriter writeANSI)
+ ,("xml" , TextWriter writeXML)
]
-- | Retrieve writer, extensions based on formatSpec (format+extensions).
diff --git a/src/Text/Pandoc/Writers/XML.hs b/src/Text/Pandoc/Writers/XML.hs
new file mode 100644
index 000000000..35bcb07f8
--- /dev/null
+++ b/src/Text/Pandoc/Writers/XML.hs
@@ -0,0 +1,365 @@
+{-# LANGUAGE FlexibleContexts #-}
+{-# LANGUAGE OverloadedStrings #-}
+{-# LANGUAGE ScopedTypeVariables #-}
+
+-- |
+-- Module : Text.Pandoc.Writers.XML
+-- Copyright : Copyright (C) 2025- Massimiliano Farinella and John MacFarlane
+-- License : GNU GPL, version 2 or above
+--
+-- Maintainer : Massimiliano Farinella <[email protected]>
+-- Stability : WIP
+-- Portability : portable
+--
+-- Conversion of 'Pandoc' documents to (pandoc specific) xml markup.
+module Text.Pandoc.Writers.XML (writeXML) where
+
+import Data.Map (Map, toList)
+import Data.Maybe (mapMaybe)
+import qualified Data.Text as T
+import Data.Version (versionBranch)
+import Text.Pandoc.Class.PandocMonad (PandocMonad)
+import Text.Pandoc.Definition
+import Text.Pandoc.Options (WriterOptions (..))
+import Text.Pandoc.XML.Light
+import qualified Text.Pandoc.XML.Light as XML
+import Text.Pandoc.XMLFormat
+import Text.XML.Light (xml_header)
+
+type PandocAttr = Text.Pandoc.Definition.Attr
+
+writeXML :: (PandocMonad m) => WriterOptions -> Pandoc -> m T.Text
+writeXML _ doc = do
+ return $ pandocToXmlText doc
+
+text_node :: T.Text -> Content
+text_node text = Text (CData CDataText text Nothing)
+
+emptyElement :: T.Text -> Element
+emptyElement tag =
+ Element
+ { elName = unqual tag,
+ elAttribs = [],
+ elContent = [],
+ elLine = Nothing
+ }
+
+elementWithContents :: T.Text -> [Content] -> Element
+elementWithContents tag contents =
+ Element
+ { elName = unqual tag,
+ elAttribs = [],
+ elContent = contents,
+ elLine = Nothing
+ }
+
+elementWithAttributes :: T.Text -> [XML.Attr] -> Element
+elementWithAttributes tag attributes =
+ Element
+ { elName = unqual tag,
+ elAttribs = attributes,
+ elContent = [],
+ elLine = Nothing
+ }
+
+elementWithAttrAndContents :: T.Text -> PandocAttr -> [Content] -> Element
+elementWithAttrAndContents tag attr contents = addAttrAttributes attr $ elementWithContents tag contents
+
+asBlockOfInlines :: Element -> [Content]
+asBlockOfInlines el = [Elem el, text_node "\n"]
+
+asBlockOfBlocks :: Element -> [Content]
+asBlockOfBlocks el = [Elem newline_before_first, newline]
+ where
+ newline = text_node "\n"
+ newline_before_first = if null (elContent el) then el else prependContents [newline] el
+
+itemName :: (Show a) => a -> T.Text
+itemName a = T.pack $ takeWhile (/= ' ') (show a)
+
+intAsText :: Int -> T.Text
+intAsText i = T.pack $ show i
+
+itemAsEmptyElement :: (Show a) => a -> Element
+itemAsEmptyElement item = emptyElement $ itemName item
+
+pandocToXmlText :: Pandoc -> T.Text
+pandocToXmlText (Pandoc (Meta meta) blocks) = with_header . with_blocks . with_meta . with_version $ el
+ where
+ el = prependContents [text_node "\n"] $ emptyElement "Pandoc"
+ with_version = addAttribute atNameApiVersion (T.intercalate "," $ map (T.pack . show) $ versionBranch pandocTypesVersion)
+ with_meta = appendContents (metaMapToXML meta "meta")
+ with_blocks = appendContents (asBlockOfBlocks $ elementWithContents "blocks" $ blocksToXML blocks)
+ with_header :: Element -> T.Text
+ with_header e = T.concat [T.pack xml_header, "\n", showElement e]
+
+metaMapToXML :: Map T.Text MetaValue -> T.Text -> [Content]
+metaMapToXML mmap tag = asBlockOfBlocks $ elementWithContents tag entries
+ where
+ entries = concatMap to_entry $ toList mmap
+ to_entry :: (T.Text, MetaValue) -> [Content]
+ to_entry (text, metavalue) = asBlockOfBlocks with_key
+ where
+ entry = elementWithContents tgNameMetaMapEntry $ metaValueToXML metavalue
+ with_key = addAttribute atNameMetaMapEntryKey text entry
+
+metaValueToXML :: MetaValue -> [Content]
+metaValueToXML value =
+ let name = itemName value
+ el = itemAsEmptyElement value
+ in case (value) of
+ MetaBool b -> asBlockOfInlines $ addAttribute atNameMetaBoolValue bool_value el
+ where
+ bool_value = if b then "true" else "false"
+ MetaString s -> asBlockOfInlines $ appendContents [text_node s] el
+ MetaInlines inlines -> asBlockOfInlines $ appendContents (inlinesToXML inlines) el
+ MetaBlocks blocks -> asBlockOfBlocks $ appendContents (blocksToXML blocks) el
+ MetaList items -> asBlockOfBlocks $ appendContents (concatMap metaValueToXML items) el
+ MetaMap mm -> metaMapToXML mm name
+
+blocksToXML :: [Block] -> [Content]
+blocksToXML blocks = concatMap blockToXML blocks
+
+inlinesToXML :: [Inline] -> [Content]
+inlinesToXML inlines = concatMap inlineContentToContents (ilsToIlsContent inlines [])
+
+data InlineContent
+ = NormalInline Inline
+ | ElSpace Int
+ | ElStr T.Text
+
+ilsToIlsContent :: [Inline] -> [InlineContent] -> [InlineContent]
+ilsToIlsContent (Space : xs) [] = ilsToIlsContent xs [ElSpace 1]
+ilsToIlsContent (Space : xs) (NormalInline Space : cs) = ilsToIlsContent xs (ElSpace 2 : cs)
+ilsToIlsContent (Space : xs) (ElSpace n : cs) = ilsToIlsContent xs (ElSpace (n + 1) : cs)
+-- empty Str are always encoded as <Str />
+ilsToIlsContent (Str "" : xs) ilct = ilsToIlsContent xs (ElStr "" : ilct)
+-- Str s1, Str s2 -> s1<Str content="s2">
+ilsToIlsContent (Str s2 : xs) (NormalInline str1@(Str _) : ilct) = ilsToIlsContent xs (ElStr s2 : NormalInline str1 : ilct)
+--
+ilsToIlsContent (Str s : xs) ilct =
+ if T.any (== ' ') s
+ then ilsToIlsContent xs (ElStr s : ilct)
+ else ilsToIlsContent xs (NormalInline (Str s) : ilct)
+ilsToIlsContent (x : xs) ilct = ilsToIlsContent xs (NormalInline x : ilct)
+ilsToIlsContent [] ilct = reverse $ lastSpaceAsElem ilct
+ where
+ lastSpaceAsElem :: [InlineContent] -> [InlineContent]
+ lastSpaceAsElem (NormalInline Space : xs) = ElSpace 1 : xs
+ lastSpaceAsElem ilcts = ilcts
+
+inlineContentToContents :: InlineContent -> [Content]
+inlineContentToContents (NormalInline il) = inlineToXML il
+inlineContentToContents (ElSpace 1) = [Elem $ emptyElement "Space"]
+inlineContentToContents (ElSpace n) = [Elem $ addAttribute atNameSpaceCount (intAsText n) (emptyElement "Space")]
+inlineContentToContents (ElStr "") = [Elem $ emptyElement "Str"]
+inlineContentToContents (ElStr s) = [Elem $ addAttribute atNameStrContent s (emptyElement "Str")]
+
+asContents :: Element -> [Content]
+asContents el = [Elem el]
+
+wrapBlocks :: T.Text -> [Block] -> [Content]
+wrapBlocks tag blocks = asBlockOfBlocks $ elementWithContents tag $ blocksToXML blocks
+
+wrapArrayOfBlocks :: T.Text -> [[Block]] -> [Content]
+wrapArrayOfBlocks tag array = concatMap (wrapBlocks tag) array
+
+-- wrapInlines :: T.Text -> [Inline] -> [Content]
+-- wrapInlines tag inlines = asBlockOfInlines $ element_with_contents tag $ inlinesToXML inlines
+
+blockToXML :: Block -> [Content]
+blockToXML block =
+ let el = itemAsEmptyElement block
+ in case (block) of
+ Para inlines -> asBlockOfInlines $ appendContents (inlinesToXML inlines) el
+ Header level (idn, cls, attrs) inlines -> asBlockOfInlines $ appendContents (inlinesToXML inlines) with_attr
+ where
+ with_attr = addAttrAttributes (idn, cls, attrs ++ [(atNameLevel, intAsText level)]) el
+ Plain inlines -> asBlockOfInlines $ appendContents (inlinesToXML inlines) el
+ Div attr blocks -> asBlockOfBlocks $ appendContents (blocksToXML blocks) with_attr
+ where
+ with_attr = addAttrAttributes attr el
+ BulletList items -> asBlockOfBlocks $ appendContents (wrapArrayOfBlocks tgNameListItem items) el
+ OrderedList (start, style, delim) items -> asBlockOfBlocks $ with_contents . with_attrs $ el
+ where
+ with_attrs =
+ addAttributes
+ ( validAttributes
+ [ (atNameStart, intAsText start),
+ (atNameNumberStyle, itemName style),
+ (atNameNumberDelim, itemName delim)
+ ]
+ )
+ with_contents = appendContents (wrapArrayOfBlocks tgNameListItem items)
+ BlockQuote blocks -> asBlockOfBlocks $ appendContents (blocksToXML blocks) el
+ HorizontalRule -> asBlockOfInlines el
+ CodeBlock attr text -> asBlockOfInlines $ with_contents . with_attr $ el
+ where
+ with_contents = appendContents [text_node text]
+ with_attr = addAttrAttributes attr
+ LineBlock lins -> asBlockOfBlocks $ appendContents (concatMap wrapInlines lins) el
+ where
+ wrapInlines inlines = asContents $ appendContents (inlinesToXML inlines) $ emptyElement tgNameLineItem
+ Table attr caption colspecs thead tbodies tfoot -> asBlockOfBlocks $ with_foot . with_bodies . with_head . with_colspecs . with_caption . with_attr $ el
+ where
+ with_attr = addAttrAttributes attr
+ with_caption = appendContents (captionToXML caption)
+ with_colspecs = appendContents (colSpecsToXML colspecs)
+ with_head = appendContents (tableHeadToXML thead)
+ with_bodies = appendContents (concatMap tableBodyToXML tbodies)
+ with_foot = appendContents (tableFootToXML tfoot)
+ Figure attr caption blocks -> asBlockOfBlocks $ with_contents . with_caption . with_attr $ el
+ where
+ with_attr = addAttrAttributes attr
+ with_caption = appendContents (captionToXML caption)
+ with_contents = appendContents (blocksToXML blocks)
+ RawBlock (Format format) text -> asContents $ appendContents [text_node text] raw
+ where
+ raw = addAttribute atNameFormat format el
+ DefinitionList items -> asBlockOfBlocks $ appendContents (map definitionListItemToXML items) el
+
+inlineToXML :: Inline -> [Content]
+inlineToXML inline =
+ let el = itemAsEmptyElement inline
+ wrapInlines inlines = asContents $ appendContents (inlinesToXML inlines) el
+ in case (inline) of
+ Space -> [text_node " "]
+ Str s -> [text_node s]
+ Emph inlines -> wrapInlines inlines
+ Strong inlines -> wrapInlines inlines
+ Quoted quote_type inlines -> asContents $ appendContents (inlinesToXML inlines) quoted
+ where
+ quoted = addAttribute atNameQuoteType (itemName quote_type) el
+ Underline inlines -> wrapInlines inlines
+ Strikeout inlines -> wrapInlines inlines
+ SmallCaps inlines -> wrapInlines inlines
+ Superscript inlines -> wrapInlines inlines
+ Subscript inlines -> wrapInlines inlines
+ SoftBreak -> asContents el
+ LineBreak -> asContents el
+ Span attr inlines -> asContents $ appendContents (inlinesToXML inlines) with_attr
+ where
+ with_attr = addAttrAttributes attr el
+ Link (idn, cls, attrs) inlines (url, title) -> asContents $ appendContents (inlinesToXML inlines) with_attr
+ where
+ with_attr = addAttrAttributes (idn, cls, attrs ++ [(atNameLinkUrl, url), (atNameTitle, title)]) el
+ Image (idn, cls, attrs) inlines (url, title) -> asContents $ appendContents (inlinesToXML inlines) with_attr
+ where
+ with_attr = addAttrAttributes (idn, cls, attrs ++ [(atNameImageUrl, url), (atNameTitle, title)]) el
+ RawInline (Format format) text -> asContents $ appendContents [text_node text] raw
+ where
+ raw = addAttribute atNameFormat format el
+ Math math_type text -> asContents $ appendContents [text_node text] math
+ where
+ math = addAttribute atNameMathType (itemName math_type) el
+ Code attr text -> asContents $ appendContents [text_node text] with_attr
+ where
+ with_attr = addAttrAttributes attr el
+ Note blocks -> asContents $ appendContents (blocksToXML blocks) el
+ Cite citations inlines -> asContents $ appendContents (inlinesToXML inlines) with_citations
+ where
+ with_citations = addCitations citations el
+
+-- TODO: don't let an attribute overwrite id or class
+maybeAttribute :: (T.Text, T.Text) -> Maybe XML.Attr
+maybeAttribute (_, "") = Nothing
+maybeAttribute ("", _) = Nothing
+maybeAttribute (name, value) = Just $ XML.Attr (unqual name) value
+
+validAttributes :: [(T.Text, T.Text)] -> [XML.Attr]
+validAttributes pairs = mapMaybe maybeAttribute pairs
+
+appendContents :: [Content] -> Element -> Element
+appendContents newContents el = el {elContent = (elContent el) ++ newContents}
+
+prependContents :: [Content] -> Element -> Element
+prependContents newContents el = el {elContent = newContents ++ (elContent el)}
+
+addAttributes :: [XML.Attr] -> Element -> Element
+addAttributes newAttrs el = el {elAttribs = newAttrs ++ elAttribs el}
+
+addAttribute :: T.Text -> T.Text -> Element -> Element
+addAttribute attr_name attr_value el = el {elAttribs = new_attr : elAttribs el}
+ where
+ new_attr = XML.Attr (unqual attr_name) attr_value
+
+addAttrAttributes :: PandocAttr -> Element -> Element
+addAttrAttributes (identifier, classes, attributes) el = addAttributes attrs' el
+ where
+ attrs' = mapMaybe maybeAttribute (("id", identifier) : ("class", T.intercalate " " classes) : attributes)
+
+addCitations :: [Citation] -> Element -> Element
+addCitations citations el = appendContents [Elem $ elementWithContents tgNameCitations $ (text_node "\n") : concatMap citation_to_elem citations] el
+ where
+ citation_to_elem :: Citation -> [Content]
+ citation_to_elem citation = asBlockOfInlines with_suffix
+ where
+ cit_elem = elementWithAttributes (itemName citation) attrs
+ prefix = citationPrefix citation
+ suffix = citationSuffix citation
+ with_prefix =
+ if null prefix
+ then cit_elem
+ else appendContents [Elem $ elementWithContents tgNameCitationPrefix $ inlinesToXML prefix] cit_elem
+ with_suffix =
+ if null suffix
+ then with_prefix
+ else appendContents [Elem $ elementWithContents tgNameCitationSuffix $ inlinesToXML suffix] with_prefix
+ attrs =
+ map
+ (\(n, v) -> XML.Attr (unqual n) v)
+ [ ("id", citationId citation),
+ (atNameCitationMode, T.pack $ show $ citationMode citation),
+ (atNameCitationNoteNum, intAsText $ citationNoteNum citation),
+ (atNameCitationHash, intAsText $ citationHash citation)
+ ]
+
+definitionListItemToXML :: ([Inline], [[Block]]) -> Content
+definitionListItemToXML (inlines, defs) = Elem $ elementWithContents tgNameDefListItem $ term ++ wrapArrayOfBlocks tgNameDefListDef defs
+ where
+ term = asBlockOfInlines $ appendContents (inlinesToXML inlines) $ emptyElement tgNameDefListTerm
+
+captionToXML :: Caption -> [Content]
+captionToXML (Caption short blocks) = asBlockOfBlocks with_short_caption
+ where
+ el = elementWithContents "Caption" $ blocksToXML blocks
+ with_short_caption = case (short) of
+ Just inlines -> prependContents (asBlockOfInlines $ elementWithContents tgNameShortCaption $ inlinesToXML inlines) el
+ _ -> el
+
+colSpecToXML :: (Alignment, ColWidth) -> [Content]
+colSpecToXML (align, cw) = asBlockOfInlines colspec
+ where
+ colspec = elementWithAttributes "ColSpec" $ validAttributes [(atNameAlignment, itemName align), (atNameColWidth, colwidth)]
+ colwidth = case (cw) of
+ ColWidth d -> T.pack $ show d
+ ColWidthDefault -> "0"
+
+colSpecsToXML :: [(Alignment, ColWidth)] -> [Content]
+colSpecsToXML colspecs = asBlockOfBlocks $ elementWithContents tgNameColspecs $ concatMap colSpecToXML colspecs
+
+tableHeadToXML :: TableHead -> [Content]
+tableHeadToXML (TableHead attr rows) = asBlockOfBlocks $ elementWithAttrAndContents "TableHead" attr $ concatMap rowToXML rows
+
+tableBodyToXML :: TableBody -> [Content]
+tableBodyToXML (TableBody (idn, cls, attrs) (RowHeadColumns headcols) hrows brows) = asBlockOfBlocks $ elementWithAttrAndContents "TableBody" attr children
+ where
+ attr = (idn, cls, (atNameRowHeadColumns, intAsText headcols) : attrs)
+ header_rows = asBlockOfBlocks $ elementWithContents tgNameBodyHeader $ concatMap rowToXML hrows
+ body_rows = asBlockOfBlocks $ elementWithContents tgNameBodyBody $ concatMap rowToXML brows
+ children = header_rows ++ body_rows
+
+tableFootToXML :: TableFoot -> [Content]
+tableFootToXML (TableFoot attr rows) = asBlockOfBlocks $ elementWithAttrAndContents "TableFoot" attr $ concatMap rowToXML rows
+
+rowToXML :: Row -> [Content]
+rowToXML (Row attr cells) = asBlockOfBlocks $ elementWithAttrAndContents "Row" attr $ concatMap cellToXML cells
+
+cellToXML :: Cell -> [Content]
+cellToXML (Cell (idn, cls, attrs) alignment (RowSpan rowspan) (ColSpan colspan) blocks) = asBlockOfBlocks $ elementWithAttrAndContents "Cell" attr $ blocksToXML blocks
+ where
+ with_alignment a = (atNameAlignment, itemName alignment) : a
+ with_rowspan a = if rowspan > 1 then (atNameRowspan, intAsText rowspan) : a else a
+ with_colspan a = if colspan > 1 then (atNameColspan, intAsText colspan) : a else a
+ attrs' = (with_colspan . with_rowspan . with_alignment) attrs
+ attr = (idn, cls, attrs')
diff --git a/src/Text/Pandoc/XMLFormat.hs b/src/Text/Pandoc/XMLFormat.hs
new file mode 100644
index 000000000..6801de90c
--- /dev/null
+++ b/src/Text/Pandoc/XMLFormat.hs
@@ -0,0 +1,188 @@
+{-# LANGUAGE OverloadedStrings #-}
+{-# LANGUAGE ScopedTypeVariables #-}
+
+module Text.Pandoc.XMLFormat
+ ( atNameAlignment,
+ atNameApiVersion,
+ atNameCitationHash,
+ atNameCitationMode,
+ atNameCitationNoteNum,
+ atNameColspan,
+ atNameColWidth,
+ atNameFormat,
+ atNameImageUrl,
+ atNameLevel,
+ atNameLinkUrl,
+ atNameMathType,
+ atNameMetaBoolValue,
+ atNameMetaMapEntryKey,
+ atNameNumberDelim,
+ atNameNumberStyle,
+ atNameQuoteType,
+ atNameRowHeadColumns,
+ atNameRowspan,
+ atNameSpaceCount,
+ atNameStart,
+ atNameStrContent,
+ atNameTitle,
+ tgNameBodyBody,
+ tgNameBodyHeader,
+ tgNameCitations,
+ tgNameCitationPrefix,
+ tgNameCitationSuffix,
+ tgNameColspecs,
+ tgNameDefListDef,
+ tgNameDefListItem,
+ tgNameDefListTerm,
+ tgNameLineItem,
+ tgNameListItem,
+ tgNameMetaMapEntry,
+ tgNameShortCaption,
+ )
+where
+
+import Data.Text (Text)
+
+-- the attribute carrying the API version of pandoc types in the main Pandoc element
+atNameApiVersion :: Text
+atNameApiVersion = "api-version"
+
+-- the element of a <meta> or <MetaMap> entry
+tgNameMetaMapEntry :: Text
+tgNameMetaMapEntry = "entry"
+
+-- the attribute carrying the key name of a <meta> or <MetaMap> entry
+atNameMetaMapEntryKey :: Text
+atNameMetaMapEntryKey = "key"
+
+-- the attribute carrying the boolean value ("true" or "false") of a MetaBool
+atNameMetaBoolValue :: Text
+atNameMetaBoolValue = "value"
+
+-- level of a Header
+atNameLevel :: Text
+atNameLevel = "level"
+
+-- start number of an OrderedList
+atNameStart :: Text
+atNameStart = "start"
+
+-- number delimiter of an OrderedList
+atNameNumberDelim :: Text
+atNameNumberDelim = "number-delim"
+
+-- number style of an OrderedList
+atNameNumberStyle :: Text
+atNameNumberStyle = "number-style"
+
+-- target title in Image and Link
+atNameTitle :: Text
+atNameTitle = "title"
+
+-- target url in Image
+atNameImageUrl :: Text
+atNameImageUrl = "src"
+
+-- target url in Link
+atNameLinkUrl :: Text
+atNameLinkUrl = "href"
+
+-- QuoteType of a Quoted
+atNameQuoteType :: Text
+atNameQuoteType = "quote-type"
+
+-- MathType of a Math
+atNameMathType :: Text
+atNameMathType = "math-type"
+
+-- format of a RawInline or a RawBlock
+atNameFormat :: Text
+atNameFormat = "format"
+
+-- alignment attribute in a ColSpec or in a Cell
+atNameAlignment :: Text
+atNameAlignment = "alignment"
+
+-- ColWidth attribute in a ColSpec
+atNameColWidth :: Text
+atNameColWidth = "col-width"
+
+-- RowHeadColumns attribute in a TableBody
+atNameRowHeadColumns :: Text
+atNameRowHeadColumns = "row-head-columns"
+
+-- RowSpan attribute in a Cell
+atNameRowspan :: Text
+atNameRowspan = "row-span"
+
+-- ColSpan attribute in a Cell
+atNameColspan :: Text
+atNameColspan = "col-span"
+
+-- the citationMode of a Citation
+atNameCitationMode :: Text
+atNameCitationMode = "mode"
+
+-- the citationHash of a Citation
+atNameCitationHash :: Text
+atNameCitationHash = "hash"
+
+-- the citationNoteNum of a Citation
+atNameCitationNoteNum :: Text
+atNameCitationNoteNum = "note-num"
+
+-- the number of consecutive spaces of the <Space> element
+atNameSpaceCount :: Text
+atNameSpaceCount = "count"
+
+-- the content of the <Str> element
+atNameStrContent :: Text
+atNameStrContent = "content"
+
+-- container of Citation elements in Cite inlines
+tgNameCitations :: Text
+tgNameCitations = "citations"
+
+-- element around the prefix inlines of a Citation
+tgNameCitationPrefix :: Text
+tgNameCitationPrefix = "prefix"
+
+-- element around the suffix inlines of a Citation
+tgNameCitationSuffix :: Text
+tgNameCitationSuffix = "suffix"
+
+-- list item for BulletList and OrderedList
+tgNameListItem :: Text
+tgNameListItem = "item"
+
+-- list item for DefinitionList
+tgNameDefListItem :: Text
+tgNameDefListItem = "item"
+
+-- element around the inlines of the term of a DefinitionList item
+tgNameDefListTerm :: Text
+tgNameDefListTerm = "term"
+
+-- element around the blocks of a definition in a DefinitionList item
+tgNameDefListDef :: Text
+tgNameDefListDef = "def"
+
+-- optional element of the ShortCaption
+tgNameShortCaption :: Text
+tgNameShortCaption = "ShortCaption"
+
+-- element around the ColSpec of a Table
+tgNameColspecs :: Text
+tgNameColspecs = "colspecs"
+
+-- element around the header rows of a TableBody
+tgNameBodyHeader :: Text
+tgNameBodyHeader = "header"
+
+-- element around the body rows of a TableBody
+tgNameBodyBody :: Text
+tgNameBodyBody = "body"
+
+-- element around the inlines of a line in a LineBlock
+tgNameLineItem :: Text
+tgNameLineItem = "line"
diff --git a/test/Tests/XML.hs b/test/Tests/XML.hs
new file mode 100644
index 000000000..71175917c
--- /dev/null
+++ b/test/Tests/XML.hs
@@ -0,0 +1,28 @@
+{-# LANGUAGE OverloadedStrings #-}
+{- |
+-- Module : Tests.XML
+-- Copyright : Copyright (C) 2025- Massimiliano Farinella and John MacFarlane
+-- License : GNU GPL, version 2 or above
+--
+-- Maintainer : Massimiliano Farinella <[email protected]>
+-- Stability : WIP
+-- Portability : portable
+Runs a roundtrip conversion of an AST trough the XML format:
+- first from AST to XML (XML Writer),
+- then back to AST (XML Reader),
+- and checks that the two ASTs are the same
+-}
+module Tests.XML (tests) where
+
+import Control.Monad ((>=>))
+import Test.Tasty (TestTree)
+import Test.Tasty.QuickCheck
+import Tests.Helpers
+import Text.Pandoc
+import Text.Pandoc.Arbitrary ()
+
+p_xml_roundtrip :: Pandoc -> Bool
+p_xml_roundtrip d = d == purely (writeXML def {writerTemplate = Just mempty} >=> readXML def) d
+
+tests :: [TestTree]
+tests = [testProperty "p_xml_roundtrip" p_xml_roundtrip] \ No newline at end of file
diff --git a/test/test-pandoc.hs b/test/test-pandoc.hs
index 2f53c16be..6c6c2d1d4 100644
--- a/test/test-pandoc.hs
+++ b/test/test-pandoc.hs
@@ -50,6 +50,7 @@ import qualified Tests.Writers.RST
import qualified Tests.Writers.AnnotatedTable
import qualified Tests.Writers.TEI
import qualified Tests.Writers.Markua
+import qualified Tests.XML
import qualified Tests.MediaBag
import Text.Pandoc.Shared (inDirectory)
@@ -59,6 +60,7 @@ tests pandocPath = testGroup "pandoc tests"
, testGroup "Old" (Tests.Old.tests pandocPath)
, testGroup "Shared" Tests.Shared.tests
, testGroup "MediaBag" Tests.MediaBag.tests
+ , testGroup "XML" Tests.XML.tests
, testGroup "Writers"
[ testGroup "Native" Tests.Writers.Native.tests
, testGroup "ConTeXt" Tests.Writers.ConTeXt.tests
diff --git a/tools/pandoc-xml.dtd b/tools/pandoc-xml.dtd
new file mode 100644
index 000000000..e787596ed
--- /dev/null
+++ b/tools/pandoc-xml.dtd
@@ -0,0 +1,181 @@
+<!--
+A DTD for Pandoc XML format.
+Copyright : Copyright (C) 2025- Massimiliano Farinella
+License : GNU GPL, version 2 or above
+Maintainer : Massimiliano Farinella <[email protected]>
+
+This is a DTD for the XML representation of Pandoc AST.
+It's an equivalent of native and JSON formats, but modeled as XML.
+You can validate Pandoc XML documents with this DTD,
+but there are some limitations:
+- a Pandoc Attr can contain arbitrary attributes, but it looks like you can't tell
+ an element has arbitrary attributes in a DTD
+- some Pandoc attributes have constraints that can't be specified with a DTD
+The Relax-NG and XML Schema translations of this specification overcome
+some of those limitations (see pandoc-xml.rng and pandoc-xml.xsd).
+-->
+<!ELEMENT Pandoc (meta, blocks)>
+<!ATTLIST Pandoc api-version CDATA #REQUIRED>
+
+<!ENTITY % block "Para | Plain | Header | Div | BlockQuote | HorizontalRule | BulletList | OrderedList | DefinitionList | Table | Figure | LineBlock | CodeBlock | RawBlock">
+
+<!ENTITY % inline_element "Str | Space | Emph | Strong | Underline | Strikeout | Superscript | Subscript | SmallCaps | Quoted | Cite | Code | SoftBreak | LineBreak | Math | RawInline | Link | Image | Note | Span">
+<!ENTITY % inline "#PCDATA | %inline_element;">
+
+<!ENTITY % attr "id ID #IMPLIED class CDATA #IMPLIED">
+
+<!ENTITY % metavalue "MetaMap | MetaList | MetaBool | MetaString | MetaInlines | MetaBlocks">
+
+<!ELEMENT meta (entry*)>
+<!ELEMENT MetaMap (entry*)>
+
+<!ELEMENT entry (%metavalue;)*>
+<!ATTLIST entry key CDATA #REQUIRED>
+
+<!ELEMENT MetaList (%metavalue;)*>
+
+<!ELEMENT MetaBool EMPTY>
+<!ATTLIST MetaBool value (true | false) #REQUIRED>
+
+<!ELEMENT MetaString (#PCDATA)>
+
+<!ELEMENT MetaInlines (%inline;)*>
+
+<!ELEMENT MetaBlocks (%block;)*>
+
+<!ELEMENT blocks (%block;)*>
+
+<!ELEMENT Para (%inline;)*>
+
+<!ELEMENT Plain (%inline;)*>
+
+<!ELEMENT Header (%inline;)*>
+<!ATTLIST Header
+ level CDATA "1"
+ %attr;>
+
+<!ELEMENT Div (%block;)* >
+<!ATTLIST Div
+ custom-style CDATA #IMPLIED
+ %attr;>
+
+<!ELEMENT BlockQuote (%block;)*>
+
+<!ELEMENT HorizontalRule EMPTY>
+
+<!ELEMENT BulletList (item)+>
+
+<!ELEMENT OrderedList (item)+>
+<!ATTLIST OrderedList
+ start CDATA "1"
+ number-style (DefaultStyle | Example | Decimal | LowerRoman | UpperRoman | LowerAlpha | UpperAlpha) "DefaultStyle"
+ number-delim (DefaultDelim | Period | OneParen | TwoParens) "DefaultDelim">
+
+<!ELEMENT DefinitionList (item)+>
+
+<!ELEMENT item ((%block;)*|(term,def+))>
+<!ELEMENT term (%inline;)*>
+<!ELEMENT def (%block;)*>
+
+<!ELEMENT Table (Caption, colspecs, TableHead, TableBody+, TableFoot)>
+<!ATTLIST Table
+ custom-style CDATA #IMPLIED
+ %attr;>
+<!ELEMENT Caption (ShortCaption?, (%block;)*)>
+<!ELEMENT ShortCaption (%inline;)*>
+<!ELEMENT colspecs (ColSpec+)>
+<!ELEMENT ColSpec EMPTY>
+<!ATTLIST ColSpec
+ alignment (AlignLeft | AlignRight | AlignCenter | AlignDefault) "AlignDefault"
+ col-width CDATA "0">
+<!ELEMENT TableHead (Row*)>
+<!ATTLIST TableHead %attr;>
+<!ELEMENT TableFoot (Row*)>
+<!ATTLIST TableFoot %attr;>
+<!ELEMENT TableBody (header, body)>
+<!ATTLIST TableBody
+ row-head-columns CDATA "0"
+ %attr;>
+<!ELEMENT header (Row*)>
+<!ELEMENT body (Row*)>
+<!ELEMENT Row (Cell*)>
+<!ATTLIST Row %attr;>
+<!ELEMENT Cell (%block;)*>
+<!ATTLIST Cell
+ alignment (AlignLeft | AlignRight | AlignCenter | AlignDefault) "AlignDefault"
+ row-span CDATA "1"
+ col-span CDATA "1"
+ %attr; >
+
+<!ELEMENT Figure (Caption,(%block;)*)>
+<!ATTLIST Figure %attr;>
+
+<!ELEMENT LineBlock (line+)>
+<!ELEMENT line (%inline;)*>
+
+<!ELEMENT CodeBlock (#PCDATA)>
+<!ATTLIST CodeBlock %attr;>
+
+<!ELEMENT RawBlock (#PCDATA)>
+<!ATTLIST RawBlock format CDATA #REQUIRED>
+
+<!ELEMENT Space EMPTY>
+<!ATTLIST Space count CDATA "1">
+
+<!ELEMENT Str EMPTY>
+<!ATTLIST Str content CDATA "">
+
+<!ELEMENT Emph (%inline;)*>
+<!ELEMENT Strong (%inline;)*>
+<!ELEMENT Underline (%inline;)*>
+<!ELEMENT Strikeout (%inline;)*>
+<!ELEMENT Superscript (%inline;)*>
+<!ELEMENT Subscript (%inline;)*>
+<!ELEMENT SmallCaps (%inline;)*>
+
+<!ELEMENT Span (%inline;)*>
+<!ATTLIST Span
+ custom-style CDATA #IMPLIED
+ %attr;>
+
+<!ELEMENT Quoted (%inline;)*>
+<!ATTLIST Quoted quote-type (SingleQuote | DoubleQuote) "DoubleQuote">
+
+<!ELEMENT Math (#PCDATA)>
+<!ATTLIST Math math-type (DisplayMath | InlineMath) "InlineMath">
+
+<!ELEMENT RawInline (#PCDATA)>
+<!ATTLIST RawInline format CDATA #REQUIRED>
+
+<!ELEMENT Cite (#PCDATA | citations | %inline_element;)*>
+<!ELEMENT citations (Citation)+>
+<!ELEMENT Citation (prefix?, suffix?)>
+<!ELEMENT prefix (%inline;)*>
+<!ELEMENT suffix (%inline;)*>
+<!ATTLIST Citation
+ id CDATA #IMPLIED
+ note-num CDATA #IMPLIED
+ hash CDATA "0"
+ mode (AuthorInText | SuppressAuthor | NormalCitation) "AuthorInText">
+
+<!ELEMENT Code (#PCDATA)>
+<!ATTLIST Code %attr;>
+
+<!ENTITY % target "title CDATA #IMPLIED url CDATA #IMPLIED">
+
+<!ELEMENT Image (%inline;)*>
+<!ATTLIST Image
+ title CDATA #IMPLIED
+ src CDATA #IMPLIED
+ %attr;>
+
+<!ELEMENT Link (%inline;)*>
+<!ATTLIST Link
+ title CDATA #IMPLIED
+ href CDATA #IMPLIED
+ %attr;>
+
+<!ELEMENT SoftBreak EMPTY>
+<!ELEMENT LineBreak EMPTY>
+
+<!ELEMENT Note (%block;)*>
diff --git a/tools/pandoc-xml.rnc b/tools/pandoc-xml.rnc
new file mode 100644
index 000000000..59043d650
--- /dev/null
+++ b/tools/pandoc-xml.rnc
@@ -0,0 +1,250 @@
+# A RELAX NG schema for Pandoc XML format.
+# Copyright : Copyright (C) 2025- Massimiliano Farinella
+# License : GNU GPL, version 2 or above
+# Maintainer : Massimiliano Farinella <[email protected]>
+#
+# This is a RELAX NG schema for the XML representation of Pandoc AST.
+# It's an equivalent of native and JSON formats, but modeled as XML.
+# You can use this schema to validate Pandoc XML documents.
+# It's translated from pandoc-xml.dtd with the "Trang" software by James Clark,
+# and adjusted manually to add some constraints:
+# - elements with Attr can have arbitrary attributes (this is not possible with a DTD)
+# - Header's "level", OrderedList's "start" and Cell's "rowspan" and "colspan" attributes
+# must be a positive integer and are equal to 1 if not specified
+# - column widths in ColSpec must be between 0 and 1 (inclusive, with 0=ColWidthDefault)
+# - the "count" attribute in the "<Space>" element must be positive and equal to 1 if not specified
+
+namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
+
+Pandoc = element Pandoc { attlist_Pandoc, meta, blocks }
+attlist_Pandoc &= attribute api-version { text }
+block =
+ Para
+ | Plain
+ | Header
+ | Div
+ | BlockQuote
+ | HorizontalRule
+ | BulletList
+ | OrderedList
+ | DefinitionList
+ | Table
+ | Figure
+ | LineBlock
+ | CodeBlock
+ | RawBlock
+inline_element =
+ Str
+ | Space
+ | Emph
+ | Strong
+ | Underline
+ | Strikeout
+ | Superscript
+ | Subscript
+ | SmallCaps
+ | Quoted
+ | Cite
+ | Code
+ | SoftBreak
+ | LineBreak
+ | Math
+ | RawInline
+ | Link
+ | Image
+ | Note
+ | Span
+inline = text | inline_element
+attr =
+ attribute id { xsd:ID }?,
+ attribute class { text }?,
+ attribute * { text }*
+metavalue =
+ MetaMap | MetaList | MetaBool | MetaString | MetaInlines | MetaBlocks
+meta = element meta { attlist_meta, entry* }
+attlist_meta &= empty
+MetaMap = element MetaMap { attlist_MetaMap, entry* }
+attlist_MetaMap &= empty
+entry = element entry { attlist_entry, metavalue* }
+attlist_entry &= attribute key { text }
+MetaList = element MetaList { attlist_MetaList, metavalue* }
+attlist_MetaList &= empty
+MetaBool = element MetaBool { attlist_MetaBool, empty }
+attlist_MetaBool &= attribute value { "true" | "false" }
+MetaString = element MetaString { attlist_MetaString, text }
+attlist_MetaString &= empty
+MetaInlines = element MetaInlines { attlist_MetaInlines, inline* }
+attlist_MetaInlines &= empty
+MetaBlocks = element MetaBlocks { attlist_MetaBlocks, block* }
+attlist_MetaBlocks &= empty
+blocks = element blocks { attlist_blocks, block* }
+attlist_blocks &= empty
+Para = element Para { attlist_Para, inline* }
+attlist_Para &= empty
+Plain = element Plain { attlist_Plain, inline* }
+attlist_Plain &= empty
+Header = element Header { attlist_Header, inline* }
+attlist_Header &=
+ [ a:defaultValue = "1" ] attribute level { xsd:positiveInteger }?,
+ attr
+Div = element Div { attlist_Div, block* }
+attlist_Div &= attr
+BlockQuote = element BlockQuote { attlist_BlockQuote, block* }
+attlist_BlockQuote &= empty
+HorizontalRule =
+ element HorizontalRule { attlist_HorizontalRule, empty }
+attlist_HorizontalRule &= empty
+BulletList = element BulletList { attlist_BulletList, item+ }
+attlist_BulletList &= empty
+OrderedList = element OrderedList { attlist_OrderedList, item+ }
+attlist_OrderedList &=
+ [ a:defaultValue = "1" ] attribute start { xsd:positiveInteger }?,
+ [ a:defaultValue = "DefaultStyle" ]
+ attribute number-style {
+ "DefaultStyle"
+ | "Example"
+ | "Decimal"
+ | "LowerRoman"
+ | "UpperRoman"
+ | "LowerAlpha"
+ | "UpperAlpha"
+ }?,
+ [ a:defaultValue = "DefaultDelim" ]
+ attribute number-delim {
+ "DefaultDelim" | "Period" | "OneParen" | "TwoParens"
+ }?
+DefinitionList =
+ element DefinitionList { attlist_DefinitionList, item+ }
+attlist_DefinitionList &= empty
+item =
+ element item {
+ attlist_item,
+ (block* | (term, def+))
+ }
+attlist_item &= empty
+term = element term { attlist_term, inline* }
+attlist_term &= empty
+def = element def { attlist_def, block* }
+attlist_def &= empty
+Table =
+ element Table {
+ attlist_Table, Caption, colspecs, TableHead, TableBody+, TableFoot
+ }
+attlist_Table &= attr
+Caption = element Caption { attlist_Caption, ShortCaption?, block* }
+attlist_Caption &= empty
+ShortCaption = element ShortCaption { attlist_ShortCaption, inline* }
+attlist_ShortCaption &= empty
+colspecs = element colspecs { attlist_colspecs, ColSpec+ }
+attlist_colspecs &= empty
+ColSpec = element ColSpec { attlist_ColSpec, empty }
+attlist_ColSpec &=
+ [ a:defaultValue = "AlignDefault" ]
+ attribute alignment {
+ "AlignLeft" | "AlignRight" | "AlignCenter" | "AlignDefault"
+ }?,
+ [ a:defaultValue = "0" ]
+ attribute col-width {
+ xsd:double { minInclusive = "0" maxInclusive = "1" }
+ }?
+TableHead = element TableHead { attlist_TableHead, Row* }
+attlist_TableHead &= attr
+TableFoot = element TableFoot { attlist_TableFoot, Row* }
+attlist_TableFoot &= attr
+TableBody = element TableBody { attlist_TableBody, header, body }
+attlist_TableBody &=
+ [ a:defaultValue = "0" ] attribute row-head-columns { text }?,
+ attr
+header = element header { attlist_header, Row* }
+attlist_header &= empty
+body = element body { attlist_body, Row* }
+attlist_body &= empty
+Row = element Row { attlist_Row, Cell* }
+attlist_Row &= attr
+Cell = element Cell { attlist_Cell, block* }
+attlist_Cell &=
+ [ a:defaultValue = "AlignDefault" ]
+ attribute alignment {
+ "AlignLeft" | "AlignRight" | "AlignCenter" | "AlignDefault"
+ }?,
+ [ a:defaultValue = "1" ] attribute row-span { xsd:positiveInteger }?,
+ [ a:defaultValue = "1" ] attribute col-span { xsd:positiveInteger }?,
+ attr
+Figure = element Figure { attlist_Figure, Caption, block* }
+attlist_Figure &= attr
+LineBlock = element LineBlock { attlist_LineBlock, line+ }
+attlist_LineBlock &= empty
+line = element line { attlist_line, inline* }
+attlist_line &= empty
+CodeBlock = element CodeBlock { attlist_CodeBlock, text }
+attlist_CodeBlock &= attr
+RawBlock = element RawBlock { attlist_RawBlock, text }
+attlist_RawBlock &= attribute format { text }
+Space = element Space { attlist_Space, empty }
+attlist_Space &=
+ [ a:defaultValue = "1" ] attribute count { xsd:positiveInteger }?
+Str = element Str { attlist_Str, empty }
+attlist_Str &= [ a:defaultValue = "" ] attribute content { text }?
+Emph = element Emph { attlist_Emph, inline* }
+attlist_Emph &= empty
+Strong = element Strong { attlist_Strong, inline* }
+attlist_Strong &= empty
+Underline = element Underline { attlist_Underline, inline* }
+attlist_Underline &= empty
+Strikeout = element Strikeout { attlist_Strikeout, inline* }
+attlist_Strikeout &= empty
+Superscript = element Superscript { attlist_Superscript, inline* }
+attlist_Superscript &= empty
+Subscript = element Subscript { attlist_Subscript, inline* }
+attlist_Subscript &= empty
+SmallCaps = element SmallCaps { attlist_SmallCaps, inline* }
+attlist_SmallCaps &= empty
+Span = element Span { attlist_Span, inline* }
+attlist_Span &= attr
+Quoted = element Quoted { attlist_Quoted, inline* }
+attlist_Quoted &=
+ [ a:defaultValue = "DoubleQuote" ]
+ attribute quote-type { "SingleQuote" | "DoubleQuote" }?
+Math = element Math { attlist_Math, text }
+attlist_Math &=
+ [ a:defaultValue = "InlineMath" ]
+ attribute math-type { "DisplayMath" | "InlineMath" }?
+RawInline = element RawInline { attlist_RawInline, text }
+attlist_RawInline &= attribute format { text }
+Cite =
+ element Cite { attlist_Cite, (text | citations | inline_element)* }
+attlist_Cite &= empty
+citations = element citations { attlist_citations, Citation+ }
+attlist_citations &= empty
+Citation = element Citation { attlist_Citation, prefix?, suffix? }
+prefix = element prefix { attlist_prefix, inline* }
+attlist_prefix &= empty
+suffix = element suffix { attlist_suffix, inline* }
+attlist_suffix &= empty
+attlist_Citation &=
+ attribute id { text }?,
+ attribute note-num { text }?,
+ [ a:defaultValue = "0" ] attribute hash { text }?,
+ [ a:defaultValue = "AuthorInText" ]
+ attribute mode {
+ "AuthorInText" | "SuppressAuthor" | "NormalCitation"
+ }?
+Code = element Code { attlist_Code, text }
+attlist_Code &= attr
+Image = element Image { attlist_Image, inline* }
+attlist_Image &=
+ attribute title { text }?,
+ attribute src { text }?,
+ attr
+Link = element Link { attlist_Link, inline* }
+attlist_Link &=
+ attribute title { text }?,
+ attribute href { text }?,
+ attr
+SoftBreak = element SoftBreak { attlist_SoftBreak, empty }
+attlist_SoftBreak &= empty
+LineBreak = element LineBreak { attlist_LineBreak, empty }
+attlist_LineBreak &= empty
+Note = element Note { attlist_Note, block* }
+attlist_Note &= empty
+start = Pandoc
diff --git a/tools/pandoc-xml.rng b/tools/pandoc-xml.rng
new file mode 100644
index 000000000..38188f781
--- /dev/null
+++ b/tools/pandoc-xml.rng
@@ -0,0 +1,913 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+A RELAX NG schema for Pandoc XML format.
+Copyright : Copyright (C) 2025- Massimiliano Farinella
+License : GNU GPL, version 2 or above
+Maintainer : Massimiliano Farinella <[email protected]>
+
+This is a RELAX NG schema for the XML representation of Pandoc AST.
+It's an equivalent of native and JSON formats, but modeled as XML.
+You can use this schema to validate Pandoc XML documents.
+It's translated from pandoc-xml.dtd with the "Trang" software by James Clark,
+and adjusted manually to add some constraints:
+- elements with Attr can have arbitrary attributes (this is not possible with a DTD)
+- Header's "level", OrderedList's "start" and Cell's "rowspan" and "colspan" attributes
+ must be a positive integer and are equal to 1 if not specified
+- column widths in ColSpec must be between 0 and 1 (inclusive, with 0=ColWidthDefault)
+- the "count" attribute in the "<Space>" element must be positive and equal to 1 if not specified
+-->
+<grammar xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
+ xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
+ <define name="Pandoc">
+ <element name="Pandoc">
+ <ref name="attlist_Pandoc" />
+ <ref name="meta" />
+ <ref name="blocks" />
+ </element>
+ </define>
+ <define name="attlist_Pandoc" combine="interleave">
+ <attribute name="api-version" />
+ </define>
+ <define name="block">
+ <choice>
+ <ref name="Para" />
+ <ref name="Plain" />
+ <ref name="Header" />
+ <ref name="Div" />
+ <ref name="BlockQuote" />
+ <ref name="HorizontalRule" />
+ <ref name="BulletList" />
+ <ref name="OrderedList" />
+ <ref name="DefinitionList" />
+ <ref name="Table" />
+ <ref name="Figure" />
+ <ref name="LineBlock" />
+ <ref name="CodeBlock" />
+ <ref name="RawBlock" />
+ </choice>
+ </define>
+ <define name="inline_element">
+ <choice>
+ <ref name="Str" />
+ <ref name="Space" />
+ <ref name="Emph" />
+ <ref name="Strong" />
+ <ref name="Underline" />
+ <ref name="Strikeout" />
+ <ref name="Superscript" />
+ <ref name="Subscript" />
+ <ref name="SmallCaps" />
+ <ref name="Quoted" />
+ <ref name="Cite" />
+ <ref name="Code" />
+ <ref name="SoftBreak" />
+ <ref name="LineBreak" />
+ <ref name="Math" />
+ <ref name="RawInline" />
+ <ref name="Link" />
+ <ref name="Image" />
+ <ref name="Note" />
+ <ref name="Span" />
+ </choice>
+ </define>
+ <define name="inline">
+ <choice>
+ <text />
+ <ref name="inline_element" />
+ </choice>
+ </define>
+ <define name="attr">
+ <optional>
+ <attribute name="id">
+ <data type="ID" />
+ </attribute>
+ </optional>
+ <optional>
+ <attribute name="class" />
+ </optional>
+ <zeroOrMore>
+ <attribute>
+ <anyName />
+ </attribute>
+ </zeroOrMore>
+ </define>
+ <define name="metavalue">
+ <choice>
+ <ref name="MetaMap" />
+ <ref name="MetaList" />
+ <ref name="MetaBool" />
+ <ref name="MetaString" />
+ <ref name="MetaInlines" />
+ <ref name="MetaBlocks" />
+ </choice>
+ </define>
+ <define name="meta">
+ <element name="meta">
+ <ref name="attlist_meta" />
+ <zeroOrMore>
+ <ref name="entry" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_meta" combine="interleave">
+ <empty />
+ </define>
+ <define name="MetaMap">
+ <element name="MetaMap">
+ <ref name="attlist_MetaMap" />
+ <zeroOrMore>
+ <ref name="entry" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_MetaMap" combine="interleave">
+ <empty />
+ </define>
+ <define name="entry">
+ <element name="entry">
+ <ref name="attlist_entry" />
+ <zeroOrMore>
+ <ref name="metavalue" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_entry" combine="interleave">
+ <attribute name="key" />
+ </define>
+ <define name="MetaList">
+ <element name="MetaList">
+ <ref name="attlist_MetaList" />
+ <zeroOrMore>
+ <ref name="metavalue" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_MetaList" combine="interleave">
+ <empty />
+ </define>
+ <define name="MetaBool">
+ <element name="MetaBool">
+ <ref name="attlist_MetaBool" />
+ <empty />
+ </element>
+ </define>
+ <define name="attlist_MetaBool" combine="interleave">
+ <attribute name="value">
+ <choice>
+ <value>true</value>
+ <value>false</value>
+ </choice>
+ </attribute>
+ </define>
+ <define name="MetaString">
+ <element name="MetaString">
+ <ref name="attlist_MetaString" />
+ <text />
+ </element>
+ </define>
+ <define name="attlist_MetaString" combine="interleave">
+ <empty />
+ </define>
+ <define name="MetaInlines">
+ <element name="MetaInlines">
+ <ref name="attlist_MetaInlines" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_MetaInlines" combine="interleave">
+ <empty />
+ </define>
+ <define name="MetaBlocks">
+ <element name="MetaBlocks">
+ <ref name="attlist_MetaBlocks" />
+ <zeroOrMore>
+ <ref name="block" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_MetaBlocks" combine="interleave">
+ <empty />
+ </define>
+ <define name="blocks">
+ <element name="blocks">
+ <ref name="attlist_blocks" />
+ <zeroOrMore>
+ <ref name="block" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_blocks" combine="interleave">
+ <empty />
+ </define>
+ <define name="Para">
+ <element name="Para">
+ <ref name="attlist_Para" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Para" combine="interleave">
+ <empty />
+ </define>
+ <define name="Plain">
+ <element name="Plain">
+ <ref name="attlist_Plain" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Plain" combine="interleave">
+ <empty />
+ </define>
+ <define name="Header">
+ <element name="Header">
+ <ref name="attlist_Header" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Header" combine="interleave">
+ <optional>
+ <attribute name="level" a:defaultValue="1">
+ <data type="positiveInteger" />
+ </attribute>
+ </optional>
+ <ref name="attr" />
+ </define>
+ <define name="Div">
+ <element name="Div">
+ <ref name="attlist_Div" />
+ <zeroOrMore>
+ <ref name="block" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Div" combine="interleave">
+ <ref name="attr" />
+ </define>
+ <define name="BlockQuote">
+ <element name="BlockQuote">
+ <ref name="attlist_BlockQuote" />
+ <zeroOrMore>
+ <ref name="block" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_BlockQuote" combine="interleave">
+ <empty />
+ </define>
+ <define name="HorizontalRule">
+ <element name="HorizontalRule">
+ <ref name="attlist_HorizontalRule" />
+ <empty />
+ </element>
+ </define>
+ <define name="attlist_HorizontalRule" combine="interleave">
+ <empty />
+ </define>
+ <define name="BulletList">
+ <element name="BulletList">
+ <ref name="attlist_BulletList" />
+ <oneOrMore>
+ <ref name="item" />
+ </oneOrMore>
+ </element>
+ </define>
+ <define name="attlist_BulletList" combine="interleave">
+ <empty />
+ </define>
+ <define name="OrderedList">
+ <element name="OrderedList">
+ <ref name="attlist_OrderedList" />
+ <oneOrMore>
+ <ref name="item" />
+ </oneOrMore>
+ </element>
+ </define>
+ <define name="attlist_OrderedList" combine="interleave">
+ <optional>
+ <attribute name="start" a:defaultValue="1">
+ <data type="positiveInteger" />
+ </attribute>
+ </optional>
+ <optional>
+ <attribute name="number-style" a:defaultValue="DefaultStyle">
+ <choice>
+ <value>DefaultStyle</value>
+ <value>Example</value>
+ <value>Decimal</value>
+ <value>LowerRoman</value>
+ <value>UpperRoman</value>
+ <value>LowerAlpha</value>
+ <value>UpperAlpha</value>
+ </choice>
+ </attribute>
+ </optional>
+ <optional>
+ <attribute name="number-delim" a:defaultValue="DefaultDelim">
+ <choice>
+ <value>DefaultDelim</value>
+ <value>Period</value>
+ <value>OneParen</value>
+ <value>TwoParens</value>
+ </choice>
+ </attribute>
+ </optional>
+ </define>
+ <define name="DefinitionList">
+ <element name="DefinitionList">
+ <ref name="attlist_DefinitionList" />
+ <oneOrMore>
+ <ref name="item" />
+ </oneOrMore>
+ </element>
+ </define>
+ <define name="attlist_DefinitionList" combine="interleave">
+ <empty />
+ </define>
+ <define name="item">
+ <element name="item">
+ <ref name="attlist_item" />
+ <choice>
+ <zeroOrMore>
+ <ref name="block" />
+ </zeroOrMore>
+ <group>
+ <ref name="term" />
+ <oneOrMore>
+ <ref name="def" />
+ </oneOrMore>
+ </group>
+ </choice>
+ </element>
+ </define>
+ <define name="attlist_item" combine="interleave">
+ <empty />
+ </define>
+ <define name="term">
+ <element name="term">
+ <ref name="attlist_term" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_term" combine="interleave">
+ <empty />
+ </define>
+ <define name="def">
+ <element name="def">
+ <ref name="attlist_def" />
+ <zeroOrMore>
+ <ref name="block" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_def" combine="interleave">
+ <empty />
+ </define>
+ <define name="Table">
+ <element name="Table">
+ <ref name="attlist_Table" />
+ <ref name="Caption" />
+ <ref name="colspecs" />
+ <ref name="TableHead" />
+ <oneOrMore>
+ <ref name="TableBody" />
+ </oneOrMore>
+ <ref name="TableFoot" />
+ </element>
+ </define>
+ <define name="attlist_Table" combine="interleave">
+ <ref name="attr" />
+ </define>
+ <define name="Caption">
+ <element name="Caption">
+ <ref name="attlist_Caption" />
+ <optional>
+ <ref name="ShortCaption" />
+ </optional>
+ <zeroOrMore>
+ <ref name="block" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Caption" combine="interleave">
+ <empty />
+ </define>
+ <define name="ShortCaption">
+ <element name="ShortCaption">
+ <ref name="attlist_ShortCaption" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_ShortCaption" combine="interleave">
+ <empty />
+ </define>
+ <define name="colspecs">
+ <element name="colspecs">
+ <ref name="attlist_colspecs" />
+ <oneOrMore>
+ <ref name="ColSpec" />
+ </oneOrMore>
+ </element>
+ </define>
+ <define name="attlist_colspecs" combine="interleave">
+ <empty />
+ </define>
+ <define name="ColSpec">
+ <element name="ColSpec">
+ <ref name="attlist_ColSpec" />
+ <empty />
+ </element>
+ </define>
+ <define name="attlist_ColSpec" combine="interleave">
+ <optional>
+ <attribute name="alignment" a:defaultValue="AlignDefault">
+ <choice>
+ <value>AlignLeft</value>
+ <value>AlignRight</value>
+ <value>AlignCenter</value>
+ <value>AlignDefault</value>
+ </choice>
+ </attribute>
+ </optional>
+ <optional>
+ <attribute name="col-width" a:defaultValue="0">
+ <data type="double">
+ <param name="minInclusive">0</param>
+ <param name="maxInclusive">1</param>
+ </data>
+ </attribute>
+ </optional>
+ </define>
+ <define name="TableHead">
+ <element name="TableHead">
+ <ref name="attlist_TableHead" />
+ <zeroOrMore>
+ <ref name="Row" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_TableHead" combine="interleave">
+ <ref name="attr" />
+ </define>
+ <define name="TableFoot">
+ <element name="TableFoot">
+ <ref name="attlist_TableFoot" />
+ <zeroOrMore>
+ <ref name="Row" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_TableFoot" combine="interleave">
+ <ref name="attr" />
+ </define>
+ <define name="TableBody">
+ <element name="TableBody">
+ <ref name="attlist_TableBody" />
+ <ref name="header" />
+ <ref name="body" />
+ </element>
+ </define>
+ <define name="attlist_TableBody" combine="interleave">
+ <optional>
+ <attribute name="row-head-columns" a:defaultValue="0" />
+ </optional>
+ <ref name="attr" />
+ </define>
+ <define name="header">
+ <element name="header">
+ <ref name="attlist_header" />
+ <zeroOrMore>
+ <ref name="Row" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_header" combine="interleave">
+ <empty />
+ </define>
+ <define name="body">
+ <element name="body">
+ <ref name="attlist_body" />
+ <zeroOrMore>
+ <ref name="Row" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_body" combine="interleave">
+ <empty />
+ </define>
+ <define name="Row">
+ <element name="Row">
+ <ref name="attlist_Row" />
+ <zeroOrMore>
+ <ref name="Cell" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Row" combine="interleave">
+ <ref name="attr" />
+ </define>
+ <define name="Cell">
+ <element name="Cell">
+ <ref name="attlist_Cell" />
+ <zeroOrMore>
+ <ref name="block" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Cell" combine="interleave">
+ <optional>
+ <attribute name="alignment" a:defaultValue="AlignDefault">
+ <choice>
+ <value>AlignLeft</value>
+ <value>AlignRight</value>
+ <value>AlignCenter</value>
+ <value>AlignDefault</value>
+ </choice>
+ </attribute>
+ </optional>
+ <optional>
+ <attribute name="row-span" a:defaultValue="1">
+ <data type="positiveInteger" />
+ </attribute>
+ </optional>
+ <optional>
+ <attribute name="col-span" a:defaultValue="1">
+ <data type="positiveInteger" />
+ </attribute>
+ </optional>
+ <ref name="attr" />
+ </define>
+ <define name="Figure">
+ <element name="Figure">
+ <ref name="attlist_Figure" />
+ <ref name="Caption" />
+ <zeroOrMore>
+ <ref name="block" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Figure" combine="interleave">
+ <ref name="attr" />
+ </define>
+ <define name="LineBlock">
+ <element name="LineBlock">
+ <ref name="attlist_LineBlock" />
+ <oneOrMore>
+ <ref name="line" />
+ </oneOrMore>
+ </element>
+ </define>
+ <define name="attlist_LineBlock" combine="interleave">
+ <empty />
+ </define>
+ <define name="line">
+ <element name="line">
+ <ref name="attlist_line" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_line" combine="interleave">
+ <empty />
+ </define>
+ <define name="CodeBlock">
+ <element name="CodeBlock">
+ <ref name="attlist_CodeBlock" />
+ <text />
+ </element>
+ </define>
+ <define name="attlist_CodeBlock" combine="interleave">
+ <ref name="attr" />
+ </define>
+ <define name="RawBlock">
+ <element name="RawBlock">
+ <ref name="attlist_RawBlock" />
+ <text />
+ </element>
+ </define>
+ <define name="attlist_RawBlock" combine="interleave">
+ <attribute name="format" />
+ </define>
+ <define name="Space">
+ <element name="Space">
+ <ref name="attlist_Space" />
+ <empty />
+ </element>
+ </define>
+ <define name="attlist_Space" combine="interleave">
+ <optional>
+ <attribute name="count" a:defaultValue="1">
+ <data type="positiveInteger" />
+ </attribute>
+ </optional>
+ </define>
+ <define name="Str">
+ <element name="Str">
+ <ref name="attlist_Str" />
+ <empty />
+ </element>
+ </define>
+ <define name="attlist_Str" combine="interleave">
+ <optional>
+ <attribute name="content" a:defaultValue="" />
+ </optional>
+ </define>
+ <define name="Emph">
+ <element name="Emph">
+ <ref name="attlist_Emph" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Emph" combine="interleave">
+ <empty />
+ </define>
+ <define name="Strong">
+ <element name="Strong">
+ <ref name="attlist_Strong" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Strong" combine="interleave">
+ <empty />
+ </define>
+ <define name="Underline">
+ <element name="Underline">
+ <ref name="attlist_Underline" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Underline" combine="interleave">
+ <empty />
+ </define>
+ <define name="Strikeout">
+ <element name="Strikeout">
+ <ref name="attlist_Strikeout" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Strikeout" combine="interleave">
+ <empty />
+ </define>
+ <define name="Superscript">
+ <element name="Superscript">
+ <ref name="attlist_Superscript" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Superscript" combine="interleave">
+ <empty />
+ </define>
+ <define name="Subscript">
+ <element name="Subscript">
+ <ref name="attlist_Subscript" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Subscript" combine="interleave">
+ <empty />
+ </define>
+ <define name="SmallCaps">
+ <element name="SmallCaps">
+ <ref name="attlist_SmallCaps" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_SmallCaps" combine="interleave">
+ <empty />
+ </define>
+ <define name="Span">
+ <element name="Span">
+ <ref name="attlist_Span" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Span" combine="interleave">
+ <ref name="attr" />
+ </define>
+ <define name="Quoted">
+ <element name="Quoted">
+ <ref name="attlist_Quoted" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Quoted" combine="interleave">
+ <optional>
+ <attribute name="quote-type" a:defaultValue="DoubleQuote">
+ <choice>
+ <value>SingleQuote</value>
+ <value>DoubleQuote</value>
+ </choice>
+ </attribute>
+ </optional>
+ </define>
+ <define name="Math">
+ <element name="Math">
+ <ref name="attlist_Math" />
+ <text />
+ </element>
+ </define>
+ <define name="attlist_Math" combine="interleave">
+ <optional>
+ <attribute name="math-type" a:defaultValue="InlineMath">
+ <choice>
+ <value>DisplayMath</value>
+ <value>InlineMath</value>
+ </choice>
+ </attribute>
+ </optional>
+ </define>
+ <define name="RawInline">
+ <element name="RawInline">
+ <ref name="attlist_RawInline" />
+ <text />
+ </element>
+ </define>
+ <define name="attlist_RawInline" combine="interleave">
+ <attribute name="format" />
+ </define>
+ <define name="Cite">
+ <element name="Cite">
+ <ref name="attlist_Cite" />
+ <zeroOrMore>
+ <choice>
+ <text />
+ <ref name="citations" />
+ <ref name="inline_element" />
+ </choice>
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Cite" combine="interleave">
+ <empty />
+ </define>
+ <define name="citations">
+ <element name="citations">
+ <ref name="attlist_citations" />
+ <oneOrMore>
+ <ref name="Citation" />
+ </oneOrMore>
+ </element>
+ </define>
+ <define name="attlist_citations" combine="interleave">
+ <empty />
+ </define>
+ <define name="Citation">
+ <element name="Citation">
+ <ref name="attlist_Citation" />
+ <optional>
+ <ref name="prefix" />
+ </optional>
+ <optional>
+ <ref name="suffix" />
+ </optional>
+ </element>
+ </define>
+ <define name="prefix">
+ <element name="prefix">
+ <ref name="attlist_prefix" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_prefix" combine="interleave">
+ <empty />
+ </define>
+ <define name="suffix">
+ <element name="suffix">
+ <ref name="attlist_suffix" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_suffix" combine="interleave">
+ <empty />
+ </define>
+ <define name="attlist_Citation" combine="interleave">
+ <optional>
+ <attribute name="id" />
+ </optional>
+ <optional>
+ <attribute name="note-num" />
+ </optional>
+ <optional>
+ <attribute name="hash" a:defaultValue="0" />
+ </optional>
+ <optional>
+ <attribute name="mode" a:defaultValue="AuthorInText">
+ <choice>
+ <value>AuthorInText</value>
+ <value>SuppressAuthor</value>
+ <value>NormalCitation</value>
+ </choice>
+ </attribute>
+ </optional>
+ </define>
+ <define name="Code">
+ <element name="Code">
+ <ref name="attlist_Code" />
+ <text />
+ </element>
+ </define>
+ <define name="attlist_Code" combine="interleave">
+ <ref name="attr" />
+ </define>
+ <define name="Image">
+ <element name="Image">
+ <ref name="attlist_Image" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Image" combine="interleave">
+ <optional>
+ <attribute name="title"/>
+ </optional>
+ <optional>
+ <attribute name="src"/>
+ </optional>
+ <ref name="attr"/>
+ </define>
+ <define name="Link">
+ <element name="Link">
+ <ref name="attlist_Link" />
+ <zeroOrMore>
+ <ref name="inline" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Link" combine="interleave">
+ <optional>
+ <attribute name="title"/>
+ </optional>
+ <optional>
+ <attribute name="href"/>
+ </optional>
+ <ref name="attr"/>
+ </define>
+ <define name="SoftBreak">
+ <element name="SoftBreak">
+ <ref name="attlist_SoftBreak" />
+ <empty />
+ </element>
+ </define>
+ <define name="attlist_SoftBreak" combine="interleave">
+ <empty />
+ </define>
+ <define name="LineBreak">
+ <element name="LineBreak">
+ <ref name="attlist_LineBreak" />
+ <empty />
+ </element>
+ </define>
+ <define name="attlist_LineBreak" combine="interleave">
+ <empty />
+ </define>
+ <define name="Note">
+ <element name="Note">
+ <ref name="attlist_Note" />
+ <zeroOrMore>
+ <ref name="block" />
+ </zeroOrMore>
+ </element>
+ </define>
+ <define name="attlist_Note" combine="interleave">
+ <empty />
+ </define>
+ <start>
+ <choice>
+ <ref name="Pandoc" />
+ </choice>
+ </start>
+</grammar>
diff --git a/tools/pandoc-xml.xsd b/tools/pandoc-xml.xsd
new file mode 100644
index 000000000..01834f80f
--- /dev/null
+++ b/tools/pandoc-xml.xsd
@@ -0,0 +1,602 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+A XML Schema definition for Pandoc XML format.
+Copyright : Copyright (C) 2025- Massimiliano Farinella
+License : GNU GPL, version 2 or above
+Maintainer : Massimiliano Farinella <[email protected]>
+
+This is a XML Schema schema for the XML representation of Pandoc AST.
+It's an equivalent of native and JSON formats, but modeled as XML.
+You can use this schema to validate Pandoc XML documents.
+It's translated from pandoc-xml.dtd with the "Trang" software by James Clark,
+and adjusted manually to add some constraints:
+- elements with Attr can have arbitrary attributes (this is not possible with a DTD)
+- Header's "level", OrderedList's "start" and Cell's "rowspan" and "colspan" attributes
+ must be a positive integer and are equal to 1 if not specified
+- column widths in ColSpec must be between 0 and 1 (inclusive, with 0=ColWidthDefault)
+- the "count" attribute in the "<Space>" element must be a positive integer,
+ equal to 1 if not specified
+- OrderedList's number style and delimiter, ColSpec's and Cell's alignment,
+ Quoted's QuoteType and Math's Mathtype
+ can only take the values specified in pandoc-types
+- TableBody's row-head-columns must be zero (default, when not specified) or a positive integer
+-->
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
+ <xs:element name="Pandoc">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="meta"/>
+ <xs:element ref="blocks"/>
+ </xs:sequence>
+ <xs:attributeGroup ref="attlist_Pandoc"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_Pandoc">
+ <xs:attribute name="api-version" use="required"/>
+ </xs:attributeGroup>
+ <xs:element name="block" abstract="true"/>
+ <xs:element name="inline_element" abstract="true"/>
+ <xs:group name="inline">
+ <xs:sequence>
+ <xs:element minOccurs="0" ref="inline_element"/>
+ </xs:sequence>
+ </xs:group>
+ <xs:attributeGroup name="attr">
+ <xs:attribute name="id" type="xs:ID"/>
+ <xs:attribute name="class"/>
+ <xs:anyAttribute processContents="skip"/>
+ </xs:attributeGroup>
+ <xs:element name="metavalue" abstract="true"/>
+ <xs:element name="meta">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="entry"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="MetaMap" substitutionGroup="metavalue">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="entry"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="entry">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="metavalue"/>
+ </xs:sequence>
+ <xs:attributeGroup ref="attlist_entry"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_entry">
+ <xs:attribute name="key" use="required"/>
+ </xs:attributeGroup>
+ <xs:element name="MetaList" substitutionGroup="metavalue">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="metavalue"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="MetaBool" substitutionGroup="metavalue">
+ <xs:complexType>
+ <xs:attributeGroup ref="attlist_MetaBool"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_MetaBool">
+ <xs:attribute name="value" use="required">
+ <xs:simpleType>
+ <xs:restriction base="xs:token">
+ <xs:enumeration value="true"/>
+ <xs:enumeration value="false"/>
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ </xs:attributeGroup>
+ <xs:element name="MetaString" substitutionGroup="metavalue" type="xs:string"/>
+ <xs:element name="MetaInlines" substitutionGroup="metavalue">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="MetaBlocks" substitutionGroup="metavalue">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="blocks">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="Para" substitutionGroup="block">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="Plain" substitutionGroup="block">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="Header" substitutionGroup="block">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ <xs:attributeGroup ref="attlist_Header"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_Header">
+ <xs:attribute name="level" default="1" type="xs:positiveInteger"/>
+ <xs:attributeGroup ref="attr"/>
+ </xs:attributeGroup>
+ <xs:element name="Div" substitutionGroup="block">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/>
+ </xs:sequence>
+ <xs:attributeGroup ref="attlist_Div"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_Div">
+ <xs:attributeGroup ref="attr"/>
+ </xs:attributeGroup>
+ <xs:element name="BlockQuote" substitutionGroup="block">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="HorizontalRule" substitutionGroup="block">
+ <xs:complexType/>
+ </xs:element>
+ <xs:element name="BulletList" substitutionGroup="block">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element maxOccurs="unbounded" ref="item"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="OrderedList" substitutionGroup="block">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element maxOccurs="unbounded" ref="item"/>
+ </xs:sequence>
+ <xs:attributeGroup ref="attlist_OrderedList"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_OrderedList">
+ <xs:attribute name="start" default="1" type="xs:positiveInteger"/>
+ <xs:attribute name="number-style" default="DefaultStyle">
+ <xs:simpleType>
+ <xs:restriction base="xs:token">
+ <xs:enumeration value="DefaultStyle"/>
+ <xs:enumeration value="Example"/>
+ <xs:enumeration value="Decimal"/>
+ <xs:enumeration value="LowerRoman"/>
+ <xs:enumeration value="UpperRoman"/>
+ <xs:enumeration value="LowerAlpha"/>
+ <xs:enumeration value="UpperAlpha"/>
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ <xs:attribute name="number-delim" default="DefaultDelim">
+ <xs:simpleType>
+ <xs:restriction base="xs:token">
+ <xs:enumeration value="DefaultDelim"/>
+ <xs:enumeration value="Period"/>
+ <xs:enumeration value="OneParen"/>
+ <xs:enumeration value="TwoParens"/>
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ </xs:attributeGroup>
+ <xs:element name="DefinitionList" substitutionGroup="block">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element maxOccurs="unbounded" ref="item"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="item">
+ <xs:complexType>
+ <xs:choice>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/>
+ <xs:sequence>
+ <xs:element ref="term"/>
+ <xs:element maxOccurs="unbounded" ref="def"/>
+ </xs:sequence>
+ </xs:choice>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="term">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="def">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="Table" substitutionGroup="block">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="Caption"/>
+ <xs:element ref="colspecs"/>
+ <xs:element ref="TableHead"/>
+ <xs:element maxOccurs="unbounded" ref="TableBody"/>
+ <xs:element ref="TableFoot"/>
+ </xs:sequence>
+ <xs:attributeGroup ref="attlist_Table"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_Table">
+ <xs:attributeGroup ref="attr"/>
+ </xs:attributeGroup>
+ <xs:element name="Caption">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" ref="ShortCaption"/>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="ShortCaption">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="colspecs">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element maxOccurs="unbounded" ref="ColSpec"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="ColSpec">
+ <xs:complexType>
+ <xs:attributeGroup ref="attlist_ColSpec"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_ColSpec">
+ <xs:attribute name="alignment" default="AlignDefault">
+ <xs:simpleType>
+ <xs:restriction base="xs:token">
+ <xs:enumeration value="AlignLeft"/>
+ <xs:enumeration value="AlignRight"/>
+ <xs:enumeration value="AlignCenter"/>
+ <xs:enumeration value="AlignDefault"/>
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ <xs:attribute name="col-width" default="0">
+ <xs:simpleType>
+ <xs:restriction base="xs:double">
+ <xs:minInclusive value="0"/>
+ <xs:maxInclusive value="1"/>
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ </xs:attributeGroup>
+ <xs:element name="TableHead">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="Row"/>
+ </xs:sequence>
+ <xs:attributeGroup ref="attlist_TableHead"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_TableHead">
+ <xs:attributeGroup ref="attr"/>
+ </xs:attributeGroup>
+ <xs:element name="TableFoot">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="Row"/>
+ </xs:sequence>
+ <xs:attributeGroup ref="attlist_TableFoot"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_TableFoot">
+ <xs:attributeGroup ref="attr"/>
+ </xs:attributeGroup>
+ <xs:element name="TableBody">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="header"/>
+ <xs:element ref="body"/>
+ </xs:sequence>
+ <xs:attributeGroup ref="attlist_TableBody"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_TableBody">
+ <xs:attribute name="row-head-columns" default="0"/>
+ <xs:attributeGroup ref="attr"/>
+ </xs:attributeGroup>
+ <xs:element name="header">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="Row"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="body">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="Row"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="Row">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="Cell"/>
+ </xs:sequence>
+ <xs:attributeGroup ref="attlist_Row"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_Row">
+ <xs:attributeGroup ref="attr"/>
+ </xs:attributeGroup>
+ <xs:element name="Cell">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/>
+ </xs:sequence>
+ <xs:attributeGroup ref="attlist_Cell"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_Cell">
+ <xs:attribute name="alignment" default="AlignDefault">
+ <xs:simpleType>
+ <xs:restriction base="xs:token">
+ <xs:enumeration value="AlignLeft"/>
+ <xs:enumeration value="AlignRight"/>
+ <xs:enumeration value="AlignCenter"/>
+ <xs:enumeration value="AlignDefault"/>
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ <xs:attribute name="row-span" default="1" type="xs:positiveInteger"/>
+ <xs:attribute name="col-span" default="1" type="xs:positiveInteger"/>
+ <xs:attributeGroup ref="attr"/>
+ </xs:attributeGroup>
+ <xs:element name="Figure" substitutionGroup="block">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="Caption"/>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/>
+ </xs:sequence>
+ <xs:attributeGroup ref="attlist_Figure"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_Figure">
+ <xs:attributeGroup ref="attr"/>
+ </xs:attributeGroup>
+ <xs:element name="LineBlock" substitutionGroup="block">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element maxOccurs="unbounded" ref="line"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="line">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="CodeBlock" substitutionGroup="block">
+ <xs:complexType mixed="true">
+ <xs:attributeGroup ref="attlist_CodeBlock"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_CodeBlock">
+ <xs:attributeGroup ref="attr"/>
+ </xs:attributeGroup>
+ <xs:element name="RawBlock" substitutionGroup="block">
+ <xs:complexType mixed="true">
+ <xs:attributeGroup ref="attlist_RawBlock"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_RawBlock">
+ <xs:attribute name="format" use="required"/>
+ </xs:attributeGroup>
+ <xs:element name="Space" substitutionGroup="inline_element">
+ <xs:complexType>
+ <xs:attributeGroup ref="attlist_Space"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_Space">
+ <xs:attribute name="count" default="1" type="xs:positiveInteger"/>
+ </xs:attributeGroup>
+ <xs:element name="Str" substitutionGroup="inline_element">
+ <xs:complexType>
+ <xs:attributeGroup ref="attlist_Str"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_Str">
+ <xs:attribute name="content" default=""/>
+ </xs:attributeGroup>
+ <xs:element name="Emph" substitutionGroup="inline_element">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="Strong" substitutionGroup="inline_element">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="Underline" substitutionGroup="inline_element">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="Strikeout" substitutionGroup="inline_element">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="Superscript" substitutionGroup="inline_element">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="Subscript" substitutionGroup="inline_element">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="SmallCaps" substitutionGroup="inline_element">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="Span" substitutionGroup="inline_element">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ <xs:attributeGroup ref="attlist_Span"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_Span">
+ <xs:attributeGroup ref="attr"/>
+ </xs:attributeGroup>
+ <xs:element name="Quoted" substitutionGroup="inline_element">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ <xs:attributeGroup ref="attlist_Quoted"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_Quoted">
+ <xs:attribute name="quote-type" default="DoubleQuote">
+ <xs:simpleType>
+ <xs:restriction base="xs:token">
+ <xs:enumeration value="SingleQuote"/>
+ <xs:enumeration value="DoubleQuote"/>
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ </xs:attributeGroup>
+ <xs:element name="Math" substitutionGroup="inline_element">
+ <xs:complexType mixed="true">
+ <xs:attributeGroup ref="attlist_Math"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_Math">
+ <xs:attribute name="math-type" default="InlineMath">
+ <xs:simpleType>
+ <xs:restriction base="xs:token">
+ <xs:enumeration value="DisplayMath"/>
+ <xs:enumeration value="InlineMath"/>
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ </xs:attributeGroup>
+ <xs:element name="RawInline" substitutionGroup="inline_element">
+ <xs:complexType mixed="true">
+ <xs:attributeGroup ref="attlist_RawInline"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_RawInline">
+ <xs:attribute name="format" use="required"/>
+ </xs:attributeGroup>
+ <xs:element name="Cite" substitutionGroup="inline_element">
+ <xs:complexType mixed="true">
+ <xs:choice minOccurs="0" maxOccurs="unbounded">
+ <xs:element ref="citations"/>
+ <xs:element ref="inline_element"/>
+ </xs:choice>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="citations">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element maxOccurs="unbounded" ref="Citation"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="Citation">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" ref="prefix"/>
+ <xs:element minOccurs="0" ref="suffix"/>
+ </xs:sequence>
+ <xs:attributeGroup ref="attlist_Citation"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="prefix">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="suffix">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_Citation">
+ <xs:attribute name="id"/>
+ <xs:attribute name="note-num"/>
+ <xs:attribute name="hash" default="0"/>
+ <xs:attribute name="mode" default="AuthorInText">
+ <xs:simpleType>
+ <xs:restriction base="xs:token">
+ <xs:enumeration value="AuthorInText"/>
+ <xs:enumeration value="SuppressAuthor"/>
+ <xs:enumeration value="NormalCitation"/>
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:attribute>
+ </xs:attributeGroup>
+ <xs:element name="Code" substitutionGroup="inline_element">
+ <xs:complexType mixed="true">
+ <xs:attributeGroup ref="attlist_Code"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_Code">
+ <xs:attributeGroup ref="attr"/>
+ </xs:attributeGroup>
+ <xs:element name="Image" substitutionGroup="inline_element">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ <xs:attributeGroup ref="attlist_Image"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_Image">
+ <xs:attribute name="title"/>
+ <xs:attribute name="src"/>
+ <xs:attributeGroup ref="attr"/>
+ </xs:attributeGroup>
+ <xs:element name="Link" substitutionGroup="inline_element">
+ <xs:complexType mixed="true">
+ <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/>
+ <xs:attributeGroup ref="attlist_Link"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:attributeGroup name="attlist_Link">
+ <xs:attribute name="title"/>
+ <xs:attribute name="href"/>
+ <xs:attributeGroup ref="attr"/>
+ </xs:attributeGroup>
+ <xs:element name="SoftBreak" substitutionGroup="inline_element">
+ <xs:complexType/>
+ </xs:element>
+ <xs:element name="LineBreak" substitutionGroup="inline_element">
+ <xs:complexType/>
+ </xs:element>
+ <xs:element name="Note" substitutionGroup="inline_element">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+</xs:schema>