diff options
| -rw-r--r-- | doc/xml.md | 376 | ||||
| -rw-r--r-- | pandoc.cabal | 4 | ||||
| -rw-r--r-- | src/Text/Pandoc/Readers.hs | 3 | ||||
| -rw-r--r-- | src/Text/Pandoc/Readers/XML.hs | 540 | ||||
| -rw-r--r-- | src/Text/Pandoc/Writers.hs | 3 | ||||
| -rw-r--r-- | src/Text/Pandoc/Writers/XML.hs | 365 | ||||
| -rw-r--r-- | src/Text/Pandoc/XMLFormat.hs | 188 | ||||
| -rw-r--r-- | test/Tests/XML.hs | 28 | ||||
| -rw-r--r-- | test/test-pandoc.hs | 2 | ||||
| -rw-r--r-- | tools/pandoc-xml.dtd | 181 | ||||
| -rw-r--r-- | tools/pandoc-xml.rnc | 250 | ||||
| -rw-r--r-- | tools/pandoc-xml.rng | 913 | ||||
| -rw-r--r-- | tools/pandoc-xml.xsd | 602 |
13 files changed, 3455 insertions, 0 deletions
diff --git a/doc/xml.md b/doc/xml.md new file mode 100644 index 000000000..df90b7eea --- /dev/null +++ b/doc/xml.md @@ -0,0 +1,376 @@ +--- +title: XML +author: [email protected] +--- + +# Pandoc XML format + +This document describes Pandoc's `xml` format, a 1:1 equivalent +of the `native` and `json` formats. + +Here's the xml version of the beginning of this document, +to give you a glimpse of the format: + +```xml +<?xml version='1.0' ?> +<Pandoc api-version="1,23,1"> +<meta> + <entry key="author"> + <MetaInlines>[email protected]</MetaInlines> + </entry> + <entry key="title"> + <MetaInlines>XML</MetaInlines> + </entry> +</meta> +<blocks> + <Header id="pandoc-xml-format" level="1">Pandoc XML format</Header> + <Para>This document describes Pandoc’s <Code>xml</Code> format, a 1:1 equivalent<SoftBreak />of the <Code>native</Code> and <Code>json</Code> formats.</Para> + ... +</blocks> +</Pandoc> +``` + +## The tags + +If you know [Pandoc types](https://hackage.haskell.org/package/pandoc-types-1.23.1/docs/Text-Pandoc-Definition.html), the XML conversion is fairly straightforward. + +These are the main rules: + +- `Str` inlines are usually converted to plain, UTF-8 text (see below for exceptions) + +- `Space` inlines are usually converted to " " chars (see below for exceptions) + +- every `Block` and `Inline` becomes an element with the same name and the same capitalization: + a `Para` Block becomes a `<Para>` element, an `Emph` Inline becomes an `<Emph>` element, + and so on; + +- the root element is `<Pandoc>` and it has a `api-version` attribute, whose value + is a string of comma-separated integer numbers; it matches the `pandoc-api-version` + field of the `json` format; + +- the root `<Pandoc>` element has only two children: `<meta>` and `<blocks>` + (lowercase, as in `json` format); + +- blocks and inlines with an `Attr` are HTM-like, and they have: + + - the `id` attribute for the identifier + + - the `class` attribute, a string of space-separated classes + + - the other attributes of `Attr`, without any prefix (so no `data-` prefix, instead of HTML) + +- attributes are in lower (kebab) case: + + - `level` in Header + + - `start`, `number-style`, `number-delim` in OrderedList; + style and delimiter values are capitalized exactly as in `Text.Pandoc.Definition`; + + - `format` in `RawBlock` and RawInline + + - `quote-type` in Quoted (values are `SingleQuote` and `DoubleQuote`) + + - `math-type` in Math (values are `InlineMath` and `DisplayMath`) + + - `title` and `src` in Image target + + - `title` and `href` in Link target + + - `alignment` and `col-width` in ColSpec (about `col-width` values, see below); + (alignment values are capitalized as in `Text.Pandoc.Definition`) + + - `alignment`, `row-span` and `col-span` in Cell + + - `row-head-columns` in TableBody + + - `id`, `mode`, `note-num` and `hash` for Citation (about Cite elements, see below); + (`mode` values are capitalized as in `Text.Pandoc.Definition`) + +The classes of items with an `Attr` are put in a `class` attribute, +so that you can style the XML with CSS. + +## Str and Space elements + +`Str` and `Space` usually result in text and normal " " spaces, but there are exceptions: + +- `Str ""`, an empty string, is not suppressed; instead it is converted into a `<Str />` element; + +- `Str "foo bar"`, a string containing a space, is converted as `<Str content="foo bar" />`; + +- consecutive `Str` inlines, as in `[ ..., Str "foo", Str "bar", ... ]`, + are encoded as `foo<Str content="bar" />` to keep their individuality; + +- consecutive `Space` inlines, as in `[ ..., Space, Space, ... ]`, + are encoded as `<Space count="2" />` + +- `Space` inlines at the start or at the end of their container element + are always encoded with a `<Space />` element, instead of just a " " + +These encodings are necessary to ensure 1:1 equivalence of the `xml` format with the AST, +or the `native` and `json` formats. + +Since the ones above are corner cases, usually you should not see those `<Str />` and `<Space />` +elements in your documents. + +## Added tags + +Some other elements have been introduced to better structure the resulting XML. + +Since they are not Pandoc Blocks or Inlines, or they have no constructor or type +in Pandoc's haskell code, they are kept lowercased. + +### BulletList and OrderedList items + +Items of those lists are embedded in `<item>` elements. + +These snippets are from the `xml` version of `test/testsuite.native`: + +```xml +<BulletList> + <item> + <Plain>asterisk 1</Plain> + </item> + <item> + <Plain>asterisk 2</Plain> + </item> + <item> + <Plain>asterisk 3</Plain> + </item> +</BulletList> +... +<OrderedList start="1" number-style="Decimal" number-delim="Period"> + <item> + <Plain>First</Plain> + </item> + <item> + <Plain>Second</Plain> + </item> + <item> + <Plain>Third</Plain> + </item> +</OrderedList> +``` + +### DefinitionList items + +Definition lists have `<item>` elements. + +Each `<item>` term has only one `<term>` child element, +and one or more `<def>` children elements. + +This snippet is from the `xml` version of `test/testsuite.native`: + +```xml +<DefinitionList> + <item> + <term>apple</term> + <def> + <Plain>red fruit</Plain> + </def> + </item> + <item> + <term>orange</term> + <def> + <Plain>orange fruit</Plain> + </def> + </item> + <item> + <term>banana</term> + <def> + <Plain>yellow fruit</Plain> + </def> + </item> +</DefinitionList> +``` + +### Figure and Table captions + +Figures and tables have a `<Caption>` child element, +which in turn may optionally have a `<ShortCaption>` child element. + +This snippet is from the `xml` version of `test/testsuite.native`: + +```xml +<Figure> + <Caption> + <Plain>lalune</Plain> + </Caption> + <Plain><Image src="lalune.jpg" title="Voyage dans la Lune">lalune</Image></Plain> +</Figure> +``` + +### Tables + +A `<Table>` element has: + +- a `<Caption>` child element; + +- a `<colspecs>` child element, whose children are empty + `<ColSpec alignment="..." col-width="..." />` elements; + +- a `<TableHead>` child element; + +- one or more `<TableBody>` children elements, that in turn + have two children: `<header>` and `<body>`, whose children + are `<Row>` elements; + +- a `<TableFoot>` child element. + +This specification is debatable; I have these doubts: + +- is it necessary to enclose the `<ColSpec>` elements in a `<colspecs>` element? + +- to discriminate between header and data cells in table bodies, + there are the `row-head-columns` attribute, and the `<header>` and `<body>` children + of the `<TableBody>` element, but there's only one type of cell: + every cell is a `<Cell>` element + +- the specs are a tradeoff between consistency with pandoc types and CSS compatibility; + this way bodies' header rows are easily stylable with CSS, while header columns are not + +The `ColWidthDefault` value becomes a "0" value for the attribute `col-width`; +this way it's type-consistent with non-zero values, but I'm still doubtful whether to +leave its value as a "ColWidthDefault" string. + +Here's an example from the `xml` version of `test/tables/planets.native`: + +```xml +<Table> + <Caption> + <Para>Data about the planets of our solar system.</Para> + </Caption> + <colspecs> + <ColSpec col-width="0" alignment="AlignCenter" /> + <ColSpec col-width="0" alignment="AlignCenter" /> + <ColSpec col-width="0" alignment="AlignDefault" /> + <ColSpec col-width="0" alignment="AlignRight" /> + <ColSpec col-width="0" alignment="AlignRight" /> + <ColSpec col-width="0" alignment="AlignRight" /> + <ColSpec col-width="0" alignment="AlignRight" /> + <ColSpec col-width="0" alignment="AlignRight" /> + <ColSpec col-width="0" alignment="AlignRight" /> + <ColSpec col-width="0" alignment="AlignRight" /> + <ColSpec col-width="0" alignment="AlignRight" /> + <ColSpec col-width="0" alignment="AlignDefault" /> + </colspecs> + <TableHead> + <Row> + <Cell col-span="2" row-span="1" alignment="AlignDefault" /> + <Cell col-span="1" row-span="1" alignment="AlignDefault"> + <Plain>Name</Plain> + </Cell> + <Cell col-span="1" row-span="1" alignment="AlignDefault"> + <Plain>Mass (10^24kg)</Plain> + </Cell> + ... + </Row> + </TableHead> + <TableBody row-head-columns="3"> + <header /> + <body> + <Row> + <Cell col-span="2" row-span="4" alignment="AlignDefault"> + <Plain>Terrestrial planets</Plain> + </Cell> + <Cell alignment="AlignDefault"> + <Plain>Mercury</Plain> + </Cell> + <Cell alignment="AlignDefault"> + <Plain>0.330</Plain> + </Cell> + <Cell alignment="AlignDefault"> + <Plain>4,879</Plain> + </Cell> + <Cell alignment="AlignDefault"> + <Plain>5427</Plain> + </Cell> + <Cell alignment="AlignDefault"> + <Plain>3.7</Plain> + </Cell> + <Cell alignment="AlignDefault"> + <Plain>4222.6</Plain> + </Cell> + <Cell alignment="AlignDefault"> + <Plain>57.9</Plain> + </Cell> + <Cell alignment="AlignDefault"> + <Plain>167</Plain> + </Cell> + <Cell alignment="AlignDefault"> + <Plain>0</Plain> + </Cell> + <Cell alignment="AlignDefault"> + <Plain>Closest to the Sun</Plain> + </Cell> + </Row> + ... + </body> + </TableBody> + <TableFoot /> +</Table> +``` + +### Metadata and MetaMap entries + +Metadata entries are meta values (`MetaBool`, `MetaString`, `MetaInlines`, `MetaBlocks`, +`MetaList` and `MetaMap` elements) inside `<entry>` elements. + +The `<meta>` and the `<MetaMap>` elements have the same children elements (`<entry>`), +which have a `key` attribute. + +`<MetaInlines>`, `<MetaBlocks>`, `<MetaList>` and `<MetaMap>` elements +all have children elements. + +`<MetaString>` elements have only text. + +`<MetaBool>` elements are empty, they can be either `<MetaBool value="true" />` +or `<MetaBool value="false" />`. + +This snippet is from the `xml` version of `test/testsuite.native`: + +```xml +<meta> + <entry key="author"> + <MetaList> + <MetaInlines>John MacFarlane</MetaInlines> + <MetaInlines>Anonymous</MetaInlines> + </MetaList> + </entry> + <entry key="date"> + <MetaInlines>July 17, 2006</MetaInlines> + </entry> + <entry key="title"> + <MetaInlines>Pandoc Test Suite</MetaInlines> + </entry> +</meta> +``` + +### Cite elements + +`Cite` inlines are modeled with `<Cite>` elements, whose first child +is a `<citations>` element, that have only `<Citation>` children elements. + +`<Citation>` elements are empty, unless they have a prefix and/or a suffix. + +Here's an example from the `xml` version of `test/markdown-citations.native`: + +```xml +<Para><Cite><citations> + <Citation note-num="3" mode="AuthorInText" id="item1" hash="0" /> +</citations>@item1</Cite> says blah.</Para> +<Para><Cite><citations> + <Citation note-num="4" mode="AuthorInText" id="item1" hash="0"> + <suffix>p. 30</suffix> + </Citation> +</citations>@item1 [p. 30]</Cite> says blah.</Para> +<Para>A citation group <Cite><citations> + <Citation note-num="8" mode="NormalCitation" id="item1" hash="0"> + <prefix>see</prefix> + <suffix> chap. 3</suffix> + </Citation> + <Citation note-num="8" mode="NormalCitation" id="пункт3" hash="0"> + <prefix>also</prefix> + <suffix> p. 34-35</suffix> + </Citation> +</citations>[see @item1 chap. 3; also @пункт3 p. 34-35]</Cite>.</Para> +``` diff --git a/pandoc.cabal b/pandoc.cabal index d045f49f7..610723036 100644 --- a/pandoc.cabal +++ b/pandoc.cabal @@ -614,6 +614,7 @@ library Text.Pandoc.Readers.Pod, Text.Pandoc.Writers, Text.Pandoc.Writers.Native, + Text.Pandoc.Writers.XML, Text.Pandoc.Writers.DocBook, Text.Pandoc.Writers.JATS, Text.Pandoc.Writers.OPML, @@ -744,6 +745,7 @@ library Text.Pandoc.Readers.Metadata, Text.Pandoc.Readers.Roff, Text.Pandoc.Readers.Roff.Escape, + Text.Pandoc.Readers.XML, Text.Pandoc.Writers.Docx.OpenXML, Text.Pandoc.Writers.Docx.StyleMap, Text.Pandoc.Writers.Docx.Table, @@ -769,6 +771,7 @@ library Text.Pandoc.Char, Text.Pandoc.TeX, Text.Pandoc.URI, + Text.Pandoc.XMLFormat, Text.Pandoc.CSS, Text.Pandoc.CSV, Text.Pandoc.RoffChar, @@ -815,6 +818,7 @@ test-suite test-pandoc Tests.Helpers Tests.Shared Tests.MediaBag + Tests.XML Tests.Readers.LaTeX Tests.Readers.HTML Tests.Readers.JATS diff --git a/src/Text/Pandoc/Readers.hs b/src/Text/Pandoc/Readers.hs index 8f4800efb..12d1c6c95 100644 --- a/src/Text/Pandoc/Readers.hs +++ b/src/Text/Pandoc/Readers.hs @@ -65,6 +65,7 @@ module Text.Pandoc.Readers , readTypst , readDjot , readPod + , readXML -- * Miscellaneous , getReader , getDefaultExtensions @@ -118,6 +119,7 @@ import Text.Pandoc.Readers.RIS import Text.Pandoc.Readers.RTF import Text.Pandoc.Readers.Typst import Text.Pandoc.Readers.Djot +import Text.Pandoc.Readers.XML import qualified Text.Pandoc.UTF8 as UTF8 import Text.Pandoc.Sources (ToSources(..), sourcesToText) @@ -174,6 +176,7 @@ readers = [("native" , TextReader readNative) ,("djot" , TextReader readDjot) ,("mdoc" , TextReader readMdoc) ,("pod" , TextReader readPod) + ,("xml" , TextReader readXML) ] -- | Retrieve reader, extensions based on format spec (format+extensions). diff --git a/src/Text/Pandoc/Readers/XML.hs b/src/Text/Pandoc/Readers/XML.hs new file mode 100644 index 000000000..e3173f4c9 --- /dev/null +++ b/src/Text/Pandoc/Readers/XML.hs @@ -0,0 +1,540 @@ +{-# LANGUAGE OverloadedStrings #-} +{-# LANGUAGE ScopedTypeVariables #-} + +-- | +-- Module : Text.Pandoc.Readers.XML +-- Copyright : Copyright (C) 2025- Massimiliano Farinella and John MacFarlane +-- License : GNU GPL, version 2 or above +-- +-- Maintainer : Massimiliano Farinella <[email protected]> +-- Stability : WIP +-- Portability : portable +-- +-- Conversion of (Pandoc specific) xml to 'Pandoc' document. +module Text.Pandoc.Readers.XML (readXML) where + +import Control.Monad (msum) +import Control.Monad.Except (throwError) +import Control.Monad.State.Strict (StateT (runStateT), modify) +import Data.Char (isSpace) +import Data.Default (Default (..)) +import qualified Data.List as L +import qualified Data.Map as M +import Data.Maybe (catMaybes, fromMaybe, mapMaybe) +import qualified Data.Set as S (Set, fromList, member) +import Data.Text (Text) +import qualified Data.Text as T +import Data.Text.Lazy (fromStrict) +import Data.Version (Version, makeVersion) +import Text.Pandoc.Builder +import Text.Pandoc.Class.PandocMonad +import Text.Pandoc.Error (PandocError (..)) +import Text.Pandoc.Options +import Text.Pandoc.Parsing (ToSources, toSources) +import Text.Pandoc.Sources (sourcesToText) +import Text.Pandoc.Version (pandocVersion) +import Text.Pandoc.XML (lookupEntity) +import Text.Pandoc.XML.Light +import Text.Pandoc.XMLFormat +import Text.Read (readMaybe) + +-- TODO: use xmlPath state to give better context when an error occurs + +type XMLReader m = StateT XMLReaderState m + +data XMLReaderState = XMLReaderState + { xmlApiVersion :: Version, + xmlMeta :: Meta, + xmlContent :: [Content], + xmlPath :: [Text] + } + deriving (Show) + +instance Default XMLReaderState where + def = + XMLReaderState + { xmlApiVersion = pandocVersion, + xmlMeta = mempty, + xmlContent = [], + xmlPath = ["root"] + } + +readXML :: (PandocMonad m, ToSources a) => ReaderOptions -> a -> m Pandoc +readXML _ inp = do + let sources = toSources inp + tree <- + either (throwError . PandocXMLError "") return $ + parseXMLContents (fromStrict . sourcesToText $ sources) + (bs, st') <- flip runStateT (def {xmlContent = tree}) $ mapM parseBlock tree + let blockList = toList $ concatMany bs + return $ Pandoc (xmlMeta st') blockList + +concatMany :: [Many a] -> Many a +concatMany = Many . mconcat . map unMany + +parseBlocks :: (PandocMonad m) => [Content] -> XMLReader m Blocks +parseBlocks contents = concatMany <$> mapM parseBlock contents + +getBlocks :: (PandocMonad m) => Element -> XMLReader m Blocks +getBlocks e = parseBlocks (elContent e) + +elementName :: Element -> Text +elementName e = qName $ elName e + +attrValue :: Text -> Element -> Text +attrValue attr = + fromMaybe "" . maybeAttrValue attr + +maybeAttrValue :: Text -> Element -> Maybe Text +maybeAttrValue attr elt = + lookupAttrBy (\x -> qName x == attr) (elAttribs elt) + +parseBlock :: (PandocMonad m) => Content -> XMLReader m Blocks +parseBlock (Text (CData CDataRaw _ _)) = return mempty -- DOCTYPE +parseBlock (Text (CData _ s _)) = + if T.all isSpace s + then return mempty + else do + throwError $ PandocXMLError "" "non-space characters out of inline context" +parseBlock (CRef x) = do + throwError $ PandocXMLError "" ("reference \"" <> x <> "\" out of inline context") +parseBlock (Elem e) = do + let name = elementName e + in case (name) of + "Pandoc" -> parsePandoc + "?xml" -> return mempty + "blocks" -> getBlocks e + "meta" -> + let entry_els = childrenNamed tgNameMetaMapEntry e + in do + entries <- catMaybes <$> mapM parseMetaMapEntry entry_els + mapM_ (uncurry addMeta) entries + return mempty + "Para" -> para <$> getInlines (elContent e) + "Plain" -> do + ils <- getInlines (elContent e) + return $ singleton . Plain . toList $ ils + "Header" -> (headerWith attr level) <$> getInlines (elContent e) + where + level = textToInt (attrValue atNameLevel e) 1 + attr = filterAttrAttributes [atNameLevel] $ attrFromElement e + "HorizontalRule" -> return horizontalRule + "BlockQuote" -> do + contents <- getBlocks e + return $ blockQuote contents + "Div" -> do + contents <- getBlocks e + return $ divWith (attrFromElement e) contents + "BulletList" -> do + items <- getListItems e + return $ bulletList items + "OrderedList" -> do + items <- getListItems e + return $ orderedListWith (getListAttributes e) items + "DefinitionList" -> do + let items_contents = getContentsOfElements (isElementNamed tgNameDefListItem) (elContent e) + items <- mapM parseDefinitionListItem items_contents + return $ definitionList items + "Figure" -> do + let attr = attrFromElement e + (maybe_caption_el, contents) = partitionFirstChildNamed "Caption" $ elContent e + figure_caption <- case (maybe_caption_el) of + Just (caption_el) -> parseCaption $ elContent caption_el + Nothing -> pure emptyCaption + blocks <- parseBlocks contents + return $ figureWith attr figure_caption blocks + "CodeBlock" -> do + let attr = attrFromElement e + return $ codeBlockWith attr $ strContentRecursive e + "RawBlock" -> do + let format = (attrValue atNameFormat e) + return $ rawBlock format $ strContentRecursive e + "LineBlock" -> do + lins <- mapM getInlines (contentsOfChildren tgNameLineItem (elContent e)) + return $ lineBlock lins + "Table" -> do + -- TODO: check unexpected items + let attr = attrFromElement e + (maybe_caption_el, after_caption) = partitionFirstChildNamed "Caption" $ elContent e + children = elementsWithNames (S.fromList [tgNameColspecs, "TableHead", "TableBody", "TableFoot"]) after_caption + is_element tag el = tag == elementName el + colspecs <- getColspecs $ L.find (is_element tgNameColspecs) children + tbs <- getTableBodies $ filter (is_element "TableBody") children + th <- getTableHead $ L.find (is_element "TableHead") children + tf <- getTableFoot $ L.find (is_element "TableFoot") children + capt <- parseMaybeCaptionElement maybe_caption_el + case colspecs of + Nothing -> return mempty + Just cs -> return $ fromList [Table attr capt cs th tbs tf] + _ -> do + throwError $ PandocXMLError "" ("unexpected element \"" <> name <> "\" in blocks context") + where + parsePandoc = do + let version = maybeAttrValue atNameApiVersion e + apiversion = case (version) of + Just (v) -> makeVersion $ map (read . T.unpack) $ T.splitOn "," v + Nothing -> pandocVersion + in modify $ \st -> st {xmlApiVersion = apiversion} + getBlocks e + +getListItems :: (PandocMonad m) => Element -> XMLReader m [Blocks] +getListItems e = + let items_els = childrenNamed tgNameListItem e + in do + mapM getBlocks items_els + +getContentsOfElements :: (Content -> Bool) -> [Content] -> [[Content]] +getContentsOfElements filter_element contents = mapMaybe element_contents $ filter filter_element contents + where + element_contents :: Content -> Maybe [Content] + element_contents c = case (c) of + Elem e -> Just (elContent e) + _ -> Nothing + +strContentRecursive :: Element -> Text +strContentRecursive = + strContent + . (\e' -> e' {elContent = map elementToStr $ elContent e'}) + +elementToStr :: Content -> Content +elementToStr (Elem e') = Text $ CData CDataText (strContentRecursive e') Nothing +elementToStr x = x + +textToInt :: Text -> Int -> Int +textToInt t deflt = + let safe_to_int :: Text -> Maybe Int + safe_to_int s = readMaybe $ T.unpack s + in case (safe_to_int t) of + Nothing -> deflt + Just (n) -> n + +parseInline :: (PandocMonad m) => Content -> XMLReader m Inlines +parseInline (Text (CData _ s _)) = + return $ text s +parseInline (CRef ref) = + return $ + maybe (text $ T.toUpper ref) text $ + lookupEntity ref +parseInline (Elem e) = + let name = elementName e + in case (name) of + "Space" -> + let count = textToInt (attrValue atNameSpaceCount e) 1 + in return $ fromList $ replicate count Space + "Str" -> return $ fromList [Str $ attrValue atNameStrContent e] + "Emph" -> innerInlines emph + "Strong" -> innerInlines strong + "Strikeout" -> innerInlines strikeout + "Subscript" -> innerInlines subscript + "Superscript" -> innerInlines superscript + "Underline" -> innerInlines underline + "SoftBreak" -> return softbreak + "LineBreak" -> return linebreak + "SmallCaps" -> innerInlines smallcaps + "Quoted" -> case (attrValue atNameQuoteType e) of + "SingleQuote" -> innerInlines singleQuoted + _ -> innerInlines doubleQuoted + "Math" -> case (attrValue atNameMathType e) of + "DisplayMath" -> pure $ displayMath $ strContentRecursive e + _ -> pure $ math $ strContentRecursive e + "Span" -> innerInlines $ spanWith (attrFromElement e) + "Code" -> do + let attr = attrFromElement e + return $ codeWith attr $ strContentRecursive e + "Link" -> innerInlines $ linkWith attr url title + where + url = attrValue atNameLinkUrl e + title = attrValue atNameTitle e + attr = filterAttrAttributes [atNameLinkUrl, atNameTitle] $ attrFromElement e + "Image" -> innerInlines $ imageWith attr url title + where + url = attrValue atNameImageUrl e + title = attrValue atNameTitle e + attr = filterAttrAttributes [atNameImageUrl, atNameTitle] $ attrFromElement e + "RawInline" -> do + let format = (attrValue atNameFormat e) + return $ rawInline format $ strContentRecursive e + "Note" -> do + contents <- getBlocks e + return $ note contents + "Cite" -> + let (maybe_citations_el, contents) = partitionFirstChildNamed tgNameCitations $ elContent e + in case (maybe_citations_el) of + Just citations_el -> do + citations <- parseCitations $ elContent citations_el + (innerInlines' contents) $ cite citations + Nothing -> getInlines contents + _ -> do + throwError $ PandocXMLError "" ("unexpected element \"" <> name <> "\" in inline context") + where + innerInlines' contents f = + f . concatMany + <$> mapM parseInline contents + innerInlines f = innerInlines' (elContent e) f + +getInlines :: (PandocMonad m) => [Content] -> XMLReader m Inlines +getInlines contents = concatMany <$> mapM parseInline contents + +getListAttributes :: Element -> ListAttributes +getListAttributes e = (start, style, delim) + where + start = textToInt (attrValue atNameStart e) 1 + style = case (attrValue atNameNumberStyle e) of + "Example" -> Example + "Decimal" -> Decimal + "LowerRoman" -> LowerRoman + "UpperRoman" -> UpperRoman + "LowerAlpha" -> LowerAlpha + "UpperAlpha" -> UpperAlpha + _ -> DefaultStyle + delim = case (attrValue atNameNumberDelim e) of + "Period" -> Period + "OneParen" -> OneParen + "TwoParens" -> TwoParens + _ -> DefaultDelim + +contentsOfChildren :: Text -> [Content] -> [[Content]] +contentsOfChildren tag contents = mapMaybe childrenElementWithTag contents + where + childrenElementWithTag :: Content -> Maybe [Content] + childrenElementWithTag c = case (c) of + (Elem e) -> if tag == elementName e then Just (elContent e) else Nothing + _ -> Nothing + +alignmentFromText :: Text -> Alignment +alignmentFromText t = case t of + "AlignLeft" -> AlignLeft + "AlignRight" -> AlignRight + "AlignCenter" -> AlignCenter + _ -> AlignDefault + +getColWidth :: Text -> ColWidth +getColWidth txt = case reads (T.unpack txt) of + [(value, "")] -> if value == 0.0 then ColWidthDefault else ColWidth value + _ -> ColWidthDefault + +getColspecs :: (PandocMonad m) => Maybe Element -> XMLReader m (Maybe [ColSpec]) +getColspecs Nothing = pure Nothing +getColspecs (Just cs) = do + return $ Just $ map elementToColSpec (childrenNamed "ColSpec" cs) + where + elementToColSpec e = (alignmentFromText $ attrValue atNameAlignment e, getColWidth $ attrValue atNameColWidth e) + +getTableBody :: (PandocMonad m) => Element -> XMLReader m (Maybe TableBody) +getTableBody body_el = do + let attr = filterAttrAttributes [atNameRowHeadColumns] $ attrFromElement body_el + bh = childrenNamed tgNameBodyHeader body_el + bb = childrenNamed tgNameBodyBody body_el + headcols = textToInt (attrValue atNameRowHeadColumns body_el) 0 + hrows <- mconcat <$> mapM getRows bh + brows <- mconcat <$> mapM getRows bb + return $ Just $ TableBody attr (RowHeadColumns headcols) hrows brows + +getTableBodies :: (PandocMonad m) => [Element] -> XMLReader m [TableBody] +getTableBodies body_elements = do + catMaybes <$> mapM getTableBody body_elements + +getTableHead :: (PandocMonad m) => Maybe Element -> XMLReader m TableHead +getTableHead maybe_e = case maybe_e of + Just e -> do + let attr = attrFromElement e + rows <- getRows e + return $ TableHead attr rows + Nothing -> return $ TableHead nullAttr [] + +getTableFoot :: (PandocMonad m) => Maybe Element -> XMLReader m TableFoot +getTableFoot maybe_e = case maybe_e of + Just e -> do + let attr = attrFromElement e + rows <- getRows e + return $ TableFoot attr rows + Nothing -> return $ TableFoot nullAttr [] + +getCell :: (PandocMonad m) => Element -> XMLReader m Cell +getCell c = do + let alignment = alignmentFromText $ attrValue atNameAlignment c + rowspan = RowSpan $ textToInt (attrValue atNameRowspan c) 1 + colspan = ColSpan $ textToInt (attrValue atNameColspan c) 1 + attr = filterAttrAttributes [atNameAlignment, atNameRowspan, atNameColspan] $ attrFromElement c + blocks <- getBlocks c + return $ Cell attr alignment rowspan colspan (toList blocks) + +getRows :: (PandocMonad m) => Element -> XMLReader m [Row] +getRows e = mapM getRow $ childrenNamed "Row" e + where + getRow r = do + cells <- mapM getCell (childrenNamed "Cell" r) + return $ Row (attrFromElement r) cells + +parseCitations :: (PandocMonad m) => [Content] -> XMLReader m [Citation] +parseCitations contents = do + maybecitations <- mapM getCitation contents + return $ catMaybes maybecitations + where + getCitation :: (PandocMonad m) => Content -> XMLReader m (Maybe Citation) + getCitation content = case (content) of + (Elem e) -> + if qName (elName e) == "Citation" + then do + p <- inlinesOfChildrenNamed tgNameCitationPrefix e + s <- inlinesOfChildrenNamed tgNameCitationSuffix e + return $ + Just + ( Citation + { citationId = attrValue "id" e, + citationPrefix = toList p, + citationSuffix = toList s, + citationMode = case (attrValue atNameCitationMode e) of + "AuthorInText" -> AuthorInText + "SuppressAuthor" -> SuppressAuthor + _ -> NormalCitation, + citationNoteNum = textToInt (attrValue atNameCitationNoteNum e) 0, + citationHash = textToInt (attrValue atNameCitationHash e) 0 + } + ) + else do + return Nothing + _ -> do + return Nothing + where + inlinesOfChildrenNamed tag e = getInlines $ concatMap (\e' -> elContent e') (childrenNamed tag e) + +parseMaybeCaptionElement :: (PandocMonad m) => Maybe Element -> XMLReader m Caption +parseMaybeCaptionElement Nothing = pure emptyCaption +parseMaybeCaptionElement (Just e) = parseCaption $ elContent e + +parseCaption :: (PandocMonad m) => [Content] -> XMLReader m Caption +parseCaption contents = + let (maybe_shortcaption_el, caption_contents) = partitionFirstChildNamed tgNameShortCaption contents + in do + blocks <- parseBlocks caption_contents + case (maybe_shortcaption_el) of + Just shortcaption_el -> do + short_caption <- getInlines (elContent shortcaption_el) + return $ caption (Just $ toList short_caption) blocks + Nothing -> return $ caption Nothing blocks + +parseDefinitionListItem :: (PandocMonad m) => [Content] -> XMLReader m (Inlines, [Blocks]) +parseDefinitionListItem contents = do + let term_contents = getContentsOfElements (isElementNamed tgNameDefListTerm) contents + defs_elements = elementContents $ filter (isElementNamed tgNameDefListDef) contents + term_inlines <- getInlines (concat term_contents) + defs <- mapM getBlocks defs_elements + return (term_inlines, defs) + +elementContents :: [Content] -> [Element] +elementContents contents = mapMaybe toElement contents + where + toElement :: Content -> Maybe Element + toElement (Elem e) = Just e + toElement _ = Nothing + +isElementNamed :: Text -> Content -> Bool +isElementNamed t c = case (c) of + Elem e -> t == elementName e + _ -> False + +childrenNamed :: Text -> Element -> [Element] +childrenNamed tag e = elementContents $ filter (isElementNamed tag) (elContent e) + +elementsWithNames :: S.Set Text -> [Content] -> [Element] +elementsWithNames tags contents = mapMaybe isElementWithNameInSet contents + where + isElementWithNameInSet c = case (c) of + Elem el -> + if (elementName el) `S.member` tags + then Just el + else Nothing + _ -> Nothing + +partitionFirstChildNamed :: Text -> [Content] -> (Maybe Element, [Content]) +partitionFirstChildNamed tag contents = case (contents) of + (Text (CData _ s _) : rest) -> + if T.all isSpace s + then partitionFirstChildNamed tag rest + else (Nothing, contents) + (Elem e : rest) -> + if tag == elementName e + then (Just e, rest) + else (Nothing, contents) + _ -> (Nothing, contents) + +type PandocAttr = (Text, [Text], [(Text, Text)]) + +filterAttributes :: S.Set Text -> [(Text, Text)] -> [(Text, Text)] +filterAttributes to_be_removed a = filter keep_attr a + where + keep_attr (k, _) = not (k `S.member` to_be_removed) + +filterAttrAttributes :: [Text] -> PandocAttr -> PandocAttr +filterAttrAttributes to_be_removed (idn, classes, a) = (idn, classes, filtered) + where + filtered = filterAttributes (S.fromList to_be_removed) a + +attrFromElement :: Element -> PandocAttr +attrFromElement e = filterAttrAttributes ["id", "class"] (idn, classes, attributes) + where + idn = attrValue "id" e + classes = T.words $ attrValue "class" e + attributes = map (\a -> (qName $ attrKey a, attrVal a)) $ elAttribs e + +addMeta :: (PandocMonad m) => (ToMetaValue a) => Text -> a -> XMLReader m () +addMeta field val = modify (setMeta field val) + +instance HasMeta XMLReaderState where + setMeta field v s = s {xmlMeta = setMeta field v (xmlMeta s)} + + deleteMeta field s = s {xmlMeta = deleteMeta field (xmlMeta s)} + +parseMetaMapEntry :: (PandocMonad m) => Element -> XMLReader m (Maybe (Text, MetaValue)) +parseMetaMapEntry e = + let key = attrValue atNameMetaMapEntryKey e + in case (key) of + "" -> pure Nothing + k -> do + maybe_value <- parseMetaMapEntryContents $ elContent e + case (maybe_value) of + Nothing -> return Nothing + Just v -> return $ Just (k, v) + +parseMetaMapEntryContents :: (PandocMonad m) => [Content] -> XMLReader m (Maybe MetaValue) +parseMetaMapEntryContents cs = msum <$> mapM parseMeta cs + +parseMeta :: (PandocMonad m) => Content -> XMLReader m (Maybe MetaValue) +parseMeta (Text (CData CDataRaw _ _)) = return Nothing +parseMeta (Text (CData _ s _)) = + if T.all isSpace s + then return Nothing + else do + throwError $ PandocXMLError "" "non-space characters out of inline context in metadata" +parseMeta (CRef x) = + throwError $ PandocXMLError "" ("reference \"" <> x <> "\" out of inline context") +parseMeta (Elem e) = do + let name = elementName e + in case (name) of + "MetaBool" -> case (attrValue atNameMetaBoolValue e) of + "true" -> return $ Just $ MetaBool True + _ -> return $ Just $ MetaBool False + "MetaString" -> pure Nothing + "MetaInlines" -> do + inlines <- getInlines (elContent e) + return $ Just $ MetaInlines $ toList inlines + "MetaBlocks" -> do + blocks <- getBlocks e + return $ Just $ MetaBlocks $ toList blocks + "MetaList" -> do + maybe_items <- mapM parseMeta $ elContent e + let items = catMaybes maybe_items + in -- TODO: report empty MetaList? + return $ Just $ MetaList items + "MetaMap" -> + let entry_els = childrenNamed tgNameMetaMapEntry e + in do + entries <- catMaybes <$> mapM parseMetaMapEntry entry_els + if null entries + then + -- TODO: report empty MetaMap + return Nothing + else return $ Just $ MetaMap $ M.fromList entries + _ -> do + throwError $ PandocXMLError "" ("unexpected element \"" <> name <> "\" in metadata") diff --git a/src/Text/Pandoc/Writers.hs b/src/Text/Pandoc/Writers.hs index 3e1974ad6..164bf6d0f 100644 --- a/src/Text/Pandoc/Writers.hs +++ b/src/Text/Pandoc/Writers.hs @@ -76,6 +76,7 @@ module Text.Pandoc.Writers , writeTexinfo , writeTextile , writeTypst + , writeXML , writeXWiki , writeZimWiki , getWriter @@ -128,6 +129,7 @@ import Text.Pandoc.Writers.TEI import Text.Pandoc.Writers.Texinfo import Text.Pandoc.Writers.Textile import Text.Pandoc.Writers.Typst +import Text.Pandoc.Writers.XML import Text.Pandoc.Writers.XWiki import Text.Pandoc.Writers.ZimWiki @@ -203,6 +205,7 @@ writers = [ ,("chunkedhtml" , ByteStringWriter writeChunkedHTML) ,("djot" , TextWriter writeDjot) ,("ansi" , TextWriter writeANSI) + ,("xml" , TextWriter writeXML) ] -- | Retrieve writer, extensions based on formatSpec (format+extensions). diff --git a/src/Text/Pandoc/Writers/XML.hs b/src/Text/Pandoc/Writers/XML.hs new file mode 100644 index 000000000..35bcb07f8 --- /dev/null +++ b/src/Text/Pandoc/Writers/XML.hs @@ -0,0 +1,365 @@ +{-# LANGUAGE FlexibleContexts #-} +{-# LANGUAGE OverloadedStrings #-} +{-# LANGUAGE ScopedTypeVariables #-} + +-- | +-- Module : Text.Pandoc.Writers.XML +-- Copyright : Copyright (C) 2025- Massimiliano Farinella and John MacFarlane +-- License : GNU GPL, version 2 or above +-- +-- Maintainer : Massimiliano Farinella <[email protected]> +-- Stability : WIP +-- Portability : portable +-- +-- Conversion of 'Pandoc' documents to (pandoc specific) xml markup. +module Text.Pandoc.Writers.XML (writeXML) where + +import Data.Map (Map, toList) +import Data.Maybe (mapMaybe) +import qualified Data.Text as T +import Data.Version (versionBranch) +import Text.Pandoc.Class.PandocMonad (PandocMonad) +import Text.Pandoc.Definition +import Text.Pandoc.Options (WriterOptions (..)) +import Text.Pandoc.XML.Light +import qualified Text.Pandoc.XML.Light as XML +import Text.Pandoc.XMLFormat +import Text.XML.Light (xml_header) + +type PandocAttr = Text.Pandoc.Definition.Attr + +writeXML :: (PandocMonad m) => WriterOptions -> Pandoc -> m T.Text +writeXML _ doc = do + return $ pandocToXmlText doc + +text_node :: T.Text -> Content +text_node text = Text (CData CDataText text Nothing) + +emptyElement :: T.Text -> Element +emptyElement tag = + Element + { elName = unqual tag, + elAttribs = [], + elContent = [], + elLine = Nothing + } + +elementWithContents :: T.Text -> [Content] -> Element +elementWithContents tag contents = + Element + { elName = unqual tag, + elAttribs = [], + elContent = contents, + elLine = Nothing + } + +elementWithAttributes :: T.Text -> [XML.Attr] -> Element +elementWithAttributes tag attributes = + Element + { elName = unqual tag, + elAttribs = attributes, + elContent = [], + elLine = Nothing + } + +elementWithAttrAndContents :: T.Text -> PandocAttr -> [Content] -> Element +elementWithAttrAndContents tag attr contents = addAttrAttributes attr $ elementWithContents tag contents + +asBlockOfInlines :: Element -> [Content] +asBlockOfInlines el = [Elem el, text_node "\n"] + +asBlockOfBlocks :: Element -> [Content] +asBlockOfBlocks el = [Elem newline_before_first, newline] + where + newline = text_node "\n" + newline_before_first = if null (elContent el) then el else prependContents [newline] el + +itemName :: (Show a) => a -> T.Text +itemName a = T.pack $ takeWhile (/= ' ') (show a) + +intAsText :: Int -> T.Text +intAsText i = T.pack $ show i + +itemAsEmptyElement :: (Show a) => a -> Element +itemAsEmptyElement item = emptyElement $ itemName item + +pandocToXmlText :: Pandoc -> T.Text +pandocToXmlText (Pandoc (Meta meta) blocks) = with_header . with_blocks . with_meta . with_version $ el + where + el = prependContents [text_node "\n"] $ emptyElement "Pandoc" + with_version = addAttribute atNameApiVersion (T.intercalate "," $ map (T.pack . show) $ versionBranch pandocTypesVersion) + with_meta = appendContents (metaMapToXML meta "meta") + with_blocks = appendContents (asBlockOfBlocks $ elementWithContents "blocks" $ blocksToXML blocks) + with_header :: Element -> T.Text + with_header e = T.concat [T.pack xml_header, "\n", showElement e] + +metaMapToXML :: Map T.Text MetaValue -> T.Text -> [Content] +metaMapToXML mmap tag = asBlockOfBlocks $ elementWithContents tag entries + where + entries = concatMap to_entry $ toList mmap + to_entry :: (T.Text, MetaValue) -> [Content] + to_entry (text, metavalue) = asBlockOfBlocks with_key + where + entry = elementWithContents tgNameMetaMapEntry $ metaValueToXML metavalue + with_key = addAttribute atNameMetaMapEntryKey text entry + +metaValueToXML :: MetaValue -> [Content] +metaValueToXML value = + let name = itemName value + el = itemAsEmptyElement value + in case (value) of + MetaBool b -> asBlockOfInlines $ addAttribute atNameMetaBoolValue bool_value el + where + bool_value = if b then "true" else "false" + MetaString s -> asBlockOfInlines $ appendContents [text_node s] el + MetaInlines inlines -> asBlockOfInlines $ appendContents (inlinesToXML inlines) el + MetaBlocks blocks -> asBlockOfBlocks $ appendContents (blocksToXML blocks) el + MetaList items -> asBlockOfBlocks $ appendContents (concatMap metaValueToXML items) el + MetaMap mm -> metaMapToXML mm name + +blocksToXML :: [Block] -> [Content] +blocksToXML blocks = concatMap blockToXML blocks + +inlinesToXML :: [Inline] -> [Content] +inlinesToXML inlines = concatMap inlineContentToContents (ilsToIlsContent inlines []) + +data InlineContent + = NormalInline Inline + | ElSpace Int + | ElStr T.Text + +ilsToIlsContent :: [Inline] -> [InlineContent] -> [InlineContent] +ilsToIlsContent (Space : xs) [] = ilsToIlsContent xs [ElSpace 1] +ilsToIlsContent (Space : xs) (NormalInline Space : cs) = ilsToIlsContent xs (ElSpace 2 : cs) +ilsToIlsContent (Space : xs) (ElSpace n : cs) = ilsToIlsContent xs (ElSpace (n + 1) : cs) +-- empty Str are always encoded as <Str /> +ilsToIlsContent (Str "" : xs) ilct = ilsToIlsContent xs (ElStr "" : ilct) +-- Str s1, Str s2 -> s1<Str content="s2"> +ilsToIlsContent (Str s2 : xs) (NormalInline str1@(Str _) : ilct) = ilsToIlsContent xs (ElStr s2 : NormalInline str1 : ilct) +-- +ilsToIlsContent (Str s : xs) ilct = + if T.any (== ' ') s + then ilsToIlsContent xs (ElStr s : ilct) + else ilsToIlsContent xs (NormalInline (Str s) : ilct) +ilsToIlsContent (x : xs) ilct = ilsToIlsContent xs (NormalInline x : ilct) +ilsToIlsContent [] ilct = reverse $ lastSpaceAsElem ilct + where + lastSpaceAsElem :: [InlineContent] -> [InlineContent] + lastSpaceAsElem (NormalInline Space : xs) = ElSpace 1 : xs + lastSpaceAsElem ilcts = ilcts + +inlineContentToContents :: InlineContent -> [Content] +inlineContentToContents (NormalInline il) = inlineToXML il +inlineContentToContents (ElSpace 1) = [Elem $ emptyElement "Space"] +inlineContentToContents (ElSpace n) = [Elem $ addAttribute atNameSpaceCount (intAsText n) (emptyElement "Space")] +inlineContentToContents (ElStr "") = [Elem $ emptyElement "Str"] +inlineContentToContents (ElStr s) = [Elem $ addAttribute atNameStrContent s (emptyElement "Str")] + +asContents :: Element -> [Content] +asContents el = [Elem el] + +wrapBlocks :: T.Text -> [Block] -> [Content] +wrapBlocks tag blocks = asBlockOfBlocks $ elementWithContents tag $ blocksToXML blocks + +wrapArrayOfBlocks :: T.Text -> [[Block]] -> [Content] +wrapArrayOfBlocks tag array = concatMap (wrapBlocks tag) array + +-- wrapInlines :: T.Text -> [Inline] -> [Content] +-- wrapInlines tag inlines = asBlockOfInlines $ element_with_contents tag $ inlinesToXML inlines + +blockToXML :: Block -> [Content] +blockToXML block = + let el = itemAsEmptyElement block + in case (block) of + Para inlines -> asBlockOfInlines $ appendContents (inlinesToXML inlines) el + Header level (idn, cls, attrs) inlines -> asBlockOfInlines $ appendContents (inlinesToXML inlines) with_attr + where + with_attr = addAttrAttributes (idn, cls, attrs ++ [(atNameLevel, intAsText level)]) el + Plain inlines -> asBlockOfInlines $ appendContents (inlinesToXML inlines) el + Div attr blocks -> asBlockOfBlocks $ appendContents (blocksToXML blocks) with_attr + where + with_attr = addAttrAttributes attr el + BulletList items -> asBlockOfBlocks $ appendContents (wrapArrayOfBlocks tgNameListItem items) el + OrderedList (start, style, delim) items -> asBlockOfBlocks $ with_contents . with_attrs $ el + where + with_attrs = + addAttributes + ( validAttributes + [ (atNameStart, intAsText start), + (atNameNumberStyle, itemName style), + (atNameNumberDelim, itemName delim) + ] + ) + with_contents = appendContents (wrapArrayOfBlocks tgNameListItem items) + BlockQuote blocks -> asBlockOfBlocks $ appendContents (blocksToXML blocks) el + HorizontalRule -> asBlockOfInlines el + CodeBlock attr text -> asBlockOfInlines $ with_contents . with_attr $ el + where + with_contents = appendContents [text_node text] + with_attr = addAttrAttributes attr + LineBlock lins -> asBlockOfBlocks $ appendContents (concatMap wrapInlines lins) el + where + wrapInlines inlines = asContents $ appendContents (inlinesToXML inlines) $ emptyElement tgNameLineItem + Table attr caption colspecs thead tbodies tfoot -> asBlockOfBlocks $ with_foot . with_bodies . with_head . with_colspecs . with_caption . with_attr $ el + where + with_attr = addAttrAttributes attr + with_caption = appendContents (captionToXML caption) + with_colspecs = appendContents (colSpecsToXML colspecs) + with_head = appendContents (tableHeadToXML thead) + with_bodies = appendContents (concatMap tableBodyToXML tbodies) + with_foot = appendContents (tableFootToXML tfoot) + Figure attr caption blocks -> asBlockOfBlocks $ with_contents . with_caption . with_attr $ el + where + with_attr = addAttrAttributes attr + with_caption = appendContents (captionToXML caption) + with_contents = appendContents (blocksToXML blocks) + RawBlock (Format format) text -> asContents $ appendContents [text_node text] raw + where + raw = addAttribute atNameFormat format el + DefinitionList items -> asBlockOfBlocks $ appendContents (map definitionListItemToXML items) el + +inlineToXML :: Inline -> [Content] +inlineToXML inline = + let el = itemAsEmptyElement inline + wrapInlines inlines = asContents $ appendContents (inlinesToXML inlines) el + in case (inline) of + Space -> [text_node " "] + Str s -> [text_node s] + Emph inlines -> wrapInlines inlines + Strong inlines -> wrapInlines inlines + Quoted quote_type inlines -> asContents $ appendContents (inlinesToXML inlines) quoted + where + quoted = addAttribute atNameQuoteType (itemName quote_type) el + Underline inlines -> wrapInlines inlines + Strikeout inlines -> wrapInlines inlines + SmallCaps inlines -> wrapInlines inlines + Superscript inlines -> wrapInlines inlines + Subscript inlines -> wrapInlines inlines + SoftBreak -> asContents el + LineBreak -> asContents el + Span attr inlines -> asContents $ appendContents (inlinesToXML inlines) with_attr + where + with_attr = addAttrAttributes attr el + Link (idn, cls, attrs) inlines (url, title) -> asContents $ appendContents (inlinesToXML inlines) with_attr + where + with_attr = addAttrAttributes (idn, cls, attrs ++ [(atNameLinkUrl, url), (atNameTitle, title)]) el + Image (idn, cls, attrs) inlines (url, title) -> asContents $ appendContents (inlinesToXML inlines) with_attr + where + with_attr = addAttrAttributes (idn, cls, attrs ++ [(atNameImageUrl, url), (atNameTitle, title)]) el + RawInline (Format format) text -> asContents $ appendContents [text_node text] raw + where + raw = addAttribute atNameFormat format el + Math math_type text -> asContents $ appendContents [text_node text] math + where + math = addAttribute atNameMathType (itemName math_type) el + Code attr text -> asContents $ appendContents [text_node text] with_attr + where + with_attr = addAttrAttributes attr el + Note blocks -> asContents $ appendContents (blocksToXML blocks) el + Cite citations inlines -> asContents $ appendContents (inlinesToXML inlines) with_citations + where + with_citations = addCitations citations el + +-- TODO: don't let an attribute overwrite id or class +maybeAttribute :: (T.Text, T.Text) -> Maybe XML.Attr +maybeAttribute (_, "") = Nothing +maybeAttribute ("", _) = Nothing +maybeAttribute (name, value) = Just $ XML.Attr (unqual name) value + +validAttributes :: [(T.Text, T.Text)] -> [XML.Attr] +validAttributes pairs = mapMaybe maybeAttribute pairs + +appendContents :: [Content] -> Element -> Element +appendContents newContents el = el {elContent = (elContent el) ++ newContents} + +prependContents :: [Content] -> Element -> Element +prependContents newContents el = el {elContent = newContents ++ (elContent el)} + +addAttributes :: [XML.Attr] -> Element -> Element +addAttributes newAttrs el = el {elAttribs = newAttrs ++ elAttribs el} + +addAttribute :: T.Text -> T.Text -> Element -> Element +addAttribute attr_name attr_value el = el {elAttribs = new_attr : elAttribs el} + where + new_attr = XML.Attr (unqual attr_name) attr_value + +addAttrAttributes :: PandocAttr -> Element -> Element +addAttrAttributes (identifier, classes, attributes) el = addAttributes attrs' el + where + attrs' = mapMaybe maybeAttribute (("id", identifier) : ("class", T.intercalate " " classes) : attributes) + +addCitations :: [Citation] -> Element -> Element +addCitations citations el = appendContents [Elem $ elementWithContents tgNameCitations $ (text_node "\n") : concatMap citation_to_elem citations] el + where + citation_to_elem :: Citation -> [Content] + citation_to_elem citation = asBlockOfInlines with_suffix + where + cit_elem = elementWithAttributes (itemName citation) attrs + prefix = citationPrefix citation + suffix = citationSuffix citation + with_prefix = + if null prefix + then cit_elem + else appendContents [Elem $ elementWithContents tgNameCitationPrefix $ inlinesToXML prefix] cit_elem + with_suffix = + if null suffix + then with_prefix + else appendContents [Elem $ elementWithContents tgNameCitationSuffix $ inlinesToXML suffix] with_prefix + attrs = + map + (\(n, v) -> XML.Attr (unqual n) v) + [ ("id", citationId citation), + (atNameCitationMode, T.pack $ show $ citationMode citation), + (atNameCitationNoteNum, intAsText $ citationNoteNum citation), + (atNameCitationHash, intAsText $ citationHash citation) + ] + +definitionListItemToXML :: ([Inline], [[Block]]) -> Content +definitionListItemToXML (inlines, defs) = Elem $ elementWithContents tgNameDefListItem $ term ++ wrapArrayOfBlocks tgNameDefListDef defs + where + term = asBlockOfInlines $ appendContents (inlinesToXML inlines) $ emptyElement tgNameDefListTerm + +captionToXML :: Caption -> [Content] +captionToXML (Caption short blocks) = asBlockOfBlocks with_short_caption + where + el = elementWithContents "Caption" $ blocksToXML blocks + with_short_caption = case (short) of + Just inlines -> prependContents (asBlockOfInlines $ elementWithContents tgNameShortCaption $ inlinesToXML inlines) el + _ -> el + +colSpecToXML :: (Alignment, ColWidth) -> [Content] +colSpecToXML (align, cw) = asBlockOfInlines colspec + where + colspec = elementWithAttributes "ColSpec" $ validAttributes [(atNameAlignment, itemName align), (atNameColWidth, colwidth)] + colwidth = case (cw) of + ColWidth d -> T.pack $ show d + ColWidthDefault -> "0" + +colSpecsToXML :: [(Alignment, ColWidth)] -> [Content] +colSpecsToXML colspecs = asBlockOfBlocks $ elementWithContents tgNameColspecs $ concatMap colSpecToXML colspecs + +tableHeadToXML :: TableHead -> [Content] +tableHeadToXML (TableHead attr rows) = asBlockOfBlocks $ elementWithAttrAndContents "TableHead" attr $ concatMap rowToXML rows + +tableBodyToXML :: TableBody -> [Content] +tableBodyToXML (TableBody (idn, cls, attrs) (RowHeadColumns headcols) hrows brows) = asBlockOfBlocks $ elementWithAttrAndContents "TableBody" attr children + where + attr = (idn, cls, (atNameRowHeadColumns, intAsText headcols) : attrs) + header_rows = asBlockOfBlocks $ elementWithContents tgNameBodyHeader $ concatMap rowToXML hrows + body_rows = asBlockOfBlocks $ elementWithContents tgNameBodyBody $ concatMap rowToXML brows + children = header_rows ++ body_rows + +tableFootToXML :: TableFoot -> [Content] +tableFootToXML (TableFoot attr rows) = asBlockOfBlocks $ elementWithAttrAndContents "TableFoot" attr $ concatMap rowToXML rows + +rowToXML :: Row -> [Content] +rowToXML (Row attr cells) = asBlockOfBlocks $ elementWithAttrAndContents "Row" attr $ concatMap cellToXML cells + +cellToXML :: Cell -> [Content] +cellToXML (Cell (idn, cls, attrs) alignment (RowSpan rowspan) (ColSpan colspan) blocks) = asBlockOfBlocks $ elementWithAttrAndContents "Cell" attr $ blocksToXML blocks + where + with_alignment a = (atNameAlignment, itemName alignment) : a + with_rowspan a = if rowspan > 1 then (atNameRowspan, intAsText rowspan) : a else a + with_colspan a = if colspan > 1 then (atNameColspan, intAsText colspan) : a else a + attrs' = (with_colspan . with_rowspan . with_alignment) attrs + attr = (idn, cls, attrs') diff --git a/src/Text/Pandoc/XMLFormat.hs b/src/Text/Pandoc/XMLFormat.hs new file mode 100644 index 000000000..6801de90c --- /dev/null +++ b/src/Text/Pandoc/XMLFormat.hs @@ -0,0 +1,188 @@ +{-# LANGUAGE OverloadedStrings #-} +{-# LANGUAGE ScopedTypeVariables #-} + +module Text.Pandoc.XMLFormat + ( atNameAlignment, + atNameApiVersion, + atNameCitationHash, + atNameCitationMode, + atNameCitationNoteNum, + atNameColspan, + atNameColWidth, + atNameFormat, + atNameImageUrl, + atNameLevel, + atNameLinkUrl, + atNameMathType, + atNameMetaBoolValue, + atNameMetaMapEntryKey, + atNameNumberDelim, + atNameNumberStyle, + atNameQuoteType, + atNameRowHeadColumns, + atNameRowspan, + atNameSpaceCount, + atNameStart, + atNameStrContent, + atNameTitle, + tgNameBodyBody, + tgNameBodyHeader, + tgNameCitations, + tgNameCitationPrefix, + tgNameCitationSuffix, + tgNameColspecs, + tgNameDefListDef, + tgNameDefListItem, + tgNameDefListTerm, + tgNameLineItem, + tgNameListItem, + tgNameMetaMapEntry, + tgNameShortCaption, + ) +where + +import Data.Text (Text) + +-- the attribute carrying the API version of pandoc types in the main Pandoc element +atNameApiVersion :: Text +atNameApiVersion = "api-version" + +-- the element of a <meta> or <MetaMap> entry +tgNameMetaMapEntry :: Text +tgNameMetaMapEntry = "entry" + +-- the attribute carrying the key name of a <meta> or <MetaMap> entry +atNameMetaMapEntryKey :: Text +atNameMetaMapEntryKey = "key" + +-- the attribute carrying the boolean value ("true" or "false") of a MetaBool +atNameMetaBoolValue :: Text +atNameMetaBoolValue = "value" + +-- level of a Header +atNameLevel :: Text +atNameLevel = "level" + +-- start number of an OrderedList +atNameStart :: Text +atNameStart = "start" + +-- number delimiter of an OrderedList +atNameNumberDelim :: Text +atNameNumberDelim = "number-delim" + +-- number style of an OrderedList +atNameNumberStyle :: Text +atNameNumberStyle = "number-style" + +-- target title in Image and Link +atNameTitle :: Text +atNameTitle = "title" + +-- target url in Image +atNameImageUrl :: Text +atNameImageUrl = "src" + +-- target url in Link +atNameLinkUrl :: Text +atNameLinkUrl = "href" + +-- QuoteType of a Quoted +atNameQuoteType :: Text +atNameQuoteType = "quote-type" + +-- MathType of a Math +atNameMathType :: Text +atNameMathType = "math-type" + +-- format of a RawInline or a RawBlock +atNameFormat :: Text +atNameFormat = "format" + +-- alignment attribute in a ColSpec or in a Cell +atNameAlignment :: Text +atNameAlignment = "alignment" + +-- ColWidth attribute in a ColSpec +atNameColWidth :: Text +atNameColWidth = "col-width" + +-- RowHeadColumns attribute in a TableBody +atNameRowHeadColumns :: Text +atNameRowHeadColumns = "row-head-columns" + +-- RowSpan attribute in a Cell +atNameRowspan :: Text +atNameRowspan = "row-span" + +-- ColSpan attribute in a Cell +atNameColspan :: Text +atNameColspan = "col-span" + +-- the citationMode of a Citation +atNameCitationMode :: Text +atNameCitationMode = "mode" + +-- the citationHash of a Citation +atNameCitationHash :: Text +atNameCitationHash = "hash" + +-- the citationNoteNum of a Citation +atNameCitationNoteNum :: Text +atNameCitationNoteNum = "note-num" + +-- the number of consecutive spaces of the <Space> element +atNameSpaceCount :: Text +atNameSpaceCount = "count" + +-- the content of the <Str> element +atNameStrContent :: Text +atNameStrContent = "content" + +-- container of Citation elements in Cite inlines +tgNameCitations :: Text +tgNameCitations = "citations" + +-- element around the prefix inlines of a Citation +tgNameCitationPrefix :: Text +tgNameCitationPrefix = "prefix" + +-- element around the suffix inlines of a Citation +tgNameCitationSuffix :: Text +tgNameCitationSuffix = "suffix" + +-- list item for BulletList and OrderedList +tgNameListItem :: Text +tgNameListItem = "item" + +-- list item for DefinitionList +tgNameDefListItem :: Text +tgNameDefListItem = "item" + +-- element around the inlines of the term of a DefinitionList item +tgNameDefListTerm :: Text +tgNameDefListTerm = "term" + +-- element around the blocks of a definition in a DefinitionList item +tgNameDefListDef :: Text +tgNameDefListDef = "def" + +-- optional element of the ShortCaption +tgNameShortCaption :: Text +tgNameShortCaption = "ShortCaption" + +-- element around the ColSpec of a Table +tgNameColspecs :: Text +tgNameColspecs = "colspecs" + +-- element around the header rows of a TableBody +tgNameBodyHeader :: Text +tgNameBodyHeader = "header" + +-- element around the body rows of a TableBody +tgNameBodyBody :: Text +tgNameBodyBody = "body" + +-- element around the inlines of a line in a LineBlock +tgNameLineItem :: Text +tgNameLineItem = "line" diff --git a/test/Tests/XML.hs b/test/Tests/XML.hs new file mode 100644 index 000000000..71175917c --- /dev/null +++ b/test/Tests/XML.hs @@ -0,0 +1,28 @@ +{-# LANGUAGE OverloadedStrings #-} +{- | +-- Module : Tests.XML +-- Copyright : Copyright (C) 2025- Massimiliano Farinella and John MacFarlane +-- License : GNU GPL, version 2 or above +-- +-- Maintainer : Massimiliano Farinella <[email protected]> +-- Stability : WIP +-- Portability : portable +Runs a roundtrip conversion of an AST trough the XML format: +- first from AST to XML (XML Writer), +- then back to AST (XML Reader), +- and checks that the two ASTs are the same +-} +module Tests.XML (tests) where + +import Control.Monad ((>=>)) +import Test.Tasty (TestTree) +import Test.Tasty.QuickCheck +import Tests.Helpers +import Text.Pandoc +import Text.Pandoc.Arbitrary () + +p_xml_roundtrip :: Pandoc -> Bool +p_xml_roundtrip d = d == purely (writeXML def {writerTemplate = Just mempty} >=> readXML def) d + +tests :: [TestTree] +tests = [testProperty "p_xml_roundtrip" p_xml_roundtrip]
\ No newline at end of file diff --git a/test/test-pandoc.hs b/test/test-pandoc.hs index 2f53c16be..6c6c2d1d4 100644 --- a/test/test-pandoc.hs +++ b/test/test-pandoc.hs @@ -50,6 +50,7 @@ import qualified Tests.Writers.RST import qualified Tests.Writers.AnnotatedTable import qualified Tests.Writers.TEI import qualified Tests.Writers.Markua +import qualified Tests.XML import qualified Tests.MediaBag import Text.Pandoc.Shared (inDirectory) @@ -59,6 +60,7 @@ tests pandocPath = testGroup "pandoc tests" , testGroup "Old" (Tests.Old.tests pandocPath) , testGroup "Shared" Tests.Shared.tests , testGroup "MediaBag" Tests.MediaBag.tests + , testGroup "XML" Tests.XML.tests , testGroup "Writers" [ testGroup "Native" Tests.Writers.Native.tests , testGroup "ConTeXt" Tests.Writers.ConTeXt.tests diff --git a/tools/pandoc-xml.dtd b/tools/pandoc-xml.dtd new file mode 100644 index 000000000..e787596ed --- /dev/null +++ b/tools/pandoc-xml.dtd @@ -0,0 +1,181 @@ +<!-- +A DTD for Pandoc XML format. +Copyright : Copyright (C) 2025- Massimiliano Farinella +License : GNU GPL, version 2 or above +Maintainer : Massimiliano Farinella <[email protected]> + +This is a DTD for the XML representation of Pandoc AST. +It's an equivalent of native and JSON formats, but modeled as XML. +You can validate Pandoc XML documents with this DTD, +but there are some limitations: +- a Pandoc Attr can contain arbitrary attributes, but it looks like you can't tell + an element has arbitrary attributes in a DTD +- some Pandoc attributes have constraints that can't be specified with a DTD +The Relax-NG and XML Schema translations of this specification overcome +some of those limitations (see pandoc-xml.rng and pandoc-xml.xsd). +--> +<!ELEMENT Pandoc (meta, blocks)> +<!ATTLIST Pandoc api-version CDATA #REQUIRED> + +<!ENTITY % block "Para | Plain | Header | Div | BlockQuote | HorizontalRule | BulletList | OrderedList | DefinitionList | Table | Figure | LineBlock | CodeBlock | RawBlock"> + +<!ENTITY % inline_element "Str | Space | Emph | Strong | Underline | Strikeout | Superscript | Subscript | SmallCaps | Quoted | Cite | Code | SoftBreak | LineBreak | Math | RawInline | Link | Image | Note | Span"> +<!ENTITY % inline "#PCDATA | %inline_element;"> + +<!ENTITY % attr "id ID #IMPLIED class CDATA #IMPLIED"> + +<!ENTITY % metavalue "MetaMap | MetaList | MetaBool | MetaString | MetaInlines | MetaBlocks"> + +<!ELEMENT meta (entry*)> +<!ELEMENT MetaMap (entry*)> + +<!ELEMENT entry (%metavalue;)*> +<!ATTLIST entry key CDATA #REQUIRED> + +<!ELEMENT MetaList (%metavalue;)*> + +<!ELEMENT MetaBool EMPTY> +<!ATTLIST MetaBool value (true | false) #REQUIRED> + +<!ELEMENT MetaString (#PCDATA)> + +<!ELEMENT MetaInlines (%inline;)*> + +<!ELEMENT MetaBlocks (%block;)*> + +<!ELEMENT blocks (%block;)*> + +<!ELEMENT Para (%inline;)*> + +<!ELEMENT Plain (%inline;)*> + +<!ELEMENT Header (%inline;)*> +<!ATTLIST Header + level CDATA "1" + %attr;> + +<!ELEMENT Div (%block;)* > +<!ATTLIST Div + custom-style CDATA #IMPLIED + %attr;> + +<!ELEMENT BlockQuote (%block;)*> + +<!ELEMENT HorizontalRule EMPTY> + +<!ELEMENT BulletList (item)+> + +<!ELEMENT OrderedList (item)+> +<!ATTLIST OrderedList + start CDATA "1" + number-style (DefaultStyle | Example | Decimal | LowerRoman | UpperRoman | LowerAlpha | UpperAlpha) "DefaultStyle" + number-delim (DefaultDelim | Period | OneParen | TwoParens) "DefaultDelim"> + +<!ELEMENT DefinitionList (item)+> + +<!ELEMENT item ((%block;)*|(term,def+))> +<!ELEMENT term (%inline;)*> +<!ELEMENT def (%block;)*> + +<!ELEMENT Table (Caption, colspecs, TableHead, TableBody+, TableFoot)> +<!ATTLIST Table + custom-style CDATA #IMPLIED + %attr;> +<!ELEMENT Caption (ShortCaption?, (%block;)*)> +<!ELEMENT ShortCaption (%inline;)*> +<!ELEMENT colspecs (ColSpec+)> +<!ELEMENT ColSpec EMPTY> +<!ATTLIST ColSpec + alignment (AlignLeft | AlignRight | AlignCenter | AlignDefault) "AlignDefault" + col-width CDATA "0"> +<!ELEMENT TableHead (Row*)> +<!ATTLIST TableHead %attr;> +<!ELEMENT TableFoot (Row*)> +<!ATTLIST TableFoot %attr;> +<!ELEMENT TableBody (header, body)> +<!ATTLIST TableBody + row-head-columns CDATA "0" + %attr;> +<!ELEMENT header (Row*)> +<!ELEMENT body (Row*)> +<!ELEMENT Row (Cell*)> +<!ATTLIST Row %attr;> +<!ELEMENT Cell (%block;)*> +<!ATTLIST Cell + alignment (AlignLeft | AlignRight | AlignCenter | AlignDefault) "AlignDefault" + row-span CDATA "1" + col-span CDATA "1" + %attr; > + +<!ELEMENT Figure (Caption,(%block;)*)> +<!ATTLIST Figure %attr;> + +<!ELEMENT LineBlock (line+)> +<!ELEMENT line (%inline;)*> + +<!ELEMENT CodeBlock (#PCDATA)> +<!ATTLIST CodeBlock %attr;> + +<!ELEMENT RawBlock (#PCDATA)> +<!ATTLIST RawBlock format CDATA #REQUIRED> + +<!ELEMENT Space EMPTY> +<!ATTLIST Space count CDATA "1"> + +<!ELEMENT Str EMPTY> +<!ATTLIST Str content CDATA ""> + +<!ELEMENT Emph (%inline;)*> +<!ELEMENT Strong (%inline;)*> +<!ELEMENT Underline (%inline;)*> +<!ELEMENT Strikeout (%inline;)*> +<!ELEMENT Superscript (%inline;)*> +<!ELEMENT Subscript (%inline;)*> +<!ELEMENT SmallCaps (%inline;)*> + +<!ELEMENT Span (%inline;)*> +<!ATTLIST Span + custom-style CDATA #IMPLIED + %attr;> + +<!ELEMENT Quoted (%inline;)*> +<!ATTLIST Quoted quote-type (SingleQuote | DoubleQuote) "DoubleQuote"> + +<!ELEMENT Math (#PCDATA)> +<!ATTLIST Math math-type (DisplayMath | InlineMath) "InlineMath"> + +<!ELEMENT RawInline (#PCDATA)> +<!ATTLIST RawInline format CDATA #REQUIRED> + +<!ELEMENT Cite (#PCDATA | citations | %inline_element;)*> +<!ELEMENT citations (Citation)+> +<!ELEMENT Citation (prefix?, suffix?)> +<!ELEMENT prefix (%inline;)*> +<!ELEMENT suffix (%inline;)*> +<!ATTLIST Citation + id CDATA #IMPLIED + note-num CDATA #IMPLIED + hash CDATA "0" + mode (AuthorInText | SuppressAuthor | NormalCitation) "AuthorInText"> + +<!ELEMENT Code (#PCDATA)> +<!ATTLIST Code %attr;> + +<!ENTITY % target "title CDATA #IMPLIED url CDATA #IMPLIED"> + +<!ELEMENT Image (%inline;)*> +<!ATTLIST Image + title CDATA #IMPLIED + src CDATA #IMPLIED + %attr;> + +<!ELEMENT Link (%inline;)*> +<!ATTLIST Link + title CDATA #IMPLIED + href CDATA #IMPLIED + %attr;> + +<!ELEMENT SoftBreak EMPTY> +<!ELEMENT LineBreak EMPTY> + +<!ELEMENT Note (%block;)*> diff --git a/tools/pandoc-xml.rnc b/tools/pandoc-xml.rnc new file mode 100644 index 000000000..59043d650 --- /dev/null +++ b/tools/pandoc-xml.rnc @@ -0,0 +1,250 @@ +# A RELAX NG schema for Pandoc XML format. +# Copyright : Copyright (C) 2025- Massimiliano Farinella +# License : GNU GPL, version 2 or above +# Maintainer : Massimiliano Farinella <[email protected]> +# +# This is a RELAX NG schema for the XML representation of Pandoc AST. +# It's an equivalent of native and JSON formats, but modeled as XML. +# You can use this schema to validate Pandoc XML documents. +# It's translated from pandoc-xml.dtd with the "Trang" software by James Clark, +# and adjusted manually to add some constraints: +# - elements with Attr can have arbitrary attributes (this is not possible with a DTD) +# - Header's "level", OrderedList's "start" and Cell's "rowspan" and "colspan" attributes +# must be a positive integer and are equal to 1 if not specified +# - column widths in ColSpec must be between 0 and 1 (inclusive, with 0=ColWidthDefault) +# - the "count" attribute in the "<Space>" element must be positive and equal to 1 if not specified + +namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0" + +Pandoc = element Pandoc { attlist_Pandoc, meta, blocks } +attlist_Pandoc &= attribute api-version { text } +block = + Para + | Plain + | Header + | Div + | BlockQuote + | HorizontalRule + | BulletList + | OrderedList + | DefinitionList + | Table + | Figure + | LineBlock + | CodeBlock + | RawBlock +inline_element = + Str + | Space + | Emph + | Strong + | Underline + | Strikeout + | Superscript + | Subscript + | SmallCaps + | Quoted + | Cite + | Code + | SoftBreak + | LineBreak + | Math + | RawInline + | Link + | Image + | Note + | Span +inline = text | inline_element +attr = + attribute id { xsd:ID }?, + attribute class { text }?, + attribute * { text }* +metavalue = + MetaMap | MetaList | MetaBool | MetaString | MetaInlines | MetaBlocks +meta = element meta { attlist_meta, entry* } +attlist_meta &= empty +MetaMap = element MetaMap { attlist_MetaMap, entry* } +attlist_MetaMap &= empty +entry = element entry { attlist_entry, metavalue* } +attlist_entry &= attribute key { text } +MetaList = element MetaList { attlist_MetaList, metavalue* } +attlist_MetaList &= empty +MetaBool = element MetaBool { attlist_MetaBool, empty } +attlist_MetaBool &= attribute value { "true" | "false" } +MetaString = element MetaString { attlist_MetaString, text } +attlist_MetaString &= empty +MetaInlines = element MetaInlines { attlist_MetaInlines, inline* } +attlist_MetaInlines &= empty +MetaBlocks = element MetaBlocks { attlist_MetaBlocks, block* } +attlist_MetaBlocks &= empty +blocks = element blocks { attlist_blocks, block* } +attlist_blocks &= empty +Para = element Para { attlist_Para, inline* } +attlist_Para &= empty +Plain = element Plain { attlist_Plain, inline* } +attlist_Plain &= empty +Header = element Header { attlist_Header, inline* } +attlist_Header &= + [ a:defaultValue = "1" ] attribute level { xsd:positiveInteger }?, + attr +Div = element Div { attlist_Div, block* } +attlist_Div &= attr +BlockQuote = element BlockQuote { attlist_BlockQuote, block* } +attlist_BlockQuote &= empty +HorizontalRule = + element HorizontalRule { attlist_HorizontalRule, empty } +attlist_HorizontalRule &= empty +BulletList = element BulletList { attlist_BulletList, item+ } +attlist_BulletList &= empty +OrderedList = element OrderedList { attlist_OrderedList, item+ } +attlist_OrderedList &= + [ a:defaultValue = "1" ] attribute start { xsd:positiveInteger }?, + [ a:defaultValue = "DefaultStyle" ] + attribute number-style { + "DefaultStyle" + | "Example" + | "Decimal" + | "LowerRoman" + | "UpperRoman" + | "LowerAlpha" + | "UpperAlpha" + }?, + [ a:defaultValue = "DefaultDelim" ] + attribute number-delim { + "DefaultDelim" | "Period" | "OneParen" | "TwoParens" + }? +DefinitionList = + element DefinitionList { attlist_DefinitionList, item+ } +attlist_DefinitionList &= empty +item = + element item { + attlist_item, + (block* | (term, def+)) + } +attlist_item &= empty +term = element term { attlist_term, inline* } +attlist_term &= empty +def = element def { attlist_def, block* } +attlist_def &= empty +Table = + element Table { + attlist_Table, Caption, colspecs, TableHead, TableBody+, TableFoot + } +attlist_Table &= attr +Caption = element Caption { attlist_Caption, ShortCaption?, block* } +attlist_Caption &= empty +ShortCaption = element ShortCaption { attlist_ShortCaption, inline* } +attlist_ShortCaption &= empty +colspecs = element colspecs { attlist_colspecs, ColSpec+ } +attlist_colspecs &= empty +ColSpec = element ColSpec { attlist_ColSpec, empty } +attlist_ColSpec &= + [ a:defaultValue = "AlignDefault" ] + attribute alignment { + "AlignLeft" | "AlignRight" | "AlignCenter" | "AlignDefault" + }?, + [ a:defaultValue = "0" ] + attribute col-width { + xsd:double { minInclusive = "0" maxInclusive = "1" } + }? +TableHead = element TableHead { attlist_TableHead, Row* } +attlist_TableHead &= attr +TableFoot = element TableFoot { attlist_TableFoot, Row* } +attlist_TableFoot &= attr +TableBody = element TableBody { attlist_TableBody, header, body } +attlist_TableBody &= + [ a:defaultValue = "0" ] attribute row-head-columns { text }?, + attr +header = element header { attlist_header, Row* } +attlist_header &= empty +body = element body { attlist_body, Row* } +attlist_body &= empty +Row = element Row { attlist_Row, Cell* } +attlist_Row &= attr +Cell = element Cell { attlist_Cell, block* } +attlist_Cell &= + [ a:defaultValue = "AlignDefault" ] + attribute alignment { + "AlignLeft" | "AlignRight" | "AlignCenter" | "AlignDefault" + }?, + [ a:defaultValue = "1" ] attribute row-span { xsd:positiveInteger }?, + [ a:defaultValue = "1" ] attribute col-span { xsd:positiveInteger }?, + attr +Figure = element Figure { attlist_Figure, Caption, block* } +attlist_Figure &= attr +LineBlock = element LineBlock { attlist_LineBlock, line+ } +attlist_LineBlock &= empty +line = element line { attlist_line, inline* } +attlist_line &= empty +CodeBlock = element CodeBlock { attlist_CodeBlock, text } +attlist_CodeBlock &= attr +RawBlock = element RawBlock { attlist_RawBlock, text } +attlist_RawBlock &= attribute format { text } +Space = element Space { attlist_Space, empty } +attlist_Space &= + [ a:defaultValue = "1" ] attribute count { xsd:positiveInteger }? +Str = element Str { attlist_Str, empty } +attlist_Str &= [ a:defaultValue = "" ] attribute content { text }? +Emph = element Emph { attlist_Emph, inline* } +attlist_Emph &= empty +Strong = element Strong { attlist_Strong, inline* } +attlist_Strong &= empty +Underline = element Underline { attlist_Underline, inline* } +attlist_Underline &= empty +Strikeout = element Strikeout { attlist_Strikeout, inline* } +attlist_Strikeout &= empty +Superscript = element Superscript { attlist_Superscript, inline* } +attlist_Superscript &= empty +Subscript = element Subscript { attlist_Subscript, inline* } +attlist_Subscript &= empty +SmallCaps = element SmallCaps { attlist_SmallCaps, inline* } +attlist_SmallCaps &= empty +Span = element Span { attlist_Span, inline* } +attlist_Span &= attr +Quoted = element Quoted { attlist_Quoted, inline* } +attlist_Quoted &= + [ a:defaultValue = "DoubleQuote" ] + attribute quote-type { "SingleQuote" | "DoubleQuote" }? +Math = element Math { attlist_Math, text } +attlist_Math &= + [ a:defaultValue = "InlineMath" ] + attribute math-type { "DisplayMath" | "InlineMath" }? +RawInline = element RawInline { attlist_RawInline, text } +attlist_RawInline &= attribute format { text } +Cite = + element Cite { attlist_Cite, (text | citations | inline_element)* } +attlist_Cite &= empty +citations = element citations { attlist_citations, Citation+ } +attlist_citations &= empty +Citation = element Citation { attlist_Citation, prefix?, suffix? } +prefix = element prefix { attlist_prefix, inline* } +attlist_prefix &= empty +suffix = element suffix { attlist_suffix, inline* } +attlist_suffix &= empty +attlist_Citation &= + attribute id { text }?, + attribute note-num { text }?, + [ a:defaultValue = "0" ] attribute hash { text }?, + [ a:defaultValue = "AuthorInText" ] + attribute mode { + "AuthorInText" | "SuppressAuthor" | "NormalCitation" + }? +Code = element Code { attlist_Code, text } +attlist_Code &= attr +Image = element Image { attlist_Image, inline* } +attlist_Image &= + attribute title { text }?, + attribute src { text }?, + attr +Link = element Link { attlist_Link, inline* } +attlist_Link &= + attribute title { text }?, + attribute href { text }?, + attr +SoftBreak = element SoftBreak { attlist_SoftBreak, empty } +attlist_SoftBreak &= empty +LineBreak = element LineBreak { attlist_LineBreak, empty } +attlist_LineBreak &= empty +Note = element Note { attlist_Note, block* } +attlist_Note &= empty +start = Pandoc diff --git a/tools/pandoc-xml.rng b/tools/pandoc-xml.rng new file mode 100644 index 000000000..38188f781 --- /dev/null +++ b/tools/pandoc-xml.rng @@ -0,0 +1,913 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!-- +A RELAX NG schema for Pandoc XML format. +Copyright : Copyright (C) 2025- Massimiliano Farinella +License : GNU GPL, version 2 or above +Maintainer : Massimiliano Farinella <[email protected]> + +This is a RELAX NG schema for the XML representation of Pandoc AST. +It's an equivalent of native and JSON formats, but modeled as XML. +You can use this schema to validate Pandoc XML documents. +It's translated from pandoc-xml.dtd with the "Trang" software by James Clark, +and adjusted manually to add some constraints: +- elements with Attr can have arbitrary attributes (this is not possible with a DTD) +- Header's "level", OrderedList's "start" and Cell's "rowspan" and "colspan" attributes + must be a positive integer and are equal to 1 if not specified +- column widths in ColSpec must be between 0 and 1 (inclusive, with 0=ColWidthDefault) +- the "count" attribute in the "<Space>" element must be positive and equal to 1 if not specified +--> +<grammar xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0" + xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"> + <define name="Pandoc"> + <element name="Pandoc"> + <ref name="attlist_Pandoc" /> + <ref name="meta" /> + <ref name="blocks" /> + </element> + </define> + <define name="attlist_Pandoc" combine="interleave"> + <attribute name="api-version" /> + </define> + <define name="block"> + <choice> + <ref name="Para" /> + <ref name="Plain" /> + <ref name="Header" /> + <ref name="Div" /> + <ref name="BlockQuote" /> + <ref name="HorizontalRule" /> + <ref name="BulletList" /> + <ref name="OrderedList" /> + <ref name="DefinitionList" /> + <ref name="Table" /> + <ref name="Figure" /> + <ref name="LineBlock" /> + <ref name="CodeBlock" /> + <ref name="RawBlock" /> + </choice> + </define> + <define name="inline_element"> + <choice> + <ref name="Str" /> + <ref name="Space" /> + <ref name="Emph" /> + <ref name="Strong" /> + <ref name="Underline" /> + <ref name="Strikeout" /> + <ref name="Superscript" /> + <ref name="Subscript" /> + <ref name="SmallCaps" /> + <ref name="Quoted" /> + <ref name="Cite" /> + <ref name="Code" /> + <ref name="SoftBreak" /> + <ref name="LineBreak" /> + <ref name="Math" /> + <ref name="RawInline" /> + <ref name="Link" /> + <ref name="Image" /> + <ref name="Note" /> + <ref name="Span" /> + </choice> + </define> + <define name="inline"> + <choice> + <text /> + <ref name="inline_element" /> + </choice> + </define> + <define name="attr"> + <optional> + <attribute name="id"> + <data type="ID" /> + </attribute> + </optional> + <optional> + <attribute name="class" /> + </optional> + <zeroOrMore> + <attribute> + <anyName /> + </attribute> + </zeroOrMore> + </define> + <define name="metavalue"> + <choice> + <ref name="MetaMap" /> + <ref name="MetaList" /> + <ref name="MetaBool" /> + <ref name="MetaString" /> + <ref name="MetaInlines" /> + <ref name="MetaBlocks" /> + </choice> + </define> + <define name="meta"> + <element name="meta"> + <ref name="attlist_meta" /> + <zeroOrMore> + <ref name="entry" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_meta" combine="interleave"> + <empty /> + </define> + <define name="MetaMap"> + <element name="MetaMap"> + <ref name="attlist_MetaMap" /> + <zeroOrMore> + <ref name="entry" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_MetaMap" combine="interleave"> + <empty /> + </define> + <define name="entry"> + <element name="entry"> + <ref name="attlist_entry" /> + <zeroOrMore> + <ref name="metavalue" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_entry" combine="interleave"> + <attribute name="key" /> + </define> + <define name="MetaList"> + <element name="MetaList"> + <ref name="attlist_MetaList" /> + <zeroOrMore> + <ref name="metavalue" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_MetaList" combine="interleave"> + <empty /> + </define> + <define name="MetaBool"> + <element name="MetaBool"> + <ref name="attlist_MetaBool" /> + <empty /> + </element> + </define> + <define name="attlist_MetaBool" combine="interleave"> + <attribute name="value"> + <choice> + <value>true</value> + <value>false</value> + </choice> + </attribute> + </define> + <define name="MetaString"> + <element name="MetaString"> + <ref name="attlist_MetaString" /> + <text /> + </element> + </define> + <define name="attlist_MetaString" combine="interleave"> + <empty /> + </define> + <define name="MetaInlines"> + <element name="MetaInlines"> + <ref name="attlist_MetaInlines" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_MetaInlines" combine="interleave"> + <empty /> + </define> + <define name="MetaBlocks"> + <element name="MetaBlocks"> + <ref name="attlist_MetaBlocks" /> + <zeroOrMore> + <ref name="block" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_MetaBlocks" combine="interleave"> + <empty /> + </define> + <define name="blocks"> + <element name="blocks"> + <ref name="attlist_blocks" /> + <zeroOrMore> + <ref name="block" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_blocks" combine="interleave"> + <empty /> + </define> + <define name="Para"> + <element name="Para"> + <ref name="attlist_Para" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Para" combine="interleave"> + <empty /> + </define> + <define name="Plain"> + <element name="Plain"> + <ref name="attlist_Plain" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Plain" combine="interleave"> + <empty /> + </define> + <define name="Header"> + <element name="Header"> + <ref name="attlist_Header" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Header" combine="interleave"> + <optional> + <attribute name="level" a:defaultValue="1"> + <data type="positiveInteger" /> + </attribute> + </optional> + <ref name="attr" /> + </define> + <define name="Div"> + <element name="Div"> + <ref name="attlist_Div" /> + <zeroOrMore> + <ref name="block" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Div" combine="interleave"> + <ref name="attr" /> + </define> + <define name="BlockQuote"> + <element name="BlockQuote"> + <ref name="attlist_BlockQuote" /> + <zeroOrMore> + <ref name="block" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_BlockQuote" combine="interleave"> + <empty /> + </define> + <define name="HorizontalRule"> + <element name="HorizontalRule"> + <ref name="attlist_HorizontalRule" /> + <empty /> + </element> + </define> + <define name="attlist_HorizontalRule" combine="interleave"> + <empty /> + </define> + <define name="BulletList"> + <element name="BulletList"> + <ref name="attlist_BulletList" /> + <oneOrMore> + <ref name="item" /> + </oneOrMore> + </element> + </define> + <define name="attlist_BulletList" combine="interleave"> + <empty /> + </define> + <define name="OrderedList"> + <element name="OrderedList"> + <ref name="attlist_OrderedList" /> + <oneOrMore> + <ref name="item" /> + </oneOrMore> + </element> + </define> + <define name="attlist_OrderedList" combine="interleave"> + <optional> + <attribute name="start" a:defaultValue="1"> + <data type="positiveInteger" /> + </attribute> + </optional> + <optional> + <attribute name="number-style" a:defaultValue="DefaultStyle"> + <choice> + <value>DefaultStyle</value> + <value>Example</value> + <value>Decimal</value> + <value>LowerRoman</value> + <value>UpperRoman</value> + <value>LowerAlpha</value> + <value>UpperAlpha</value> + </choice> + </attribute> + </optional> + <optional> + <attribute name="number-delim" a:defaultValue="DefaultDelim"> + <choice> + <value>DefaultDelim</value> + <value>Period</value> + <value>OneParen</value> + <value>TwoParens</value> + </choice> + </attribute> + </optional> + </define> + <define name="DefinitionList"> + <element name="DefinitionList"> + <ref name="attlist_DefinitionList" /> + <oneOrMore> + <ref name="item" /> + </oneOrMore> + </element> + </define> + <define name="attlist_DefinitionList" combine="interleave"> + <empty /> + </define> + <define name="item"> + <element name="item"> + <ref name="attlist_item" /> + <choice> + <zeroOrMore> + <ref name="block" /> + </zeroOrMore> + <group> + <ref name="term" /> + <oneOrMore> + <ref name="def" /> + </oneOrMore> + </group> + </choice> + </element> + </define> + <define name="attlist_item" combine="interleave"> + <empty /> + </define> + <define name="term"> + <element name="term"> + <ref name="attlist_term" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_term" combine="interleave"> + <empty /> + </define> + <define name="def"> + <element name="def"> + <ref name="attlist_def" /> + <zeroOrMore> + <ref name="block" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_def" combine="interleave"> + <empty /> + </define> + <define name="Table"> + <element name="Table"> + <ref name="attlist_Table" /> + <ref name="Caption" /> + <ref name="colspecs" /> + <ref name="TableHead" /> + <oneOrMore> + <ref name="TableBody" /> + </oneOrMore> + <ref name="TableFoot" /> + </element> + </define> + <define name="attlist_Table" combine="interleave"> + <ref name="attr" /> + </define> + <define name="Caption"> + <element name="Caption"> + <ref name="attlist_Caption" /> + <optional> + <ref name="ShortCaption" /> + </optional> + <zeroOrMore> + <ref name="block" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Caption" combine="interleave"> + <empty /> + </define> + <define name="ShortCaption"> + <element name="ShortCaption"> + <ref name="attlist_ShortCaption" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_ShortCaption" combine="interleave"> + <empty /> + </define> + <define name="colspecs"> + <element name="colspecs"> + <ref name="attlist_colspecs" /> + <oneOrMore> + <ref name="ColSpec" /> + </oneOrMore> + </element> + </define> + <define name="attlist_colspecs" combine="interleave"> + <empty /> + </define> + <define name="ColSpec"> + <element name="ColSpec"> + <ref name="attlist_ColSpec" /> + <empty /> + </element> + </define> + <define name="attlist_ColSpec" combine="interleave"> + <optional> + <attribute name="alignment" a:defaultValue="AlignDefault"> + <choice> + <value>AlignLeft</value> + <value>AlignRight</value> + <value>AlignCenter</value> + <value>AlignDefault</value> + </choice> + </attribute> + </optional> + <optional> + <attribute name="col-width" a:defaultValue="0"> + <data type="double"> + <param name="minInclusive">0</param> + <param name="maxInclusive">1</param> + </data> + </attribute> + </optional> + </define> + <define name="TableHead"> + <element name="TableHead"> + <ref name="attlist_TableHead" /> + <zeroOrMore> + <ref name="Row" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_TableHead" combine="interleave"> + <ref name="attr" /> + </define> + <define name="TableFoot"> + <element name="TableFoot"> + <ref name="attlist_TableFoot" /> + <zeroOrMore> + <ref name="Row" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_TableFoot" combine="interleave"> + <ref name="attr" /> + </define> + <define name="TableBody"> + <element name="TableBody"> + <ref name="attlist_TableBody" /> + <ref name="header" /> + <ref name="body" /> + </element> + </define> + <define name="attlist_TableBody" combine="interleave"> + <optional> + <attribute name="row-head-columns" a:defaultValue="0" /> + </optional> + <ref name="attr" /> + </define> + <define name="header"> + <element name="header"> + <ref name="attlist_header" /> + <zeroOrMore> + <ref name="Row" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_header" combine="interleave"> + <empty /> + </define> + <define name="body"> + <element name="body"> + <ref name="attlist_body" /> + <zeroOrMore> + <ref name="Row" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_body" combine="interleave"> + <empty /> + </define> + <define name="Row"> + <element name="Row"> + <ref name="attlist_Row" /> + <zeroOrMore> + <ref name="Cell" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Row" combine="interleave"> + <ref name="attr" /> + </define> + <define name="Cell"> + <element name="Cell"> + <ref name="attlist_Cell" /> + <zeroOrMore> + <ref name="block" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Cell" combine="interleave"> + <optional> + <attribute name="alignment" a:defaultValue="AlignDefault"> + <choice> + <value>AlignLeft</value> + <value>AlignRight</value> + <value>AlignCenter</value> + <value>AlignDefault</value> + </choice> + </attribute> + </optional> + <optional> + <attribute name="row-span" a:defaultValue="1"> + <data type="positiveInteger" /> + </attribute> + </optional> + <optional> + <attribute name="col-span" a:defaultValue="1"> + <data type="positiveInteger" /> + </attribute> + </optional> + <ref name="attr" /> + </define> + <define name="Figure"> + <element name="Figure"> + <ref name="attlist_Figure" /> + <ref name="Caption" /> + <zeroOrMore> + <ref name="block" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Figure" combine="interleave"> + <ref name="attr" /> + </define> + <define name="LineBlock"> + <element name="LineBlock"> + <ref name="attlist_LineBlock" /> + <oneOrMore> + <ref name="line" /> + </oneOrMore> + </element> + </define> + <define name="attlist_LineBlock" combine="interleave"> + <empty /> + </define> + <define name="line"> + <element name="line"> + <ref name="attlist_line" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_line" combine="interleave"> + <empty /> + </define> + <define name="CodeBlock"> + <element name="CodeBlock"> + <ref name="attlist_CodeBlock" /> + <text /> + </element> + </define> + <define name="attlist_CodeBlock" combine="interleave"> + <ref name="attr" /> + </define> + <define name="RawBlock"> + <element name="RawBlock"> + <ref name="attlist_RawBlock" /> + <text /> + </element> + </define> + <define name="attlist_RawBlock" combine="interleave"> + <attribute name="format" /> + </define> + <define name="Space"> + <element name="Space"> + <ref name="attlist_Space" /> + <empty /> + </element> + </define> + <define name="attlist_Space" combine="interleave"> + <optional> + <attribute name="count" a:defaultValue="1"> + <data type="positiveInteger" /> + </attribute> + </optional> + </define> + <define name="Str"> + <element name="Str"> + <ref name="attlist_Str" /> + <empty /> + </element> + </define> + <define name="attlist_Str" combine="interleave"> + <optional> + <attribute name="content" a:defaultValue="" /> + </optional> + </define> + <define name="Emph"> + <element name="Emph"> + <ref name="attlist_Emph" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Emph" combine="interleave"> + <empty /> + </define> + <define name="Strong"> + <element name="Strong"> + <ref name="attlist_Strong" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Strong" combine="interleave"> + <empty /> + </define> + <define name="Underline"> + <element name="Underline"> + <ref name="attlist_Underline" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Underline" combine="interleave"> + <empty /> + </define> + <define name="Strikeout"> + <element name="Strikeout"> + <ref name="attlist_Strikeout" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Strikeout" combine="interleave"> + <empty /> + </define> + <define name="Superscript"> + <element name="Superscript"> + <ref name="attlist_Superscript" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Superscript" combine="interleave"> + <empty /> + </define> + <define name="Subscript"> + <element name="Subscript"> + <ref name="attlist_Subscript" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Subscript" combine="interleave"> + <empty /> + </define> + <define name="SmallCaps"> + <element name="SmallCaps"> + <ref name="attlist_SmallCaps" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_SmallCaps" combine="interleave"> + <empty /> + </define> + <define name="Span"> + <element name="Span"> + <ref name="attlist_Span" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Span" combine="interleave"> + <ref name="attr" /> + </define> + <define name="Quoted"> + <element name="Quoted"> + <ref name="attlist_Quoted" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Quoted" combine="interleave"> + <optional> + <attribute name="quote-type" a:defaultValue="DoubleQuote"> + <choice> + <value>SingleQuote</value> + <value>DoubleQuote</value> + </choice> + </attribute> + </optional> + </define> + <define name="Math"> + <element name="Math"> + <ref name="attlist_Math" /> + <text /> + </element> + </define> + <define name="attlist_Math" combine="interleave"> + <optional> + <attribute name="math-type" a:defaultValue="InlineMath"> + <choice> + <value>DisplayMath</value> + <value>InlineMath</value> + </choice> + </attribute> + </optional> + </define> + <define name="RawInline"> + <element name="RawInline"> + <ref name="attlist_RawInline" /> + <text /> + </element> + </define> + <define name="attlist_RawInline" combine="interleave"> + <attribute name="format" /> + </define> + <define name="Cite"> + <element name="Cite"> + <ref name="attlist_Cite" /> + <zeroOrMore> + <choice> + <text /> + <ref name="citations" /> + <ref name="inline_element" /> + </choice> + </zeroOrMore> + </element> + </define> + <define name="attlist_Cite" combine="interleave"> + <empty /> + </define> + <define name="citations"> + <element name="citations"> + <ref name="attlist_citations" /> + <oneOrMore> + <ref name="Citation" /> + </oneOrMore> + </element> + </define> + <define name="attlist_citations" combine="interleave"> + <empty /> + </define> + <define name="Citation"> + <element name="Citation"> + <ref name="attlist_Citation" /> + <optional> + <ref name="prefix" /> + </optional> + <optional> + <ref name="suffix" /> + </optional> + </element> + </define> + <define name="prefix"> + <element name="prefix"> + <ref name="attlist_prefix" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_prefix" combine="interleave"> + <empty /> + </define> + <define name="suffix"> + <element name="suffix"> + <ref name="attlist_suffix" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_suffix" combine="interleave"> + <empty /> + </define> + <define name="attlist_Citation" combine="interleave"> + <optional> + <attribute name="id" /> + </optional> + <optional> + <attribute name="note-num" /> + </optional> + <optional> + <attribute name="hash" a:defaultValue="0" /> + </optional> + <optional> + <attribute name="mode" a:defaultValue="AuthorInText"> + <choice> + <value>AuthorInText</value> + <value>SuppressAuthor</value> + <value>NormalCitation</value> + </choice> + </attribute> + </optional> + </define> + <define name="Code"> + <element name="Code"> + <ref name="attlist_Code" /> + <text /> + </element> + </define> + <define name="attlist_Code" combine="interleave"> + <ref name="attr" /> + </define> + <define name="Image"> + <element name="Image"> + <ref name="attlist_Image" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Image" combine="interleave"> + <optional> + <attribute name="title"/> + </optional> + <optional> + <attribute name="src"/> + </optional> + <ref name="attr"/> + </define> + <define name="Link"> + <element name="Link"> + <ref name="attlist_Link" /> + <zeroOrMore> + <ref name="inline" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Link" combine="interleave"> + <optional> + <attribute name="title"/> + </optional> + <optional> + <attribute name="href"/> + </optional> + <ref name="attr"/> + </define> + <define name="SoftBreak"> + <element name="SoftBreak"> + <ref name="attlist_SoftBreak" /> + <empty /> + </element> + </define> + <define name="attlist_SoftBreak" combine="interleave"> + <empty /> + </define> + <define name="LineBreak"> + <element name="LineBreak"> + <ref name="attlist_LineBreak" /> + <empty /> + </element> + </define> + <define name="attlist_LineBreak" combine="interleave"> + <empty /> + </define> + <define name="Note"> + <element name="Note"> + <ref name="attlist_Note" /> + <zeroOrMore> + <ref name="block" /> + </zeroOrMore> + </element> + </define> + <define name="attlist_Note" combine="interleave"> + <empty /> + </define> + <start> + <choice> + <ref name="Pandoc" /> + </choice> + </start> +</grammar> diff --git a/tools/pandoc-xml.xsd b/tools/pandoc-xml.xsd new file mode 100644 index 000000000..01834f80f --- /dev/null +++ b/tools/pandoc-xml.xsd @@ -0,0 +1,602 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!-- +A XML Schema definition for Pandoc XML format. +Copyright : Copyright (C) 2025- Massimiliano Farinella +License : GNU GPL, version 2 or above +Maintainer : Massimiliano Farinella <[email protected]> + +This is a XML Schema schema for the XML representation of Pandoc AST. +It's an equivalent of native and JSON formats, but modeled as XML. +You can use this schema to validate Pandoc XML documents. +It's translated from pandoc-xml.dtd with the "Trang" software by James Clark, +and adjusted manually to add some constraints: +- elements with Attr can have arbitrary attributes (this is not possible with a DTD) +- Header's "level", OrderedList's "start" and Cell's "rowspan" and "colspan" attributes + must be a positive integer and are equal to 1 if not specified +- column widths in ColSpec must be between 0 and 1 (inclusive, with 0=ColWidthDefault) +- the "count" attribute in the "<Space>" element must be a positive integer, + equal to 1 if not specified +- OrderedList's number style and delimiter, ColSpec's and Cell's alignment, + Quoted's QuoteType and Math's Mathtype + can only take the values specified in pandoc-types +- TableBody's row-head-columns must be zero (default, when not specified) or a positive integer +--> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"> + <xs:element name="Pandoc"> + <xs:complexType> + <xs:sequence> + <xs:element ref="meta"/> + <xs:element ref="blocks"/> + </xs:sequence> + <xs:attributeGroup ref="attlist_Pandoc"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_Pandoc"> + <xs:attribute name="api-version" use="required"/> + </xs:attributeGroup> + <xs:element name="block" abstract="true"/> + <xs:element name="inline_element" abstract="true"/> + <xs:group name="inline"> + <xs:sequence> + <xs:element minOccurs="0" ref="inline_element"/> + </xs:sequence> + </xs:group> + <xs:attributeGroup name="attr"> + <xs:attribute name="id" type="xs:ID"/> + <xs:attribute name="class"/> + <xs:anyAttribute processContents="skip"/> + </xs:attributeGroup> + <xs:element name="metavalue" abstract="true"/> + <xs:element name="meta"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="entry"/> + </xs:sequence> + </xs:complexType> + </xs:element> + <xs:element name="MetaMap" substitutionGroup="metavalue"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="entry"/> + </xs:sequence> + </xs:complexType> + </xs:element> + <xs:element name="entry"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="metavalue"/> + </xs:sequence> + <xs:attributeGroup ref="attlist_entry"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_entry"> + <xs:attribute name="key" use="required"/> + </xs:attributeGroup> + <xs:element name="MetaList" substitutionGroup="metavalue"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="metavalue"/> + </xs:sequence> + </xs:complexType> + </xs:element> + <xs:element name="MetaBool" substitutionGroup="metavalue"> + <xs:complexType> + <xs:attributeGroup ref="attlist_MetaBool"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_MetaBool"> + <xs:attribute name="value" use="required"> + <xs:simpleType> + <xs:restriction base="xs:token"> + <xs:enumeration value="true"/> + <xs:enumeration value="false"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + </xs:attributeGroup> + <xs:element name="MetaString" substitutionGroup="metavalue" type="xs:string"/> + <xs:element name="MetaInlines" substitutionGroup="metavalue"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + </xs:complexType> + </xs:element> + <xs:element name="MetaBlocks" substitutionGroup="metavalue"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/> + </xs:sequence> + </xs:complexType> + </xs:element> + <xs:element name="blocks"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/> + </xs:sequence> + </xs:complexType> + </xs:element> + <xs:element name="Para" substitutionGroup="block"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + </xs:complexType> + </xs:element> + <xs:element name="Plain" substitutionGroup="block"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + </xs:complexType> + </xs:element> + <xs:element name="Header" substitutionGroup="block"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + <xs:attributeGroup ref="attlist_Header"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_Header"> + <xs:attribute name="level" default="1" type="xs:positiveInteger"/> + <xs:attributeGroup ref="attr"/> + </xs:attributeGroup> + <xs:element name="Div" substitutionGroup="block"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/> + </xs:sequence> + <xs:attributeGroup ref="attlist_Div"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_Div"> + <xs:attributeGroup ref="attr"/> + </xs:attributeGroup> + <xs:element name="BlockQuote" substitutionGroup="block"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/> + </xs:sequence> + </xs:complexType> + </xs:element> + <xs:element name="HorizontalRule" substitutionGroup="block"> + <xs:complexType/> + </xs:element> + <xs:element name="BulletList" substitutionGroup="block"> + <xs:complexType> + <xs:sequence> + <xs:element maxOccurs="unbounded" ref="item"/> + </xs:sequence> + </xs:complexType> + </xs:element> + <xs:element name="OrderedList" substitutionGroup="block"> + <xs:complexType> + <xs:sequence> + <xs:element maxOccurs="unbounded" ref="item"/> + </xs:sequence> + <xs:attributeGroup ref="attlist_OrderedList"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_OrderedList"> + <xs:attribute name="start" default="1" type="xs:positiveInteger"/> + <xs:attribute name="number-style" default="DefaultStyle"> + <xs:simpleType> + <xs:restriction base="xs:token"> + <xs:enumeration value="DefaultStyle"/> + <xs:enumeration value="Example"/> + <xs:enumeration value="Decimal"/> + <xs:enumeration value="LowerRoman"/> + <xs:enumeration value="UpperRoman"/> + <xs:enumeration value="LowerAlpha"/> + <xs:enumeration value="UpperAlpha"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="number-delim" default="DefaultDelim"> + <xs:simpleType> + <xs:restriction base="xs:token"> + <xs:enumeration value="DefaultDelim"/> + <xs:enumeration value="Period"/> + <xs:enumeration value="OneParen"/> + <xs:enumeration value="TwoParens"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + </xs:attributeGroup> + <xs:element name="DefinitionList" substitutionGroup="block"> + <xs:complexType> + <xs:sequence> + <xs:element maxOccurs="unbounded" ref="item"/> + </xs:sequence> + </xs:complexType> + </xs:element> + <xs:element name="item"> + <xs:complexType> + <xs:choice> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/> + <xs:sequence> + <xs:element ref="term"/> + <xs:element maxOccurs="unbounded" ref="def"/> + </xs:sequence> + </xs:choice> + </xs:complexType> + </xs:element> + <xs:element name="term"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + </xs:complexType> + </xs:element> + <xs:element name="def"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/> + </xs:sequence> + </xs:complexType> + </xs:element> + <xs:element name="Table" substitutionGroup="block"> + <xs:complexType> + <xs:sequence> + <xs:element ref="Caption"/> + <xs:element ref="colspecs"/> + <xs:element ref="TableHead"/> + <xs:element maxOccurs="unbounded" ref="TableBody"/> + <xs:element ref="TableFoot"/> + </xs:sequence> + <xs:attributeGroup ref="attlist_Table"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_Table"> + <xs:attributeGroup ref="attr"/> + </xs:attributeGroup> + <xs:element name="Caption"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" ref="ShortCaption"/> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/> + </xs:sequence> + </xs:complexType> + </xs:element> + <xs:element name="ShortCaption"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + </xs:complexType> + </xs:element> + <xs:element name="colspecs"> + <xs:complexType> + <xs:sequence> + <xs:element maxOccurs="unbounded" ref="ColSpec"/> + </xs:sequence> + </xs:complexType> + </xs:element> + <xs:element name="ColSpec"> + <xs:complexType> + <xs:attributeGroup ref="attlist_ColSpec"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_ColSpec"> + <xs:attribute name="alignment" default="AlignDefault"> + <xs:simpleType> + <xs:restriction base="xs:token"> + <xs:enumeration value="AlignLeft"/> + <xs:enumeration value="AlignRight"/> + <xs:enumeration value="AlignCenter"/> + <xs:enumeration value="AlignDefault"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="col-width" default="0"> + <xs:simpleType> + <xs:restriction base="xs:double"> + <xs:minInclusive value="0"/> + <xs:maxInclusive value="1"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + </xs:attributeGroup> + <xs:element name="TableHead"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="Row"/> + </xs:sequence> + <xs:attributeGroup ref="attlist_TableHead"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_TableHead"> + <xs:attributeGroup ref="attr"/> + </xs:attributeGroup> + <xs:element name="TableFoot"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="Row"/> + </xs:sequence> + <xs:attributeGroup ref="attlist_TableFoot"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_TableFoot"> + <xs:attributeGroup ref="attr"/> + </xs:attributeGroup> + <xs:element name="TableBody"> + <xs:complexType> + <xs:sequence> + <xs:element ref="header"/> + <xs:element ref="body"/> + </xs:sequence> + <xs:attributeGroup ref="attlist_TableBody"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_TableBody"> + <xs:attribute name="row-head-columns" default="0"/> + <xs:attributeGroup ref="attr"/> + </xs:attributeGroup> + <xs:element name="header"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="Row"/> + </xs:sequence> + </xs:complexType> + </xs:element> + <xs:element name="body"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="Row"/> + </xs:sequence> + </xs:complexType> + </xs:element> + <xs:element name="Row"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="Cell"/> + </xs:sequence> + <xs:attributeGroup ref="attlist_Row"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_Row"> + <xs:attributeGroup ref="attr"/> + </xs:attributeGroup> + <xs:element name="Cell"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/> + </xs:sequence> + <xs:attributeGroup ref="attlist_Cell"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_Cell"> + <xs:attribute name="alignment" default="AlignDefault"> + <xs:simpleType> + <xs:restriction base="xs:token"> + <xs:enumeration value="AlignLeft"/> + <xs:enumeration value="AlignRight"/> + <xs:enumeration value="AlignCenter"/> + <xs:enumeration value="AlignDefault"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="row-span" default="1" type="xs:positiveInteger"/> + <xs:attribute name="col-span" default="1" type="xs:positiveInteger"/> + <xs:attributeGroup ref="attr"/> + </xs:attributeGroup> + <xs:element name="Figure" substitutionGroup="block"> + <xs:complexType> + <xs:sequence> + <xs:element ref="Caption"/> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/> + </xs:sequence> + <xs:attributeGroup ref="attlist_Figure"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_Figure"> + <xs:attributeGroup ref="attr"/> + </xs:attributeGroup> + <xs:element name="LineBlock" substitutionGroup="block"> + <xs:complexType> + <xs:sequence> + <xs:element maxOccurs="unbounded" ref="line"/> + </xs:sequence> + </xs:complexType> + </xs:element> + <xs:element name="line"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + </xs:complexType> + </xs:element> + <xs:element name="CodeBlock" substitutionGroup="block"> + <xs:complexType mixed="true"> + <xs:attributeGroup ref="attlist_CodeBlock"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_CodeBlock"> + <xs:attributeGroup ref="attr"/> + </xs:attributeGroup> + <xs:element name="RawBlock" substitutionGroup="block"> + <xs:complexType mixed="true"> + <xs:attributeGroup ref="attlist_RawBlock"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_RawBlock"> + <xs:attribute name="format" use="required"/> + </xs:attributeGroup> + <xs:element name="Space" substitutionGroup="inline_element"> + <xs:complexType> + <xs:attributeGroup ref="attlist_Space"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_Space"> + <xs:attribute name="count" default="1" type="xs:positiveInteger"/> + </xs:attributeGroup> + <xs:element name="Str" substitutionGroup="inline_element"> + <xs:complexType> + <xs:attributeGroup ref="attlist_Str"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_Str"> + <xs:attribute name="content" default=""/> + </xs:attributeGroup> + <xs:element name="Emph" substitutionGroup="inline_element"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + </xs:complexType> + </xs:element> + <xs:element name="Strong" substitutionGroup="inline_element"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + </xs:complexType> + </xs:element> + <xs:element name="Underline" substitutionGroup="inline_element"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + </xs:complexType> + </xs:element> + <xs:element name="Strikeout" substitutionGroup="inline_element"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + </xs:complexType> + </xs:element> + <xs:element name="Superscript" substitutionGroup="inline_element"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + </xs:complexType> + </xs:element> + <xs:element name="Subscript" substitutionGroup="inline_element"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + </xs:complexType> + </xs:element> + <xs:element name="SmallCaps" substitutionGroup="inline_element"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + </xs:complexType> + </xs:element> + <xs:element name="Span" substitutionGroup="inline_element"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + <xs:attributeGroup ref="attlist_Span"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_Span"> + <xs:attributeGroup ref="attr"/> + </xs:attributeGroup> + <xs:element name="Quoted" substitutionGroup="inline_element"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + <xs:attributeGroup ref="attlist_Quoted"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_Quoted"> + <xs:attribute name="quote-type" default="DoubleQuote"> + <xs:simpleType> + <xs:restriction base="xs:token"> + <xs:enumeration value="SingleQuote"/> + <xs:enumeration value="DoubleQuote"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + </xs:attributeGroup> + <xs:element name="Math" substitutionGroup="inline_element"> + <xs:complexType mixed="true"> + <xs:attributeGroup ref="attlist_Math"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_Math"> + <xs:attribute name="math-type" default="InlineMath"> + <xs:simpleType> + <xs:restriction base="xs:token"> + <xs:enumeration value="DisplayMath"/> + <xs:enumeration value="InlineMath"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + </xs:attributeGroup> + <xs:element name="RawInline" substitutionGroup="inline_element"> + <xs:complexType mixed="true"> + <xs:attributeGroup ref="attlist_RawInline"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_RawInline"> + <xs:attribute name="format" use="required"/> + </xs:attributeGroup> + <xs:element name="Cite" substitutionGroup="inline_element"> + <xs:complexType mixed="true"> + <xs:choice minOccurs="0" maxOccurs="unbounded"> + <xs:element ref="citations"/> + <xs:element ref="inline_element"/> + </xs:choice> + </xs:complexType> + </xs:element> + <xs:element name="citations"> + <xs:complexType> + <xs:sequence> + <xs:element maxOccurs="unbounded" ref="Citation"/> + </xs:sequence> + </xs:complexType> + </xs:element> + <xs:element name="Citation"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" ref="prefix"/> + <xs:element minOccurs="0" ref="suffix"/> + </xs:sequence> + <xs:attributeGroup ref="attlist_Citation"/> + </xs:complexType> + </xs:element> + <xs:element name="prefix"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + </xs:complexType> + </xs:element> + <xs:element name="suffix"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_Citation"> + <xs:attribute name="id"/> + <xs:attribute name="note-num"/> + <xs:attribute name="hash" default="0"/> + <xs:attribute name="mode" default="AuthorInText"> + <xs:simpleType> + <xs:restriction base="xs:token"> + <xs:enumeration value="AuthorInText"/> + <xs:enumeration value="SuppressAuthor"/> + <xs:enumeration value="NormalCitation"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + </xs:attributeGroup> + <xs:element name="Code" substitutionGroup="inline_element"> + <xs:complexType mixed="true"> + <xs:attributeGroup ref="attlist_Code"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_Code"> + <xs:attributeGroup ref="attr"/> + </xs:attributeGroup> + <xs:element name="Image" substitutionGroup="inline_element"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + <xs:attributeGroup ref="attlist_Image"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_Image"> + <xs:attribute name="title"/> + <xs:attribute name="src"/> + <xs:attributeGroup ref="attr"/> + </xs:attributeGroup> + <xs:element name="Link" substitutionGroup="inline_element"> + <xs:complexType mixed="true"> + <xs:group minOccurs="0" maxOccurs="unbounded" ref="inline"/> + <xs:attributeGroup ref="attlist_Link"/> + </xs:complexType> + </xs:element> + <xs:attributeGroup name="attlist_Link"> + <xs:attribute name="title"/> + <xs:attribute name="href"/> + <xs:attributeGroup ref="attr"/> + </xs:attributeGroup> + <xs:element name="SoftBreak" substitutionGroup="inline_element"> + <xs:complexType/> + </xs:element> + <xs:element name="LineBreak" substitutionGroup="inline_element"> + <xs:complexType/> + </xs:element> + <xs:element name="Note" substitutionGroup="inline_element"> + <xs:complexType> + <xs:sequence> + <xs:element minOccurs="0" maxOccurs="unbounded" ref="block"/> + </xs:sequence> + </xs:complexType> + </xs:element> +</xs:schema> |
