path: root/test/docx
Age | Commit message | Author
2025-11-30 | Docx reader: Handle REF link instruction (#11296) | Ezwal
This PR aims to handle a common run field instruction (fieldInstr) from the docx format: REF, specifically REF fields with the "link" switch \h. In Word, you can create a REF field instruction with the Cross-reference button, and you can create cross-references to many things, such as equations, tables, and titles.
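A minimal sketch of the parsing step, written in Python rather than pandoc's Haskell. It splits an instruction string such as ` REF _Ref123456 \h ` into a bookmark name and a set of switches, and treats the field as a link when `\h` is present. The names `RefField` and `parse_ref_instr` are hypothetical, switch arguments (e.g. the separator after `\d`) are ignored, and the actual reader may work differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RefField:
    """Hypothetical representation of a parsed REF field instruction."""
    bookmark: str
    switches: set[str] = field(default_factory=set)

    @property
    def is_link(self) -> bool:
        # \h asks Word to render the cross-reference as a hyperlink
        return "\\h" in self.switches

    def target(self) -> str:
        # The bookmark name doubles as an internal anchor
        return "#" + self.bookmark


def parse_ref_instr(instr: str) -> RefField | None:
    """Parse e.g. ' REF _Ref123 \\h ' (whitespace-separated tokens only)."""
    tokens = instr.split()
    if len(tokens) < 2 or tokens[0].upper() != "REF":
        return None
    # Keep only switches; switch arguments are ignored in this sketch
    switches = {t for t in tokens[2:] if t.startswith("\\")}
    return RefField(tokens[1], switches)


ref = parse_ref_instr(r" REF _Ref123456 \h ")
print(ref.bookmark, ref.is_link, ref.target())  # _Ref123456 True #_Ref123456
```

Fields without `\h` (for example `REF _Ref9 \* MERGEFORMAT`) still parse, but `is_link` is false, so a reader following this approach could keep them as plain text.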
2025-10-24 | Use latest dev citeproc. | John MacFarlane
2025-10-18 | Update to use latest dev citeproc. | John MacFarlane
Fixed golden test regeneration in Docx reader test.
2025-10-15 | Docx writer: properly handle nested comment spans. | John MacFarlane
Patch credit: @mmourino. Closes #8189. Closes #6959.
2025-09-17 | Docx reader: properly calculate table column widths. | John MacFarlane
Previously we assumed that every table took up the full text width. Now we read the text width from the document's sectPr. Closes #9837. Closes #11147.
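To illustrate the arithmetic (a Python sketch, not pandoc's code): `w:pgSz/@w:w` and the `w:pgMar` margins in a `sectPr` are given in twentieths of a point (twips), so the text width is the page width minus the left and right margins. Dividing each `w:gridCol` width by that gives the relative column widths that pandoc's AST stores. The helper names are hypothetical, and the sketch ignores gutters and documents with several sections.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


# Hypothetical helpers; pandoc's reader may compute this differently.
def text_width(sect_pr: ET.Element) -> int:
    """Usable text width in twips: page width minus left and right margins."""
    pg_sz = sect_pr.find(f"{W}pgSz")  # assumed to be present
    pg_mar = sect_pr.find(f"{W}pgMar")
    page_width = int(pg_sz.get(f"{W}w"))
    left = int(pg_mar.get(f"{W}left", "0")) if pg_mar is not None else 0
    right = int(pg_mar.get(f"{W}right", "0")) if pg_mar is not None else 0
    return page_width - left - right


def column_fractions(grid_cols: list[int], width: int) -> list[float]:
    """Express each w:gridCol width (twips) as a fraction of the text width."""
    return [col / width for col in grid_cols]


SECT_PR = """<w:sectPr xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:pgSz w:w="12240" w:h="15840"/>
  <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"/>
</w:sectPr>"""

width = text_width(ET.fromstring(SECT_PR))
print(width)                                  # 9360
print(column_fractions([2340, 7020], width))  # [0.25, 0.75]
```

With US Letter paper and one-inch margins, a two-column table whose grid columns sum to the text width maps to fractions that sum to 1, instead of being assumed to span the full width.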
2025-08-04 | reference.docx: don't left-align table header row | n.west
See #11019. Previously, centering tables in `reference.docx` would leave the header row left-aligned. Why the OOXML 'standard' would allow table elements to be aligned differently from the rest of the table in the first place is anyone's guess.
2025-06-10 | Fix docx golden tests for East Asian default style changes. | John MacFarlane
2025-04-05 | Docx writer: preserve Relationships for images from reference docx. | John MacFarlane
This should allow one to include an image in a reference.docx and reference it in an openxml template. Closes #10759.
2025-03-05 | Fix invalid OOXML in definition_list.docx test. | John MacFarlane
Closes #10394.
2025-02-19 | Revert "Docx reader and writer: support row heads." | John MacFarlane
This reverts commit cbe67b9602a736976ef6921aefbbc60d51c6755a. Word sets `w:firstColumn="1"` by default for tables. You have to find the Table Design tab and explicitly uncheck "First Column" to make this go away. In most cases, I don't think writers intend to designate the first column as a row head, so this commit is going to produce unexpected results. In addition, because of the table normalization done by pandoc-types' `tableWith`, any table containing a colspanned cell in the left-hand column will get broken if the first column is designated a row head. For these reasons it seems best to revert this change, which was made in response to #9495. Closes #10627.
2025-01-31 | Docx writer: repeat reference doc's sectPr for each new section. | John MacFarlane
Previously we were only carrying over the reference doc's sectPr at the end of the document, so it wouldn't affect the intermediate sections that are now added if `--top-level-division` is `chapter` or `part`. This could lead to bad results (e.g. page numbering starting only on the last chapter). Closes #10577.
2025-01-31 | Update docx golden tests for reference doc changes. | John MacFarlane
2025-01-10 | Docx reader and writer: support row heads. | John MacFarlane
Reader: When `w:tblLook` has `w:firstColumn` set (or an equivalent bit mask), we set row heads = 1 in the AST. Writer: set `w:firstColumn` in `w:tblLook` when there are row heads. (Word only allows one, so this is triggered by any number of row heads > 0.) Closes #9495.
2024-12-22 | Docx writer: restart footnotes by section by default. | John MacFarlane
This can be overridden by a final sectPr element in the body of the reference.docx. It will only change things for `--top-level-division=chapter`, since only top-level chapters are put in separate sections. For that use it will mean that footnote numbers start over with each chapter, which is usually what is wanted. Closes #2773.
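In OOXML, per-section footnote numbering is controlled by `w:numRestart` inside a section's `w:footnotePr`. The Python sketch below (hypothetical helper name, not pandoc's implementation) adds `eachSect` restarting to a `sectPr` only when it has no `w:footnotePr` of its own, loosely mirroring the "can be overridden" behavior described above. It assumes the only elements allowed before `w:footnotePr` are header and footer references.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"
ET.register_namespace("w", W_NS)


def ensure_footnote_restart(sect_pr: ET.Element) -> None:
    """Hypothetical helper: add <w:footnotePr><w:numRestart w:val="eachSect"/>
    </w:footnotePr> unless the sectPr already carries its own footnotePr."""
    if sect_pr.find(f"{W}footnotePr") is not None:
        return  # an explicit setting (e.g. from reference.docx) wins
    footnote_pr = ET.Element(f"{W}footnotePr")
    ET.SubElement(footnote_pr, f"{W}numRestart", {f"{W}val": "eachSect"})
    # Schema order: footnotePr comes right after any header/footer references
    index = 0
    for i, child in enumerate(sect_pr):
        if child.tag in (f"{W}headerReference", f"{W}footerReference"):
            index = i + 1
    sect_pr.insert(index, footnote_pr)


sect = ET.fromstring(
    f'<w:sectPr xmlns:w="{W_NS}"><w:pgSz w:w="12240" w:h="15840"/></w:sectPr>'
)
ensure_footnote_restart(sect)
print(ET.tostring(sect, encoding="unicode"))
```

Placing the new element by position rather than appending it matters because `sectPr` children, like those in `settings.xml`, must follow the schema's sequence order.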
2024-12-05 | Docx reader: parse index references as empty Spans. | John MacFarlane
See #10171.
2024-09-30 | Fix invalid XML in test/docx/normalize.docx. | John MacFarlane
Closes #10242.
2024-07-27 | Docx writer: fix regression with nested lists. | John MacFarlane
Closes #9994. The bug affects e.g. ordered lists with bullet sublists; after the sublist the top-level list reverts to bullets instead of being properly numbered. This regression was introduced in version 3.2.1 and was caused by commit f5531f1.
2024-06-22 | OpenXML writer: be craftier in adding East Asian font hints. | John MacFarlane
In some cases we need to break up a long text run including both western and East Asian text, so that the punctuation in the western text doesn't become double-wide. Closes #9817.
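A rough Python sketch of the idea, using hypothetical helpers and only a small subset of East Asian Unicode ranges (pandoc's actual heuristics may differ): split a mixed-script string into segments, let neutral characters such as spaces and quotation marks join the segment before them, and put `w:rFonts w:hint="eastAsia"` only on the East Asian runs so that western punctuation keeps its normal width.

```python
from xml.sax.saxutils import escape

# A small, illustrative subset of East Asian code point ranges (not exhaustive)
CJK_RANGES = [
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
    (0xFF00, 0xFFEF),  # Halfwidth and Fullwidth Forms
]


def is_cjk(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def split_runs(text: str) -> list[tuple[bool, str]]:
    """Split text into maximal (is_east_asian, chunk) segments."""
    segments: list[tuple[bool, str]] = []
    for ch in text:
        if is_cjk(ch):
            kind = True
        elif ch.isalnum():
            kind = False
        else:
            # Neutral characters (spaces, quotes, punctuation) stay with
            # whatever script precedes them; at the start they count as western.
            kind = segments[-1][0] if segments else False
        if segments and segments[-1][0] == kind:
            segments[-1] = (kind, segments[-1][1] + ch)
        else:
            segments.append((kind, ch))
    return segments


def to_run(east_asian: bool, chunk: str) -> str:
    """Render one w:r; only East Asian segments get the eastAsia font hint."""
    rpr = '<w:rPr><w:rFonts w:hint="eastAsia"/></w:rPr>' if east_asian else ""
    return f'<w:r>{rpr}<w:t xml:space="preserve">{escape(chunk)}</w:t></w:r>'


for east_asian, chunk in split_runs("He said \u201cok.\u201d \u4f60\u597d"):
    print(east_asian, repr(chunk))
```

Here the curly quotes around "ok." stay in the western run, while a quotation mark following Chinese text would be grouped with it and rendered wide.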
2024-06-12 | Docx reader: improve handling of captions. | John MacFarlane
- Turn captioned images into Figure elements. Closes #9391.
- Improve the logic for associating elements with captions. Closes #9358.
- Ensure that captions that can't be associated with an element aren't just silently dropped. Closes #9610.
2024-06-04 | Docx reader: support task lists. | John MacFarlane
This also fixes a small bug in parsing delimiters in numbered lists, which led to the default delimiter being used wrongly in some cases. Closes #8211.
2024-06-04 | T.P.Writers.Shared: export toTaskListItem instead of isTaskList. | John MacFarlane
This is more useful. Use this in OpenXML and HTML writers.
2024-06-04 | Docx writer: better formatting for task lists. | John MacFarlane
Task lists are now properly formatted, with no bullet. In addition, we have removed an expensive generic traverse to remove Space elements, and replaced it with code in `inlinesToOpenXML`. This should give better performance; it also reduces XML size in the metadata, which wasn't previously affected by the de-Spacing. TODO: parse this in the reader so that we can have task lists round-trip. Closes #5198.
2024-06-01 | Docx writer: omit jc attribute on table cells with AlignDefault. | John MacFarlane
Closes #5662.
2024-06-01 | Docx reader: react to "left" value on jc attribute. | John MacFarlane
Also fix tests.
2024-05-31 | Fix metadata in docx writer. | John MacFarlane
The new OpenXML template had spaces for metadata that need to be filled with OpenXML fragments with the proper shape. This patch ensures that everything is the right shape. Closes #9828.
2024-05-29 | Docx writer: add eastAsia font hints to `w:r`. | John MacFarlane
We do this when the text in the run contains any CJK characters. This ensures that ambiguous code points (e.g. quotation marks) will be represented as "wide" characters when together with CJK characters. Closes #9817.
2024-05-19 | Update reference docx for docx test. | John MacFarlane
2024-05-19 | Fix validation errors in docx golden test. | John MacFarlane
2024-05-19 | Fix abstract-title in openxml template. | John MacFarlane
2024-05-19 | Allow OpenXML templates to be used with `docx`. | John MacFarlane
The `--reference-doc` option allows customization of styles in docx output, but it does not allow one to adjust the content of the output (e.g., changing the order in which metadata, the table of contents, and the body of the document are displayed) or to add boilerplate text before or after the document body. For these changes, one can now use `--template` with an OpenXML template. (See the default `openxml` template for a sample.) This patch also allows `--include-before-body` and `--include-after-body` to be used with `docx` output. The included files must be OpenXML fragments suitable for inclusion in the document body. Closes #8338 (`--include-before-body`, `--include-after-body`). Closes #9069 (a custom template can be used to omit the title page). Closes #7256. Closes #2928.
2024-05-18 | Cleaned up Abstract Title and Subtitle in default reference docx. | John MacFarlane
Center Subtitle, remove color.
2024-04-18 | Docx reader: fix anchor in header after anchor (#9626) | mbracke
When the last parPart before a header was a bookmark, no span with an anchor was added for a bookmark in the header. But the function that adds header anchors to the anchor map needs a span with an anchor, so this commit adds that span.
2024-04-13 | reference.docx: use current standard Word theme. | John MacFarlane
This includes using the sans-serif font Aptos instead of the serif font Cambria. See #7280.
2024-04-13 | reference.docx: stay closer to Word's current defaults. | John MacFarlane
We use the default styles for headings and the title instead of what pandoc was using. See #7280.
2024-04-13 | Use conventional styles/indents for Word bullet lists. | John MacFarlane
See #7280.
2024-02-28 | Docx writer: don't copy over footnotePr in settings.xml from reference.docx. | John MacFarlane
Closes #9522.
2024-02-28 | Docx reader: ensure that table captions are counted. | John MacFarlane
Normally these occur outside the table element itself, but they should still be parsed as captions in this case. Closes #9518.
2024-02-03 | Docx writer: restore ability to center-justify table. | John MacFarlane
The fix to #5947 caused all tables to be left indented. This was necessary to avoid extra indentation in table cells when a table appeared in a list item. This commit makes those changes conditional, so that they only affect tables in list items. Closes #9393.
2023-12-19 | fix(docx): sort inline elements in schema order | Edwin Török
Fixes #9273
```
[
  {
    "Description": "The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:b'.",
    "Path": {
      "NamespacesDefinitions": [
        "xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
      ],
      "Namespaces": {
      },
      "XPath": "/w:document[1]/w:body[1]/w:p[1]/w:r[7]/w:rPr[1]",
      "PartUri": "/word/document.xml"
    },
    "Id": "Sch_UnexpectedElementContentExpectingComplex",
    "ErrorType": "Schema"
  }
]
```
Signed-off-by: Edwin Török
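The fix amounts to emitting `w:rPr` children in the order the schema's sequence requires. A minimal Python sketch of that idea (not pandoc's Haskell code): `RPR_ORDER` is transcribed from the `EG_RPrBase` group of `wml.xsd` and should be checked against the schema version you validate with. Run properties such as a `w:color` that precedes `w:b` come out in valid order.

```python
import xml.etree.ElementTree as ET

# Child order of w:rPr as given by the EG_RPrBase sequence in wml.xsd
# (transcribed for illustration; verify against your schema version)
RPR_ORDER = [
    "rStyle", "rFonts", "b", "bCs", "i", "iCs", "caps", "smallCaps",
    "strike", "dstrike", "outline", "shadow", "emboss", "imprint",
    "noProof", "snapToGrid", "vanish", "webHidden", "color", "spacing",
    "w", "kern", "position", "sz", "szCs", "highlight", "u", "effect",
    "bdr", "shd", "fitText", "vertAlign", "rtl", "cs", "em", "lang",
    "eastAsianLayout", "specVanish", "oMath",
]
RANK = {name: i for i, name in enumerate(RPR_ORDER)}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def sort_rpr(rpr: ET.Element) -> None:
    """Reorder w:rPr children in place; unknown elements go last, stably."""
    ordered = sorted(rpr, key=lambda el: RANK.get(local_name(el.tag), len(RPR_ORDER)))
    for child in ordered:
        rpr.remove(child)
    rpr.extend(ordered)


rpr = ET.fromstring(
    '<w:rPr xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:color w:val="007020"/><w:b/></w:rPr>'
)
sort_rpr(rpr)
print([local_name(c.tag) for c in rpr])  # ['b', 'color']
```

Because Python's sort is stable, elements missing from the table (such as `w:rPrChange`, which the schema places last) keep their relative order at the end.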
2023-12-18 | fix(docx): fix validation error on endnotePr | Edwin Török
Copying `endnotePr` causes validation errors, because it is now referencing something that doesn't exist in the document:
```
{
  "FilePath": "test/docx/golden/custom_style_reference.docx",
  "ValidationErrors": "[{\"Description\":\"Element 'w:endnote' referenced by 'endnote@http://schemas.openxmlformats.org/wordprocessingml/2006/main:id' does not exist in part '/MainDocumentPart/EndnotesPart'. The reference value is '0'.\",\"Path\":{\"NamespacesDefinitions\":[\"xmlns:w=\\\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\\\"\"],\"Namespaces\":{},\"XPath\":\"/w:settings[1]/w:endnotePr[1]/w:endnote[2]\",\"PartUri\":\"/word/settings.xml\"},\"Id\":\"Sem_MissingReferenceElement\",\"ErrorType\":\"Semantic\"},{\"Description\":\"Element 'w:endnote' referenced by 'endnote@http://schemas.openxmlformats.org/wordprocessingml/2006/main:id' does not exist in part '/MainDocumentPart/EndnotesPart'. The reference value is '-1'.\",\"Path\":{\"NamespacesDefinitions\":[\"xmlns:w=\\\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\\\"\"],\"Namespaces\":{},\"XPath\":\"/w:settings[1]/w:endnotePr[1]/w:endnote[1]\",\"PartUri\":\"/word/settings.xml\"},\"Id\":\"Sem_MissingReferenceElement\",\"ErrorType\":\"Semantic\"}]"
}
```
For now, don't copy this element: it wasn't copied before, and it doesn't seem necessary to fix the ordering problems we had with settings.
Fixes: c9bf4da74 ("Docx writer: ensure that elements in settings are ordered correctly.")
Signed-off-by: Edwin Török
2023-12-18 | fix(docx): fix validation error on w:tblHeader | Edwin Török
```
{
  "FilePath": "test/docx/golden/tables.docx",
  "ValidationErrors": "[{\"Description\":\"The attribute 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:val' has invalid value 'true'. The Enumeration constraint failed.\",\"Path\":{\"NamespacesDefinitions\":[\"xmlns:w=\\\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\\\"\"],\"Namespaces\":{},\"XPath\":\"/w:document[1]/w:body[1]/w:tbl[1]/w:tr[1]/w:trPr[1]/w:tblHeader[1]\",\"PartUri\":\"/word/document.xml\"},\"Id\":\"Sch_AttributeValueDataTypeDetailed\",\"ErrorType\":\"Schema\"}]"
}
```
This one might actually be a bug in Open-XML-SDK, similar to the following issue, or a subtle difference between standard versions: https://github.com/dotnet/Open-XML-SDK/issues/780
Signed-off-by: Edwin Török
2023-12-18 | fix(docx): use left vs start consistently | Edwin Török
They are equivalent, but OOXML-Validator complains:
```
{
  "FilePath": "test/docx/golden/tables_separated_with_rawblock.docx",
  "ValidationErrors": "[{\"Description\":\"The attribute 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:val' has invalid value 'start'. The Enumeration constraint failed.\",\"Path\":{\"NamespacesDefinitions\":[\"xmlns:w=\\\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\\\"\"],\"Namespaces\":{},\"XPath\":\"/w:document[1]/w:body[1]/w:tbl[2]/w:tblPr[1]/w:jc[1]\",\"PartUri\":\"/word/document.xml\"},\"Id\":\"Sch_AttributeValueDataTypeDetailed\",\"ErrorType\":\"Schema\"},{\"Description\":\"The attribute 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:val' has invalid value 'start'. The Enumeration constraint failed.\",\"Path\":{\"NamespacesDefinitions\":[\"xmlns:w=\\\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\\\"\"],\"Namespaces\":{},\"XPath\":\"/w:document[1]/w:body[1]/w:tbl[1]/w:tblPr[1]/w:jc[1]\",\"PartUri\":\"/word/document.xml\"},\"Id\":\"Sch_AttributeValueDataTypeDetailed\",\"ErrorType\":\"Schema\"}]"
}
```
pandoc already uses 'left' elsewhere, so be consistent: we still produce the transitional schema, not the strict one, which would use the 'start' value.
Signed-off-by: Edwin Török
2023-12-18 | fix(docx): fix validation error on inline w:i/w:iCs order | Edwin Török
From `make validate-docx-golden-tests2`:
```
{
  "FilePath": "test/docx/golden/definition_list.docx",
  "ValidationErrors": "[{\"Description\":\"The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:i'.\",\"Path\":{\"NamespacesDefinitions\":[\"xmlns:w=\\\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\\\"\"],\"Namespaces\":{},\"XPath\":\"/w:document[1]/w:body[1]/w:p[3]/w:r[3]/w:rPr[1]\",\"PartUri\":\"/word/document.xml\"},\"Id\":\"Sch_UnexpectedElementContentExpectingComplex\",\"ErrorType\":\"Schema\"}]"
},
```
Signed-off-by: Edwin Török
2023-12-18 | fix(docx): fix OOXMLValidator error on KeywordTok output | Edwin Török
xmllint doesn't warn about this (maybe because the tag is empty?), but the order doesn't match wml.xsd:
```
<w:rPr>
  <w:color w:val="007020"/>
  <w:b/>
</w:rPr>
```
And OOXMLValidatorCLI does warn about it:
```
{
  "Description": "The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:b'.",
  "Path": {
    "NamespacesDefinitions": [
      "xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
    ],
    "Namespaces": {
    },
    "XPath": "/w:styles[1]/w:style[40]/w:rPr[1]",
    "PartUri": "/word/styles.xml"
  },
  "Id": "Sch_UnexpectedElementContentExpectingComplex",
  "ErrorType": "Schema"
}
```
Signed-off-by: Edwin Török
2023-12-18 | fix(docx): fix validation error on w:annotationRef | Edwin Török
annotationRef is not valid for `w:rPr`, only for `w:r` according to wml.xsd. See https://github.com/jgm/pandoc/issues/9269
Signed-off-by: Edwin Török
2023-12-18 | fix(docx): fix validation error in w:nsid | Edwin Török
The length here seems to refer to length in bytes (so twice as long in hex):
```
./tmp/numbering-pretty.xml:4: element nsid: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}nsid', attribute '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}val': [facet 'length'] The value 'A990' has a length of '2'; this differs from the allowed length of '4'.
```
[This](https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.nsid?view=openxml-2.8.1) also documents the longer values.
Signed-off-by: Edwin Török
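As the validator message indicates, the `w:val` of `w:nsid` must be four bytes long, so written as hex it needs exactly eight digits. A small Python sketch of the zero-padding (the helper name and the CRC-32 option are assumptions for illustration, not what pandoc does):

```python
import zlib


def nsid_value(n: int) -> str:
    """Hypothetical helper: format a 32-bit number as a w:nsid value
    (4 bytes, i.e. 8 uppercase hex digits)."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("nsid must fit in 32 bits")
    return f"{n:08X}"


print(nsid_value(0xA990))  # 0000A990 (valid), instead of the 2-byte 'A990'
# One possible deterministic choice: derive the id from a list's name via CRC-32
print(nsid_value(zlib.crc32(b"bullet-list")))
```

`zlib.crc32` returns an unsigned 32-bit value in Python 3, so it always fits the eight-digit format.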
2023-12-18 | Docx writer: fixed validation errors in tables. | John MacFarlane
Closes #9266.
2023-12-18 | Docx writer: fix validation error. | John MacFarlane
The elements in pPr in lists were not properly ordered. This doesn't seem to cause problems for Word, but it makes validation fail and may pose problems for other consumers of docx. Closes #9265.
2023-12-17 | Docx writer: ensure that elements in settings are ordered correctly. | John MacFarlane
The elements must occur in a specific order. This was being messed up when integrating a custom reference.docx. Closes #9264.
2023-12-17 | test/docx/golden: regenerate | Edwin Török
Using `make test TESTARGS=--accept`
Signed-off-by: Edwin Török