github.com/jgm/pandoc - Pandoc — The universal markup converter

Age	Commit message	Author
2026-01-07	Docx reader: handle tables without tblGrid.	John MacFarlane
	Closes #11380.
2025-12-12	Fix some more imports involving foldl'.	John MacFarlane

2025-11-30	Docx reader: Handle REF link instruction (#11296)	Ezwal
	This PR aims to handle a common run field instruction (fieldInstr) in the docx format: REF, specifically with the "link" switch \h. In Word, you can create a REF field instruction with the Cross-reference button. You can create cross-references to many things, such as Equation, Table, Title...
2025-11-24	Support pptx (PowerPoint) as an input format.	Anton Antich
	New module `Text.Pandoc.Readers.Pptx`, exporting `readPptx`. [API change] Factored out some common OOXML functions from Text.Pandoc.Readers.Docx.Util into a non-exported module Text.Pandoc.Readers.OOXML.Shared.
2025-11-12	Docx reader: check recursively for caption styles.	Albert Krewinkel
	The docx reader uses caption styles to identify figures and captioned tables. It now checks for known caption styles in the full style hierarchy of a paragraph instead of checking only the paragraph's own style. This makes it possible to recognize caption styles that are built on top of the basic caption style, as is sometimes the case in sophisticated styles.
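A minimal Haskell sketch of the recursive check described above, assuming a simplified model of the styles hierarchy. The `Style` record, the `isCaptionStyle` function, and the list of caption style names are hypothetical, not pandoc's actual types. It matches by style name and follows `basedOn` links upward; a visited list guards against cyclic chains.

```haskell
-- Hypothetical, simplified model of docx styles (not pandoc's real types).
data Style = Style
  { styleName :: String        -- display name from w:name, e.g. "caption"
  , basedOn   :: Maybe String  -- style id of the parent style, if any
  }

-- Style id -> style.
type Styles = [(String, Style)]

-- Assumed set of recognized caption style names.
captionStyleNames :: [String]
captionStyleNames = ["caption", "Caption"]

-- True if the style, or any style it is (transitively) based on,
-- is a known caption style. `seen` prevents looping on cycles.
isCaptionStyle :: Styles -> String -> Bool
isCaptionStyle styles = go []
  where
    go seen sid
      | sid `elem` seen = False
      | otherwise = case lookup sid styles of
          Nothing -> False
          Just st ->
            styleName st `elem` captionStyleNames
              || maybe False (go (sid : seen)) (basedOn st)
```

With this model, a custom style whose `basedOn` points at the built-in caption style is recognized even though its own name is not in the list.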
2025-09-17	Docx reader: change default for textwidth.	John MacFarlane
	This should only be used if sectPr is not found.
2025-09-17	Docx reader: properly calculate table column widths.	John MacFarlane
	Previously we assumed that every table took up the full text width. Now we read the text width from the document's sectPr. Closes #9837. Closes #11147.
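The width calculation described in the two entries above can be sketched as follows, under stated assumptions: the `SectPr` record and the `textWidth`/`columnFractions` functions are hypothetical names, and the default width passed in is illustrative rather than pandoc's actual default. Text width is taken as page width minus left and right margins, and each grid column becomes a fraction of that width.

```haskell
-- Hypothetical sketch; names are illustrative, not pandoc's real API.
-- All lengths are in twentieths of a point (twips), the unit used by
-- w:pgSz/@w:w, w:pgMar/@w:left, w:pgMar/@w:right and w:gridCol/@w:w.
data SectPr = SectPr
  { pageWidth   :: Integer
  , leftMargin  :: Integer
  , rightMargin :: Integer
  }

-- Use the section's page width minus its margins; fall back to a
-- caller-supplied default only when no sectPr was found.
textWidth :: Integer -> Maybe SectPr -> Integer
textWidth defaultWidth Nothing = defaultWidth
textWidth _ (Just s) = pageWidth s - leftMargin s - rightMargin s

-- Express each grid column as a fraction of the text width instead of
-- assuming that the table spans the full text width.
columnFractions :: Integer -> [Integer] -> [Double]
columnFractions tw cols
  | tw <= 0   = map (const 0) cols
  | otherwise = [fromIntegral c / fromIntegral tw | c <- cols]
```

For example, a US Letter page is 12240 twips wide; with 1-inch (1440-twip) margins the text width is 9360 twips, so a 2340-twip grid column gets a relative width of 0.25.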
2025-09-06	Docx reader: better handling of AlternateContent.	John MacFarlane
	This revises the solution to #9214 in commit 2e8ecb3 to handle the standard way Word inserts emojis. Closes #11113.
2025-09-06	Partially undo commit 2e8ecb3.	John MacFarlane
	This was too heavy-handed a fix, and it interferes with processing Word emojis (#11109).
2025-07-27	Docx reader: fix `stringToInteger`.	John MacFarlane
	It previously converted things like `11ccc` to an integer; now it requires that the whole string be parsable as an integer. Closes #9184.
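The stricter behavior can be sketched with the Prelude's `reads`: accept the result only when the parser consumes the entire string. This is a minimal sketch; the `String` argument type is an assumption, and pandoc's actual implementation may differ.

```haskell
-- Hypothetical sketch of a whole-string integer parse.
-- `reads "11ccc"` yields [(11, "ccc")]; requiring an empty remainder
-- rejects such input instead of returning 11.
stringToInteger :: String -> Maybe Integer
stringToInteger s =
  case reads s of
    [(n, "")] -> Just n
    _         -> Nothing
```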
2025-06-03	Docx reader: handle strict OpenXML as well as transitional.	John MacFarlane
	Closes #7691.
2025-03-19	T.P.Readers.Docx.Util: use xml-light's `onlyElems`...	John MacFarlane
	...instead of defining it again.
2025-02-19	Revert "Docx reader and writer: support row heads."	John MacFarlane
	This reverts commit cbe67b9602a736976ef6921aefbbc60d51c6755a. Word sets `w:firstColumn="1"` by default for tables. You have to find the Table Design tab and explicitly uncheck "First Column" to make this go away. In most cases, I don't think writers intend to designate the first column as a row head, so this commit is going to produce unexpected results. In addition, because of the table normalization done by pandoc-types' `tableWith`, any table containing a colspanned cell in the left-hand column will get broken if the first column is designated a row head. For these reasons it seems best to revert this change, which was made in response to #9495. Closes #10627.
2025-01-10	Docx reader and writer: support row heads.	John MacFarlane
	Reader: When `w:tblLook` has `w:firstColumn` set (or an equivalent bit mask), we set row heads = 1 in the AST. Writer: set `w:firstColumn` in `w:tblLook` when there are row heads. (Word only allows one, so this is triggered by any number of row heads > 0.) Closes #9495.
2025-01-10	Docx reader: read table styles as custom styles...	John MacFarlane
	...when the `styles` extension is enabled. Closes #9603. Also improve the manual's coverage of custom styles.
2024-12-07	Docx reader: handle `\b`, `\i`, `\y` modifiers in `XE` index entries.	John MacFarlane
	See #10171.
2024-12-05	Docx reader: improve index reference support.	John MacFarlane
	Support crossrefs. Clean up and unify switch parsing for fields.
2024-12-05	Docx reader: parse index references as empty Spans.	John MacFarlane
	See #10171.
2024-10-03	Docx reader: reset lists after headers in same list numId.	John MacFarlane
	Headings in docx, even ones that do not have a visible number, can have a numId, and in odd cases can even share a numId with a list that continues after the header. In this case the list numbering should be reset by the header. To accomplish this, we add a Heading constructor to BodyPart and include on it all the information list items have. Closes #10258.
2024-09-08	Remove most uses of partial function 'head'.	John MacFarlane

2024-06-12	Docx reader: improve handling of captions.	John MacFarlane
	- Turn captioned images into Figure elements. Closes #9391.
	- Improve the logic for associating elements with captions. Closes #9358.
	- Ensure that captions that can't be associated with an element aren't just silently dropped. Closes #9610.
2024-06-12	Docx reader: rename TblCaption to Capt.	John MacFarlane
	We'll use this for image captions as well. Word does not really distinguish these.
2024-06-04	Docx reader: support task lists.	John MacFarlane
	This also fixes a small bug in parsing delimiters in numbered lists, which led to the default delimiter being used wrongly in some cases. Closes #8211.
2024-06-04	T.P.Readers.Docx.Lists: replace a generic traversal...	John MacFarlane
	using `bottomUp` with a faster one using `walk`.
2024-06-01	Docx reader: react to "left" value on jc attribute.	John MacFarlane
	Also fix tests.
2024-06-01	Docx reader: handle column and cell alignments.	John MacFarlane
	OpenXML doesn't have a way of indicating column alignments, but we guess them by looking at the justification property on the first paragraph of a cell, if there is one. We take the column alignments from the first body row. Closes #8551.
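The guessing heuristic can be sketched as below. The `Cell` type and the mapping from `w:jc` values are assumptions for illustration; the `Alignment` constructors mirror those of pandoc-types, but the type is redefined locally so the example is self-contained.

```haskell
-- Local stand-in mirroring pandoc-types' Alignment constructors.
data Alignment = AlignLeft | AlignRight | AlignCenter | AlignDefault
  deriving (Eq, Show)

-- Hypothetical cell model: the w:jc value of the cell's first
-- paragraph, if it has one.
newtype Cell = Cell { firstParaJc :: Maybe String }

-- Assumed mapping of w:jc values; in this sketch "start"/"end" are
-- treated like "left"/"right", and anything else is AlignDefault.
jcToAlignment :: String -> Alignment
jcToAlignment v = case v of
  "left"   -> AlignLeft
  "start"  -> AlignLeft
  "right"  -> AlignRight
  "end"    -> AlignRight
  "center" -> AlignCenter
  _        -> AlignDefault

-- Column alignments come from the first body row only; cells without
-- a jc value get AlignDefault. This sketch does not account for cells
-- spanning several columns.
columnAlignments :: [[Cell]] -> [Alignment]
columnAlignments [] = []
columnAlignments (firstRow : _) =
  map (maybe AlignDefault jcToAlignment . firstParaJc) firstRow
```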
2024-06-01	Docx reader: allow insertion/deletion to contain arbitrary ParParts...	John MacFarlane
	...and not just Runs. This fixes a problem wherein comments inside insertions or deletions would be ignored. Closes #9833.
2024-06-01	Support HorizontalRule in docx reader.	John MacFarlane
	We support both pandoc's own style and the style described in Microsoft's documentation (https://support.microsoft.com/en-us/office/insert-a-horizontal-line-9bf172f6-5908-4791-9bb9-2c952197b1a9). Closes #6285.
2024-06-01	T.P.Readers.Docx.Parse: add HRule constructor to BodyPart.	John MacFarlane
	This paves the way to supporting horizontal rules in the reader. We still need to adjust the parser to create HRule appropriately; so far, this change has no effect, but it's a step on the way to #6285.
2024-04-25	Update copyright dates to 2024.	John MacFarlane

2024-02-28	Docx reader: ensure that table captions are counted.	John MacFarlane
	Normally these occur outside the table element itself, but they should still be parsed as captions when they occur inside it. Closes #9518.
2024-02-28	Docx reader: detect caption by style name not id.	John MacFarlane
	The styleId can change depending on the localization. Partially resolves #9518.
2023-12-26	fix(docx): support absolute header/footer paths	Edwin Török
	Header and footer references may be absolute in the reference.docx. For example, editing it with dotnet's Open-XML-SDK causes this error:
	```
	+ pandoc test.md -t docx --reference-doc referenceh.docx -o test.docx
	word//word/header1.xml missing in reference docx
	```
	There was already code in pandoc to handle relative vs. absolute paths in references, so use it.
	Signed-off-by: Edwin Török
2023-12-18	Docx reader: fix HYPERLINK with only switch and no argument.	John MacFarlane
	The argument can apparently be omitted, and then we just have a fragment URL. Closes #9246.
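The fallback described above can be sketched as follows: when no URL argument is present and only the `\l` (location) switch is given, the target is just a fragment. The tokenizer, the `hyperlinkTarget` function, and the choice of which switches take an argument are assumptions for illustration, not pandoc's actual field parser.

```haskell
import Data.Char (isSpace)

-- Split a field instruction into tokens; double-quoted arguments
-- may contain spaces. (Simplified: no escape handling.)
tokenize :: String -> [String]
tokenize s = case dropWhile isSpace s of
  "" -> []
  ('"' : rest) ->
    let (tok, rest') = break (== '"') rest
    in tok : tokenize (drop 1 rest')
  str ->
    let (tok, rest') = break isSpace str
    in tok : tokenize rest'

-- Hypothetical sketch: compute a link target from HYPERLINK tokens.
-- \l names a location (bookmark); \o and \t are treated as taking an
-- argument; any other switch is treated as a flag and skipped.
hyperlinkTarget :: [String] -> Maybe String
hyperlinkTarget ("HYPERLINK" : args) = build (go Nothing Nothing args)
  where
    go url _ ("\\l" : a : rest) = go url (Just a) rest
    go url anc (sw : _ : rest)
      | sw `elem` ["\\o", "\\t"] = go url anc rest
    go url anc (('\\' : _) : rest) = go url anc rest
    go Nothing anc (u : rest) = go (Just u) anc rest
    go url anc (_ : rest) = go url anc rest
    go url anc [] = (url, anc)

    build (Just u, Just a)   = Just (u ++ "#" ++ a)
    build (Just u, Nothing)  = Just u
    build (Nothing, Just a)  = Just ('#' : a)
    build (Nothing, Nothing) = Nothing
hyperlinkTarget _ = Nothing
```

In this sketch, `HYPERLINK \l "intro"` yields the fragment `#intro`, while a URL followed by `\l "sec"` yields the URL with `#sec` appended.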
2023-12-11	Whitespace fix.	John MacFarlane

2023-11-29	Docx reader: unwrap content of shaped textboxes...	Stephan Meijer
	* #9214 text in shape format test document
	* #9214 support Text in Shape Format
	* #9214 remove irrelevant code
2023-11-28	Docx reader: Improve handling of w:sym.	John MacFarlane
	Add T.P.Readers.Docx.Symbols. This gives us a table to use to resolve characters included in docx via the w:sym element. Use this table to resolve characters when symbol fonts are specified. Closes #9220.
2023-11-28	Correct comment.	John MacFarlane

2023-08-18	Docx reader: omit "Table NN" from caption.	John MacFarlane
	Closes #9002.
2023-07-14	Docx reader: use SVG version of image if present.	John MacFarlane
	Previously the backup PNG was exported even if an SVG was present, but the SVG should be preferred. Closes #7244.
2023-02-18	Docx reader: parse image alt texts in LibreOffice generated files	Albert Krewinkel
	LibreOffice tags images slightly differently than Word; this change lets the parser take that difference into account when looking for an image description (alt text).
2023-01-10	Update copyright years, it's 2023!	Albert Krewinkel

2022-12-11	Docx reader: fix handling of oMathPara in w:p with other content.	John MacFarlane
	Closes #8483. The problem is that oMathPara can either occur at the block-level (child of w:body) or at the inline level (child of w:p, potentially with other content). We need to handle both cases. Previously the code just assumed that if we had a w:p with an oMathPara, the math would be the sole content. This patch removes OMathPara as a constructor of BodyPart and adds it as a constructor of ParPart.
2022-11-19	Docx reader: Support parsing of highlighted text.	John MacFarlane

2022-10-31	First stab at mtl 2.3 compliance.	John MacFarlane
	This will no doubt produce a bunch of warnings and hence CI failures, which we'll need to work around with explicit imports.
2022-10-16	T.P.Parsing: Remove gratuitous renaming of Parsec types.	John MacFarlane
	We were exporting Parser, ParserT as synonyms of Parsec, ParsecT. There is no good reason for this and it can cause confusion. Also, when possible, we replace imports of Text.Parsec with T.P.Parsing. The idea is to make it easier, at some point, to switch to megaparsec or another parsing engine if we want to. T.P.Parsing new exports: Stream(..), updatePosString, SourceName, Parsec, ParsecT [API change]. Removed exports: Parser, ParserT [API change].
2022-10-15	Minor code cleanups.	John MacFarlane

2022-09-27	Fix small whitespace things.	John MacFarlane

2022-08-30	Docx reader: mark unnumbered headings with class 'unnumbered'	Albert Krewinkel
	If a document uses numbered headings, then headings without numbers are marked with class `unnumbered`, the default class used by pandoc to convey this kind of information. The classes are not added if none of the headings in a document are numbered. This change ensures good conversion results when converting with `--number-sections`. Closes: #8148
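The rule described above (mark unnumbered headings only when the document has at least one numbered heading) can be sketched as follows. The `Heading` record and `markUnnumbered` are hypothetical names for illustration, not pandoc's real types.

```haskell
-- Hypothetical heading model (not pandoc's real types).
data Heading = Heading
  { isNumbered :: Bool      -- does the heading carry a list number?
  , classes    :: [String]
  } deriving (Eq, Show)

-- Mark unnumbered headings only if the document uses numbered
-- headings at all; otherwise leave every heading unchanged.
markUnnumbered :: [Heading] -> [Heading]
markUnnumbered hs
  | any isNumbered hs = map mark hs
  | otherwise         = hs
  where
    mark h
      | isNumbered h = h
      | otherwise    = h { classes = "unnumbered" : classes h }
```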
2022-02-04	Docx reader: parse EN.CITE and EN.REFLIST fields.	John MacFarlane