github.com/jgm/pandoc - Pandoc — The universal markup converter

Age	Commit message	Author
13 days	PPTX writer: support notes field in metadata for title slide (#11396)	Chris Callison-Burch
	This adds support for a `notes` field in the YAML metadata block that will be used as speaker notes for the title slide in PowerPoint output. Previously, there was no way to add speaker notes to the title slide since it is generated from metadata rather than from content blocks. The `::: notes` syntax only works for content slides. Example usage:
	    ---
	    title: My Presentation
	    notes: |
	      Welcome everyone to this presentation.
	      Remember to introduce yourself.
	    ---
	Closes #5844 (for PPTX output). Co-authored-by: Chris Callison-Burch
2026-01-07	Fix docx writer: skip directory entries when building media overrides (#11379)	You Jiangbin
	Pandoc's docx writer was previously adding an `<Override>` for `/word/media/` in `[Content_Types].xml` when the reference doc contains media, which violates OPC rules and causes Word to report corruption.
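A minimal sketch of the idea behind this fix, not pandoc's actual Haskell code: when walking the entries of a reference archive, names ending in `/` are directory entries rather than parts, so they should produce no `<Override>`. The function name `media_overrides` and the small MIME table are assumptions made for illustration.

```python
import posixpath

# Hypothetical MIME table for the sketch; a real writer would know more types.
MIME_BY_EXT = {".png": "image/png", ".jpeg": "image/jpeg", ".jpg": "image/jpeg"}


def media_overrides(entry_names):
    """Return (PartName, ContentType) pairs for media files in an archive listing."""
    overrides = []
    for name in entry_names:
        if not name.startswith("word/media/"):
            continue
        if name.endswith("/"):
            # Directory entry such as "word/media/": not a part, so no Override.
            continue
        ext = posixpath.splitext(name)[1].lower()
        mime = MIME_BY_EXT.get(ext)
        if mime is not None:
            overrides.append(("/" + name, mime))
    return overrides


entries = ["word/document.xml", "word/media/", "word/media/image1.png"]
print(media_overrides(entries))  # [('/word/media/image1.png', 'image/png')]
```

Only the real file yields an override; the bare `word/media/` entry is skipped.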
2025-12-28	ODT reader: Add table row and column spans (#11366)	Tuong Nguyen Manh
	Parse the number-rows-spanned and number-columns-spanned attributes to create Cells for the Table.
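As a rough illustration of reading the two attributes named above (this is not the reader's Haskell code), the following assumes they live in the OpenDocument `table` namespace and default to 1 when absent. The helper name `cell_spans` is hypothetical.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"


def cell_spans(cell):
    """Return (row_span, col_span) for a table-cell element, defaulting to 1."""
    rows = cell.get(f"{{{TABLE_NS}}}number-rows-spanned", "1")
    cols = cell.get(f"{{{TABLE_NS}}}number-columns-spanned", "1")
    return int(rows), int(cols)


xml = (
    f'<table:table-cell xmlns:table="{TABLE_NS}" '
    'table:number-rows-spanned="2" table:number-columns-spanned="3"/>'
)
cell = ET.fromstring(xml)
print(cell_spans(cell))  # (2, 3)
```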
2025-12-10	Org: don't include 'example' class when parsing org example blocks.	John MacFarlane
	These are just unmarked code blocks. Closes #11339.
2025-11-30	pptx writer: Handle reference doc without slides (#11310)	Tuong Nguyen Manh
	An empty `sldIdLst` is now added if the reference doc is missing one so that `modifySldIdLst` can replace it. To ensure PowerPoint doesn't say that the file will need fixing, the `sldIdLst` has to be placed after the `sldMasterIdLst`. I also added a test to ensure that if there are notes, they will be placed between the `sldMasterIdLst` and `sldIdLst`. Otherwise PowerPoint wouldn't show the slide of a note when viewing Notes Pages. Closes #7536.
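The ordering constraint can be sketched as follows. This is an illustration under stated assumptions, not the writer's implementation: the helper `ensure_sld_id_lst` is hypothetical, and `notesMasterIdLst` is assumed to be the element the message means by "notes".

```python
import xml.etree.ElementTree as ET

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"


def tag(local):
    """Build a namespace-qualified PresentationML tag name."""
    return f"{{{P_NS}}}{local}"


def ensure_sld_id_lst(presentation):
    """Insert an empty sldIdLst, if missing, after the master ID lists."""
    if presentation.find(tag("sldIdLst")) is not None:
        return presentation
    # Place it after sldMasterIdLst and, when present, notesMasterIdLst.
    position = 0
    for index, child in enumerate(presentation):
        if child.tag in (tag("sldMasterIdLst"), tag("notesMasterIdLst")):
            position = index + 1
    presentation.insert(position, ET.Element(tag("sldIdLst")))
    return presentation


doc = ET.fromstring(
    f'<p:presentation xmlns:p="{P_NS}">'
    "<p:sldMasterIdLst/><p:notesMasterIdLst/><p:sldSz/>"
    "</p:presentation>"
)
ensure_sld_id_lst(doc)
order = [child.tag.split("}")[1] for child in doc]
print(order)  # ['sldMasterIdLst', 'notesMasterIdLst', 'sldIdLst', 'sldSz']
```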
2025-11-29	Add asciidoc as an input format.	John MacFarlane
	New exported module Text.Pandoc.Readers.AsciiDoc, exporting readAsciiDoc [API change]. The bulk of parsing is handled by the asciidoc library. Closes #1456.
2025-11-24	Fix warning in Docx reader test.	John MacFarlane

2025-11-24	Add `xlsx` (Microsoft Excel) as an input format.	Anton Antich
	Each worksheet turns into a section containing a table. The common function `nativeDiff` has been extracted from the Docx and Pptx test files and put in Tests.Helpers.
2025-11-24	Support pptx (PowerPoint) as an input format.	Anton Antich
	New module `Text.Pandoc.Readers.Pptx`, exporting `readPptx`. [API change] Factored out some common OOXML functions from Text.Pandoc.Readers.Docx.Util into a non-exported module Text.Pandoc.Readers.OOXML.Shared.
2025-11-05	Add BBCode writer (#11242)	reptee
	`bbcode` is now supported as an output format, as well as variants `bbcode_fluxbb` (FluxBB), `bbcode_phpbb` (phpBB), `bbcode_steam` (Steam), `bbcode_hubzilla` (Hubzilla), and `bbcode_xenforo` (xenForo). [API change] Adds a new module Text.Pandoc.Writers.BBCode, exporting a number of functions. Also exports `writeBBCode`, `writeBBCodeSteam`, `writeBBCodeFluxBB`, `writeBBCodePhpBB`, `writeBBCodeHubzilla`, `writeBBCodeXenforo` from Text.Pandoc.Writers.
2025-10-18	Update to use latest dev citeproc.	John MacFarlane
	Fixed golden test regeneration in Docx reader test.
2025-09-17	Use Tasty.Golden for Docx reader tests.	John MacFarlane
	This way we can update them with `--accept`.
2025-09-15	Vimdoc writer (#11132)	reptee
	Support for vimdoc, the documentation format used by Vim in its help pages. The writer relies heavily on definition lists and precise text alignment to generate tags.
2025-09-08	pptx writer: Handle single column	Tuong Nguyen Manh
	Add an additional guard for a single column to be able to process it.
2025-09-02	Refactor highlighting options [API Change]	Albert Krewinkel
	A new command line option `--syntax-highlighting` is provided; it takes the values `none`, `default`, `idiomatic`, a style name, or a path to a theme file. It replaces the `--no-highlighting`, `--highlighting-style`, and `--listings` options. The `writerListings` and `writerHighlightStyle` fields of the `WriterOptions` type are replaced with `writerHighlightMethod`. Closes: #10525
2025-09-02	Change `latex-pos` to `latex-placement`.	John MacFarlane

2025-09-01	LaTeX writer: control figure placement with attribute (#11094)	Sean Soon
	If a `latex-pos` attribute is present on a figure, it will be used as the optional positioning hint in LaTeX (e.g. `ht`). With implicit figures, `latex-pos` will be added to the figure (and removed from the image) if it is present on the image. Closes #10369.
2025-08-27	Org reader: improve sub- and superscript parsing.	Albert Krewinkel
	Sub- and superscript must be preceded by a string in Org mode. Some text preceded by space or at the start of a paragraph was previously parsed incorrectly as sub- or superscript.
2025-08-26	HTML reader: don't drop the initial newline in a pre element.	John MacFarlane
	Closes #11064.
2025-08-10	ODT Reader: Add table-header-rows	Tuong Nguyen Manh

2025-08-06	Add `smart_quotes` and `special_strings` extensions for Org	Albert Krewinkel
	Org mode makes a distinction between smart parsing of quotes and smart parsing of special strings like `...`. The finer-grained control over these features is necessary to faithfully reproduce Emacs Org mode behavior. Special strings are enabled by default, while smart quotes are disabled. The behavior of `special_strings` is brought closer to the reference implementation in that `\-` is now treated as a soft hyphen.
2025-08-03	Fix named entity lookup in POD reader	Evan Silberman
	Translating entities by name ultimately relies on Commonmark.Entity.lookupEntity, which de facto requires the entity name to be followed by a semicolon. Paste a semicolon onto the end of the entity name read from POD to look it up. Fixes #11015
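The same pattern can be shown with Python's standard `html.entities.html5` table, used here only as a stand-in for `Commonmark.Entity.lookupEntity`: most of its keys include the trailing semicolon, so a bare name has to be completed before lookup. The function name `lookup_entity` is hypothetical.

```python
from html.entities import html5


def lookup_entity(name):
    """Look up a named entity that was read without its trailing semicolon."""
    # Keys in the table end in ";" for most entities, so add it before lookup.
    return html5.get(name + ";")


print(lookup_entity("hearts") == "\u2665")  # True
print(html5.get("hearts"))                  # None: the bare name is not a key
```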
2025-07-26	New `xml` format exactly representing a Pandoc AST.	massifrg
	This adds a reader and writer for an XML format equivalent to `native` and `json`. XML schemas for validation can be found in `tools/pandoc-xml.*`. The format is documented in `doc/xml.md`. API changes: - Add module Text.Pandoc.Readers.XML, exporting `readXML`. - Add module Text.Pandoc.Writers.XML, exporting `writeXML`. A new unexported module Text.Pandoc.XMLFormat is also added.
2025-07-24	Org reader: Recognize "fast access" characters in TODO state definitions (#10990)	Ryan Gibb

2025-06-02	Markdown reader: make definition lists behave like other lists.	John MacFarlane
	If the `four_space_rule` extension is not enabled, figure out the indentation needed for child blocks dynamically, by looking at the first nonspace content after the `:` marker. Previously the four-space rule was always obeyed. Remove the old `compact_definition_lists` extension. This was needed to preserve backwards compatibility after pandoc 1.12 was released, but at this point we can get rid of it. T.P.Extensions: remove `Ext_compact_definition_lists` constructor for `Extension` [API change]. Fix tight/loose detection for definition lists, to conform to the documentation. Closes #10889.
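A simplified sketch of the dynamic indentation rule, assuming a definition line starts with up to three spaces and a `:` or `~` marker. It is not the parser's real logic; `child_indent` is a hypothetical helper, and the fixed value 4 for the four-space rule is a simplification.

```python
import re


def child_indent(first_line, four_space_rule=False):
    """Return the column at which a definition's child blocks must start."""
    if four_space_rule:
        return 4
    # Up to three spaces, a ':' or '~' marker, then the spaces before content.
    match = re.match(r" {0,3}[:~] +(?=\S)", first_line)
    if match is None:
        raise ValueError("not a definition line")
    return len(match.group(0))


print(child_indent(": short marker"))   # 2
print(child_indent(":   wide marker"))  # 4
```

Under this reading, continuation blocks line up with the first nonspace character of the definition text rather than with a fixed four-column offset.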
2025-05-28	Fix whitespace bugs.	John MacFarlane

2025-05-28	Adding support for sidebars to Asciidoc writer	Greg

2025-05-26	LaTeX writer: include alt option in `\includegraphics`.	John MacFarlane
	Closes #6095.
2025-05-16	Fix problems with gridTable and add tests.	John MacFarlane
	Closes #10848.
2025-05-11	Remove some redundant code in test.	John MacFarlane

2025-05-11	Org reader: change handling of inline TeX.	John MacFarlane
	Previously inline TeX was handled in a way that was different from org's own export, and that could lead to information loss. This was particularly noticeable for inline math environments such as `equation`. Previously, an `equation` environment starting at the beginning of a line would create a raw block, splitting up the paragraph containing it (see #10836). On the other hand, an `equation` environment not at the beginning of a line would be turned into regular inline elements representing the math. (This would cause the equation number to go missing and in some cases degrade the math formatting.) Now, we parse all of these as raw "latex" inlines, which will be omitted when converting to formats other than LaTeX (and other formats like pandoc's Markdown that allow raw LaTeX). Closes #10836.
2025-03-29	Use `pdf-engine` variable instead of extensions...	John MacFarlane
	...to determine what to do about `.pdfhref` macros in `ms` output. When no PDF engine is specified, we don't use the `.pdfhref` macros at all. This gives better results for links in formats other than PDF, since the link text would simply disappear if it exists only in a `.pdfhref` macro. When a PDF engine is specified, escape the argument of `.pdfhref O` in a way that is appropriate. Remove `groff` extension. Text.Pandoc.Extensions: remove `Ext_groff` constructor. See #10738. This revises the earlier commit 3adcb4bd8089cdb8408da5f17780cd49513b7cec.
2025-03-17	Markdown writer: avoid spaces after/before open/close delimiters.	John MacFarlane
	E.g. instead of rendering `x<em> space </em>y` as `x* space *y` we render it as `x *space* y`. Closes #10696.
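The idea can be sketched like this, assuming the goal is simply to move leading and trailing whitespace outside the emphasis delimiters. The helper `emph` is hypothetical and far simpler than the actual writer.

```python
def emph(text, delimiter="*"):
    """Wrap text in emphasis delimiters, keeping outer whitespace outside them."""
    core = text.strip()
    if not core:
        # Nothing but whitespace: emit it unchanged rather than empty emphasis.
        return text
    leading = text[: len(text) - len(text.lstrip())]
    trailing = text[len(text.rstrip()):]
    return f"{leading}{delimiter}{core}{delimiter}{trailing}"


print("x" + emph(" space ") + "y")  # x *space* y
```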
2025-03-14	Markdown reader: remove some misguided list fanciness.	John MacFarlane
	Previously we tried to handle things like commented out list items: - one <!-- - two --> - three and also things like: - one `and - two` and But the code we added to handle these cases caused problems with other, more straightforward things, like: - one - ``` code ``` - three So we are rolling back all the fanciness, so that the markdown parser now behaves more like the commonmark parser, in which indicators of block-level structure always take priority over indicators of inline structure. Closes #9865. Closes #7778. See also #5628.
2025-02-12	Markdown writer: omit extra space after bullets.	John MacFarlane
	We used to insert extra spaces to ensure that the content respected the four-space rule. That is not really necessary now, since pandoc's markdown and most markdowns don't follow the four-space rule. Those who want the old behavior can obtain it by using `-t markdown+four_space_rule`. Closes #7172.
2025-02-07	Track wikilinks with a class instead of a title	Evan Silberman
	Once upon a time the only metadata element for links in Pandoc's AST was a title, and it was hijacked to track certain links as having originated in the wikilink syntax. Now we have Attrs and we can use a class to handle wikilinks instead. Requires coordinated changes to commonmark-hs.
2025-01-30	DOCX reader: do not issue warning for comments with `+styles` (#10572)	Stephen Reindl
	Closes #10571. Co-authored-by: Stephen Reindl
2025-01-21	Prefer MIME type when determining extensions for MediaBag items (#10557)	Max Heller
	Currently, remote images added to the MediaBag are stored at paths with extensions determined based on the external URI. For instance, an image from https://example.com/image.png is stored as <hash>.png. If the URI does not contain an extension (e.g., https://example.com/image), then the content-type of the downloaded image is used to determine the extension. This change switches the precedence such that content-type is preferred over extensions contained in the URI. This is necessary because some images are located at URIs with misleading extensions -- shields.io, for instance, serves SVGs from URIs with .yml extensions. With this change, the image/svg+xml content-type is now preferred over the .yml URI extension. This fixes a bug in the PDF writer in which such an image would be mishandled due to not being identified as an SVG.
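A small sketch of the new precedence, not the MediaBag code itself: the content type is consulted first, and the URI extension is used only as a fallback. The names `media_path` and `EXT_BY_MIME`, the tiny MIME table, and the choice of SHA-1 for the hash are all assumptions made for illustration.

```python
import hashlib
import posixpath
from urllib.parse import urlparse

# Hypothetical, deliberately tiny content-type table for the sketch.
EXT_BY_MIME = {"image/svg+xml": ".svg", "image/png": ".png", "image/jpeg": ".jpg"}


def media_path(url, content_type, data):
    """Name a fetched resource <hash><ext>, preferring the content type."""
    mime = (content_type or "").split(";")[0].strip().lower()
    ext = EXT_BY_MIME.get(mime)
    if ext is None:
        # Fall back to whatever extension the URI path carries, if any.
        ext = posixpath.splitext(urlparse(url).path)[1]
    return hashlib.sha1(data).hexdigest() + ext


path = media_path("https://example.com/badge.yml", "image/svg+xml", b"<svg/>")
print(path.endswith(".svg"))  # True
```

With the old precedence the same resource would have been stored with a `.yml` extension.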
2024-12-28	AsciiDoc writer: improve escaping.	John MacFarlane
	Closes #10385. Closes #2337. Closes #6424.
2024-12-27	Add Pod reader	Evan Silberman
	Pod ("Plain old documentation") is a markup language used principally to document Perl modules and programs. Since it was originally meant to be translated pretty directly to man, the semantics are fairly simple. This Pod reader was developed with reference to the canonical user and implementer documentation of Pod: https://perldoc.perl.org/perlpod and https://perldoc.perl.org/perlpodspec. There are 1490 .pod, .pl, and .pm files in the Perl 5.34 distribution found in /System/Library/Perl on my Mac. Of those, this reader dies with a parse error on 7 of them. All of them seem to be cases where pod commands are found within a non-colon-prefixed =begin/=end. perlpodspec says I may treat this as an error. [API change] adds readPod
2024-12-05	Add mdoc reader	Evan Silberman
	This change introduces a reader for mdoc, a roff-derived semantic markup language for manual pages. The two relevant contemporary implementations of mdoc for manual pages are mandoc (https://mandoc.bsd.lv/), which implements the language from scratch in C, and groff (https://www.gnu.org/software/groff/), which implements it as roff macros. mdoc has a lot of semantics specific to technical manuals that aren't representable in Pandoc's AST. I've taken a cue from the mandoc HTML output and many mdoc elements are encoded as Codes or Spans with classes named for the mdoc macro that produced them. Much like web browsers with HTML, mandoc attempts to produce best-effort output given all kinds of weird and crappy mdoc input. Part of the reason it's able to do this is it uses a very accommodating parse tree and stateful output routines specialized to the output mode, and when it encounters some macro it wasn't expecting, it can easily give up on whatever it was outputting and output something else. I've encoded as much flexibility as I reasonably could into the mdoc reader here, but I don't know how to be as flexible as mandoc. This branch has been developed almost exclusively against mandoc's documentation and implementation of mdoc as a reference, and the real-world manual pages tested against are those from the OpenBSD base system. Of ~3500 manuals in mdoc format shipped with a fresh OpenBSD install, 17 cause the mdoc reader to exit with a parse error. Any further chasing of edge cases is deferred to future work. Many of the tests in test/Tests/Readers/Mdoc.hs are derived directly from mandoc's extensive regression tests. [API change] Adds readMdoc to the public API
2024-10-15	RST reader: avoid putting metadata in Para.	John MacFarlane
	Create MetaInlines when possible, just as with markdown input. MetaBlocks is still used when there are multiple paragraphs or non-paragraph content. This change also affects field lists. Closes #7766.
2024-10-01	RST writer: change bullet list hang from 3 to 2.	John MacFarlane
	This accords with the style in the reference docs.
2024-09-21	DokuWiki reader: fix block quote behavior.	John MacFarlane
	Closes #6461. Blockquotes are not really block containers in DokuWiki; the lines are interpreted literally (so, e.g., you can't start a list), and line breaks are added at the ends.
2024-09-09	Tests.Readers.Markdown: avoid use of 'head'.	John MacFarlane

2024-09-09	Tests: use 'drop 1' instead of partial function 'tail'.	John MacFarlane

2024-09-09	Avoid use of 'head' in Tests.Shared.	John MacFarlane

2024-09-03	Add ansi writer tests.	John MacFarlane

2024-09-03	HTML reader: only parse main element's contents (if present).	John MacFarlane
	If main has an id or class, we include a div with that id or class; otherwise just the contents. Closes #10140.
2024-07-27	Docx writer: fix regression with nested lists.	John MacFarlane
	Closes #9994. The bug affects e.g. ordered lists with bullet sublists; after the sublist the top-level list reverts to bullets instead of being properly numbered. This regression was introduced in version 3.2.1 and was caused by commit f5531f1.