github.com/jgm/pandoc - Pandoc — The universal markup converter

Age	Commit message	Author
2025-04-05	Markdown writer: improve use of implicit figures when possible.	John MacFarlane
	Closes #10758. When the alt differs from the caption, but only as regards formatting, we still use an implicit figure.
2025-04-04	Markdown writer: render a figure with Para caption as implicit figure.	John MacFarlane
	Also, when falling back to a Div with class `figure` for a figure that can't be represented any other way, include a Div with class `caption` containing the caption. Closes #10755.
2025-04-01	Typst writer: support `mark` class on spans.	John MacFarlane
	Closes #10747.
2025-03-29	Org reader: don't include newlines in inline code/verbatim.	John MacFarlane
	Convert newlines to spaces as we do in other formats. Closes #10730.
2025-03-23	Use the most compatible form for roff escapes.	John MacFarlane
	This affects T.P.RoffChar, T.P.Writers.Roff, and the Man and Ms writers, which now emit `\(xy` instead of `\[xy]`. This was the original AT&T troff form and is the most widely supported. The bracketed form causes problems for some tools, e.g. `makewhatis` on macOS. Closes #10716.
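As a rough illustration of the two escape forms (this is not pandoc's Haskell code), the Python sketch below rewrites bracketed two-character glyph names to the parenthesized form. The function name `to_compat_escape` is hypothetical, and the sketch assumes that only two-character names have a `\(` equivalent, so longer names are left untouched.

```python
import re

# Matches \[xy] where xy is exactly two characters other than ']'.
_BRACKETED = re.compile(r"\\\[([^\]]{2})\]")

def to_compat_escape(text: str) -> str:
    """Rewrite \\[xy] as \\(xy; leave longer names such as \\[u2014] alone."""
    return _BRACKETED.sub(lambda m: "\\(" + m.group(1), text)

if __name__ == "__main__":
    print(to_compat_escape(r"dash: \[em]"))
```

A function replacement is used instead of a replacement string so that the backslash in the output is never reinterpreted by the regex engine.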
2025-03-22	Commonmark Reader: handle GFM math irregularity with braces.	John MacFarlane
	In GFM, you need to use `\\{` rather than `\{` for a literal brace. Closes #10631.
2025-03-21	MediaWiki reader/writer: allow definition on same line as term.	John MacFarlane
	Closes #10708.
2025-03-19	Skip at most one argument to LaTeX tabular newline (#10707)	silby
	In LaTeX's tabular environment, the tabular newline takes an optional argument that we skip. But it only takes a single optional argument, and any further square-bracketed text that follows shouldn't be skipped. Fixes #7512, and also adds a test for the original problem raised in that issue which was already fixed at some point.
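The rule can be sketched in Python as follows. This is a simplified illustration, not the LaTeX reader's actual parser: the helper name `skip_tabular_newline` is hypothetical, and details such as whitespace between `\\` and the bracket are ignored.

```python
def skip_tabular_newline(src: str, pos: int) -> int:
    """Return the index just past a tabular newline that starts at src[pos].

    Consumes the two backslashes and at most ONE bracketed optional
    argument; any further '[...]' is treated as ordinary cell content.
    """
    if not src.startswith("\\\\", pos):
        raise ValueError("no tabular newline at this position")
    pos += 2
    if src.startswith("[", pos):
        close = src.find("]", pos)
        if close != -1:          # skip the single optional argument
            pos = close + 1
    return pos

if __name__ == "__main__":
    row = r"a \\[2pt][b] & c"
    start = row.index("\\\\")
    print(row[skip_tabular_newline(row, start):])
```

Because the bracket check is an `if` and not a loop, a second bracketed group such as `[b]` remains in the input as cell text.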
2025-03-15	Update tests for previous commit (protecting phantomsection).	John MacFarlane

2025-03-14	Markdown reader: remove some misguided list fanciness.	John MacFarlane
	Previously we tried to handle things like commented out list items:
	    - one
	    <!--
	    - two
	    -->
	    - three
	and also things like:
	    - one `and
	    - two` and
	But the code we added to handle these cases caused problems with other, more straightforward things, like:
	    - one
	    - ```
	      code
	      ```
	    - three
	So we are rolling back all the fanciness, so that the markdown parser now behaves more like the commonmark parser, in which indicators of block-level structure always take priority over indicators of inline structure. Closes #9865. Closes #7778. See also #5628.
2025-03-07	Markdown reader: fixed `escapedChar'` parser.	John MacFarlane
	It should not accept escaped newlines. See #10672.
2025-03-05	Disable citations extension in writers if `--citeproc` is used.	John MacFarlane
	Otherwise we get undesirable results, as the format's native citation mechanism is used instead of (or in addition to) the citeproc-generated citations. Closes #10662.
2025-03-04	LaTeX reader: better handle comments/whitespace in option lists and includes.	John MacFarlane
	Closes #10659.
2025-02-27	Typst writer: better heuristics for escaping potential list markers.	John MacFarlane
	Closes #10650.
2025-02-19	Revert "Docx reader and writer: support row heads."	John MacFarlane
	This reverts commit cbe67b9602a736976ef6921aefbbc60d51c6755a. Word sets `w:firstColumn="1"` by default for tables. You have to find the Table Design tab and explicitly uncheck "First Column" to make this go away. In most cases, I don't think writers intend to designate the first column as a row head, so this commit is going to produce unexpected results. In addition, because of the table normalization done by pandoc-types' `tableWith`, any table containing a colspanned cell in the left-hand column will get broken if the first column is designated a row head. For these reasons it seems best to revert this change, which was made in response to #9495. Closes #10627.
2025-02-14	Markdown reader: allow line break between URL and title of link.	John MacFarlane
	Closes #10621.
2025-02-13	Update pandoc-citeproc-320a test.	John MacFarlane
	See #10610.
2025-02-13	Smart quote parsing: ignore curly quotes.	John MacFarlane
	Previously we tried to match curly quotes as well as straight quotes, producing Quoted inlines. But it seems better just to assume that those who use curly quotes want them passed through verbatim. This also fixes an (unintended) bug whereby curly single left quotes would sometimes be changed to single right quotes. Closes #10610.
2025-02-12	Markdown writer: omit extra space after bullets.	John MacFarlane
	We used to insert extra spaces to ensure that the content respected the four-space rule. That is not really necessary now, since pandoc's markdown and most markdowns don't follow the four-space rule. Those who want the old behavior can obtain it by using `-t markdown+four_space_rule`. Closes #7172.
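The difference between the two output styles can be sketched as below. This is only an illustration of the spacing for a flat list, not the Markdown writer's code; `render_bullets` and its `four_space_rule` parameter are hypothetical names that mirror the extension name.

```python
def render_bullets(items, four_space_rule=False):
    """Render a flat bullet list.

    With four_space_rule=True the marker is padded so that content starts
    at column 4 (the older output); otherwise a single space follows it.
    """
    marker = "-   " if four_space_rule else "- "
    return "\n".join(marker + item for item in items)

if __name__ == "__main__":
    print(render_bullets(["one", "two"]))
    print(render_bullets(["one", "two"], four_space_rule=True))
```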
2025-02-10	Use babel options `shorthands=off`.	John MacFarlane
	This has been fixed now in Babel for some time. So we can now get rid of the ugly code that disabled language-specific shorthands (see e26d31d). Closes #6817.
2025-02-10	Remove selnolig-langs.	John MacFarlane
	We now specify the language as a global option again, so we no longer need to specify it when invoking selnolig. See #9863.
2025-02-08	LaTeX writer/template: Improve babel support.	John MacFarlane
	Previously we used the `.ini` files for every language, but for European languages these tend to provide inferior results to the `.ldf` files used by classic Babel. Currently Babel documentation recommends using the classic system for European languages written in Latin and Cyrillic scripts and Vietnamese. So the LaTeX writer and template now follow this guidance. Main languages in the list of languages with good "classic" support are added to global documentclass options and will be automatically handled by Babel using the `.ldf` files. If the main language is not in this list, the `babeloptions` variable will be set to `provide=*`, which will cause support to be loaded from the `.ini` file rather than an `.ldf`. So, for example, setting `-V babeloptions=''` with a polytonic Greek document will cause the `.ldf` support to be used instead of the `.ini`. The default setting of this variable can be overwritten, but in most cases the default should give good results. Closes #8283.
2025-02-07	Track wikilinks with a class instead of a title	Evan Silberman
	Once upon a time the only metadata element for links in Pandoc's AST was a title, and it was hijacked to track certain links as having originated in the wikilink syntax. Now we have Attrs and we can use a class to handle wikilinks instead. Requires coordinated changes to commonmark-hs.
2025-02-05	Add CRediT roles to JATS	Charles Tapley Hoyt
	Enable annotating author roles using the Contribution Role Taxonomy (CRediT) and export this information in conformant JATS. Closes #10152. Co-Authored-By: Jez Cope
2025-02-03	DocBook reader: Handle title inside orderedlist.	John MacFarlane
	Also some other elements that allow title: blockquote, calloutlist, etc. Closes #10594.
2025-02-01	DocBook reader: better handle formalpara, example, and sidebar.	John MacFarlane
	Include identifiers and titles in each case. The code should be credited to @tombolano. Closes #8666.
2025-01-29	Handle <abbr> as a span-like inline	Evan Silberman
	Closes #5793
2025-01-29	Test \{,re}newcommand arguments (#10573)	silby
	Closes #4470
2025-01-24	brace tables with typst:no-figure and typst:text attributes (#10563)	Gordon Woodhull
	The combination of #9648 Typst property output and #9778 `typst:no-figure` can cause fonts to spill out of tables. This is because setting Typst text properties across a table requires `set text(...)` outside the table, and previously we were relying on the figure to provide a scope. This adds an extra `#{...}` when the table has class `typst:no-figure` and also has `typst:text:*` attributes.
2025-01-16	Citeproc: fix moving punctuation before citation notes.	John MacFarlane
	This previously worked with regular citations, but not author-in-text citations. Now it works with both.
2025-01-15	Consume blanks after =encoding in pod reader (#10544)	silby
	The reader did not properly consume empty lines after =encoding commands, which produced various incorrect parses depending on the content between there and the next command. Fixes #10537
2025-01-10	Fix 9495 command test for windows.	John MacFarlane

2025-01-10	Docx reader and writer: support row heads.	John MacFarlane
	Reader: When `w:tblLook` has `w:firstColumn` set (or an equivalent bit mask), we set row heads = 1 in the AST. Writer: set `w:firstColumn` in `w:tblLook` when there are row heads. (Word only allows one, so this is triggered by any number of row heads > 0.) Closes #9495.
2025-01-10	Docx reader: read table styles as custom styles...	John MacFarlane
	...when `styles` extension is enabled. Closes #9603. Also improve manual's coverage of custom styles.
2025-01-01	Typst writer: fix handling of pixel image dimensions.	John MacFarlane
	These are now converted to inches as in the LaTeX writer. Closes #9945.
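The conversion itself is simple arithmetic: inches = pixels / dpi. A minimal Python sketch follows, assuming pandoc's documented default of 96 dpi. The function name `px_to_inches` and the exact output formatting are assumptions, not the Typst writer's implementation.

```python
def px_to_inches(pixels: float, dpi: int = 96) -> str:
    """Convert a pixel dimension to a length in inches, e.g. for Typst."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    inches = pixels / dpi
    return f"{inches:g}in"   # %g drops trailing zeros

if __name__ == "__main__":
    print(px_to_inches(192))
```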
2024-12-28	AsciiDoc writer: improve escaping.	John MacFarlane
	Closes #10385. Closes #2337. Closes #6424.
2024-12-27	RST reader: fix handling of underscores.	John MacFarlane
	Fixes a regression in 3.6 that caused problems parsing text with underscores. Closes #10497.
2024-12-23	MediaWiki reader: allow empty quoted attributes.	John MacFarlane
	Closes #10490.
2024-12-23	MediaWiki reader: allow cells starting with `+`.	John MacFarlane
	Closes #10491.
2024-12-22	RST reader: handle explicit reference links (#10485)	silby
	This case was missed when changing the reference link strategy for RST to allow a single pass. Closes #10484.
2024-12-20	Mediawiki writer: escape line-initial characters...	John MacFarlane
	...that would otherwise be interpreted as list starts. Closes #9700.
2024-12-19	Allow `--shift-heading-level-by=-1` to work in djot...	John MacFarlane
	...in the same way it works for other formats (with the top-level heading being promoted to metadata title). This needed special treatment because of the way djot surrounds sections with Divs. Closes #10459.
2024-12-18	LaTeX reader: handle `figure*` environment as a figure.	John MacFarlane
	Closes #10472.
2024-12-17	Textile reader: improve parsing of spans.	John MacFarlane
	The span needs to be separated from its surroundings by spaces. Also, a span can have attributes, which we now attach. Closes #9878.
2024-12-17	Textile reader: inline constructors don't trigger if closer...	John MacFarlane
	...is preceded by whitespace. Closes #10414.
2024-12-05	Add mdoc reader	Evan Silberman
	This change introduces a reader for mdoc, a roff-derived semantic markup language for manual pages. The two relevant contemporary implementations of mdoc for manual pages are mandoc (https://mandoc.bsd.lv/), which implements the language from scratch in C, and groff (https://www.gnu.org/software/groff/), which implements it as roff macros. mdoc has a lot of semantics specific to technical manuals that aren't representable in Pandoc's AST. I've taken a cue from the mandoc HTML output and many mdoc elements are encoded as Codes or Spans with classes named for the mdoc macro that produced them. Much like web browsers with HTML, mandoc attempts to produce best-effort output given all kinds of weird and crappy mdoc input. Part of the reason it's able to do this is it uses a very accommodating parse tree and stateful output routines specialized to the output mode, and when it encounters some macro it wasn't expecting, it can easily give up on whatever it was outputting and output something else. I've encoded as much flexibility as I reasonably could into the mdoc reader here, but I don't know how to be as flexible as mandoc. This branch has been developed almost exclusively against mandoc's documentation and implementation of mdoc as a reference, and the real-world manual pages tested against are those from the OpenBSD base system. Of ~3500 manuals in mdoc format shipped with a fresh OpenBSD install, 17 cause the mdoc reader to exit with a parse error. Any further chasing of edge cases is deferred to future work. Many of the tests in test/Tests/Readers/Mdoc.hs are derived directly from mandoc's extensive regression tests. [API change] Adds readMdoc to the public API
2024-12-05	Parameterize Roff escaping	Evan Silberman
	The existing lexRoff does some stuff I don't want to deal with in mdoc just yet, like lexing tbl, and some stuff I won't do at all, like handling macro and text string definitions and switching between modes. Uses a typeclass with associated type families to reuse most of the escaping code between Roff (i.e. man) and Mdoc. Future work could improve on this so that more lexing code could be shared between Man and Mdoc. Mdoc inherits Roff's surface syntax so hypothetically it makes sense to lex it into tokens that make sense for roff. But it happens that the Mdoc parser is much easier to build with an Mdoc specific token stream. Some discussion in jgm/pandoc#10225 about the rationale. Adds a test for the roff \A escape, which I accidentally dropped support for in an earlier iteration without anything complaining.
2024-11-19	MediaWiki reader: fix indented tables with caption.	John MacFarlane
	Closes #10390.
2024-11-11	Respect empty LineBlock lines in plain writer	Evan Silberman
	The plain writer behaved as a markdown variant with Ext_line_blocks turned off, and so empty lines in a line block would get eliminated. This is surprising, since if there's anything where the intent can be preserved in plain text output it's empty lines. It's still a bit surprising to have nbsps in plain text output, as in the test, where the distinction doesn't really matter, but that'd be an orthogonal change.
2024-11-04	JATS writer: correct spelling of suppress attribute (#10350)	Andreas Deininger