github.com/jgm/pandoc - Pandoc — The universal markup converter

Age	Commit message	Author
2024-12-17	Textile reader: improve parsing of spans. [issue9878]	John MacFarlane
	The span needs to be separated from its surroundings by spaces. Also, a span can have attributes, which we now attach. Closes #9878.
2024-12-17	Textile reader: inline constructors don't trigger if closer...	John MacFarlane
	...is preceded by whitespace. Closes #10414.
2024-12-05	Add mdoc reader	Evan Silberman
	This change introduces a reader for mdoc, a roff-derived semantic markup language for manual pages. The two relevant contemporary implementations of mdoc for manual pages are mandoc (https://mandoc.bsd.lv/), which implements the language from scratch in C, and groff (https://www.gnu.org/software/groff/), which implements it as roff macros. mdoc has a lot of semantics specific to technical manuals that aren't representable in Pandoc's AST. I've taken a cue from the mandoc HTML output and many mdoc elements are encoded as Codes or Spans with classes named for the mdoc macro that produced them. Much like web browsers with HTML, mandoc attempts to produce best-effort output given all kinds of weird and crappy mdoc input. Part of the reason it's able to do this is it uses a very accommodating parse tree and stateful output routines specialized to the output mode, and when it encounters some macro it wasn't expecting, it can easily give up on whatever it was outputting and output something else. I've encoded as much flexibility as I reasonably could into the mdoc reader here, but I don't know how to be as flexible as mandoc. This branch has been developed almost exclusively against mandoc's documentation and implementation of mdoc as a reference, and the real-world manual pages tested against are those from the OpenBSD base system. Of ~3500 manuals in mdoc format shipped with a fresh OpenBSD install, 17 cause the mdoc reader to exit with a parse error. Any further chasing of edge cases is deferred to future work. Many of the tests in test/Tests/Readers/Mdoc.hs are derived directly from mandoc's extensive regression tests. [API change] Adds readMdoc to the public API
2024-12-05	Parameterize Roff escaping	Evan Silberman
	The existing lexRoff does some stuff I don't want to deal with in mdoc just yet, like lexing tbl, and some stuff I won't do at all, like handling macro and text string definitions and switching between modes. Uses a typeclass with associated type families to reuse most of the escaping code between Roff (i.e. man) and Mdoc. Future work could improve on this so that more lexing code could be shared between Man and Mdoc. Mdoc inherits Roff's surface syntax so hypothetically it makes sense to lex it into tokens that make sense for roff. But it happens that the Mdoc parser is much easier to build with an Mdoc specific token stream. Some discussion in jgm/pandoc#10225 about the rationale. Adds a test for the roff \A escape, which I accidentally dropped support for in an earlier iteration without anything complaining.
2024-11-19	MediaWiki reader: fix indented tables with caption.	John MacFarlane
	Closes #10390.
2024-11-11	Respect empty LineBlock lines in plain writer	Evan Silberman
	The plain writer behaved as a markdown variant with Ext_line_blocks turned off, and so empty lines in a line block would get eliminated. This is surprising, since if there's anything where the intent can be preserved in plain text output it's empty lines. It's still a bit surprising to have nbsps in plain text output, as in the test, where the distinction doesn't really matter, but that'd be an orthogonal change.
2024-11-04	JATS writer: correct spelling of suppress attribute (#10350)	Andreas Deininger

2024-10-25	LaTeX reader: put minipage in specially marked Div.	John MacFarlane
	Closes #10266.
2024-10-23	RST reader: implement option lists.	John MacFarlane
	Closes #10318.
2024-10-23	HTML writer: unwrap empty incremental divs	Albert Krewinkel
	Divs are unwrapped if the only purpose of the div seems to be to control whether lists are presented incrementally on slides. Closes: #10328
2024-10-17	Fix typo in test case.	John MacFarlane

2024-10-16	RST reader: handle block level substitutions.	John MacFarlane

2024-10-15	RST reader: fix linked substitutions.	John MacFarlane
	E.g. `\|Python\|_`. Closes #6588.
2024-10-15	RST reader: support inline anchors.	John MacFarlane
	Closes #9196.
2024-10-15	RST reader: explicit links define references.	John MacFarlane
	For example, ``Go to `g`_ `g <www.example.com>`_.`` should produce two links to www.example.com. Closes #5081.
2024-10-13	RST reader: Use a new one-pass parsing strategy.	John MacFarlane
	Instead of having an initial pass where we collect reference definitions, we create links with target `##SUBST##something` or `##REF##something` or `##NOTE##something`, and resolve these in a pass over the parsed AST. This allows us to handle link references that are not at the top level. Closes #10281.
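The placeholder-then-resolve idea described above can be sketched in a few lines of Python. This is a toy model and not pandoc's Haskell implementation: the `Link` and `BlockQuote` classes, the `resolve` function, and the dictionary of references are hypothetical names chosen for illustration. Only the `##REF##` prefix comes from the commit message.

```python
from dataclasses import dataclass, field

REF_PREFIX = "##REF##"  # placeholder prefix named in the commit message


@dataclass
class Link:
    text: str
    target: str


@dataclass
class BlockQuote:
    children: list = field(default_factory=list)


def resolve(node, refs):
    """Walk a toy AST and replace placeholder targets with real URLs."""
    if isinstance(node, Link):
        if node.target.startswith(REF_PREFIX):
            key = node.target[len(REF_PREFIX):]
            # An unknown reference keeps its placeholder target.
            node.target = refs.get(key, node.target)
    elif isinstance(node, BlockQuote):
        for child in node.children:
            resolve(child, refs)
    elif isinstance(node, list):
        for child in node:
            resolve(child, refs)
    return node


# The parser emits placeholders right away; reference definitions are
# collected as they are seen and applied in one pass over the result.
doc = [Link("g", REF_PREFIX + "g"),
       BlockQuote([Link("again", REF_PREFIX + "g"),
                   Link("lost", REF_PREFIX + "missing")])]
refs = {"g": "www.example.com"}
resolve(doc, refs)
```

Because resolution happens after parsing, the nested link inside the block quote is handled the same way as the top-level one.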
2024-10-09	RST reader: ignore newlines in URL in explicit link.	John MacFarlane
	Closes #10279.
2024-10-08	Typst writer: make `smart` extension work.	John MacFarlane
	If `smart` is not enabled, a command in the default template will disable smartquote substitutions. When `smart` is enabled, render curly apostrophes as straight and escape straight apostrophes. When `smart` is disabled, render curly apostrophes as curly and don't escape straight apostrophes. And similarly for quotes, em and en dashes. This should give more idiomatic typst output, with fewer unnecessary escapes. Closes #10271.
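A minimal sketch of the apostrophe rule described above, written in Python rather than pandoc's Haskell. The function name `render_apostrophes` is hypothetical, and the sketch covers apostrophes only; the commit message says quotes and dashes are handled similarly.

```python
def render_apostrophes(text: str, smart: bool) -> str:
    """Apply the apostrophe rule from the commit message to a string."""
    out = []
    for ch in text:
        if ch == "\u2019":
            # Curly apostrophe: straight when smart is on, unchanged otherwise.
            out.append("'" if smart else ch)
        elif ch == "'":
            # Straight apostrophe: escaped only when smart is on.
            out.append("\\'" if smart else ch)
        else:
            out.append(ch)
    return "".join(out)
```

With `smart` enabled, a straight apostrophe in the output can be left for Typst to curl, so a literal straight apostrophe in the input has to be escaped to survive.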
2024-10-08	MediaWiki reader: Fix parsing of col/rowspan.	John MacFarlane
	Closes #6992.
2024-10-01	Make --number-sections work with beamer (#10245)	Thomas Hodgson
	+ Remove section numbering code from common.latex
	+ Add section numbering to default.latex
	+ Add logic for numbering sections in default.beamer.
	I moved the template setting code to where other beamer templates are set. This makes the section-titles and numbersections variables independent. This should make --number-sections work with beamer.
2024-10-01	RST writer: change bullet list hang from 3 to 2.	John MacFarlane
	This accords with the style in the reference docs.
2024-10-01	Amend the fix to #10236 to handle list tables.	John MacFarlane
	With this patch, we also reuse bullet list code for list tables, which simplifies the code.
2024-09-30	RST writer: handle cases where indented context starts with block quote.	John MacFarlane
	In these cases we emit an empty comment to fix the point from which indentation is measured; otherwise the block quote is not parsed as a block quote. This affects list items and admonitions. Closes #10236.
2024-09-30	LaTeX writer: better fix for lists in definition lists.	John MacFarlane
	In commit a26ec96d89ccf532f7bca7591c96ba30d8544e4a we added an empty `\item[]` to the beginning of a list that occurs first in a definition list, to avoid having one item on the line with the label. This gave bad results in some cases (#10241) and there is a more idiomatic solution anyway: using `\hfill`. Closes #10241.
2024-09-29	Refactor latex template using partials.	John MacFarlane
	+ Split out common parts of latex template into partials: common.latex, fonts.latex, font-settings.latex, passoptions.latex, hypersetup.latex, after-header-includes.latex.
	+ Split out old latex template into default.latex and default.beamer.
	+ Make default.beamer the default template for beamer.
2024-09-27	RST writer: fix two issues with list tables.	John MacFarlane
	- Fix alignment of list items corresponding to cells.
	- Don't enclose the list table in a `.. table::`; this leads to doubled captions.
	Closes #10226. Closes #10227. Modified test output for #4564.
2024-09-21	Dokuwiki reader: be more forgiving about misaligned lists...	John MacFarlane
	like dokuwiki itself. Closes #8863.
2024-09-21	Improve blockquote parsing in dokuwiki.	John MacFarlane
	Allow for quoted code blocks.
2024-09-14	Fix test case #10185.	John MacFarlane

2024-09-14	LaTeX writer: avoid error on `refs` div with empty citations.	John MacFarlane
	If there are no citations, don't emit an empty CSLReferences environment. Closes #10185.
2024-09-13	T.P.Shared addPandocAttributes - modify for new commonmark-pandoc.	John MacFarlane
	The new commonmark-pandoc version automatically adds the attribute `wrapper="1"` on all Divs and Spans that are introduced just as containers for attributes that belong properly to their contents. So we don't need to add the attribute here. This gives much better results in some cases. Previously the wrapper attribute was being added even for explicit Divs and Spans in djot, but it is not needed in these cases.
2024-09-10	Add command test for #9121. [tags: pandoc-server-0.1.0.8, pandoc-lua-engine-0.3.2, pandoc-cli-3.4, 3.4]	John MacFarlane

2024-09-08	Text.Pandoc.Shared: add `makeSectionsWithOffsets` [API change].	John MacFarlane
	This is like `makeSections` but has an additional parameter specifying number offsets, for use with the `--number-offset` option. Use `makeSectionsWithOffsets` in HTML writer instead of ad hoc and inefficient number-adjusting code. Clarify MANUAL.txt: the `--number-offset` option should only directly affect numbering of the first section heading in a document; subsequent headings will increment normally. Fix test output for #5071 to reflect this.
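The numbering behavior clarified above (offsets seed the counters, after which headings increment normally) can be illustrated with a small Python sketch. This is an assumption-laden toy, not the Haskell `makeSectionsWithOffsets`: the function `number_headings` and its list-of-levels input are hypothetical.

```python
def number_headings(levels, offsets=()):
    """Return section numbers for a sequence of heading levels.

    `offsets` seeds the per-level counters once, at the start.
    """
    counters = list(offsets)
    numbers = []
    for level in levels:
        # Ensure a counter exists for every level up to this one.
        while len(counters) < level:
            counters.append(0)
        counters[level - 1] += 1
        # A new heading resets all deeper counters to zero, so the
        # offsets for those levels are not applied a second time.
        for i in range(level, len(counters)):
            counters[i] = 0
        numbers.append(".".join(str(n) for n in counters[:level]))
    return numbers
```

For example, `number_headings([2, 2, 1, 2], offsets=(1, 4))` numbers the opening level-2 heading from the offsets and the later ones by ordinary incrementing.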
2024-09-08	Add options to change table/figure caption positions.	John MacFarlane
	+ Add command line options `--table-caption-position` and `--figure-caption-position`. These allow the user to specify whether to put captions above or below tables and figures, respectively. The following output formats are supported: HTML (and related such as EPUB), LaTeX (and Beamer), Docx, ODT/OpenDocument, Typst.
	+ Text.Pandoc.Options: add `CaptionPosition` and new `WriterOptions` fields `writerFigureCaptionPosition` and `writerTableCaptionPosition` [API change].
	+ Text.Pandoc.Opt: add `Opt` fields `optFigureCaptionPosition` and `optTableCaptionPosition` [API change].
	+ Docx writer: make table/figure rendering sensitive to caption position settings.
	+ OpenDocument writer: make table/figure rendering sensitive to caption position settings.
	+ Typst writer/template: implement figure caption positions by triggering a show rule in the default template, which determines caption positions for figures and tables globally.
	+ LaTeX writer: make table/figure rendering sensitive to caption position settings.
	+ HTML writer/template: make `<figcaption>` placement sensitive to caption position settings. For tables, `<caption>` must be the first element, and positioning is determined by CSS, so here we set a variable which the default template is sensitive to.
	Closes #5116.
2024-09-07	LaTeX reader: math environments don't have bracketed options.	John MacFarlane
	We were checking for these, and that caused problems if the math began with `[`. Closes #10160.
2024-09-03	RTF reader: handle images inside shp contexts.	John MacFarlane
	We look for the `pib` property and parse image data inside its value. Closes #10145.
2024-09-03	Typst writer: don't include trailing semicolon after...	John MacFarlane
	`@` style citations with suffixes. Closes #10148.
2024-09-01	LaTeX reader: parse nested tabular environments.	John MacFarlane
	Closes #4746.
2024-08-29	Markdown writer: avoid emitting markdown caption if...	John MacFarlane
	...table has fallen back to raw HTML, which will then contain a `<caption>` tag. Closes #10094.
2024-08-29	RST reader: improve simple table support.	John MacFarlane
	Multiline rows occur only when the first cell is empty; we were previously treating lines with any empty cell as row continuations. Closes #10093. In addition, we no longer wrap multiline cells in Para if they can be represented as Plain. This is consistent with docutils behavior.
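The continuation rule above (a line continues the previous row only when its first cell is empty) might look like this in a Python sketch. The function `merge_simple_table_rows` and the list-of-cells representation are hypothetical; pandoc's reader works on the raw text, not on pre-split cells.

```python
def merge_simple_table_rows(lines):
    """Merge continuation lines of an RST simple table into their rows.

    `lines` is a list of rows, each a list of cell strings.
    """
    rows = []
    for cells in lines:
        if rows and cells[0].strip() == "":
            # Empty first cell: fold this line into the previous row.
            prev = rows[-1]
            rows[-1] = [
                p if not c.strip() else (c if not p else p + "\n" + c)
                for p, c in zip(prev, cells)
            ]
        else:
            # Any other empty cell does not make this a continuation.
            rows.append(list(cells))
    return rows
```

A line such as `["b", ""]` starts a new row here, which is the case the older behavior got wrong.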
2024-08-23	AsciiDoc writer: add `link:` prefix when needed.	John MacFarlane
	AsciiDoc requires it except for http, https, irc, mailto, ftp schemes. Closes #10105.
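As a rough illustration of the rule above, the following Python sketch decides whether a URL needs the `link:` prefix. The helper `asciidoc_link` is hypothetical and ignores escaping of the label; the set of schemes is the one listed in the commit message.

```python
from urllib.parse import urlsplit

# Schemes that, per the commit message, do not need a `link:` prefix.
BARE_SCHEMES = {"http", "https", "irc", "mailto", "ftp"}


def asciidoc_link(url: str, label: str) -> str:
    """Format an AsciiDoc link, adding `link:` when the scheme requires it."""
    scheme = urlsplit(url).scheme.lower()
    prefix = "" if scheme in BARE_SCHEMES else "link:"
    return f"{prefix}{url}[{label}]"
```

Relative paths have an empty scheme, so they fall into the prefixed case along with schemes such as `file`.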
2024-08-08	Org reader: fix parsing of src blocks with an `-i` flag.	Albert Krewinkel
	Tabs are now preserved in the contents of src blocks if the block has the `-i` flag. Fixes: #10071
2024-08-06	Asciidoc writer: preserve original base level.	John MacFarlane
	We used to normalize so that the base level is always 1, but asciidoc no longer seems to care about that, and the behavior creates difficulties when we are converting fragments. Closes #10062.
2024-08-06	LaTeX writer: preserve locator labels with `--natbib`.	John MacFarlane
	In #9275 we made pandoc strip off locator labels (e.g. `p.`) for natbib and biblatex output. In fact, this is only desirable for biblatex. natbib needs the locators to be preserved. Closes #10057.
2024-07-15	BibTeX writer: ensure that "literal" names are enclosed in braces.	John MacFarlane
	Closes #9987.
2024-07-05	Change sr -> sr-Latn in translation tests	Stephen Huan

2024-07-03	Adjust test for #9943 so it works with Windows.	John MacFarlane

2024-07-03	Man writer: use default middle header when metadata does not...	John MacFarlane
	include `header`. This change causes pandoc to omit the middle header parameter when `header` is not set, rather than emitting `""`. The parameter is optional and man will use a default based on the section if it is not specified. Closes #9943.
2024-06-23	LaTeX writer: new method for ensuring images don't overflow.	John MacFarlane
	Previously we relied on graphicx internals and made global changes to Gin to force images to be resized if they exceed textwidth. This approach is brittle and caused problems with `\includesvg` (see #9660).
	The new approach uses a new macro `\pandocbounded` that is now defined in the LaTeX template. (Thanks here to Falk Hanisch in https://github.com/mrpiggi/svg/issues/60.) The LaTeX writer has been changed to enclose `\includegraphics` and `\includesvg` commands in this macro when they don't explicitly specify a width or height.
	In addition, the writer now adds `keepaspectratio` to the `\includegraphics` or `\includesvg` options if `height` is specified without width, or vice versa. Previously, this was set in the preamble as a global option.
	Compatibility issues:
	- If custom templates are used with the new LaTeX writer, they will have to be updated to include the new `\pandocbounded` macro, or an error will be raised because of the undefined macro.
	- Documents that specify explicit dimensions for an image may render differently, if the dimensions are greater than the line width or page height. Previously pandoc would shrink these images to fit, but the new behavior takes the specified dimensions literally. In addition, pandoc previously always enforced `keepaspectratio`, even when width and height were both specified, so images with width and height specified that do not conform to their intrinsic aspect ratio will appear differently.
	Closes #9660.
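The wrapping rules described above can be summarized in a short Python sketch. It is a toy model of the decision logic only: the function `latex_image` is hypothetical, and the exact option strings pandoc emits may differ from what is shown.

```python
def latex_image(src, width=None, height=None, command="\\includegraphics"):
    """Build a LaTeX image command following the rules in the commit message."""
    opts = []
    if width is not None:
        opts.append(f"width={width}")
    if height is not None:
        opts.append(f"height={height}")
    # keepaspectratio is added only when exactly one dimension is given.
    if (width is None) != (height is None):
        opts.append("keepaspectratio")
    opt_str = f"[{','.join(opts)}]" if opts else ""
    cmd = f"{command}{opt_str}{{{src}}}"
    # With no explicit dimensions, wrap in \pandocbounded so the image
    # cannot overflow; explicit dimensions are taken literally.
    if width is None and height is None:
        cmd = f"\\pandocbounded{{{cmd}}}"
    return cmd
```

The same function covers `\includesvg` by passing `command="\\includesvg"`, since the commit message applies identical rules to both commands.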
2024-06-23	Markdown writer: fix bug with block quotes in lists.	John MacFarlane
	Closes #9908.