path: root/test
Age | Commit message | Author
2024-12-05 | Add mdoc reader | Evan Silberman
This change introduces a reader for mdoc, a roff-derived semantic markup language for manual pages. The two relevant contemporary implementations of mdoc for manual pages are mandoc (https://mandoc.bsd.lv/), which implements the language from scratch in C, and groff (https://www.gnu.org/software/groff/), which implements it as roff macros.

mdoc has a lot of semantics specific to technical manuals that aren't representable in Pandoc's AST. I've taken a cue from the mandoc HTML output and many mdoc elements are encoded as Codes or Spans with classes named for the mdoc macro that produced them.

Much like web browsers with HTML, mandoc attempts to produce best-effort output given all kinds of weird and crappy mdoc input. Part of the reason it's able to do this is it uses a very accommodating parse tree and stateful output routines specialized to the output mode, and when it encounters some macro it wasn't expecting, it can easily give up on whatever it was outputting and output something else. I've encoded as much flexibility as I reasonably could into the mdoc reader here, but I don't know how to be as flexible as mandoc.

This branch has been developed almost exclusively against mandoc's documentation and implementation of mdoc as a reference, and the real-world manual pages tested against are those from the OpenBSD base system. Of ~3500 manuals in mdoc format shipped with a fresh OpenBSD install, 17 cause the mdoc reader to exit with a parse error. Any further chasing of edge cases is deferred to future work.

Many of the tests in test/Tests/Readers/Mdoc.hs are derived directly from mandoc's extensive regression tests.

[API change] Adds readMdoc to the public API
2024-12-05 | Parameterize Roff escaping | Evan Silberman
The existing lexRoff does some stuff I don't want to deal with in mdoc just yet, like lexing tbl, and some stuff I won't do at all, like handling macro and text string definitions and switching between modes.

Uses a typeclass with associated type families to reuse most of the escaping code between Roff (i.e. man) and Mdoc. Future work could improve on this so that more lexing code could be shared between Man and Mdoc. Mdoc inherits Roff's surface syntax so hypothetically it makes sense to lex it into tokens that make sense for roff. But it happens that the Mdoc parser is much easier to build with an Mdoc specific token stream. Some discussion in jgm/pandoc#10225 about the rationale.

Adds a test for the roff \A escape, which I accidentally dropped support for in an earlier iteration without anything complaining.
2024-12-05 | Docx reader: parse index references as empty Spans. | John MacFarlane
See #10171.
2024-12-03 | Depend on latest dev texmath, update tests. | John MacFarlane
2024-11-19 | MediaWiki reader: fix indented tables with caption. | John MacFarlane
Closes #10390.
2024-11-11 | Respect empty LineBlock lines in ANSI writer | Evan Silberman
2024-11-11 | Respect empty LineBlock lines in plain writer | Evan Silberman
The plain writer behaved as a markdown variant with Ext_line_blocks turned off, and so empty lines in a line block would get eliminated. This is surprising, since if there's anything where the intent can be preserved in plain text output it's empty lines. It's still a bit surprising to have nbsps in plain text output, as in the test, where the distinction doesn't really matter, but that'd be an orthogonal change.
2024-11-11 | Remove definitions.typst partial. | John MacFarlane
Remove unnecessary definition of `endnote`. Incorporate the one remaining definition into `default.typst`.
2024-11-09 | Typst writer: make template sensitive to a `page-numbering` variable. | John MacFarlane
This can be set to an empty string (or, in metadata, to false) for no page numbers. Addresses #10370.
2024-11-06 | Fix typos (#10349) | Andreas Deininger
2024-11-04 | JATS writer: correct spelling of suppress attribute (#10350) | Andreas Deininger
2024-10-29 | Adjust test suite for typst template changes. | John MacFarlane
Note: the new templates presuppose typst 0.12; if you try to use an earlier version of typst, an error will be raised.
2024-10-25 | LaTeX reader: put minipage in specially marked Div. | John MacFarlane
Closes #10266.
2024-10-23 | RST reader: implement option lists. | John MacFarlane
Closes #10318.
2024-10-23 | HTML writer: unwrap empty incremental divs | Albert Krewinkel
Divs are unwrapped if the only purpose of the div seems to be to control whether lists are presented incrementally on slides. Closes: #10328
2024-10-22 | Update typst tests for math symbol changes. | John MacFarlane
2024-10-21 | Typst reader: avoid generating empty paragraphs. | John MacFarlane
2024-10-17 | Fix typo in test case. | John MacFarlane
2024-10-16 | RST reader: handle block level substitutions. | John MacFarlane
2024-10-15 | RST reader: avoid putting metadata in Para. | John MacFarlane
Create MetaInlines when possible, just as with markdown input. MetaBlocks is still used when there are multiple paragraphs or non-paragraph content. This change also affects field lists. Closes #7766.
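The choice between MetaInlines and MetaBlocks described above can be sketched roughly as follows. This is only an illustration in Python over a made-up mini-AST of `(tag, content)` tuples, not the reader's actual Haskell code; the helper name `to_meta_value` and the decision to treat `Plain` like `Para` are assumptions.

```python
# Hypothetical mini-AST: each block is a (tag, content) tuple,
# e.g. ("Para", ["A", " ", "title"]). Not pandoc's real data types.

def to_meta_value(blocks):
    """Pick a metadata wrapper for the parsed body of a field.

    A single paragraph-like block becomes MetaInlines; several
    paragraphs or any non-paragraph content stay as MetaBlocks.
    Treating "Plain" like "Para" is an assumption of this sketch.
    """
    if len(blocks) == 1 and blocks[0][0] in ("Para", "Plain"):
        return ("MetaInlines", blocks[0][1])
    return ("MetaBlocks", blocks)


single = [("Para", ["A", " ", "title"])]
multi = [("Para", ["First"]), ("Para", ["Second"])]
code = [("CodeBlock", "x = 1")]

print(to_meta_value(single)[0])  # MetaInlines
print(to_meta_value(multi)[0])   # MetaBlocks
print(to_meta_value(code)[0])    # MetaBlocks
```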
2024-10-15 | RST reader: fix linked substitutions. | John MacFarlane
E.g. `|Python|_`. Closes #6588.
2024-10-15 | RST reader: support inline anchors. | John MacFarlane
Closes #9196.
2024-10-15 | RST reader: explicit links define references. | John MacFarlane
For example, ``Go to `g`_ `g <www.example.com>`_.`` should produce two links to www.example.com. Closes #5081.
2024-10-13 | RST reader: Use a new one-pass parsing strategy. | John MacFarlane
Instead of having an initial pass where we collect reference definitions, we create links with target `##SUBST##something` or `##REF##something` or `##NOTE##something`, and resolve these in a pass over the parsed AST. This allows us to handle link references that are not at the top level. Closes #10281.
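The resolution step over the parsed AST might look something like the sketch below. It is written in Python over a made-up dict-based tree rather than pandoc's Haskell types, and it handles only the `##REF##` prefix; the function name `resolve`, the tree layout, the `refs` table, and the choice to leave unknown references untouched are all assumptions made for illustration.

```python
REF_PREFIX = "##REF##"  # placeholder prefix named in the commit message

def resolve(node, refs):
    """Walk a parsed tree and replace placeholder link targets.

    `refs` maps reference names to URLs and is assumed to have been
    filled in while parsing, wherever the definitions occurred.
    Unknown references are left untouched (a choice of this sketch).
    """
    if isinstance(node, list):
        return [resolve(child, refs) for child in node]
    if isinstance(node, dict):
        out = {key: resolve(value, refs) for key, value in node.items()}
        target = out.get("target")
        if (out.get("t") == "Link" and isinstance(target, str)
                and target.startswith(REF_PREFIX)):
            name = target[len(REF_PREFIX):]
            if name in refs:
                out["target"] = refs[name]
        return out
    return node


# A link nested inside a block quote, i.e. not at the top level.
doc = [
    {"t": "BlockQuote", "content": [
        {"t": "Para", "content": [
            {"t": "Link", "content": ["g"], "target": "##REF##g"},
        ]},
    ]},
]
refs = {"g": "https://example.com"}

resolved = resolve(doc, refs)
print(resolved[0]["content"][0]["content"][0]["target"])  # https://example.com
```

Because the walk visits every node, a reference inside a nested container is resolved the same way as one at the top level.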
2024-10-09 | RST reader: ignore newlines in URL in explicit link. | John MacFarlane
Closes #10279.
2024-10-08 | Typst writer: make `smart` extension work. | John MacFarlane
If `smart` is not enabled, a command in the default template will disable smartquote substitutions. When `smart` is enabled, render curly apostrophes as straight and escape straight apostrophes. When `smart` is disabled, render curly apostrophes as curly and don't escape straight apostrophes. And similarly for quotes, em and en dashes. This should give more idiomatic typst output, with fewer unnecessary escapes. Closes #10271.
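The apostrophe rule above can be written out as a small decision table. The Python sketch below is illustrative only and is not the writer's code; the function names are hypothetical, and using a backslash as the escape is an assumption about the emitted Typst markup.

```python
CURLY = "\u2019"  # right single quotation mark, used as a curly apostrophe

def render_apostrophe(ch, smart):
    """Render one character following the rule in the commit message.

    smart on:  curly -> straight, straight -> escaped straight
    smart off: curly -> curly,    straight -> straight (no escape)
    The backslash escape is an assumption of this sketch.
    """
    if ch == CURLY:
        return "'" if smart else CURLY
    if ch == "'":
        return "\\'" if smart else "'"
    return ch

def render_text(text, smart):
    # Apply the per-character rule to a whole string.
    return "".join(render_apostrophe(ch, smart) for ch in text)


print(render_text("it\u2019s", smart=True))  # it's
print(render_text("it's", smart=True))       # it\'s
print(render_text("it's", smart=False))      # it's
```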
2024-10-08 | MediaWiki reader: Fix parsing of col/rowspan. | John MacFarlane
Closes #6992.
2024-10-01 | Make --number-sections work with beamer (#10245) | Thomas Hodgson
+ Remove section numbering code from common.latex
+ Add section numbering to default.latex
+ Add logic for numbering sections in default.beamer. I moved the template setting code to where other beamer templates are set. This makes the section-titles and numbersections variables independent.

This should make --number-sections work with beamer.
2024-10-01 | RST writer: change bullet list hang from 3 to 2. | John MacFarlane
This accords with the style in the reference docs.
2024-10-01 | Amend the fix to #10236 to handle list tables. | John MacFarlane
With this patch, we also reuse bullet list code for list tables, which simplifies the code.
2024-09-30 | RST writer: handle cases where indented context starts with block quote. | John MacFarlane
In these cases we emit an empty comment to fix the point from which indentation is measured; otherwise the block quote is not parsed as a block quote. This affects list items and admonitions. Closes #10236.
2024-09-30 | LaTeX writer: better fix for lists in definition lists. | John MacFarlane
In commit a26ec96d89ccf532f7bca7591c96ba30d8544e4a we added an empty `\item[]` to the beginning of a list that occurs first in a definition list, to avoid having one item on the line with the label. This gave bad results in some cases (#10241) and there is a more idiomatic solution anyway: using `\hfill`. Closes #10241.
2024-09-30 | Fix invalid XML in test/docx/normalize.docx. | John MacFarlane
Closes #10242.
2024-09-29 | Refactor latex template using partials. | John MacFarlane
+ Split out common parts of latex template into partials: common.latex, fonts.latex, font-settings.latex, passoptions.latex, hypersetup.latex, after-header-includes.latex.
+ Split out old latex template into default.latex and default.beamer.
+ Make default.beamer the default template for beamer.
2024-09-27 | RST writer: fix two issues with list tables. | John MacFarlane
- Fix alignment of list items corresponding to cells.
- Don't enclose the list table in a `.. table::`; this leads to doubled captions.

Closes #10226. Closes #10227. Modified test output for #4564.
2024-09-21 | Dokuwiki reader: be more forgiving about misaligned lists... | John MacFarlane
like dokuwiki itself. Closes #8863.
2024-09-21 | Improve blockquote parsing in dokuwiki. | John MacFarlane
Allow for quoted code blocks.
2024-09-21 | DokuWiki reader: fix block quote behavior. | John MacFarlane
Closes #6461. Blockquotes are not really block containers in DokuWiki; the lines are interpreted literally (so, e.g., you can't start a list), and line breaks are added at the ends.
2024-09-21 | DokuWiki writer: don't emit `<HTML>` tags. | John MacFarlane
The use of these tags is now strongly discouraged for security reasons, and will be removed.

We previously used them as a fallback for lists that could not be represented using DokuWiki syntax, e.g. ordered lists with fancy numbers or lists with multiple blocks in their items. We also used them for block quotes with multiple blocks as their contents.

We now use the `<WRAP>` syntax (from the optional WRAP plugin) to handle lists with multiple blocks as their contents. A new method of handling block quotes with complex contents has the side benefit of also handling nested block quotes, which weren't supported before.

`<HTML>` and `<html>` tags are only for raw HTML blocks and inlines, and only if the `raw_html` extension is enabled. (It is now a valid extension for `dokuwiki`, though off by default.)

Closes #7413.
2024-09-14 | Parse id, class, and tabstyle on tables in DocBook Reader | Erik Rask
Add parsing of id (xml:id), class, and tabstyle XML attributes for table and informaltable in the DocBook reader. The tabstyle value is put in the 'custom-style' attribute. fixes #10181
2024-09-14 | Fix test case #10185. | John MacFarlane
2024-09-14 | LaTeX writer: avoid error on `refs` div with empty citations. | John MacFarlane
If there are no citations, don't emit an empty CSLReferences environment. Closes #10185.
2024-09-13 | T.P.Shared addPandocAttributes - modify for new commonmark-pandoc. | John MacFarlane
The new commonmark-pandoc version automatically adds the attribute `wrapper="1"` on all Divs and Spans that are introduced just as containers for attributes that belong properly to their contents. So we don't need to add the attribute here. This gives much better results in some cases. Previously the wrapper attribute was being added even for explicit Divs and Spans in djot, but it is not needed in these cases.
2024-09-11 | Fix copy-pasteo in ANSI writer subscripts | Evan Silberman
The ANSI Superscript writer was copy-pasted and incompletely modified to create the Subscript output writer and I forgot to call the right function, so subscripts were getting rendered with superscripted numbers. Adds a line for this to the ANSI golden test.
2024-09-10 | Add command test for #9121. [tags: pandoc-server-0.1.0.8, pandoc-lua-engine-0.3.2, pandoc-cli-3.4, 3.4] | John MacFarlane
2024-09-09 | ANSI writer: use black rather than green for headings. | John MacFarlane
2024-09-09 | Tests.Readers.Markdown: avoid use of 'head'. | John MacFarlane
2024-09-09 | Tests: use 'drop 1' instead of partial function 'tail'. | John MacFarlane
2024-09-09 | Avoid use of 'head' in Tests.Shared. | John MacFarlane
2024-09-08 | Typst reader: change how "block" elements are handled. | John MacFarlane
Previously they were always parsed as divs. But actually they can occur in some "inline" contexts. Now we first try to parse them as inlines, and only as blocks if that fails. A surrounding Div or Span element is added only if there is an identifier.
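The inline-first fallback described above could be sketched along these lines. This is a loose Python illustration, not the reader's implementation: the two toy parsers, the `ParseFailure` exception, and the rule that a blank line makes inline parsing fail are all hypothetical stand-ins.

```python
class ParseFailure(Exception):
    """Raised by the toy inline parser when the body is not inline content."""

def parse_inlines(body):
    # Hypothetical rule: a blank line means a paragraph break,
    # so the body cannot be inline content.
    if "\n\n" in body:
        raise ParseFailure(body)
    return [("Str", word) for word in body.split()]

def parse_blocks(body):
    # Toy block parser: one Para per blank-line-separated chunk.
    return [("Para", [("Str", w) for w in chunk.split()])
            for chunk in body.split("\n\n")]

def convert_block_element(body, identifier=None):
    """Try inlines first; fall back to blocks only if that fails.

    A wrapping Span or Div is added only when an identifier is present.
    """
    try:
        inlines = parse_inlines(body)
    except ParseFailure:
        blocks = parse_blocks(body)
        return ("Div", identifier, blocks) if identifier else blocks
    return ("Span", identifier, inlines) if identifier else inlines


print(convert_block_element("some text"))
print(convert_block_element("some text", identifier="note"))
print(convert_block_element("first\n\nsecond", identifier="box")[0])  # Div
```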