|
Closes #10490.
|
|
Closes #10491.
|
|
|
|
|
|
When `--top-level-division=chapter` is used, a paragraph with
section properties is inserted before each level-1 heading.
By default, this causes the new heading to start on a new page
(though this default can be adjusted in Word).
This change should also make it possible to number footnotes
by chapter (#2773), though that change has not been made yet.
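A minimal sketch of the insertion step. It assumes a toy `Block` type rather than pandoc's real AST (pandoc-types), and a hypothetical `SectionBreak` constructor standing in for the paragraph that carries the section properties (`w:sectPr`) in the docx output:

```haskell
-- Toy block type (hypothetical; pandoc's real AST lives in pandoc-types).
data Block = Header Int String | Para String | SectionBreak
  deriving (Eq, Show)

-- With chapter-level division, put a section-break marker in front of
-- every level-1 heading and leave all other blocks untouched.
insertChapterBreaks :: [Block] -> [Block]
insertChapterBreaks = concatMap step
  where
    step b@(Header 1 _) = [SectionBreak, b]
    step b              = [b]
```

This literal version also emits a break before a heading at the very start of the document.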
|
|
...markdown raw blocks. For motivation see #10477.
|
|
This case was missed when changing the reference link strategy for RST
to allow a single pass.
Closes #10484.
|
|
The given example wasn't actually functional because `anyChar` parses a
`Char` and `charsInBalanced` wants a `Text` parser as its inner parser.
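The type mismatch can be reproduced with a toy parser. `P`, `anyCharP` and `balanced` below are hypothetical stand-ins, not Parsec or pandoc's `charsInBalanced`. As in the real function, the inner parser has to produce a string, so a `Char` parser must be adapted first:

```haskell
-- A toy parser (hypothetical, not Parsec): consume a prefix or fail.
newtype P a = P { runP :: String -> Maybe (a, String) }

instance Functor P where
  fmap f (P p) = P $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

-- Analogue of parsec's anyChar: its result type is Char.
anyCharP :: P Char
anyCharP = P $ \s -> case s of
  (c:cs) -> Just (c, cs)
  []     -> Nothing

-- Analogue of charsInBalanced: returns the text between balanced
-- delimiters. The inner parser must return a String (pandoc's version
-- wants Text) and must consume input, or this would loop.
balanced :: Char -> Char -> P String -> P String
balanced open close inner = P start
  where
    start (c:cs) | c == open = go (1 :: Int) cs []
    start _ = Nothing
    -- acc holds the chunks seen so far, in reverse order
    go _ [] _ = Nothing                          -- unterminated
    go depth s@(c:cs) acc
      | c == close && depth == 1 = Just (concat (reverse acc), cs)
      | c == close = go (depth - 1) cs ([c] : acc)
      | c == open  = go (depth + 1) cs ([c] : acc)
      | otherwise  = case runP inner s of
          Nothing        -> Nothing
          Just (t, rest) -> go depth rest (t : acc)
```

`balanced '(' ')' anyCharP` is rejected by the type checker, just like the broken example, while `balanced '(' ')' (fmap (:[]) anyCharP)` is accepted. With pandoc's `Text`-based API, the analogous adaptation would presumably be something like `T.singleton <$> anyChar`.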
|
|
...that would otherwise be interpreted as list starts.
Closes #9700.
|
|
Closes #10403.
|
|
|
|
...in the same way it works for other formats (with the top-level
heading being promoted to the metadata title). This needed special
treatment because of the way djot surrounds sections with Divs.
Closes #10459.
|
|
Avoid the slow URI parser from network-uri on large data URIs.
See #10075. In a benchmark converting HTML with a large base64 image
to docx, this patch reduces GCs from 7942 to 3654 and memory in use
from 3781M to 1396M.
(Note that before the last few commits, this was running 9099 GCs
and 4350M in use.)
|
|
This avoids calling the slow URI parser from network-uri on
data URIs, instead calling our own parser.
Benchmarks on an html -> docx conversion with large base64 image:
GCs from 7942 to 6695, memory in use from 3781M to 2351M,
GC time from 7.5 to 5.6.
See #10075.
|
|
|
|
We had a few special cases encoded, but as previously written
they wouldn't work properly with modifiers like `;charset=utf-8`.
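The commit doesn't list the special cases, so the two types below are purely illustrative. The sketch shows why exact string comparison breaks as soon as a parameter is present, and one way to compare only the `type/subtype` part. `baseMimeType`, `specialTypes`, `isSpecialNaive` and `isSpecial` are hypothetical names:

```haskell
import Data.Char (isSpace, toLower)
import Data.List (dropWhileEnd)

-- Reduce a MIME type to its lower-cased "type/subtype" part,
-- dropping parameters such as ";charset=utf-8".
baseMimeType :: String -> String
baseMimeType = map toLower . trim . takeWhile (/= ';')
  where trim = dropWhileEnd isSpace . dropWhile isSpace

-- Hypothetical set of specially handled types (illustrative only).
specialTypes :: [String]
specialTypes = ["text/html", "image/svg+xml"]

-- Naive check: fails whenever a parameter is attached.
isSpecialNaive :: String -> Bool
isSpecialNaive mt = mt `elem` specialTypes

-- Parameter-tolerant check.
isSpecial :: String -> Bool
isSpecial mt = baseMimeType mt `elem` specialTypes
```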
|
|
The intent is to allow bash process substitution: e.g.,
`--template <(echo "foo")`.
Previously pandoc *always* added an extension based on the
output format, which caused problems with the absolute filenames
used by bash process substitution (e.g. `/dev/fd/11`).
Now, if the template has no extension, pandoc will first
try to find it without the extension, and then add the
extension if it can't be found.
So, in general, extensionless templates can now be used.
But this has been implemented in a way that should not cause
problems for existing uses, unless you are using a template
`NAME.FORMAT` but happen to have an extensionless file `NAME` in
the template search path.
Closes #5270.
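A sketch of the lookup order described above, under two assumptions: a `FilePath -> Bool` predicate stands in for searching the filesystem and template path, and a name that already has an extension is used as given. `resolveTemplate` is a hypothetical name; `hasExtension` and `<.>` come from `System.FilePath` (the filepath package that ships with GHC):

```haskell
import System.FilePath (hasExtension, (<.>))

-- Resolve a --template argument for a given output format.
resolveTemplate :: (FilePath -> Bool) -> String -> FilePath -> Maybe FilePath
resolveTemplate exists format name
  | hasExtension name = if exists name then Just name else Nothing
  | exists name       = Just name      -- e.g. /dev/fd/11 from <(...)
  | exists withExt    = Just withExt   -- fall back to NAME.FORMAT
  | otherwise         = Nothing
  where
    withExt = name <.> format
```

With both `NAME` and `NAME.html` present, `resolveTemplate exists "html" "NAME"` picks `NAME`, which is the caveat mentioned above.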
|
|
This was done to determine the "media category," but we can
get that directly from the mime component of data: URIs.
Profiling revealed that a significant amount of time was
spent in this function when a file contained images with
large data URIs.
Contributes to addressing #10075.
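A sketch of reading the category straight from the MIME component, assuming "media category" means the top-level type (`image`, `audio`, ...). `mediaCategory` is a hypothetical helper. It only looks at the text up to the first `;` or `,`, so for a well-formed data URI the payload after the comma is never traversed:

```haskell
import Data.List (stripPrefix)

-- "data:image/png;base64,..." -> Just "image"
mediaCategory :: String -> Maybe String
mediaCategory uri = do
  rest <- stripPrefix "data:" uri
  let mime       = takeWhile (`notElem` ";,") rest  -- stop before params/payload
      (cat, sub) = break (== '/') mime
  if null cat || null sub then Nothing else Just cat
```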
|
|
Text.Pandoc.URI: export `pBase64DataURI`. Modify `isURI` to use this
and avoid calling network-uri's inefficient `parseURI` for data URIs.
Markdown reader: use T.P.URI's `pBase64DataURI` in parsing data
URIs.
Partially addresses #10075.
Obsoletes #10434 (borrowing most of its ideas).
Co-authored-by: Evan Silberman <[email protected]>
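The fast path can be sketched as a plain predicate that recognises `data:<mime>[;params];base64,<payload>` with a few linear checks instead of a general URI grammar. This is a hypothetical `isBase64DataURI`, not the actual `pBase64DataURI` parser; it accepts only the lowercase `;base64` marker and the standard base64 alphabet:

```haskell
import Data.Char (isAlphaNum, isAscii)
import Data.List (isSuffixOf, stripPrefix)

-- Standard base64 alphabet plus '=' padding.
isBase64Char :: Char -> Bool
isBase64Char c = (isAscii c && isAlphaNum c) || c `elem` "+/="

-- Recognise "data:<mime>[;params];base64,<payload>".
isBase64DataURI :: String -> Bool
isBase64DataURI s = case stripPrefix "data:" s of
  Nothing   -> False
  Just rest ->
    let (header, payload) = break (== ',') rest
    in  ";base64" `isSuffixOf` header
          && not (null payload)              -- payload still starts with ','
          && all isBase64Char (drop 1 payload)
```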
|
|
This fixes an omission in the last commit.
|
|
This patch borrows some code from @silby's PR #10434 and should
be regarded as co-authored. This is a lighter-weight patch
that only touches the Markdown reader.
The basic idea is to speed up parsing of base64 URIs by sending
them down a special path. This should mitigate the problem
noted in #10075.
Benchmarks (optimized compilation):
Converting the large test.md from #10075 (7.6Mb embedded image)
from markdown to json,
before: 6182 GCs, 1578M in use, 5.471 MUT, 1.473 GC
after: 951 GCs, 80M in use, 0.247 MUT, 0.035 GC
For now we leave #10075 open to investigate improvements in
HTML rendering with these large data URIs.
Co-authored-by: Evan Silberman <[email protected]>
|
|
It can be very expensive to call network-uri's URI parser on
these. See #10075.
|
|
Closes #10472.
|
|
The span needs to be separated from its surroundings by spaces.
Also, a span can have attributes, which we now attach.
Closes #9878.
|
|
...is preceded by whitespace.
Closes #10414.
|
|
Closes #10456.
|
|
We need to produce an svg tag and parse it using `image.decode`.
This is slightly roundabout but doesn't require any external
libraries.
Closes #10460.
|
|
This change affects the default openxml template as well as the
OpenXML writer.
Closes #10282 (regression introduced in pandoc 3.5).
|
|
We used to set this to `.html`, but this seemed inappropriate
once we started using this function for `--pdf-engine=typst`.
So we changed it in pandoc 3.6 to `.source`. But apparently
`wkhtmltopdf` needs it to be `.html`. So now we have added
a parameter to `toPdfViaTempFile` that allows the extension
to be specified.
Closes #10468.
|
|
|
|
Instead of storing isDisplay and then always choosing displayMath or math
based on that, just store displayMath or math directly.
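A small sketch of the refactoring with toy types: the constructors `DisplayMath` and `InlineMath` stand in for pandoc's `displayMath` and `math` builders, and both state records are hypothetical. Storing the builder removes the branch at each use site:

```haskell
-- Toy stand-in for the inline output type.
data Inline = DisplayMath String | InlineMath String
  deriving (Eq, Show)

-- Before: store a flag and branch on it wherever math is emitted.
newtype MathStateOld = MathStateOld { isDisplay :: Bool }

emitOld :: MathStateOld -> String -> Inline
emitOld st = if isDisplay st then DisplayMath else InlineMath

-- After: store the builder itself; use sites just apply it.
newtype MathStateNew = MathStateNew { mathBuilder :: String -> Inline }

emitNew :: MathStateNew -> String -> Inline
emitNew = mathBuilder
```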
|
|
|
|
|
|
"all f []" is always true, so "null xs || all f xs" can be simplified
to just "all f xs".
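The `null xs` disjunct only matters when `xs` is empty, and in that case `all f xs` is already `True`, so the two guards agree on every list. `guardOld` and `guardNew` are illustrative names:

```haskell
-- The original guard and its simplification.
guardOld, guardNew :: (a -> Bool) -> [a] -> Bool
guardOld f xs = null xs || all f xs
guardNew f xs = all f xs   -- all f [] == True, so the null check is redundant
```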
|
|
This is more efficient than the existing one.
|
|
|
|
|
|
Previously, they had to be YAML objects with a `references` key.
Closes #10452.
|
|
|
|
|
|
|
|
Previously it did not (contrary to what was implied by the manual),
which meant that an image with URL `/etc/passwd` would leak an
encoded version of that file to HTML output with `--self-contained`
or `--embed-resources`, even if `--sandbox` was used.
Thanks to Samuel Mortenson for pointing out the issue.
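A sketch of the intended check, assuming (per the manual's description of `--sandbox`) that the allow-list is the set of files named on the command line. `Fetch` and `fetchLocal` are hypothetical, file reading is passed in as a pure function for illustration, and a real implementation would normalise paths before comparing them:

```haskell
data Fetch = Fetched String | Refused FilePath
  deriving (Eq, Show)

-- Under the sandbox, only read local paths on the allow-list.
fetchLocal :: Bool -> [FilePath] -> (FilePath -> String) -> FilePath -> Fetch
fetchLocal sandboxed allowed reader path
  | sandboxed && path `notElem` allowed = Refused path
  | otherwise                           = Fetched (reader path)
```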
|
|
This computes the sandboxed files from Opt and avoids some
code repetition in T.P.App and T.P.App.OutputSettings.
|
|
See #10171.
|
|
Closes #5294.
|
|
This change introduces a reader for mdoc, a roff-derived semantic markup
language for manual pages. The two relevant contemporary implementations
of mdoc for manual pages are mandoc (https://mandoc.bsd.lv/), which
implements the language from scratch in C, and groff
(https://www.gnu.org/software/groff/), which implements it as roff macros.
mdoc has a lot of semantics specific to technical manuals that aren't
representable in Pandoc's AST. Taking a cue from mandoc's HTML output,
I've encoded many mdoc elements as Codes or Spans with classes named
for the mdoc macro that produced them.
Much like web browsers with HTML, mandoc attempts to produce best-effort
output given all kinds of weird and crappy mdoc input. Part of the
reason it can do this is that it uses a very accommodating parse tree
and stateful output routines specialized to the output mode; when it
encounters a macro it wasn't expecting, it can easily give up on
whatever it was outputting and output something else. I've built as
much flexibility as I reasonably could into the mdoc reader, but I
don't know how to make it as flexible as mandoc.
This branch has been developed almost exclusively against mandoc's
documentation and implementation of mdoc as a reference, and the
real-world manual pages tested against are those from the OpenBSD base
system. Of ~3500 manuals in mdoc format shipped with a fresh OpenBSD
install, 17 cause the mdoc reader to exit with a parse error. Any
further chasing of edge cases is deferred to future work.
Many of the tests in test/Tests/Readers/Mdoc.hs are derived directly
from mandoc's extensive regression tests.
[API change] Adds readMdoc to the public API
|
|
The existing lexRoff does some stuff I don't want to deal with in mdoc
just yet, like lexing tbl, and some stuff I won't do at all, like
handling macro and text string definitions and switching between modes.
Uses a typeclass with associated type families to reuse most of the
escaping code between Roff (i.e. man) and Mdoc.
Future work could improve on this so that more lexing code could be
shared between Man and Mdoc. Mdoc inherits Roff's surface syntax, so in
principle it makes sense to lex it into roff tokens. In practice,
though, the Mdoc parser is much easier to build on an Mdoc-specific
token stream. There is some discussion of the rationale in
jgm/pandoc#10225.
Adds a test for the roff \A escape, which I accidentally dropped support
for in an earlier iteration without anything complaining.
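A minimal sketch of the typeclass-with-associated-type pattern: shared escape decoding feeds lexer-specific token types. `EscapeLexer`, `Tok`, the two lexer tags and the token types are hypothetical, and only two escapes are decoded (`\e`, the escape character itself, and `\&`, a zero-width non-printing character):

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Each lexer chooses its own token type via an associated type family.
class EscapeLexer l where
  type Tok l
  -- How a decoded escape becomes a token in this lexer's stream.
  escapeTok :: l -> String -> Tok l

data ManLexer  = ManLexer
data MdocLexer = MdocLexer

newtype ManTok  = ManText String deriving (Eq, Show)
newtype MdocTok = MdocStr String deriving (Eq, Show)

instance EscapeLexer ManLexer where
  type Tok ManLexer = ManTok
  escapeTok _ = ManText

instance EscapeLexer MdocLexer where
  type Tok MdocLexer = MdocTok
  escapeTok _ = MdocStr

-- Shared code: decode an escape once, for every lexer.
decodeEscape :: Char -> String
decodeEscape 'e' = "\\"  -- \e: the escape character (backslash)
decodeEscape '&' = ""    -- \&: zero-width, prints nothing
decodeEscape c   = [c]

lexEscape :: EscapeLexer l => l -> Char -> Tok l
lexEscape l c = escapeTok l (decodeEscape c)
```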
|
|
Support crossrefs.
Clean up and unify switch parsing for fields.
|
|
See #10171.
|
|
|