path: root/src/Text
Age | Commit message | Author
2025-01-10 | Docx reader and writer: support row heads. | John MacFarlane
Reader: When `w:tblLook` has `w:firstColumn` set (or an equivalent bit mask), we set row heads = 1 in the AST. Writer: set `w:firstColumn` in `w:tblLook` when there are row heads. (Word only allows one, so this is triggered by any number of row heads > 0.) Closes #9495.
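As a rough illustration of the reader side of this commit, the check might look like the Python sketch below. Pandoc itself is written in Haskell, so this is not its code: `row_head_columns`, the dictionary representation of the `w:tblLook` attributes, and the `0x0080` bit assumed for the first-column flag in the legacy `w:val` mask are all assumptions made for the example.

```python
# Hypothetical helper, not pandoc's implementation.
FIRST_COLUMN_BIT = 0x0080  # assumed bit for "first column" in the w:val bit mask


def row_head_columns(tbl_look: dict) -> int:
    """Return 1 if the w:tblLook attributes flag the first column, else 0."""
    explicit = tbl_look.get("w:firstColumn")
    if explicit is not None:
        # An explicit attribute takes precedence over the bit mask.
        return 1 if explicit in ("1", "true", "on") else 0
    mask = tbl_look.get("w:val")
    if mask is not None:
        # The mask is written as a hexadecimal string such as "04A0".
        return 1 if int(mask, 16) & FIRST_COLUMN_BIT else 0
    return 0
```

Because Word only distinguishes "first column special" from "not special", the sketch can only ever return 0 or 1, which matches the commit's note that any number of row heads greater than 0 triggers the flag on the writer side.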
2025-01-10 | Docx reader: read table styles as custom styles... | John MacFarlane
...when `styles` extension is enabled. Closes #9603. Also improve manual's coverage of custom styles.
2025-01-07 | HTML reader: add size information for fa svg icons. | John MacFarlane
If the icon has class fa-fw or fa-w16 or fa-w14, we add a width attribute to prevent the icon from appearing full-width in PDF or docx output. Closes #10134.
2025-01-06 | Djot reader/writer highlighted text fixes: | John MacFarlane
- The reader now uses a Span with class "mark" rather than "highlighted", for consistency with the other pandoc readers and writers.
- The writer renders a Span with sole class "mark" as highlighted text.
2025-01-06 | Asciidoc writer: don't emit class in span if it's just "mark". | John MacFarlane
"mark" class is used for highlighting, and Asciidoc treats bare `#...#` with no attributes as highlighted text. Closes #10511.
2025-01-06 | EPUB v2 writer: fix cover image. | John MacFarlane
Closes #10505. Regression from 3.6 caused by #10404.
2025-01-03 | Add mdoc St for C23 | Evan Silberman
Following mandoc: https://cvsweb.bsd.lv/mandoc/st.c?rev=1.19&content-type=text/x-cvsweb-markup
2025-01-01 | Typst writer: fix handling of pixel image dimensions. | John MacFarlane
These are now converted to inches as in the LaTeX writer. Closes #9945.
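The conversion described here amounts to dividing the pixel count by a dots-per-inch value. The Python sketch below shows the arithmetic only; it is not pandoc's Haskell code, the function names are hypothetical, and the default of 96 dpi is an assumption for the example.

```python
def pixels_to_inches(pixels: float, dpi: int = 96) -> float:
    """Convert a pixel length to inches at the given resolution."""
    return pixels / dpi


def typst_dimension(value: str, dpi: int = 96) -> str:
    """Rewrite a 'NNNpx' dimension as inches; pass other units through unchanged."""
    if value.endswith("px"):
        inches = pixels_to_inches(float(value[:-2]), dpi)
        return f"{inches:g}in"
    return value
```

For example, under the assumed 96 dpi, a width of `192px` would be emitted as `2in`, while a value such as `3cm` is left alone.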
2024-12-28 | AsciiDoc writer: improve escaping. | John MacFarlane
Closes #10385. Closes #2337. Closes #6424.
2024-12-27 | RST reader: fix handling of underscores. | John MacFarlane
Fixes a regression in 3.6 that caused problems parsing text with underscores. Closes #10497.
2024-12-27 | Make pod the default reader for .pod/.pl/.pm | Evan Silberman
2024-12-27 | Add Pod reader | Evan Silberman
Pod ("Plain old documentation") is a markup language used principally to document Perl modules and programs. Since it was originally meant to be translated pretty directly to man, the semantics are fairly simple.
This Pod reader was developed with reference to the canonical user and implementer documentation of Pod: https://perldoc.perl.org/perlpod and https://perldoc.perl.org/perlpodspec.
There are 1490 .pod, .pl, and .pm files in the Perl 5.34 distribution found in /System/Library/Perl on my Mac. Of those, this reader dies with a parse error on 7. All of them seem to be cases where pod commands are found within a non-colon-prefixed =begin/=end; perlpodspec says I may treat this as an error.
[API change] adds readPod
2024-12-23 | Improve message for asciidoc input error (#10492) | Santiago Zarate
Closes #8416.
2024-12-23 | MediaWiki reader: allow empty quoted attributes. | John MacFarlane
Closes #10490.
2024-12-23 | MediaWiki reader: allow cells starting with `+`. | John MacFarlane
Closes #10491.
2024-12-23 | Fix spacing error. | John MacFarlane
2024-12-22 | Remove old comment-out line. | John MacFarlane
2024-12-22 | Docx writer: better handling of chapters. | John MacFarlane
When `--top-level-division=chapter` is used, a paragraph with section properties is inserted before each level-1 heading. By default, this causes the new heading to start on a new page (though this default can be adjusted in Word). This change should also make it possible to number footnotes by chapter (#2773), though that change isn't yet made.
2024-12-22 | Markdown writer: avoid collapsing of initial/final newline in... | John MacFarlane
...markdown raw blocks. For motivation see #10477.
2024-12-22 | RST reader: handle explicit reference links (#10485) | silby
This case was missed when changing the reference link strategy for RST to allow a single pass. Closes #10484.
2024-12-20 | Correct example in charsInBalanced | Evan Silberman
The given example wasn't actually functional because `anyChar` parses a `Char` and `charsInBalanced` wants a `Text` parser as its inner parser.
2024-12-20 | Mediawiki writer: escape line-initial characters... | John MacFarlane
...that would otherwise be interpreted as list starts. Closes #9700.
2024-12-20 | LaTeX writer: properly handle boolean value for `csquotes` variable. | John MacFarlane
Closes #10403.
2024-12-19 | Mention typst in PandocUnknownWriterError for pdf | Evan Silberman
2024-12-19 | Allow `--shift-heading-level-by=-1` to work in djot... | John MacFarlane
...in the same way it works for other formats (with the top-level heading being promoted to metadata title). This needed special treatment because of the way djot surrounds sections with Divs. Closes #10459.
2024-12-19 | T.P.mediaBag insertMedia: fast path for data URIs. | John MacFarlane
Avoid the slow URI parser from network-uri on large data URIs. See #10075. In a benchmark with a large base64 image in HTML -> docx, this patch causes us to go from 7942 GCs to 3654, and from 3781M in use to 1396M in use. (Note that before the last few commits, this was running 9099 GCs and 4350M in use.)
2024-12-19 | T.P.Class: shortcut for base64 data URIs in `downloadOrRead`. | John MacFarlane
This avoids calling the slow URI parser from network-uri on data URIs, instead calling our own parser. Benchmarks on an html -> docx conversion with large base64 image: GCs from 7942 to 6695, memory in use from 3781M to 2351M, GC time from 7.5 to 5.6. See #10075.
2024-12-19 | T.P.URI: pBase64DataURI now returns mime + bytes | John MacFarlane
2024-12-19 | T.P.MIME: fix `extensionFromMimeType`. | John MacFarlane
We had a few special cases encoded, but as previously written they wouldn't work properly with modifiers like `;charset=utf-8`.
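The kind of problem described here can be sketched in Python: if a lookup table is keyed on the bare MIME type, any parameters after `;` have to be stripped before the lookup. This is an illustration only, not pandoc's Haskell code; `extension_from_mime` and the two table entries are hypothetical, since the commit does not say which special cases are involved.

```python
# Illustrative table; the real special cases are not listed in the commit.
SPECIAL_CASES = {
    "text/plain": "txt",
    "image/jpeg": "jpg",
}


def extension_from_mime(mime: str):
    """Look up an extension, ignoring parameters such as ';charset=utf-8'."""
    base = mime.split(";", 1)[0].strip().lower()
    return SPECIAL_CASES.get(base)
```

Without the `split`, a key such as `text/plain;charset=utf-8` would simply miss the table, which is the failure mode the commit message describes.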
2024-12-19 | Change `--template` to allow use of extensionless templates. | John MacFarlane
The intent is to allow bash process substitution: e.g., `--template <(echo "foo")`. Previously pandoc *always* added an extension based on the output format, which caused problems with the absolute filenames used by bash process substitution (e.g. `/dev/fd/11`). Now, if the template has no extension, pandoc will first try to find it without the extension, and then add the extension if it can't be found. So, in general, extensionless templates can now be used. But this has been implemented in a way that should not cause problems for existing uses, unless you are using a template `NAME.FORMAT` but happen to have an extensionless file `NAME` in the template search path. Closes #5270.
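The lookup order described above can be sketched in Python as follows. This is a simplification and not pandoc's Haskell code: `resolve_template` is a hypothetical name, the `exists` parameter is injected only to make the sketch testable, and the template search path is ignored.

```python
import os


def resolve_template(name: str, fmt: str, exists=os.path.exists) -> str:
    """Pick the template file name following the order described in the commit."""
    _, ext = os.path.splitext(name)
    if ext:
        # A template with an extension is used as given.
        return name
    if exists(name):
        # New behavior: try the extensionless name first (e.g. /dev/fd/11).
        return name
    # Fall back to the old behavior: append the output format as extension.
    return f"{name}.{fmt}"
```

The caveat in the commit message falls out of the second branch: an extensionless file `NAME` that happens to exist will now shadow `NAME.FORMAT`.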
2024-12-18 | HTML writer: avoid calling parseURIString for data URIs. | John MacFarlane
This was done to determine the "media category," but we can get that directly from the mime component of data: URIs. Profiling revealed that a significant amount of time was spent in this function when a file contained images with large data URIs. Contributes to addressing #10075.
2024-12-18 | Further improvements to base64 data URI parsing. | John MacFarlane
Text.Pandoc.URI: export `pBase64DataURI`. Modify `isURI` to use this and avoid calling network-uri's inefficient `parseURI` for data URIs.
Markdown reader: use T.P.URI's `pBase64DataURI` in parsing data URIs.
Partially addresses #10075. Obsoletes #10434 (borrowing most of its ideas).
Co-authored-by: Evan Silberman
2024-12-18 | Markdown reader: Adjust source position in data: URI parser. | John MacFarlane
This fixes an omission in the last commit.
2024-12-18 | Markdown reader: more efficient base64 data URI parsing. | John MacFarlane
This patch borrows some code from @silby's PR #10434 and should be regarded as co-authored. This is a lighter-weight patch that only touches the Markdown reader. The basic idea is to speed up parsing of base64 URIs by parsing them with a special path. This should improve the problem noted at #10075.
Benchmarks (optimized compilation), converting the large test.md from #10075 (7.6Mb embedded image) from markdown to json:
before: 6182 GCs, 1578M in use, 5.471 MUT, 1.473 GC
after: 951 GCs, 80M in use, 0.247 MUT, 0.035 GC
For now we leave #10075 open to investigate improvements in HTML rendering with these large data URIs.
Co-authored-by: Evan Silberman
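The "special path" idea can be illustrated with a short Python sketch: a base64 data URI is recognized by its fixed prefix and split on the first comma, so no general-purpose URI parser ever has to walk the payload. This is not pandoc's Haskell parser; `parse_base64_data_uri` is a hypothetical function, and falling back to `text/plain` when the MIME type is omitted is an assumption for the example.

```python
import base64


def parse_base64_data_uri(uri: str):
    """Return (mime, decoded bytes) for a base64 data URI, or None otherwise."""
    if uri[:5].lower() != "data:":
        return None
    # Everything before the first comma is the header; the rest is the payload.
    header, sep, payload = uri[5:].partition(",")
    if not sep:
        return None
    parts = header.split(";")
    if parts[-1].lower() != "base64":
        return None  # not base64-encoded; leave it to the general parser
    mime = parts[0] or "text/plain"  # assumed default when no type is given
    try:
        data = base64.b64decode(payload, validate=True)
    except ValueError:
        return None
    return mime, data
```

Anything that does not match the fast path returns `None`, so a caller can fall back to the slower general parser for ordinary URIs.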
2024-12-18 | HTML reader: don't canonicalize data: URIs. | John MacFarlane
It can be very expensive to call network-uri's URI parser on these. See #10075.
2024-12-18 | LaTeX reader: handle `figure*` environment as a figure. | John MacFarlane
Closes #10472.
2024-12-17 | Textile reader: improve parsing of spans. | John MacFarlane
The span needs to be separated from its surroundings by spaces. Also, a span can have attributes, which we now attach. Closes #9878.
2024-12-17 | Textile reader: inline constructors don't trigger if closer... | John MacFarlane
...is preceded by whitespace. Closes #10414.
2024-12-17 | LaTeX writer: use displayquote for block quotes with csquotes. | John MacFarlane
Closes #10456.
2024-12-17 | Typst writer: properly handle data: URIs in images. | John MacFarlane
We need to produce an svg tag and parse it using `image.decode`. This is slightly roundabout but doesn't require any external libraries. Closes #10460.
2024-12-17 | Docx writer: use styleIds not styleNames for Title, Subtitle, etc. | John MacFarlane
This change affects the default openxml template as well as the OpenXML writer. Closes #10282 (regression introduced in pandoc 3.5).
2024-12-17 | Text.Pandoc.PDF: fix temp file extension in `toPdfViaTempFile`. | John MacFarlane
We used to set this to `.html`, but this seemed inappropriate once we started using this function for `--pdf-engine=typst`. So we changed it in pandoc 3.6 to `.source`. But apparently `wkhtmltopdf` needs it to be `.html`. So now we have added a parameter to `toPdfViaTempFile` that allows the extension to be specified. Closes #10468.
2024-12-14 | Use lastMay instead of reverse | Joseph C. Sible
2024-12-14 | Store a function instead of a Boolean | Joseph C. Sible
Instead of storing isDisplay and then always choosing displayMath or math based on that, just store displayMath or math directly.
2024-12-14 | Use <$> instead of >>= and return | Joseph C. Sible
2024-12-14 | Put the length in the range expression instead of calling take later | Joseph C. Sible
2024-12-14 | Remove redundant null check | Joseph C. Sible
"all f []" is always true, so "null xs || all f xs" can be simplified to just "all f xs".
2024-12-14 | Use the definition of unsnoc from base | Joseph C. Sible
This is more efficient than the existing one.
2024-12-14 | Use catMaybes instead of building with maybe and (:) one element at a time | Joseph C. Sible
2024-12-14 | Remove several unnecessary layers of indirection from refs | Joseph C. Sible