aboutsummaryrefslogtreecommitdiff
path: root/test/Tests/Readers/Org
AgeCommit message (Collapse)Author
2025-12-10Org: don't include 'example' class when parsing org example blocks.John MacFarlane
These are just unmarked code blocks. Closes #11339.
2025-08-27Org reader: improve sub- and superscript parsing.Albert Krewinkel
Sub- and superscript must be preceded by a string in Org mode. Some text preceded by space or at the start of a paragraph was previously parsed incorrectly as sub- or superscript.
2025-08-06Add `smart_quotes` and `special_strings` extensions for OrgAlbert Krewinkel
Org mode makes a distinction between smart parsing of quotes, and smart parsing of special strings like `...`. The finer grained control over these features is necessary to truthfully reproduce Emacs Org mode behavior. Special strings are enabled by default, while smart quotes are disabled. The behavior of `special_string` is brought closer to the reference implementation in that `\-` is now treated as a soft hyphen.
2025-07-24Org reader: Recognize "fast access" characters in TODO state definitions ↵Ryan Gibb
(#10990)
2025-05-11Remove some redundant code in test.John MacFarlane
2025-05-11Org reader: change handling of inline TeX.John MacFarlane
Previously inline TeX was handled in a way that was different from org's own export, and that could lead to information loss. This was particularly noticeable for inline math environments such as `equation`. Previously, an `equation` environment starting at the beginning of a line would create a raw block, splitting up the paragraph containing it (see #10836). On the other hand, an `equation` environment not at the beginning of a line would be turned into regular inline elements representing the math. (This would cause the equation number to go missing and in some cases degrade the math formatting.) Now, we parse all of these as raw "latex" inlines, which will be omitted when converting to formats other than LaTeX (and other formats like pandoc's Markdown that allow raw LaTex). Closes #10836.
2024-07-08Harmonize maintainer email addresses in module headers.Albert Krewinkel
2024-04-25Update copyright dates to 2024.John MacFarlane
2023-08-26Org reader: allow escaping commas in macro argumentsAmneesh Singh
Signed-off-by: Amneesh Singh <[email protected]>
2023-04-05Org reader: treat `#+NAME` as synonym for `#+LABEL`.Albert Krewinkel
Closes: #8578
2023-03-22Org reader: Allow zero width space as an escape characterChristian Christiansen
Allow the character U+200B to be used as an escape character as described in the Org-mode documentation https://orgmode.org/manual/Escape-Character.html Closes issue #8716.
2023-01-13Support complex figures. [API change]Albert Krewinkel
Thanks and credit go to Aner Lucero, who laid the groundwork for this feature in the 2021 GSoC project. He contributed many changes, including modifications to the readers for HTML, JATS, and LaTeX, and to the HTML and JATS writers. Shared (Albert Krewinkel): - The new function `figureDiv`, exported from `Text.Pandoc.Shared`, offers a standardized way to convert a figure into a Div element. Readers (Aner Lucero): - HTML reader: `<figure>` elements are parsed as figures, with the caption taken from the respective `<figcaption>` elements. - JATS reader: The `<fig>` and `<caption>` elements are parsed into figure elements, even if the contents is more complex. - LaTeX reader: support for figures with non-image contents and for subfigures. - Markdown reader: paragraphs containing just an image are treated as figures if the `implicit_figures` extension is enabled. The identifier is used as the figure's identifier and the image description is also used as figure caption; all other attributes are treated as belonging to the image. Writers (Aner Lucero, Albert Krewinkel): - DokuWiki, Haddock, Jira, Man, MediaWiki, Ms, Muse, PPTX, RTF, TEI, ZimWiki writers: Figures are rendered like Div elements. - Asciidoc writer: The figure contents is unwrapped; each image in the the figure becomes a separate figure. - Classic custom writers: Figures are passed to the global function `Figure(caption, contents, attr)`, where `caption` and `contents` are strings and `attr` is a table of key-value pairs. - ConTeXt writer: Figures are wrapped in a "placefigure" environment with `\startplacefigure`/`\endplacefigure`, adding the features caption and listing title as properties. Subfigures are place in a single row with the `\startfloatcombination` environment. - DocBook writer: Uses `mediaobject` elements, unless the figure contains subfigures or tables, in which case the figure content is unwrapped. - Docx writer: figures with multiple content blocks are rendered as tables with style `FigureTable`; like before, single-image figures are still output as paragraphs with style `Figure` or `Captioned Figure`, depending on whether a caption is attached. - DokuWiki writer: Caption and "alt-text" are no longer combined. The alt text of a figure will now be lost in the conversion. - FB2 writer: The figure caption is added as alt text to the images in the figure; pre-existing alt texts are kept. - ICML writer: Only single-image figures are supported. The contents of figures with additional elements gets unwrapped. - HTML writer: the alt text is no longer constructed from the caption, as was the case with implicit figures. This reduces duplication, but comes at the risk of images that are missing alt texts. Authors should take care to provide alt texts for all images. Some readers, most notably the Markdown reader with the `implicit_figures` extension, add a caption that's identical to the image description. The writer checks for this and adds an `aria-hidden` attribute to the `<figcaption>` element in that case. - JATS writer: The `<fig>` and `<caption>` elements are used write figures. - LaTeX writer: complex figures, e.g. with non-image contents and subfigures, are supported. The `subfigure` template variable is set if the document contains subfigures, triggering the conditional loading of the *subcaption* package. Contants of figures that contain tables are become unwrapped, as longtable environments are not allowed within figures. - Markdown writer: figures are output as implicit figures if possible, via HTML if the `raw_html` extension is enabled, and as Div elements otherwise. - OpenDocument writer: A separate paragraph is generated for each block element in a figure, each with style `FigureWithCaption`. Behavior for single-image figures therefore remains unchanged. - Org writer: Only the first element in a figure is given a caption; additional block elements in the figure are appended without any caption being added. - RST writer: Single-image figures are supported as before; the contents of more complex images become nested in a container of type `float`. - Texinfo writer: Figures are rendered as float with type `figure`. - Textile writer: Figures are rendered with the help of HTML elements. - XWiki: Figures are placed in a group. Co-authored-by: Aner Lucero <[email protected]>
2023-01-10Update copyright years, it's 2023!Albert Krewinkel
2022-10-10Org reader: make #+pandoc-emphasis-pre work as expected. (#8360)Amir Dekel
So far, `orgStateLastPreCharPos` wasn't updated appropriately after each parsing to native Str (by the parser `str`). In addition to solving this, the guard `notAfterString` in `emphasisStart` is removed to allow emphasis after Str at the first place.
2022-08-21Fix typosluz paz
Found via `codespell -q 3 -S changelog.md -L bu,fo,ist,mke,multline,noes,ot,pard,pres,tabl,te,tothe`
2022-05-02Org reader: allow attrs for Org tables. (#8049)Brian Leung
Tables with attributes are no longer wrapped in Div elements; attributes are added directly to the table element.
2022-02-23Tests: improve location reporting of failing testsAlbert Krewinkel
2022-02-21Org reader: More flexible LaTeX environmentsLucas V. R
Looking at the definition of `org-element-latex-environment-parser`, one sees that Org allows arbitrary arguments to LaTeX environments. In fact, it parses every char just after `\begin{xxx}` until `\end{xxx}` as content for the environment, so all the following examples are valid environments: ```org \begin{equation} e = mc^2 \end{equations} ``` ```org \begin{tikzcd}[ampersand replacement=\&] A \& B \\ C \& D \arrow[from=1-1, to=1-2] \arrow["f", from=2-1, to=2-2] \end{tikzcd} ```
2022-01-09Org reader: support alphabetical (fancy) listsLucas Viana
This adds support for alphabetical lists in org by enabling the extension Ext_fancy_lists, mimicking the behaviour of Org Mode when org-list-allow-alphabetical is enabled. Enabling Ext_fancy_lists will also make Pandoc differentiate between the delimiters of ordered lists (periods or closing parentheses). Org does this differentiation by default when exporting to some formats (e.g. plain text) but does not in others (e.g. html and latex), so I decided to copy Pandoc's markdown reader behaviour.
2022-01-06Org reader: support counter cookies in listsLucas Viana
This adds support for counter cookies in org lists. Such cookies are used to override the item counter in ordered lists. In org it is possible to set the counter at any list item, but since Pandoc AST does not support this, we restrict the usage to setting an offset for the entire ordered list, by using the cookie in the first list item. Note that even though unordered lists do not have counters, Org Mode still parses such cookies in unordered lists and suppresses them in the output, so we do the same. Also, even though org-list-allow-alphabetical is disabled in Emacs by default, for some reason alphabetical cookies are always parsed and used in Org Mode regardlessly of whether this option is enabled or the list style is decimal, so we do the same. E.g. 2. test 3. test Is parsed as an ordered list starting at 1, as before. This also conforms to Org Mode behaviour. 1. [@2] test 2. test Is now parsed as an ordered list starting at 2, so that it conforms to Org Mode behaviour. Note that when parsing 1. [@2] test 2. [@9] test the second cookie is silenced and the entire list starts at 2. This is because the current Pandoc AST does not support expressing a change in the counter at a specific item.
2022-01-02Copyright notices: update for 2022Albert Krewinkel
2022-01-01Org reader: allow trailing spaces after key/value pairs in directivesAlbert Krewinkel
Ensures that spaces at the end of attribute directives like `#+ATTR_HTML: :width 100%` (note the trailing spaces) are accepted.
2021-12-14Org reader: parse official org-cite citations.John MacFarlane
We also support the older org-ref style as a fallback. We no longer support the "markdown-style" citations. See #7329.
2021-12-14Org reader: remove support for "Berkeley style" citations.John MacFarlane
See #7329.
2021-10-22Org reader: allow an initial :PROPERTIES: drawer to add to metadata.John MacFarlane
Closes #7520.
2021-02-18Org reader: fix bug in org-ref citation parsing.Albert Krewinkel
The org-ref syntax allows to list multiple citations separated by comma. This fixes a bug that accepted commas as part of the citation id, so all citation lists were parsed as one single citation. Fixes: #7101
2021-02-13Org: support task_lists extensionAlbert Krewinkel
The tasks lists extension is now supported by the org reader and writer; the extension is turned on by default. Closes: #6336
2021-02-07Avoid unnecessary use of NoImplicitPrelude pragma (#7089)Albert Krewinkel
2021-01-09Org reader: allow multiple pipe chars in todo sequencesAlbert Krewinkel
Additional pipe chars, used to separate "action" state from "no further action" states, are ignored. E.g., for the following sequence, both `DONE` and `FINISHED` are states with no further action required. #+TODO: UNFINISHED | DONE | FINISHED Previously, parsing of the todo sequence failed if multiple pipe chars were included. Closes: #7014
2021-01-08Update copyright notices for 2021 (#7012)Albert Krewinkel
2021-01-03Org reader: mark verbatim code with class "verbatim". (#6998)Dimitri Sabadie
* Replace org-mode’s verbatim from code to codeWith. This adds the `"verbatim"` class so that exporters can apply a specific style on it. For instance, it will be possible for HTML to add a CSS rule for code + verbatim class. * Alter test for org-mode’s verbatim change. See previous commit for further detail on the new implementation.
2021-01-01Org reader: restructure output of captioned code blocksAlbert Krewinkel
The Div wrapper of code blocks with captions now has the class "captioned-content". The caption itself is added as a Plain block inside a Div of class "caption". This makes it easier to write filters which match on captioned code blocks. Existing filters will need to be updated. Closes: #6977
2020-12-05Org reader: preserve targets of spurious linksAlbert Krewinkel
Links with (internal) targets that the reader doesn't know about are converted into emphasized text. Information on the link target is now preserved by wrapping the text in a Span of class `spurious-link`, with an attribute `target` set to the link's original target. This allows to recover and fix broken or unknown links with filters. See: #6916
2020-11-22Org reader: parse `#+LANGUAGE` into `lang` metadata fieldAlbert Krewinkel
Fixes: #6845
2020-11-18Replace org #+KEYWORDS with #+keywordsTEC
As of ~2 years ago, lower case keywords became the standard (though they are handled case insensitive, as always): https://code.orgmode.org/bzg/org-mode/commit/13424336a6f30c50952d291e7a82906c1210daf0 Upper case keywords are exclusive to the manual: - https://orgmode.org/list/[email protected]/ - https://orgmode.org/list/[email protected]/
2020-10-14Fix remaining typos in testsAlbert Krewinkel
See: #6738
2020-09-13Fix hlint suggestions, update hlint.yaml (#6680)Christian Despres
* Fix hlint suggestions, update hlint.yaml Most suggestions were redundant brackets. Some required LambdaCase. The .hlint.yaml file had a small typo, and didn't ignore camelCase suggestions in certain modules.
2020-07-01Org reader: respect tables-excluding export settingAlbert Krewinkel
Tables can be removed from the final document with the `#+OPTION: |:nil` export setting.
2020-06-30Org reader: respect export setting disabling footnotesAlbert Krewinkel
Footnotes can be removed from the final document with the `#+OPTION: f:nil` export setting.
2020-06-30Org reader: respect export setting which disables entitiesAlbert Krewinkel
MathML-like entities, e.g., `\alpha`, can be disabled with the `#+OPTION: e:nil` export setting.
2020-06-29Org reader: keep unknown keyword lines as raw orgAlbert Krewinkel
The lines of unknown keywords, like `#+SOMEWORD: value` are no longer read as metadata, but kept as raw `org` blocks. This ensures that more information is retained when round-tripping org-mode files; additionally, this change makes it possible to support non-standard org extensions via filters.
2020-06-29Org reader: unify keyword handlingAlbert Krewinkel
Handling of export settings and other keywords (like `#+LINK`) has been combined and unified.
2020-06-29Org reader: support LATEX_HEADER_EXTRA and HTML_HEAD_EXTRA settingsAlbert Krewinkel
These export settings are treated like their non-extra counterparts, i.e., the values are added to the `header-includes` metadata list.
2020-06-29Org reader: allow multiple #+SUBTITLE export settingsAlbert Krewinkel
The values of all lines are read as inlines and collected in the `subtitle` metadata field.
2020-06-28Org reader: read `#+INSTITUTE` values as text with markupAlbert Krewinkel
The value is stored in the `institute` metadata field and used in the default beamer presentation template.
2020-06-28Org tests: group export settings test for Org readerAlbert Krewinkel
2020-06-28Org reader: update behavior of author, keywords export settingsAlbert Krewinkel
The behavior of the `#+AUTHOR` and `#+KEYWORD` export settings has changed: Org now allows multiple such lines and adds a space between the contents of each line. Pandoc now always parses these settings as meta inlines; setting values are no longer treated as comma-separated lists. Note that a Lua filter can be used to restore the previous behavior.
2020-06-27Org reader: read description lines as inlinesAlbert Krewinkel
`#+DESCRIPTION` lines are treated as text with markup. If multiple such lines are given, then all lines are read and separated by soft linebreaks. Closes: #6485
2020-06-25Org reader: honor tex export optionAlbert Krewinkel
The `tex` export option can be set with `#+OPTION: tex:nil` and allows three settings: - `t` causes LaTeX fragments to be parsed as TeX or added as raw TeX, - `nil` removes all LaTeX fragments from the document, and - `verbatim` treats LaTeX as text. The default is `t`. Closes: #4070
2020-06-20Recognize images with uppercase extensionsAlbert Krewinkel
Fixes: #6472