Diffstat (limited to 'wasm/examples/markdown-to-rst/stdin')
| -rw-r--r-- | wasm/examples/markdown-to-rst/stdin | 250 |
1 files changed, 250 insertions, 0 deletions
diff --git a/wasm/examples/markdown-to-rst/stdin b/wasm/examples/markdown-to-rst/stdin
new file mode 100644
index 000000000..7c7cd7c8a
--- /dev/null
+++ b/wasm/examples/markdown-to-rst/stdin
@@ -0,0 +1,250 @@
+---
+author:
+- Albert Krewinkel
+- John MacFarlane
+date: 'January 10, 2020'
+title: Pandoc Lua Filters
+---
+
+# Introduction
+
+Pandoc has long supported filters, which allow the pandoc
+abstract syntax tree (AST) to be manipulated between the parsing
+and the writing phase. [Traditional pandoc
+filters](https://pandoc.org/filters.html) accept a JSON
+representation of the pandoc AST and produce an altered JSON
+representation of the AST. They may be written in any
+programming language, and invoked from pandoc using the
+`--filter` option.
+
+Although traditional filters are very flexible, they have a
+couple of disadvantages. First, there is some overhead in
+writing JSON to stdout and reading it from stdin (twice, once on
+each side of the filter). Second, whether a filter will work
+will depend on details of the user's environment. A filter may
+require an interpreter for a certain programming language to be
+available, as well as a library for manipulating the pandoc AST
+in JSON form. One cannot simply provide a filter that can be
+used by anyone who has a certain version of the pandoc
+executable.
+
+Starting with version 2.0, pandoc makes it possible to write
+filters in Lua without any external dependencies at all. A Lua
+interpreter (version 5.3) and a Lua library for creating pandoc
+filters are built into the pandoc executable. Pandoc data types
+are marshaled to Lua directly, avoiding the overhead of writing
+JSON to stdout and reading it from stdin.
+
+Here is an example of a Lua filter that converts strong emphasis
+to small caps:
+
+``` lua
+return {
+  {
+    Strong = function (elem)
+      return pandoc.SmallCaps(elem.c)
+    end,
+  }
+}
+```
+
+or equivalently,
+
+``` lua
+function Strong(elem)
+  return pandoc.SmallCaps(elem.c)
+end
+```
+
+This says: walk the AST, and when you find a Strong element,
+replace it with a SmallCaps element with the same content.
+
+To run it, save it in a file, say `smallcaps.lua`, and invoke
+pandoc with `--lua-filter=smallcaps.lua`.
+
+Here's a quick performance comparison, converting the pandoc
+manual (MANUAL.txt) to HTML, with versions of the same JSON
+filter written in compiled Haskell (`smallcaps`) and interpreted
+Python (`smallcaps.py`):
+
+  Command                                 Time
+  --------------------------------------- -------
+  `pandoc`                                1.01s
+  `pandoc --filter ./smallcaps`           1.36s
+  `pandoc --filter ./smallcaps.py`        1.40s
+  `pandoc --lua-filter ./smallcaps.lua`   1.03s
+
+As you can see, the Lua filter avoids the substantial overhead
+associated with marshaling to and from JSON over a pipe.
+
+# Lua filter structure
+
+Lua filters are tables with element names as keys and values
+consisting of functions acting on those elements.
+
+Filters are expected to be put into separate files and are
+passed via the `--lua-filter` command-line argument. For
+example, if a filter is defined in a file `current-date.lua`,
+then it would be applied like this:
+
+    pandoc --lua-filter=current-date.lua -f markdown MANUAL.txt
+
+The `--lua-filter` option may be supplied multiple times. Pandoc
+applies all filters (including JSON filters specified via
+`--filter` and Lua filters specified via `--lua-filter`) in the
+order they appear on the command line.
+
+Pandoc expects each Lua file to return a list of filters. The
+filters in that list are called sequentially, each on the result
+of the previous filter. If there is no value returned by the
+filter script, then pandoc will try to generate a single filter
+by collecting all top-level functions whose names correspond to
+those of pandoc elements (e.g., `Str`, `Para`, `Meta`, or
+`Pandoc`). (That is why the two examples above are equivalent.)
+
+For each filter, the document is traversed and each element
+subjected to the filter. Elements for which the filter contains
+an entry (i.e., a function of the same name) are passed to the Lua
+element filtering function. In other words, filter entries will
+be called for each corresponding element in the document,
+getting the respective element as input.
+
+The return value of a filter function must be one of the
+following:
+
+- nil: this means that the object should remain unchanged.
+- a pandoc object: this must be of the same type as the input
+  and will replace the original object.
+- a list of pandoc objects: these will replace the original
+  object; the list is merged with the neighbors of the
+  original object (spliced into the list the original object
+  belongs to); returning an empty list deletes the object.
+
+The function's output must result in an element of the same type
+as the input. This means a filter function acting on an inline
+element must return either nil, an inline, or a list of inlines,
+and a function filtering a block element must return one of nil,
+a block, or a list of block elements. Pandoc will throw an error
+if this condition is violated.
+
+If there is no function matching the element's node type, then
+the filtering system will look for a more general fallback
+function. Two fallback functions are supported, `Inline` and
+`Block`. Each matches elements of the respective type.
+
+Elements without matching functions are left untouched.
+
+See [module documentation](#module-pandoc) for a list of pandoc
+elements.
+
+## Filters on element sequences
+
+For some filtering tasks, it is necessary to know the order
+in which elements occur in the document. It is not enough then to
+inspect a single element at a time.
+
+There are two special function names that can be used to define
+filters on lists of blocks or lists of inlines.
+
+[`Inlines (inlines)`]{#inlines-filter}
+:   If present in a filter, this function will be called on all
+    lists of inline elements, like the content of a [Para]
+    (paragraph) block, or the description of an [Image]. The
+    `inlines` argument passed to the function will be a [List] of
+    [Inline] elements for each call.
+
+[`Blocks (blocks)`]{#blocks-filter}
+:   If present in a filter, this function will be called on all
+    lists of block elements, like the content of a [MetaBlocks]
+    meta element block, on each item of a list, and the main
+    content of the [Pandoc] document. The `blocks` argument
+    passed to the function will be a [List] of [Block] elements
+    for each call.
+
+These filter functions are special in that the result must either
+be nil, in which case the list is left unchanged, or must be a
+list of the correct type, i.e., the same type as the input
+argument. Single elements are **not** allowed as return values,
+as a single element in this context usually hints at a bug.
+
+See ["Remove spaces before normal citations"][Inlines filter
+example] for an example.
+
+This functionality was added in pandoc 2.9.2.
+
+[Inlines filter example]: #remove-spaces-before-citations
+
+## Traversal order
+
+The traversal order of filters can be selected by setting the key
+`traverse` to either `'topdown'` or `'typewise'`; the default is
+`'typewise'`.
+
+Example:
+
+``` lua
+local filter = {
+  traverse = 'topdown',
+  -- ... filter functions ...
+}
+return {filter}
+```
+
+Support for this was added in pandoc 2.17; previous versions
+ignore the `traverse` setting.
+
+### Typewise traversal
+
+Element filter functions within a filter set are called in a
+fixed order, skipping any which are not present:
+
+ 1. functions for [*Inline* elements](#type-inline),
+ 2. the [`Inlines`](#inlines-filter) filter function,
+ 3. functions for [*Block* elements](#type-block),
+ 4. the [`Blocks`](#blocks-filter) filter function,
+ 5. the [`Meta`](#type-meta) filter function, and last
+ 6. the [`Pandoc`](#type-pandoc) filter function.
+
+It is still possible to force a different order by explicitly
+returning multiple filter sets. For example, if the filter for
+*Meta* is to be run before that for *Str*, one can write
+
+``` lua
+-- ... filter definitions ...
+
+return {
+  { Meta = Meta },  -- (1)
+  { Str = Str }     -- (2)
+}
+```
+
+Filter sets are applied in the order in which they are returned.
+All functions in set (1) are thus run before those in (2),
+causing the filter function for *Meta* to be run before the
+filtering of *Str* elements is started.
+
+### Topdown traversal
+
+It is sometimes more natural to traverse the document tree
+depth-first from the root towards the leaves, and all in a single
+run.
+
+For example, a block list `[Plain [Str "a"], Para [Str
+"b"]]`{.haskell} will try the following filter functions, in
+order: `Blocks`, `Plain`, `Inlines`, `Str`, `Para`, `Inlines`,
+`Str`.
+
+Topdown traversals can be cut short by returning `false` as a
+second value from the filter function. No child element of
+the returned element is processed in that case.
+
+For example, to exclude the contents of a footnote from being
+processed, one might write
+
+``` lua
+traverse = 'topdown'
+function Note (n)
+  return n, false
+end
+```
+
