PDML Extensions User Manual

First Published


Latest Update



CC BY-ND 4.0


Christian Neumanns



PML Markup Code



This document describes optional PDML extensions.

An extension is a PDML feature that is not part of Core PDML.


Extensions are a work in progress. There might be breaking changes in future versions.

Additional experimental extensions are already implemented in the reference implementation, but are not yet documented here.

If you encounter a bug or miss an important feature, then please submit an issue.

If you have a question or want to discuss an idea, enhancement or anything else, then please create a discussion.

Syntax Extensions



Comments

A comment consists of a text segment that is not part of the data/markup code stored in a PDML document.

Comments are typically used to add information that is useful for human readers, but ignored by machines that read the PDML document. For example, a comment can provide a general description of the data stored in a document.

Comments are also useful to temporarily disable a segment of a document without deleting it.

Here is an example of a PDML document with various comments:

    [- This configuration file is used to ... -]        <
    [power 220V]

    [- values are in degrees Celsius -]                 <
    [min_temp -20]
    [max_temp 50]

    [-                                                  <
        temporarily disabled to use default values      <
        [security_level 5]                              <
        [log_level debug]                               <
    -]                                                  <

Syntax Rules

Start/End Symbols

A comment starts with [- and ends with -]:

[- text of comment -]
^^                 ^^

Start/End Positions

A comment can start at any position in a line. It can end at any subsequent position in the same line or at any position in a subsequent line.


text [- comment -] text

[- single line comment -]


text [- this text is
commented out -] text

Comments can be inserted anywhere in a PDML document, even before or after the root node:

[- comment -]

    [- comment -]
    text [- comment -] text
[- comment -]

Nested Comments

Comments can be nested. That is, a comment can contain another comment:

[- comment [- nested comment -] comment -]

Comments can be nested to any level:

[- level 1 (not nested)
    [- level 2
        [- level 3 -]
    -]
-]
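
The nesting rule can be illustrated with a small Javascript sketch that removes comments by tracking the current nesting level. This is not the reference implementation: the function name stripComments is hypothetical, and escape sequences such as \[ are deliberately not taken into account.

```javascript
// Sketch only: removes PDML comments from a string, including nested ones.
// Assumption: escape sequences (e.g. \[) are not taken into account.
function stripComments(source) {
    let result = "";
    let depth = 0; // current nesting level; 0 means "outside of any comment"
    let i = 0;
    while (i < source.length) {
        if (source.startsWith("[-", i)) {
            depth++;
            i += 2;
        } else if (depth > 0 && source.startsWith("-]", i)) {
            depth--;
            i += 2;
        } else {
            // Keep the character only if we are outside of all comments
            if (depth === 0) result += source[i];
            i++;
        }
    }
    if (depth > 0) throw new Error("Unclosed comment");
    return result;
}

console.log(stripComments("text [- comment [- nested -] comment -] text"));
```

Because a counter is used instead of a simple search for the first -], an inner comment does not end the outer one.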

Character Escape Sequences

The Core PDML Specification specifies:

  • The following characters must always be escaped: [ ] \.

  • A backslash (\) is used as escape character.

For example, to insert the text "[, ], and \ must be escaped", we need to write:

\[, \], and \\ must be escaped
^^  ^^      ^^
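
An application that generates PDML text could apply this rule with a helper along the following lines. This is a minimal sketch; the function name escapeText is hypothetical and not part of any PDML API.

```javascript
// Sketch only: escapes the three characters that Core PDML requires to be escaped.
function escapeText(text) {
    // "$&" stands for the matched character, so each match is prefixed with a backslash
    return text.replace(/[\[\]\\]/g, "\\$&");
}

console.log(escapeText("[, ], and \\ must be escaped"));
```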

Extended PDML supports two more categories of character escape sequences:

  • Invisible whitespace characters:

    Escape Sequence   ISO Name                         Unicode
    \t                Character Tabulation (HT, TAB)   U+0009
    \n                End of Line (EOL, LF, NL)        U+000A
    \r                Carriage Return (CR)             U+000D

  • Unicode escape sequences:

    Escape Sequence   Description                                      Example
    \uhhhh            16-bit Unicode character, hexadecimal encoding   \u0041 (=A)
    \Uhhhhhhhh        32-bit Unicode character, hexadecimal encoding   \U0001F600 (=😀)

    A 16-bit Unicode escape sequence starts with \u, followed by four hexadecimal digits (denoted by hhhh in the above table).

    A hexadecimal digit can be any of the following characters: 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F. The regular expression for a hexadecimal digit is [0-9a-fA-F].

    A 32-bit Unicode escape sequence starts with \U, followed by eight hexadecimal digits.
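
The escape sequences listed above can be decoded with a single regular expression. The following Javascript sketch shows one possible approach; the names SIMPLE_ESCAPES and unescapeText are hypothetical, and the error handling is an assumption rather than specified behavior.

```javascript
// Sketch only: decodes the escape sequences of Core PDML and Extended PDML.
const SIMPLE_ESCAPES = new Map([
    ["[", "["], ["]", "]"], ["\\", "\\"],   // Core PDML
    ["t", "\t"], ["n", "\n"], ["r", "\r"]   // Extended PDML
]);

function unescapeText(text) {
    // A backslash followed by U + 8 hex digits, u + 4 hex digits, or any single character
    const pattern = /\\(U[0-9a-fA-F]{8}|u[0-9a-fA-F]{4}|[\s\S]?)/g;
    return text.replace(pattern, (match, sequence) => {
        if (sequence.length > 1) {
            // Unicode escape: convert the hexadecimal digits to a code point
            return String.fromCodePoint(parseInt(sequence.slice(1), 16));
        }
        if (!SIMPLE_ESCAPES.has(sequence)) {
            // Assumption: unknown or incomplete escape sequences are reported as errors
            throw new Error("Invalid escape sequence: " + match);
        }
        return SIMPLE_ESCAPES.get(sequence);
    });
}

console.log(unescapeText("\\[1\\] \\u0041 \\U0001F600"));
```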

The following table summarizes the character escape sequences supported in Core PDML and Extended PDML:

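Escape Sequence   Character                  Core PDML   Extended PDML
\[                [                          yes         yes
\]                ]                          yes         yes
\\                \                          yes         yes
\t                Character Tabulation       no          yes
\n                End of Line                no          yes
\r                Carriage Return            no          yes
\uhhhh            16-bit Unicode character   no          yes
\Uhhhhhhhh        32-bit Unicode character   no          yes
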
Attributes

Attributes are used to assign a list of name/value pairs to a node. Each value is an arbitrary string.

Here is an example:

[image [@ width="200" height="100"] images/ball.png]
          ^^^^^^^^^^^ ^^^^^^^^^^^^

The above image node has two attributes:

  • A first attribute with name width and value 200

  • A second attribute with name height and value 100

The content of the node is the text images/ball.png.

PDML attributes are conceptually similar to attributes in XML/HTML. The following code shows the same data expressed in PDML and XML:

PDML: [image [@ width="200" height="100"] images/ball.png]

XML:  <image width="200" height="100">images/ball.png</image>

When to Use Attributes?

PDML does not specify when to use attributes.

Attributes are typically used only to attach simple meta-data or additional information to nodes. Attribute values are mostly simple scalar values like strings, numbers, booleans, enumerated values, etc. This is similar to how attributes are used in HTML.

However, because attributes can hold strings of any length and any content, nothing prevents you from storing structured data (of any complexity) in attributes, if you have a good reason to do so.

Here is an example of a node that uses the PDML syntax in attribute dimensions, and a mini-DSL syntax in attribute border_color:

[image [@ dimensions="[width 200][height 100][units px]" border_color="rgb(0,0,255)"]]

In all cases, the PDML parser just parses a string value for each attribute. You can then explicitly convert this string into a value of any type required in your application. For example, you could store a JSON/PDML/XML document in an attribute. But then you need to parse the string value (returned by the PDML parser) into a typed value (e.g. an XML tree), using a dedicated parser in your application.

In a nutshell:

  • Attributes are typically used to store simple meta-data.

  • Nodes should preferably be used to store structured data.

  • For each attribute, the PDML parser just parses a string value.

  • In rare situations, attributes can be used to store structured data which must then be parsed into a typed value by the client application.
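
The conversion from string values to typed values happens in the application, not in the PDML parser. Here is a minimal Javascript sketch of this step. The Map used to represent the parser's output and the function parseRgb are assumptions made for illustration; they are not part of any PDML API.

```javascript
// Hypothetical parser output: each attribute value is a plain string.
const attributes = new Map([
    ["width", "200"],
    ["border_color", "rgb(0,0,255)"]
]);

// Converts a string such as "rgb(0,0,255)" into an object with numeric fields.
function parseRgb(value) {
    const match = /^rgb\((\d+),(\d+),(\d+)\)$/.exec(value);
    if (match === null) {
        throw new Error("Invalid color: " + value);
    }
    return { red: Number(match[1]), green: Number(match[2]), blue: Number(match[3]) };
}

// The application decides which type each attribute should have
const width = Number.parseInt(attributes.get("width"), 10);
const borderColor = parseRgb(attributes.get("border_color"));
console.log(width, borderColor);
```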

Syntax Rules


Attributes must appear after the node name, and before the node content.

[node_name [@ a1="v1"] node content]

A list of attributes starts with [@, and ends with ]:

[node_name [@ a1="v1"] node content]
           ^^        ^

An attribute name has the same rules as a node name. It starts with a letter or an underscore (_), optionally followed by any number of letters, digits, underscores (_), hyphens (-), or dots (.).

Here are some examples of valid attribute names:

width
border_color
a1

Attribute names are case-sensitive. Hence the following names are different: color, COLOR, Color.

Each attribute in a single node must have a unique name. The same attribute name can appear in different nodes.
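
The naming and uniqueness rules can be checked as sketched below. The sketch assumes that "letters" and "digits" are limited to ASCII characters, which the text above does not state explicitly. All names used in the code are hypothetical.

```javascript
// Sketch only. Assumption: "letters" and "digits" are limited to ASCII here.
const ATTRIBUTE_NAME = /^[A-Za-z_][A-Za-z0-9_.-]*$/;

function isValidAttributeName(name) {
    return ATTRIBUTE_NAME.test(name);
}

// Returns the first name that appears more than once, or null if all names are unique.
// Names are compared case-sensitively.
function findDuplicateName(names) {
    const seen = new Set();
    for (const name of names) {
        if (seen.has(name)) return name;
        seen.add(name);
    }
    return null;
}

console.log(isValidAttributeName("border_color"), isValidAttributeName("2nd"));
console.log(findDuplicateName(["color", "COLOR", "Color"]));
```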

Assignment Symbol

Attribute names and values are separated by an equals sign (=):

color="light green"
     ^

An attribute value is enclosed in a pair of quotes:

color="light green"
      ^           ^

An attribute value can contain any Unicode characters, including spaces and new lines:

color="light green
blue 😀😀😀"


Extension nodes (covered later) are supported in attribute values.

Character Escape Rules in Values

Quote characters (") and backslashes (\) within a value must be escaped. For example, the value "1\2" (including the quotes) would be assigned as follows:

a1 = "\"1\\2\""
      ^^ ^^ ^^

Square brackets ([]) don't need to be escaped, but for consistency with text nodes, they can be escaped if desired. For example, to assign the value [1], the following two assignments are valid:

index = "[1]"
index = "\[1\]"

If the parser supports Character Escape Sequences in text nodes, then the same escape sequences can also be used in attribute values:

a1 = "line 1\nline\t2\r\nline \u0033 \U0001F600"
            ^^    ^^ ^^^^     ^^^^^^ ^^^^^^^^^^

The above attribute contains the value:

line 1
line    2
line 3 😀

Whitespace Handling

While whitespace in text nodes is preserved by a PDML parser, whitespace around attributes is ignored.

Whitespace characters (spaces, tabs, and/or new lines) before, between, or after attributes are ignored. Hence the following nodes are equivalent:

[image [@ source="resources/images/flower.png" width="200" height="100"]]

[image [@
    source = "resources/images/flower.png"
    width = "200"
    height = "100"
]]

[image [@
        source = "resources/images/flower.png"
         width = "200"
        height = "100"
]]

The first whitespace character after the list of attributes is ignored. Additional whitespace is part of a text node. Hence the following node (which contains two spaces after the list of attributes) contains the text " foo" (foo with one leading space).

[name [@ a1="v1"]  foo]
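
The rules for names, the assignment symbol, quoted values, and whitespace can be combined into a small parser for the standard attribute syntax. The following Javascript sketch is not the reference implementation. It assumes ASCII names, keeps any escaped character as is (so \" yields " and \\ yields \, but \n is not converted to a new line), and supports neither comments nor extension nodes. The function name parseAttributes is hypothetical.

```javascript
// Sketch only: parses a standard attribute list such as [@ a1="v1" a2 = "v2"]
// and returns a Map of attribute names to string values.
function parseAttributes(source) {
    if (!source.startsWith("[@")) throw new Error("Expected '[@'");
    const attributes = new Map();
    let i = 2;
    const skipWhitespace = () => {
        while (i < source.length && /\s/.test(source[i])) i++;
    };
    while (true) {
        skipWhitespace();
        if (i >= source.length) throw new Error("Missing closing ']'");
        if (source[i] === "]") return attributes;

        // Attribute name (assumption: ASCII letters and digits only)
        const nameMatch = /^[A-Za-z_][A-Za-z0-9_.-]*/.exec(source.slice(i));
        if (nameMatch === null) throw new Error("Invalid attribute name at position " + i);
        const name = nameMatch[0];
        if (attributes.has(name)) throw new Error("Duplicate attribute: " + name);
        i += name.length;

        // Assignment symbol, optionally surrounded by whitespace
        skipWhitespace();
        if (source[i] !== "=") throw new Error("Expected '=' after " + name);
        i++;
        skipWhitespace();

        // Quoted value
        if (source[i] !== '"') throw new Error("Expected opening quote for " + name);
        i++;
        let value = "";
        while (i < source.length && source[i] !== '"') {
            if (source[i] === "\\") i++; // simplification: keep the escaped character as is
            if (i >= source.length) break;
            value += source[i];
            i++;
        }
        if (i >= source.length) throw new Error("Unterminated value of " + name);
        i++; // skip the closing quote
        attributes.set(name, value);
    }
}

console.log(parseAttributes('[@ width="200" height = "100"]'));
```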

Comments in Attributes

If the parser supports comments, then any number of comments are allowed before or after an attribute assignment. Example:

[image [@
    [- width in pixels -]
    width = "200"

    color = "0, 255, 0" [- RGB values -]
]]

Comments are not allowed:

  • Between the node name and the start of attributes ([@).

  • Before or after the assignment symbol (=).

Syntax Simplifications

In this chapter we are going to look at some optional syntax simplifications for attributes. These simplifications aim to shorten the syntax and make it more convenient for humans to write PDML documents.

However, the following simplifications can also lead to ambiguities and errors in rare edge cases. Hence, they should only be applied in projects where these edge cases (described below) cannot occur. When in doubt it is best to stick to the standard syntax described in the previous chapters.

Moreover, besides making the parser more complex, the following syntax variations also make third-party tools (e.g. editor plugins) more challenging to develop.

Lenient Parsing

To shorten the syntax for attributes, a parser can support the following lenient parsing rules:

  • Quotes around attribute values can be omitted if the value does not contain:

    • Whitespace (spaces, tabs, carriage returns, and line feeds)

    • The following characters: [ ] ( ) " '

    • Escape sequences like \n, \u0041, etc.

    Hence, instead of writing:

    size="200" file="foo\\bar.txt"
         ^   ^      ^            ^

    ... we can write shorter code:

    size=200 file=foo\bar.txt

  • If a node is specified to have only attributes (child nodes are not allowed) then the start/end symbols ([@ ]) around attributes can be omitted.

    Consider an image node that has attributes, but no child nodes. Instead of writing:

    [image [@ source="images/juicy apple.png" width="400"]]
           ^^                                       ^   ^^

    ... we can simply write:

    [image source="images/juicy apple.png" width=400]

    Omitting the start/end symbols can lead to subtle bugs and invalid documents if the specification for a node changes.

    Consider a node specified to have only attributes:

    [foo a1 = "v1"]

    Now suppose that the specification of the node changes later. The node can now also contain text.

    Then the meaning of the above code silently changes!

    The initial version was parsed as node foo with attribute a1 set to v1. But now the node is parsed as node foo with text content a1 = "v1". To keep the initial semantics, the code must be changed to:

    [foo [@ a1 = "v1"]]

    This is error prone!

    Hence, one should think twice before making the start/end symbols optional. It is best to avoid this syntax simplification if the specification for nodes might change in future versions.

Alternative Start/End Syntax

Instead of using the [@ ] symbols to embed attributes, a pair of parentheses () can alternatively be used.

Using parentheses makes the code a bit shorter and more visually appealing, as shown below:

[name [@ a1="v1" a2="v2" ] ]     // standard syntax
      ^^                 ^

[name ( a1="v1" a2="v2" ) ]      // alternative syntax
      ^                 ^

However, using parentheses is error-prone if the text in a node starts with (.

Suppose a node contains the text (organic=better), but no attributes. Then the following code is ambiguous:

[i (organic=better)]

If the parser does not support attributes (i.e. a parser that supports only Core PDML), then a node with text (organic=better) is parsed. However, if attributes are supported, then a node with attribute organic is parsed.

There are two workarounds.

  • To make it clear that there are no attributes, we can write:

    [i () (organic=better)]

  • If the parser supports an additional \( escape sequence, then we can escape the ( to make it clear that the ( is not the start of attributes:

    [i \(organic=better)]

However, the above workarounds are error-prone if the document is accidentally parsed with a parser that doesn't support attributes (e.g. a parser that supports only Core PDML).

In the first case the parser reads a node with text () (organic=better) (instead of (organic=better)).

In the second case the parser generates an error because \( is not supported. That's better, because we become aware of the problem, instead of silently getting wrong data.

Another rare edge case can arise if the specification of a node changes.

For example, suppose a node specified to contain only attributes (no child nodes) changes to a node that contains only child nodes (no attributes). Then the following code silently changes from a node with attribute a1 to a node with text (a1="v1"):

[foo (a1="v1")]

None of the above edge cases exist when the standard syntax is used:

[i [@ organic=better]]

Moreover, if a document using attributes is accidentally read by a parser without support for attributes then an error is generated, because [@ is parsed as the start of a node with an invalid name. Hence there is no risk of silently parsing wrong data.

A PDML parser implementation that supports attributes should clearly specify which syntax it supports: only the standard syntax, only the alternative syntax, or both.


By default the parser in the full PDML reference implementation only supports the standard syntax. But there are two configuration flags to explicitly activate/deactivate the standard and alternative syntaxes.

Lenient parsing is also switched off by default. There are configuration flags to explicitly enable lenient parsing.

Extension Nodes

Preliminary Example

To get the gist of extension nodes, let's have a look at the following example of an extension node:

[u:ins_file path=chapter.txt]

This node looks like a normal data node with name ins_file, namespace u, and an attribute path with value chapter.txt. However, the difference between a normal data node and an extension node is that an extension node represents an instruction to do something.

In the above example, PDML is instructed to insert the content of file chapter.txt into the PDML document.


While PDML data nodes simply represent data, extension nodes add behavior to a PDML document. You can also think of an extension node as an action node that does something.

The behavior of an extension node depends on its name, and is implemented in PDML. An extension node can have attributes to customize its behavior.

PDML specifies a set of standard extension nodes with well-defined names, attributes and behavior. Other extension nodes can programmatically be added to implement customized, domain-dependent actions.

To avoid name clashes with data nodes, extension nodes are all associated with predefined namespaces.

There are three categories of extension nodes, and each category is associated with a different namespace:

  • Utility Nodes

    These extension nodes provide various features and utilities.

  • Type Nodes

    Extension nodes in this category define data types (string, number, boolean, etc). They are used to parse, validate and transform data nodes.

  • Script Nodes

    Script nodes contain user-defined source code that is executed when the PDML document is parsed.

Extension nodes are supported in nodes, as well as in quoted and unquoted attribute values.


If you are a programmer, you can conceptually think of an extension node as a function.

The node's name is the name of the function.

The node's attributes and its content are the function's input arguments.

The function's body is implemented in PDML (or in a plugin in case of a customized extension node).

The function returns a result and/or produces a side effect.

Now that we have covered the theory, let's get practical and see which extension nodes exist and what they do.

Utility Nodes

Utility nodes provide various practical features and utilities.

They are all associated with the predefined namespace prefix u (utility).

For example:

  • A u:set node assigns a text to a named parameter, and subsequent u:get nodes can be used to insert the same text multiple times into the PDML document. This supports the important Don't repeat yourself (DRY) principle.

  • A u:ins_file node reads a text file and inserts the text into the PDML document.

The full list of utility nodes is documented in chapter Utility Nodes of the reference manual.

Type Nodes

Type nodes denote data types.

They are all associated with the predefined namespace prefix t (type).

Type nodes are used to parse, validate, and optionally transform data nodes.


If you are a programmer you can conceptually think of PDML types as types in a programming language (string, number, boolean, etc.).

To understand why types are useful, consider a PDML document containing data about employees. Each employee node contains a child node named birthdate:

    [birthdate 1999-12-31]

Without types, the PDML parser has no way of checking and reporting invalid birthdate nodes like this one: [birthdate kdjhfgkjdf]. The burden to validate birthdates is left to the application that consumes the parser's output.

A PDML type enables the parser to validate data nodes. To accomplish this, a type must be assigned to a data node.

In our example we need to assign type date to node birthdate. There are different ways to do this:

  • Inline type

    In this case, a data node contains a type node that contains the data:

    [birthdate [t:date 1999-12-31]]
               ^^^^^^^           ^

    Now the parser ensures that the content of t:date is a valid date. It will generate an error if the node's content is invalid.

    The application reading the parser's output will only see a node birthdate with text content 1999-12-31, as if the document simply contained:

    [birthdate 1999-12-31]

    The advantage is that the application doesn't need to check the content of birthdate anymore. birthdate is guaranteed to contain a valid date.


    The application still gets a string that might need to be parsed into the correct object type of the application's programming language.

    For example, if the application is written in Java, the parser will provide a String object representing a valid date, which the application converts into a java.time.LocalDate object.

    Note that inline types are used rarely, because each individual node needs to have a type annotation.

  • Configuration by node name

    A more practical approach is to assign a specific type to all nodes with a specific name (e.g. all nodes with name N are of type T).

    This can be done programmatically by configuring the parser.

    In our employees example, the parser would be told that all nodes with name birthdate are of type date. Therefore the document doesn't need inline type annotations anymore. A birthdate node looks like this:

    [birthdate 1999-12-31]

    ... but the parser checks that all birthdate nodes contain valid dates.

  • Schema


    PDML schemas are not specified and not implemented yet.

    A schema is itself a PDML document that assigns types to data nodes. It can also define new domain-specific types, based on standard PDML types. A schema can be:

    • embedded: Schema and data nodes are stored in the same document. The schema must be defined at the beginning of the document.

    • external: Schema and data are stored in different documents. The data document contains a reference to the schema's location.
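
The idea behind "Configuration by node name" can be sketched in Javascript as a table that maps node names to validation functions. This only illustrates the concept and is not the configuration API of the reference implementation. The names typesByNodeName, isValidDate, and validateNode are hypothetical, and the date format YYYY-MM-DD is an assumption based on the birthdate example.

```javascript
// Sketch only: checks that a text has the form YYYY-MM-DD and denotes an existing day.
function isValidDate(text) {
    const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text);
    if (match === null) return false;
    const year = Number(match[1]);
    const month = Number(match[2]);
    const day = Number(match[3]);
    // Let Date normalize the values; an invalid day such as February 30 will not round-trip
    const date = new Date(0);
    date.setUTCFullYear(year, month - 1, day);
    return date.getUTCFullYear() === year
        && date.getUTCMonth() === month - 1
        && date.getUTCDate() === day;
}

// Hypothetical configuration: all nodes named birthdate are of type date.
const typesByNodeName = new Map([
    ["birthdate", isValidDate]
]);

// Returns the text unchanged if it is valid for the node's type, otherwise throws.
function validateNode(nodeName, text) {
    const validator = typesByNodeName.get(nodeName);
    if (validator !== undefined && !validator(text)) {
        throw new Error("Invalid content for node " + nodeName + ": " + text);
    }
    return text;
}

console.log(validateNode("birthdate", "1999-12-31"));
```

As described above, the application still receives a string. The only difference is that the string is guaranteed to have passed the check.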

The full list of PDML types is documented in chapter Types of the reference manual.


PDML types are a work-in-progress. A future PDML version might have support for adding domain-specific types programmatically or by sharable configuration data.

Script Nodes


A script node enables you to embed executable source code in a PDML document. The source code is executed when the document is parsed. This allows you for example to:

  • generate or update parts of the document programmatically

  • conditionally include/exclude text

  • retrieve text and data to be included in the document from external resources like files, URLs, databases, webservices, etc.

  • run external programs or OS scripts to get real-time data, generate media pointed to in the document, and much much more.


If you are a programmer, you can conceptually think of script nodes as a very powerful preprocessor, because you can use the full features of a programming language.

For example you can use self-defined or imported functions, external libraries, or call external programs (written in any programming language) or OS scripts to achieve whatever you need in your specific context.

Script nodes are all associated with the predefined namespace prefix s (scripting). You can also think of the s as an abbreviation for source code.

PDML currently supports Javascript as scripting language. Support for other scripting languages might be added in the future.


There are three kinds of script nodes:

  • s:exp: evaluate an expression and insert its result into the PDML document

  • s:script: run a set of instructions to insert text, retrieve and transform text from external resources, create image files, or do anything else that can be achieved by executing a script

  • s:def: define constants, variables, and functions to be used in s:exp and s:script nodes

A typical PDML document would first have one or more s:def nodes to define shared code (constants and functions), and then some s:exp and/or s:script nodes to do whatever needs to be automated.


Expression Node

Here is an example of an expression node used in a document:

1 + 1 = [s:exp 1 + 1]

This snippet results in:

1 + 1 = 2

Ok, that's not very spectacular! However, as we'll see later, the power of expressions quickly becomes obvious if we consider that any valid Javascript expression can be used, including complex expressions that compose internal and/or external functions defined somewhere else.

Now let's look at how this works.

First, the [] pair tells us that we are using a PDML node:

[s:exp 1 + 1]
^           ^

The namespace s states that we are using an extension node in the scripting category:

[s:exp 1 + 1]
 ^

The node's name is exp, which is an abbreviation for expression:

[s:exp 1 + 1]
   ^^^

Finally we can see that the node's content is the text 1 + 1:

[s:exp 1 + 1]
       ^^^^^

As soon as the parser sees the s:, it passes control to PDML's extension node handler. This handler checks the node's name and namespace, and passes control to a dedicated handler for expressions. The expression handler reads the node's text content, evaluates it, converts it to a string, and then inserts the result (2 in our case) into the PDML document. The final result will be that the code:

[s:exp 1 + 1]

... has been replaced with:

2

Hence, the application that reads the PDML document will see the following code:

1 + 1 = [s:exp 1 + 1]

... like this:

1 + 1 = 2


In the world of pre-processors using macros we would say that [s:exp 1 + 1] expands to 2.
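
The processing steps described above (look up a handler by qualified name, evaluate the text, convert the result to a string) can be sketched in plain Javascript as follows. This only illustrates the principle and says nothing about how the reference implementation works internally. The names handlers and expandExtensionNode are hypothetical, and the sketch assumes that the expression has no trailing semicolon.

```javascript
// Sketch only: "handlers" and "expandExtensionNode" are hypothetical names,
// not part of the PDML API.
const handlers = new Map([
    // s:exp: evaluate the text as a Javascript expression and convert the result to a string.
    // Assumption: the expression has no trailing semicolon.
    ["s:exp", (text) => String(new Function("return (" + text + ");")())]
]);

function expandExtensionNode(qualifiedName, text) {
    // Select the dedicated handler by namespace and name
    const handler = handlers.get(qualifiedName);
    if (handler === undefined) {
        throw new Error("Unknown extension node: " + qualifiedName);
    }
    return handler(text);
}

console.log("1 + 1 = " + expandExtensionNode("s:exp", "1 + 1"));
```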

Real-world examples demonstrating the power of expressions can be found in Scripting Examples.

Script Node

An s:script node contains one or more Javascript statements. Here is an example:

[s:script doc.insert ( "Hello" );]

This code inserts the text Hello into the document, so that the result of parsing the above code will be:

Hello

Yes, we could as well just have written Hello in the document. So let's look at a more compelling example:

[s:script
    ~~~
    const licenseFile = "resources/license.txt";
    if ( fileUtils.exists ( licenseFile ) ) {
        doc.insert ( fileUtils.readText ( licenseFile ) );
    } else {
        stderr.writeLine ( "WARNING: No license file found!" );
    }
    ~~~
]

This code checks if file resources/license.txt exists. If it exists, its content is written into the PDML document. If it doesn't exist, a warning is written to the operating system's standard error device stderr (e.g. the terminal).

Note the ~~~ delimiters that embed the script. We'll soon see how this works.

Note for programmers: You can think of a script node as the body of a function that has no input arguments and doesn't return a value. Hence, a script is executed for its side effects (such as inserting code into the document).

Definition Node

A definition node has the qualified name s:def. It is used to define constants, variables, and functions that will later be used in s:exp or s:script nodes.

s:def nodes must be declared before using them in s:exp or s:script nodes.

A single document can have any number of s:def nodes.

Here is a simple example of an s:def node that defines the constant PI, as well as functions to compute the circumference and area of a circle:

[s:def
    ~~~
    const PI = 3.1415926;

    function computeCircumference ( radius ) {
        return 2 * PI * radius;
    }

    function computeArea ( radius ) {
        return PI * radius * radius;
    }
    ~~~
]

Suppose the above code is in a PML document, and later in the document we write:

Consider a circle of radius 10 cm.

Its [i circumference] is [s:exp computeCircumference ( 10 );] cm.

Its [i area] is [s:exp computeArea ( 10 );] cm[sup 2].

This code would expand to:

Consider a circle of radius 10 cm.

Its [i circumference] is 62.831852 cm.

Its [i area] is 314.15926 cm[sup 2].

... and, after converting to HTML, be displayed as:

Consider a circle of radius 10 cm.

Its circumference is 62.831852 cm.

Its area is 314.15926 cm².
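
Results of floating-point computations can carry more decimals than desired in a document. Since any Javascript expression can be used, the standard method toFixed could be applied to round the results. The following plain Javascript sketch repeats the definitions from above so that it can be run on its own; the choice of two decimals is arbitrary.

```javascript
const PI = 3.1415926;

function computeCircumference(radius) {
    return 2 * PI * radius;
}

function computeArea(radius) {
    return PI * radius * radius;
}

// toFixed rounds to a fixed number of decimals and returns a string
const circumference = computeCircumference(10).toFixed(2);
const area = computeArea(10).toFixed(2);
console.log(circumference, area);
```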

Instead of embedding definition nodes in a document (as shown above) you can also import definition nodes from external resources. This is useful if you need the same set of functions in different documents, or if you want to share them with other users, for example via Github or Gitlab.

Definitions can be imported with a u:ins_file node, or any other method that inserts text into a document.

For example, you can store the above definition node in file circle_library.def (the file name and extension can be chosen freely). The file looks like this:

File circle_library.def

[s:def
    ~~~
    const PI = 3.1415926;

    function computeCircumference ( radius ) {
        return 2 * PI * radius;
    }

    function computeArea ( radius ) {
        return PI * radius * radius;
    }
    ~~~
]

Here is an example of a fully functioning PML file that uses a u:ins_file node to import the definitions:

File circle_demo.pml

[doc [title Circle Demo]

    [u:ins_file path=circle_library.def]

    Consider a circle of radius 10 cm.
    Its [i circumference] is [s:exp computeCircumference ( 10 );] cm.
    Its [i area] is [s:exp computeArea ( 10 );] cm[sup 2].
]

If PML is installed on your computer you can convert the above PML file to HTML by executing the following command in a terminal:

pmlc circle_demo.pml

This command creates file output/circle_demo.html, which can then be displayed in a browser.


All script nodes (s:exp, s:script, and s:def) are of type raw_text. This means that an expression node with content list[1] could be written in three ways:

  • [s:exp list\[1\]]
  • [s:exp
  • [s:exp

Note that the [ and ] characters must be escaped in the first version, but not in the other two.

For more information about the syntax rules, please refer to raw_text.

Javascript Support

The power of script nodes largely depends on the set of ready-to-use functions provided by existing libraries. Moreover, we must be able to define our own customized functions, and it should be easy to share them with other users who need the same functionality.

In chapter Definition Node we already saw how to define functions in a PDML document or import them from external resources.

In this chapter we'll see how to use other existing functions/libraries.

Standard Libraries

Standard libraries contain functions that are implicitly available.

Javascript Functions

PDML currently uses a Javascript implementation that is fully compatible with the ECMAScript 2021 specification. All objects and functions defined in the ECMAScript specification can therefore be used in PDML.

For example, you can use functions substring, replaceAll, and many more when working with strings.
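
Here is a short plain Javascript illustration of these two standard string functions. The values are arbitrary examples; in a PDML document such expressions would typically appear inside s:exp or s:script nodes.

```javascript
const fileName = "chapter.txt";

// substring extracts the characters between two positions
const baseName = fileName.substring(0, fileName.lastIndexOf("."));

// replaceAll (available since ECMAScript 2021) replaces every occurrence
const date = "1999-12-31".replaceAll("-", "/");

console.log(baseName, date);
```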

PDML Functions

PDML provides a set of global objects available in all PDML script nodes. Each object contains a set of functions and/or constants, logically grouped into categories by the object's name. The goal of this library is to provide additional functionality not available in 'standard' Javascript, but commonly required in PDML script nodes, such as working with files or interacting with the operating system. The API is designed to simplify common PDML scripting tasks as far as possible. For example:

  • Object fileUtils contains functions to work with files, such as functions readText and writeText to read from or write to a text file.

  • Object OSCommand contains functions to execute OS commands. Command line arguments can be provided, and the data written to the OS's standard output device (stdout) can be retrieved into a string variable or constant.

The API is documented in chapter Scripting API of the PDML Extensions Reference Manual.

External Libraries

External libraries need to be explicitly loaded before their functions can be used.

ECMAScript Modules

Modules as defined by ECMAScript 6 and later are supported.

PDML Definitions

As shown in chapter Definition Node, s:def nodes can be imported from external resources.

Node.js Modules

While ECMAScript modules are supported, CommonJS modules cannot currently be imported with function require(...).

Native support for Node.js modules might be added in a future version.

However, non-native CommonJS modules can be bundled into self-contained Javascript source code files, and then be used in PDML.

Moreover, if Node.js is installed, all NPM modules (including native ones like fs, http, etc.) can be used by executing Node.js like any other external program with functions available in OSCommand.

Error Handling

Errors in script nodes are detected and reported when the PDML document is parsed, and the code is executed.


Examples of script nodes can be found in PDML Scripting Examples.

Customized Extension Nodes

A PDML parser can provide a plugin mechanism that enables applications to programmatically add customized extension nodes, including type nodes.

Instructions for implementing such a plugin are out of scope of this document, because they largely depend on the programming language used to create the PDML parser, as well as other factors.