Basic PDML Specification
The Practical Data and Markup Language (PDML) is a text format to store data.
A distinction is made between Basic PDML and PDML Extensions. Basic PDML is the minimum needed to store data. Extensions are optional features to make PDML more practical.
This document is the official specification for Basic PDML.
A PDML document is a tree of nodes.
The syntax for a node is defined as follows (in EBNF):
"[" name ( separator ? child_node + ) ? "]"
A node is enclosed by a pair of square brackets: [...]. A node starts with [ and ends with ].
Each document has exactly one root node.
Each node has a name.
A node name must match the regex [a-zA-Z_][a-zA-Z0-9_\.-]*. This means that a name starts with a letter or an underscore (_), optionally followed by any number of letters, digits, underscores (_), hyphens (-), or dots (.).
Here are some examples of valid node names:
color Index_12 _ID_12.5-a
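As an illustration, the naming rule can be checked with a few lines of Python. This is only a sketch: the function name is_valid_node_name is hypothetical, and the pattern is copied from the regex given above.

```python
import re

# Pattern copied from the specification text. fullmatch() requires the
# whole string to match, so no explicit anchors are needed.
NODE_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_\.-]*")


def is_valid_node_name(name: str) -> bool:
    """Return True if `name` is a valid Basic PDML node name (hypothetical helper)."""
    return NODE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["color", "Index_12", "_ID_12.5-a", "12abc", "-x", ""]:
        print(repr(candidate), is_valid_node_name(candidate))
```

The three example names above are accepted. Names that start with a digit or a hyphen, and the empty string, are rejected.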
A node name does not need to be unique. Different nodes in a tree can have the same name.
The separator separates the node's name from its content.
The separator is a single whitespace character. The following whitespace characters are allowed:
| Unix new line | "\n" | U+000A |
| Windows new line | "\r\n" | U+000D U+000A |
The separator is required if the first child node is text. Example:
The separator is optional if the first child node is a node. Hence this code:
[b [i huge]]
... can also be written as:
[b[i huge]]
A node can optionally have any number of child nodes.
A child node can be text (a sequence of Unicode characters) or another node (with optional child nodes too).
Node with one text child:
[color light green]
The node's name is color. The node's single child node is the text light green.
Node with child node:
[config [color light green]]
Node config has one child node. The child node's name is color, and its text is light green.
Tree of nodes:
[config [color light green] [size [width 200] [height 100] ] ]
Node containing a mixture of text and nodes (markup code):
[p We can write words in [i italic], [b bold], or [b[i bold and italic]].]
If a node has no child nodes, it is called an empty node.
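To make the node syntax concrete, here is a minimal recursive-descent parser sketch in Python. It is not a reference implementation. The names Node, PdmlError, and parse are hypothetical, and the code assumes that space and tab are accepted as separators and as whitespace around the root node, in addition to the two new-line forms. Backslash escapes are handled as this specification defines them.

```python
import re
from dataclasses import dataclass, field

NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_\.-]*")
OUTER_WHITESPACE = " \t\r\n"  # assumption: whitespace ignored around the root node


class PdmlError(ValueError):
    """Raised when the input is not a well-formed Basic PDML document."""


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)  # each item is a str (text) or a Node


def parse(source: str) -> Node:
    """Parse a document that consists of exactly one root node."""
    pos = _skip_whitespace(source, 0)
    root, pos = _parse_node(source, pos)
    pos = _skip_whitespace(source, pos)
    if pos != len(source):
        raise PdmlError(f"illegal character after root node at offset {pos}")
    return root


def _skip_whitespace(s: str, pos: int) -> int:
    while pos < len(s) and s[pos] in OUTER_WHITESPACE:
        pos += 1
    return pos


def _parse_node(s: str, pos: int):
    if not s.startswith("[", pos):
        raise PdmlError(f"expected '[' at offset {pos}")
    match = NAME.match(s, pos + 1)
    if match is None:
        raise PdmlError(f"invalid node name at offset {pos + 1}")
    node = Node(match.group())
    pos = match.end()

    # Consume exactly one separator; "\r\n" counts as a single separator.
    if s.startswith("\r\n", pos):
        pos += 2
    elif s[pos:pos + 1] in (" ", "\t", "\n"):
        pos += 1
    elif s[pos:pos + 1] not in ("[", "]", ""):
        raise PdmlError(f"separator required before text at offset {pos}")

    text = []  # characters of the text child currently being collected
    while pos < len(s):
        ch = s[pos]
        if ch == "]":
            if text:
                node.children.append("".join(text))
            return node, pos + 1
        if ch == "[":
            if text:
                node.children.append("".join(text))
                text = []
            child, pos = _parse_node(s, pos)
            node.children.append(child)
        elif ch == "\\":
            if s[pos + 1:pos + 2] not in ("[", "]", "\\"):
                raise PdmlError(f"invalid escape sequence at offset {pos}")
            text.append(s[pos + 1])
            pos += 2
        else:
            text.append(ch)
            pos += 1
    raise PdmlError(f"missing ']' for node '{node.name}'")


if __name__ == "__main__":
    tree = parse("[config [color light green]]")
    color = tree.children[0]
    print(tree.name, color.name, color.children)  # config color ['light green']
```

The sketch keeps all whitespace after the separator as text, so any whitespace handling is left to the application.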
As seen already, [ and ] are used as node delimiters. Therefore these two characters must be escaped when they are used in text nodes.
A backslash (\) is used as the escape character (as in C-like programming languages). Therefore the backslash must itself be escaped too.
The final rule is simple: the characters [, ], and \ must be preceded by \ when they are used in text nodes. Hence [ is written as \[, ] is written as \], and \ is written as \\.
Suppose node foo contains the text:
Characters [, ], and \ must be escaped.
This would be written as:
[foo Characters \[, \], and \\ must be escaped.]
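The escape rule can be sketched as a pair of Python helpers. The function names escape_text and unescape_text are hypothetical. Rejecting unknown escape sequences and unescaped brackets is an assumption of this sketch, not something the text above prescribes.

```python
def escape_text(text: str) -> str:
    """Escape a text value so that it can be written inside a node."""
    # The backslash is escaped first; otherwise the backslashes added
    # for the brackets would themselves be doubled.
    return text.replace("\\", "\\\\").replace("[", "\\[").replace("]", "\\]")


def unescape_text(escaped: str) -> str:
    """Reverse escape_text(); raise ValueError on malformed input."""
    out = []
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch == "\\":
            # A backslash must be followed by one of the three escapable characters.
            if escaped[i + 1:i + 2] not in ("[", "]", "\\"):
                raise ValueError(f"invalid escape sequence at offset {i}")
            out.append(escaped[i + 1])
            i += 2
        elif ch in "[]":
            raise ValueError(f"unescaped '{ch}' at offset {i}")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    original = "Characters [, ], and \\ must be escaped."
    print("[foo " + escape_text(original) + "]")
```

Run as a script, this prints the node shown above, and unescape_text returns the original text when given the escaped form.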
The following whitespace characters before or after the root node are ignored:
Other characters before or after the root node are illegal.
Basic PDML defines no whitespace handling rules for the content of a PDML document. Whitespace is preserved when a PDML document is parsed.
Consider the following PDML snippet:
[a foo [b] 2 [c] [d] ]
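In this example, node a contains 7 child nodes: the text foo with its trailing whitespace, node b, the text 2 with its surrounding whitespace, node c, a whitespace-only text node, node d, and a final whitespace-only text node.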
Applications reading PDML documents (or customized PDML parsers) are free to implement any appropriate whitespace handling rules, such as:
skip whitespace nodes
trim leading and/or trailing whitespace in text nodes
replace whitespace sequences with a single space (similar to HTML)
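The three strategies listed above could look as follows in Python. This is a sketch under stated assumptions. A node's children are modelled as a plain list in which text is a str and a child node is a (name, children) tuple. The function names are hypothetical. "Whitespace" here means whatever Python's str.strip() and the regex class \s treat as whitespace, which is broader than anything Basic PDML defines. The sample list is one reading of the snippet above, with single spaces between tokens.

```python
import re


def skip_whitespace_nodes(children: list) -> list:
    """Drop text children that consist only of whitespace."""
    return [c for c in children if not (isinstance(c, str) and c.strip() == "")]


def trim_text_nodes(children: list) -> list:
    """Remove leading and trailing whitespace from every text child."""
    return [c.strip() if isinstance(c, str) else c for c in children]


def collapse_whitespace(children: list) -> list:
    """Replace each run of whitespace in text children with a single space."""
    return [re.sub(r"\s+", " ", c) if isinstance(c, str) else c for c in children]


if __name__ == "__main__":
    # Children of node a from the snippet above, as a parser would preserve them
    # (assuming single spaces between the tokens).
    children_of_a = ["foo ", ("b", []), " 2 ", ("c", []), " ", ("d", []), " "]
    cleaned = trim_text_nodes(skip_whitespace_nodes(children_of_a))
    print(cleaned)  # ['foo', ('b', []), '2', ('c', []), ('d', [])]
```

Each helper processes one node's child list only. An application that wants the rule applied to the whole tree would call it recursively.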
New lines are defined differently in Unix/Linux and Windows. Unix uses a single line feed ("\n"). Windows uses a carriage return, followed by a line feed ("\r\n").
The following rules are applied in PDML:
When a PDML document is read, Unix and Windows new lines are both supported, whether the application runs on Unix or Windows, even if a single document uses a mixture of Unix/Windows new lines.
For example, a parser reads "\r\n" as a single new line.
When a PDML document is written, the operating system's canonical new line is used.
For example, a writer running on Unix writes "\n". On Windows it writes "\r\n".
PDML documents are encoded in UTF-8.
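The new-line and encoding rules can be combined into a small pair of file helpers. The following Python sketch is one possible approach. The function names read_pdml_text and write_pdml_text are hypothetical, and representing every new line as "\n" in memory is a design choice of this sketch rather than a requirement stated above.

```python
import os


def read_pdml_text(path: str) -> str:
    """Read a PDML file as UTF-8; both new-line forms become "\\n" in memory."""
    # newline="" switches off Python's own new-line translation, so the
    # normalisation below is explicit and identical on every platform.
    with open(path, encoding="utf-8", newline="") as f:
        raw = f.read()
    return raw.replace("\r\n", "\n")


def write_pdml_text(path: str, text: str, line_separator: str = os.linesep) -> None:
    """Write PDML text as UTF-8 using the platform's canonical new line."""
    normalized = text.replace("\r\n", "\n")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(normalized.replace("\n", line_separator))
```

The line_separator parameter defaults to os.linesep, the operating system's canonical new line. It is exposed only so that the behaviour can be tested on any platform. A lone carriage return is left untouched, since the rules above cover only the Unix and Windows forms.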
The grammar is defined in separate documents, in two variations: an EBNF grammar and railroad diagrams.
This document is the only official specification for Basic PDML.
The EBNF grammar and the railroad diagrams are just auxiliary assets to help readers better contextualize the specification.
More examples of PDML code can be found in PDML Examples.
This specification is licensed under CC BY-ND 4.0.
Permission is granted to create verbatim translations of this specification into other human languages.
This specification uses Semantic Versioning.
PDML's website is https://pdml-lang.dev/.
This document is written in PML and uses the PDML syntax.
The markup code is available on GitHub.
Pull requests are welcome.