Core PDML Examples
Latest update: 2025-03-25
First published: 2021-11-16
Author: Christian Neumanns
Introduction
This document shows examples of encoding commonly used data structures and markup using only Core PDML. PDML extensions (optional features, not part of Core PDML) will not be used in the examples below, but mentioned occasionally to explain when and why they are useful.
The aim is to show examples without digging into details — this is not a comprehensive Core PDML tutorial.
Note
You can consult the exact rules governing Core PDML (and get some practical tips) in the Core PDML Specification.
Readers are expected to have a basic knowledge of fundamental data structures.
Data
You can use Core PDML to encode scalar values, records, lists, maps, and other common data structures, as demonstrated in the following sections.
Strings
You can encode the string "This is some arbitrary string" as follows:
[string This is some arbitrary string]
Here we define a node with tag string, containing the string "This is some arbitrary string". The node starts with [ (opening delimiter) and ends with ] (closing delimiter). The tag and content of the node are separated by a space.
In JSON the code would look like this:
{"string": "This is some arbitrary string"}
You can use new lines and any other Unicode code points in strings, except the control characters listed in Invalid Characters. For example, the text:
He said:
"She said: 'All is well.'"
😀
... is encoded as follows:
[text He said:
"She said: 'All is well.'"
😀]
JSON version:
{"text": "He said:\n\"She said: 'All is well.'\"\n😀"}
Note
In JSON, a \uXXXX escape sequence can only express code points up to U+FFFF. A code point above U+FFFF that is written as an escape must therefore be given as a UTF-16 surrogate pair. Hence, the smiley in the above example (U+1F600), when escaped, would need to be encoded as \uD83D\uDE00.
PDML documents must be encoded in UTF-8. Unicode surrogate pairs are therefore never used in PDML — they are a mechanism specific to UTF-16 encoding.
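The difference between the encodings can be checked with a few lines of Python. This is only an illustration using the standard library; Python's json module is unrelated to PDML and is used here just to show what an ASCII-only JSON encoder emits.

```python
import json

smiley = "\U0001F600"  # U+1F600, a code point above U+FFFF

# UTF-8, the encoding required for PDML documents, stores it as four bytes.
utf8_bytes = smiley.encode("utf-8")
print(utf8_bytes.hex(" "))  # f0 9f 98 80

# UTF-16 needs two 16-bit code units: a surrogate pair.
utf16_bytes = smiley.encode("utf-16-be")
print(utf16_bytes.hex(" ", 2))  # d83d de00

# An ASCII-only JSON encoder writes the same surrogate pair as two escapes.
print(json.dumps(smiley))  # "\ud83d\ude00"
```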
Some characters must be escaped, and some can be escaped (see Character Escape Sequences). For example, the rule:
Characters [, ], ^, and \ must be escaped.
... is encoded as follows:
[rule Characters \[, \], \^, and \\ must be escaped.]
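When PDML text is generated by a program, the mandatory escapes can be applied mechanically. The following Python sketch covers only the four characters named in the rule above; the function name escape_pdml_text is hypothetical, and the optional escape sequences are deliberately left out.

```python
def escape_pdml_text(text: str) -> str:
    """Backslash-escape the four characters [, ], ^ and \\ in text content."""
    return "".join("\\" + ch if ch in "[]^\\" else ch for ch in text)


rule = "Characters [, ], ^, and \\ must be escaped."
node = "[rule " + escape_pdml_text(rule) + "]"
print(node)  # [rule Characters \[, \], \^, and \\ must be escaped.]
```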
Scalar Values
You can encode any scalar value (number, boolean, enum, date, etc.) by inserting its string representation in the PDML document:
[scalar_values
    [string This is some arbitrary string]
    [number 123.45]
    [boolean true]
    [quality_enum excellent]
    [date 2025-01-14]
    [URL http://www.pdml-lang.dev]
    [file_path path/to/file.txt]
]
JSON version:
{
    "scalar_values": {
        "string": "This is some arbitrary string",
        "number": 123.45,
        "boolean": true,
        "quality_enum": "excellent",
        "date": "2025-01-14",
        "URL": "http://www.pdml-lang.dev",
        "file_path": "path/to/file.txt"
    }
}
Note
Unlike JSON and like XML, Core PDML does not specify a special syntax for numbers and booleans. That's a deliberate choice for reasons not explained here. All scalar values are encoded as plain text, using their string representation.
Support for converting scalar values encoded as strings into native types (e.g. int32) may be provided in a PDML implementation. For example, the PDML reference implementation provides convenience methods for converting PDML AST nodes into commonly used native types.
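As an illustration of such a conversion step, the Python sketch below turns the text of three of the scalar nodes above into native values. It does not show the reference implementation's API: the helper to_bool and the raw dictionary are hypothetical, and accepting only the spellings true and false is an application-level assumption, since Core PDML itself treats all of these values as plain text.

```python
from datetime import date
from decimal import Decimal


def to_bool(text: str) -> bool:
    """Convert the text 'true' or 'false' into a bool; reject anything else."""
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a boolean: {text!r}")


# Text content of three scalar nodes, as a parser would hand it over.
raw = {"number": "123.45", "boolean": "true", "date": "2025-01-14"}

number = Decimal(raw["number"])        # exact decimal, no float rounding
flag = to_bool(raw["boolean"])
day = date.fromisoformat(raw["date"])  # expects the YYYY-MM-DD form
print(number, flag, day)  # 123.45 True 2025-01-14
```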
Absence of Value
To encode the absence of a value (aka null, nil, void, nothing, etc. in programming languages) you use a tagged leaf node (i.e. a tagged node without child nodes):
[remark]
JSON version:
{"remark": null}
Records
The following example shows a record tagged config with fields color, width, height, and remark:
[config
    [color green]
    [width 162]
    [height 100]
    [remark]
]
Records can be nested to any level. In other words, a record field can itself be a record (or any other type of data). For example, you can group fields width and height into a nested record tagged dimensions:
[config
    [color green]
    [dimensions
        [width 162]
        [height 100]
    ]
    [remark]
]
The indentation in the above example (as well as in all other examples shown in this document) is optional. You could also encode config using the so-called compact form:
[config [color green][dimensions [width 162][height 100]][remark]]
For more information about indentation and other forms of insignificant whitespace you may read How to Handle Whitespace.
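To see that indentation is insignificant for data, here is a minimal sketch of a recursive-descent reader in Python. It is not the PDML reference implementation and does not follow the full Core PDML Specification. The function name parse_pdml and the (tag, children) tuple shape are assumptions made for this illustration; text is trimmed and whitespace-only text is dropped (adequate for data, not for markup), and a backslash simply makes the next character literal.

```python
def parse_pdml(source: str):
    """Parse one node into (tag, children); children are strings or nodes."""
    source = source.strip()
    pos = 0

    def parse_node():
        nonlocal pos
        if pos >= len(source) or source[pos] != "[":
            raise ValueError(f"expected '[' at offset {pos}")
        pos += 1
        start = pos
        # The tag runs until whitespace or a delimiter.
        while (pos < len(source) and not source[pos].isspace()
               and source[pos] not in "[]"):
            pos += 1
        tag = source[start:pos]
        children, text = [], []

        def flush_text():
            # Simplification: trim text and drop whitespace-only runs.
            chunk = "".join(text).strip()
            if chunk:
                children.append(chunk)
            text.clear()

        while pos < len(source):
            ch = source[pos]
            if ch == "[":
                flush_text()
                children.append(parse_node())
            elif ch == "]":
                flush_text()
                pos += 1
                return (tag, children)
            elif ch == "\\" and pos + 1 < len(source):
                text.append(source[pos + 1])  # simplification: next char is literal
                pos += 2
            else:
                text.append(ch)
                pos += 1
        raise ValueError(f"node '{tag}' is not closed")

    root = parse_node()
    if pos != len(source):
        raise ValueError("unexpected content after the root node")
    return root


indented = """
[config
    [color green]
    [dimensions
        [width 162]
        [height 100]
    ]
    [remark]
]
"""
compact = "[config [color green][dimensions [width 162][height 100]][remark]]"

print(parse_pdml(indented) == parse_pdml(compact))  # True
```

Both forms produce the same tree, and the leaf node remark ends up with an empty list of children.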
Collections
Lists
You can encode a simple list of names as follows:
[names
    [name Tim]
    [name Tom]
    [name Tam]
]
Instead of tagging each element with name, you can also use tag _, which, by convention, is used whenever the tag is irrelevant (see Anonymous Nodes):
[names
    [_ Tim]
    [_ Tom]
    [_ Tam]
]
Note
If each element in a list is a scalar value then you can also encode the list as a text node with elements separated by commas:
[names Tim, Tom, Tam]
However, if the elements themselves contain spaces, commas, and/or other special characters, then it might be required to use String Literals (a PDML extension not covered here), e.g.:
[friends "Tim and Tom", "Tam, Tum"]
The idiomatic way to encode lists (using only Core PDML) is to use a separate node for each element, as shown in this section.
You can also encode heterogeneous lists (i.e. lists with elements of different types):
[list
    [string Lorem ipsum ...]
    [record
        [field_1 value_1]
        [field_2 value_2]
    ]
    [nested_list
        [_ Tim]
        [_ Tom]
        [_ Tam]
    ]
    [remark]
]
To encode an empty list (i.e. a list that doesn't contain elements) you use a tagged leaf node:
[list]
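The conventions shown so far (scalars as text, absence of value and empty lists as tagged leaf nodes, records as nested nodes, list elements tagged _) are enough to sketch a small serializer. The Python function below is a hypothetical helper written for this illustration, not part of any PDML library. It emits the compact form, assumes that dictionary keys are valid tags, and escapes only the four mandatory characters.

```python
def to_pdml(tag, value) -> str:
    """Serialize a Python value as a compact Core PDML node."""
    # None and empty collections all become a tagged leaf node.
    if value is None or (isinstance(value, (dict, list)) and not value):
        return f"[{tag}]"
    if isinstance(value, dict):
        # Record: every field becomes a child node tagged with the field name.
        body = "".join(to_pdml(str(key), item) for key, item in value.items())
    elif isinstance(value, list):
        # List: every element becomes a child node tagged "_".
        body = "".join(to_pdml("_", item) for item in value)
    elif isinstance(value, bool):
        # Checked before the generic case because str(True) is "True".
        body = "true" if value else "false"
    else:
        # Scalar: string representation with the mandatory escapes applied.
        body = "".join("\\" + ch if ch in "[]^\\" else ch for ch in str(value))
    return f"[{tag} {body}]"


config = {
    "color": "green",
    "dimensions": {"width": 162, "height": 100},
    "remark": None,
}
print(to_pdml("config", config))
# [config [color green][dimensions [width 162][height 100]][remark]]
print(to_pdml("names", ["Tim", "Tom", "Tam"]))
# [names [_ Tim][_ Tom][_ Tam]]
```

Because None and an empty list serialize to the same leaf node, a reader needs outside knowledge (for example a schema or the expected type) to tell them apart.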
Maps
Let's start with a simple, homogeneous map (aka dictionary, associative array) that maps some digits to words:
[map
    [1 one]
    [2 two]
    [3 three]
]
Map values can be of any type. To provide the words in English, Italian, and Thai, we can use values that are maps:
[map
    [1
        [words
            [en one]
            [it uno]
            [th หนึ่ง]
        ]
    ]
    [2
        [words
            [en two]
            [it due]
            [th สอง]
        ]
    ]
    [3
        [words
            [en three]
            [it tre]
            [th สาม]
        ]
    ]
]
In the above code we are using indents (i.e. insignificant whitespace) copiously to increase readability. However, you are free to use insignificant whitespace in whatever way best suits your needs. For example, you can write the above code in a more succinct way:
[map
    [1 [words [en one][it uno][th หนึ่ง]]]
    [2 [words [en two][it due][th สอง]]]
    [3 [words [en three][it tre][th สาม]]]
]
Note
A PDML parser does not generate an error or warning if a map contains duplicate keys (see Duplicate Tags for more information). Thus, no error/warning is generated when the following code is parsed — the error is (typically) reported later when the parsed AST is converted into a native map.
[map
    [1 one]
    [1 one]
    [2 two]
    [3 three]
]
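The later conversion step mentioned in the note could look like the following Python sketch. It assumes that the parser hands over the child nodes of map as (tag, text) pairs in document order; that shape and the function name entries_to_dict are hypothetical.

```python
def entries_to_dict(entries):
    """Build a dict from (key, value) pairs, rejecting duplicate keys."""
    result = {}
    for key, value in entries:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


# Child nodes of the map above, in document order.
parsed_entries = [("1", "one"), ("1", "one"), ("2", "two"), ("3", "three")]

try:
    entries_to_dict(parsed_entries)
except ValueError as error:
    print(error)  # duplicate key: '1'
```

Whether a duplicate is an error, or whether the first or last entry should win, is a decision for the application; the sketch simply picks the strictest option.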
Besides values, keys can also be of any type. However, if the keys aren't scalar values (they are records, lists, etc.) then the structure becomes a bit more complex, as you'll see in the next example.
Moreover, a single map can contain keys of different types, as well as values of different types. Thus, maps can be heterogeneous.
Below is a contrived example of a complex map with four entries:
- entry 1: a digit mapped to a word (as in the first example of this section)
- entry 2: a record mapped to a list
- entry 3: a list mapped to a map
- entry 4: null mapped to null
[crazy_map
    [entry
        [key 1]
        [value one]
    ]
    [entry
        [key
            [record
                [field_1 value_1]
                [field_2 value_2]
            ]
        ]
        [value
            [names
                [_ Tim]
                [_ Tom]
                [_ Tam]
            ]
        ]
    ]
    [entry
        [key
            [names
                [_ Tim]
                [_ Tom]
                [_ Tam]
            ]
        ]
        [value
            [map
                [1 one]
                [2 two]
                [3 three]
            ]
        ]
    ]
    [entry
        [key]
        [value]
    ]
]
Note
The above code shows the recommended way to encode maps containing non-scalar keys: each entry is represented by a node tagged entry and containing child-nodes key and value. However, nothing prevents you from structuring your map differently if you wish.
An empty map is encoded like an empty list:
[map]
Tables
A table is a collection whose elements are records of the same type.
For example, a table containing three products would look like this:
[products
    [product
        [id 1]
        [name Keyboard]
        [price 50.00]
    ]
    [product
        [id 2]
        [name Mouse]
        [price 25.00]
    ]
    [product
        [id 3]
        [name Monitor]
        [price 300.00]
    ]
]
Databases
A database is a set of tables.
Here's a database containing tables customers, suppliers, and products:
[mini_ERP_database
    [customers
        [customer ...]
        [customer ...]
        [customer ...]
    ]
    [suppliers
        [supplier ...]
        [supplier ...]
    ]
    [products
        [product ...]
        [product ...]
        [product ...]
        [product ...]
    ]
]
Note
PDML is not a replacement for "real" databases like MySQL, SQL Server, and PostgreSQL.
Trees
Consider the following arithmetic expression:
(1 + 2) * (3 + 4 + 5)
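We can represent this expression using an abstract syntax tree (AST):
(Figure: AST of the expression. The root node * has two + child nodes; the first + has the leaves 1 and 2, the second + has the leaves 3, 4, and 5.)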
You can encode this tree as follows:
[expression
    [op *
        [op +
            [num 1]
            [num 2]
        ]
        [op +
            [num 3]
            [num 4]
            [num 5]
        ]
    ]
]
Here's the compact, less readable version:
[expression [op *[op +[num 1][num 2]][op +[num 3][num 4][num 5]]]]
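Once such a tree has been parsed, it can be processed recursively. The Python sketch below evaluates the expression; the nested-tuple representation, ("num", value) and ("op", symbol, operands), is an assumption chosen to mirror the PDML nodes and is not the output format of any particular PDML parser.

```python
import math

# The tree above as nested tuples: ("num", value) or ("op", symbol, operands).
expression = ("op", "*", [
    ("op", "+", [("num", 1), ("num", 2)]),
    ("op", "+", [("num", 3), ("num", 4), ("num", 5)]),
])


def evaluate(node):
    """Evaluate a num leaf or an op node with any number of operands."""
    if node[0] == "num":
        return node[1]
    _, symbol, operands = node
    values = [evaluate(child) for child in operands]
    if symbol == "+":
        return sum(values)
    if symbol == "*":
        return math.prod(values)
    raise ValueError(f"unsupported operator: {symbol}")


print(evaluate(expression))  # 36
```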
Markup
Suppose we want to render the following wonderful quote by Albert Einstein:
"Everything should be as simple as possible, but not simpler."
In HTML we would write:
<p>"Everything should be <b>as simple as possible</b>, but <b><i>not simpler</i></b>."</p>
In PDML the code looks like this:
[p "Everything should be [b as simple as possible], but [b [i not simpler]]."]
Note
The above example demonstrates the syntax actually used in the Practical Markup Language (PML), a lightweight markup language that uses PDML under the hood.
For a more complete markup example you may have a look at the PML source code of this document. However, be aware that some PDML extensions as well as PML-specific features are used in that code.
Data and Markup
Data and markup can be mixed in a single PDML document.
The following code shows a record tagged product. Look at field description. It contains PML markup embedded in a PDML data document:
[product
    [id 123]
    [name PML]
    [category software]
    [description
        PML is a powerful markup language using
        [link (url="https://pdml-lang.dev/") PDML] under the hood.
        For example it allows you to:
        [list
            [el Render text in [i italic], [b bold], [strike strikethrough], and other styles.]
            [el Insert [i lists], [i tables], [i quotes], [i footnotes], etc.]
            [el Embed [i images], [i audios], and [i videos].]
        ]
        [note For more information visit the [link (url="https://pml-lang.dev") PML website].]
    ]
    [price free for everyone]
    [remark]
]
The markup in field description would be rendered as follows:
PML is a powerful markup language using PDML under the hood. For example it allows you to:
- Render text in italic, bold, strikethrough, and other styles.
- Insert lists, tables, quotes, footnotes, etc.
- Embed images, audios, and videos.
Note
For more information visit the PML website.
Now let's look at a more complex example illustrating API documentation that contains a mixture of scalar values, lists, records, and markup:
[API_doc
    [functions
        [function
            [name foo]
            [description
                PML markup ...
            ]
            [input
                [parameter
                    [name p1]
                    [type string]
                    [description
                        PML markup ...
                    ]
                ]
                [parameter
                    [name p2]
                    [type integer]
                    [description
                        PML markup ...
                    ]
                ]
            ]
            [output
                [type integer]
                [description
                    PML markup ...
                ]
            ]
        ]
        [function
            [name bar]
            ...
        ]
    ]
]
Points of interest:
- Node functions contains a list of function elements. Node input contains a list of parameter elements.
- Node function contains record data. Its (indirect) child-nodes parameter and its (direct) child-node output are records too.
- All nodes tagged description contain PML markup embedded in the PDML document.
Embedding Foreign Data/Markup Encodings
Since a PDML node can contain any text (except Unicode control characters listed in Invalid Characters), you can embed data and markup encoded in foreign formats such as XML, JSON, and HTML:
[foreign_formats_examples
    [XML <rating>5/5</rating>]
    [JSON { "rating": "5/5" }]
    [HTML <p>Have a <i>great</i> day</p>]
]
Here, nodes XML, JSON, and HTML are text nodes containing data and markup encoded in foreign formats. For instance, node XML contains the XML code <rating>5/5</rating> embedded as PDML text. After parsing the PDML document, this XML code may be parsed by an XML parser.
Note
A more sophisticated and more efficient parsing technique (not elaborated here) would be to switch parsers at parse-time. In our example, a PDML parser would first be invoked, then the application would switch to an XML parser to parse <rating>5/5</rating>, then switch back to the PDML parser, then switch to a JSON parser to parse the JSON code, and so forth.
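The simpler two-step approach (parse the PDML document first, then hand each text node to the matching parser) can be sketched in Python with the standard library. The two string variables stand in for the text content of nodes XML and JSON as a PDML parser would return it; how that text is obtained is outside the scope of this sketch.

```python
import json
import xml.etree.ElementTree as ET

# Text content of the XML and JSON nodes, as returned by a PDML parser.
xml_text = "<rating>5/5</rating>"
json_text = '{ "rating": "5/5" }'

# Second step: each embedded format is handled by its own parser.
xml_rating = ET.fromstring(xml_text).text
json_rating = json.loads(json_text)["rating"]
print(xml_rating, json_rating)  # 5/5 5/5
```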
You can also embed PDML code as text in a PDML document, but then you must escape the characters [ and ], otherwise they would be parsed as node start and end symbols. For example, to store the markup [p Have a [i great] day.] as text in node markup_text, you must write:
[markup_text \[p Have a \[i great\] day.\]]
Note
Frequent use of escape sequences increases error-proneness while reducing readability and writability.
To address this issue, the String Literals PDML extension (not part of Core PDML) provides alternative ways to encode text, eliminating the need for excessive escape sequences. For example, a quoted string literal allows you to rewrite the above code as follows:
[markup_text ^"[p Have a [i great] day.]"]