PDML Overview
Latest update | 2025-03-25
First published | 2022-11-16
Website |
Author | Christian Neumanns
Editor | Tristano Ajmone
Introduction
This document provides a concise introduction to the Practical Data and Markup Language (PDML). It is written in a question-and-answer format, aiming to address common questions from first-time visitors. After reading this overview, you should have a clear understanding of what PDML is, what it is not, and what you can do with it.
What Is PDML?
The Practical Data and Markup Language (PDML) is a text format you can use for encoding, storing and transmitting data and markup (i.e. formatted text).
Its design goals emphasize:
-
Human-friendly approach: a simple, succinct syntax that is easy to read and write.
-
Suitability for encoding data and markup of any complexity as plain UTF-8 text.
-
An open standard that is independent of the operating system or programming language.
-
Optional yet powerful features (extensions) that enhance its practicality by improving and simplifying the process of creating and maintaining your PDML documents.
Note
Like JSON, XML, YAML, and other formats, PDML is not a binary format optimized for space- and time-efficiency. For example, numbers and boolean values are stored as text, not as binary data.
Here's a simple example showing how data and markup can be mixed in a single PDML document:
[document
    [data
        [server_config
            [name Office Server]
            [address
                [ip 192.168.1.1]
                [port 8080]
            ]
        ]
    ]
    [markup_code
        [p We can write text in [b bold], [i italic], or [b [i bold and italic]].]
    ]
]
What Can I Find on This Website?
To help you get started, the PDML website provides:
-
Documentation — most notably, the Core PDML specification: a portable, programming-language-agnostic and open standard specifying the rules to encode data and markup as plain UTF-8 text.
-
A free and open-source (FOSS) reference implementation, written in Java, and available as a Java library (.jar file) which you may use to work with PDML documents in JVM-based projects running on Windows, Linux, or MacOS. This library enables you to parse PDML documents; programmatically generate PDML documents and ASTs; explore, change, and transform ASTs, etc.
-
The PDML Companion: a FOSS command line tool which runs on Windows, Linux, and MacOS. You may use this tool to carry out various operations on PDML documents (stored in .pdml text files or read from STDIN).
-
Other assets described later.
What Is "Core PDML" and What Are "PDML Extensions"?
The Core PDML Specification encompasses the fundamental set of simple rules needed for encoding data and markup of any complexity as plain text. This is the minimum set of rules that every PDML implementation must adhere to.
PDML Extensions encompass additional rules that specify a set of predefined, but optional features (called extensions) designed to enrich the core functionality and increase practicality in specific contexts. A PDML implementation can support some or all extensions, but every extension must be implemented according to the official PDML Extensions Specification, and the documentation of a PDML implementation should clearly specify which extensions (if any) are supported.
Extensions are designed to simplify, and sometimes even automate the creation and maintenance of your PDML documents.
Can You Tell Me More About Extensions?
Here's a short overview of PDML extensions, including some simple examples:
-
Comments
This extension enables you to insert single-line or multi-line comments in PDML documents. Multi-line comments can be nested.
Example of a single-line comment:
[config
    ^// Valid values: small, medium, large
    [size large]
    ...
]
-
Constants
You can define constants containing reusable text, data, or markup snippets that can be inserted multiple times in a PDML document. This supports the "Don't repeat yourself" (DRY) principle.
Consider the following PDML code:
[URLs
    [about_URL https://www.example.com:8080/public/resources/docs/about.html]
    [FAQ_URL https://www.example.com:8080/public/resources/docs/faq.html]
    [manual_URL https://www.example.com:8080/public/resources/docs/manual.html]
]
Instead of repeating the common URL part three times, you can increase maintainability and shorten the code by storing the common part in a constant named docs_URL, and then inserting the value stored in the constant as many times as needed:
[URLs
    ^[const docs_URL="https://www.example.com:8080/public/resources/docs/"]
    [about_URL ^[ins docs_URL]about.html]
    [FAQ_URL ^[ins docs_URL]faq.html]
    [manual_URL ^[ins docs_URL]manual.html]
]
-
Include Functionality
You can dynamically insert text, data, or markup retrieved from external resources such as files, URLs, OS environment variables, etc. This allows you, for example, to:
-
Define common text, data, or markup snippets used in several documents and share them with collaborators.
-
Split a PDML document into several files.
Here's an example showing the chapters of a book stored in separate files:
[book
    [title Strange Adventures of a Curious Software Developer]
    ^[ins_file chapters/chapter_1.pml]
    ^[ins_file chapters/chapter_2.pml]
    ^[ins_file chapters/chapter_3.pml]
    ...
]
-
Unicode Escape Sequences
Insert Unicode escape sequences (e.g. \u{1F600} → 😀).
-
String Literals
Use quoted, multi-line, or raw string literals to simplify the insertion of "special text," such as source code, HTML snippets, regular expressions, etc.
For example, consider the following source code:
repeat 3 times
    write_line ( "[Hello]" )
.
Suppose you want to store this code in a node named code. Using only Core PDML, you would have to write:
[code repeat 3 times
    write_line ( "\[Hello\]" )
.]
Readability and writability improve if you use a multi-line string:
[code
    """
    repeat 3 times
        write_line ( "[Hello]" )
    .
    """
]
-
Attributes
Use attributes to define metadata (similar to attributes in XML/HTML).
Pseudo-markup example:
[header ^(color=red size=big) Important!]
-
Embedded Java Source Code (Experimental)
You can embed Java source code into a PDML document to:
-
Programmatically generate parts of the document (e.g. auto-generate a table containing data retrieved from databases or other resources).
-
Conditionally include/exclude parts/snippets of a document.
-
Write a log entry to STDOUT or into a file or database.
-
Create and use templates controlled by the environment (i.e. template engine is built-in; no need for an external, third-party template tool).
-
Exploit the power of Java and third-party Java libraries (.jar files) to do anything you need to do in your specific context.
Here's a simple example of using a Java expression to define the text content of node OS_name:
[OS_name ^[ins_exp System.getProperty("os.name")]]
-
Document Validation (Incubating)
Validate PDML documents via types and schemas.
Note
While the above features are all implemented in the PDML reference implementation, some are not yet documented on the PDML website.
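To make the idea behind the Constants extension more concrete, here is a minimal Python sketch that resolves the ^[const ...] and ^[ins ...] forms shown in the example above. It is only an illustration: the function name expand_constants and the regex-based approach are assumptions made for this sketch, it handles only the simple quoted form, and it says nothing about how the Java reference implementation actually works.

```python
import re

# Hypothetical helper, not part of any PDML tool: resolves the simple
# constant syntax from the Constants example with regular expressions.
CONST_DEF = re.compile(r'\^\[const\s+(\w+)\s*=\s*"([^"]*)"\]\s*')
CONST_INS = re.compile(r'\^\[ins\s+(\w+)\]')


def expand_constants(text: str) -> str:
    constants = {}

    def remember(match):
        constants[match.group(1)] = match.group(2)
        return ""  # the definition itself produces no output

    # First pass: collect the definitions and strip them from the text.
    without_defs = CONST_DEF.sub(remember, text)
    # Second pass: replace each insertion with the stored value.
    # Inserting an undefined constant raises KeyError.
    return CONST_INS.sub(lambda m: constants[m.group(1)], without_defs)


source = (
    '[URLs ^[const docs_URL="https://www.example.com:8080/public/resources/docs/"] '
    "[about_URL ^[ins docs_URL]about.html]]"
)
print(expand_constants(source))
```

A real implementation would resolve constants while parsing rather than with text substitution, so that escaped characters and nested nodes are handled correctly.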
What Can I Do With PDML?
In a nutshell, you can:
-
Use PDML to store/transmit data/markup.
-
Launch a FOSS command line tool to manipulate and convert PDML files in various ways.
-
Use the PDML reference implementation (.jar file) in your own JVM projects (Java, Kotlin, Scala, etc.).
-
Create your own PDML parser or assets.
The following sub-sections provide more details. You can skip them for now and read them later if you wish.
Human-Friendly Data Storage and Delivery
You can use the PDML text format to edit data and/or markup (using any editor/IDE), save it in a portable, human-readable .pdml file, store it in a local or remote database, and transfer it over networks.
PDML enables you to encode all kinds of data:
-
scalar values (strings, numbers, booleans, etc.)
-
structured data such as lists, maps (key-value pairs), records that can be multi-nested, tables, simple databases, tree structures, configuration data
-
unstructured, heterogeneous, or polymorphic data
-
markup
You'll find examples of data and markup encoding in Core PDML Examples.
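As a rough illustration of how nested records map onto PDML nodes, the following Python sketch serializes a dictionary into single-line Core PDML text. The names escape_text and to_pdml are hypothetical. The bracket escapes follow the \[ and \] forms shown in the String Literals example, while escaping the backslash itself is an assumption of this sketch. Lists are left out on purpose, because how to name the list items is a modelling choice.

```python
def escape_text(value) -> str:
    """Backslash-escape characters that would otherwise be read as syntax."""
    text = str(value)
    # Escape the backslash first so the escapes added below are not doubled.
    for char in ("\\", "[", "]"):
        text = text.replace(char, "\\" + char)
    return text


def to_pdml(name: str, value) -> str:
    """Render a scalar or a nested dict as a single-line PDML node."""
    if isinstance(value, dict):
        children = " ".join(to_pdml(key, child) for key, child in value.items())
        return f"[{name} {children}]"
    return f"[{name} {escape_text(value)}]"


config = {
    "name": "Office Server",
    "address": {"ip": "192.168.1.1", "port": 8080},
}
print(to_pdml("server_config", config))
```

Note that the number 8080 ends up as plain text in the output, which matches the earlier remark that numbers and boolean values are stored as text.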
PDML Companion
You can immediately download the PDML Companion (a FOSS command line tool running on Linux, MacOS, and Windows) to:
-
Convert a PDML document into a standalone HTML document (CSS included) to display an expandable/collapsible tree view of the PDML data/markup structure.
-
Convert a PDML document that relies on extensions (optional features) into a simplified, static version that uses only Core PDML.
To see why this is useful, consider the following PDML document that uses three optional features (extensions): a Unicode escape sequence, a comment, and a Java expression:
File input.pdml
[product
    [id 123]
    [quality \u{1F44D 1F44D}]
    ^// Use a Java expression to compute the price
    [price ^[ins_exp 100 * 1.05]]
]
To convert it to a document that depends only on Core PDML (and will therefore be compatible with any PDML implementation), you run the following command in a terminal:
pdml p2c input.pdml output.pdml
The resulting document will look like this:
File output.pdml
[product
    [id 123]
    [quality 👍👍]
    [price 105]
]
As you can see, the Unicode escape sequence has been expanded, the comment has been removed, and the embedded Java source code expression (100 * 1.05) in the original document has been evaluated and replaced with 105 in the target document.
The result is output.pdml, a transpiled, static, and standalone version of the source document. It's well-suited for distribution, since it can now be parsed efficiently with any PDML-compliant parser, including those that don't support PDML extensions.
-
Convert a PDML document into JSON or XML, or JSON/XML into PDML.
-
Parse a PDML document into an abstract syntax tree (AST) and then invoke custom Java source code to programmatically explore its AST and do with it whatever you wish: search for nodes, create a report/summary, transform the AST into anything else, etc.
-
Call PDML Companion from scripts (e.g. .cmd/.bat/.ps1 or .sh files) and automated builds to create customized one-click tasks/pipelines carrying out complex operations involving multiple tools.
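To give a feel for one of the transformations performed when converting input.pdml to Core PDML, the Python sketch below expands Unicode escape sequences of the form \u{...} with one or more space-separated hexadecimal code points, as used in the example above. The function name expand_unicode_escapes is hypothetical, and the sketch is a plain text substitution: it does not handle an escaped backslash in front of the sequence, and it is not the algorithm used by the PDML Companion.

```python
import re

# Matches \u{...} containing one or more space-separated hex code points.
UNICODE_ESCAPE = re.compile(r"\\u\{([0-9A-Fa-f]+(?: +[0-9A-Fa-f]+)*)\}")


def expand_unicode_escapes(text: str) -> str:
    def decode(match):
        # Convert every hex code point in the braces to its character.
        return "".join(chr(int(code, 16)) for code in match.group(1).split())

    return UNICODE_ESCAPE.sub(decode, text)


print(expand_unicode_escapes(r"[quality \u{1F44D 1F44D}]"))
```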
Java Library
If you work on a JVM-based project (e.g. an application written in Java, Kotlin, Scala, Clojure, or any other language targeting the JVM), you can add the FOSS PDML reference implementation (a .jar file) as a dependency to your project, and then use all the functionality available:
-
Programmatically execute any PDML Companion command (mentioned earlier) by calling an API function (e.g. convert to/from JSON/XML).
-
Parse a PDML document into an AST, or listen to events while a PDML document is parsed.
-
Programmatically create PDML documents and ASTs.
-
Explore, validate, change, and transform PDML ASTs, e.g.:
-
traverse a tree or sub-tree to search/filter nodes
-
validate data encoded in PDML
-
programmatically add, remove, and change AST nodes
-
convert a tree or sub-tree to other formats, such as JSON and XML
-
Programmatically serialize/deserialize objects in memory (support for automatic serialization/deserialization is not yet available).
Note
The PDML reference implementation is not yet available in the Maven repository; you need to download the .jar file and add it to your project.
Create Your Own Implementation
You can also read the Core PDML Specification and write your own PDML parser in your preferred language, and (if you wish) implement PDML extensions.
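To suggest how small such a parser can be, here is a Python sketch of a recursive-descent parser for a simplified subset of the syntax used in this overview: a node is an opening bracket, a name, optional content (text and child nodes), and a closing bracket, with a backslash escaping the next character. The function name parse_node and the tuple-based tree are assumptions made for this sketch. It does not claim to follow the Core PDML Specification, which remains the authoritative source for name rules, escape sequences, and whitespace handling.

```python
def parse_node(src: str, pos: int = 0):
    """Parse one node starting at src[pos]; return (node, next_position).

    A node is represented as (name, children), where each child is either
    a text string or another node tuple.
    """
    if pos >= len(src) or src[pos] != "[":
        raise ValueError(f"expected '[' at position {pos}")
    pos += 1

    # The node name runs up to the first whitespace character or bracket.
    start = pos
    while pos < len(src) and not src[pos].isspace() and src[pos] not in "[]":
        pos += 1
    name = src[start:pos]
    if not name:
        raise ValueError(f"missing node name at position {start}")

    # Skip the single separator between the name and the content.
    if pos < len(src) and src[pos].isspace():
        pos += 1

    children, text = [], []
    while pos < len(src):
        char = src[pos]
        if char == "\\" and pos + 1 < len(src):
            text.append(src[pos + 1])  # the escaped character is kept literally
            pos += 2
        elif char == "[":
            if text:
                children.append("".join(text))
                text = []
            child, pos = parse_node(src, pos)
            children.append(child)
        elif char == "]":
            if text:
                children.append("".join(text))
            return (name, children), pos + 1
        else:
            text.append(char)
            pos += 1
    raise ValueError(f"node '{name}' is not closed")


tree, _ = parse_node("[p We can write text in [b bold] or [i italic].]")
print(tree)
```

The sketch keeps whitespace-only text between child nodes as text children. A conforming parser would apply the whitespace rules defined in the specification instead.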
How Does PDML Compare to XML and JSON?
In a nutshell:
-
PDML is less verbose than XML and JSON.
-
Unlike JSON, PDML is suitable for encoding markup.
-
Like JSON and XML (but unlike YAML and some other formats), PDML is suitable for deeply nested data structures.
-
Core PDML (without extensions) is easier to parse than XML and JSON.
-
PDML provides the aforementioned unique set of standardized extensions. JSON and XML also support Unicode escape sequences, but JSON requires surrogate pairs for Unicode code points greater than U+FFFF. Some PDML extensions (e.g. comments and attributes) are also supported in XML, but in different ways.
Note
The above claims (excluding extensions) are explained and exemplified in "Suggestion For a Better XML/HTML Syntax".
The PDML homepage presents a simple example comparing the XML, JSON, and PDML syntaxes. You can find more examples in Core PDML Examples.
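The remark about surrogate pairs can be checked with Python's standard json module. This small sketch escapes the 😀 character (U+1F600), which the PDML Unicode extension writes as \u{1F600}. It shows only the JSON side and involves no PDML tooling.

```python
import json

smiley = "\U0001F600"  # 😀, a code point above U+FFFF

# With ASCII-only output (the default), json.dumps escapes the character
# as a UTF-16 surrogate pair: two \uXXXX sequences instead of one.
escaped = json.dumps(smiley)
print(escaped)

# Decoding the surrogate pair yields the original single character again.
assert json.loads(escaped) == smiley
```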
How Mature Is PDML?
The Core PDML Specification is stable — major (backwards incompatible) changes in the future are unlikely.
All other assets mentioned on the PDML website (implementations, tools, extensions, documentation, etc.) are more or less "under construction." Unlike XML and JSON, PDML is still in its early days.
Current development focuses on finalizing extensions, enhancing and completing documentation, and providing support for editors and IDEs.
How Can I Contribute to the PDML Project?
In order to contribute to the PDML project and its ecosystem you could:
-
Report a bug by posting an issue, submit a pull request, or start a discussion to provide feedback and suggestions.
-
Help improve the website and docs (repo on GitHub).
-
Write a PDML parser in your preferred language, and add a link to it on the PDML website.
-
Create and publish a PDML asset. For example, it would be great to have a Tree-sitter parser for PDML, because that parser could then be used to:
-
Add flawless PDML syntax highlighting and other useful features to editors and IDEs supporting Tree-sitter.
-
Create Tree-sitter ASTs (representing PDML documents) in all languages that have bindings for Tree-sitter.
-
Spread the word about PDML.
Which Software Assets Are Available?
The following PDML-related software assets are currently available:
-
PDML reference implementation, written in Java, supporting all PDML extensions (GPLv2 license).
-
Simple, minimalist Core PDML parser, written in Java (MIT License).
-
PDML parser written in V, created by Subhomoy Haldar (MIT License).
-
Sublime PDML for the Sublime Text editor, created by Tristano Ajmone (MIT License).
-
PDML Language Support for the VS Code editor, created by Shelby Landen (MIT License).
Note
To suggest additional assets for inclusion in the above list, please send a pull request (website repo), open a discussion on the website repo, or send an email to < chris {at} pml-lang {dot} dev >.
Do I Have to Pay to Use PDML?
No. PDML is free of charge for everyone (including commercial entities/projects) — no strings attached. The reference implementation is free and open source software (FOSS). In other words: PDML is free as in beer (gratis) and free as in speech (libre).
What Are the Origins of PDML?
PDML originated along with the Practical Markup Language (PML), a lightweight markup specification and FOSS command line tool for writing styled text documents, rendered as HTML.
Here's a condensed version of the timeline:
-
2018: creation of the PDML syntax (initially called PML syntax).
-
2019: publication of the new syntax in We Need a New Document Markup Language - Here is Why.
-
2020: PML published at pml-lang.dev.
-
2022: PDML published at pdml-lang.dev.
Note
All documents on the PDML and PML websites (including this one) are written in PML, which uses PDML under the hood (i.e. PML, written in Java, depends on the PDML .jar library). Hence, all PDML extensions are also available in PML, and in any other JVM project that uses the PDML reference implementation as a third-party library.
Who Created PDML?

"Greetings! I'm Christian Neumanns, from Luxembourg.
You can read more about me, or contact me at: < chris {at} pml-lang {dot} dev >."