Creating PDFs with a correct tag structure is essential for accessibility, searchability, and repurposing content. However, automated layout recognition can misidentify complex document structures such as nested headings, tables, or captions.
This is where the Template Language comes in. By defining custom rules, you can correct these detection errors and make sure your PDFs are properly tagged: accessible to screen readers, compliant with standards like PDF/UA, and easier to navigate. Whether you’re working with technical documents, academic papers, or reports, the template language lets you fine-tune how elements are detected and tagged so that the resulting structure matches the document’s actual layout.
Key Features
- Flexible Rules: Define conditions based on element properties like font size, font name, color, position, and text content
- Custom Actions: Assign headings, flag elements, or tag images with alternate text
- Adaptable to Any Layout: Fine-tune detection and processing to match your document’s unique structure
How It Works
The template is organized into sections, each targeting specific types of elements (e.g., text lines, words, images). Each section contains an array of rules, where each rule defines conditions and actions.
Main Sections
- `element_update`
  - Updates elements (e.g., headers, footers) based on their position or properties
  - Example: Flag elements as headers if their bottom coordinate is greater than or equal to `743.58`
- `text_update`
  - Updates text elements based on font properties (e.g., size, name, color)
  - Example: Assign `h1` to text with font name `Monaco` and font size between `13.44` and `13.45`
- `word_update`
  - Updates word elements based on their properties (e.g., font size, position)
  - Example: Flag words as list items (`li_1`) if they are within a specific horizontal range and match a regex pattern
- `tag_image`
  - Assigns alternate text (`alt`) to images based on their bounding box and page number
  - Example: Tag an image with the alternate text `"Elsevier"` if it appears within a specific bounding box on page 1
- `pagemap`
  - Defines thresholds and parameters for layout recognition, such as line spacing, reading order, and text processing
  - Example: Set the maximum distance between lines (`text_line_distance_max_ratio`) to `2` for paragraph detection
- `pagemap_regex`
  - Contains regular expressions for recognizing patterns like bullets, labels, page numbers, and captions
  - Example: Use `regex_bullet` to detect bullet characters in lists
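The list-item rule described for `word_update` combines a horizontal range with a regex pattern such as `regex_bullet`. The minimal Python sketch below illustrates that kind of check. Everything in it is hypothetical: the bullet pattern, the horizontal band, the use of the word's left edge as its position, and the `is_list_marker` function. The actual `regex_bullet` value and range semantics in your template may differ.

```python
import re

# Hypothetical bullet/numbering pattern; the real regex_bullet value may differ.
# Matches a single bullet glyph (•, ◦, ▪, *, -) or a number followed by "." or ")".
BULLET_RE = re.compile(r"^(?:[\u2022\u25e6\u25aa*-]|\d+[.)])$")


def is_list_marker(text: str, left: float, x_min: float, x_max: float) -> bool:
    # A word counts as a list marker when its left edge falls inside the
    # horizontal band [x_min, x_max] and its text matches the bullet pattern.
    return x_min <= left <= x_max and BULLET_RE.match(text) is not None


# Hypothetical band: markers expected between x = 70 and x = 80 points.
print(is_list_marker("\u2022", 72.0, 70.0, 80.0))   # True
print(is_list_marker("3.", 75.5, 70.0, 80.0))       # True
print(is_list_marker("Results", 72.0, 70.0, 80.0))  # False
print(is_list_marker("\u2022", 200.0, 70.0, 80.0))  # False
```

Requiring both conditions keeps a bullet-like character in the middle of a paragraph from being flagged as a list item.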
Examples
Example 1: Assign Heading 2 to Red Text with Font Size 14
```json
{
  "template": {
    "word_update": [
      {
        "heading": "h2",
        "query": {
          "$and": [
            {
              "$0_font_size": [
                { "$gte": "14" },
                { "$lte": "14" }
              ]
            },
            {
              "$0_fill_color": ["255", "0", "0"]
            }
          ],
          "param": ["pde_text"]
        },
        "statement": "$if"
      }
    ]
  }
}
```
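- Action: Assigns `h2` to text with font size `14` and red fill color. Setting `$gte` and `$lte` to the same value matches exactly that size. The `$0_fill_color` array lists the red, green, and blue components (`255`, `0`, `0`).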
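The toy Python matcher below shows how a query like this might be evaluated. It is a sketch of one plausible reading of the query, not the engine that actually processes templates, and it rests on four assumptions:

- `$and` requires every condition to hold.
- A list of `$gte`/`$lte` objects defines an inclusive numeric range.
- Any other list must equal the element's value component by component.
- String values such as `"14"` are compared as numbers.

The functions `matches` and `_match_condition`, and the element property dictionaries passed to them, are hypothetical.

```python
from typing import Any


def _num(value: Any) -> float:
    # Template values are written as strings ("14"), so compare them as numbers.
    return float(value)


def _match_condition(expected: list, actual: Any) -> bool:
    # A list of {"$gte": ...} / {"$lte": ...} objects is read as an inclusive range.
    if expected and all(isinstance(item, dict) for item in expected):
        value = _num(actual)
        for bound in expected:
            for op, limit in bound.items():
                if op == "$gte":
                    if value < _num(limit):
                        return False
                elif op == "$lte":
                    if value > _num(limit):
                        return False
                else:
                    raise ValueError(f"operator not handled in this sketch: {op}")
        return True
    # Any other list is compared component-wise, e.g. an RGB fill color.
    return len(expected) == len(actual) and all(
        _num(e) == _num(a) for e, a in zip(expected, actual)
    )


def matches(query: dict, props: dict) -> bool:
    # "$and": the element must satisfy every listed condition.
    # Other query keys (such as "param") are ignored by this sketch.
    for condition in query.get("$and", []):
        for key, expected in condition.items():
            if key not in props or not _match_condition(expected, props[key]):
                return False
    return True


# The query from Example 1, as a Python dict.
query = {
    "$and": [
        {"$0_font_size": [{"$gte": "14"}, {"$lte": "14"}]},
        {"$0_fill_color": ["255", "0", "0"]},
    ],
    "param": ["pde_text"],
}

print(matches(query, {"$0_font_size": 14.0, "$0_fill_color": [255, 0, 0]}))  # True
print(matches(query, {"$0_font_size": 14.0, "$0_fill_color": [0, 0, 255]}))  # False
print(matches(query, {"$0_font_size": 12.0, "$0_fill_color": [255, 0, 0]}))  # False
```

Under these assumptions, widening the range (for example `$gte` `"13.44"` and `$lte` `"13.45"`, as in the `text_update` example) tolerates small rounding differences in reported font sizes.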
Example 2: Flag Words with Font Size 5.1 as Labels
```json
{
  "template": {
    "word_update": [
      {
        "label": "label",
        "query": {
          "$and": [
            {
              "$0_font_size": [
                { "$gte": "5.1" },
                { "$lte": "5.1" }
              ]
            }
          ],
          "param": ["pde_word"]
        },
        "statement": "$if"
      }
    ]
  }
}
```
- Action: Flags words with font size `5.1` as labels.
Getting Started
To create your own template:
- Define the elements you want to detect (e.g., headers, footers, headings).
- Write rules using the query language to match elements based on their properties.
- Specify actions to apply to matching elements (e.g., assign headings, flag as labels).
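If you generate templates from a script rather than writing them by hand, the steps above can be sketched in Python. `font_size_rule` is a hypothetical helper, not part of the template language. It builds a dictionary with the same shape as the examples above. This sketch rebuilds Example 2 and prints it as JSON.

```python
import json


def font_size_rule(size: str, action: dict, param: str) -> dict:
    # Hypothetical helper: builds a word_update rule that matches one exact
    # font size, mirroring the shape of the examples above.
    return {
        **action,  # the action key/value, e.g. {"label": "label"} or {"heading": "h2"}
        "query": {
            "$and": [
                # Equal $gte and $lte bounds match exactly this size.
                {"$0_font_size": [{"$gte": size}, {"$lte": size}]},
            ],
            "param": [param],
        },
        "statement": "$if",
    }


template = {
    "template": {
        "word_update": [
            font_size_rule("5.1", {"label": "label"}, "pde_word"),
        ]
    }
}

print(json.dumps(template, indent=2))
```

Because each rule is plain data, you can append more rules to the `word_update` list (for example, one per label size) before saving the template.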
For a full template configuration, check out our [Complete Template Example](#).
Why Use the Template Language?
- Precision: Fine-tune detection rules to match your document’s layout.
- Automation: Automate the tagging and processing of PDF elements.
- Accessibility: Ensure compliance with accessibility standards by tagging elements correctly.
Learn More
Explore the full capabilities of the template language by visiting our [Template Language Guide](#) or downloading the [Sample Template](#).
Let us know if you have any questions or need assistance creating your custom template!