How to Define Content

Define the content (page objects) to which a specific action is applied. The Objects combo box offers several ways to do this.

[Screenshot: the artifact content dialog]

In the combo box, check the page object types you want to process.

[Screenshot: dialog listing the available page object types]

Each action includes predefined subsets of page objects, defined by regex or template. For example, Untagged Artifacts targets all untagged content, that is, page objects with neither an MCID nor an Artifact content mark.


  • The Define by Template option lets you create advanced queries for processing page objects in PDFix. To use it, select the Add Template option from the combo box menu.
    [Screenshot: options for defining content by template]
  • In the Edit Template dialog, you can define the conditions that determine which page objects are processed in the object_update node. Conditions are set the same way as in the Template Panel.
  • Templates for common use cases are provided below. Select the Plain Template option in the Edit Template dialog and replace the existing code with one of the examples.
    [Screenshot: the Edit Template dialog]

Example 1: Untagged content

{
    "template": {
        "object_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_artifact": "false"
                        },
                        {
                            "$0_mcid": "-1"
                        }
                    ],
                    "param": [
                        "pds_object"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}
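To make the selection logic of Example 1 concrete, here is a minimal Python sketch of the same rule. It is not PDFix code and does not call any PDFix API: the PageObject class and its is_artifact and mcid fields are hypothetical stand-ins that mirror the $0_artifact and $0_mcid properties in the template, assuming that an MCID of -1 means "no MCID assigned", as the template suggests.

```python
from dataclasses import dataclass


@dataclass
class PageObject:
    """Hypothetical stand-in for a PDF page object (not a PDFix type)."""
    is_artifact: bool  # mirrors "$0_artifact" in the template
    mcid: int          # mirrors "$0_mcid"; -1 is assumed to mean "no MCID"


def is_untagged(obj: PageObject) -> bool:
    # Both entries of the "$and" list must hold for the object to be selected.
    return (not obj.is_artifact) and obj.mcid == -1


objects = [
    PageObject(is_artifact=False, mcid=-1),  # untagged: selected
    PageObject(is_artifact=True, mcid=-1),   # already an artifact: skipped
    PageObject(is_artifact=False, mcid=3),   # already tagged: skipped
]

selected = [obj for obj in objects if is_untagged(obj)]
print(len(selected))
```

Only the first object passes both conditions, which corresponds to the Untagged Artifacts subset described earlier.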

Example 2: Content that should be marked as Header

This template selects page objects whose bounding box sits more than 700 px above the bottom of the page ($0_bottom greater than 700). Such objects are typically headers that should be marked as artifacts.

To leave the first instance of repeated content out of the selection, for example so that it can be tagged instead, use the modified template in Example 2.1, which skips the first page.

{
    "template": {
        "object_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_bottom": {
                                "$gt": "700"
                            }
                        }
                    ],
                    "param": [
                        "pds_object"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}
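If you maintain several of these templates, generating them from a script can help avoid copy-and-paste mistakes. The sketch below is one possible way to do that in Python. The build_template helper is a hypothetical name, not part of PDFix, and it assumes only the JSON structure shown in the examples on this page, including the convention that values are written as strings.

```python
import json


def build_template(conditions):
    """Wrap a list of condition dicts in the structure used by the examples."""
    return {
        "template": {
            "object_update": [
                {
                    "query": {
                        "$and": conditions,
                        "param": ["pds_object"],
                    },
                    "statement": "$if",
                }
            ]
        }
    }


# Same selection as the header example: bounding box above 700.
header = build_template([{"$0_bottom": {"$gt": "700"}}])

# Same selection, restricted to pages after the first one.
header_skip_first = build_template([
    {"$0_bottom": {"$gt": "700"}},
    {"$page_num": {"$gt": "1"}},
])

# Paste the printed JSON into the Plain Template editor.
print(json.dumps(header, indent=4))
```

Adjust the threshold to the page size and layout of your own documents; 700 is only the value used in this example.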

Example 2.1: Content that will be marked as Header on each page except the first one

{
    "template": {
        "object_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_bottom": {
                                "$gt": "700"
                            }
                        },
                        {
                            "$page_num": {
                                "$gt": "1"
                            }
                        }
                    ],
                    "param": [
                        "pds_object"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}

Example 3: Content that should be marked as Footer

This template selects page objects whose bounding box lies less than 100 px from the bottom of the page ($0_top less than 100). Such objects are typically footers that should be marked as artifacts.

To leave the last instance of repeated content out of the selection, for example so that it can be tagged instead, use the modified template in Example 3.1, which skips the last page.

{
    "template": {
        "object_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_top": {
                                "$lt": "100"
                            }
                        }
                    ],
                    "param": [
                        "pds_object"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}

Example 3.1: Content that will be marked as Footer on each page except the last one

{
    "template": {
        "object_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_top": {
                                "$lt": "100"
                            }
                        },
                        {
                            "$page_num": {
                                "$lt": "$doc_num_pages"
                            }
                        }
                    ],
                    "param": [
                        "pds_object"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}
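The page conditions in Examples 2.1 and 3.1 can be checked with a short Python sketch. It is an illustration only, not PDFix code. It assumes that $page_num is 1-based, which is inferred from the "except the first one" heading rather than confirmed here, and the function names are hypothetical.

```python
def header_pages(num_pages):
    """Pages matched by the Example 2.1 condition: page_num > 1."""
    return [page for page in range(1, num_pages + 1) if page > 1]


def footer_pages(num_pages):
    """Pages matched by the Example 3.1 condition: page_num < doc_num_pages."""
    return [page for page in range(1, num_pages + 1) if page < num_pages]


print(header_pages(4))  # [2, 3, 4]
print(footer_pages(4))  # [1, 2, 3]
print(header_pages(1))  # []
```

Under that assumption, neither condition matches anything in a single-page document, so no content would be selected there.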
