How to Define Content

Define the content (page objects) to which a specific action is applied. The Objects combo box offers several ways to do this.

[Screenshot: artifact content dialog on the document]

In the combo box, check the page object types you want to process.

[Screenshot: dialog window listing the different page object types]

Each action also includes predefined subsets of page objects, defined by a regex or a template. For example, Untagged Artifacts targets all untagged content, that is, page objects with neither an MCID nor an Artifact content mark.


  • The Define by Template option lets you create advanced queries for selecting page objects in PDFix. To use it, select the Add Template option from the combo box menu.
[Screenshot: options for defining content in a PDF by template]
  • In the Edit Template dialog, you define the conditions that determine which page objects are processed, inside the object_update node. Conditions are set the same way as in the Template Panel.
  • Templates for common use cases are provided below. Select the Plain Template option in the Edit Template dialog and replace the existing code with one of the examples.
[Screenshot: Edit Template dialog]
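
If you prefer to generate the template text rather than type it by hand, the structure used in the examples below can be assembled with a few lines of Python. This is only a sketch: the helper name build_template is hypothetical, it uses nothing but the standard json module, and it does not call any PDFix API. It reproduces the conditions of Example 1 and prints text that can be pasted into the Plain Template editor.

```python
import json


def build_template(conditions, param="pds_object", statement="$if"):
    """Wrap a list of condition dicts in the object_update structure
    used by the examples on this page (hypothetical helper)."""
    return {
        "template": {
            "object_update": [
                {
                    "query": {"$and": list(conditions), "param": [param]},
                    "statement": statement,
                }
            ]
        }
    }


# Conditions copied from Example 1 (untagged content).
untagged = build_template([{"$0_artifact": "false"}, {"$0_mcid": "-1"}])

# Text to paste into the Plain Template editor.
template_text = json.dumps(untagged, indent=4)
print(template_text)
```

Keeping the conditions as a plain list makes it easy to add or drop one, such as a page number condition, without retyping the surrounding nodes.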

Example 1: Untagged content

{
    "template": {
        "object_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_artifact": "false"
                        },
                        {
                            "$0_mcid": "-1"
                        }
                    ],
                    "param": [
                        "pds_object"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}
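
Before pasting a hand-edited template into the Plain Template editor, it can help to confirm that it is still valid JSON and keeps the shape shown in Example 1. The sketch below is an assumption-based check, not a PDFix validator: the function name check_template is hypothetical, and the keys it requires (template, object_update, query, param, statement) are simply those that appear in the examples on this page. PDFix may accept other structures.

```python
import json


def check_template(text):
    """Return a list of problems found in a template string (empty if none).

    Only the shape seen in the examples on this page is checked.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg} at line {err.lineno}"]

    if not isinstance(data, dict):
        return ["top level is not a JSON object"]
    template = data.get("template")
    if not isinstance(template, dict):
        return ["missing 'template' object"]
    updates = template.get("object_update")
    if not isinstance(updates, list):
        return ["missing 'template.object_update' list"]

    problems = []
    for i, entry in enumerate(updates):
        if not isinstance(entry, dict):
            problems.append(f"object_update[{i}] is not an object")
            continue
        query = entry.get("query")
        if not isinstance(query, dict):
            problems.append(f"object_update[{i}] has no 'query' object")
        elif not isinstance(query.get("param"), list):
            problems.append(f"object_update[{i}].query has no 'param' list")
        if "statement" not in entry:
            problems.append(f"object_update[{i}] has no 'statement'")
    return problems


# Example 1, written compactly.
EXAMPLE_1 = """
{"template": {"object_update": [{"query": {"$and": [{"$0_artifact": "false"},
 {"$0_mcid": "-1"}], "param": ["pds_object"]}, "statement": "$if"}]}}
"""

print(check_template(EXAMPLE_1))           # well-formed template
print(check_template('{"template": {}}'))  # object_update list is missing
```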

Example 2: Content that should be marked as Header

In this example, page objects whose bounding box lies more than 700 px above the bottom of the page ($0_bottom greater than 700) are processed. Such objects are typically headers that should be marked as artifacts.

To leave the first instance of repeated content available for tagging, use the modified template in Example 2.1, which adds a condition that skips the first page.

{
    "template": {
        "object_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_bottom": {
                                "$gt": "700"
                            }
                        }
                    ],
                    "param": [
                        "pds_object"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}

Example 2.1: Content that will be marked as Header on each page except the first one

{
    "template": {
        "object_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_bottom": {
                                "$gt": "700"
                            }
                        },
                        {
                            "$page_num": {
                                "$gt": "1"
                            }
                        }
                    ],
                    "param": [
                        "pds_object"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}

Example 3: Content that should be marked as Footer

In this example, page objects whose bounding box lies less than 100 px from the bottom of the page ($0_top less than 100) are processed. Such objects are typically footers that should be marked as artifacts.

To leave the last instance of repeated content available for tagging, use the modified template in Example 3.1, which adds a condition that skips the last page.

{
    "template": {
        "object_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_top": {
                                "$lt": "100"
                            }
                        }
                    ],
                    "param": [
                        "pds_object"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}

Example 3.1: Content that will be marked as Footer on each page except the last one

{
    "template": {
        "object_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_top": {
                                "$lt": "100"
                            }
                        },
                        {
                            "$page_num": {
                                "$lt": "$doc_num_pages"
                            }
                        }
                    ],
                    "param": [
                        "pds_object"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}
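
To see which objects the page conditions in Examples 2.1 and 3.1 would select, the queries can be replayed against a few made-up page objects. The Python sketch below is one reading of the query syntax based on the descriptions above, not PDFix's own evaluator. It assumes that $0_top and $0_bottom are the object's bounding box edges measured from the bottom of the page, that $page_num starts at 1, and that only $and, $gt and $lt need to be modeled. All function names and sample coordinates are hypothetical.

```python
def resolve(name, obj, ctx):
    """Look up a '$' variable: '$0_x' is read from the object, others from ctx."""
    if name.startswith("$0_"):
        return float(obj[name[3:]])
    return float(ctx[name[1:]])


def operand(raw, obj, ctx):
    """Right-hand side of a comparison: a '$' variable or a numeric string."""
    return resolve(raw, obj, ctx) if raw.startswith("$") else float(raw)


def matches(query, obj, ctx):
    """True if every '$and' condition holds; only '$gt' and '$lt' are modeled."""
    for condition in query["$and"]:
        for variable, tests in condition.items():
            actual = resolve(variable, obj, ctx)
            for op, raw in tests.items():
                expected = operand(raw, obj, ctx)
                if op == "$gt":
                    ok = actual > expected
                elif op == "$lt":
                    ok = actual < expected
                else:
                    raise ValueError(f"operator not modeled: {op}")
                if not ok:
                    return False
    return True


# The "$and" parts of Example 2.1 and Example 3.1.
HEADER_2_1 = {"$and": [{"$0_bottom": {"$gt": "700"}}, {"$page_num": {"$gt": "1"}}]}
FOOTER_3_1 = {"$and": [{"$0_top": {"$lt": "100"}},
                       {"$page_num": {"$lt": "$doc_num_pages"}}]}

# Made-up objects: one header-like and one footer-like object on each page.
DOC_NUM_PAGES = 3
objects = []
for page in range(1, DOC_NUM_PAGES + 1):
    objects.append({"name": f"header p{page}", "page": page, "top": 760, "bottom": 740})
    objects.append({"name": f"footer p{page}", "page": page, "top": 60, "bottom": 40})


def select(query):
    """Names of the sample objects that satisfy the query."""
    picked = []
    for obj in objects:
        ctx = {"page_num": obj["page"], "doc_num_pages": DOC_NUM_PAGES}
        if matches(query, obj, ctx):
            picked.append(obj["name"])
    return picked


print(select(HEADER_2_1))  # ['header p2', 'header p3']
print(select(FOOTER_3_1))  # ['footer p1', 'footer p2']
```

Under these assumptions, the header on page 1 and the footer on page 3 are left out of the selection, which matches the intent of keeping one instance of the repeated content for tagging.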
