How to define Content

Define the content (page objects) to which a specific action is applied. The Objects combo box offers several ways to do this.

Check content types

In the combo box, check the page object types you want to process.

Predefined contents

Each action has predefined subsets of page objects, defined by a regex or a template. For example, Untagged artifacts selects all untagged content: page objects without an MCID and without an Artifact content mark.

Define content by Template

The Define by template option lets you define a more complex query for page objects. Select the Add Template option from the combo box menu.

In the Edit template dialog, you define the conditions that determine which page objects are processed in the object_update node. Conditions are defined the same way as in the Template panel.

Below are examples of commonly used templates. In the Edit template dialog, choose the Plain Template option and replace the code with one of the examples below.

Example 1: Untagged content

{
    "template": {
        "object_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_artifact": "false"
                        },
                        {
                            "$0_mcid": "-1"
                        }
                    ],
                    "param": [
                        "pds_object"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}
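
Before pasting a template into the Plain Template editor, it can help to confirm that it is well-formed JSON with the expected nesting. The following is a minimal Python sketch of such a check. The function name `check_template_shape` is hypothetical, and the shape it checks (a `template` object holding an `object_update` list whose entries have `query` and `statement`) is inferred only from the examples on this page, not from an official PDFix schema.

```python
import json


def check_template_shape(text):
    """Return a list of problems found in a plain-template string (empty if none).

    The expected shape is inferred only from the examples on this page;
    it is an assumption, not the official PDFix template schema.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg} (line {err.lineno})"]

    if not isinstance(data, dict):
        return ["top level is not a JSON object"]
    template = data.get("template")
    if not isinstance(template, dict):
        return ['missing "template" object']
    updates = template.get("object_update")
    if not isinstance(updates, list):
        return ['"template" has no "object_update" list']

    problems = []
    for i, entry in enumerate(updates):
        if not isinstance(entry, dict):
            problems.append(f"object_update[{i}] is not an object")
            continue
        # Every entry in the examples carries both of these keys.
        for key in ("query", "statement"):
            if key not in entry:
                problems.append(f'object_update[{i}] is missing "{key}"')
    return problems


# Example 1 from this page, condensed onto fewer lines.
EXAMPLE_1 = """
{
    "template": {
        "object_update": [
            {
                "query": {
                    "$and": [
                        {"$0_artifact": "false"},
                        {"$0_mcid": "-1"}
                    ],
                    "param": ["pds_object"]
                },
                "statement": "$if"
            }
        ]
    }
}
"""

if __name__ == "__main__":
    print(check_template_shape(EXAMPLE_1) or "looks well-formed")
```

An empty result only means the text parses and has the same outline as the examples; it does not guarantee that PDFix accepts the template.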

Example 2: Content that should be marked as Header

In this example, every page object whose bounding box lies above 700 px, measured from the bottom of the page, is processed, typically to be marked as a header artifact.

In some cases, you may want to tag only the first instance of repeated content. For this, use the slightly modified template in Example 2.1.

{
    "template": {
        "object_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_bottom": {
                                "$gt": "700"
                            }
                        }
                    ],
                    "param": [
                        "pds_object"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}

Example 2.1: Content that will be marked as Header on each page except the first one

{
    "template": {
        "object_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_bottom": {
                                "$gt": "700"
                            }
                        },
                        {
                            "$page_num": {
                                "$gt": "1"
                            }
                        }
                    ],
                    "param": [
                        "pds_object"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}
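
To make the intended reading of such a query concrete, here is a small Python sketch that evaluates the `$and` list against a dictionary of property values. It is purely illustrative and is not how PDFix evaluates templates. The helper names (`COMPARE`, `resolve`, `matches`), the numeric treatment of string thresholds, and the lookup of `$name` values in a context dictionary are all assumptions made for this sketch.

```python
import operator

# Hypothetical mapping from operator names to Python comparisons;
# only "$gt" and "$lt" appear in the examples on this page.
COMPARE = {"$gt": operator.gt, "$lt": operator.lt}


def resolve(value, context):
    """Turn a template value into a number, looking up "$name" references."""
    if isinstance(value, str) and value.startswith("$"):
        value = context[value]
    return float(value)


def matches(query, context):
    """Return True if every condition under "$and" holds for this context."""
    for condition in query["$and"]:
        for key, expected in condition.items():
            if isinstance(expected, dict):
                # Comparison form, e.g. {"$0_bottom": {"$gt": "700"}}
                for op, limit in expected.items():
                    if not COMPARE[op](resolve(key, context), resolve(limit, context)):
                        return False
            elif str(context[key]).lower() != expected:
                # Equality form, e.g. {"$0_artifact": "false"}
                return False
    return True


# The query from Example 2.1.
EXAMPLE_2_1_QUERY = {
    "$and": [
        {"$0_bottom": {"$gt": "700"}},
        {"$page_num": {"$gt": "1"}},
    ],
    "param": ["pds_object"],
}

if __name__ == "__main__":
    # An object near the top of page 3 is selected; the same object on page 1 is not.
    print(matches(EXAMPLE_2_1_QUERY, {"$0_bottom": 742, "$page_num": 3}))
    print(matches(EXAMPLE_2_1_QUERY, {"$0_bottom": 742, "$page_num": 1}))
```

Under these assumptions, an object is selected only when both conditions hold, which is why the header on the first page is left untouched.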

Example 3: Content that should be marked as Footer

In this example, every page object whose bounding box lies below 100 px, measured from the bottom of the page, is processed, typically to be marked as a footer artifact.

In some cases, you may want to tag only the last instance of repeated content. For this, use the slightly modified template in Example 3.1.

{
    "template": {
        "object_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_top": {
                                "$lt": "100"
                            }
                        }
                    ],
                    "param": [
                        "pds_object"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}

Example 3.1: Content that will be marked as Footer on each page except the last one

{
    "template": {
        "object_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_top": {
                                "$lt": "100"
                            }
                        },
                        {
                            "$page_num": {
                                "$lt": "$doc_num_pages"
                            }
                        }
                    ],
                    "param": [
                        "pds_object"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}
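
Examples 2 to 3.1 differ only in the property, the comparison operator, the threshold, and the optional page condition. If you need several variants, a short script can generate them instead of editing the JSON by hand. The Python sketch below builds dictionaries with the same structure as those examples. The function `region_template` and its parameters are hypothetical names introduced here, and it uses only keys that already appear on this page.

```python
import json


def region_template(prop, op, limit, page_cond=None):
    """Build a template dict in the shape used by Examples 2 to 3.1.

    prop      -- property key, e.g. "$0_bottom" or "$0_top"
    op        -- comparison operator, e.g. "$gt" or "$lt"
    limit     -- threshold; stored as a string, as in the examples
    page_cond -- optional (op, value) pair applied to "$page_num"
    """
    conditions = [{prop: {op: str(limit)}}]
    if page_cond is not None:
        page_op, page_value = page_cond
        conditions.append({"$page_num": {page_op: str(page_value)}})
    return {
        "template": {
            "object_update": [
                {
                    "query": {"$and": conditions, "param": ["pds_object"]},
                    "statement": "$if",
                }
            ]
        }
    }


# The four templates shown in Examples 2, 2.1, 3, and 3.1.
header = region_template("$0_bottom", "$gt", 700)
header_skip_first = region_template("$0_bottom", "$gt", 700, ("$gt", 1))
footer = region_template("$0_top", "$lt", 100)
footer_skip_last = region_template("$0_top", "$lt", 100, ("$lt", "$doc_num_pages"))

if __name__ == "__main__":
    # Print JSON that can be pasted into the Plain Template editor.
    print(json.dumps(footer_skip_last, indent=4))
```

The thresholds 700 and 100 are taken from the examples; adjust them to the page size and layout of your own documents.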
