How to define Tags


Define the tags to which a specific action is applied. The Tags combo box offers several ways to do this.

Check tag types

In the combo box, simply check the tag types you want to process.

Predefined tag sets

Each action offers predefined subsets of tags, defined by a regex or a template. For example, Empty tags without content processes every tag that has no page content.

Define tags by Regex

Tag types can be defined with a regular expression. Enter the regular expression directly into the Tags combo box, or select one of the predefined or saved regular expressions.

To save a favorite regex, use Add Regex, available by right-clicking in the combo box or by clicking the menu icon.

For example:

All tags:

.*

Tags with the BBox attribute:

^Figure$|^Formula$|^Form$|^Table$

Note and TH tags:

^Note$|^TH$

Figure and Formula tags:

^Figure$|^Formula$

You can test a regex at https://regex101.com/.
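You can also check the example patterns locally. The following is a minimal Python sketch, assuming the regex is applied to the tag type as a search (which is why the examples anchor each alternative with ^ and $). The list of tag types is a hypothetical sample, and `matching_tags` is an illustrative helper, not part of PDFix.

```python
import re

# Hypothetical sample of structure tag types to test the patterns against.
TAG_TYPES = ["Figure", "Formula", "Form", "Table", "TH", "TD", "Note", "P", "H1"]

# Regex examples from the section above.
PATTERNS = {
    "All tags": r".*",
    "Tags with the BBox attribute": r"^Figure$|^Formula$|^Form$|^Table$",
    "Note and TH tags": r"^Note$|^TH$",
    "Figure and Formula tags": r"^Figure$|^Formula$",
}

def matching_tags(pattern, tag_types):
    """Return the tag types matched by pattern, assuming search semantics."""
    regex = re.compile(pattern)
    return [t for t in tag_types if regex.search(t)]

for name, pattern in PATTERNS.items():
    print(f"{name}: {matching_tags(pattern, TAG_TYPES)}")

# Without anchors, "Form" also matches "Formula".
print("Unanchored Form:", matching_tags(r"Form", TAG_TYPES))
```

The last line shows why the anchors matter: an unanchored `Form` selects both Form and Formula tags.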

Define tags by Template

The template option lets you define more complex queries for tags. Select Add Template from the combo box menu.

In the Edit template dialog, define the conditions in the tag_update node that determine which tags are processed. Conditions are defined the same way as in the Template panel.

Below are examples of commonly used templates. In the Edit template dialog, choose the Plain Template option and replace the code with one of the examples below.

Example 1: Empty tags without content

{
    "template": {
        "tag_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_tag_type": {
                                "$regex": "^(?!H$|H\\d$|TH$|TD$|TR$|LBody$).*"
                            }
                        },
                        {
                            "$0_has_content": "false"
                        }
                    ],
                    "param": [
                        "pds_struct_elem"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}
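The $0_tag_type condition in Example 1 uses a negative lookahead to skip headings, table cells and rows, and list bodies. The Python sketch below, assuming standard regex search semantics, shows two details: the JSON escape `\\d` decodes to the regex `\d`, and each `$` inside the lookahead excludes only exact names, so TH is skipped but THead is not. It checks only the tag-type part of the query, not the $0_has_content condition; `is_candidate` is a hypothetical helper.

```python
import json
import re

# The "$regex" value exactly as it appears in the Example 1 JSON.
json_fragment = r'{"$regex": "^(?!H$|H\\d$|TH$|TD$|TR$|LBody$).*"}'
pattern = json.loads(json_fragment)["$regex"]  # JSON "\\d" decodes to regex \d

TAG_TYPE_FILTER = re.compile(pattern)

def is_candidate(tag_type):
    """True if the tag type passes the $0_tag_type condition of Example 1."""
    return TAG_TYPE_FILTER.search(tag_type) is not None

for tag in ["H", "H1", "TH", "THead", "TD", "LBody", "P", "Figure"]:
    print(f"{tag:6} -> {'checked for content' if is_candidate(tag) else 'skipped'}")
```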

Example 2: Tags with empty spaces

{
    "template": {
        "tag_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_text": {
                                "$regex": "^ *$"
                            }
                        }
                    ],
                    "param": [
                        "pds_struct_elem"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}
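The pattern `^ *$` in Example 2 matches text that is empty or consists only of ordinary space characters (U+0020). It does not match tabs or no-break spaces. The Python sketch below compares it with a broader `^\s*$` variant. That variant is an assumption for illustration, not taken from the source, and PDFix's regex engine may treat `\s` differently from Python.

```python
import re

SPACES_ONLY = re.compile(r"^ *$")      # regex from Example 2: only U+0020 spaces
ANY_WHITESPACE = re.compile(r"^\s*$")  # hypothetical broader variant

samples = {
    "empty": "",
    "three spaces": "   ",
    "tab": "\t",
    "no-break space": "\u00a0",
    "word": " a ",
}

for label, text in samples.items():
    print(f"{label:15} spaces-only={bool(SPACES_ONLY.search(text))} "
          f"any-whitespace={bool(ANY_WHITESPACE.search(text))}")
```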

Example 3: TD cells with specific font name and size

{
    "template": {
        "tag_update": [
            {
                "query": {
                    "$and": [
                        {
                            "$0_tag_type": {
                                "$regex": "^TD$"
                            }
                        },
                        {
                            "$0_font_name": {
                                "$regex": "AvenirNextLTPro-Demi"
                            }
                        },
                        {
                            "$0_font_size": "9"
                        }
                    ],
                    "param": [
                        "pds_struct_elem"
                    ]
                },
                "statement": "$if"
            }
        ]
    }
}
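To illustrate how the conditions in Example 3 combine, here is a hypothetical Python evaluator for the "$and" list. It is not PDFix's implementation. It assumes that `$regex` values are applied as searches and that plain values such as "9" are compared as strings. The tag property dictionaries are invented sample data, with keys that mirror the template variables. Because the font-name regex has no anchors, it also matches a subset-prefixed embedded font name.

```python
import re

# Example 3 query, transcribed from the JSON above.
QUERY = {
    "$and": [
        {"$0_tag_type": {"$regex": "^TD$"}},
        {"$0_font_name": {"$regex": "AvenirNextLTPro-Demi"}},
        {"$0_font_size": "9"},
    ]
}

def condition_holds(condition, props):
    """Check one {"key": value} condition against a dict of tag properties."""
    key, expected = next(iter(condition.items()))
    actual = props.get(key, "")
    if isinstance(expected, dict) and "$regex" in expected:
        # Assumed search semantics: unanchored patterns match substrings.
        return re.search(expected["$regex"], actual) is not None
    return actual == expected  # plain values compared as strings (assumption)

def query_matches(query, props):
    """Evaluate an "$and" list: every condition must hold."""
    return all(condition_holds(c, props) for c in query["$and"])

# Hypothetical tag properties for a TD cell and a TH cell.
cell = {"$0_tag_type": "TD", "$0_font_name": "ABCDEF+AvenirNextLTPro-Demi", "$0_font_size": "9"}
header = {"$0_tag_type": "TH", "$0_font_name": "AvenirNextLTPro-Demi", "$0_font_size": "9"}

print(query_matches(QUERY, cell))    # True: unanchored regex accepts the subset prefix
print(query_matches(QUERY, header))  # False: ^TD$ rejects TH
```

If you need an exact font-name match, anchoring the pattern (for example `^AvenirNextLTPro-Demi$`) would exclude prefixed names, at the cost of missing subset-embedded fonts.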
