PDFix Batch Actions

Batch Actions

no-code PDF processing

PDFix Batch Actions offer easy, flexible, no-code processing of PDF documents, driven by a JSON configuration file.

Custom PDF actions automate PDF editing and resolve common accessibility issues, streamlining document editing and remediation without requiring programming skills. By combining actions into a custom sequence, you can process PDF documents to meet specific requirements. The following configuration, for example, clears the existing structure, re-tags the document, and sets the PDF/UA-1 identifier:

{
    "title": "PDFix Batch Action Example",
    "desc": "Custom action sequence to re-tag the PDF document and set PDF/UA-1 identifier",
    "actions": [
        {
            "name": "clear_structure",
            "params": [
                {
                    "name": "clear_tags",
                    "value": true
                },
                {
                    "name": "clear_struct_tree",
                    "value": true
                },
                {
                    "name": "clear_bookmarks",
                    "value": false
                }
            ]
        },
        {
            "name": "add_tags",
            "params": [
                {
                    "name": "standard_attrs",
                    "value": false
                },
                {
                    "name": "sequential_headings",
                    "value": true
                }
            ]
        },
        {
            "name": "set_pdf_ua_standard",
            "params": [
                {
                    "name": "part_number",
                    "value": "1"
                }
            ]
        }
    ]
}
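Because the configuration is plain JSON, it can also be generated from a script. The Python sketch below assumes only the structure shown above (`title`, `desc`, and an `actions` list of objects with a `name` and optional `params`); the `action` helper is hypothetical and not part of PDFix, and how the resulting file is passed to PDFix is outside its scope.

```python
import json


def action(name, **params):
    """Build one batch action entry in the name/params layout shown above.

    Hypothetical helper: PDFix only reads the resulting JSON.
    """
    entry = {"name": name}
    if params:
        # Keyword arguments keep their order (Python 3.7+), so params appear as written.
        entry["params"] = [{"name": key, "value": value} for key, value in params.items()]
    return entry


config = {
    "title": "PDFix Batch Action Example",
    "desc": "Custom action sequence to re-tag the PDF document and set PDF/UA-1 identifier",
    "actions": [
        action("clear_structure", clear_tags=True, clear_struct_tree=True, clear_bookmarks=False),
        action("add_tags", standard_attrs=False, sequential_headings=True),
        action("set_pdf_ua_standard", part_number="1"),
    ],
}

# json.dumps writes Python True/False as the JSON literals true/false.
print(json.dumps(config, indent=4))
```

Actions without parameters, such as `action("set_suspect_value")`, produce just `{"name": "set_suspect_value"}`.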

Accessibility

Set PDF/UA Standard

set_pdf_ua_standard

Set the PDF/UA part number

params:

  • part_number (string) Part Identifier – The part number of the International Standard that the file conforms to

    • 1 – PDF/UA-1
    • 2 – PDF/UA-2
  • rev_number (string) Rev Number – Four-digit year of the date of publication or revision. Ignored for part 1

example:

{
    "name": "set_pdf_ua_standard",
    "params": [
        {
            "name": "part_number",
            "value": "1"
        },
        {
            "name": "rev_number",
            "value": "2023"
        }
    ]
}

Set Suspect Value

set_suspect_value

Fix the Suspects entry in the document's MarkInfo dictionary (PDF/UA does not allow it to be true)

example:

{
    "name": "set_suspect_value"
}

Fix Optional Content

fix_oc_name

Fix the Name entry of optional content configuration dictionaries (PDF/UA requires each configuration dictionary to contain a Name)

example:

{
    "name": "fix_oc_name"
}

Fix Display Document Title

set_display_doc_title

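Set the DisplayDocTitle entry in the ViewerPreferences dictionary so that viewers display the document title instead of the file name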

example:

{
    "name": "set_display_doc_title"
}

Remove Standard Tags Mapping

remove_standard_tags_mapping

Remove standard tags mapping

example:

{
    "name": "remove_standard_tags_mapping"
}

Autotag

add_tags

Autotag document

params:

  • template (file_path) Template – Load the template from the file as the current template. If the path is empty, the default template is used

  • preflight (bool) Preflight – Preflight the document and combine the preflight values with the current template

  • standard_attrs (bool) Add Layout Attributes – Add all detected layout attributes

  • sequential_headings (bool) Sequential Heading Levels – Keep headings in sequentially-descending order

example:

{
    "name": "add_tags",
    "params": [
        {
            "name": "template",
            "value": ""
        },
        {
            "name": "preflight",
            "value": false
        },
        {
            "name": "standard_attrs",
            "value": false
        },
        {
            "name": "sequential_headings",
            "value": true
        }
    ]
}

Clear Document Structure

clear_structure

Clear document structure

params:

  • clear_tags (bool) Clear Content Marks – Clear content marks

  • clear_struct_tree (bool) Clear Structure Tree – Clear structure tree

  • clear_bookmarks (bool) Clear Bookmarks – Clear bookmarks

example:

{
    "name": "clear_structure",
    "params": [
        {
            "name": "clear_tags",
            "value": true
        },
        {
            "name": "clear_struct_tree",
            "value": true
        },
        {
            "name": "clear_bookmarks",
            "value": true
        }
    ]
}

Fix ID Tree

fix_id_tree

Fix ID tree

example:

{
    "name": "fix_id_tree"
}

Fix Parent Tree

fix_parent_tree

Fix parent tree

example:

{
    "name": "fix_parent_tree"
}

Annotations

Fix Media Clip

fix_media_clip_keys

Define the MIME type (CT key) of the file referenced by media clip annotations

params:

  • ct_key (string) Media Clip – CT key

    • text/plain – text/plain
    • text/html – text/html
    • image/jpeg – image/jpeg
    • audio/mp3 – audio/mp3
    • video/mp4 – video/mp4

example:

{
    "name": "fix_media_clip_keys",
    "params": [
        {
            "name": "ct_key",
            "value": "text/plain"
        }
    ]
}

Set Tab Order

set_tabs_key

Set the tab order key (Tabs) for every page. PDF/UA requires every page that contains annotations to have a Tabs key in its page dictionary with the value S (structure order).

params:

  • tabs_key (string) Tabs Key – Tabs key

  • overwrite (bool) Overwrite – Replace the current Tabs Key if it already exists

example:

{
    "name": "set_tabs_key",
    "params": [
        {
            "name": "tabs_key",
            "value": "S"
        },
        {
            "name": "overwrite",
            "value": true
        }
    ]
}

Tag Annotations

tag_annot

Tag untagged annotations. Each annotation is placed in the closest tag, if one is found.

params:

  • annot_types (annot) Annotations – Define the annotation types using a regular expression (https://regex101.com/) or the annot_update template

example:

{
    "name": "tag_annot",
    "params": [
        {
            "name": "annot_types",
            "value": "^(?!.*Popup).*$"
        }
    ]
}

Set Contents

set_annot_contents

Set an alternative description for an annotation in its Contents key.

params:

  • annot_types (annot) Annotations – Define the annotation types using a regular expression (https://regex101.com/) or the annot_update template

  • alt_type (int) Contents – Define a source for detecting the alternative text

    • 0 – Define the Custom Text
    • 1 – Use the text from annotation bounding box
    • 2 – Use the action destination
  • custom_text (string) Custom Text – Custom text to store in the Contents key

  • bbox_padding_x (float) BBox X padding – Bounding box padding in horizontal direction

  • bbox_padding_y (float) BBox Y padding – Bounding box padding in vertical direction

  • overwrite (bool) Overwrite – Replace the current alternative description if it already exists

example:

{
    "name": "set_annot_contents",
    "params": [
        {
            "name": "annot_types",
            "value": ".*$"
        },
        {
            "name": "alt_type",
            "value": 1
        },
        {
            "name": "custom_text",
            "value": "Decorative"
        },
        {
            "name": "bbox_padding_x",
            "value": 4
        },
        {
            "name": "bbox_padding_y",
            "value": 4
        },
        {
            "name": "overwrite",
            "value": false
        }
    ]
}

Flatten Annotations

flatten_annot

Flatten the visual appearance of annotations directly into the page content. Use this action to avoid annotation tagging problems when the document does not need to remain interactive after it is published as PDF/UA compliant.

params:

  • annot_types (annot) Annotations – Define the annotation types using a regular expression (https://regex101.com/) or the annot_update template

example:

{
    "name": "flatten_annot",
    "params": [
        {
            "name": "annot_types",
            "value": "^(?!.*Link|.*Widget|.*Popup).*$"
        }
    ]
}
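Both annot_types patterns above use a negative lookahead to exclude subtypes: `^(?!.*Popup).*$` matches any subtype that does not contain `Popup`, and the flatten_annot pattern also excludes `Link` and `Widget`. A pattern can be checked before a batch run with Python's `re` module, as in this sketch. It assumes PDFix matches the pattern against the annotation subtype name (for example `Link` or `Popup`), which this page does not state explicitly, and Python's regex dialect may differ from PDFix's engine in edge cases.

```python
import re

# Pattern from the flatten_annot example: every subtype except Link, Widget and Popup.
FLATTEN = re.compile(r"^(?!.*Link|.*Widget|.*Popup).*$")
# Pattern from the tag_annot example: every subtype except Popup.
TAG = re.compile(r"^(?!.*Popup).*$")

# A few common annotation subtypes to try the patterns on.
subtypes = ["Link", "Widget", "Popup", "Text", "Highlight", "Stamp"]

for subtype in subtypes:
    flatten = FLATTEN.match(subtype) is not None
    tag = TAG.match(subtype) is not None
    print(f"{subtype:10} flatten={flatten} tag={tag}")
```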

Create Web Links

create_web_links

Create link annotations from web links in the page content

params:

  • url_regex (string) URL – Regular expression used for searching web links

example:

{
    "name": "create_web_links",
    "params": [
        {
            "name": "url_regex",
            "value": "^(((http(s)?|ftp):\\/\\/)|(mailto:)|www.)[^\\s\\/$.?#].[^\\s]*"
        }
    ]
}
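In the JSON file every backslash in the pattern is doubled, so the regex engine actually receives `\/` and `\s`. The sketch below decodes the value exactly as written in the example with `json.loads` and tries it on a few candidate strings with Python's `re`. It is only a way to test the pattern; PDFix's own regex engine and how it splits page text into candidates are not described on this page.

```python
import json
import re

# The url_regex value copied verbatim from the JSON example, doubled backslashes included.
json_value = r'"^(((http(s)?|ftp):\\/\\/)|(mailto:)|www.)[^\\s\\/$.?#].[^\\s]*"'

# json.loads turns each "\\" into a single backslash, yielding the pattern the engine sees.
pattern = json.loads(json_value)
url_re = re.compile(pattern)

candidates = [
    "https://pdfix.net",
    "ftp://files.example.org/readme.txt",
    "mailto:info@example.com",
    "www.example.com",
    "example.com",  # no scheme and no "www" prefix, so no match
]

for text in candidates:
    m = url_re.match(text)
    print(f"{text!r:40} -> {m.group(0) if m else None}")
```

The `.` after `www` is unescaped, so it matches any character, not only a literal dot.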

Delete Annotations

delete_annot

Completely remove annotations from the PDF. Use this action when annotations of a specific subtype are not permitted in PDF/UA-compliant output, for example TrapNet.

params:

  • annot_types (annot) Annotations – Define the annotation types using a regular expression (https://regex101.com/) or the annot_update template

example:

{
    "name": "delete_annot",
    "params": [
        {
            "name": "annot_types",
            "value": "^TrapNet$"
        }
    ]
}

Bookmarks

Create Bookmarks

create_bookmarks

Create bookmarks from the tag tree hierarchy

params:

  • tag_1 (tag) Level 1 – Define the tag that represents the first level, using a regular expression (https://regex101.com/) or the tag_update template

  • tag_2 (tag) Level 2 – Define the tag that represents the second level

  • tag_3 (tag) Level 3 – Define the tag that represents the third level

  • tag_4 (tag) Level 4 – Define the tag that represents the fourth level

  • tag_5 (tag) Level 5 – Define the tag that represents the fifth level

  • tag_6 (tag) Level 6 – Define the tag that represents the sixth level

  • overwrite (bool) Overwrite – Replace the current bookmarks if they already exist

example:

{
    "name": "create_bookmarks",
    "params": [
        {
            "name": "tag_1",
            "value": "^H1$"
        },
        {
            "name": "tag_2",
            "value": "^H2$"
        },
        {
            "name": "tag_3",
            "value": "^H3$"
        },
        {
            "name": "tag_4",
            "value": "^H4$"
        },
        {
            "name": "tag_5",
            "value": "^H5$"
        },
        {
            "name": "tag_6",
            "value": "^H6$"
        },
        {
            "name": "overwrite",
            "value": false
        }
    ]
}

Content

Artifact Content

artifact_content

Mark defined content as an artifact

params:

  • object_types (object) Objects – Define the objects using the object_update template

  • artifact_type (int) Mark as – Mark the content as an Artifact, Header or Footer

    • 0 – Artifact
    • 1 – Header
    • 2 – Footer

example:

{
    "name": "artifact_content",
    "params": [
        {
            "name": "object_types",
            "value": {
                "template": {
                    "object_update": [
                        {
                            "query": {
                                "$and": [
                                    {
                                        "$0_artifact": "false"
                                    },
                                    {
                                        "$0_mcid": "-1"
                                    }
                                ],
                                "param": [
                                    "pds_object"
                                ]
                            },
                            "statement": "$if"
                        }
                    ]
                }
            }
        },
        {
            "name": "artifact_type",
            "value": 0
        }
    ]
}

Flatten Form XObjects

flatten_xobject

Flatten Form XObjects by placing their content directly into the page content

example:

{
    "name": "flatten_xobject"
}

Remove Content Marks

remove_content_marks

Remove artifact, MCID, or custom content marks from page content objects

params:

  • object_types (object) Objects – Define the objects using the object_update template

  • flags (flag) Remove – Types of content marks to be removed

    • 8 – Invalid Content Marks
    • 4 – Custom Content Marks

example:

{
    "name": "remove_content_marks",
    "params": [
        {
            "name": "object_types",
            "value": ".*"
        },
        {
            "name": "flags",
            "value": 8
        }
    ]
}
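The flag values are powers of two, which suggests they are bit flags that can be combined. Assuming that is the case (this page does not say so explicitly), removing both invalid and custom content marks would use 8 | 4 = 12. The sketch models the two documented values with Python's `enum.IntFlag`; the class and member names are illustrative, not part of PDFix.

```python
from enum import IntFlag


class ContentMarkFlags(IntFlag):
    """Illustrative names for the documented remove_content_marks flag values."""
    CUSTOM = 4   # Custom Content Marks
    INVALID = 8  # Invalid Content Marks


# Combine both flags with bitwise OR (assumes PDFix treats "flags" as a bit mask).
flags = ContentMarkFlags.INVALID | ContentMarkFlags.CUSTOM

action = {
    "name": "remove_content_marks",
    "params": [
        {"name": "object_types", "value": ".*"},
        {"name": "flags", "value": int(flags)},  # store a plain int in the JSON
    ],
}
print(action["params"][1])  # {'name': 'flags', 'value': 12}
```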

Conversion

PDF to HTML

pdf_to_html

Convert PDF to HTML

params:

  • input_pdf (file_path) Input PDF – Input PDF file path

  • output_html (file_path) Output HTML file path – Define the output HTML index file

  • html_type (int) HTML Layout – HTML layout type

    • 0 – Original layout
    • 1 – Responsive layout
    • 2 – Layout defined by PDF Tags
  • template (file_path) Template – Load the template from the file as the current template. If the path is empty, the default template is used

  • preflight (bool) Preflight – Preflight the document and combine the preflight values with the current template

  • flags (flag) Conversion Flags – Define conversion flags

    • 1 – Export JavaScripts
    • 2 – Export fonts
    • 4 – Default font sizes
    • 8 – Retain text color
    • 32 – Inline CSS
    • 64 – Inline JavaScript
    • 128 – Embed images
    • 256 – Embed fonts
    • 512 – Gray padding

example:

{
    "name": "pdf_to_html"
}
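The example above relies on default parameter values. A fuller entry might set the paths, the layout, and a combination of conversion flags. The sketch below builds one in Python; it assumes the flag values combine as a bit mask (they are all powers of two), the dictionary keys are illustrative names for the documented labels, and both file paths are hypothetical placeholders.

```python
import json

# Documented pdf_to_html conversion flag values (keys are illustrative names).
HTML_FLAGS = {
    "export_javascripts": 1,
    "export_fonts": 2,
    "default_font_sizes": 4,
    "retain_text_color": 8,
    "inline_css": 32,
    "inline_javascript": 64,
    "embed_images": 128,
    "embed_fonts": 256,
    "gray_padding": 512,
}

# Assumption: flags are combined with bitwise OR into one integer.
flags = HTML_FLAGS["inline_css"] | HTML_FLAGS["embed_images"] | HTML_FLAGS["embed_fonts"]

action = {
    "name": "pdf_to_html",
    "params": [
        {"name": "input_pdf", "value": "input/document.pdf"},   # hypothetical path
        {"name": "output_html", "value": "output/index.html"},  # hypothetical path
        {"name": "html_type", "value": 1},                      # 1 = Responsive layout
        {"name": "flags", "value": flags},                      # 32 + 128 + 256 = 416
    ],
}

print(json.dumps(action, indent=4))
```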

PDF to JSON

pdf_to_json

Convert PDF to JSON

params:

  • input_pdf (file_path) Input PDF – Input PDF file path

  • output_json (file_path) Output JSON file path – Define the output JSON file

  • flags (flag) Conversion Flags – Flags specifying the extracted content

    • 1 – Document Metadata
    • 2 – Page Information
    • 16 – Page Content
    • 32 – Document Structure Tree
    • 64 – Layout Recognition PageMap
    • 256 – Bounding Box
    • 512 – Content Marks
    • 4096 – Text Content
    • 8192 – Text Style
    • 16384 – Text State
    • 65536 – Images (base64)

example:

{
    "name": "pdf_to_json"
}
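Going the other way, a flags value in an existing configuration can be decoded into the documented options. This sketch again assumes bit-mask semantics, and `decode_flags` is a hypothetical helper; bits that do not correspond to a documented value are returned separately.

```python
# Documented pdf_to_json extraction flags.
JSON_FLAGS = {
    1: "Document Metadata",
    2: "Page Information",
    16: "Page Content",
    32: "Document Structure Tree",
    64: "Layout Recognition PageMap",
    256: "Bounding Box",
    512: "Content Marks",
    4096: "Text Content",
    8192: "Text Style",
    16384: "Text State",
    65536: "Images (base64)",
}


def decode_flags(value, table=JSON_FLAGS):
    """Return (names of documented flags set in value, leftover undocumented bits)."""
    names = [name for bit, name in sorted(table.items()) if value & bit]
    known_mask = sum(table)  # summing a dict adds up its keys
    unknown = value & ~known_mask
    return names, unknown


# Document metadata + structure tree + text content = 1 + 32 + 4096
print(decode_flags(1 | 32 | 4096))
```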

Fonts

Embed Fonts

embed_font

Embed font

example:

{
    "name": "embed_font"
}

Replace Font

replace_font

Replace font

params:

  • font_name (string) Font Name – The name of the PDF font to be replaced. Regular expressions are supported

  • font_family (system_font) Font Family – The font family name that will be used for replacement

example:

{
    "name": "replace_font",
    "params": [
        {
            "name": "font_name",
            "value": ""
        },
        {
            "name": "font_family",
            "value": ""
        }
    ]
}

Add Missing Unicodes

add_missing_unicode

Add missing Unicode mappings

example:

{
    "name": "add_missing_unicode"
}

Metadata

Set Document Properties

set_doc_info

Set document metadata and properties

params:

  • set_author (bool) Set Author – Set the document author

  • author (string) Author – Author

  • set_title (bool) Set Title – Set the document title

  • title (string) Title – Title

  • set_subject (bool) Set Subject – Set the document subject

  • subject (string) Subject – Subject

  • set_keywords (bool) Set Keywords – Set the document keywords

  • keywords (string) Keywords – Keywords

  • set_producer (bool) Set Producer – Set the document producer

  • producer (string) Producer – Producer name

  • set_creator (bool) Set Creator – Set the document creator

  • creator (string) Creator – Creator

example:

{
    "name": "set_doc_info",
    "params": [
        {
            "name": "set_author",
            "value": true
        },
        {
            "name": "author",
            "value": ""
        },
        {
            "name": "set_title",
            "value": true
        },
        {
            "name": "title",
            "value": ""
        },
        {
            "name": "set_subject",
            "value": true
        },
        {
            "name": "subject",
            "value": ""
        },
        {
            "name": "set_keywords",
            "value": true
        },
        {
            "name": "keywords",
            "value": ""
        },
        {
            "name": "set_producer",
            "value": true
        },
        {
            "name": "producer",
            "value": ""
        },
        {
            "name": "set_creator",
            "value": true
        },
        {
            "name": "creator",
            "value": ""
        }
    ]
}

Set PDF Version

set_pdf_version

Set the PDF version

params:

  • version_number (int) PDF Version – Version designations

    • 14 – PDF 1.4
    • 15 – PDF 1.5
    • 16 – PDF 1.6
    • 17 – PDF 1.7
    • 20 – PDF 2.0

example:

{
    "name": "set_pdf_version",
    "params": [
        {
            "name": "version_number",
            "value": 17
        }
    ]
}
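The version_number values follow a simple pattern: the major and minor version digits written as one integer (PDF 1.7 becomes 17). A small conversion helper, hypothetical and not part of PDFix, that also rejects versions not listed above:

```python
SUPPORTED = {14, 15, 16, 17, 20}  # values documented for set_pdf_version


def to_version_number(version: str) -> int:
    """Convert a version string such as "1.7" to the set_pdf_version value 17."""
    major, minor = version.split(".")
    number = int(major) * 10 + int(minor)
    if number not in SUPPORTED:
        raise ValueError(f"PDF {version} is not a documented set_pdf_version option")
    return number


print(to_version_number("1.7"))  # 17
print(to_version_number("2.0"))  # 20
```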

Set Language

set_language

Define the document language

params:

  • lang (lang) Language – Document language

  • apply_lang_to (int) Apply Language to – Select the objects to which you intend to apply the defined language

    • 0 – Document
    • 1 – Tag
    • 2 – Marked Content
  • overwrite (bool) Overwrite – Replace the current language if it already exists

example:

{
    "name": "set_language",
    "params": [
        {
            "name": "lang",
            "value": "en-US"
        },
        {
            "name": "apply_lang_to",
            "value": 0
        },
        {
            "name": "overwrite",
            "value": false
        }
    ]
}

Set Title

set_title

Set document title

params:

  • title_type (int) Title – Define a source for detecting the document title

    • 0 – Define the Custom Title
    • 1 – Use the first Description Tag
    • 2 – Retrieve the title from the file name
  • custom_text (string) Custom Title – Custom title

  • description_tag (string) Description Tag – Define the tags whose content is used as the title text

  • overwrite (bool) Overwrite – Replace the current title if it already exists

example:

{
    "name": "set_title",
    "params": [
        {
            "name": "title_type",
            "value": 2
        },
        {
            "name": "custom_text",
            "value": ""
        },
        {
            "name": "description_tag",
            "value": "Caption"
        },
        {
            "name": "overwrite",
            "value": false
        }
    ]
}

Pages

Rotate Page

rotate_pages

Rotate pages

params:

  • rotation_type (int) Rotation Type – Type of rotation

    • 0 – Set rotation angle
    • 1 – Rotate by angle
  • rotation_angle (int) Rotation Angle – Rotation angle

    • 0 – 0 degrees
    • 90 – 90 degrees
    • 180 – 180 degrees
    • 270 – 270 degrees
  • page_rotation (int) Pages With Rotation – Pages with defined rotation

    • 0 – 0 degrees
    • 90 – 90 degrees
    • 180 – 180 degrees
    • 270 – 270 degrees
  • page_orientation (int) Pages With Orientation – Pages with defined orientation

    • 0 – Pages With Any Orientation
    • 1 – Landscape Pages
    • 2 – Portrait Pages

example:

{
    "name": "rotate_pages",
    "params": [
        {
            "name": "rotation_type",
            "value": 1
        },
        {
            "name": "rotation_angle",
            "value": 0
        },
        {
            "name": "page_rotation",
            "value": 0
        },
        {
            "name": "page_orientation",
            "value": 0
        }
    ]
}
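The two rotation types differ in how the final page rotation is computed. A PDF page's Rotate value must be a multiple of 90, so rotating by an angle is a sum modulo 360, while setting an angle replaces the value. A minimal sketch of that arithmetic, assuming rotation_type behaves as its labels suggest (the function name is hypothetical):

```python
def new_rotation(current: int, angle: int, rotation_type: int) -> int:
    """Return the page Rotate value after applying rotate_pages.

    rotation_type 0 = set rotation angle, 1 = rotate by angle (labels from this page).
    """
    if angle % 90 != 0:
        raise ValueError("PDF page rotation must be a multiple of 90 degrees")
    if rotation_type == 0:
        return angle % 360              # replace the current value
    if rotation_type == 1:
        return (current + angle) % 360  # add to the current value
    raise ValueError(f"unknown rotation_type {rotation_type}")


print(new_rotation(270, 180, 1))  # 90: rotate by 180 from 270
print(new_rotation(270, 90, 0))   # 90: set to 90 regardless of the current value
```

With the example values above (rotation_type 1, rotation_angle 0), every page keeps its current rotation, so set a non-zero rotation_angle before using it.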

Table

Set Table Headers

set_table_header

Set table header cells

params:

  • tag_names (tag) Tags – Define the tag types using a regular expression (https://regex101.com/) or the tag_update template

  • scope (string) Scope – Reflects whether the header cell applies to the rest of the cells in the row, column or both that contain it

    • None – None
    • Row – Row
    • Column – Column
    • Both – Both
  • row_span (int) RowSpan – The number of rows in the enclosing table that shall be spanned by the cell. Use -1 to keep the existing RowSpan

  • col_span (int) ColSpan – The number of columns in the enclosing table that shall be spanned by the cell. Use -1 to keep the existing ColSpan

  • overwrite (bool) Overwrite – Replace the current TH properties if they already exist

example:

{
    "name": "set_table_header",
    "params": [
        {
            "name": "tag_names",
            "value": "^TD$"
        },
        {
            "name": "scope",
            "value": "None"
        },
        {
            "name": "row_span",
            "value": -1
        },
        {
            "name": "col_span",
            "value": -1
        },
        {
            "name": "overwrite",
            "value": false
        }
    ]
}

Set Table Summary

set_table_summary

Set a summary of the table’s purpose and structure. The summary applies only to Table tags

params:

  • tag_names (tag) Tags – Define the tag types using a regular expression (https://regex101.com/) or the tag_update template

  • summary_type (int) Summary – Define a source for detecting the summary

    • 0 – Define the Custom Summary
    • 1 – Use the associated TH content
    • 2 – Use the associated tag content
  • custom_text (string) Custom Summary – Enter custom text to serve as the table summary

  • overwrite (bool) Overwrite – Replace the table summary if it already exists

example:

{
    "name": "set_table_summary",
    "params": [
        {
            "name": "tag_names",
            "value": "^Table$"
        },
        {
            "name": "summary_type",
            "value": 2
        },
        {
            "name": "custom_text",
            "value": "Summary"
        },
        {
            "name": "overwrite",
            "value": false
        }
    ]
}

Fix Table Tag

fix_table_tag

Fix Table tag errors

params:

  • tag_names (tag) Tags – Define the tags using the tag_update template

example:

{
    "name": "fix_table_tag",
    "params": [
        {
            "name": "tag_names",
            "value": "^Table$"
        }
    ]
}

Tags

Delete Tags

delete_tags

Delete defined tags

params:

  • tag_names (tag) Tags – Define the tag types using a regular expression (https://regex101.com/) or the tag_update template

  • tag_content (string) Content – Handle the content of the deleted tag

    • none – Leave content untagged
    • move – Move nested tags to parent
    • artifact – Artifact content

example:

{
    "name": "delete_tags",
    "params": [
        {
            "name": "tag_names",
            "value": {
                "template": {
                    "tag_update": [
                        {
                            "query": {
                                "$and": [
                                    {
                                        "$0_tag_type": {
                                            "$regex": "^(?!H$|H\\d$|TH$|TD$|TR$|LBody$).*"
                                        }
                                    },
                                    {
                                        "$0_has_content": "false"
                                    }
                                ],
                                "param": [
                                    "pds_struct_elem"
                                ]
                            },
                            "statement": "$if"
                        }
                    ]
                }
            }
        },
        {
            "name": "tag_content",
            "value": "none"
        }
    ]
}
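The tag_update template in this example combines two conditions with $and: the tag type must match the regular expression (anything except H, H followed by one digit, TH, TD, TR and LBody) and the tag must have no content. In effect it selects empty tags while excluding headings and the table and list tags named in the pattern. The sketch below re-creates that selection in Python on a hypothetical list of tags; it is only a model of the query, not PDFix's template engine.

```python
import json
import re

# Regex copied verbatim from the JSON example; json.loads collapses "\\d" to "\d".
TAG_TYPE_RE = re.compile(json.loads(r'"^(?!H$|H\\d$|TH$|TD$|TR$|LBody$).*"'))


def selected_for_deletion(tag_type: str, has_content: bool) -> bool:
    """Model of the $and query: tag type matches the regex AND the tag has no content."""
    return TAG_TYPE_RE.match(tag_type) is not None and not has_content


# Hypothetical tags: (tag type, whether the tag has content)
tags = [("Span", False), ("P", True), ("H2", False), ("TD", False), ("Div", False)]

for tag_type, has_content in tags:
    print(tag_type, selected_for_deletion(tag_type, has_content))
```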

Apply Standard Tags

apply_standard_tags

Apply standard tag names according to their role mapping

params:

  • tag_names (tag) Tags – Define the tag types using a regular expression (https://regex101.com/) or the tag_update template

  • rolemap (bool) Clear Role Map – Clear the role map specified in the structure tree root

example:

{
    "name": "apply_standard_tags",
    "params": [
        {
            "name": "tag_names",
            "value": ".*"
        },
        {
            "name": "rolemap",
            "value": true
        }
    ]
}

Rename Tags

rename_tags

Rename tag names

params:

  • tag_names (tag) Tags – Define the tag types using a regular expression (https://regex101.com/) or the tag_update template

  • tag_name (string) Replace with – Type a new tag name

example:

{
    "name": "rename_tags",
    "params": [
        {
            "name": "tag_names",
            "value": "^P$"
        },
        {
            "name": "tag_name",
            "value": "P"
        }
    ]
}

Clone XObjects

clone_xobject

Clone Form XObjects in tags

example:

{
    "name": "clone_xobject"
}

Set Tag ID

set_tag_id

Generate a unique ID key for specific tags. For example, PDF/UA-1 requires a unique ID for Note tags

params:

  • tag_names (tag) Tags – Define the tag types using a regular expression (https://regex101.com/) or the tag_update template

  • overwrite (bool) Overwrite – Replace the current Tag ID if it already exists

example:

{
    "name": "set_tag_id",
    "params": [
        {
            "name": "tag_names",
            "value": "^Note$|^TH$"
        },
        {
            "name": "overwrite",
            "value": false
        }
    ]
}

Set Tag BBox

set_tag_bbox

Calculate the bounding box of the tag content and set it as the BBox layout attribute

params:

  • tag_names (tag) Tags – Define the tag types using a regular expression (https://regex101.com/) or the tag_update template

  • overwrite (bool) Overwrite – Replace the current bounding box if it already exists

example:

{
    "name": "set_tag_bbox",
    "params": [
        {
            "name": "tag_names",
            "value": "^Figure$|^Formula$|^Form$|^Table$"
        },
        {
            "name": "overwrite",
            "value": false
        }
    ]
}
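Conceptually, the bounding box of a tag is the smallest rectangle that encloses the bounding boxes of all of its content. The sketch below computes that union for rectangles given as [left, bottom, right, top] in PDF user-space coordinates, the form used by the BBox layout attribute. How PDFix collects the content rectangles is not described here, so the input list is hypothetical and assumed to be normalized (left ≤ right, bottom ≤ top).

```python
from typing import Iterable, List

Rect = List[float]  # [left, bottom, right, top]


def union_bbox(rects: Iterable[Rect]) -> Rect:
    """Return the smallest [left, bottom, right, top] rectangle enclosing all rects."""
    rects = list(rects)
    if not rects:
        raise ValueError("a tag without content has no bounding box")
    return [
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    ]


# Hypothetical content of a Figure tag: an image and a text label below it
print(union_bbox([[72, 400, 300, 600], [72, 380, 200, 395]]))  # [72, 380, 300, 600]
```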

Set Alternate Description

set_alt

Set an alternative description (Alt) for the tag. Text alternatives are crucial for accessibility because they enable users with visual impairments to understand the content

params:

  • tag_names (tag) Tags – Define the tag types using a regular expression (https://regex101.com/) or the tag_update template

  • alt_type (int) Alternative Description – Define a source for detecting the alternative text

    • 0 – Define the Custom Alternative text
    • 1 – Use the first Description Tag above
    • 2 – Use the first Description Tag below
    • 3 – Use the first Description Tag from children
    • 4 – Use the associated tag content. If there is an Annotation among the children, its Contents key is used
  • custom_text (string) Custom Alternative – Enter custom text to serve as an alternative description

  • description_tag (string) Description Tag – Define the tags whose content is used as the alternative description

  • overwrite (bool) Overwrite – Replace the alternative description if it already exists

example:

{
    "name": "set_alt",
    "params": [
        {
            "name": "tag_names",
            "value": "^Figure$|^Formula$"
        },
        {
            "name": "alt_type",
            "value": 4
        },
        {
            "name": "custom_text",
            "value": "Decorative"
        },
        {
            "name": "description_tag",
            "value": "Caption"
        },
        {
            "name": "overwrite",
            "value": false
        }
    ]
}

Set Actual Text

set_actual

Set a replacement for the content, providing text that is equivalent to what a person would see when viewing the content

params:

  • tag_names (tag) Tags – Define the tag types using a regular expression (https://regex101.com/) or the tag_update template

  • actual_type (int) Actual Text – Define a source for detecting the replacement text

    • 0 – Define the Custom Actual text
    • 1 – Use the associated tag content
  • custom_text (string) Custom Actual – Enter custom text to serve as a replacement text

  • overwrite (bool) Overwrite – Replace the actual text if it already exists

example:

{
    "name": "set_actual",
    "params": [
        {
            "name": "tag_names",
            "value": "^Span$"
        },
        {
            "name": "actual_type",
            "value": 0
        },
        {
            "name": "custom_text",
            "value": ""
        },
        {
            "name": "overwrite",
            "value": false
        }
    ]
}

Fix Placement

fix_placement

Fix Placement corrects incorrect Placement attributes for the Figure, Formula, Form and Note tags

example:

{
    "name": "fix_placement"
}

Fix Document Tag

fix_document_tag

Fix Document tag

example:

{
    "name": "fix_document_tag"
}

Fix List Tag

fix_list_tag

Fix List tag errors

params:

  • tag_names (tag) Tags – Define the tags using the tag_update template

example:

{
    "name": "fix_list_tag",
    "params": [
        {
            "name": "tag_names",
            "value": "^L$"
        }
    ]
}

Remove Tag Properties

remove_tag_data

Remove properties from the defined tags

params:

  • tag_names (tag) Tags – Define the tag types using a regular expression (https://regex101.com/) or the tag_update template

  • accept_alternate_desc (bool) Remove Alternate – Remove Alt key

  • accept_actual_text (bool) Remove Actual Text – Remove ActualText key

  • accept_id (bool) Remove ID – Remove ID key

  • accept_lang (bool) Remove Language – Remove Lang key

  • accept_title (bool) Remove Title – Remove T key

  • owner (string) Remove Attribute Owner – Standard attribute owner whose attributes are removed. If Remove Attribute Name is not specified, the whole group of attributes is removed

    • None – None
    • Layout – Layout Attributes governing the layout of content
    • List – List Attributes governing the numbering of lists
    • PrintField – PrintField Attributes governing Form structure elements for non-interactive form fields
    • Table – Table Attributes governing the organisation of cells in tables
  • name (string) Remove Attribute Name – Remove only the attribute with this name (for example, Headers) from the selected attribute owner

example:

{
    "name": "remove_tag_data",
    "params": [
        {
            "name": "tag_names",
            "value": ".*"
        },
        {
            "name": "accept_alternate_desc",
            "value": false
        },
        {
            "name": "accept_actual_text",
            "value": false
        },
        {
            "name": "accept_id",
            "value": false
        },
        {
            "name": "accept_lang",
            "value": false
        },
        {
            "name": "accept_title",
            "value": false
        },
        {
            "name": "owner",
            "value": ""
        },
        {
            "name": "name",
            "value": ""
        }
    ]
}

Set Tag Attributes

set_structure_attribute

Set the standard structure attributes for tags. Each attribute object has an owner.

params:

  • tag_names (tag) Tags – Define the tag types using a regular expression (https://regex101.com/) or the tag_update template

  • owner (string) Owner – Standard attribute owner

    • Layout – Layout Attributes governing the layout of content
    • List – List Attributes governing the numbering of lists
    • PrintField – PrintField Attributes governing Form structure elements for non-interactive form fields
    • Table – Table Attributes governing the organisation of cells in tables
  • name (string) Name – Attribute name

  • value (string) Value – Attribute value

  • value_type (string) Value Type – Attribute value type

    • string – string
    • name – name
    • array – array
    • number – number
  • overwrite (bool) Overwrite – Replace the current attribute if it already exists

example:

{
    "name": "set_structure_attribute",
    "params": [
        {
            "name": "tag_names",
            "value": ".*"
        },
        {
            "name": "owner",
            "value": ""
        },
        {
            "name": "name",
            "value": ""
        },
        {
            "name": "value",
            "value": ""
        },
        {
            "name": "value_type",
            "value": ""
        },
        {
            "name": "overwrite",
            "value": false
        }
    ]
}

Fix Spaces

fix_structure_spaces

Add missing or resolve duplicate white spaces within a structure element

params:

  • add_missing_spaces (bool) Add Missing Spaces – Identify words in the structure and add missing spaces

  • remove_unnecessary_spaces (bool) Remove Unnecessary Spaces – Remove duplicate spaces after each word

  • artifact_unnecessary_spaces (bool) Artifact Unnecessary Spaces – Mark duplicate spaces after each word as an Artifact

example:

{
    "name": "fix_structure_spaces",
    "params": [
        {
            "name": "add_missing_spaces",
            "value": true
        },
        {
            "name": "remove_unnecessary_spaces",
            "value": false
        },
        {
            "name": "artifact_unnecessary_spaces",
            "value": true
        }
    ]
}

Fix Headings

fix_headings

Correct an invalid heading structure to maintain headings in sequentially-descending order

params:

  • renumber_headings (int) Renumber Headings – Renumber all headings

    • 0 – Change headings to the tag set in Change Headings to
    • 1 – Move headings up a level
    • 2 – Add empty headings
  • change_headings_to (string) Change Headings to – Heading tag to use when Renumber Headings is 0

    • H – H
    • H1 – H1
    • H2 – H2
    • H3 – H3
    • H4 – H4

example:

{
    "name": "fix_headings",
    "params": [
        {
            "name": "renumber_headings",
            "value": 2
        },
        {
            "name": "change_headings_to",
            "value": "H"
        }
    ]
}
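"Sequentially-descending order" means that, moving through the document, a heading may be at most one level deeper than the previous heading (H1 followed by H3 skips a level), while moving back up any number of levels is allowed. The sketch below reports such skips in a sequence of tag types. It illustrates the rule only, not PDFix's fix_headings algorithm; the function name is hypothetical, and the sketch additionally treats the document start as level 0, so a first heading deeper than H1 is reported too.

```python
import re
from typing import List

HEADING_RE = re.compile(r"^H([1-9])$")  # numbered heading tags H1..H9 (assumption of this sketch)


def heading_skips(tag_types: List[str]) -> List[int]:
    """Return indices of numbered headings more than one level deeper than the previous heading.

    Non-heading tags (and the unnumbered H tag) are ignored.
    """
    skips = []
    previous = 0  # document start counts as level 0
    for index, tag_type in enumerate(tag_types):
        match = HEADING_RE.match(tag_type)
        if not match:
            continue
        level = int(match.group(1))
        if level > previous + 1:
            skips.append(index)
        previous = level
    return skips


print(heading_skips(["H1", "P", "H2", "H3", "H2", "P"]))  # []
print(heading_skips(["H1", "P", "H3", "H4", "H2"]))       # [2]
```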