Batch Actions
no-code PDF processing
PDFix Batch Actions offer easy, flexible, no-code processing of PDF documents, driven by a JSON configuration file.
PDFix Batch Actions
Custom PDF actions automate the editing of PDFs and the resolution of various accessibility issues, streamlining document editing and remediation without requiring programming skills. A custom sequence of actions can be assembled to process PDF documents according to specific requirements, as in the following example:
{
"title": "PDFix Batch Action Example",
"desc": "Custom action sequence to re-tag the PDF document and set PDF/UA-1 identifier",
"actions": [
{
"name": "clear_structure",
"params": [
{
"name": "clear_tags",
"value": true
},
{
"name": "clear_struct_tree",
"value": true
},
{
"name": "clear_bookmarks",
"value": false
}
]
},
{
"name": "add_tags",
"params": [
{
"name": "standard_attrs",
"value": false
},
{
"name": "sequential_headings",
"value": true
}
]
},
{
"name": "set_pdf_ua_standard",
"params": [
{
"name": "part_number",
"value": "1"
}
]
}
]
}
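Because a configuration is plain JSON, it can also be generated by a script when many similar configurations are needed. The sketch below is an illustration only: the `action` helper is a hypothetical name and not part of PDFix, while the action and parameter names are taken from the example above. `part_number` is written as a string, following the (string) type given in the reference for set_pdf_ua_standard.

```python
import json


def action(name, **params):
    """Build one action entry; "params" is omitted when no parameters are given."""
    entry = {"name": name}
    if params:
        entry["params"] = [{"name": k, "value": v} for k, v in params.items()]
    return entry


config = {
    "title": "PDFix Batch Action Example",
    "desc": "Custom action sequence to re-tag the PDF document and set PDF/UA-1 identifier",
    "actions": [
        action("clear_structure", clear_tags=True, clear_struct_tree=True, clear_bookmarks=False),
        action("add_tags", standard_attrs=False, sequential_headings=True),
        action("set_pdf_ua_standard", part_number="1"),
    ],
}

# Serialize to the JSON text that would be saved as the configuration file.
config_json = json.dumps(config, indent=2)
print(config_json)
```

Actions appear in the list in the order written, so the structure is cleared before the document is tagged again.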
Index
Accessibility: Set PDF/UA Standard, Set Suspect Value, Fix Optional Content, Fix Display Document Title, Remove Standard Tags Mapping, Autotag, Clear Document Structure, Fix ID Tree, Fix Parent Tree
Annotations: Fix Media Clip, Set Tab Order, Tag Annotations, Set Contents, Flatten Annotations, Create Web Links, Delete Annotations
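Bookmarks: Create Bookmarks
Content: Artifact Content, Flatten Form XObjects, Remove Content Marks
Conversion: PDF to HTML, PDF to JSON
Fonts: Embed Fonts, Replace Font, Add Missing Unicodes
Metadata: Set Document Properties, Set PDF Version, Set Language, Set Title
Pages: Rotate Page
Table: Set Table Headers, Set Table Summary, Fix Table Tag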
Accessibility
Set PDF/UA Standard
set_pdf_ua_standard
Set the PDF/UA part number
params:
- part_number (string) Part Identifier – The part number of the International Standard that the file conforms to
  - 1 – PDF/UA-1
  - 2 – PDF/UA-2
- rev_number (string) Rev Number – Four-digit year of the date of publication or revision. Ignored for part 1
example:
{
"name": "set_pdf_ua_standard",
"params": [
{
"name": "part_number",
"value": "1"
},
{
"name": "rev_number",
"value": "2023"
}
]
}
Set Suspect Value
set_suspect_value
Fix the document Suspects entry
example:
{
"name": "set_suspect_value"
}
Fix Optional Content
fix_oc_name
Fix optional content configuration dictionary
example:
{
"name": "fix_oc_name"
}
Fix Display Document Title
set_display_doc_title
Fix ViewerPreferences dictionary
example:
{
"name": "set_display_doc_title"
}
Remove Standard Tags Mapping
remove_standard_tags_mapping
Remove standard tags mapping
example:
{
"name": "remove_standard_tags_mapping"
}
Autotag
add_tags
Autotag document
params:
- template (file_path) Template – Load the template from the file as the current template. If the file is empty, the default template will be used
- preflight (bool) Preflight – Preflight the document and combine the preflight values with the current template
- standard_attrs (bool) Add Layout Attributes – Add all detected layout attributes
- sequential_headings (bool) Sequential Heading Levels – Keep headings in sequentially-descending order
example:
{
"name": "add_tags",
"params": [
{
"name": "template",
"value": ""
},
{
"name": "preflight",
"value": false
},
{
"name": "standard_attrs",
"value": false
},
{
"name": "sequential_headings",
"value": true
}
]
}
Clear Document Structure
clear_structure
Clear document structure
params:
- clear_tags (bool) Clear Content Marks – Clear content marks
- clear_struct_tree (bool) Clear Structure Tree – Clear structure tree
- clear_bookmarks (bool) Clear Bookmarks – Clear bookmarks
example:
{
"name": "clear_structure",
"params": [
{
"name": "clear_tags",
"value": true
},
{
"name": "clear_struct_tree",
"value": true
},
{
"name": "clear_bookmarks",
"value": true
}
]
}
Fix ID Tree
fix_id_tree
Fix ID tree
example:
{
"name": "fix_id_tree"
}
Fix Parent Tree
fix_parent_tree
Fix parent tree
example:
{
"name": "fix_parent_tree"
}
Annotations
Fix Media Clip
fix_media_clip_keys
Define the MIME type of the media clip annotation file
params:
- ct_key (string) Media Clip – CT key
  - text/plain – text/plain
  - text/html – text/html
  - image/jpeg – image/jpeg
  - audio/mp3 – audio/mp3
  - video/mp4 – video/mp4
example:
{
"name": "fix_media_clip_keys",
"params": [
{
"name": "ct_key",
"value": "text/plain"
}
]
}
Set Tab Order
set_tabs_key
Set the tab order key for every page. Every page that contains an annotation shall have the key Tabs in its page dictionary, and its value shall be S.
params:
- tabs_key (string) Tabs Key – Tabs key
- overwrite (bool) Overwrite – Replace the current Tabs key if it already exists
example:
{
"name": "set_tabs_key",
"params": [
{
"name": "tabs_key",
"value": "S"
},
{
"name": "overwrite",
"value": true
}
]
}
Tag Annotations
tag_annot
Tag the untagged annotations. Each annotation is placed in the closest tag, if such a tag is found.
params:
- annot_types (annot) Annotations – Define the annotation types using a regular expression (https://regex101.com/) or by the template annot_update
example:
{
"name": "tag_annot",
"params": [
{
"name": "annot_types",
"value": "^(?!.*Popup).*$"
}
]
}
Set Contents
set_annot_contents
Set the alternative description of an annotation in its Contents key.
params:
- annot_types (annot) Annotations – Define the annotation types using a regular expression (https://regex101.com/) or by the template annot_update
- alt_type (int) Contents – Define the source of the alternative text
  - 0 – Define the Custom Text
  - 1 – Use the text from annotation bounding box
  - 2 – Use the action destination
- custom_text (string) Custom Text – Custom text to write to the Contents key
- bbox_padding_x (float) BBox X padding – Bounding box padding in the horizontal direction
- bbox_padding_y (float) BBox Y padding – Bounding box padding in the vertical direction
- overwrite (bool) Overwrite – Replace the current alternative description if it already exists
example:
{
"name": "set_annot_contents",
"params": [
{
"name": "annot_types",
"value": ".*$"
},
{
"name": "alt_type",
"value": 1
},
{
"name": "custom_text",
"value": "Decorative"
},
{
"name": "bbox_padding_x",
"value": 4
},
{
"name": "bbox_padding_y",
"value": 4
},
{
"name": "overwrite",
"value": false
}
]
}
Flatten Annotations
flatten_annot
Flatten the visual representation of annotations directly into the content layer. Use this feature to avoid problems with annotation tagging when your document doesn’t need to be interactive after it is published as a PDF/UA-compliant file.
params:
- annot_types (annot) Annotations – Define the annotation types using a regular expression (https://regex101.com/) or by the template annot_update
example:
{
"name": "flatten_annot",
"params": [
{
"name": "annot_types",
"value": "^(?!.*Link|.*Widget|.*Popup).*$"
}
]
}
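To see which annotation subtypes a pattern like the one above selects, the expression can be tried against a list of subtype names before running the batch. The sketch below uses Python's `re` module, which may differ in details from the regular expression engine PDFix uses. It also assumes the pattern is matched against the annotation subtype name; the subtype list is only a sample.

```python
import re

# Pattern copied from the flatten_annot example: a negative lookahead that
# rejects any name containing "Link", "Widget" or "Popup".
pattern = re.compile(r"^(?!.*Link|.*Widget|.*Popup).*$")

# Sample annotation subtype names (not an exhaustive list).
subtypes = ["Link", "Widget", "Popup", "Text", "Highlight", "Stamp", "FreeText"]

selected = [name for name in subtypes if pattern.search(name)]
print(selected)  # ['Text', 'Highlight', 'Stamp', 'FreeText']
```

Under these assumptions, links, form widgets and popups stay interactive and every other sampled subtype would be flattened.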
Create Web Links
create_web_links
Create link annotations from web links in the page content
params:
- url_regex (string) URL – Regular expression used for searching web links
example:
{
"name": "create_web_links",
"params": [
{
"name": "url_regex",
"value": "^(((http(s)?|ftp):\\/\\/)|(mailto:)|www.)[^\\s\\/$.?#].[^\\s]*"
}
]
}
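In the example above the backslashes are doubled because the expression is stored inside a JSON string. The following sketch decodes the value and tries it on a few sample strings. It uses Python's `re` module as a stand-in, which is an assumption: the engine PDFix uses, and how it applies the pattern to page text, may differ. The sample strings are made up for illustration.

```python
import json
import re

# The "value" exactly as written in the JSON example (backslashes doubled).
json_value = r'"^(((http(s)?|ftp):\\/\\/)|(mailto:)|www.)[^\\s\\/$.?#].[^\\s]*"'

# JSON decoding turns each doubled backslash into a single one.
url_regex = json.loads(json_value)
pattern = re.compile(url_regex)

samples = [
    "https://example.com/docs",
    "www.example.com",
    "mailto:info@example.com",
    "example.com",
]
matches = {text: bool(pattern.match(text)) for text in samples}
for text, matched in matches.items():
    print(f"{text!r}: {matched}")
```

With this pattern, a bare host name such as `example.com` is not recognized because it lacks a scheme, `mailto:` or a `www` prefix.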
Delete Annotations
delete_annot
Completely remove annotations from the PDF. Use this action when an annotation with a specific subtype is not permitted in PDF/UA-compliant output, for example TrapNet.
params:
- annot_types (annot) Annotations – Define the annotation types using a regular expression (https://regex101.com/) or by the template annot_update
example:
{
"name": "delete_annot",
"params": [
{
"name": "annot_types",
"value": "^TrapNet$"
}
]
}
Bookmarks
Create Bookmarks
create_bookmarks
Create bookmarks from the tag tree hierarchy
params:
- tag_1 (tag) Level 1 – Define the tag types using a regular expression (https://regex101.com/) or by the template tag_update
- tag_2 (tag) Level 2 – Define the tag that represents the second level
- tag_3 (tag) Level 3 – Define the tag that represents the third level
- tag_4 (tag) Level 4 – Define the tag that represents the fourth level
- tag_5 (tag) Level 5 – Define the tag that represents the fifth level
- tag_6 (tag) Level 6 – Define the tag that represents the sixth level
- overwrite (bool) Overwrite – Replace the current bookmarks if they already exist
example:
{
"name": "create_bookmarks",
"params": [
{
"name": "tag_1",
"value": "^H1$"
},
{
"name": "tag_2",
"value": "^H2$"
},
{
"name": "tag_3",
"value": "^H3$"
},
{
"name": "tag_4",
"value": "^H4$"
},
{
"name": "tag_5",
"value": "^H5$"
},
{
"name": "tag_6",
"value": "^H6$"
},
{
"name": "overwrite",
"value": false
}
]
}
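The six tag_N patterns can be read as a lookup from tag name to bookmark level. The sketch below shows that reading, assuming each pattern is tested against the tag name and the lowest matching level wins; the actual matching rules inside PDFix are not described here. `bookmark_level` is a hypothetical helper and the tag sequence is invented for illustration.

```python
import re

# Level patterns copied from the create_bookmarks example.
level_patterns = {
    1: "^H1$",
    2: "^H2$",
    3: "^H3$",
    4: "^H4$",
    5: "^H5$",
    6: "^H6$",
}


def bookmark_level(tag_name):
    """Return the first level whose pattern matches tag_name, or None."""
    for level, pattern in sorted(level_patterns.items()):
        if re.search(pattern, tag_name):
            return level
    return None


# Tags in document order; tags that match no pattern produce no bookmark.
tags = ["H1", "P", "H2", "H3", "Figure", "H2"]
outline = [(bookmark_level(t), t) for t in tags if bookmark_level(t) is not None]
print(outline)  # [(1, 'H1'), (2, 'H2'), (3, 'H3'), (2, 'H2')]
```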
Content
Artifact Content
artifact_content
Mark defined content as an artifact
params:
- object_types (object) Objects – Define the objects by the template object_update
- artifact_type (int) Mark as – Mark the content as an Artifact, Header or Footer
  - 0 – Artifact
  - 1 – Header
  - 2 – Footer
example:
{
"name": "artifact_content",
"params": [
{
"name": "object_types",
"value": {
"template": {
"object_update": [
{
"query": {
"$and": [
{
"$0_artifact": "false"
},
{
"$0_mcid": "-1"
}
],
"param": [
"pds_object"
]
},
"statement": "$if"
}
]
}
}
},
{
"name": "artifact_type",
"value": 0
}
]
}
Flatten Form XObjects
flatten_xobject
Flatten Form XObjects
example:
{
"name": "flatten_xobject"
}
Remove Content Marks
remove_content_marks
Remove artifact, mcid, or any custom tag from page content objects
params:
- object_types (object) Objects – Define the objects by the template object_update
- flags (flag) Remove – Types of content marks to be removed
  - 8 – Invalid Content Marks
  - 4 – Custom Content Marks
example:
{
"name": "remove_content_marks",
"params": [
{
"name": "object_types",
"value": ".*"
},
{
"name": "flags",
"value": 8
}
]
}
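The flag values listed above are powers of two, which suggests that several of them can be combined into one number with a bitwise OR. This is an assumption based on the values themselves, not something this page states. Under that assumption, removing both invalid and custom content marks would use the value 12:

```python
import json

# Flag values from the parameter list above.
INVALID_CONTENT_MARKS = 8
CUSTOM_CONTENT_MARKS = 4

# Assumed combination rule: bitwise OR of the individual flags.
flags = INVALID_CONTENT_MARKS | CUSTOM_CONTENT_MARKS

action = {
    "name": "remove_content_marks",
    "params": [
        {"name": "object_types", "value": ".*"},
        {"name": "flags", "value": flags},
    ],
}
print(json.dumps(action, indent=2))
```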
Conversion
PDF to HTML
pdf_to_html
Convert PDF to HTML
params:
- input_pdf (file_path) Input PDF – Input PDF file path
- output_html (file_path) Output HTML file path – Define the output HTML index file
- html_type (int) HTML Layout – HTML layout type
  - 0 – Original layout
  - 1 – Responsive layout
  - 2 – Layout defined by PDF Tags
- template (file_path) Template – Load the template from the file as the current template. If the file is empty, the default template will be used
- preflight (bool) Preflight – Preflight the document and combine the preflight values with the current template
- flags (flag) Conversion Flags – Define conversion flags
  - 1 – Export JavaScripts
  - 2 – Export fonts
  - 4 – Default font sizes
  - 8 – Retain text color
  - 32 – Inline CSS
  - 64 – Inline JavaScript
  - 128 – Embed images
  - 256 – Embed fonts
  - 512 – Gray padding
example:
{
"name": "pdf_to_html"
}
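The example above omits the parameters. The sketch below generates an action with every listed parameter filled in. The file paths are hypothetical placeholders, and whether PDFix requires all six parameters or accepts relative paths is not stated here, so treat the values as illustrative only.

```python
import json

# Parameter names come from the list above; the values are illustrative.
params = {
    "input_pdf": "input.pdf",            # hypothetical path
    "output_html": "output/index.html",  # hypothetical path
    "html_type": 1,                      # 1 = Responsive layout
    "template": "",                      # empty, as in the add_tags example
    "preflight": False,
    "flags": 128,                        # 128 = Embed images
}

action = {
    "name": "pdf_to_html",
    "params": [{"name": name, "value": value} for name, value in params.items()],
}
print(json.dumps(action, indent=2))
```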
PDF to JSON
pdf_to_json
Convert PDF to JSON
params:
- input_pdf (file_path) Input PDF – Input PDF file path
- output_json (file_path) Output JSON file path – Define the output JSON file
- flags (flag) Conversion Flags – Flags specifying the extracted content
  - 1 – Document Metadata
  - 2 – Page Information
  - 16 – Page Content
  - 32 – Document Structure Tree
  - 64 – Layout Recognition PageMap
  - 256 – Bounding Box
  - 512 – Content Marks
  - 4096 – Text Content
  - 8192 – Text Style
  - 16384 – Text State
  - 65536 – Images (base64)
example:
{
"name": "pdf_to_json"
}
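When reading an existing configuration, it can help to translate a numeric flags value back into the options it stands for. The sketch below does this for the values listed above. It assumes the flags are independent bits that can be combined, which their values suggest but this page does not state; `describe_flags` is a hypothetical helper.

```python
# Flag values and labels copied from the pdf_to_json parameter list.
JSON_FLAGS = {
    1: "Document Metadata",
    2: "Page Information",
    16: "Page Content",
    32: "Document Structure Tree",
    64: "Layout Recognition PageMap",
    256: "Bounding Box",
    512: "Content Marks",
    4096: "Text Content",
    8192: "Text Style",
    16384: "Text State",
    65536: "Images (base64)",
}


def describe_flags(value):
    """Return the labels of the documented flags whose bit is set in value."""
    return [label for bit, label in JSON_FLAGS.items() if value & bit]


# Structure tree plus text content, combined with a bitwise OR.
flags = 32 | 4096
print(flags, describe_flags(flags))  # 4128 ['Document Structure Tree', 'Text Content']
```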
Fonts
Embed Fonts
embed_font
Embed font
example:
{
"name": "embed_font"
}
Replace Font
replace_font
Replace font
params:
- font_name (string) Font Name – The name of the PDF font that will be replaced. Regular expressions are supported
- font_family (system_font) Font Family – The font family name that will be used for replacement
example:
{
"name": "replace_font",
"params": [
{
"name": "font_name",
"value": ""
},
{
"name": "font_family",
"value": ""
}
]
}
Add Missing Unicodes
add_missing_unicode
Add missing Unicode mapping
example:
{
"name": "add_missing_unicode"
}
Metadata
Set Document Properties
set_doc_info
Set document metadata and properties
params:
- set_author (bool) Set Author – Set the document author
- author (string) Author – Author
- set_title (bool) Set Title – Set the document title
- title (string) Title – Title
- set_subject (bool) Set Subject – Set the document subject
- subject (string) Subject – Subject
- set_keywords (bool) Set Keywords – Set the document keywords
- keywords (string) Keywords – Keywords
- set_producer (bool) Set Producer – Set the document producer
- producer (string) Producer – Producer name
- set_creator (bool) Set Creator – Set the document creator
- creator (string) Creator – Creator
example:
{
"name": "set_doc_info",
"params": [
{
"name": "set_author",
"value": true
},
{
"name": "author",
"value": ""
},
{
"name": "set_title",
"value": true
},
{
"name": "title",
"value": ""
},
{
"name": "set_subject",
"value": true
},
{
"name": "subject",
"value": ""
},
{
"name": "set_keywords",
"value": true
},
{
"name": "keywords",
"value": ""
},
{
"name": "set_producer",
"value": true
},
{
"name": "producer",
"value": ""
},
{
"name": "set_creator",
"value": true
},
{
"name": "creator",
"value": ""
}
]
}
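Each property is controlled by a pair of parameters: a set_* switch and the value itself. The helper below is a hypothetical convenience, not part of PDFix, that builds all twelve parameters from only the properties to be changed. It assumes that a set_* switch of false leaves the corresponding property untouched, which the parameter descriptions suggest but do not state. The author and title values are invented.

```python
import json

# Property names from the parameter list above, in the same order.
FIELDS = ["author", "title", "subject", "keywords", "producer", "creator"]


def doc_info_action(**values):
    """Build a set_doc_info action that switches on only the given properties."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    params = []
    for field in FIELDS:
        params.append({"name": f"set_{field}", "value": field in values})
        params.append({"name": field, "value": values.get(field, "")})
    return {"name": "set_doc_info", "params": params}


action = doc_info_action(author="Jane Doe", title="Annual Report")
print(json.dumps(action, indent=2))
```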
Set PDF Version
set_pdf_version
Set the PDF version
params:
- version_number (int) PDF Version – Version designations
  - 14 – PDF 1.4
  - 15 – PDF 1.5
  - 16 – PDF 1.6
  - 17 – PDF 1.7
  - 20 – PDF 2.0
example:
{
"name": "set_pdf_version",
"params": [
{
"name": "version_number",
"value": 17
}
]
}
Set Language
set_language
Define the document language
params:
- lang (lang) Language – Document language
- apply_lang_to (int) Apply Language to – Select the objects to which you intend to apply the defined language
  - 0 – Document
  - 1 – Tag
  - 2 – Marked Content
- overwrite (bool) Overwrite – Replace the current language if it already exists
example:
{
"name": "set_language",
"params": [
{
"name": "lang",
"value": "en-US"
},
{
"name": "apply_lang_to",
"value": 0
},
{
"name": "overwrite",
"value": false
}
]
}
Set Title
set_title
Set document title
params:
- title_type (int) Title – Define the source of the document title
  - 0 – Define the Custom Title
  - 1 – Use the first Description Tag
  - 2 – Retrieve the title from the file name
- custom_text (string) Custom Title – Custom title
- description_tag (string) Description Tag – Define the tags whose content is used for the title text
- overwrite (bool) Overwrite – Replace the current title if it already exists
example:
{
"name": "set_title",
"params": [
{
"name": "title_type",
"value": 2
},
{
"name": "custom_text",
"value": ""
},
{
"name": "description_tag",
"value": "Caption"
},
{
"name": "overwrite",
"value": false
}
]
}
Pages
Rotate Page
rotate_pages
Rotate pages
params:
- rotation_type (int) Rotation Type – Type of rotation
  - 0 – Set rotation angle
  - 1 – Rotate by angle
- rotation_angle (int) Rotation Angle – Rotation angle
  - 0 – 0 degrees
  - 90 – 90 degrees
  - 180 – 180 degrees
  - 270 – 270 degrees
- page_rotation (int) Pages With Rotation – Pages with defined rotation
  - 0 – 0 degrees
  - 90 – 90 degrees
  - 180 – 180 degrees
  - 270 – 270 degrees
- page_orientation (int) Pages With Orientation – Pages with defined orientation
  - 0 – Pages With Any Orientation
  - 1 – Landscape Pages
  - 2 – Portrait Pages
example:
{
"name": "rotate_pages",
"params": [
{
"name": "rotation_type",
"value": 1
},
{
"name": "rotation_angle",
"value": 0
},
{
"name": "page_rotation",
"value": 0
},
{
"name": "page_orientation",
"value": 0
}
]
}
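The two rotation types differ in whether the angle replaces a page's existing rotation or is added to it. The sketch below illustrates that difference under an assumed interpretation: type 0 stores the angle as given, and type 1 adds it to the current rotation modulo 360. `new_rotation` is a hypothetical helper written for this illustration and is not PDFix code.

```python
def new_rotation(current, rotation_type, rotation_angle):
    """Return the page rotation after applying the (assumed) rotate_pages rule."""
    if rotation_angle not in (0, 90, 180, 270):
        raise ValueError("rotation_angle must be 0, 90, 180 or 270")
    if rotation_type == 0:
        # "Set rotation angle": the angle replaces the current rotation.
        return rotation_angle
    if rotation_type == 1:
        # "Rotate by angle": the angle is added to the current rotation.
        return (current + rotation_angle) % 360
    raise ValueError("rotation_type must be 0 or 1")


print(new_rotation(270, 0, 90))  # 90
print(new_rotation(270, 1, 90))  # 0
```

Under this reading, the example above (rotation_type 1 with rotation_angle 0) would leave every page as it is.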
Table
Set Table Headers
set_table_header
Set table header cells
params:
- tag_names (tag) Tags – Define the tag types using a regular expression (https://regex101.com/) or by the template tag_update
- scope (string) Scope – Reflects whether the header cell applies to the rest of the cells in the row, the column, or both that contain it
  - None – None
  - Row – Row
  - Column – Column
  - Both – Both
- row_span (int) RowSpan – The number of rows in the enclosing table that shall be spanned by the cell. Use -1 to keep the existing RowSpan
- col_span (int) ColSpan – The number of columns in the enclosing table that shall be spanned by the cell. Use -1 to keep the existing ColSpan
- overwrite (bool) Overwrite – Replace the current TH properties if they already exist
example:
{
"name": "set_table_header",
"params": [
{
"name": "tag_names",
"value": "^TD$"
},
{
"name": "scope",
"value": "None"
},
{
"name": "row_span",
"value": -1
},
{
"name": "col_span",
"value": -1
},
{
"name": "overwrite",
"value": false
}
]
}
Set Table Summary
set_table_summary
A summary of the table’s purpose and structure. This entry shall only be used within Table tags
params:
- tag_names (tag) Tags – Define the tag types using a regular expression (https://regex101.com/) or by the template tag_update
- summary_type (int) Summary – Define the source of the summary
  - 0 – Define the Custom Summary
  - 1 – Use the associated TH content
  - 2 – Use the associated tag content
- custom_text (string) Custom Summary – Custom text to serve as the table summary
- overwrite (bool) Overwrite – Replace the table summary if it already exists
example:
{
"name": "set_table_summary",
"params": [
{
"name": "tag_names",
"value": "^Table$"
},
{
"name": "summary_type",
"value": 2
},
{
"name": "custom_text",
"value": "Summary"
},
{
"name": "overwrite",
"value": false
}
]
}
Fix Table Tag
fix_table_tag
Fix Table tag errors
params:
- tag_names (tag) Tags – Define the tags by the template tag_update
example:
{
"name": "fix_table_tag",
"params": [
{
"name": "tag_names",
"value": "^Table$"
}
]
}