Batch Commands
no-code PDF processing
PDFix Batch Commands provide easy, flexible PDF document processing without writing code, driven by a JSON configuration file.
PDFix Batch Actions
Custom PDF actions automate PDF editing and the resolution of accessibility issues, streamlining document editing and remediation without requiring programming skills. Actions can be combined into a custom sequence so that PDF documents are processed according to specific requirements.
{
"title": "PDFix Batch Action Example",
"desc": "Custom action sequence to re-tag the PDF document and set PDF/UA-1 identifier",
"actions": [
{
"name": "clear_structure",
"params": [
{
"name": "clear_tags",
"value": true
},
{
"name": "clear_struct_tree",
"value": true
},
{
"name": "clear_bookmarks",
"value": false
}
]
},
{
"name": "add_tags",
"params": [
{
"name": "standard_attrs",
"value": false
},
{
"name": "sequential_headings",
"value": true
}
]
},
{
"name": "set_pdf_ua_standard",
"params": [
{
"name": "part_number",
"value": "1"
}
]
}
]
}
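Because a batch file is plain JSON, it can also be generated by a script. The following is a minimal Python sketch, not part of PDFix: the helper names action and batch are hypothetical, and it assumes only the structure visible in the example above (an actions list whose items carry a name and an optional params list of name/value pairs).

```python
import json


def action(name, **params):
    """Build one action object; keyword arguments become name/value params."""
    item = {"name": name}
    if params:
        item["params"] = [{"name": key, "value": value} for key, value in params.items()]
    return item


def batch(title, desc, actions):
    """Wrap a list of actions in the top-level batch object."""
    return {"title": title, "desc": desc, "actions": list(actions)}


config = batch(
    "PDFix Batch Action Example",
    "Custom action sequence to re-tag the PDF document and set PDF/UA-1 identifier",
    [
        action("clear_structure", clear_tags=True, clear_struct_tree=True, clear_bookmarks=False),
        action("add_tags", standard_attrs=False, sequential_headings=True),
        # part_number is documented as a string parameter.
        action("set_pdf_ua_standard", part_number="1"),
    ],
)

# Serialize to the JSON text that a batch file would contain.
text = json.dumps(config, indent=2)
```

Actions without parameters, such as fix_id_tree, come out as an object with only a name, matching the parameterless examples in this reference.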
Index
Accessibility: set_pdf_ua_standard, set_suspect_value, fix_oc_name, set_display_doc_title, remove_standard_tags_mapping, set_language, set_title, add_tags, clear_structure, fix_id_tree, fix_parent_tree
Annotations: fix_media_clip_keys, set_tabs_key, tag_annot, set_annot_contents, flatten_annot, create_web_links, delete_annot
Bookmarks: create_bookmarks
Content: artifact_content, flatten_xobject, remove_content_marks
Conversion: pdf_to_html, pdf_to_json
Fonts: embed_font, replace_font, add_missing_unicode
Metadata: get_doc_info, set_doc_info
Pages: rotate_pages
Tags: delete_tags, apply_standard_tags, set_tag_id, set_tag_bbox, set_alt, fix_placement, remove_tag_data, set_structure_attribute, fix_structure_spaces, fix_headings
Template: preflight, load_template, reset_template
Validation: validation, validation_report
Accessibility
set_pdf_ua_standard
Set PDF/UA Standard – Set the PDF/UA part number
params:
-
part_number
(string) Part Identifier – The part number of the International Standard that the file conforms to.
- 1 – PDF/UA-1
- 2 – PDF/UA-2
-
rev_number
(string) Rev Number – Four-digit year of the date of publication or revision. Ignored for part 1.
example:
{
"name": "set_pdf_ua_standard",
"params": [
{
"name": "part_number",
"value": "1"
},
{
"name": "rev_number",
"value": "2023"
}
]
}
set_suspect_value
Set Suspect Value – Fix the document Suspects entry
example:
{
"name": "set_suspect_value"
}
fix_oc_name
Fix Optional Content – Fix optional content configuration dictionary
example:
{
"name": "fix_oc_name"
}
set_display_doc_title
Fix Display Document Title – Fix ViewerPreferences dictionary
example:
{
"name": "set_display_doc_title"
}
remove_standard_tags_mapping
Remove Standard Tags Mapping – Remove standard tags mapping
example:
{
"name": "remove_standard_tags_mapping"
}
set_language
Set Language – Define the document language
params:
-
lang
(lang) Language – Document language
-
apply_lang_to
(int) Apply Language to – Select the objects to which you intend to apply the defined language
- 0 – Document
- 1 – Tag
- 2 – Marked Content
-
overwrite
(bool) Overwrite – Replace the current language if it already exists
example:
{
"name": "set_language",
"params": [
{
"name": "lang",
"value": "en-US"
},
{
"name": "apply_lang_to",
"value": 0
},
{
"name": "overwrite",
"value": false
}
]
}
set_title
Set Title – Set document title
params:
-
title_type
(int) Title – Define a source for detecting the document title
- 0 – Define the Custom Title
- 1 – Use the first Description Tag
- 2 – Retrieve the title from the file name
-
custom_text
(string) Custom Title – Custom title
-
description_tag
(string) Description Tag – Define the tags whose content is used for the title text
-
overwrite
(bool) Overwrite – Replace the current title if it already exists
example:
{
"name": "set_title",
"params": [
{
"name": "title_type",
"value": 2
},
{
"name": "custom_text",
"value": ""
},
{
"name": "description_tag",
"value": "Caption"
},
{
"name": "overwrite",
"value": false
}
]
}
add_tags
Autotag – Autotag document
params:
-
standard_attrs
(bool) Add Layout Attributes – Add all detected layout attributes
-
sequential_headings
(bool) Sequential Heading Levels – Keep headings in sequentially-descending order
example:
{
"name": "add_tags",
"params": [
{
"name": "standard_attrs",
"value": false
},
{
"name": "sequential_headings",
"value": true
}
]
}
clear_structure
Clear Document Structure – Clear document structure
params:
-
clear_tags
(bool) Clear Content Marks – Clear content marks
-
clear_struct_tree
(bool) Clear Structure Tree – Clear structure tree
-
clear_bookmarks
(bool) Clear Bookmarks – Clear bookmarks
example:
{
"name": "clear_structure",
"params": [
{
"name": "clear_tags",
"value": true
},
{
"name": "clear_struct_tree",
"value": true
},
{
"name": "clear_bookmarks",
"value": true
}
]
}
fix_id_tree
Fix ID Tree – Fix ID tree
example:
{
"name": "fix_id_tree"
}
fix_parent_tree
Fix Parent Tree – Fix parent tree
example:
{
"name": "fix_parent_tree"
}
Annotations
fix_media_clip_keys
Fix Media Clip – Define the MIME type of the media clip annotation file
params:
-
ct_key
(string) Media Clip – CT key
- text/plain – text/plain
- text/html – text/html
- image/jpeg – image/jpeg
- audio/mp3 – audio/mp3
- video/mp4 – video/mp4
example:
{
"name": "fix_media_clip_keys",
"params": [
{
"name": "ct_key",
"value": "text/plain"
}
]
}
set_tabs_key
Set Tab Order – Sets the tab order key for every page
params:
-
tabs_key
(string) Tabs Key – Tabs key
-
overwrite
(bool) Overwrite – Replace the current Tabs Key if it already exists
example:
{
"name": "set_tabs_key",
"params": [
{
"name": "tabs_key",
"value": "S"
},
{
"name": "overwrite",
"value": true
}
]
}
tag_annot
Tag Annotations – Tag the untagged annotations
params:
-
annot_types
(string) Annotation Types – Annotation types delimited by comma
-
exclude_annot_types
(bool) Exclude – Exclude annotation types if checked
-
flags
(flag) Flags – Skip annotations with specific characteristics
- 1 – Skip Invisible
- 2 – Skip Hidden
example:
{
"name": "tag_annot",
"params": [
{
"name": "annot_types",
"value": "Popup"
},
{
"name": "exclude_annot_types",
"value": true
},
{
"name": "flags",
"value": 3
}
]
}
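The flags value 3 in this example is not one of the listed options. A plausible reading, stated here as an assumption rather than documented behavior, is that parameters of type flag are bit fields whose listed values combine, so 3 would mean Skip Invisible (1) together with Skip Hidden (2). Under that assumption the value can be computed as in this Python sketch; the class name SkipFlags is hypothetical.

```python
from enum import IntFlag


class SkipFlags(IntFlag):
    """Listed option values for the annotation flags parameter."""
    SKIP_INVISIBLE = 1
    SKIP_HIDDEN = 2


# Assumption: options combine with bitwise OR, as in a typical bit field.
flags = SkipFlags.SKIP_INVISIBLE | SkipFlags.SKIP_HIDDEN

# The param object as it would appear in the action's params list.
param = {"name": "flags", "value": int(flags)}
```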
set_annot_contents
Set Annotation Contents – Set alternative description for an annotation
params:
-
annot_types
(string) Annotation Types – Annotation types delimited by comma
-
exclude_annot_types
(bool) Exclude – Exclude annotation types if checked
-
flags
(flag) Flags – Skip annotations with specific characteristics
- 1 – Skip Invisible
- 2 – Skip Hidden
-
alt_type
(int) Contents – Define a source for detecting the alternative text
- 0 – Define the Custom Contents text
- 1 – Use the text from annotation bounding box
- 2 – Use the action destination
-
custom_text
(string) Custom Contents – Enter custom text to serve as an alternative description
-
bbox_padding_x
(float) BBox X padding – Bounding box padding in horizontal direction
-
bbox_padding_y
(float) BBox Y padding – Bounding box padding in vertical direction
-
overwrite
(bool) Overwrite – Replace the current alternative description if it already exists
example:
{
"name": "set_annot_contents",
"params": [
{
"name": "annot_types",
"value": "Link,Highlight,Underline,Squiggly,StrikeOut"
},
{
"name": "exclude_annot_types",
"value": false
},
{
"name": "flags",
"value": 3
},
{
"name": "alt_type",
"value": 0
},
{
"name": "custom_text",
"value": "Decorative"
},
{
"name": "bbox_padding_x",
"value": 4
},
{
"name": "bbox_padding_y",
"value": 4
},
{
"name": "overwrite",
"value": false
}
]
}
flatten_annot
Flatten Annotations – Flatten the visual representation of annotations directly into the content layer
params:
-
annot_types
(string) Annotation Types – Annotation types delimited by comma
-
exclude_annot_types
(bool) Exclude – Exclude annotation types if checked
-
flags
(flag) Flags – Skip annotations with specific characteristics
- 1 – Skip Invisible
- 2 – Skip Hidden
example:
{
"name": "flatten_annot",
"params": [
{
"name": "annot_types",
"value": "Widget,Popup,Link"
},
{
"name": "exclude_annot_types",
"value": true
},
{
"name": "flags",
"value": 3
}
]
}
create_web_links
Create Web Links – Create link annotations from web links in the page content
params:
-
url_regex
(string) URL search regex – Regular expression for searching web links
example:
{
"name": "create_web_links",
"params": [
{
"name": "url_regex",
"value": "^(((http(s)?|ftp):\\/\\/)|(mailto:)|www.)[^\\s\\/$.?#].[^\\s]*"
}
]
}
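To see what the example pattern accepts, it can be tried against a few strings. The sketch below uses Python's re module purely for illustration; which regular expression engine PDFix uses is not stated here, so its behavior may differ in details. The sample strings are made up.

```python
import re

# The url_regex value from the example after JSON unescaping (each \\ becomes \).
URL_REGEX = re.compile(r"^(((http(s)?|ftp):\/\/)|(mailto:)|www.)[^\s\/$.?#].[^\s]*")

samples = [
    "https://example.org",
    "www.example.com",
    "mailto:info@example.com",
    "example.com",
    "wwwxexample.com",
]

# Map each sample to whether the pattern matches at the start of the string.
results = {text: bool(URL_REGEX.match(text)) for text in samples}
```

In this sketch the bare domain example.com is rejected because it has no scheme or www prefix, while wwwxexample.com is accepted because the dot in www. is unescaped and therefore stands for any character. Writing www\\. in the JSON value would restrict it to a literal dot.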
delete_annot
Delete Annotations – Delete annotations from the document
params:
-
annot_types
(string) Annotation Types – Annotation types delimited by comma
-
exclude_annot_types
(bool) Exclude – Exclude annotation types if checked
-
flags
(flag) Flags – Delete annotations with specific characteristics
- 1 – Delete Invisible
- 2 – Delete Hidden
- 255 – Delete All
example:
{
"name": "delete_annot",
"params": [
{
"name": "annot_types",
"value": "TrapNet"
},
{
"name": "exclude_annot_types",
"value": false
},
{
"name": "flags",
"value": 255
}
]
}
Bookmarks
create_bookmarks
Create Bookmarks – Create bookmarks from the tag tree hierarchy
params:
-
tag_1
(string) Level 1 – Define the tag that represents the top level
-
tag_2
(string) Level 2 – Define the tag that represents the second level
-
tag_3
(string) Level 3 – Define the tag that represents the third level
-
tag_4
(string) Level 4 – Define the tag that represents the fourth level
-
tag_5
(string) Level 5 – Define the tag that represents the fifth level
-
tag_6
(string) Level 6 – Define the tag that represents the sixth level
-
overwrite
(bool) Overwrite – Replace the current bookmarks if they already exist
example:
{
"name": "create_bookmarks",
"params": [
{
"name": "tag_1",
"value": "H1"
},
{
"name": "tag_2",
"value": "H2"
},
{
"name": "tag_3",
"value": "H3"
},
{
"name": "tag_4",
"value": "H4"
},
{
"name": "tag_5",
"value": "H5"
},
{
"name": "tag_6",
"value": "H6"
},
{
"name": "overwrite",
"value": false
}
]
}
Content
artifact_content
Artifact Content – Mark defined content as an artifact
params:
-
artifact_untagged
(bool) Untagged content only – Mark the content as Artifact only if it’s untagged
-
artifact_type
(int) Mark as – Mark the content as an Artifact, Header or Footer
- 0 – Artifact
- 1 – Header
- 2 – Footer
example:
{
"name": "artifact_content",
"params": [
{
"name": "artifact_untagged",
"value": true
},
{
"name": "artifact_type",
"value": 0
}
]
}
flatten_xobject
Flatten Form XObjects – Flatten Form XObjects
example:
{
"name": "flatten_xobject"
}
remove_content_marks
Remove Content Marks – Remove artifact, MCID or any custom tag from page content objects
params:
-
flags
(flag) Remove – Types of content marks to be removed
- 1 – MCID
- 2 – Artifact
- 4 – Custom Content Marks
- 8 – Invalid Content Marks
- 16 – Artifact, if MCID is present
- 32 – MCID, if Artifact is present
example:
{
"name": "remove_content_marks",
"params": [
{
"name": "flags",
"value": 8
}
]
}
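The option values listed for flags are powers of two, which suggests (as an assumption, not something stated here) that several removal types can be requested at once by combining them. A small Python sketch for reading such a value back into option names; the names ContentMarkFlags and describe are hypothetical.

```python
from enum import IntFlag


class ContentMarkFlags(IntFlag):
    """Listed option values for the remove_content_marks flags parameter."""
    MCID = 1
    ARTIFACT = 2
    CUSTOM = 4
    INVALID = 8
    ARTIFACT_IF_MCID = 16
    MCID_IF_ARTIFACT = 32


def describe(value):
    """List the option names whose bit is set in value (assumed bit field)."""
    return [flag.name for flag in ContentMarkFlags if value & flag]


# The example above uses 8; 12 would combine custom and invalid marks.
example_value = describe(8)
combined_value = describe(ContentMarkFlags.CUSTOM | ContentMarkFlags.INVALID)
```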
Conversion
pdf_to_html
Convert To HTML – Convert PDF to HTML
example:
{
"name": "pdf_to_html"
}
pdf_to_json
Convert To JSON – Convert PDF to JSON
params:
-
output
(folder_path) Output Folder – The destination folder for the output
-
struct_tree
(bool) Structure Tree – Structure tree
-
page_map
(bool) PageMap Objects – PageMap objects
-
page_content
(bool) Page Content Objects – Page content objects
-
text
(bool) Export Texts – Export texts
-
text_style
(bool) Export Text Styles – Export text styles
-
text_state
(bool) Export Text States – Export text states
-
images
(bool) Export Images – Export images
-
bbox
(bool) Export Bounding Box – Export object bounding box
-
graphics_state
(bool) Export Graphics States – Export graphics states
-
content_marks
(bool) Export Content Marks – Export object content marks
example:
{
"name": "pdf_to_json",
"params": [
{
"name": "output",
"value": ""
},
{
"name": "struct_tree",
"value": false
},
{
"name": "page_map",
"value": true
},
{
"name": "page_content",
"value": true
},
{
"name": "text",
"value": true
},
{
"name": "text_style",
"value": true
},
{
"name": "text_state",
"value": false
},
{
"name": "images",
"value": false
},
{
"name": "bbox",
"value": false
},
{
"name": "graphics_state",
"value": false
},
{
"name": "content_marks",
"value": false
}
]
}
Fonts
embed_font
Embed Fonts – Embed font
example:
{
"name": "embed_font"
}
replace_font
Replace Font – Replace font
params:
-
font_name
(string) Font Name – The PDF font name that will be replaced. Regular expressions are supported.
-
font_family
(system_font) Font Family – The font family name that will be used for replacement
example:
{
"name": "replace_font",
"params": [
{
"name": "font_name",
"value": ""
},
{
"name": "font_family",
"value": ""
}
]
}
add_missing_unicode
Add Missing Unicodes – Add missing unicode mapping
example:
{
"name": "add_missing_unicode"
}
Metadata
get_doc_info
Get Document Properties – Get document metadata and properties
example:
{
"name": "get_doc_info"
}
set_doc_info
Set Document Properties – Set document metadata and properties
params:
-
set_author
(bool) Set Author – Set the document author
-
author
(string) Author – Author
-
set_title
(bool) Set Title – Set the document title
-
title
(string) Title – Title
-
set_subject
(bool) Set Subject – Set the document subject
-
subject
(string) Subject – Subject
-
set_keywords
(bool) Set Keywords – Set the document keywords
-
keywords
(string) Keywords – Keywords
-
set_producer
(bool) Set Producer – Set the document producer
-
producer
(string) Producer – Producer name
-
set_creator
(bool) Set Creator – Set the document creator
-
creator
(string) Creator – Creator
example:
{
"name": "set_doc_info",
"params": [
{
"name": "set_author",
"value": true
},
{
"name": "author",
"value": ""
},
{
"name": "set_title",
"value": true
},
{
"name": "title",
"value": ""
},
{
"name": "set_subject",
"value": true
},
{
"name": "subject",
"value": ""
},
{
"name": "set_keywords",
"value": true
},
{
"name": "keywords",
"value": ""
},
{
"name": "set_producer",
"value": true
},
{
"name": "producer",
"value": ""
},
{
"name": "set_creator",
"value": true
},
{
"name": "creator",
"value": ""
}
]
}
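Each property in set_doc_info is controlled by a pair of parameters: a boolean switch (set_author, set_title, and so on) and the value itself. The Python sketch below builds the params list so that only the properties passed in are switched on. The helper name doc_info_action and the sample values are hypothetical, and the sketch assumes that a property whose switch is false is left unchanged.

```python
FIELDS = ("author", "title", "subject", "keywords", "producer", "creator")


def doc_info_action(**values):
    """Build a set_doc_info action that enables only the fields passed in."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    params = []
    for field in FIELDS:
        # The switch is true only for fields the caller supplied.
        params.append({"name": f"set_{field}", "value": field in values})
        params.append({"name": field, "value": values.get(field, "")})
    return {"name": "set_doc_info", "params": params}


info = doc_info_action(author="Jane Doe", title="Annual Report")
```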
Pages
rotate_pages
Rotate Page – Rotate pages
params:
-
rotation_type
(int) Rotation Type – Type of rotation
- 0 – Set rotation angle
- 1 – Rotate by angle
-
rotation_angle
(int) Rotation Angle – Rotation angle
- 0 – 0 degrees
- 90 – 90 degrees
- 180 – 180 degrees
- 270 – 270 degrees
-
page_rotation
(int) Pages With Rotation – Pages with defined rotation
- 0 – 0 degrees
- 90 – 90 degrees
- 180 – 180 degrees
- 270 – 270 degrees
-
page_orientation
(int) Pages With Orientation – Pages with defined orientation
- 0 – Pages With Any Orientation
- 1 – Landscape Pages
- 2 – Portrait Pages
example:
{
"name": "rotate_pages",
"params": [
{
"name": "rotation_type",
"value": 1
},
{
"name": "rotation_angle",
"value": 0
},
{
"name": "page_rotation",
"value": 0
},
{
"name": "page_orientation",
"value": 0
}
]
}
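The two rotation types are only named here, not defined. One natural reading, which is an assumption of this sketch and not confirmed by this reference, is that "Set rotation angle" assigns an absolute rotation while "Rotate by angle" adds the angle to the page's current rotation. The Python function below (hypothetical name resulting_rotation) models that reading.

```python
def resulting_rotation(current, rotation_type, rotation_angle):
    """Return the page rotation after the action, under the assumed semantics."""
    if rotation_angle not in (0, 90, 180, 270):
        raise ValueError("rotation_angle must be 0, 90, 180 or 270")
    if rotation_type == 0:
        # Set rotation angle: assumed to be absolute.
        return rotation_angle
    if rotation_type == 1:
        # Rotate by angle: assumed to be relative to the current rotation.
        return (current + rotation_angle) % 360
    raise ValueError("rotation_type must be 0 or 1")
```

Under this reading, the example above (rotation_type 1 with rotation_angle 0) would leave every page as it is, so a real configuration would normally use a non-zero angle.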
Tags
delete_tags
Delete Tags – Delete defined tags
params:
-
tag_names
(string) Tags – Define the tags, separated by commas, that shall be deleted
-
exclude_tag_names
(bool) Exclude – If checked, all tags except those explicitly defined will be deleted
-
skip_tag_names
(string) Ignore Tags – Define the tags, separated by commas, that should be skipped
-
flags
(flag) Delete – Specify which tags should be deleted
- 1 – Empty (no children)
- 2 – Invalid (no associated page content)
- 4 – Even when Actual Text exists
- 8 – Even when Alternate Description exists
- 255 – Delete All
-
tag_content
(string) Tag Content – Handle the content in the deleted tag
- none – Leave Untagged
- move – Move to Parent
example:
{
"name": "delete_tags",
"params": [
{
"name": "tag_names",
"value": "H,H1,H2,H3,H4,H5,H6"
},
{
"name": "exclude_tag_names",
"value": true
},
{
"name": "skip_tag_names",
"value": "TH,TD,TR,LI,Lbl,LBody"
},
{
"name": "flags",
"value": 1
},
{
"name": "tag_content",
"value": "none"
}
]
}
apply_standard_tags
Apply Standard Tags – Apply standard tag names according to their role mapping
example:
{
"name": "apply_standard_tags"
}
set_tag_id
Set Tag ID – Generate a unique identifier for the tag
params:
-
tag_names
(string) Tags – Define the tags, separated by commas, for which an ID should be generated
-
exclude_tag_names
(bool) Exclude – If checked, all tags except those explicitly defined will be processed
-
overwrite
(bool) Overwrite – Replace the current Tag ID if it already exists
example:
{
"name": "set_tag_id",
"params": [
{
"name": "tag_names",
"value": "Note,TH"
},
{
"name": "exclude_tag_names",
"value": false
},
{
"name": "overwrite",
"value": false
}
]
}
set_tag_bbox
Set Tag BBox – Calculate the tag bbox from its content
params:
-
tag_names
(string) Tags – Define the tags, separated by commas, for which the BBox should be calculated
-
exclude_tag_names
(bool) Exclude – If checked, all tags except those explicitly defined will be processed
-
overwrite
(bool) Overwrite – Replace the current tag BBox if it already exists
example:
{
"name": "set_tag_bbox",
"params": [
{
"name": "tag_names",
"value": "Figure, Formula, Form, Table"
},
{
"name": "exclude_tag_names",
"value": false
},
{
"name": "overwrite",
"value": false
}
]
}
set_alt
Set Alternate Description – Set alternative description to the tag
params:
-
tag_names
(string) Tags – Define the tags, separated by commas, for which alternative descriptions should be set
-
exclude_tag_names
(bool) Exclude – If checked, all tags except those explicitly defined will be processed
-
alt_type
(int) Alternative Text – Define a source for detecting the alternative text
- 0 – Define the Custom Alternative text
- 1 – Use the first Description Tag above
- 2 – Use the first Description Tag below
- 3 – Use the first Description Tag from children
- 4 – Use the associated tag content
-
custom_text
(string) Custom Alternative – Enter custom text to serve as an alternative description
-
description_tag
(string) Description Tag – Define the tags whose content is used for the alternative text
-
overwrite
(bool) Overwrite – Replace the alternative text if it already exists
example:
{
"name": "set_alt",
"params": [
{
"name": "tag_names",
"value": "Figure,Formula"
},
{
"name": "exclude_tag_names",
"value": false
},
{
"name": "alt_type",
"value": 0
},
{
"name": "custom_text",
"value": "Decorative"
},
{
"name": "description_tag",
"value": "Caption"
},
{
"name": "overwrite",
"value": false
}
]
}
fix_placement
Fix Placement – Fix placement
example:
{
"name": "fix_placement"
}
remove_tag_data
Remove Tag Properties – Remove properties from the defined tags
params:
-
tag_names
(string) Tags – Define the tags, separated by commas, from which properties should be removed
-
exclude_tag_names
(bool) Exclude – If checked, all tags except those explicitly defined will be processed
-
accept_alternate_desc
(bool) Remove Alternate – Remove Alt key
-
accept_actual_text
(bool) Remove Actual Text – Remove ActualText key
-
accept_id
(bool) Remove ID – Remove ID key
-
accept_lang
(bool) Remove Language – Remove Lang key
-
accept_title
(bool) Remove Title – Remove T key
example:
{
"name": "remove_tag_data",
"params": [
{
"name": "tag_names",
"value": ""
},
{
"name": "exclude_tag_names",
"value": true
},
{
"name": "accept_alternate_desc",
"value": false
},
{
"name": "accept_actual_text",
"value": false
},
{
"name": "accept_id",
"value": false
},
{
"name": "accept_lang",
"value": false
},
{
"name": "accept_title",
"value": false
}
]
}
set_structure_attribute
Set Tag Attributes – Set the attributes of the tag
params:
-
tag_names
(string) Tags – Define the tags, separated by commas, whose attributes should be set
-
exclude_tag_names
(bool) Exclude – If checked, all tags except those explicitly defined will be processed
-
owner
(string) Owner – Standard attribute owner
- Layout – Layout
- List – List
- PrintField – PrintField
- Table – Table
-
name
(string) Name – Attribute name
-
value
(string) Value – Attribute value
-
value_type
(string) Value Type – Attribute value type
- string – string
- name – name
- array – array
- number – number
-
overwrite
(bool) Overwrite – Replace the current attribute if it already exists
example:
{
"name": "set_structure_attribute",
"params": [
{
"name": "tag_names",
"value": ""
},
{
"name": "exclude_tag_names",
"value": true
},
{
"name": "owner",
"value": ""
},
{
"name": "name",
"value": ""
},
{
"name": "value",
"value": ""
},
{
"name": "value_type",
"value": ""
},
{
"name": "overwrite",
"value": false
}
]
}
fix_structure_spaces
Fix Spaces – Add missing or resolve duplicate white spaces within a structure element
params:
-
add_missing_spaces
(bool) Add Missing Spaces – Identify words in the structure and add missing spaces
-
remove_unnecessary_spaces
(bool) Remove Unnecessary Spaces – Remove duplicate spaces after each word
-
artifact_unnecessary_spaces
(bool) Artifact Unnecessary Spaces – Mark duplicate spaces after each word as an Artifact
example:
{
"name": "fix_structure_spaces",
"params": [
{
"name": "add_missing_spaces",
"value": true
},
{
"name": "remove_unnecessary_spaces",
"value": false
},
{
"name": "artifact_unnecessary_spaces",
"value": true
}
]
}
fix_headings
Fix Headings – Assign heading levels properly
params:
-
renumber_headings
(bool) Renumber Headings – Renumber all headings
-
change_headings_to
(string) Change Headings to – Change Headings to
- H – H
- H1 – H1
- H2 – H2
- H3 – H3
- H4 – H4
- H5 – H5
- H6 – H6
example:
{
"name": "fix_headings",
"params": [
{
"name": "renumber_headings",
"value": true
},
{
"name": "change_headings_to",
"value": "H"
}
]
}
Template
preflight
Preflight – Preflight document
params:
-
merge_preflight
(bool) Merge – Combine the preflight settings with the existing ones
example:
{
"name": "preflight",
"params": [
{
"name": "merge_preflight",
"value": false
}
]
}
load_template
Load Template – Load template from file and merge with the current template
params:
-
template
(file_path) Template – Template file path
example:
{
"name": "load_template",
"params": [
{
"name": "template",
"value": ""
}
]
}
reset_template
Reset Template – Reset template
example:
{
"name": "reset_template"
}
Validation
validation
Validation – Validation
params:
-
profile
(validation_profile) Profile – Validation profile
example:
{
"name": "validation",
"params": [
{
"name": "profile",
"value": "PDFUA-1"
}
]
}
validation_report
Validation Report – Save the validation report into a file
params:
-
output
(folder_path) Output Folder – Choose a folder to save the validation report, or save it in the same folder as the source file if no folder is specified.
-
profile
(validation_profile) Profile – Validation profile
-
file_format
(string) File Format – Validation report format
- html – html
- xml – xml
- json – json
- text – text
- raw – raw
example:
{
"name": "validation_report",
"params": [
{
"name": "output",
"value": ""
},
{
"name": "profile",
"value": "PDFUA-1"
},
{
"name": "file_format",
"value": "html"
}
]
}
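All action examples in this reference share one shape: an object with a name and, optionally, a params list of name/value pairs. A hand-edited batch file can be checked against that shape before use. The Python sketch below is a hypothetical helper (check_action is not a PDFix function) that tests structure only; it does not know which action or parameter names are valid.

```python
def check_action(item):
    """Return a list of structural problems found in one action object (a dict)."""
    problems = []
    name = item.get("name")
    if not isinstance(name, str) or not name:
        problems.append("action needs a non-empty string 'name'")
    params = item.get("params", [])
    if not isinstance(params, list):
        return problems + ["'params' must be a list"]
    seen = set()
    for param in params:
        # Every param is expected to have exactly the keys name and value.
        if not isinstance(param, dict) or set(param) != {"name", "value"}:
            problems.append(f"malformed param: {param!r}")
            continue
        if param["name"] in seen:
            problems.append(f"duplicate param: {param['name']}")
        seen.add(param["name"])
    return problems


report = check_action({
    "name": "validation_report",
    "params": [
        {"name": "output", "value": ""},
        {"name": "profile", "value": "PDFUA-1"},
        {"name": "file_format", "value": "html"},
    ],
})
```

An empty list means no structural problem was found; each string in a non-empty list describes one issue.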