PDFix Command Line Interface (CLI)

Fast integration of PDFix functionality into your workflows

The PDFix command-line interface is easy to integrate, and not only for developers: it lets you automate your processes through scripts. The CLI offers scalability, productivity, stability, simplicity and compatibility.

PDFix SDK CLI

Usage: ./pdfix_app [OPTIONS] [SUBCOMMAND]

Options:

  -h,--help                   Print this help message and exit
  --help-all-md               Show all help in MD format
  -v,--version                

Subcommands

  name                        description
  batch                       Run commands in a batch. The document is not saved to the output path if a command does not modify it.
  make-accessible             Makes PDF Accessible. Converts PDF to fully compliant PDF/UA. If you have image-only PDF, please use OCR command before.
  add-tags                    Add tags to PDF.
  extract-data                Extract PDF data into JSON/XML
  pdf2table                   Extracts tables detected in the PDF into CSV files. Output should point to the folder where separate CSV files will be saved.
  pdf2txt                     Extract text from PDF
  pdf2image                   Extract images from PDF
  extract-highlighted-text    Extract highlighted text from PDF
  pdf2html                    Converts PDF to HTML, output is the HTML file created during conversion. All necessary files generated during the conversion are saved in the same folder as the output file.
  preflight                   Preflight document template and output the config
  ocr                         Converts scans or image-only PDF documents into searchable, editable PDF files.
  import-data                 Import form data from JSON
  acroform2json               Extract PDF Form Fields into JSON
  json2acroform               Import PDF Form Fields from JSON
  render-page                 Render Page
  digital-signature           Sign PDF
  add-comment                 Add comment to PDF
  remove-comments             Remove comments from PDF
  flatten                     Flatten all annotations into the PDF content.
  embedfonts                  Embeds fonts into PDF
  watermark                   Add watermark to PDF
  parse-pds-objects           Tag operations on PDF
  create-document             Create new PDF
  create-page                 Create new page in PDF
  move-page                   Move one page in document
  delete-pages                Delete pages from PDF
  insert-pages                Insert pages from PDF to another
  document-metadata           Get and set document metadata as XML
  license                     License related commands
  pages2json                  Convert PDF Pages into JSON
  bmk2json                    Convert PDF Bookmarks into JSON
  tags2json                   StructTree to JSON
  content2json                Page Content to JSON
  dests2json                  Extract Named Destinations into JSON
  test                        test commands
  test-open-document          test open document commands
  test-incremental-save       test incremental save
  create-documents            Create new PDF files
  render-pages                Render Pages
  redact                      Redact content using all redaction annotations.
  pocess-control              test commands
  undo-redo                   undo-redo test command
  tag                         Tag operations on PDF
  remove-security             Add tags to PDF.
  test-edit-page-object-mcid  Edit page object mcid.

[Option Group: ]

Internal commands

Options:

  -m,--email TEXT             Registration e-mail address
  -k,--key TEXT               License key
  --settings-path TEXT        PDFix SDK settings
  -i,--input TEXT             Input file
  -o,--output TEXT            Output file; the input file is replaced if not set
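
For scripted use, the shared options above can be combined with any subcommand. The following is a minimal Python sketch that only builds the argument list and does not run anything. It assumes that the shared `-i`/`-o` options are passed after the subcommand name, which the help text does not state explicitly; `build_command` and the file names are hypothetical and exist only for illustration.

```python
import shlex
from typing import List, Optional, Sequence

PDFIX_APP = "./pdfix_app"  # binary name taken from the usage line


def build_command(
    subcommand: str,
    input_path: str,
    output_path: Optional[str] = None,
    extra: Sequence[str] = (),
) -> List[str]:
    """Return an argv list for one subcommand; nothing is executed here.

    Assumption: the shared -i/-o options follow the subcommand name.
    """
    argv = [PDFIX_APP, subcommand, "-i", input_path]
    if output_path is not None:
        # Per the help text, leaving out -o replaces the input file.
        argv += ["-o", output_path]
    argv += list(extra)
    return argv


# Hypothetical file names, for illustration only.
cmd = build_command("flatten", "in.pdf", "out.pdf")
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the binary is available on the machine.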

batch

Run commands in a batch. The document is not saved to the output path if a command does not modify it.

Options:

  -p,--password TEXT          Open password
  -c,--command TEXT           Batch command JSON file
  --progress                  Print out the batch commands progress output if set

make-accessible

Makes PDF Accessible. Converts PDF to fully compliant PDF/UA. If you have image-only PDF, please use OCR command before.

Options:

  --password TEXT             Open password
  -c,--command-path TEXT      Command file path. Default make accessible command will be used if it's empty.

add-tags

Add tags to PDF.

Options:

  --password TEXT             Open password
  -c,--config-path TEXT       Config file path.
  --preflight                 Preflight document template before processing

extract-data

Extract PDF data into JSON/XML

Options:

  --password TEXT             Open password
  -c,--config_path TEXT:FILE  Config file path
  --preflight                 Preflight document template before processing
  -f,--format ENUM:{0,1}      integer value defining the data file output format (0-JSON, 1-XML)
  -p,--page-number INT        the page number from which to extract data, default -1 extracts from all pages
  --doc-info                  extract document general information (metadata, num pages, etc.)
  --doc-outlines              extract document outlines (bookmarks)
  --doc-acroform              extract document forms (AcroForm)
  --doc-struct-tree           extract document structure tree (tags)
  --page-info                 extract page general information (number, crop box, rotation)
  --page-content              extract page objects (raw data)
  --page-map                  scrape page data (logical content extraction)
  --page-annots               extract page annotations
  --text                      extract page map text elements or content text objects
  --tables                    extract page map tables structure
  --images                    extract page map image elements or page content images
  --bbox                      extract element or object bbox
  --text-style                extract text style of text element
  --text-state                extract text state of text object or element
  --graphic-state             extract page object's graphic state
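
Because `extract-data` takes a numeric format code and many independent switches, a small wrapper can keep scripts readable. The sketch below maps format names to the codes listed above and checks switch names against the list above. It only builds the argument list. The helper name `extract_data_args` is hypothetical, and placing `-i`/`-o` after the subcommand name is an assumption.

```python
from typing import Iterable, List

# -f/--format codes as listed in the help text (0-JSON, 1-XML).
FORMAT_CODES = {"json": 0, "xml": 1}

# Boolean switches of extract-data, copied from the option list.
SWITCHES = {
    "doc-info", "doc-outlines", "doc-acroform", "doc-struct-tree",
    "page-info", "page-content", "page-map", "page-annots",
    "text", "tables", "images", "bbox",
    "text-style", "text-state", "graphic-state",
}


def extract_data_args(
    input_path: str,
    output_path: str,
    fmt: str = "json",
    page: int = -1,
    switches: Iterable[str] = (),
) -> List[str]:
    """Build an argv list for extract-data; nothing is executed here."""
    if fmt not in FORMAT_CODES:
        raise ValueError(f"unknown format: {fmt}")
    # Assumption: the shared -i/-o options follow the subcommand name.
    argv = ["./pdfix_app", "extract-data", "-i", input_path, "-o", output_path,
            "-f", str(FORMAT_CODES[fmt])]
    if page != -1:
        # -1 is the documented default (all pages), so it is left out.
        argv += ["-p", str(page)]
    for name in switches:
        if name not in SWITCHES:
            raise ValueError(f"unknown switch: --{name}")
        argv.append(f"--{name}")
    return argv
```

For example, `extract_data_args("in.pdf", "out.xml", fmt="xml", switches=["page-map", "tables"])` requests XML output with the page map and table structure.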

pdf2table

Extracts tables detected in the PDF into CSV files. Output should point to the folder where separate CSV files will be saved.

pdf2txt

Extract text from PDF

Options:

  -c,--config_path TEXT:FILE  Config file path
  -p,--page INT               Page number from which text will be extracted (default -1 extracts all pages)

pdf2image

Extract images from PDF

Options:

  -w,--page-width INT         width of the rendered page in pixels used for scaling the images
  -f,--format ENUM:{1,2}      integer value defining the image output format (0-PNG, 1-JPG)
  -q,--quality INT:INT in [0 - 100]
                              image quality; for JPG this is the compression level, otherwise it is ignored

extract-highlighted-text

Extract highlighted text from PDF

Options:

  -c,--config-path TEXT       path to config file

pdf2html

Converts PDF to HTML, output is the HTML file created during conversion. All necessary files generated during the conversion are saved in the same folder as the output file.

Options:

  --password TEXT             Open password
  -c,--config-path TEXT       path to config file.
  -w,--page-width INT         Page width
  -a,--append-html TEXT       Append HTML code from file
  --preflight                 Preflight document template before processing
  --export-js                 exports document JavaScript into HTML.
  --text-size                 retain original text size in created HTML.
  --text-color                retain original text color in created HTML.
  --no-external               use inline css, js and embedded images and fonts.
  --no-external-css           use inline css instead of the external file.
  --no-external-js            use inline javascript instead of the external file.
  --no-external-img           use embedded base64-encoded images.
  --no-external-font          use embedded base64-encoded fonts.
  --gray-background           use gray background and page padding.
  --no-page-render            do not render page.
  --responsive                creates responsive HTML, creates fixed layout if not set.
  --derivation                creates HTML derived from PDF tags.
  --export-fonts Needs: --responsive
                              exports embedded TrueType fonts into HTML using CSS3.
  --format ENUM:{1,2}         integer value defining the image output format (0-PNG, 1-JPG)
  --quality INT:INT in [0 - 100]
                              integer value defining the image output quality (0-100)
  -j,--js                     
  -s,--css                    
  -d,--doc                    
  -p,--page INT               
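
The option list above marks `--export-fonts` as needing `--responsive`. A wrapper script can check that dependency before calling the tool. The following is a minimal sketch that only builds the argument list; `pdf2html_args` is a hypothetical helper, and placing `-i`/`-o` after the subcommand name is an assumption.

```python
from typing import List


def pdf2html_args(
    input_path: str,
    output_path: str,
    responsive: bool = False,
    export_fonts: bool = False,
    no_external: bool = False,
) -> List[str]:
    """Build an argv list for pdf2html; nothing is executed here."""
    if export_fonts and not responsive:
        # The help text marks --export-fonts as "Needs: --responsive".
        raise ValueError("--export-fonts needs --responsive")
    # Assumption: the shared -i/-o options follow the subcommand name.
    argv = ["./pdfix_app", "pdf2html", "-i", input_path, "-o", output_path]
    if responsive:
        argv.append("--responsive")
    if export_fonts:
        argv.append("--export-fonts")
    if no_external:
        argv.append("--no-external")
    return argv
```

Failing early in the wrapper gives a clearer error message in batch scripts than waiting for the tool to reject the combination.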

preflight

Preflight document template and output the config

Options:

  -f,--format ENUM:{0,1}      integer value defining the data file output format (0-JSON, 1-XML)

ocr

Converts scans or image-only PDF documents into searchable, editable PDF files.

Options:

  -l,--lang TEXT              OCR language
  -d,--data-path TEXT:DIR     path to Tesseract OCR data.

import-data

Import form data from JSON

Options:

  --password TEXT             Open password
  -j,--json-path TEXT:FILE    Path to JSON file
  -f,--flatten                Flatten PDF

acroform2json

Extract PDF Form Fields into JSON

Options:

  --password TEXT             Open password
  --widgets                   include information about the form field annotations - widgets.

json2acroform

Import PDF Form Fields from JSON

Options:

  -j,--json-path TEXT:FILE    Path to JSON file

render-page

Render Page

Options:

  -f,--format ENUM:{1,2}      integer value defining the image output format (0-PNG, 1-JPG)
  -r,--rotate ENUM:{0,90,180,270}
                              page rotation in degrees
  --password TEXT             Open password
  -p,--page-number INT        page number from which image will be created.
  -l,--left INT               integer value specifying the page left of the clipping region in device units
  -t,--top INT                integer value specifying the page top of the clipping region in device units
  -w,--width INT              integer value specifying the width of the clipping region in device units
  -g,--height INT             integer value specifying the height of the clipping region in device units
  -q,--quality INT:INT in [0 - 100]
                              integer value defining the image output quality (0-100)
  -z,--zoom FLOAT             floating point number of zoom level
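
The value ranges listed above for `--rotate` and `--quality` can be checked in a script before the tool is called. The sketch below builds an argument list for `render-page` and does not run anything. It leaves out `--format`, since the help text lists the allowed values as `{1,2}` but describes them as `0-PNG, 1-JPG`. `render_page_args` is a hypothetical helper, the page number is passed through unchanged because the help text does not say whether it is zero-based, and placing `-i`/`-o` after the subcommand name is an assumption.

```python
from typing import List, Optional

ROTATIONS = (0, 90, 180, 270)  # allowed -r/--rotate values from the help text


def render_page_args(
    input_path: str,
    output_path: str,
    page: int,
    rotate: int = 0,
    quality: Optional[int] = None,
    zoom: Optional[float] = None,
) -> List[str]:
    """Build an argv list for render-page; nothing is executed here."""
    if rotate not in ROTATIONS:
        raise ValueError(f"rotate must be one of {ROTATIONS}")
    if quality is not None and not 0 <= quality <= 100:
        raise ValueError("quality must be in [0, 100]")
    # Assumption: the shared -i/-o options follow the subcommand name.
    argv = ["./pdfix_app", "render-page", "-i", input_path, "-o", output_path,
            "-p", str(page), "-r", str(rotate)]
    if quality is not None:
        argv += ["-q", str(quality)]
    if zoom is not None:
        argv += ["-z", str(zoom)]
    return argv
```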

digital-signature

Sign PDF

Options:

  -x,--pfx-path TEXT:FILE     Path to .pfx file with signature.
  -p,--pfx-password TEXT      Password for .pfx file.

add-comment

Add comment to PDF

remove-comments

Remove comments from PDF

flatten

Flatten all annotations into the PDF content.

embedfonts

Embeds fonts into PDF

watermark

Add watermark to PDF

Options:

  -m,--image-path TEXT:FILE   Path to image file used for watermark
  -s,--start-page INT         First page where the watermark is placed [0].
  -e,--end-page INT           Last page where the watermark is placed [last page].
  --order-top INT:NUMBER      Control watermark z-order (0-bottom, [1]-top)
  --percentage                Use percentage values instead of points
  --h-align ENUM:{1,2,3,6}:NUMBER
                              Horizontal alignment ([1]-left, 2-right, 3-justify, 6-center)
  --v-align ENUM:{4,5,6}:NUMBER
                              Vertical alignment ([4]-top, 5-bottom, 6-center)
  --h-value FLOAT:NUMBER      Horizontal image offset [0]
  --v-value FLOAT:NUMBER      Vertical image offset [0]
  --scale FLOAT:NUMBER        Image scale [1]
  --rotation FLOAT:NUMBER     Image counter-clockwise rotation in degrees [0]
  --opacity FLOAT:NUMBER      Image opacity [1]
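
The alignment options take numeric codes, so a script may prefer readable names. The sketch below maps names to the codes listed above and builds an argument list for `watermark` without running anything. It uses the long form `--image-path` because the short form `-m` is also listed for `--email` among the shared options. `watermark_args` is a hypothetical helper, and placing `-i`/`-o` after the subcommand name is an assumption.

```python
from typing import List, Optional

# Alignment codes as listed in the help text.
H_ALIGN = {"left": 1, "right": 2, "justify": 3, "center": 6}
V_ALIGN = {"top": 4, "bottom": 5, "center": 6}


def watermark_args(
    input_path: str,
    output_path: str,
    image_path: str,
    h_align: str = "left",
    v_align: str = "top",
    opacity: Optional[float] = None,
) -> List[str]:
    """Build an argv list for watermark; nothing is executed here."""
    if h_align not in H_ALIGN:
        raise ValueError(f"unknown horizontal alignment: {h_align}")
    if v_align not in V_ALIGN:
        raise ValueError(f"unknown vertical alignment: {v_align}")
    # Assumption: the shared -i/-o options follow the subcommand name.
    argv = ["./pdfix_app", "watermark", "-i", input_path, "-o", output_path,
            "--image-path", image_path,
            "--h-align", str(H_ALIGN[h_align]),
            "--v-align", str(V_ALIGN[v_align])]
    if opacity is not None:
        # No range is documented for --opacity, so the value is not checked.
        argv += ["--opacity", str(opacity)]
    return argv
```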

parse-pds-objects

Tag operations on PDF

Options:

  --password TEXT             Open password

create-document

Create new PDF

create-page

Create new page in PDF

Options:

  -p,--after-page INT         page number after which the new page will be inserted.

move-page

Move one page in document

Options:

  -f,--from INT               page number of the page to move
  -t,--to INT                 new location of the page to move.

delete-pages

Delete pages from PDF

Options:

  -f,--from INT               page number of first page to delete.
  -t,--to INT                 page number of the last page to delete.

insert-pages

Insert pages from PDF to another

Options:

  -s,--src TEXT:FILE          Source file
  -a,--after INT              Page number after which pages will be inserted
  -f,--from INT               Page number of first page to insert
  -t,--to INT                 Page number of the last page to insert

document-metadata

Get and set document metadata as XML

Options:

  -x,--xml-path TEXT          Path to .xml file with metadata.

license

License related commands

Options:

  -a,--activate TEXT          activate license using the key online or offline if --license-path is set, or request activation if --request-path is set
  --request                   create activation request, --license-path should be set to write request to, used only in combination with --activate
  --license-path TEXT         path to a license file
  -d,--deactivate             deactivate license online, for offline deactivation --license-path should be set
  -u,--update                 update license online, for offline update --license-path should be set
  -s,--status                 print license status
  -r,--reset                  reset local license

pages2json

Convert PDF Pages into JSON

Options:

  --password TEXT             Open password
  -p,--page-number INT        page number ([0] default, all pages)
  --text                      exports page text

bmk2json

Convert PDF Bookmarks into JSON

Options:

  --password TEXT             Open password

tags2json

StructTree to JSON

content2json

Page Content to JSON

Options:

  -p,--page-number INT        page number.

dests2json

Extract Named Destinations into JSON

Options:

  --password TEXT             Open password

test

test commands

Options:

  -m,--image-path TEXT        Path to image file used for watermark

test-open-document

test open document commands

test-incremental-save

test incremental save

create-documents

Create new PDF files

Positionals:

  count UINT                  Document count
  thread-count UINT           Thread count

Options:

  -c,--count UINT             Document count
  -t,--thread-count UINT      Thread count

render-pages

Render Pages

Options:

  -f,--format ENUM:{1,2}      integer value defining the image output format (0-PNG, 1-JPG)
  -r,--rotate ENUM:{0,90,180,270}
                              page rotation in degrees
  --page-from INT             page number from which rendering will be executed
  --page-to INT               page number to which rendering will be executed
  -l,--left INT               integer value specifying the page left of the clipping region in device units
  -t,--top INT                integer value specifying the page top of the clipping region in device units
  -w,--width INT              integer value specifying the width of the clipping region in device units
  -g,--height INT             integer value specifying the height of the clipping region in device units
  -q,--quality INT:INT in [0 - 100]
                              integer value defining the image output quality (0-100)
  -z,--zoom FLOAT             floating point number of zoom level
  --thread-count UINT         maximal number of threads to be used

redact

Redact content using all redaction annotations.

Options:

  -p,--page-number INT        page number where redaction mark will be created.
  -l,--left INT               integer value specifying the left of the redaction mark on page
  -b,--bottom INT             integer value specifying the bottom of the redaction mark on page
  -w,--width INT              integer value specifying the width of the redaction mark on page
  -g,--height INT             integer value specifying the height of the redaction mark on page

pocess-control

test commands

undo-redo

undo-redo test command

tag

Tag operations on PDF

Options:

  -r,--remove                 
  -a,--annotation             
  -f,--artefact               
  -g,--heading                
  -d,--reading-order          
  -s,--read-struct-tree       
  -e,--edit-struct-tree       
  -t,--table-as-figure        

remove-security

Add tags to PDF.

Options:

  --password TEXT             Open password

test-edit-page-object-mcid

Edit page object mcid.