Use case

Extract data from an unstructured PDF document into XML format.


Download the original PDF document


The SDK provides two options for integrating into your project: using the Command Line Utility or programmatically.

Click here to create your free trial license key.

Command Line Utility (CLI)

PDFix provides simple and fast automated PDF processing from the command line. The PDFix Command Line Utility, available for Windows, macOS and Linux, is the easiest way to integrate the SDK functionality into your solution. Learn more about the Command Line Utility.

$ cd /pdfix_mac/bin
$ ./pdfix_app 3bE31NaixzFE58ir -pdf2xml /Users/admin/Documents/input.pdf output.xml


Processing pages...1/1

This command converts the PDF into XML. The last argument, output.xml, is the file where the generated XML will be saved.
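If you want to drive the CLI from a script, the command above can be wrapped in a few lines of Python. This is a minimal sketch, not part of the PDFix SDK: the helper names build_pdf2xml_command and convert_pdf_to_xml are hypothetical, and the argument order simply mirrors the example command, on the assumption that the first argument is the license key.

```python
import subprocess


def build_pdf2xml_command(app_path, license_key, input_pdf, output_xml):
    """Build the argument list in the same order as the example command.

    Assumption: the first argument after the executable is the license key.
    """
    return [app_path, license_key, "-pdf2xml", input_pdf, output_xml]


def convert_pdf_to_xml(app_path, license_key, input_pdf, output_xml):
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    command = build_pdf2xml_command(app_path, license_key, input_pdf, output_xml)
    # Passing a list (no shell) keeps paths with spaces intact.
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


# Example call (hypothetical key placeholder, requires pdfix_app to be installed):
# convert_pdf_to_xml("/pdfix_mac/bin/pdfix_app", "YOUR_LICENSE_KEY",
#                    "/Users/admin/Documents/input.pdf", "output.xml")
```

Keeping the command construction in its own function makes it easy to check the arguments without running the tool.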

Code Integration

This code sample shows how to extract data from a PDF document and output it into XML. Code integration into your project allows you to take full control of the PDF data processing:


Using the PDFix CLI or the provided code snippet, we get the XML output:

PDFix Extract Data from PDF - Getting Started

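Once the XML file exists, a quick way to get a feel for its structure is to count the element names it contains. The sketch below uses only Python's standard library and makes no assumption about the PDFix schema; the sample string is a hypothetical placeholder for illustration, not real PDFix output, and summarize_xml is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_xml(xml_text):
    """Return (tag counts, non-empty text values) for an XML string."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root and all descendants in document order.
    tag_counts = Counter(element.tag for element in root.iter())
    texts = [
        element.text.strip()
        for element in root.iter()
        if element.text and element.text.strip()
    ]
    return tag_counts, texts


# Hypothetical placeholder markup, NOT the actual PDFix output schema.
sample = "<document><page><text>Hello</text><text>World</text></page></document>"
tag_counts, texts = summarize_xml(sample)
print(tag_counts.most_common())
print(texts)
```

To inspect a real result, read the generated file first, for example with open("output.xml", encoding="utf-8") and pass its contents to summarize_xml.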

Customizing the output

PDFix SDK allows customization of the output through configuration files that affect the table detection process and the output structure. To learn more about the configuration files, please follow the Documentation. When using the SDK programmatically, there are no limits on tailoring the output to your needs.

Contact us if you need help with integration.