Use case

Extract data from an unstructured PDF document into XML format.

Resources

Download the original PDF document

Integration

The SDK offers two ways to integrate it into your project: with the Command Line Utility or programmatically.

Click here to create your free trial license key.

Command Line Utility (CLI)

PDFix provides simple and fast automated PDF processing from the command line. The PDFix Command Line Utility, available for Windows, macOS and Linux, is the easiest way to integrate the SDK functionality into your solution. Learn more about the Command Line Utility.

$ cd /pdfix_mac/bin
$ ./pdfix_app support@pdfix.net 3bE31NaixzFE58ir -pdf2xml /Users/admin/Documents/input.pdf output.xml

Output:

PDF to XML
Processing pages...1/1
Success

This command converts the input PDF into XML. The last argument (output.xml) is the path where the generated XML file is saved.
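If you want to call the CLI from your own scripts, a minimal Python sketch could wrap the command above with the standard `subprocess` module. It assumes the argument order shown in the example (account e-mail, license key, `-pdf2xml`, input PDF, output path) and that a non-zero exit code signals failure. The function names and the e-mail and key values are hypothetical placeholders, not part of PDFix.

```python
import shlex
import subprocess
from pathlib import Path


def build_pdf2xml_command(app, email, license_key, input_pdf, output_xml):
    """Return the argument list for the -pdf2xml conversion.

    The order mirrors the example command: executable, account e-mail,
    license key, the -pdf2xml flag, input PDF, output XML path.
    """
    return [str(app), email, license_key, "-pdf2xml", str(input_pdf), str(output_xml)]


def convert_pdf_to_xml(app, email, license_key, input_pdf, output_xml):
    """Run pdfix_app and return its console output.

    Assumption: a non-zero exit code means the conversion failed, in which
    case subprocess.CalledProcessError is raised (check=True).
    """
    cmd = build_pdf2xml_command(app, email, license_key, input_pdf, output_xml)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical placeholders: replace with your own account e-mail and key.
    cmd = build_pdf2xml_command(
        Path("/pdfix_mac/bin/pdfix_app"),
        "you@example.com",
        "YOUR_LICENSE_KEY",
        "/Users/admin/Documents/input.pdf",
        "output.xml",
    )
    # Print the command instead of running it, so the sketch works without PDFix installed.
    print(shlex.join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file paths contain spaces.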

Code Integration

This code sample shows how to extract data from a PDF document and output it as XML. Integrating the SDK into your code gives you full control over PDF data processing:

Result

Using the PDFix CLI or the provided code snippet, we get the XML output:

PDFix Extract Data from PDF - Getting Started

http://pdfix.net/wp-content/uploads/2019/06/pdfix-data-extract-into-xml-use-case-demo.xml
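To inspect the generated file programmatically, a minimal sketch using Python's standard `xml.etree.ElementTree` module can count elements by tag and collect their text. It makes no assumption about the PDFix XML schema; the element names in the usage sample (`doc`, `page`, `table`, `row`, `cell`) are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_element(root):
    """Count elements by tag and collect non-empty element text in document order.

    Only elem.text is collected; tail text between elements is ignored.
    """
    tags = Counter(elem.tag for elem in root.iter())
    texts = [elem.text.strip() for elem in root.iter() if elem.text and elem.text.strip()]
    return tags, texts


def summarize_xml(xml_text):
    """Summarize an XML document given as a string."""
    return summarize_element(ET.fromstring(xml_text))


def summarize_xml_file(path):
    """Summarize an XML file on disk, such as output.xml."""
    return summarize_element(ET.parse(path).getroot())


if __name__ == "__main__":
    # Hypothetical sample; the real PDFix element names may differ.
    sample = "<doc><page><table><row><cell>Name</cell><cell>42</cell></row></table></page></doc>"
    tags, texts = summarize_xml(sample)
    print(tags["cell"])  # 2
    print(texts)         # ['Name', '42']
```

Because the sketch walks every element generically, it can be pointed at output.xml to get a quick overview of the structure before writing schema-specific extraction code.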

Customizing the output

PDFix SDK lets you customize the output with configuration files that control the table detection process and the output structure. To learn more about the configuration files, see the Documentation. When you use the SDK programmatically, you have full control over how the output is shaped to fit your needs.

Contact us if you need help with integration.