Use case
Extract data from an unstructured PDF document into XML format.
Resources
Download the original PDF document
Integration
The SDK provides two options for integrating into your project: using the Command Line Utility or programmatically.
Click here to create your free trial license key.
Command Line Utility (CLI)
PDFix provides simple, fast, automated PDF processing from the command line. The PDFix Command Line Utility is the easiest way to integrate the SDK functionality into your solution, and it is available for Windows, macOS and Linux. Learn more about the Command Line Utility.
$ cd /pdfix_mac/bin
$ ./pdfix_app support@pdfix.net 3bE31NaixzFE58ir -pdf2xml /Users/admin/Documents/input.pdf output.xml
Output:
PDF to XML
Processing pages...1/1
Success
This command converts the PDF into XML. The last argument (output.xml in this example) is the path of the file where the generated XML will be saved.
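If you want to call the utility from a script rather than typing the command by hand, the invocation above can be wrapped in a few lines of Python. This is only a minimal sketch: it reproduces the argument order shown in the command above (executable, e-mail, license key, -pdf2xml, input, output) and nothing else. The function names are hypothetical, and it assumes that pdfix_app reports failure through a non-zero exit status, which the source does not confirm.

```python
import subprocess


def build_pdf2xml_command(pdfix_app, email, license_key, input_pdf, output_xml):
    """Build the argument list in the same order as the command shown above.

    Hypothetical helper: only the -pdf2xml flag and the argument order are
    taken from the example; no other flags are assumed to exist.
    """
    return [
        str(pdfix_app),    # path to the utility, e.g. "./pdfix_app"
        email,             # e-mail the license is registered to
        license_key,       # your license key
        "-pdf2xml",
        str(input_pdf),    # PDF to read
        str(output_xml),   # XML file to write
    ]


def convert_pdf_to_xml(pdfix_app, email, license_key, input_pdf, output_xml):
    """Run the conversion and return whatever the utility printed to stdout."""
    cmd = build_pdf2xml_command(pdfix_app, email, license_key, input_pdf, output_xml)
    # Assumption: a failed conversion exits with a non-zero status, in which
    # case check=True raises subprocess.CalledProcessError.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list instead of a single string avoids shell quoting problems with paths that contain spaces. Keep the license key out of source control, for example by reading it from an environment variable.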
Code Integration
This code sample shows how to extract data from a PDF document and output it into XML. Code integration into your project allows you to take full control of the PDF data processing:
Result
Using the PDFix CLI or the provided code snippet, we get the XML output:
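To get a quick overview of a generated file before writing code against it, you can count how often each element appears. The sketch below uses only Python's standard library and makes no assumption about the element names PDFix emits; the function name summarize_xml is hypothetical, and output.xml is the file name from the command line example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_xml(path):
    """Return a Counter mapping each element tag to its number of occurrences."""
    counts = Counter()
    # iterparse streams the document, so large outputs need not fit in memory.
    for _event, elem in ET.iterparse(path, events=("start",)):
        counts[elem.tag] += 1
    return counts


if __name__ == "__main__":
    # Print the tags found in the converted file, most frequent first.
    for tag, n in summarize_xml("output.xml").most_common():
        print(f"{tag}: {n}")
```

The resulting tag list is a convenient starting point for deciding which parts of the output your integration needs to read.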
Customizing the output
PDFix SDK lets you customize the output with configuration files that affect the table detection process and the output structure. To learn more about the configuration files, see the Documentation. When you use the SDK programmatically, you have full control over shaping the output to fit your needs.
Contact us if you need help with integration.