Tables are a common structuring element with high information density in many documents, such as PDF files.
Automatically extracting the information contained in tables and storing it in a structured, machine-readable form is crucial in many applications. Table recognition and extraction is a very challenging problem.
With PDFix it is now possible to convert tables from PDF files into Excel and then manipulate the data as required. With our PDF Content Extraction technology we are able to decompose tables in PDF files and store the extracted data in structured formats such as CSV.
How do we recognise tables in PDF files?
|Detect table borders|
|Detect table columns and rows|
|Extract tables into CSV output|
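The three steps above can be illustrated with a minimal Python sketch. This is not the PDFix engine or its API; it is a simplified illustration under stated assumptions: the border positions and the words with their coordinates have already been read from the page, the table is a simple ruled grid without merged cells, and the y axis grows downward from the top of the page. The function names (`grid_from_borders`, `index_of`, `table_to_csv`) and the `tol` parameter are hypothetical.

```python
import csv
import io


def grid_from_borders(h_lines, v_lines, tol=1.0):
    """Steps 1-2: collapse border coordinates into row and column boundaries.

    h_lines holds the y positions of horizontal borders, v_lines the x
    positions of vertical borders. Values closer than `tol` to the
    previously kept value are treated as the same border.
    """
    def merge(values):
        merged = []
        for v in sorted(values):
            if not merged or v - merged[-1] > tol:
                merged.append(v)
        return merged

    return merge(h_lines), merge(v_lines)


def index_of(boundaries, value):
    """Return the index of the interval that contains value, or None."""
    for i in range(len(boundaries) - 1):
        if boundaries[i] <= value < boundaries[i + 1]:
            return i
    return None


def table_to_csv(h_lines, v_lines, words):
    """Step 3: place each (x, y, text) word into its cell and emit CSV."""
    ys, xs = grid_from_borders(h_lines, v_lines)
    rows = [["" for _ in range(len(xs) - 1)] for _ in range(len(ys) - 1)]
    for x, y, text in words:
        r, c = index_of(ys, y), index_of(xs, x)
        if r is None or c is None:
            continue  # the word lies outside the detected grid
        rows[r][c] = (rows[r][c] + " " + text).strip()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical input: a 2 x 2 table with one border detected twice (0 and 0.3).
h_lines = [0, 20, 40, 0.3]
v_lines = [0, 50, 100]
words = [(5, 5, "Name"), (55, 5, "Qty"),
         (5, 25, "Apple"), (55, 25, "3"), (30, 25, "pie")]

csv_text = table_to_csv(h_lines, v_lines, words)
print(csv_text)
```

Words are appended to a cell in the order they are supplied, so a real pipeline would sort them by reading order first. Tables without visible borders would need a different approach to finding columns and rows, for example one based on the alignment of the text itself.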
The quality of the CSV or XML output depends on the original PDF layout. Our online engine uses only a general configuration file, which should be fine for the majority of cases. If you are not satisfied with the online sample output, don't hesitate to contact us so that we can set up a custom configuration to improve the results.