Logical content extraction and conversion

Logical Content Extraction

This is where the magic happens. Make your PDFs accessible.

If you’ve ever tried to get any data out of PDF files, you know how painful it is. There is no easy way to do it. What looks like an image is often not an image. You cannot copy text in the correct reading order, and what looks like a table is just a bunch of isolated elements such as lines, rectangles and text fragments.

Document Structure Recognition
Intelligent Data Extraction
Text Paragraph Detection
Image and Graphics Extraction
Annotation Extraction
Reading Order Detection
Whitespace Detection
Table Detection (including cells & rows)
Text Table Detection (including cells & rows)
Header/Footer Detection
Table of Contents Detection
Regular Expression and Pattern Matching
AcroForm Reading Order Detection (Coming Soon)
Chart Detection (Coming Soon)

Access all logical elements

After successful processing, you have access to all logical elements. You can search the text, save all images, export table values into your database, or use the extracted elements for conversion to HTML, JSON, Word, Excel and other formats.

Search text
Export images
Export tables
Export any data you want in structured, usable formats
Convert to HTML
Convert to JSON
Convert to Word, Excel
PDF/UA Accessible PDFs
Are you interested?
If you want to get answers and information about our products and services, or to discuss your subscription, get in touch with us.