PDFix SDK
PDF Accessibility Automation API
Auto-tag, remediate, validate and convert PDFs programmatically. 4 AI engines. 100% local processing. One command from untagged PDF to certified PDF/UA.
Windows · macOS · Linux · Free Trial

PDF/UA
UA-1, UA-2, WCAG 2.2, Section 508, ADA, EAA
100%
Local processing – on-premises
5 Languages
Python, Java, C++, .NET, JS
90%
Reduction in document processing time
WHAT PDFIX SDK DOES
PDF accessibility automation, data extraction and HTML conversion – one API
A high-performance C++ PDF library that covers three enterprise workflows via a unified API or CLI.
No external dependencies required.
PDF accessibility automation PDF/UA & WCAG
Auto-tag PDFs for full PDF/UA and WCAG compliance. Fix structure, headings, Alt text, tables and reading order programmatically at any scale.
PDF data extraction – text, structure, tables and metadata
Extract structured data from PDFs for analytics, AI/ML pipelines and document processing systems. Output to JSON, XML, CSV.
PDF to HTML conversion – 3 rendering methods
Convert PDFs to HTML for web publishing, responsive rendering and accessible web content. PDF forms become HTML forms.
AUTO-TAGGING IN PDFIX SDK
Auto-tag PDFs – 4 workflow methods × 4 AI engines
Two decisions to make: which workflow method fits your document and volume, and which AI engine you want to use for AI-generated templates. Both decisions are independent – start with the method, then choose the engine.
How to auto-tag PDFs – choose a method
BASIC
NO TEMPLATE
Fastest results, zero setup. SDK detects text, reading order, tables and lists automatically.
✓ Zero setup
PREFLIGHT
AUTO-GENERATED TEMPLATE
Preflight analyzes the document and auto-generates a layout template – better heading and footer detection.
✓ Best balance
✓ Automated analysis
AI-GENERATED
INTELLIGENT TEMPLATE
AI model generates a JSON template that guides PDFix SDK. Choose from 4 AI engines below.
✓ Most intelligent
✓ Adaptive
PRE-DEFINED
JSON TEMPLATE
Define structure once for your document type – invoices, statements, reports. Apply across thousands of files.
✓ Consistent results
✓ Enterprise-grade
Choose your AI Tagging Engine – for AI-generated templates
IBM Docling
LOCAL · FREE
On-prem · European enterprise
Data never leaves your network. Runs via Docker. Ideal for finance, healthcare and government where data sovereignty is required.
Headings · Paragraphs · Tables · Images · Multi-column
Paddle AI
LOCAL · FREE
Highest layout accuracy · Formulas
Best for complex layouts, scientific and academic documents. Handles mathematical formulas. Runs locally via Docker.
Headings · Tables · Formulas · Math · Lists
Amazon Textract
CLOUD · PAID AWS
AWS-native · Scanned PDFs · OCR
Best for scanned PDFs and documents with inconsistent structure. Suits AWS-native customers and hybrid cloud workflows; connects via AWS credentials.
OCR · Scanned PDFs · Tables · Forms · Handwriting
OpenAI
CLOUD · API KEY
Best AI Alt text · MathML · Tables
Best results for Alt text generation, table summaries and MathML for formulas. Enterprise-grade AI understanding of images and complex tables in context.
Alt text · MathML · Table summaries · Image context
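The two decisions described above (workflow method, then AI engine) can be modelled as a small configuration object. This is a minimal sketch, not PDFix SDK code: `Method`, `Engine`, `TaggingChoice` and `stays_local` are hypothetical names invented for illustration. It only encodes what this page states: an engine is chosen for the AI-generated template method, and Docling and Paddle are the engines labelled LOCAL.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Method(Enum):
    """The four workflow methods listed on this page (names are hypothetical)."""
    BASIC = "basic"                # no template
    PREFLIGHT = "preflight"        # auto-generated template
    AI_GENERATED = "ai-generated"  # template produced by an AI engine
    PRE_DEFINED = "pre-defined"    # hand-written JSON template


class Engine(Enum):
    """The four AI engines listed on this page (names are hypothetical)."""
    DOCLING = "docling"
    PADDLE = "paddle"
    TEXTRACT = "textract"
    OPENAI = "openai"


# Engines this page labels LOCAL (run via Docker); the other two are labelled CLOUD.
LOCAL_ENGINES = {Engine.DOCLING, Engine.PADDLE}


@dataclass(frozen=True)
class TaggingChoice:
    method: Method
    engine: Optional[Engine] = None

    def __post_init__(self):
        # An engine is required for, and only meaningful with, AI-generated templates.
        if self.method is Method.AI_GENERATED and self.engine is None:
            raise ValueError("AI-generated templates need an engine")
        if self.method is not Method.AI_GENERATED and self.engine is not None:
            raise ValueError("an engine only applies to AI-generated templates")

    @property
    def stays_local(self) -> bool:
        """True when no cloud engine is involved in this choice."""
        return self.engine is None or self.engine in LOCAL_ENGINES


if __name__ == "__main__":
    choice = TaggingChoice(Method.AI_GENERATED, Engine.DOCLING)
    print(choice.method.value, choice.engine.value, choice.stays_local)
```

Validating the combination up front keeps a batch job from starting with a configuration that cannot work, such as an AI-generated method with no engine selected.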
COMPLIANCE AUTOMATION PIPELINE
From untagged PDF to certified PDF/UA – automated compliance pipeline
PDFix SDK covers the complete PDF/UA compliance workflow end-to-end – from raw input to validated, compliant output – in a single automated pipeline.
Input PDF
Untagged or partially tagged
Auto-tag
Docling · Paddle · Textract · OpenAI
Make Accessible
One-command fix for PDF/UA issues
Preflight
SDK structure and template check
veraPDF
ISO 14289-1 validation
Accessible PDFs
PDF/UA compliant output
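The stages listed above can be pictured as an ordered chain that stops at the first failing step and reports an exit code for CI/CD. The sketch below is an assumption-laden illustration only: `StageResult`, `run_pipeline`, `exit_code` and the stub stages are hypothetical and do not call PDFix SDK or veraPDF. In a real setup each stage would wrap the corresponding SDK or CLI call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StageResult:
    ok: bool
    detail: str = ""


# A stage receives the path of the working PDF and reports success or failure.
Stage = Callable[[str], StageResult]
Log = List[Tuple[str, StageResult]]


def run_pipeline(pdf_path: str, stages: List[Tuple[str, Stage]]) -> Log:
    """Run the stages in order and stop at the first one that fails."""
    log: Log = []
    for name, stage in stages:
        result = stage(pdf_path)
        log.append((name, result))
        if not result.ok:
            break
    return log


def exit_code(log: Log) -> int:
    """0 when at least one stage ran and all of them passed, otherwise 1."""
    return 0 if log and all(result.ok for _, result in log) else 1


def stub(detail: str) -> Stage:
    """Placeholder stage (hypothetical): always succeeds, touches no file."""
    return lambda pdf_path: StageResult(True, detail)


if __name__ == "__main__":
    # Stage order follows the pipeline described on this page.
    stages = [
        ("auto-tag", stub("tags added")),
        ("make-accessible", stub("PDF/UA issues fixed")),
        ("preflight", stub("structure and template checked")),
        ("verapdf", stub("ISO 14289-1 validation passed")),
    ]
    log = run_pipeline("input.pdf", stages)
    for name, result in log:
        print(f"{name}: {'ok' if result.ok else 'FAILED'} ({result.detail})")
    raise SystemExit(exit_code(log))
```

Stopping at the first failure keeps the log short and points directly at the stage that needs human review, which matches the disclaimer that full compliance still requires manual checks.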
Validation Capabilities
veraPDF
ISO 14289-1 PDF/UA-1 compliance. 136 Matterhorn Protocol failure conditions. Machine-readable JSON/XML output for CI/CD.
Duplicate MCID
Detects duplicate Marked Content Identifiers – critical for screen reader compatibility. Catches issues that veraPDF alone may not surface.
Arlington PDF Model
Catches PDF structural errors and syntax violations – specification-derived validation beyond standard PDF/UA rules.
SDK Preflight
Built-in structure analysis and template validation via CLI. Validates document structure before running the full veraPDF compliance check.
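To make the Duplicate MCID check above more concrete, the sketch below shows the core idea on already-extracted data. It assumes the marked-content references have been collected as `(page_index, mcid)` pairs by some other tool; that extraction step is not shown, and `find_duplicate_mcids` is a hypothetical helper, not a PDFix SDK function. It also simplifies by treating each page as a single content stream, whereas form XObjects have content streams of their own.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def find_duplicate_mcids(refs: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Return {page_index: [mcid, ...]} for every MCID used more than once on a page.

    An MCID is expected to be unique within the content stream it appears in,
    so any (page, mcid) pair seen twice is reported.
    """
    counts = Counter(refs)
    duplicates: Dict[int, List[int]] = {}
    for (page, mcid), seen in sorted(counts.items()):
        if seen > 1:
            duplicates.setdefault(page, []).append(mcid)
    return duplicates


if __name__ == "__main__":
    # Hypothetical input: MCID 1 repeats on page 0, MCID 3 repeats on page 2.
    refs = [(0, 0), (0, 1), (0, 1), (1, 1), (2, 3), (2, 3), (2, 3)]
    print(find_duplicate_mcids(refs))
```

The same MCID value on two different pages is not flagged, because uniqueness is scoped to the content stream and not to the whole document.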
Disclaimer: PDFix provides tools and technology to assist in making your documents accessible, but we do not guarantee 100% document accessibility. Achieving full compliance requires human checks and intervention. Please note that PDFix is a technology provider, not a service provider. The responsibility for document compliance rests with the user.








