Save hours on PDF tagging with AI-powered automation
Discover how PDFix SDK simplifies PDF accessibility workflows through intelligent auto-tagging. From basic automation to AI-driven layout templates, learn how to achieve accurate, consistent tagging across any document type. Watch our live demo and explore practical use cases, best practices, and downloadable examples on GitHub.
Key Takeaways
- During the webinar, we explored the main approaches to PDF auto-tagging with PDFix SDK:
- Basic Auto-Tagging without Layout Template – achieve a basic accessibility pass across mixed or high-volume PDFs
- Auto-Tagging with Preflight – Auto-Generated Layout Template – improve tagging accuracy with automated analysis
- Auto-Tagging with an AI-Generated Layout Template – use AI models for intelligent, adaptive tagging
- Auto-Tagging with a Pre-Defined Layout Template – ensure consistent tag structures for repetitive or print-ready documents
- Test files for this demonstration: available on GitHub in the Weekly_Market_Commentary repository.
Webinar Source: Auto-Tagging PDFs with PDFix SDK
Frequently Asked Questions
Among the supported AI models (Amazon Textract, Docling, Paddle), which one gives the most accurate results for typical business reports?
There isn’t one universal “best” AI model – accuracy depends on the document type and layout. All supported models (Amazon Textract, Docling, and Paddle) perform well in layout recognition, table detection, and heading structure identification. However, none consistently outperforms the others across all business documents.
We recommend testing each model on your specific document set to see which fits best. If you find an AI model that performs exceptionally well, contact us – we can guide you on how to integrate it and convert its output into a compatible PDFix Layout Template.
Is it possible to connect a custom AI model trained on our own documents?
Yes. You can integrate your own AI layout model into the PDFix workflow.
Each layout action in the PDFix Marketplace links to our open-source Docker implementation on GitHub. Using this reference, you can implement your own AI model and gain competitive technical or business advantages.
Can this AI-based workflow run offline in Docker, or does it require an internet connection?
It depends on the AI model you choose. Cloud-based models such as Amazon Textract require an internet connection.
However, many models – including Docling and other locally deployed AI models – can run fully offline inside a Docker container.
Keep in mind that some of these models are large (e.g., 4–5 GB) and require local disk space, but once set up, they work entirely offline.
How can I combine an AI-generated template with a custom template?
You can freely modify or merge AI-generated templates with your own.
The AI template is a standard JSON file that you can edit to:
- Add or adjust element properties (for example, marking an element as a heading)
- Modify tagging rules
- Apply heuristics or logic to improve structure recognition
You can do this manually or programmatically – for instance, by adding custom properties such as alternate text, role mappings, or structural adjustments to enhance accessibility and accuracy.
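The programmatic route can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PDFix Layout Template schema: the key names `elements`, `id`, and `type` are hypothetical placeholders, so check them against a real exported template before adapting the idea.

```python
import copy
import json


def merge_templates(ai_template: dict, custom_template: dict) -> dict:
    """Overlay custom element rules on an AI-generated template.

    Elements are matched by a hypothetical "id" key (an assumption, not
    the documented schema). Custom values win on conflict, and elements
    that exist only in the custom template are appended.
    """
    merged = copy.deepcopy(ai_template)  # leave the AI template untouched
    by_id = {el["id"]: el for el in merged.get("elements", [])}
    for custom_el in custom_template.get("elements", []):
        target = by_id.get(custom_el["id"])
        if target is None:
            merged.setdefault("elements", []).append(copy.deepcopy(custom_el))
        else:
            target.update(custom_el)
    return merged


# Illustrative data only; real templates carry many more properties.
ai_template = {
    "elements": [
        {"id": "e1", "type": "text"},
        {"id": "e2", "type": "image"},
    ]
}
custom_template = {
    "elements": [
        {"id": "e1", "type": "h1"},     # promote a text block to a heading
        {"id": "e3", "type": "table"},  # add an element the AI missed
    ]
}

merged = merge_templates(ai_template, custom_template)
print(json.dumps(merged, indent=2))
```

Working on a deep copy keeps the AI-generated file reusable, so the same custom overlay can be reapplied whenever the AI template is regenerated.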
How difficult is it to create our own JSON layout template from scratch? Is there a visual editor or helper tool?
Creating templates manually is possible but not necessary – PDFix Desktop includes a visual template editor.
With PDFix Desktop, you can:
- Visually design and test layout templates
- Tag elements directly on the page
- Export the JSON template for SDK automation
For a step-by-step demonstration, see our PDFix Layout Templates webinar, linked in this video’s description and on our webinar page.
Is there a way to combine auto-tagging and PDF/UA validation (for example with veraPDF) directly inside the SDK workflow?
Yes. PDFix SDK supports integrating PDF/UA validation (e.g., with veraPDF) directly in your automated workflow.
You can perform auto-tagging, apply fixes for accessibility issues, and then validate results – all programmatically.
This topic will be covered in an upcoming webinar, and you can also refer to our earlier webinars on automated validation and fixing accessibility issues with PDFix SDK.
With PDFix SDK, can we add alternate text with a custom template without using AI?
Absolutely. You can assign alternate text (alt text) to any element directly in your template – without AI.
For example:
- Mark an element as a figure and define its alternate_text property.
- Identify images based on page number, position (bounding box), or object ID.
- Use template functions to automatically assign descriptive alt text to each image.
This allows complete control over accessibility tagging within a purely rule-based (non-AI) template workflow.
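To make the rule-based idea concrete, here is a small sketch that matches images by page number and bounding box and assigns alt text to them. Only the `alternate_text` property name comes from the answer above. The element dictionaries, the `AltTextRule` class, and both helper functions are hypothetical stand-ins for illustration and do not represent PDFix template functions.

```python
from dataclasses import dataclass


@dataclass
class AltTextRule:
    """Hypothetical rule: images inside `bbox` on `page` get this alt text."""
    page: int
    bbox: tuple  # (left, bottom, right, top) region on the page
    alternate_text: str


def contains(outer: tuple, inner: tuple) -> bool:
    """Return True if the inner box lies entirely within the outer box."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def apply_alt_text(elements: list, rules: list) -> int:
    """Mark matching images as figures with alt text; return the match count."""
    matched = 0
    for el in elements:
        if el.get("type") != "image":
            continue
        for rule in rules:
            if el["page"] == rule.page and contains(rule.bbox, el["bbox"]):
                el["type"] = "figure"
                el["alternate_text"] = rule.alternate_text
                matched += 1
                break  # first matching rule wins
    return matched


# Illustrative elements; the structure is assumed, not the PDFix schema.
elements = [
    {"type": "image", "page": 1, "bbox": (50, 600, 300, 750)},
    {"type": "image", "page": 2, "bbox": (50, 600, 300, 750)},
    {"type": "text", "page": 1, "bbox": (50, 100, 500, 200)},
]
rules = [AltTextRule(page=1, bbox=(0, 500, 612, 792), alternate_text="Company logo")]

count = apply_alt_text(elements, rules)
print(count, elements[0])
```

Rules like this suit repetitive documents where the same image sits in the same place on every report, such as a logo in the page header.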
What would a recommended workflow look like for generating accessible PDF reports – from upload to validation?
A typical end-to-end workflow looks like this:
- Document Intake: Detect whether documents already include tags or are PDF/UA-compliant.
- Validation Step: Run a compliance check (e.g., with veraPDF) to determine the tagging quality.
- Auto-Tagging & Accessibility Fixes: Use PDFix SDK to perform layout recognition, auto-tagging, and automated fixing of accessibility issues.
- Re-Validation: Validate the processed files to confirm PDF/UA conformance.
- Manual Review (if needed): Send any remaining files for manual remediation to ensure full accessibility compliance.
This hybrid automation-plus-review approach ensures accuracy and reliability across large document volumes.
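The routing logic of such a workflow can be sketched as below. The validator and the auto-tagging step are passed in as plain callables because the exact veraPDF and PDFix SDK calls are not shown here. In a real pipeline, `validate` would presumably wrap a veraPDF run and `autotag_and_fix` would wrap the PDFix SDK. The function name, parameter names, and report keys are all hypothetical.

```python
from typing import Callable, Iterable


def process_batch(
    paths: Iterable[str],
    validate: Callable[[str], bool],
    autotag_and_fix: Callable[[str], str],
) -> dict:
    """Route each PDF through validate -> auto-tag/fix -> re-validate."""
    report = {"already_compliant": [], "fixed": [], "manual_review": []}
    for path in paths:
        # Intake and validation: files that already pass need no work.
        if validate(path):
            report["already_compliant"].append(path)
            continue
        # Auto-tagging and fixes are assumed to write a new output file.
        output = autotag_and_fix(path)
        # Re-validation decides between "done" and "manual review".
        if validate(output):
            report["fixed"].append(output)
        else:
            report["manual_review"].append(output)
    return report


# Stub callables stand in for the real tools so the sketch runs anywhere.
passing = {"a.pdf", "b.tagged.pdf"}
report = process_batch(
    ["a.pdf", "b.pdf", "c.pdf"],
    validate=lambda p: p in passing,
    autotag_and_fix=lambda p: p.replace(".pdf", ".tagged.pdf"),
)
print(report)
```

Keeping the tools behind callables makes the routing easy to test in isolation and lets the validator or tagging step be swapped without touching the batch logic.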









