Generating accessible PDFs at scale just got faster and smarter. The latest release from PDFix introduces a Dockerized solution for automated PDF tagging, purpose-built for high-volume document workflows. The container combines the PDFix SDK with an advanced AI layout recognition model, enabling precise detection of tables, figures, and formulas, even in scanned or visually complex documents.
Designed for accessibility engineers, document automation teams, and organizations focused on PDF/UA compliance, this all-in-one environment simplifies the process of creating fully tagged, standards-compliant PDFs with minimal setup.
PDF Auto-Tagging, Templates, and MathML
Our Dockerized solution combines the PaddlePaddle AI model with the PDFix engine. It adds advanced support for formulas, tables, and visual structures, and offers a complete, automated pipeline for PDF/UA compliance. Here’s what it includes:
- Auto-Tag PDFs with Paddle
Automatically applies semantic tags to PDFs using the Paddle AI model and the PDFix layout engine
- Create Layout Templates Automatically
Generates a layout template (JSON) from a PDF using Paddle
- Generate MathML from Image Files
Converts formulas from image content into valid MathML, outputting results as XML files
- Attach MathML to Formula Tags (Paddle)
Detects formulas in PDFs, uses Paddle to create MathML, and embeds it as an associated file for each formula tag
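
To make the MathML outputs in the list above more concrete, here is a minimal sketch (not the PDFix implementation) of what a formula's XML might look like and how a pipeline could sanity-check a generated file before attaching it to a formula tag. The sample formula and the helper name `is_wellformed_mathml` are illustrative assumptions. The check covers only XML well-formedness and the root element, not full MathML validity.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C MathML specification.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hand-written sample for "x^2 + 1" (illustrative, not actual tool output).
SAMPLE = (
    f'<math xmlns="{MATHML_NS}">'
    "<mrow><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mn>1</mn></mrow>"
    "</math>"
)


def is_wellformed_mathml(xml_text: str) -> bool:
    """Return True if xml_text parses as XML and its root is a MathML <math> element."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    # ElementTree writes namespaced tags as "{namespace}localname".
    return root.tag == f"{{{MATHML_NS}}}math"
```

A check like this is cheap enough to run on every generated file, so malformed output can be flagged for review instead of being embedded in the PDF.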

AI-Powered Auto-Tagging with PaddlePaddle in PDFix
Accurate layout detection is essential for high-quality PDF auto-tagging and full PDF/UA compliance. Many engines rely on a PDF's internal structure, which often fails on scanned PDFs or visually complex documents. PDFix now supports an external AI-based alternative.
With its latest release, PDFix integrates the PaddlePaddle AI model as an additional layout recognition option. Paddle analyzes the rendered PDF page visually, identifying headings, tables, formulas, and other structures much like a human reader. It then auto-generates a layout template to drive precise semantic tagging.
We found this approach especially effective for:
- Financial reports with complex tables
- Academic papers and math textbooks with formulas and figures
- Scanned or OCRed PDFs
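
Because the model works on a rendered image of the page, detections from an image-based approach typically arrive in pixel coordinates and have to be mapped back to PDF page coordinates before they can drive tagging. The sketch below shows that conversion under stated assumptions: the image origin is top-left, the PDF origin is bottom-left, one PDF unit is 1/72 inch, and the page has no rotation or crop-box offset. `Box` and `pixel_box_to_pdf` are hypothetical names, not part of the PDFix SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle; the meaning of top/bottom depends on the coordinate space."""
    left: float
    top: float
    right: float
    bottom: float


def pixel_box_to_pdf(box: Box, dpi: float, page_height_pt: float) -> Box:
    """Map a pixel-space box (y grows downward) to PDF points (y grows upward)."""
    scale = 72.0 / dpi  # PDF points per rendered pixel
    return Box(
        left=box.left * scale,
        top=page_height_pt - box.top * scale,        # flip the y axis
        right=box.right * scale,
        bottom=page_height_pt - box.bottom * scale,
    )
```

For example, on a US Letter page (792 pt tall) rendered at 144 dpi, `pixel_box_to_pdf(Box(144, 288, 432, 576), 144, 792)` gives a region whose top edge sits at 648 pt and whose bottom edge sits at 504 pt.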
Paddle AI + PDFix Desktop: Next-Level Auto-Tagging
Easily use the built-in PDFix Auto-Tag Engine via intuitive icons in PDFix Desktop, or try the integrated Paddle Auto-Tag option from the menu – compare the results and choose the engine that best understands your documents.
Fine-Tune PDF Layout Detection with Confidence Thresholds
The AI model in PDFix now supports threshold-based layout detection, allowing you to set class-specific confidence levels for elements like tables, forms, and figures. This helps include only high-confidence results while filtering out noise – ideal for documents with complex or table-heavy layouts that require precise PDF auto-tagging.
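
As a rough illustration of class-specific thresholds, the sketch below keeps a detection only when its confidence meets the threshold set for its class and falls back to a default for classes without one. The class names, scores, default value, and the `filter_by_threshold` helper are assumptions made for the example. They do not reflect PDFix's actual configuration format.

```python
from typing import Dict, List, NamedTuple


class Detection(NamedTuple):
    label: str    # layout class, e.g. "table" or "figure"
    score: float  # model confidence in [0, 1]


def filter_by_threshold(
    detections: List[Detection],
    thresholds: Dict[str, float],
    default: float = 0.5,  # assumed fallback for classes without an explicit threshold
) -> List[Detection]:
    """Keep detections whose score meets the threshold for their class."""
    return [d for d in detections if d.score >= thresholds.get(d.label, default)]


detections = [
    Detection("table", 0.75),
    Detection("table", 0.90),
    Detection("figure", 0.65),
    Detection("form", 0.40),
]
# A stricter threshold for tables filters out the borderline table candidate.
kept = filter_by_threshold(detections, {"table": 0.8, "figure": 0.6})
```

Raising the threshold for one class trades recall for precision in that class only, which is why a table-heavy document might use a strict table threshold while leaving other classes at the default.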

Deploy and Scale PDF Auto-Tagging with Docker, PDFix SDK, and AI Models
This solution runs in a self-contained Docker image, making it easy to deploy across platforms with no complex setup. It’s fully integrated into PDFix Desktop, accessible directly via the toolbar icon, or it can be embedded into your automated document processing workflow using the PDFix SDK – ideal for building scalable document remediation pipelines.
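
For batch workflows, one common pattern is to wrap `docker run` in a small script that processes many files in parallel. The following is a minimal sketch of that pattern only. The image name, the `tag` subcommand, and the `--input`/`--output` arguments are placeholders, not the container's documented interface, so replace them with the values from the PDFix documentation.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Iterable, List, Sequence

IMAGE = "example/pdf-autotag:latest"  # hypothetical image name (placeholder)


def hypothetical_tool_args(pdf: Path) -> List[str]:
    """Placeholder container arguments; the real CLI may differ entirely."""
    return ["tag", "--input", f"/in/{pdf.name}", "--output", f"/out/{pdf.name}"]


def build_docker_command(
    input_pdf: Path, output_dir: Path, tool_args: Sequence[str], image: str = IMAGE
) -> List[str]:
    """Build a `docker run` argv that mounts the input folder read-only and the output folder."""
    in_dir = input_pdf.resolve().parent   # docker bind mounts need absolute paths
    out_dir = output_dir.resolve()
    return [
        "docker", "run", "--rm",
        "-v", f"{in_dir}:/in:ro",
        "-v", f"{out_dir}:/out",
        image,
        *tool_args,
    ]


def run_batch(pdfs: Iterable[Path], output_dir: Path, workers: int = 4) -> List[int]:
    """Run one container per PDF, a few at a time, and return the exit codes in order."""
    def run_one(pdf: Path) -> int:
        cmd = build_docker_command(pdf, output_dir, hypothetical_tool_args(pdf))
        return subprocess.run(cmd, check=False).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, pdfs))
```

Threads are sufficient here because each worker only waits on an external process. The `workers` value would need tuning to the host's CPU and memory, since each container loads its own copy of the model.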
Build Your Own Layout Detection Workflow with Us
The built-in PaddlePaddle AI model serves as a pluggable layout detection engine for PDF auto-tagging, demonstrating how an AI model can be integrated directly into a PDFix workflow. If you already use a different AI model or have a preferred engine, let us know: we can integrate it into PDFix as a custom external action tailored to your needs.
To explore more integrations, visit the PDFix Marketplace, where we regularly release new external actions. We’re currently working on support for additional AI models, including Amazon Textract and olmOCR, to further enhance advanced layout detection capabilities – so stay tuned for updates.









