The problem this solves
Many organizations need to make large volumes of PDFs accessible but face two challenges that local AI tools cannot address alone.
- First, scanned PDFs – documents that exist only as image data with no text layer – require cloud-grade OCR to extract structure reliably.
- Second, enterprises processing tens of thousands of documents a year often lack the local compute capacity for that volume and need a scalable, cloud-native solution that requires no hardware investment.
Amazon Textract addresses both.

What this action does
Auto-Tag PDF (Amazon Textract) integrates Amazon Textract, the cloud-based OCR and document intelligence service from Amazon Web Services (AWS), directly into PDFix Desktop and PDFix SDK. Textract analyzes each page of a PDF on AWS infrastructure, extracting text, reading order, tables, headings, and form elements. PDFix then converts that structural output into a complete PDF accessibility tag tree, producing a document structured to meet PDF/UA (ISO 14289) and WCAG 2.1 requirements.
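To make the conversion step more concrete, here is a minimal Python sketch of how Textract layout output could be turned into a simplified tag tree. It is not PDFix's implementation: the mapping table, the `StructElem` class, and the functions `build_tag_tree`, `_to_elem`, and `_child_ids` are hypothetical. The sketch assumes a Textract AnalyzeDocument or GetDocumentAnalysis response with the `LAYOUT` feature enabled, whose `LAYOUT_*` blocks are listed in reading order and whose `CHILD` relationships point at `LINE` blocks (or, for `LAYOUT_LIST`, typically at nested `LAYOUT_TEXT` items).

```python
from dataclasses import dataclass, field

# Hypothetical mapping from Textract LAYOUT_* block types to PDF structure types.
# PDFix's real mapping may differ; this only illustrates the idea.
LAYOUT_TO_TAG = {
    "LAYOUT_TITLE": "H1",
    "LAYOUT_SECTION_HEADER": "H2",
    "LAYOUT_TEXT": "P",
    "LAYOUT_LIST": "L",
    "LAYOUT_TABLE": "Table",
    "LAYOUT_FIGURE": "Figure",
    "LAYOUT_KEY_VALUE": "P",
}

# Running headers, footers, and page numbers are treated as artifacts (not tagged).
ARTIFACT_TYPES = {"LAYOUT_HEADER", "LAYOUT_FOOTER", "LAYOUT_PAGE_NUMBER"}


@dataclass
class StructElem:
    tag: str
    text: str = ""
    kids: list = field(default_factory=list)


def _child_ids(block):
    """Collect the Ids listed in a block's CHILD relationships."""
    ids = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            ids.extend(rel["Ids"])
    return ids


def _to_elem(block, by_id, parent_tag=None):
    """Convert one layout block; layout blocks nested in a list become LI elements."""
    tag = "LI" if parent_tag == "L" else LAYOUT_TO_TAG[block["BlockType"]]
    elem = StructElem(tag)
    lines = []
    for cid in _child_ids(block):
        child = by_id.get(cid)
        if child is None:
            continue
        if child["BlockType"] == "LINE":
            lines.append(child.get("Text", ""))
        elif child["BlockType"] in LAYOUT_TO_TAG:
            elem.kids.append(_to_elem(child, by_id, tag))
    elem.text = " ".join(lines)
    return elem


def build_tag_tree(blocks):
    """Build a simplified tag tree from Textract Blocks, preserving their order."""
    by_id = {b["Id"]: b for b in blocks}
    # Layout blocks that are children of another layout block (e.g. list items)
    # are emitted under their parent rather than at the top level.
    nested = {
        cid
        for b in blocks
        if b["BlockType"].startswith("LAYOUT_")
        for cid in _child_ids(b)
    }
    root = StructElem("Document")
    for block in blocks:
        btype = block["BlockType"]
        if btype in ARTIFACT_TYPES or btype not in LAYOUT_TO_TAG:
            continue
        if block["Id"] in nested:
            continue
        root.kids.append(_to_elem(block, by_id))
    return root
```

A production tag tree needs more than this: a conforming `Table` needs `TR`/`TH`/`TD` children built from Textract's `TABLE` and `CELL` blocks, list items need `LBody`, and every element must be linked to the page's marked content. The sketch only shows how block types and reading order carry over into structure.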
Because the document analysis runs on AWS infrastructure, it places no significant compute load on your local machine. The action scales with document volume without local hardware constraints (within your account's AWS service quotas) and is billed on a pay-as-you-go basis per page processed through AWS.
It ships as two variants inside PDFix Desktop:
- Auto-Tag (Textract): Sends the PDF to Amazon Textract, receives the structural analysis, and applies a full accessibility tag structure to the document automatically.
- Create Layout Template JSON (Textract): Uses Textract’s structural analysis to generate a reusable layout template JSON file rather than tagging immediately. Use this when you need to review and standardize the detected structure before applying it to a batch of similarly formatted documents.
Pricing: Requires an AWS account. Amazon Textract charges per page processed. New AWS accounts include limited Free Tier access for initial testing and evaluation.
Layout templates – reuse your tagging
After processing a document with Amazon Textract, you can generate a layout template JSON file that captures the detected document structure as a reusable set of rules. Apply the same template to hundreds or thousands of similarly formatted PDFs – standardizing tag structure across an entire document library and eliminating repeated Textract calls for documents with identical layouts.
This is particularly valuable for organizations processing recurring document series such as monthly reports, regulatory filings, or standardized form collections.
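The savings from templates come down to page-count arithmetic. The following Python sketch is a rough estimator under one stated assumption: Textract analyzes only the first document of each layout family, and the resulting template is reused for the rest. The `Doc` class, the layout labels, and `textract_pages` are hypothetical illustrations, not part of PDFix.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    layout: str  # hypothetical label for a layout family, e.g. "monthly-report"
    pages: int


def textract_pages(docs, use_templates):
    """Estimate how many pages are sent to Textract for a batch of documents.

    Without templates every page is analyzed. With templates, this sketch assumes
    only the first document of each layout family is analyzed and its template
    is reused for the remaining documents of that family.
    """
    if not use_templates:
        return sum(d.pages for d in docs)
    reference_pages = {}
    for d in docs:
        reference_pages.setdefault(d.layout, d.pages)  # keep first doc per layout
    return sum(reference_pages.values())


batch = [
    Doc("2024-01.pdf", "monthly-report", 12),
    Doc("2024-02.pdf", "monthly-report", 12),
    Doc("2024-03.pdf", "monthly-report", 14),
    Doc("claim-0001.pdf", "claim-form", 2),
]
print(textract_pages(batch, use_templates=False))  # 40
print(textract_pages(batch, use_templates=True))   # 14
```

Multiplying either figure by the per-page price on the Amazon Textract pricing page for your region and features gives a cost estimate; the gap widens as more documents share a layout.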
Actions
| Type | Action | Description |
|---|---|---|
| Paid · Cloud | Auto-Tag (Textract) | Automatically tags a PDF using Amazon Textract. |
| Paid · Cloud | Create Layout Template JSON (Textract) | Automatically creates a layout template using Amazon Textract and saves it as a JSON file. |
Frequently Asked Questions
What is Amazon Textract?
Amazon Textract is a machine learning service developed by Amazon Web Services that automatically extracts text, tables, forms, and document structure from PDFs and images. Unlike basic OCR, Textract identifies the logical structure of a document – understanding which text is a heading, which is a table cell, and what the reading order should be. PDFix converts this structural output into a conforming PDF accessibility tag tree.
Does this action work on scanned PDFs?
Yes. This is Textract’s primary advantage over local auto-tagging models. Amazon Textract processes both native digital PDFs and scanned image-only documents, applying AWS’s cloud OCR models to extract text and structure even from low-quality or photographed pages. For scanned document archives that local tools struggle to process reliably, Textract is the recommended auto-tagging action.
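For readers curious about what the cloud side involves, here is a minimal sketch of calling Textract directly with the AWS SDK for Python (boto3), independent of PDFix. Multi-page PDFs, such as scanned archives, go through Textract's asynchronous API, which reads the file from Amazon S3: `start_document_analysis` starts a job, `get_document_analysis` is polled until the job leaves `IN_PROGRESS`, and further results are fetched with `NextToken`. The function name `analyze_pdf`, the bucket and key, the feature list, and the polling interval are assumptions. The client is passed in, so the AWS region and credentials stay under your control.

```python
import time


def analyze_pdf(client, bucket, key, poll_seconds=5.0, sleep=time.sleep):
    """Run Textract asynchronous document analysis on a PDF in S3 and return all Blocks."""
    job = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["LAYOUT", "TABLES"],
    )
    job_id = job["JobId"]

    # Poll until the job is no longer in progress.
    while True:
        resp = client.get_document_analysis(JobId=job_id)
        status = resp["JobStatus"]
        if status != "IN_PROGRESS":
            break
        sleep(poll_seconds)

    if status == "FAILED":
        raise RuntimeError(resp.get("StatusMessage", "Textract job failed"))

    # Results are paginated; follow NextToken until it is absent.
    blocks = list(resp.get("Blocks", []))
    token = resp.get("NextToken")
    while token:
        resp = client.get_document_analysis(JobId=job_id, NextToken=token)
        blocks.extend(resp.get("Blocks", []))
        token = resp.get("NextToken")
    return blocks


# Example (requires boto3 and AWS credentials; bucket and key are placeholders):
#   import boto3
#   client = boto3.client("textract", region_name="eu-central-1")
#   blocks = analyze_pdf(client, "my-bucket", "scans/archive-001.pdf")
```

Passing the client in also lets the function be exercised against a stub without contacting AWS.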
How much does it cost?
Amazon Textract pricing depends on the number of pages processed, the Textract features applied (for example, plain text detection versus table and form analysis), and the AWS region. New AWS accounts receive limited Free Tier access each month for initial testing. For production volume pricing, refer to the Amazon Textract pricing page on AWS. There is no additional PDFix fee for using this action beyond standard PDFix Desktop or SDK licensing.
Are my documents sent to Amazon’s servers?
Yes. This is a cloud-based action. Documents are transmitted to Amazon Web Services infrastructure in your selected AWS region for processing. If your documents contain sensitive or regulated data, review AWS’s data handling and compliance policies before use. For workflows where data cannot leave your local environment, use the Auto-Tag PDF (Docling IBM) action instead, which runs entirely on your machine.
Which accessibility standards does this action support?
The Auto-Tag (Textract) action produces tag structures that support PDF/UA-1 (ISO 14289-1) and WCAG 2.1 Success Criteria 1.3.1 (Info and Relationships) and 1.3.2 (Meaningful Sequence). After processing, use the VeraPDF Validation action to check the machine-verifiable PDF/UA requirements across all output files.
What is the difference between Auto-Tag (Textract) and Auto-Tag PDF (Docling IBM)?
Both actions auto-tag PDFs for accessibility but are designed for different scenarios.
Docling IBM is 100% free, open-source, and runs entirely on your local machine – no data leaves your environment and there is no per-page cost. It is the right choice for most native digital PDFs, privacy-sensitive documents, and workflows where cost control is a priority.
Amazon Textract is a paid cloud service that excels where Docling is limited: scanned or image-based PDFs, documents with degraded quality, and high-volume enterprise environments where cloud scalability and zero local compute are requirements. For organizations already using AWS infrastructure, Textract also integrates naturally into existing cloud workflows. When in doubt, test both actions on a sample document and compare output quality for your specific document type.
