Auto-Tagging Methods in PDFix Desktop

Join us to see how PDFix Desktop brings intelligent automation to PDF accessibility — no coding required. Explore four powerful auto-tagging modes, from instant tagging to AI-generated templates for complex layouts. Watch the live demo, download sample files, and leave us a question or comment during the session.

Key Takeaways

During the webinar, we explored the main auto-tagging methods in PDFix Desktop that help make PDFs accessible efficiently and accurately:

  1. Basic Auto-Tagging
  2. Auto-Detected Layout Template (Preflight)
  3. AI-Generated Layout Template
  4. Pre-Defined Layout Template

We selected one of our test files for this demonstration. You can find it on our GitHub in the Weekly Market Commentary Sample repository.

These PDF auto-tagging techniques empower users – especially non-developers – to create accessible, standards-compliant PDFs faster while improving accuracy and workflow automation.

Webinar Source: Auto-Tagging PDFs with PDFix Desktop

Frequently Asked Questions

How do I see past PDFix webinars?

You can find all our past and upcoming webinars in the Webinar category on pdfix.net. More recorded sessions are available on our YouTube channel: Team PDFix.

Will we be able to try what is explained during the session?

Absolutely! You can follow along and experiment using the same materials. All the documents and JSON template files are available for download on our GitHub: Weekly Market Sample

Can we achieve similar auto-tagging in the SDK by providing an external layout? What is the structure of this layout, and which SDK method should we call?

Yep, everything you see in this webinar is replicable in the SDK. See a separate webinar for auto-tagging with PDFix SDK.

How is the Layout Template created?

Templates can be created with the Preflight function in PDFix Desktop, using an AI model, or manually. Template examples are available on GitHub.

Is there a template functionality that can define the layout of one page and apply it to every page (for example, a three-column layout)?

Yes, it’s possible. If you need help creating one, just contact us — we can assist with multi-page and repeating layouts.

By using AI, are the templates automatically created?

Yes, that’s correct. Templates can be generated automatically using AI. PDFix Desktop then applies them to fix and enhance the tagging structure.

Are we going to need credentials to use the AI model?

Some AI models require credentials, some don’t. PDFix Desktop offers free AI layout models, such as Paddle.

What if I don’t have any credentials — is there a default PDFix AI model I can buy?

PDFix Desktop offers free AI layout models, such as Paddle, which do not require credentials.

Are we going to need credentials to use the AI model?

It depends on the model. Some models require their own credentials and others are free, depending on licensing. If you have your own model, we can integrate it into the PDFix pipeline.

Can you train the AI for better tagging results?

Yep! You can train and prepare your own model, then integrate it easily into PDFix.

Do you plan to support other AI identification services like Microsoft Azure?

We continuously work on the integration of new LLM models into PDFix Desktop, including Microsoft Azure. Please check the PDFix Marketplace for updates.

Is there anything that can be done with a scanned, hand-written document?

Yes! We offer an external OCR action that can be applied before auto-tagging. It automatically adds an OCR text layer to scanned PDF files. Learn more here: OCR Tesseract Action

I have PDFs with tables and LLMs struggle to interpret them. Can PDFix help preprocess these for easier querying?

Yes, we have an external Table Summary mode that improves table readability for LLMs: Generate Table Summary with OpenAI

What about heavy math documents? Is there an OpenAI-only solution for MathML generation?

We support both OpenAI and Paddle for MathML detection.

How about PDF forms — can AI auto-tag and create descriptions?

Tagging PDF forms is challenging, but possible. Each form field is tagged according to the PDF/UA standard. The form field descriptions can be auto-generated from field names or tooltips.

I’m working with complex PDF layouts — multi-column pages, images, graphics, and split tables. Does auto-tagging handle this, or is it best for simple documents?

Manually created templates can handle auto-tagging of complex layouts. If you need help creating one, just contact us — we can assist with complex layouts.

Is the validation done in PDFix compliant with PAC validation?

Not completely. PDFix relies on the open-source veraPDF tool for PDF/UA validation. You can learn more here: PDF Accessibility Validators

Does PDFix help fix common errors in PDFs exported from InDesign — like unnesting figure and table tags from paragraph tags?

Yep! Check our related blog and webinar here: How to automate fixes in InDesign created PDFs

I saw “AI Alt Text” listed in the process — can it write Alt Text straight into tags?

Yes, exactly! You can use the free BLIP or paid OpenAI model for automatic Alternate Text generation.

How frequently do you update the software?

Constantly. External actions are updated whenever new versions are released. PDFix SDK and Desktop are updated at least quarterly — or more often if needed.

Is it possible to use Podman instead of Docker Desktop (e.g. to improve performance/resource usage)?

Yes, it is possible to use Podman instead of Docker Desktop for running external actions integrated with PDFix Desktop.
PDFix Desktop supports actions that can be distributed through the PDFix Marketplace or installed manually using an action configuration file (JSON).
Each action configuration file defines:

  • Action metadata — such as name, category, and subtype, enabling integration into specific workflows (e.g., template creation or tag editing).
  • Program execution pattern — including the command-line call and its arguments.
  • Argument definitions — allowing customization of how the action runs and interacts with PDFix Desktop.

Because the action system executes external programs through command-line calls, Podman can be used in the same way as Docker Desktop or any other CLI-based container runtime.
Example: Executing the Action with Docker

docker run -v $(pwd):/data -w /data --rm pdfix/autotag-textract:latest \
  tag --aws_id ${AWS_ID} --aws_secret ${AWS_SECRET} --aws_region ${AWS_REGION} \
  -i /data/input.pdf -o /data/output.pdf

Example: Executing the Same Action with Podman

podman run -v $(pwd):/data -w /data --rm pdfix/autotag-textract:latest \
  tag --aws_id ${AWS_ID} --aws_secret ${AWS_SECRET} --aws_region ${AWS_REGION} \
  -i /data/input.pdf -o /data/output.pdf

Since Podman provides a Docker-compatible command-line interface, no additional configuration changes are required in the PDFix action definition.
Simply replace docker with podman in the execution command. For detailed guidance on creating or installing custom actions, please contact us.
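
The swap between the two runtimes can also be automated in a wrapper script. The following is a minimal shell sketch, not part of PDFix itself: the helper name pick_runtime is hypothetical, and it assumes Podman should be preferred when both runtimes are installed. The image name and arguments in the usage comment are copied from the examples above.

```shell
# Print the name of the first container runtime found on PATH.
# Prefers podman, falls back to docker; returns 1 if neither is installed.
# (pick_runtime is a hypothetical helper, not a PDFix command.)
pick_runtime() {
  for rt in podman docker; do
    if command -v "$rt" >/dev/null 2>&1; then
      echo "$rt"
      return 0
    fi
  done
  echo "no container runtime found (tried podman, docker)" >&2
  return 1
}

# Usage (shown as a comment so that nothing runs when this file is sourced):
#   runtime=$(pick_runtime) || exit 1
#   "$runtime" run -v "$(pwd)":/data -w /data --rm pdfix/autotag-textract:latest \
#     tag --aws_id "${AWS_ID}" --aws_secret "${AWS_SECRET}" --aws_region "${AWS_REGION}" \
#     -i /data/input.pdf -o /data/output.pdf
```

Because both CLIs accept the same run arguments here, only the program name changes; the rest of the command stays identical.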

Where do I find AWS keys?

  • Create or sign in to your AWS account (console).
  • Create an IAM user (or use an existing one) and enable Programmatic access so it can get access keys. See the AWS Documentation.
  • Attach Textract permissions to that user/role. For testing you can attach the managed policy AmazonTextractFullAccess; for production prefer least privilege (grant only the specific Textract actions you need). See the AWS Documentation.
  • Create the access key for that IAM user (Access Key ID + Secret Access Key) via the IAM → Users → Security credentials → Create access key UI. Save the secret immediately, because AWS shows it only once, and store the credentials securely. See the AWS Documentation.
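
Once the key pair exists, the Docker and Podman examples earlier in this post read it from the AWS_ID, AWS_SECRET, and AWS_REGION environment variables. A minimal shell sketch for checking those variables before running the action could look like this. The function name check_aws_env is hypothetical, and the check only confirms that the variables are non-empty, not that the keys are valid.

```shell
# Verify that the three variables used by the tagging command are set.
# Prints the missing names to stderr and returns 1 if any is empty.
# (check_aws_env is a hypothetical helper, not a PDFix or AWS command.)
check_aws_env() {
  missing=""
  [ -n "${AWS_ID:-}" ] || missing="$missing AWS_ID"
  [ -n "${AWS_SECRET:-}" ] || missing="$missing AWS_SECRET"
  [ -n "${AWS_REGION:-}" ] || missing="$missing AWS_REGION"
  if [ -n "$missing" ]; then
    echo "Missing environment variables:$missing" >&2
    return 1
  fi
  return 0
}

# Usage (shown as a comment so that nothing runs when this file is sourced):
#   export AWS_ID=... AWS_SECRET=... AWS_REGION=...
#   check_aws_env && docker run ...   # the same command as in the example above
```

Keeping the keys in environment variables rather than typing them into the command keeps the secret out of scripts and saved action definitions.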

In which cases would you recommend Paddle instead of Tesseract?

Paddle is currently supported only for layout recognition in auto-tagging and template workflows. It does not include OCR functionality.
At the moment, Tesseract is the only OCR engine supported in the PDFix Marketplace. Please check for future updates that may add OCR support for Paddle or other engines.

