Auto-tag PDFs Effortlessly with PDFix Templates – Part 1

Learn how to auto-tag PDFs using intelligent content recognition and pre-designed templates, saving hours of work while supporting PDF accessibility compliance. In our recent webinar, we explored PDFix Desktop, focusing on auto-tagging and template creation to streamline PDF accessibility. Here’s a breakdown of what we covered and how these tools can speed up your document remediation process.

“An automatic creation of accessible PDF/UA from already existing PDF can be done with the help of templates. In this way, rules are worked through, such as a text in 24 points size is H1 and a text in 18 points is H2. You can specify certain patterns for lists or orient yourself to positioning. In the sum of rules, the content of a page can be recognized precisely, and this works particularly well with forms such as invoices. However, this also applies to Office documents that are based on templates. PDFix shows in a free webinar how such rules can be defined and applied.” – Michael Karbe, Managing Director, Actino Software

Key Takeaways

Here is what you’ll learn:

  • Introduction to PDFix Templates
    • Explore the content recognition engine for pre-designed templates
    • Discover the best use cases for its application
  • Configuring Template Settings
    • Live demonstration of template creation in PDFix Desktop
    • Learn to set up templates using pre-defined queries and properties for elements such as:
      • Fixed-positioned elements on a page
      • Headings
      • Headers and footers
      • Artifacts
      • Image alternative text
  • Automated Batch Tagging with Templates
    • Efficiently auto-tag multiple PDFs using pre-defined templates

Introducing PDFix Desktop: Automation at Its Best

PDFix Desktop is designed to simplify PDF accessibility through automation. By leveraging intelligent layout recognition, the tool reduces manual errors, saves time, and keeps tagging consistent across documents.

Key Features Highlighted in the Webinar

  1. Auto-Tagging with Intelligent Layout Recognition
    • PDFix Desktop uses a bottom-up approach to identify and tag elements like text, images, tables, and lists. This process ensures accurate reading order and logical structure, which are critical for accessibility.
  2. Template Creation for High-Volume Remediation
    • Templates allow you to automate the remediation of documents with similar layouts. Once created, templates can be reused, saving significant time and effort.

How Templates Work in PDFix Desktop

Templates are one of the most requested features among PDFix users. Here’s how they work:

  1. Object Identification
    • Templates use queries to identify objects (e.g., words, lines, images) based on properties like position, font size, or content. These objects can then be assigned specific roles, such as headings, lists, or artifacts.
  2. Customizable Rules
    • You can define rules to handle complex layouts, such as tables or multi-column text. For example, if a table is incorrectly recognized, you can adjust the template to ensure proper tagging.
  3. Reusability
    • Once a template is created, it can be saved and applied to other documents with similar layouts. This is especially useful for businesses processing large volumes of standardized documents, like invoices or reports.
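
To make the idea of queries and roles concrete, here is a minimal Python sketch of rule-based role assignment. It is not the PDFix template format or API: the `PageObject` fields, the `Rule` class, and the thresholds are all hypothetical, with the 24-point and 18-point heading sizes borrowed from the quote above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PageObject:
    """A simplified page object; the field names are illustrative only."""
    text: str
    font_size: float  # in points
    x: float          # left edge, in points
    y: float          # bottom edge, in points (PDF origin is bottom-left)


@dataclass(frozen=True)
class Rule:
    """Pairs a query (a predicate over an object) with the role to assign."""
    role: str
    query: Callable[[PageObject], bool]


# Hypothetical rules, checked in order; the first matching rule wins.
RULES = [
    Rule("Artifact", lambda o: o.y < 40),        # fixed footer band
    Rule("H1", lambda o: o.font_size >= 24),     # large text is a top-level heading
    Rule("H2", lambda o: o.font_size >= 18),     # medium text is a subheading
    Rule("LI", lambda o: o.text.lstrip().startswith(("•", "-"))),  # list pattern
]


def assign_role(obj: PageObject, rules=RULES, default: str = "P") -> str:
    """Return the role of the first rule whose query matches, else the default."""
    for rule in rules:
        if rule.query(obj):
            return rule.role
    return default


if __name__ == "__main__":
    print(assign_role(PageObject("Invoice", 24, 72, 700)))
    print(assign_role(PageObject("Page 1 of 3", 9, 72, 20)))
```

Because the rules are checked in order, the footer rule is placed first so that a large-font line inside the footer band is still treated as an artifact rather than a heading.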

Why Reading Order Matters in PDF Accessibility

Reading order is the backbone of an accessible PDF. It determines how assistive technologies like screen readers navigate and interpret the content. If the reading order is incorrect, users may encounter a jumbled or illogical flow of information, rendering the document inaccessible.

Customizing Reading Order in PDFix Desktop

PDFix Desktop offers three methods to define reading order:

  1. Default Reading Order (Value 0):
    • This built-in method uses internal logic to detect reading order. While it works for many documents, it may not be perfect for complex layouts.
  2. Original Content Position (Value 1):
    • This method retains the reading order based on how the PDF was originally generated. However, this often leads to incorrect reading orders, as seen in our webinar demo.
  3. XY Reading Order (Value 2):
    • This method arranges content from the top-left corner to the bottom-right corner, following the left-to-right, top-to-bottom path a sighted reader takes across the page. It’s ideal for documents with consistent layouts.
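
The three values above can be pictured as a small enumeration, and the XY method as a sort over block positions. The Python sketch below is an illustration only, not PDFix's implementation: the `Block` type, the `row_tolerance` parameter, and the assumption that `top` is measured downward from the top of the page are all hypothetical.

```python
from enum import IntEnum
from typing import NamedTuple


class ReadingOrder(IntEnum):
    """The three reading-order values described above."""
    DEFAULT = 0            # built-in detection logic
    ORIGINAL_CONTENT = 1   # order in which the PDF content was generated
    XY = 2                 # top-left to bottom-right


class Block(NamedTuple):
    name: str
    left: float  # distance from the left page edge, in points
    top: float   # distance from the top page edge, in points (assumed)


def xy_order(blocks, row_tolerance=5.0):
    """Sort blocks top-to-bottom, then left-to-right within each row.

    Blocks whose top edge is within row_tolerance of the first block in a
    row are treated as part of that row, so small vertical offsets do not
    reverse the order of items that sit side by side.
    """
    rows = []
    for block in sorted(blocks, key=lambda b: b.top):
        if rows and abs(block.top - rows[-1][0].top) <= row_tolerance:
            rows[-1].append(block)
        else:
            rows.append([block])
    return [b for row in rows for b in sorted(row, key=lambda b: b.left)]


if __name__ == "__main__":
    page = [
        Block("table", 50, 300),
        Block("address", 300, 98),
        Block("logo", 50, 100),
    ]
    print([b.name for b in xy_order(page)])
```

Without the row tolerance, the address block in this example would come before the logo because its top edge is two points higher, even though the two sit side by side.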

Leveraging Templates for High-Volume PDF Remediation

Templates are a game-changer for automating PDF accessibility. They allow you to define rules for tagging and structuring documents, which can then be reused across multiple files with similar layouts.

Key Features of PDFix Desktop Templates

  1. Object Identification: Templates use queries to identify objects (e.g., text, images, tables) based on properties like position, font size, or content.
  2. Customizable Rules: You can define rules to handle complex layouts, such as multi-column text or tables without visible borders.
  3. Reusability: Once a template is created, it can be saved and applied to other documents, saving time and ensuring consistency.

Creating a Template for an Invoice Document

In the webinar, we walked through the process of creating a template for an invoice document. Here’s how we did it:

  1. Correcting Reading Order:
    • We used the XY Reading Order method to ensure the logo, invoice address, and table were tagged in the correct sequence.
  2. Handling Paragraphs and Tables:
    • By adjusting the text line distance and table detection settings, we ensured that paragraphs and tables were accurately identified and tagged.
  3. Marking Unnecessary Content as Artifacts:
    • We marked footers and repeated content as artifacts to exclude them from the reading order.
  4. Adding Alternate Text for Images:
    • We used the tag image function to assign alternate text to the company logo, ensuring it is accessible to screen readers.
  5. Defining Headings:
    • By specifying font name and size, we automatically tagged headings as H1.
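
To show why the text line distance setting in step 2 affects paragraph detection, here is a small Python sketch of grouping lines by vertical gap. It is an assumption about the general technique, not a description of how PDFix Desktop applies the setting: the function name, the `max_line_distance` parameter, and its interpretation as a multiple of the line height are hypothetical.

```python
def group_paragraphs(line_tops, line_height, max_line_distance=1.5):
    """Group consecutive lines into paragraphs by the gap between them.

    line_tops: top coordinates of the lines, measured from the top of the
        page and already in reading order.
    line_height: nominal height of one line, in the same units.
    max_line_distance: hypothetical threshold, as a multiple of line_height;
        a larger gap starts a new paragraph.

    Returns a list of paragraphs, each a list of line indices.
    """
    paragraphs = []
    for i, top in enumerate(line_tops):
        gap_ok = i > 0 and top - line_tops[i - 1] <= max_line_distance * line_height
        if gap_ok:
            paragraphs[-1].append(i)   # close enough: same paragraph
        else:
            paragraphs.append([i])     # first line, or gap too large
    return paragraphs


if __name__ == "__main__":
    tops = [100, 114, 128, 170, 184]
    print(group_paragraphs(tops, line_height=12))
    print(group_paragraphs(tops, line_height=12, max_line_distance=1.0))
```

With the looser threshold the five lines fall into two paragraphs; with the tighter one every line becomes its own paragraph, which is the kind of mis-grouping that adjusting the setting is meant to correct.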

Auto-Tagging Multiple PDFs with a Single Template

One of the most powerful features of PDFix Desktop is its ability to auto-tag multiple documents using a single template. During the webinar, we demonstrated this in two steps:

  1. Save the Template:
    • The template was saved as a JSON file, making it easy to reuse and modify.
  2. Apply the Template to Multiple Files:
    • We selected a batch of invoice documents and applied the template, achieving consistent tagging across all files.

This feature is particularly useful for businesses that generate high volumes of standardized documents, such as invoices, reports, or contracts.
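
The batch workflow above can be sketched as a loop that loads one JSON template and applies it to every PDF in a folder. The Python below is a sketch under stated assumptions, not PDFix's own batch tool: `batch_tag` and its `tag_pdf` argument are hypothetical, and `tag_pdf` is a placeholder for whatever function or command actually performs the tagging.

```python
import json
from pathlib import Path


def batch_tag(template_path, input_dir, output_dir, tag_pdf):
    """Apply one saved template to every PDF in input_dir.

    tag_pdf is a placeholder callable taking (source, destination, template);
    it stands in for the real tagging step, which this sketch does not provide.
    Returns a dict mapping each file name to "ok" or a failure message.
    """
    # The template is loaded once and reused for every file.
    template = json.loads(Path(template_path).read_text(encoding="utf-8"))
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    results = {}
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        try:
            tag_pdf(pdf, out / pdf.name, template)
            results[pdf.name] = "ok"
        except Exception as exc:
            # Record the failure and continue, so one bad file
            # does not stop the rest of the batch.
            results[pdf.name] = f"failed: {exc}"
    return results
```

Returning a per-file result makes it easy to spot the documents whose layout did not fit the template and that still need manual review.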

When to Use Templates: Key Considerations

While templates are incredibly powerful, they’re not a one-size-fits-all solution. Here are some factors to consider when deciding whether to create a template:

  1. Document Complexity:
    • Templates are ideal for documents with consistent layouts. For highly variable documents, manual remediation may be more efficient.
  2. Volume of Documents:
    • If you’re processing hundreds or thousands of similar documents, investing time in creating a template can save significant effort in the long run.
  3. Specific Use Cases:
    • Templates are particularly useful for addressing recurring issues, such as incorrect table tagging or missing alternate text.

Get started with PDFix Desktop today and experience the power of automation in PDF remediation.