Wednesday, 17 January 2024

Leveraging OCI Document Understanding AI for Document Processing

Leveraging AI for Document Processing with the Document AI Scanner

Oracle's new Document Understanding AI provides a powerful way to extract insights from scanned documents using artificial intelligence. The Document AI Scanner, an open source tool that I developed, is built on Oracle APEX and integrates with the Oracle Cloud Infrastructure (OCI) Document Understanding AI service to streamline the intake and analysis of document collections.

I was first introduced to Document AI's capabilities and REST API through Jon Dixon's insightful blog post "Quickly Turn Text into Data with APEX & Document AI".

In his blog, Jon provides a comprehensive overview of Document AI and demonstrates how to call the AnalyzeDocument endpoint directly from an APEX application. 

What it Does


The Document AI Scanner allows users to upload multiple PDF or image files for processing as a batch. It then passes the documents to the Document Understanding AI service, which performs text extraction, table and field detection, document classification and more.
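
As a rough illustration of what such a request can look like, here is a minimal Python sketch that only builds the JSON body for one inline document. It is not the app's code (the app itself runs in APEX), and it sends nothing over the network. The field names and feature type names reflect my reading of the public AnalyzeDocument API and should be treated as assumptions to verify against the API reference; `build_analyze_request` and the compartment OCID are hypothetical.

```python
import base64
import json

# Feature type names as I understand them from the public API reference.
# Treat them as assumptions and verify them before use.
FEATURE_TYPES = (
    "TEXT_EXTRACTION",
    "TABLE_EXTRACTION",
    "KEY_VALUE_EXTRACTION",
    "DOCUMENT_CLASSIFICATION",
    "LANGUAGE_CLASSIFICATION",
)


def build_analyze_request(file_bytes, compartment_id, features=FEATURE_TYPES):
    """Return the JSON body for one inline document (hypothetical helper)."""
    return {
        "compartmentId": compartment_id,
        # One entry per analysis feature that should run on the document.
        "features": [{"featureType": name} for name in features],
        "document": {
            "source": "INLINE",
            # Inline documents are sent as base64-encoded text.
            "data": base64.b64encode(file_bytes).decode("ascii"),
        },
    }


if __name__ == "__main__":
    body = build_analyze_request(
        b"%PDF-1.4 placeholder", "ocid1.compartment.oc1..example"
    )
    print(json.dumps(body, indent=2))
```

An actual call would additionally need OCI request signing or a web credential, which is left out here.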

Once analysis is complete, users can view detailed results within the app or download them as a ZIP file. Key-value pairs and other extracted data can also be exported directly to an accounting database for further processing.

Key Capabilities

  • Full text extraction and searchability of uploaded documents
  • Identification of tables and structured fields
  • Classification of document types like invoices or forms
  • Language detection to support proper NLS settings
  • Long running jobs that don't time out
  • Downloadable analysis files for record keeping

Behind the Scenes

The app post-processes the raw AI output with rule-based algorithms. Using techniques like relative positioning of text fragments on the page and data type validation, it forms key-value pairs from those fragments.
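
To make the idea concrete, here is a small Python sketch of pairing a label with its value by relative position and data type. It is an illustration under my own assumptions, not the app's actual implementation: the `Fragment` class, the normalized coordinates, the tolerances and the regular expressions are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    x: float  # left edge, normalized to 0..1 (assumed coordinate system)
    y: float  # top edge, normalized to 0..1


# Hypothetical validators: one regular expression per expected data type.
VALIDATORS = {
    "date": re.compile(r"\d{1,2}\.\d{1,2}\.\d{4}"),
    "amount": re.compile(r"\d{1,3}(\.\d{3})*,\d{2}"),
    "text": re.compile(r".+"),
}


def find_value(label, fragments, data_type, row_tol=0.01, col_tol=0.05):
    """Return the closest fragment right of or below `label` that validates."""
    pattern = VALIDATORS[data_type]
    candidates = []
    for frag in fragments:
        # Skip the label itself and anything of the wrong data type.
        if frag is label or not pattern.fullmatch(frag.text):
            continue
        same_row = abs(frag.y - label.y) <= row_tol and frag.x > label.x
        below = abs(frag.x - label.x) <= col_tol and frag.y > label.y
        if same_row:
            candidates.append((frag.x - label.x, frag))
        elif below:
            candidates.append((frag.y - label.y, frag))
    if not candidates:
        return None
    # The candidate with the smallest distance to the label wins.
    return min(candidates, key=lambda c: c[0])[1]


if __name__ == "__main__":
    label = Fragment("Rechnungsdatum", 0.10, 0.20)
    page = [
        label,
        Fragment("Musterfirma", 0.30, 0.20),
        Fragment("17.01.2024", 0.45, 0.20),
        Fragment("1.234,56", 0.10, 0.30),
    ]
    print(find_value(label, page, "date"))
```

The type check is what lets the nearer but unrelated fragment ("Musterfirma") be skipped in favor of the date further to the right.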

My client needed to process German bills, but the text recognition was limited to the ASCII character set. This meant that German umlauts and the ß (sharp s) were missing from the extracted text, and date and number formats were misinterpreted. To address this, I developed my own text interpreter, hand-crafting rules to handle these language-specific cases and produce better key-value pairs from the documents.

  • The absence of German characters in the OCR output made it hard to match names and addresses within the address table. A special index had to be built to support matching despite missing characters like umlauts and ß.
  • Dates, numbers and currencies are formatted differently in German compared to English. It took work to parse these values accurately based on local conventions.
  • German abbreviations and acronyms that may be used in bills needed to be researched and accounted for in the interpreter rules.
  • Idiomatic phrases, terminology and jargon specific to billing and invoicing in German were challenging to understand and interpret correctly.
  • Ambiguous or incomplete text fragments from OCR output limited the amount of context available to accurately interpret meanings. Additional heuristics may have been needed.
  • Ensuring that the interpreter worked robustly across a variety of different bill and invoice styles, templates and edge cases from various issuing companies was a further challenge.
  • Debugging and testing the complex set of hand-crafted rules required considerable effort to refine the interpreter to a useful level of accuracy.
  • Integrating the custom interpreter with the existing application architecture and output formats required careful programming.
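
Two of the points above, matching names despite missing umlauts and parsing German dates and numbers, can be sketched in a few lines of Python. This is a simplified illustration, not the interpreter used in the app: `fold_key`, `build_index` and the parsing helpers are hypothetical names, and the folding rules assume that the OCR returns either the base letter or a two-letter transcription for the missing characters.

```python
import unicodedata
from datetime import date, datetime
from decimal import Decimal

# Spelling variants that should all collapse to the same index key.
_FOLDS = (
    ("ä", "a"), ("ö", "o"), ("ü", "u"),
    ("ae", "a"), ("oe", "o"), ("ue", "u"),
    ("ß", "s"), ("ss", "s"),
)


def fold_key(name: str) -> str:
    """Reduce a name to a key that ignores umlaut and ß spelling variants."""
    s = unicodedata.normalize("NFC", name).lower()
    for src, dst in _FOLDS:
        s = s.replace(src, dst)
    # Strip remaining accents, spaces and punctuation.
    s = unicodedata.normalize("NFKD", s)
    return "".join(ch for ch in s if ch.isalnum())


def build_index(names):
    """Map each folded key to the original names that share it."""
    index = {}
    for name in names:
        index.setdefault(fold_key(name), []).append(name)
    return index


def parse_german_number(text: str) -> Decimal:
    """Parse '1.234,56 €': dot is the thousands separator, comma the decimal."""
    cleaned = text.replace("€", "").replace("EUR", "").strip()
    cleaned = cleaned.replace(".", "").replace(" ", "").replace(",", ".")
    return Decimal(cleaned)


def parse_german_date(text: str) -> date:
    """Parse a date written as day.month.year, for example '17.01.2024'."""
    return datetime.strptime(text.strip(), "%d.%m.%Y").date()


if __name__ == "__main__":
    index = build_index(["Müller GmbH", "Straßenbau Schmidt"])
    print(index.get(fold_key("Muller GmbH")))
    print(parse_german_number("1.234,56 €"), parse_german_date("17.01.2024"))
```

Folding is deliberately lossy: a name like "Bauer" also loses its "e", so unrelated names can share a key. That is acceptable for a lookup index as long as the candidates are checked against other fields afterwards.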

It was a challenge to develop the customized interpreter within the limited capabilities of the general Document AI service. My hope is that Oracle continues expanding the service to support more languages out of the box. Training a custom model would be an alternative, but it requires a significant amount of labeled data, which I did not have for this task. Overall, the app demonstrates how artificial intelligence can be put to work by combining a commercial service with custom code.

The Document AI Scanner application showcases how technical challenges with language and interpretation can be overcome through a combination of commercial and custom solutions. While developing the bespoke text interpreter involved effort, it allowed valuable insights to be extracted from an otherwise limited data set.

For those interested in exploring the application further or contributing enhancements, the source code is available on GitHub at:

https://github.com/dstrack/Document-AI-Scanner

Please check out the repo for full implementation details, installation instructions, and how to get involved with ongoing development. I hope this project helps advance capabilities for automated document processing globally.