
📄 Automatic Invoice Processing with OCI Vision and OCI Generative AI

🧠 Objective

This tutorial demonstrates how to implement an automated pipeline that monitors a bucket in Oracle Cloud Infrastructure (OCI) for incoming invoice images, extracts textual content using OCI Vision, and then applies OCI Generative AI (an LLM) to extract structured fiscal data such as the invoice number, customer, and item list.


🚀 Use Cases

  • Automating invoice ingestion from Object Storage.
  • Extracting structured data from semi-structured scanned documents.
  • Integrating OCR and LLM in real-time pipelines using OCI AI services.

🧱 Oracle Cloud Services Used

| Service | Purpose |
| --- | --- |
| OCI Vision | Performs OCR (Optical Character Recognition) on uploaded invoice images. |
| OCI Generative AI | Extracts structured JSON data from raw OCR text using few-shot prompts. |
| Object Storage | Stores input invoice images and output JSON results. |

⚙️ Prerequisites

  1. An OCI account with access to:
    • Vision AI
    • Generative AI
    • Object Storage
  2. Python 3.10 or later.
  3. A bucket for input images (e.g., input-bucket) and another for output files (e.g., output-bucket).
  4. A config file with:
    {
      "oci_profile": "DEFAULT",
      "namespace": "your_namespace",
      "input_bucket": "input-bucket",
      "output_bucket": "output-bucket",
      "compartment_id": "ocid1.compartment.oc1..xxxx",
      "llm_endpoint": "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
    }
    

🛠️ How to Run

  1. Install the dependencies with:

    pip install -r requirements.txt

  2. Run the Python script main.py.

  3. Upload invoice images (e.g., .png, .jpg) to your input bucket.

  4. Wait for the image to be processed and the extracted JSON saved in the output bucket.


🧩 Code Walkthrough

1. Load Configuration

with open("./config", "r") as f:
    config_data = json.load(f)

Loads all required configuration values such as namespace, bucket names, compartment ID, and LLM endpoint.


2. Initialize OCI Clients

oci_config = oci.config.from_file("~/.oci/config", PROFILE)
object_storage = oci.object_storage.ObjectStorageClient(oci_config)
ai_vision_client = oci.ai_vision.AIServiceVisionClient(oci_config)

Sets up the OCI SDK clients to access Object Storage and AI Vision services.


3. Initialize LLM

llm = ChatOCIGenAI(
    model_id="meta.llama-3.1-405b-instruct",
    service_endpoint=LLM_ENDPOINT,
    compartment_id=COMPARTMENT_ID,
    auth_profile=PROFILE,
    model_kwargs={"temperature": 0.7, "top_p": 0.75, "max_tokens": 2000},
)

Initializes the OCI Generative AI model for natural language understanding and text-to-structure conversion.


4. Few-shot Prompt

few_shot_examples = [ ... ]
instruction = """
You are a fiscal data extractor.
...
"""

Uses few-shot learning by providing an example of the expected output so the model learns how to extract structured fields such as the invoice number, customer, location, and items.
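A minimal sketch of how such a prompt might be assembled (the example fields, texts, and helper names here are illustrative, not the tutorial's actual code):

```python
import json

# Hypothetical few-shot example: one OCR excerpt paired with the JSON we expect back.
few_shot_examples = [
    {
        "ocr_text": "NF-e No. NF102030\nCliente: Comercial ABC Ltda",
        "output": {"nf": "NF102030", "customer": "Comercial ABC Ltda"},
    }
]

instruction = (
    "You are a fiscal data extractor. Given raw OCR text from an invoice, "
    "return only a JSON object with the fields nf, customer, location and items."
)

def build_prompt(ocr_text: str) -> str:
    """Combine the instruction, the few-shot example(s) and the new OCR text."""
    parts = [instruction]
    for ex in few_shot_examples:
        parts.append(f"OCR text:\n{ex['ocr_text']}\nJSON:\n{json.dumps(ex['output'])}")
    parts.append(f"OCR text:\n{ocr_text}\nJSON:")
    return "\n\n".join(parts)
```

The trailing `JSON:` cue nudges the model to answer with the structured object alone, mirroring the example above it.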


5. OCR with OCI Vision

def perform_ocr(file_name):
    ...

This function:

  • Sends the image to OCI Vision.
  • Requests text detection.
  • Returns the extracted raw text.
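The text-detection response nests the recognized text under `image_text.lines`. A small helper can flatten those lines into the single raw string the LLM receives; this sketch uses plain dicts that mirror the documented response shape rather than the SDK's model objects:

```python
def join_ocr_lines(image_text: dict) -> str:
    """Join per-line OCR results into one newline-separated string.

    `image_text` mirrors the `image_text` field of an OCI Vision
    text-detection response, where each entry in `lines` carries the
    recognized text for one line of the image.
    """
    return "\n".join(line["text"] for line in image_text.get("lines", []))

# Illustrative response fragment, not real service output:
sample = {"lines": [{"text": "NF102030"}, {"text": "Comercial ABC Ltda"}]}
print(join_ocr_lines(sample))  # prints the two lines, newline-separated
```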

6. Data Extraction with LLM

def extract_data_with_llm(ocr_text, file_name):
    ...

This function:

  • Combines instructions + few-shot example + OCR text.
  • Sends it to OCI Generative AI.
  • Receives structured JSON fields (as string).
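Because the model returns the JSON as a string, sometimes wrapped in markdown fences or surrounding prose, a defensive parse step is useful before the result is trusted downstream. A hedged sketch (not part of the tutorial's code):

```python
import json
import re

def parse_llm_json(raw: str) -> dict:
    """Extract the first JSON object from the model's reply.

    LLMs often wrap JSON in ``` fences or add extra prose, so we
    locate the outermost braces before calling json.loads.
    """
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))
```

If parsing fails, the raw string can still be saved as-is, which is effectively what the pipeline does today.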

7. Save Output to Object Storage

def save_output(result, file_name):
    ...

Uploads the structured result into the output bucket using the original filename (with .json extension).
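Deriving the output object name and payload can be sketched as below; the wrapper fields match the Expected Output section, and the actual upload would go through the Object Storage client's `put_object` call (helper names are illustrative):

```python
import json
from datetime import datetime, timezone
from pathlib import PurePosixPath

def build_output(result: str, file_name: str) -> tuple[str, str]:
    """Return (object_name, json_body) for the output bucket.

    The original filename keeps its base name but swaps the image
    extension for .json, e.g. nota123.png -> nota123.json.
    """
    object_name = PurePosixPath(file_name).with_suffix(".json").name
    body = json.dumps(
        {
            "file": file_name,
            "result": result,  # the LLM's structured output, still a string
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
    )
    return object_name, body
```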


8. Main Loop: Monitor and Process

def monitor_bucket():
    ...

Main routine that:

  • Monitors the input bucket every 30 seconds.
  • Detects new .png, .jpg, .jpeg files.
  • Runs OCR + LLM + Upload in sequence.
  • Keeps track of already processed files in memory.
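The detection step reduces to filtering the bucket listing against the in-memory set; a simplified sketch of that filter (the real loop also sleeps 30 seconds between iterations):

```python
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")

def find_new_images(listing: list[str], processed: set[str]) -> list[str]:
    """Return object names from a bucket listing that are images and
    have not been processed yet (extension match is case-insensitive)."""
    return [
        name
        for name in listing
        if name.lower().endswith(IMAGE_EXTENSIONS) and name not in processed
    ]
```

In the tutorial's loop, each returned name is fed through OCR, the LLM, and the upload step, then added to `processed`.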

9. Entry Point

if __name__ == "__main__":
    monitor_bucket()

Starts the bucket watcher and begins processing invoices automatically.


Expected Output

For each uploaded invoice image:

  • A corresponding .json file is generated with structured content like:
{
  "file": "nota123.png",
  "result": "{ \"nf\": \"NF102030\", \"customer\": \"Comercial ABC Ltda\", ... }",
  "timestamp": "2025-07-21T12:34:56.789Z"
}

🧪 Testing Suggestions

  • Use real or dummy invoices with legible product lines and issuer (emitente) information.
  • Upload multiple images in sequence to see automated processing.
  • Log into OCI Console > Object Storage to verify results in both buckets.

📌 Notes

  • OCI Vision supports Portuguese OCR (language="POR" can be used instead of "ENG").
  • The LLM prompt can be adjusted to extract other fields such as the CNPJ, quantity (quantidade), and issue date (data de emissão).
  • Consider persisting processed_files in a database or file to make the process fault-tolerant.
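One way to make the tracker fault-tolerant, as the last note suggests, is a small JSON checkpoint file that is reloaded on startup (a sketch; the path and helper names are arbitrary):

```python
import json
from pathlib import Path

STATE_FILE = Path("processed_files.json")  # hypothetical checkpoint path

def load_processed() -> set[str]:
    """Restore the set of already-processed object names, if any."""
    if STATE_FILE.exists():
        return set(json.loads(STATE_FILE.read_text()))
    return set()

def mark_processed(processed: set[str], name: str) -> None:
    """Add a name and persist the whole set so restarts resume cleanly."""
    processed.add(name)
    STATE_FILE.write_text(json.dumps(sorted(processed)))
```

Writing the full set on every update is fine at this scale; a database would be the next step if the bucket grows large.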

Acknowledgments

  • Author - Cristiano Hoshikawa (Oracle LAD A-Team Solution Engineer)