rfp_response_automation/README_COMPLETE_TUTORIAL.md
2026-02-18 20:34:33 -03:00


# 🧠 Oracle GraphRAG RFP AI -- Complete Tutorial
Enterprise-grade deterministic RFP validation engine built with:
- Oracle Autonomous Database 23ai
- Oracle Property Graph
- OCI Generative AI (LLMs + Embeddings)
- FAISS Vector Search
- Flask REST API
- Hybrid Graph + Vector + JSON reasoning
------------------------------------------------------------------------
# 📌 Introduction
This project implements a **deterministic RFP validation engine**.
Unlike traditional RAG systems that generate conceptual answers, this
solution is designed to:
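- Validate contractual and compliance requirements
- Produce only one of three answers: YES, NO, or PARTIAL
- Quote exact documentary evidence for each answer
- Minimize hallucination risk by refusing to infer beyond the documents
- Ensure full traceability from each answer back to its source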
This tutorial walks through the full architecture and implementation.
------------------------------------------------------------------------
# 🏗️ Full Architecture

    PDF Documents
    └─► Semantic Chunking
        ├─► FAISS Vector Index
        ├─► LLM Triple Extraction
        │   └─► Oracle 23ai Property Graph
        │       ├─► Structured JSON Node Properties
        │       ├─► Edge Confidence Weights
        │       └─► Evidence Table
        └─► Hybrid Retrieval Layer
            ├─► Vector Recall
            ├─► Graph Filtering
            ├─► Oracle Text
            └─► Graph-aware Reranking
                └─► Deterministic LLM Decision
                    └─► REST Response
------------------------------------------------------------------------
# 🧩 Step 1 -- Environment Setup
You need:
- Oracle Autonomous Database 23ai
- OCI Generative AI enabled
- Python 3.10+
- FAISS installed
- Oracle Python driver (`oracledb`)
Install dependencies:

    pip install oracledb langchain faiss-cpu flask pypandoc
------------------------------------------------------------------------
# 📄 Step 2 -- PDF Ingestion
- Load PDFs
- Perform semantic chunking
- Normalize headings and tables
- Store chunk metadata, including:
  - `chunk_hash`
  - `source_url`

Chunks feed both:
- The FAISS index
- Graph extraction
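
The metadata step above can be sketched as follows. This is a minimal illustration, not the project's actual ingestion code: the `Chunk` class, `make_chunk`, and `split_paragraphs` are hypothetical names, SHA-256 is an assumed choice for `chunk_hash`, and splitting on blank lines is only a stand-in for real semantic chunking.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    chunk_hash: str   # assumed here to be a SHA-256 hex digest of the text
    source_url: str


def normalize(text: str) -> str:
    # Collapse whitespace so layout differences do not change the hash.
    return " ".join(text.split())


def make_chunk(text: str, source_url: str) -> Chunk:
    clean = normalize(text)
    digest = hashlib.sha256(clean.encode("utf-8")).hexdigest()
    return Chunk(text=clean, chunk_hash=digest, source_url=source_url)


def split_paragraphs(document: str, source_url: str) -> list[Chunk]:
    # Stand-in for semantic chunking: one chunk per blank-line-separated block.
    parts = [p for p in document.split("\n\n") if p.strip()]
    return [make_chunk(p, source_url) for p in parts]
```

Because the hash is derived from normalized text, the same passage ingested twice yields the same `chunk_hash`, which makes it usable as a key shared by the vector index and the graph evidence.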
------------------------------------------------------------------------
# 🧠 Step 3 -- Triple Extraction (Graph Creation)
The function:

    create_knowledge_graph(chunks)

uses an LLM to extract only explicit relationships:

    SERVICE -[SUPPORTS_CAPABILITY]-> CAPABILITY
    SERVICE -[DOES_NOT_SUPPORT]-> CAPABILITY
    SERVICE -[HAS_LIMITATION]-> LIMITATION
    SERVICE -[HAS_SLA]-> SLA_VALUE

No inference is allowed.
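
One way to enforce the "explicit relationships only" rule is to filter the LLM output against the four relation types listed above. The sketch below assumes the LLM returns one triple per line in the arrow notation shown; that output format, and the names `parse_triples` and `Triple`, are illustrative assumptions rather than the behavior of `create_knowledge_graph`.

```python
import re
from typing import NamedTuple

# The four relation types named in this step; anything else is discarded.
ALLOWED_RELATIONS = frozenset({
    "SUPPORTS_CAPABILITY",
    "DOES_NOT_SUPPORT",
    "HAS_LIMITATION",
    "HAS_SLA",
})

# Matches lines such as: Object Storage -[HAS_SLA]-> 99.9%
TRIPLE_PATTERN = re.compile(r"^(.+?)\s*-\[([A-Z_]+)\]->\s*(.+)$")


class Triple(NamedTuple):
    subject: str
    relation: str
    target: str


def parse_triples(llm_output: str) -> list[Triple]:
    """Keep only well-formed triples whose relation is on the allow-list."""
    triples = []
    for line in llm_output.splitlines():
        match = TRIPLE_PATTERN.match(line.strip())
        if match is None:
            continue  # not in triple notation
        subject, relation, target = (part.strip() for part in match.groups())
        if relation in ALLOWED_RELATIONS and subject and target:
            triples.append(Triple(subject, relation, target))
    return triples
```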
------------------------------------------------------------------------
# 🏛️ Step 4 -- Oracle Property Graph Setup
The graph is created automatically:

    CREATE PROPERTY GRAPH GRAPH_NAME
      VERTEX TABLES (...)
      EDGE TABLES (...)

Nodes are stored in:

    KG_NODES_GRAPH_NAME

Edges in:

    KG_EDGES_GRAPH_NAME

Evidence in:

    KG_EVIDENCE_GRAPH_NAME
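
The naming convention above can be captured in a small helper. This is a hypothetical sketch: `graph_table_names` is not a function from the project, and the identifier check is an added precaution, since a graph name ends up inside DDL text and cannot be passed as a bind variable.

```python
import re

# Conservative pattern for an unquoted identifier: letter first, then
# letters, digits, or underscores.
_IDENTIFIER = re.compile(r"[A-Z][A-Z0-9_]*")


def graph_table_names(graph_name: str) -> dict[str, str]:
    """Derive the node, edge, and evidence table names for a graph."""
    name = graph_name.strip().upper()
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe graph name: {graph_name!r}")
    return {
        "nodes": f"KG_NODES_{name}",
        "edges": f"KG_EDGES_{name}",
        "evidence": f"KG_EVIDENCE_{name}",
    }
```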
------------------------------------------------------------------------
# 🧩 Step 5 -- Structured Node Properties (Important)
Each node includes structured JSON properties.
Default structure:
``` json
{
  "metadata": {
    "created_by": "RFP_AI_V2",
    "version": "2.0",
    "created_at": "UTC_TIMESTAMP"
  },
  "analysis": {
    "confidence_score": null,
    "source": "DOCUMENT_RAG",
    "extraction_method": "LLM_TRIPLE_EXTRACTION"
  },
  "governance": {
    "validated": false,
    "review_required": false
  }
}
```
Implementation:
``` python
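from datetime import datetime, timezone


def build_default_node_properties():
    """Return the default JSON property structure attached to every node."""
    return {
        "metadata": {
            "created_by": "RFP_AI_V2",
            "version": "2.0",
            # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated
            # since Python 3.12.
            "created_at": datetime.now(timezone.utc).isoformat()
        },
        "analysis": {
            # Left unset until a scoring step fills it in.
            "confidence_score": None,
            "source": "DOCUMENT_RAG",
            "extraction_method": "LLM_TRIPLE_EXTRACTION"
        },
        "governance": {
            "validated": False,
            "review_required": False
        }
    }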
```
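This guarantees:
- No node is stored with an empty `{}` properties object
- Auditability: every node records its creator, version, and creation time
- Governance fields (`validated`, `review_required`) ready to be extended
- Enterprise extensibility through additional property groups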
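
When a node needs values beyond the defaults, a deep merge keeps the default groups intact while overriding individual fields. The helpers below are a hypothetical sketch (`merge_node_properties` and `to_json_column` are not names from the project); the defaults are passed in as an argument so the example stays self-contained.

```python
import copy
import json
from typing import Optional


def merge_node_properties(defaults: dict, overrides: Optional[dict] = None) -> dict:
    """Return a copy of defaults with overrides applied group by group."""
    merged = copy.deepcopy(defaults)
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested groups instead of replacing them wholesale.
            merged[key] = merge_node_properties(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def to_json_column(properties: dict) -> str:
    """Serialize properties for storage, rejecting an empty object."""
    if not properties:
        raise ValueError("node properties must not be empty")
    return json.dumps(properties, sort_keys=True)
```

For example, passing `{"analysis": {"confidence_score": 0.9}}` as `overrides` changes only that one field and leaves `source` and the other groups at their defaults.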
------------------------------------------------------------------------
# 🔎 Step 6 -- Hybrid Retrieval Strategy
The system combines:
1. FAISS semantic recall
2. Graph filtering via Oracle Text
3. Graph-aware reranking
4. Deterministic LLM evaluation
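This is designed to provide:
- High recall, from the vector search stage
- High precision, from graph filtering and reranking
- Answers grounded in retrieved evidence rather than model memory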
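
The reranking stage can be illustrated with a simple additive score. This is a sketch under stated assumptions, not the project's reranker: the `Candidate` class, the `rerank` function, and the boost weights are all hypothetical, and `graph_hits` stands for the set of chunk hashes that the graph filter linked to the requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    chunk_hash: str
    text: str
    vector_score: float  # similarity from the vector index; higher is better


def rerank(candidates, graph_hits, keywords,
           graph_boost=0.2, keyword_boost=0.05, top_k=5):
    """Order candidates by vector score plus graph and keyword boosts."""
    def score(candidate):
        text = candidate.text.lower()
        keyword_matches = sum(1 for k in keywords if k.lower() in text)
        boost = graph_boost if candidate.chunk_hash in graph_hits else 0.0
        return candidate.vector_score + boost + keyword_boost * keyword_matches

    # Highest combined score first; keep only the top_k candidates.
    return sorted(candidates, key=score, reverse=True)[:top_k]
```

A chunk with a slightly lower vector score can therefore outrank a semantically similar but unrelated chunk when the graph confirms it describes the right service or capability.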
------------------------------------------------------------------------
# 🎯 Step 7 -- RFP Requirement Parsing
Each question is parsed into a structured requirement:
``` json
{
  "requirement_type": "NON_FUNCTIONAL",
  "subject": "authentication",
  "expected_value": "MFA",
  "keywords": ["authentication", "mfa"]
}
```
This structure guides retrieval and evaluation.
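
Before the structure is used downstream, it helps to check that all four fields are present. The sketch below is illustrative only: `Requirement` and `parse_requirement` are hypothetical names, and lowercasing and de-duplicating the keywords is an assumed normalization, not documented behavior.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = ("requirement_type", "subject", "expected_value", "keywords")


@dataclass(frozen=True)
class Requirement:
    requirement_type: str
    subject: str
    expected_value: str
    keywords: tuple[str, ...]


def parse_requirement(raw: str) -> Requirement:
    """Parse the JSON produced for one RFP question and validate its fields."""
    data = json.loads(raw)
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    # Lowercase and de-duplicate keywords while preserving their order.
    keywords = tuple(dict.fromkeys(str(k).strip().lower() for k in data["keywords"]))
    return Requirement(
        requirement_type=str(data["requirement_type"]),
        subject=str(data["subject"]),
        expected_value=str(data["expected_value"]),
        keywords=keywords,
    )
```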
------------------------------------------------------------------------
# 📊 Step 8 -- Deterministic Decision Engine
LLM output format:
``` json
{
  "answer": "YES | NO | PARTIAL",
  "confidence": "HIGH | MEDIUM | LOW",
  "justification": "Short factual explanation",
  "evidence": [
    {
      "quote": "Exact document text",
      "source": "Document reference"
    }
  ]
}
```
Rules:
- If the requirement is not explicitly stated in the documents, the answer is NO
- No inference is allowed
- Every answer must cite documentary evidence
------------------------------------------------------------------------
# 🌐 Step 9 -- Running the Application
Run preprocessing once:

    python graphrag_rerank.py

Run the web UI:

    python app.py

Open:

    http://localhost:8100

Or use REST:

    curl -X POST http://localhost:8100/chat -H "Content-Type: application/json" -d '{"question": "Does the platform support MFA?"}'
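
The same request can be sent from Python using only the standard library. This is a sketch: `build_chat_request` and `ask` are hypothetical helper names, and the assumption that `/chat` returns a JSON body is not confirmed by this tutorial.

```python
import json
import urllib.request


def build_chat_request(question: str, base_url: str = "http://localhost:8100"):
    """Build the POST request equivalent to the curl command above."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout: float = 60.0) -> dict:
    """Send the question to a running server (assumes a JSON response)."""
    request = build_chat_request(question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```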
------------------------------------------------------------------------
# 🧪 Example RFP Questions
Example questions cover security, SLA, performance, compliance, vendor
lock-in, backup, and governance.
The engine validates each one with the same deterministic logic.
------------------------------------------------------------------------
# 🔐 Design Principles
- Evidence-first
- Deterministic outputs
- Zero hallucination tolerance
- Enterprise auditability
- Structured graph reasoning
------------------------------------------------------------------------
# 🚀 Future Extensions
- Confidence scoring via graph density
- Weighted edge scoring
- SLA numeric comparison engine
- JSON-based filtering
- PGQL advanced reasoning
- Enterprise governance workflows
------------------------------------------------------------------------
# 📌 Conclusion
Oracle GraphRAG RFP AI is not a chatbot.
It is a compliance validation engine built for enterprise RFP
automation, legal due diligence, and procurement decision support.
Deterministic. Traceable. Expandable.