# 🧠 Oracle GraphRAG RFP AI -- Complete Tutorial

Enterprise-grade deterministic RFP validation engine built with:

- Oracle Autonomous Database 23ai
- Oracle Property Graph
- OCI Generative AI (LLMs + Embeddings)
- FAISS Vector Search
- Flask REST API
- Hybrid Graph + Vector + JSON reasoning


------------------------------------------------------------------------

# 📌 Introduction

This project implements a **deterministic RFP validation engine**.

Unlike traditional RAG systems that generate conceptual answers, this
solution is designed to:

- Validate contractual and compliance requirements
- Produce only: YES / NO / PARTIAL
- Provide exact documentary evidence
- Eliminate hallucination risk
- Ensure full traceability

This tutorial walks through the full architecture and implementation.


------------------------------------------------------------------------

# 🏗️ Full Architecture

    PDF Documents
        └─► Semantic Chunking
              ├─► FAISS Vector Index
              ├─► LLM Triple Extraction
              │       └─► Oracle 23ai Property Graph
              │              ├─► Structured JSON Node Properties
              │              ├─► Edge Confidence Weights
              │              └─► Evidence Table
              └─► Hybrid Retrieval Layer
                      ├─► Vector Recall
                      ├─► Graph Filtering
                      ├─► Oracle Text
                      └─► Graph-aware Reranking
                            └─► Deterministic LLM Decision
                                  └─► REST Response


------------------------------------------------------------------------

# 🧩 Step 1 -- Environment Setup

You need:

- Oracle Autonomous Database 23ai
- OCI Generative AI enabled
- Python 3.10+
- FAISS installed
- Oracle Python driver (`oracledb`)

Install dependencies:

    pip install oracledb langchain faiss-cpu flask pypandoc

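Before moving on, it can help to confirm that the interpreter and packages are in place. The following is a minimal sketch rather than part of the project: the helper names `find_missing_modules` and `python_version_ok` are hypothetical, and the module list assumes the usual import names for the packages installed above (`faiss-cpu` is imported as `faiss`).

```python
import importlib.util
import sys

# Import names for the packages installed above. The `faiss-cpu`
# distribution is imported as `faiss`.
REQUIRED_MODULES = ["oracledb", "langchain", "faiss", "flask", "pypandoc"]


def find_missing_modules(module_names):
    """Return the module names that cannot be located by this interpreter."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


def python_version_ok(minimum=(3, 10)):
    """Check the interpreter against the Python 3.10+ requirement."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python 3.10+:", python_version_ok())
    print("Missing modules:", find_missing_modules(REQUIRED_MODULES) or "none")
```

This check only verifies that the modules can be found; it does not test database connectivity or OCI credentials.
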

------------------------------------------------------------------------

# 📄 Step 2 -- PDF Ingestion

- Load PDFs
- Perform semantic chunking
- Normalize headings and tables
- Store chunk metadata including:
    - chunk_hash
    - source_url

Chunks feed both:

- FAISS
- Graph extraction

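The chunk metadata described above could be modelled as follows. This is an illustrative sketch only: the `Chunk` class, `make_chunk`, and `deduplicate` are hypothetical names, and the choice of SHA-256 over whitespace-normalized text is an assumption, since the source does not say how `chunk_hash` is computed.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    chunk_hash: str
    source_url: str


def make_chunk(text: str, source_url: str) -> Chunk:
    """Build a chunk record with a content-derived hash.

    Assumption: the hash is SHA-256 over whitespace-normalized text;
    the project may compute `chunk_hash` differently.
    """
    normalized = " ".join(text.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return Chunk(text=normalized, chunk_hash=digest, source_url=source_url)


def deduplicate(chunks):
    """Keep the first chunk seen for each hash, preserving input order."""
    seen = {}
    for chunk in chunks:
        seen.setdefault(chunk.chunk_hash, chunk)
    return list(seen.values())
```

A content-derived hash gives the vector index and the graph a shared key, so a graph edge can be traced back to the exact chunk it came from.
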

------------------------------------------------------------------------

# 🧠 Step 3 -- Triple Extraction (Graph Creation)

The function:

    create_knowledge_graph(chunks)

uses the LLM to extract ONLY explicit relationships:

    SERVICE -[SUPPORTS_CAPABILITY]-> CAPABILITY
    SERVICE -[DOES_NOT_SUPPORT]-> CAPABILITY
    SERVICE -[HAS_LIMITATION]-> LIMITATION
    SERVICE -[HAS_SLA]-> SLA_VALUE

No inference is allowed.

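One way to enforce the "explicit only" rule after the LLM call is to reject any triple whose relation is not in the list above or whose supporting text does not appear verbatim in the chunk. The sketch below is an assumption about how such a guard might look, not the body of `create_knowledge_graph`; the `Triple` fields, `is_explicit`, and `filter_triples` are hypothetical.

```python
from typing import NamedTuple

# Relationship types listed above, mapped to the expected target node type.
ALLOWED_RELATIONS = {
    "SUPPORTS_CAPABILITY": "CAPABILITY",
    "DOES_NOT_SUPPORT": "CAPABILITY",
    "HAS_LIMITATION": "LIMITATION",
    "HAS_SLA": "SLA_VALUE",
}


class Triple(NamedTuple):
    source: str
    relation: str
    target: str
    target_type: str
    evidence: str  # text the LLM claims to have copied from the chunk


def is_explicit(triple: Triple, chunk_text: str) -> bool:
    """Accept a triple only if its relation is allowed, its target type
    matches that relation, and its evidence appears verbatim in the chunk."""
    expected_type = ALLOWED_RELATIONS.get(triple.relation)
    if expected_type is None or triple.target_type != expected_type:
        return False
    return bool(triple.evidence) and triple.evidence in chunk_text


def filter_triples(triples, chunk_text):
    """Drop every triple that fails the explicitness check."""
    return [t for t in triples if is_explicit(t, chunk_text)]
```

The verbatim substring test is deliberately strict: a paraphrased quote is treated the same as a missing one.
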

------------------------------------------------------------------------

# 🏛️ Step 4 -- Oracle Property Graph Setup

The graph is created automatically:

    CREATE PROPERTY GRAPH GRAPH_NAME
    VERTEX TABLES (...)
    EDGE TABLES (...)

Nodes are stored in:

    KG_NODES_GRAPH_NAME

Edges in:

    KG_EDGES_GRAPH_NAME

Evidence in:

    KG_EVIDENCE_GRAPH_NAME

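Because `GRAPH_NAME` is substituted into table names, it is worth validating it before it reaches any SQL statement. The helper below is a hypothetical sketch of that naming convention; `graph_table_names` is not a function from the project, and the identifier rule (letters, digits, underscores, starting with a letter) is an assumption chosen to keep the name safe to embed in DDL.

```python
import re

# Assumed rule: upper-case letters, digits, and underscores only,
# starting with a letter.
_IDENTIFIER = re.compile(r"^[A-Z][A-Z0-9_]*$")


def graph_table_names(graph_name: str) -> dict:
    """Derive the node, edge, and evidence table names for a graph.

    Raises ValueError if the name contains characters that should not be
    interpolated into a SQL identifier.
    """
    name = graph_name.strip().upper()
    if not _IDENTIFIER.match(name):
        raise ValueError(f"Invalid graph name: {graph_name!r}")
    return {
        "nodes": f"KG_NODES_{name}",
        "edges": f"KG_EDGES_{name}",
        "evidence": f"KG_EVIDENCE_{name}",
    }
```

For example, `graph_table_names("rfp_2025")` yields `KG_NODES_RFP_2025`, `KG_EDGES_RFP_2025`, and `KG_EVIDENCE_RFP_2025`.
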

------------------------------------------------------------------------

# 🧩 Step 5 -- Structured Node Properties (Important)

Each node includes structured JSON properties.

Default structure:

``` json
{
  "metadata": {
    "created_by": "RFP_AI_V2",
    "version": "2.0",
    "created_at": "UTC_TIMESTAMP"
  },
  "analysis": {
    "confidence_score": null,
    "source": "DOCUMENT_RAG",
    "extraction_method": "LLM_TRIPLE_EXTRACTION"
  },
  "governance": {
    "validated": false,
    "review_required": false
  }
}
```

Implementation:

``` python
from datetime import datetime, timezone


def build_default_node_properties():
    """Return the default JSON property structure for a new graph node."""
    return {
        "metadata": {
            "created_by": "RFP_AI_V2",
            "version": "2.0",
            # Timezone-aware UTC timestamp; datetime.utcnow() is
            # deprecated since Python 3.12.
            "created_at": datetime.now(timezone.utc).isoformat()
        },
        "analysis": {
            "confidence_score": None,
            "source": "DOCUMENT_RAG",
            "extraction_method": "LLM_TRIPLE_EXTRACTION"
        },
        "governance": {
            "validated": False,
            "review_required": False
        }
    }
```

This guarantees:

- No empty `{}` stored
- Auditability
- Governance extension capability
- Enterprise extensibility

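When a node carries extra properties (for example a confidence score from extraction), they can be layered over the defaults so that every section is always present. The following sketch assumes a recursive merge; `merge_properties` is a hypothetical helper, and `default_properties` simply mirrors the default structure shown above so the example is self-contained.

```python
import copy
from datetime import datetime, timezone
from typing import Optional


def default_properties() -> dict:
    """Mirror of the default node property structure."""
    return {
        "metadata": {
            "created_by": "RFP_AI_V2",
            "version": "2.0",
            "created_at": datetime.now(timezone.utc).isoformat(),
        },
        "analysis": {
            "confidence_score": None,
            "source": "DOCUMENT_RAG",
            "extraction_method": "LLM_TRIPLE_EXTRACTION",
        },
        "governance": {"validated": False, "review_required": False},
    }


def merge_properties(defaults: dict, overrides: Optional[dict]) -> dict:
    """Recursively overlay `overrides` on a copy of `defaults`.

    Nested dictionaries are merged key by key; any other value replaces
    the default. Neither input is modified.
    """
    merged = copy.deepcopy(defaults)
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_properties(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

With this approach, passing `None` or `{}` as overrides still returns the full default structure, which is what keeps empty `{}` values out of the node table.
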

------------------------------------------------------------------------

# 🔎 Step 6 -- Hybrid Retrieval Strategy

The system combines:

1. FAISS semantic recall
2. Graph filtering via Oracle Text
3. Graph-aware reranking
4. Deterministic LLM evaluation

This ensures:

- High recall
- High precision
- No hallucination

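The reranking step can be pictured as blending each candidate's vector score with a signal from the graph. The sketch below is one possible scoring scheme, not the project's actual formula: the `Candidate` fields, the linear blend, and the `graph_weight` default are all assumptions. It also assumes `vector_score` is a similarity (higher is better); if the vector index returns distances, they would need converting first.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    chunk_hash: str
    vector_score: float      # similarity from vector recall, higher is better
    graph_confidence: float  # best edge confidence linked to the chunk, 0.0 if none


def rerank(candidates, graph_weight=0.3, top_k=5):
    """Order candidates by a weighted blend of vector and graph scores."""
    def combined(candidate):
        return ((1.0 - graph_weight) * candidate.vector_score
                + graph_weight * candidate.graph_confidence)

    return sorted(candidates, key=combined, reverse=True)[:top_k]
```

Under this scheme, a chunk with a slightly lower vector score can overtake another if the graph holds a high-confidence edge that references it.
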

------------------------------------------------------------------------

# 🎯 Step 7 -- RFP Requirement Parsing

Each question becomes structured:

``` json
{
  "requirement_type": "NON_FUNCTIONAL",
  "subject": "authentication",
  "expected_value": "MFA",
  "keywords": ["authentication", "mfa"]
}
```

This structure guides retrieval and evaluation.

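Since the structured requirement drives both retrieval and evaluation, it is reasonable to validate it before use. The following is a hedged sketch: the `Requirement` dataclass and `parse_requirement` are hypothetical, the field names come from the JSON above, and the normalization choices (lower-cased subject and keywords, upper-cased type) are assumptions.

```python
import json
from dataclasses import dataclass, field

REQUIRED_FIELDS = {"requirement_type", "subject", "expected_value"}


@dataclass
class Requirement:
    requirement_type: str
    subject: str
    expected_value: str
    keywords: list = field(default_factory=list)


def parse_requirement(raw_json: str) -> Requirement:
    """Parse and normalize the structured form of an RFP question."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("Requirement must be a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    # Deduplicate and lower-case keywords, dropping blank entries.
    keywords = sorted(
        {str(k).strip().lower() for k in data.get("keywords", []) if str(k).strip()}
    )
    return Requirement(
        requirement_type=str(data["requirement_type"]).strip().upper(),
        subject=str(data["subject"]).strip().lower(),
        expected_value=str(data["expected_value"]).strip(),
        keywords=keywords,
    )
```

Rejecting incomplete structures early keeps a malformed LLM response from silently producing an empty keyword search.
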

------------------------------------------------------------------------

# 📊 Step 8 -- Deterministic Decision Engine

LLM output format:

``` json
{
  "answer": "YES | NO | PARTIAL",
  "confidence": "HIGH | MEDIUM | LOW",
  "justification": "Short factual explanation",
  "evidence": [
    {
      "quote": "Exact document text",
      "source": "Document reference"
    }
  ]
}
```

Rules:

- If not explicitly stated → NO
- No inference
- Must provide documentary evidence

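These rules can also be enforced in code after the LLM responds, rather than relying on the prompt alone. The sketch below is an assumption about how such a post-check might work, not the project's implementation: `enforce_decision_rules` is a hypothetical name, the verbatim-quote test and the fallback justification text are illustrative, and downgrading confidence to `LOW` is a design choice made for this example.

```python
ALLOWED_ANSWERS = {"YES", "NO", "PARTIAL"}
ALLOWED_CONFIDENCE = {"HIGH", "MEDIUM", "LOW"}


def enforce_decision_rules(decision: dict, context_text: str) -> dict:
    """Return a decision that satisfies the rules above.

    Evidence is kept only when its quote appears verbatim in the
    retrieved context. An invalid answer, or a YES/PARTIAL answer with
    no surviving evidence, is downgraded to NO.
    """
    answer = str(decision.get("answer", "")).strip().upper()
    confidence = str(decision.get("confidence", "")).strip().upper()
    justification = str(decision.get("justification", "")).strip()

    evidence = [
        item for item in (decision.get("evidence") or [])
        if isinstance(item, dict)
        and isinstance(item.get("quote"), str)
        and item["quote"]
        and item["quote"] in context_text
    ]

    if answer not in ALLOWED_ANSWERS or (answer != "NO" and not evidence):
        answer = "NO"
        confidence = "LOW"
        justification = "No explicit documentary evidence was found."
    elif confidence not in ALLOWED_CONFIDENCE:
        confidence = "LOW"

    return {
        "answer": answer,
        "confidence": confidence,
        "justification": justification,
        "evidence": evidence,
    }
```

Checking quotes against the retrieved context is what makes the evidence requirement verifiable: a fabricated quote is removed, and the answer falls back to NO.
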

------------------------------------------------------------------------

# 🌐 Step 9 -- Running the Application

Run preprocessing once:

    python graphrag_rerank.py

Run the web UI:

    python app.py

Open:

    http://localhost:8100

Or use the REST API:

    curl -X POST http://localhost:8100/chat -H "Content-Type: application/json" -d '{"question": "Does the platform support MFA?"}'

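The same request can be sent from Python using only the standard library. This is a minimal sketch: `build_chat_request` and `ask` are hypothetical helper names, the endpoint and payload are taken from the `curl` command above, and the assumption that the endpoint returns a JSON body is not confirmed by the source.

```python
import json
import urllib.request


def build_chat_request(question: str, base_url: str = "http://localhost:8100"):
    """Build a POST request for the /chat endpoint."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://localhost:8100",
        timeout: float = 60.0) -> dict:
    """Send the question and decode the reply.

    Assumption: the server responds with a JSON document.
    """
    request = build_chat_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(ask("Does the platform support MFA?"))
```

Separating request construction from sending makes the client easy to test without a running server.
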

------------------------------------------------------------------------

# 🧪 Example RFP Questions

Typical question areas: Security, SLA, Performance, Compliance,
Vendor Lock-in, Backup, and Governance.

The engine validates each with deterministic logic.


------------------------------------------------------------------------

# 🔐 Design Principles

- Evidence-first
- Deterministic outputs
- Zero hallucination tolerance
- Enterprise auditability
- Structured graph reasoning


------------------------------------------------------------------------

# 🚀 Future Extensions

- Confidence scoring via graph density
- Weighted edge scoring
- SLA numeric comparison engine
- JSON-based filtering
- PGQL advanced reasoning
- Enterprise governance workflows


------------------------------------------------------------------------

# 📌 Conclusion

Oracle GraphRAG RFP AI is not a chatbot.

It is a compliance validation engine built for enterprise RFP
automation, legal due diligence, and procurement decision support.

Deterministic. Traceable. Expandable.