# 🧠 Oracle GraphRAG RFP AI -- Complete Tutorial

Enterprise-grade deterministic RFP validation engine built with:

- Oracle Autonomous Database 23ai
- Oracle Property Graph
- OCI Generative AI (LLMs + Embeddings)
- FAISS Vector Search
- Flask REST API
- Hybrid Graph + Vector + JSON reasoning

------------------------------------------------------------------------
# 📌 Introduction

This project implements a **deterministic RFP validation engine**.

Unlike traditional RAG systems that generate conceptual answers, this solution is designed to:

- Validate contractual and compliance requirements
- Answer only YES, NO, or PARTIAL
- Provide exact documentary evidence
- Eliminate hallucination risk
- Ensure full traceability

This tutorial walks through the full architecture and implementation.

------------------------------------------------------------------------
# 🏗️ Full Architecture

    PDF Documents
     └─► Semantic Chunking
         ├─► FAISS Vector Index
         ├─► LLM Triple Extraction
         │   └─► Oracle 23ai Property Graph
         │       ├─► Structured JSON Node Properties
         │       ├─► Edge Confidence Weights
         │       └─► Evidence Table
         └─► Hybrid Retrieval Layer
             ├─► Vector Recall
             ├─► Graph Filtering
             ├─► Oracle Text
             └─► Graph-aware Reranking
                 └─► Deterministic LLM Decision
                     └─► REST Response

------------------------------------------------------------------------
# 🧩 Step 1 -- Environment Setup

You need:

- Oracle Autonomous Database 23ai
- OCI Generative AI enabled
- Python 3.10+
- FAISS installed
- Oracle Python driver (`oracledb`)

Install the dependencies:

    pip install oracledb langchain faiss-cpu flask pypandoc

------------------------------------------------------------------------
# 📄 Step 2 -- PDF Ingestion

The ingestion stage:

- Loads the PDFs
- Performs semantic chunking
- Normalizes headings and tables
- Stores chunk metadata, including:
    - `chunk_hash`
    - `source_url`

The resulting chunks feed both:

- The FAISS vector index
- Graph (triple) extraction
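
To make the metadata step concrete, here is a minimal sketch of turning extracted PDF text into chunks that carry a `chunk_hash` and `source_url`. It is not the project's semantic chunker: as a plainly simpler stand-in, it greedily packs blank-line-separated paragraphs up to `max_chars`. The `Chunk` class, `chunk_document` function, the `max_chars` default, and the choice of SHA-256 over whitespace-normalized text are all assumptions for illustration. Hashing normalized text means that re-ingesting the same content yields the same `chunk_hash`, which keeps FAISS entries and graph evidence joinable.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    chunk_hash: str
    source_url: str


def chunk_document(text: str, source_url: str, max_chars: int = 1200) -> List[Chunk]:
    """Split extracted PDF text into chunks and attach traceability metadata.

    Hypothetical stand-in for semantic chunking: paragraphs (blank-line
    separated) are packed greedily into chunks of at most `max_chars`
    characters. A single paragraph longer than `max_chars` becomes its
    own oversized chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    groups: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            groups.append(current)  # current chunk is full; start a new one
            current = para
        else:
            current = candidate
    if current:
        groups.append(current)

    chunks = []
    for body in groups:
        # Hash whitespace-normalized text so identical content gets an identical chunk_hash
        normalized = " ".join(body.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        chunks.append(Chunk(text=body, chunk_hash=digest, source_url=source_url))
    return chunks
```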
------------------------------------------------------------------------
# 🧠 Step 3 -- Triple Extraction (Graph Creation)

The function `create_knowledge_graph(chunks)` uses an LLM to extract ONLY relationships that are explicitly stated in the text. No inference is allowed. The extracted relationships take these forms:

    SERVICE -[SUPPORTS_CAPABILITY]-> CAPABILITY
    SERVICE -[DOES_NOT_SUPPORT]-> CAPABILITY
    SERVICE -[HAS_LIMITATION]-> LIMITATION
    SERVICE -[HAS_SLA]-> SLA_VALUE
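
One way to enforce the "explicit only" rule after the LLM call is to discard any triple that uses an unlisted relation or cannot be traced back to the chunk text. The sketch below assumes the LLM returns each triple as a dict with `subject`, `relation`, `object`, and an `evidence` quote; that shape and the `filter_explicit_triples` helper are hypothetical, since the tutorial does not show the actual extraction output. Matching ignores case and whitespace differences but otherwise requires the quote to appear word for word in the chunk, so a relationship the model inferred without quoting the text is rejected.

```python
from typing import Dict, Iterable, List

# The four relationship types listed in this step; anything else is discarded.
ALLOWED_RELATIONS = {"SUPPORTS_CAPABILITY", "DOES_NOT_SUPPORT", "HAS_LIMITATION", "HAS_SLA"}


def filter_explicit_triples(triples: Iterable[Dict[str, str]], chunk_text: str) -> List[Dict[str, str]]:
    """Keep only triples with an allowed relation whose evidence quote is in the chunk.

    Hypothetical triple shape: {"subject", "relation", "object", "evidence"}.
    The quote must appear in the chunk, ignoring case and runs of whitespace.
    """
    normalized_chunk = " ".join(chunk_text.split()).lower()
    kept = []
    for t in triples:
        if t.get("relation") not in ALLOWED_RELATIONS:
            continue  # relation type not in the allowed schema
        if not t.get("subject") or not t.get("object"):
            continue  # incomplete triple
        quote = " ".join(t.get("evidence", "").split()).lower()
        if quote and quote in normalized_chunk:
            kept.append(t)  # grounded in the source text
    return kept
```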
------------------------------------------------------------------------
# 🏛️ Step 4 -- Oracle Property Graph Setup

The graph is created automatically:

    CREATE PROPERTY GRAPH GRAPH_NAME
      VERTEX TABLES (...)
      EDGE TABLES (...)

Data is stored in three tables, where `GRAPH_NAME` is the name of the graph:

- Nodes: `KG_NODES_GRAPH_NAME`
- Edges: `KG_EDGES_GRAPH_NAME`
- Evidence: `KG_EVIDENCE_GRAPH_NAME`
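
The `(...)` parts of the statement depend on the table columns, which the tutorial does not list. As an illustration only, this sketch builds a complete SQL property graph DDL string for the node and edge tables; every column name (`ID`, `NAME`, `NODE_TYPE`, `PROPERTIES`, `SOURCE_ID`, `TARGET_ID`, `EDGE_TYPE`, `CONFIDENCE`), both labels, and the `build_property_graph_ddl` helper are hypothetical. The graph name is validated before interpolation because identifiers cannot be passed as bind variables. The string has no trailing semicolon, since python-oracledb expects SQL statements without one; with a connection it could be run as `cursor.execute(build_property_graph_ddl("rfp_graph"))`.

```python
import re

# Oracle-style unquoted identifier: a letter, then letters, digits, _, $ or #.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def build_property_graph_ddl(graph_name: str) -> str:
    """Return a CREATE PROPERTY GRAPH statement over KG_NODES_* and KG_EDGES_*.

    All column names and labels below are hypothetical placeholders.
    """
    if not _IDENTIFIER.match(graph_name):
        # Identifiers cannot be bind variables, so reject anything unexpected
        raise ValueError(f"invalid graph name: {graph_name!r}")
    name = graph_name.upper()
    nodes, edges = f"KG_NODES_{name}", f"KG_EDGES_{name}"
    return (
        f"CREATE PROPERTY GRAPH {name}\n"
        "  VERTEX TABLES (\n"
        f"    {nodes} KEY (ID)\n"
        "      LABEL KG_NODE PROPERTIES (NAME, NODE_TYPE, PROPERTIES)\n"
        "  )\n"
        "  EDGE TABLES (\n"
        f"    {edges} KEY (ID)\n"
        f"      SOURCE KEY (SOURCE_ID) REFERENCES {nodes} (ID)\n"
        f"      DESTINATION KEY (TARGET_ID) REFERENCES {nodes} (ID)\n"
        "      LABEL KG_EDGE PROPERTIES (EDGE_TYPE, CONFIDENCE)\n"
        "  )"
    )
```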
------------------------------------------------------------------------
# 🧩 Step 5 -- Structured Node Properties (Important)

Each node carries structured JSON properties. The default structure is:

``` json
{
  "metadata": {
    "created_by": "RFP_AI_V2",
    "version": "2.0",
    "created_at": "UTC_TIMESTAMP"
  },
  "analysis": {
    "confidence_score": null,
    "source": "DOCUMENT_RAG",
    "extraction_method": "LLM_TRIPLE_EXTRACTION"
  },
  "governance": {
    "validated": false,
    "review_required": false
  }
}
```

Implementation:
``` python
from datetime import datetime, timezone


def build_default_node_properties():
    """Return the default JSON property structure for a new graph node."""
    return {
        "metadata": {
            "created_by": "RFP_AI_V2",
            "version": "2.0",
            # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12
            "created_at": datetime.now(timezone.utc).isoformat(),
        },
        "analysis": {
            "confidence_score": None,
            "source": "DOCUMENT_RAG",
            "extraction_method": "LLM_TRIPLE_EXTRACTION",
        },
        "governance": {
            "validated": False,
            "review_required": False,
        },
    }
```
This guarantees:

- No empty `{}` property objects are stored
- Auditability (who created each node, when, and by which extraction method)
- Room for governance extensions
- Enterprise extensibility
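
Node-specific values, such as a confidence score computed after extraction, need to be layered onto these defaults without losing the other sections. A minimal sketch, assuming properties are merged recursively before the JSON is written to the node table; `deep_merge` and `serialize_node_properties` are hypothetical helpers, and the `defaults` dict in the demo is a trimmed stand-in for `build_default_node_properties()`. Because the defaults always contain their sections, the serialized result can never be an empty `{}`.

```python
import copy
import json
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: `base` with `overrides` applied recursively.

    Nested dicts are merged key by key; any other value in `overrides`
    replaces the one in `base`. Neither input is mutated.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def serialize_node_properties(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> str:
    """Merge node-specific values into the defaults and serialize to JSON."""
    return json.dumps(deep_merge(defaults, overrides), sort_keys=True)


if __name__ == "__main__":
    # Trimmed stand-in for build_default_node_properties()
    defaults = {
        "analysis": {"confidence_score": None, "source": "DOCUMENT_RAG"},
        "governance": {"validated": False, "review_required": False},
    }
    # Hypothetical node-specific value attached after extraction
    print(serialize_node_properties(defaults, {"analysis": {"confidence_score": 0.92}}))
```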
------------------------------------------------------------------------
# 🔎 Step 6 -- Hybrid Retrieval Strategy

The system combines:

1. FAISS semantic recall
2. Graph filtering via Oracle Text
3. Graph-aware reranking
4. Deterministic LLM evaluation

This ensures:

- High recall (vector search casts a wide net)
- High precision (graph filtering and reranking narrow it)
- No hallucination (the LLM evaluates only the retrieved evidence)
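
The reranking step can be pictured as combining the two signals. A minimal sketch, assuming vector recall yields `(chunk_hash, similarity)` pairs (higher means closer, so FAISS L2 distances would need converting first) and the graph filter yields the set of chunk hashes backed by matching graph edges or Oracle Text hits. The additive boost, the `graph_weight` and `top_k` defaults, and the `Candidate` / `graph_aware_rerank` names are illustrative assumptions, not the project's actual scoring. Ties are broken by `chunk_hash` so that identical inputs always produce identical orderings.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Candidate:
    chunk_hash: str
    vector_score: float  # similarity from vector recall (higher = closer)


def graph_aware_rerank(
    candidates: List[Candidate],
    graph_hits: Set[str],
    graph_weight: float = 0.5,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Rerank vector-recall candidates, boosting those the graph also supports.

    Hypothetical scoring: vector_score + graph_weight if the chunk_hash is in
    graph_hits, otherwise vector_score alone.
    """
    scored = []
    for c in candidates:
        boost = graph_weight if c.chunk_hash in graph_hits else 0.0
        scored.append((c.chunk_hash, c.vector_score + boost))
    # Score descending, then chunk_hash ascending for a repeatable order
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```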
------------------------------------------------------------------------
# 🎯 Step 7 -- RFP Requirement Parsing

Each RFP question is parsed into a structured requirement, which then guides retrieval and evaluation:

``` json
{
  "requirement_type": "NON_FUNCTIONAL",
  "subject": "authentication",
  "expected_value": "MFA",
  "keywords": ["authentication", "mfa"]
}
```
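
Because this structure drives everything downstream, it helps to validate it before retrieval. A minimal sketch, assuming the structured requirement arrives as a JSON string (for example from an LLM call); `Requirement`, `parse_requirement`, and the `FUNCTIONAL` type are hypothetical, since only `NON_FUNCTIONAL` appears in this tutorial. Keywords are lower-cased and de-duplicated in order, matching the lower-case `keywords` in the example above.

```python
import json
from dataclasses import dataclass
from typing import List

# Only NON_FUNCTIONAL appears in the tutorial; FUNCTIONAL is a hypothetical second type.
ALLOWED_TYPES = {"FUNCTIONAL", "NON_FUNCTIONAL"}


@dataclass
class Requirement:
    requirement_type: str
    subject: str
    expected_value: str
    keywords: List[str]


def parse_requirement(raw: str) -> Requirement:
    """Validate a requirement JSON string and normalize its fields."""
    data = json.loads(raw)
    req_type = str(data["requirement_type"]).upper()
    if req_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown requirement_type: {req_type}")
    # Lower-case and de-duplicate keywords while keeping their first-seen order
    keywords = list(dict.fromkeys(k.strip().lower() for k in data.get("keywords", []) if k.strip()))
    return Requirement(
        requirement_type=req_type,
        subject=str(data["subject"]).strip(),
        expected_value=str(data["expected_value"]).strip(),
        keywords=keywords,
    )
```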
------------------------------------------------------------------------
# 📊 Step 8 -- Deterministic Decision Engine

The LLM returns JSON in this format:

``` json
{
  "answer": "YES | NO | PARTIAL",
  "confidence": "HIGH | MEDIUM | LOW",
  "justification": "Short factual explanation",
  "evidence": [
    {
      "quote": "Exact document text",
      "source": "Document reference"
    }
  ]
}
```

Rules:

- If a requirement is not explicitly stated → NO
- No inference
- Documentary evidence must be provided
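
These rules can also be checked in code after the LLM responds, rather than trusted to the prompt alone. A minimal sketch of such a guardrail, assuming the decision JSON above and the list of retrieved chunk texts are available together; `enforce_decision_rules` is hypothetical, and downgrading an unsupported answer to `NO` with `LOW` confidence is an illustrative choice that applies the "not explicitly stated → NO" rule. Evidence is kept only when its quote appears verbatim in a retrieved chunk.

```python
import json
from typing import Any, Dict, List

ANSWERS = {"YES", "NO", "PARTIAL"}
CONFIDENCES = {"HIGH", "MEDIUM", "LOW"}


def enforce_decision_rules(raw_output: str, retrieved_chunks: List[str]) -> Dict[str, Any]:
    """Validate the LLM decision JSON against the retrieved text.

    - answer and confidence must be one of the allowed values
    - every evidence quote must appear verbatim in a retrieved chunk
    - a YES/PARTIAL answer with no verified evidence is downgraded to NO
    """
    decision = json.loads(raw_output)
    if decision.get("answer") not in ANSWERS:
        raise ValueError(f"invalid answer: {decision.get('answer')!r}")
    if decision.get("confidence") not in CONFIDENCES:
        raise ValueError(f"invalid confidence: {decision.get('confidence')!r}")

    # Keep only quotes that appear word for word in the retrieved context
    verified = [
        ev for ev in decision.get("evidence", [])
        if ev.get("quote") and any(ev["quote"] in chunk for chunk in retrieved_chunks)
    ]
    decision["evidence"] = verified

    if decision["answer"] in {"YES", "PARTIAL"} and not verified:
        # Not explicitly stated -> NO (LOW confidence is a hypothetical choice)
        decision["answer"] = "NO"
        decision["confidence"] = "LOW"
        decision["justification"] = "No verbatim evidence found in the retrieved documents."
    return decision
```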
------------------------------------------------------------------------
# 🌐 Step 9 -- Running the Application

Run the preprocessing step once:

    python graphrag_rerank.py

Start the web UI:

    python app.py

Then open:

    http://localhost:8100

Or call the REST API directly:

    curl -X POST http://localhost:8100/chat \
      -H "Content-Type: application/json" \
      -d '{"question": "Does the platform support MFA?"}'
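
For scripted use, the same request can be sent from Python with only the standard library. A minimal sketch, assuming `app.py` is running on the port shown above and that `/chat` answers with a JSON body; the tutorial does not document the response fields, so `ask` simply returns the decoded JSON. `build_chat_request` and `ask` are hypothetical helper names. With the server running, `ask("Does the platform support MFA?")` mirrors the curl call.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8100"  # port used in this tutorial


def build_chat_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST /chat request as the curl example."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = BASE_URL, timeout: float = 60.0) -> Dict[str, Any]:
    """Send a question to a running app.py and decode its JSON reply.

    Assumes the endpoint returns JSON; its exact fields are not documented here.
    """
    with urllib.request.urlopen(build_chat_request(question, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```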
------------------------------------------------------------------------
# 🧪 Example RFP Questions

Typical question categories include Security, SLA, Performance, Compliance, Vendor Lock-in, Backup, and Governance.

The engine validates each question with the same deterministic logic.

------------------------------------------------------------------------
# 🔐 Design Principles

- Evidence-first
- Deterministic outputs
- Zero hallucination tolerance
- Enterprise auditability
- Structured graph reasoning

------------------------------------------------------------------------
# 🚀 Future Extensions

- Confidence scoring via graph density
- Weighted edge scoring
- SLA numeric comparison engine
- JSON-based filtering
- PGQL advanced reasoning
- Enterprise governance workflows

------------------------------------------------------------------------
# 📌 Conclusion

Oracle GraphRAG RFP AI is not a chatbot.

It is a compliance validation engine built for enterprise RFP automation, legal due diligence, and procurement decision support.

Deterministic. Traceable. Expandable.