first commit

2026-03-06 10:11:08 +00:00 · 2026-02-18 20:34:33 -03:00
parent 2f819da943
commit 60f0dcaac4
50 changed files with 8099 additions and 1471 deletions
--- a/README_COMPLETE_TUTORIAL.md
+++ b/README_COMPLETE_TUTORIAL.md
@@ -0,0 +1,295 @@
+# 🧠 Oracle GraphRAG RFP AI -- Complete Tutorial
+
+Enterprise-grade deterministic RFP validation engine built with:
+
+-   Oracle Autonomous Database 23ai
+-   Oracle Property Graph
+-   OCI Generative AI (LLMs + Embeddings)
+-   FAISS Vector Search
+-   Flask REST API
+-   Hybrid Graph + Vector + JSON reasoning
+
+------------------------------------------------------------------------
+
+# 📌 Introduction
+
+This project implements a **deterministic RFP validation engine**.
+
+Unlike traditional RAG systems, which generate free-form explanatory
+answers, this solution is designed to:
+
+-   Validate contractual and compliance requirements
+-   Produce only one of three answers: YES, NO, or PARTIAL
+-   Provide exact documentary evidence
+-   Minimize hallucination risk by answering only from retrieved evidence
+-   Ensure full traceability
+
+This tutorial walks through the full architecture and implementation.
+
+------------------------------------------------------------------------
+
+# 🏗️ Full Architecture
+
+    PDF Documents
+     └─► Semantic Chunking
+         ├─► FAISS Vector Index
+         ├─► LLM Triple Extraction
+         │     └─► Oracle 23ai Property Graph
+         │           ├─► Structured JSON Node Properties
+         │           ├─► Edge Confidence Weights
+         │           └─► Evidence Table
+         └─► Hybrid Retrieval Layer
+                ├─► Vector Recall
+                ├─► Graph Filtering
+                ├─► Oracle Text
+                └─► Graph-aware Reranking
+                      └─► Deterministic LLM Decision
+                            └─► REST Response
+
+------------------------------------------------------------------------
+
+# 🧩 Step 1 -- Environment Setup
+
+You need:
+
+-   Oracle Autonomous Database 23ai
+-   OCI Generative AI enabled
+-   Python 3.10+
+-   FAISS installed
+-   Oracle Python driver (`oracledb`)
+
+Install dependencies:
+
+    pip install oracledb langchain faiss-cpu flask pypandoc
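+
+To confirm the environment before running anything, a small preflight
+script can check that the packages above are importable. This is a
+hypothetical helper, not part of the project. Note that the `faiss-cpu`
+package installs the module `faiss`.
+
+``` python
+# Hypothetical preflight check (not part of the project): reports whether
+# each module installed by the pip command above can be found.
+import importlib.util
+
+# pip package name -> importable module name (faiss-cpu installs "faiss")
+REQUIRED_MODULES = {
+    "oracledb": "oracledb",
+    "langchain": "langchain",
+    "faiss-cpu": "faiss",
+    "flask": "flask",
+    "pypandoc": "pypandoc",
+}
+
+
+def check_dependencies() -> dict[str, bool]:
+    """Return {pip_package: importable} without actually importing anything."""
+    return {
+        package: importlib.util.find_spec(module) is not None
+        for package, module in REQUIRED_MODULES.items()
+    }
+
+
+if __name__ == "__main__":
+    for package, ok in check_dependencies().items():
+        print(f"{package:10} {'OK' if ok else 'MISSING'}")
+```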
+
+------------------------------------------------------------------------
+
+# 📄 Step 2 -- PDF Ingestion
+
+-   Load PDFs
+-   Perform semantic chunking
+-   Normalize headings and tables
+-   Store chunk metadata including:
+    -   `chunk_hash`
+    -   `source_url`
+
+Chunks feed both:
+
+-   FAISS
+-   Graph extraction
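+
+The metadata step can be sketched as follows. This is a minimal
+illustration with two assumptions: a plain paragraph splitter stands in
+for real semantic chunking, and `chunk_hash` is a SHA-256 digest of the
+normalized chunk text. The `Chunk` type and `chunk_document` function are
+hypothetical names.
+
+``` python
+# Minimal sketch: attach chunk_hash and source_url metadata to text chunks.
+# The paragraph splitter is a stand-in for semantic chunking (assumption).
+import hashlib
+import re
+from dataclasses import dataclass
+
+
+@dataclass
+class Chunk:
+    text: str
+    chunk_hash: str
+    source_url: str
+
+
+def normalize(text: str) -> str:
+    # Collapse runs of whitespace so identical content hashes identically
+    return re.sub(r"\s+", " ", text).strip()
+
+
+def make_chunk(text: str, source_url: str) -> Chunk:
+    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
+    return Chunk(text=text, chunk_hash=digest, source_url=source_url)
+
+
+def chunk_document(text: str, source_url: str, max_chars: int = 1000) -> list[Chunk]:
+    chunks: list[Chunk] = []
+    buffer = ""
+    # Split on blank lines, then pack consecutive paragraphs into chunks of
+    # roughly max_chars (a single long paragraph is kept whole)
+    for paragraph in re.split(r"\n\s*\n", text):
+        paragraph = normalize(paragraph)
+        if not paragraph:
+            continue
+        if buffer and len(buffer) + len(paragraph) + 1 > max_chars:
+            chunks.append(make_chunk(buffer, source_url))
+            buffer = ""
+        buffer = f"{buffer} {paragraph}".strip()
+    if buffer:
+        chunks.append(make_chunk(buffer, source_url))
+    return chunks
+```
+
+Because the hash depends only on the normalized text, re-ingesting an
+unchanged PDF yields the same `chunk_hash` values, so chunks that are
+already indexed can be detected and skipped.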
+
+------------------------------------------------------------------------
+
+# 🧠 Step 3 -- Triple Extraction (Graph Creation)
+
+The `create_knowledge_graph(chunks)` function uses an LLM to extract only
+explicitly stated relationships:
+
+    SERVICE -[SUPPORTS_CAPABILITY]-> CAPABILITY
+    SERVICE -[DOES_NOT_SUPPORT]-> CAPABILITY
+    SERVICE -[HAS_LIMITATION]-> LIMITATION
+    SERVICE -[HAS_SLA]-> SLA_VALUE
+
+No inference is allowed: a triple is created only when the text states
+the relationship directly.
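+
+One way to enforce the explicit-relationships-only rule is to validate
+the LLM output against the four relation types above before anything
+reaches the graph. The sketch below assumes the LLM returns one triple
+per line in the `SUBJECT -[RELATION]-> OBJECT` notation; `parse_triples`
+and `Triple` are hypothetical names.
+
+``` python
+# Minimal sketch: parse "SUBJECT -[RELATION]-> OBJECT" lines from the LLM
+# and keep only the relation types in the schema (line format is assumed).
+import re
+from typing import NamedTuple
+
+ALLOWED_RELATIONS = {
+    "SUPPORTS_CAPABILITY",
+    "DOES_NOT_SUPPORT",
+    "HAS_LIMITATION",
+    "HAS_SLA",
+}
+
+TRIPLE_PATTERN = re.compile(r"^\s*(.+?)\s*-\[([A-Z_]+)\]->\s*(.+?)\s*$")
+
+
+class Triple(NamedTuple):
+    subject: str
+    relation: str
+    obj: str
+
+
+def parse_triples(llm_output: str) -> list[Triple]:
+    triples = []
+    for line in llm_output.splitlines():
+        match = TRIPLE_PATTERN.match(line)
+        if not match:
+            continue  # ignore commentary and malformed lines
+        subject, relation, obj = match.groups()
+        if relation not in ALLOWED_RELATIONS:
+            continue  # reject relation types outside the schema
+        triples.append(Triple(subject, relation, obj))
+    return triples
+```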
+
+------------------------------------------------------------------------
+
+# 🏛️ Step 4 -- Oracle Property Graph Setup
+
+The graph is created automatically (`GRAPH_NAME` is a placeholder for
+the configured graph name):
+
+    CREATE PROPERTY GRAPH GRAPH_NAME
+    VERTEX TABLES (...)
+    EDGE TABLES (...)
+
+Nodes are stored in:
+
+    KG_NODES_GRAPH_NAME
+
+Edges in:
+
+    KG_EDGES_GRAPH_NAME
+
+Evidence in:
+
+    KG_EVIDENCE_GRAPH_NAME
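+
+A hypothetical helper that renders the DDL for this naming convention
+might look like this. The column names (`ID`, `NAME`, `NODE_TYPE`,
+`SOURCE_ID`, `TARGET_ID`, `RELATION_TYPE`, `CONFIDENCE`) and the labels
+are assumptions for illustration; the project's actual table definitions
+may differ.
+
+``` python
+# Hypothetical helper: build CREATE PROPERTY GRAPH DDL for the
+# KG_NODES_<GRAPH_NAME> / KG_EDGES_<GRAPH_NAME> convention.
+# All column names and labels below are assumptions.
+import re
+
+
+def build_graph_ddl(graph_name: str) -> str:
+    # The name is interpolated into SQL, so only allow plain identifiers
+    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]{0,63}", graph_name):
+        raise ValueError(f"invalid graph name: {graph_name!r}")
+    g = graph_name.upper()
+    nodes, edges = f"KG_NODES_{g}", f"KG_EDGES_{g}"
+    return f"""CREATE PROPERTY GRAPH {g}
+  VERTEX TABLES (
+    {nodes} KEY (ID)
+      LABEL KG_ENTITY
+      PROPERTIES (ID, NAME, NODE_TYPE)
+  )
+  EDGE TABLES (
+    {edges} KEY (ID)
+      SOURCE KEY (SOURCE_ID) REFERENCES {nodes} (ID)
+      DESTINATION KEY (TARGET_ID) REFERENCES {nodes} (ID)
+      LABEL KG_RELATION
+      PROPERTIES (RELATION_TYPE, CONFIDENCE)
+  )"""
+```
+
+The returned string has no trailing semicolon, so it can be passed
+directly to `cursor.execute()` in `oracledb`.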
+
+------------------------------------------------------------------------
+
+# 🧩 Step 5 -- Structured Node Properties (Important)
+
+Each node includes structured JSON properties.
+
+Default structure:
+
+``` json
+{
+  "metadata": {
+    "created_by": "RFP_AI_V2",
+    "version": "2.0",
+    "created_at": "UTC_TIMESTAMP"
+  },
+  "analysis": {
+    "confidence_score": null,
+    "source": "DOCUMENT_RAG",
+    "extraction_method": "LLM_TRIPLE_EXTRACTION"
+  },
+  "governance": {
+    "validated": false,
+    "review_required": false
+  }
+}
+```
+
+Implementation:
+
+``` python
+def build_default_node_properties():
+    return {
+        "metadata": {
+            "created_by": "RFP_AI_V2",
+            "version": "2.0",
+            "created_at": datetime.utcnow().isoformat()
+        },
+        "analysis": {
+            "confidence_score": None,
+            "source": "DOCUMENT_RAG",
+            "extraction_method": "LLM_TRIPLE_EXTRACTION"
+        },
+        "governance": {
+            "validated": False,
+            "review_required": False
+        }
+    }
+```
+
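+This ensures:
+
+-   No empty `{}` is ever stored as node properties
+-   Auditability: every node records its creator, version, and creation time
+-   A governance section ready for validation and review workflows
+-   Room to extend node properties over time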
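+
+To keep the no-empty-`{}` guarantee when a node needs extra values,
+node-specific properties can be deep-merged over the defaults.
+`merge_properties` is a hypothetical helper, sketched under the
+assumption that overrides use the same nested sections as the defaults.
+
+``` python
+# Minimal sketch (hypothetical helper): deep-merge node-specific values over
+# the default properties so every section and field is always present.
+import copy
+from typing import Optional
+
+
+def merge_properties(defaults: dict, overrides: Optional[dict]) -> dict:
+    merged = copy.deepcopy(defaults)  # never mutate the caller's defaults
+    for key, value in (overrides or {}).items():
+        if isinstance(value, dict) and isinstance(merged.get(key), dict):
+            merged[key] = merge_properties(merged[key], value)  # recurse into sections
+        else:
+            merged[key] = value
+    return merged
+```
+
+For example, `merge_properties(build_default_node_properties(),
+{"analysis": {"confidence_score": 0.9}})` keeps every default field and
+changes only the score.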
+
+------------------------------------------------------------------------
+
+# 🔎 Step 6 -- Hybrid Retrieval Strategy
+
+The system combines:
+
+1.  FAISS semantic recall
+2.  Graph filtering via Oracle Text
+3.  Graph-aware reranking
+4.  Deterministic LLM evaluation
+
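+This combination is designed to provide:
+
+-   High recall (semantic vector search)
+-   High precision (graph filtering and reranking)
+-   Answers grounded only in retrieved evidence, minimizing hallucination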
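+
+The reranking step can be illustrated with a simple scoring function.
+This is a sketch under stated assumptions: FAISS similarity scores are
+normalized to [0, 1], the set of chunk hashes that back graph edges is
+already known (for example from the evidence table), and the weights are
+illustrative. Unlike a strict filter, it boosts graph-backed chunks
+instead of discarding the others.
+
+``` python
+# Minimal sketch of graph-aware reranking (weights and inputs are assumptions).
+from dataclasses import dataclass
+
+
+@dataclass
+class Candidate:
+    chunk_hash: str
+    text: str
+    vector_score: float  # similarity from vector recall, higher is better
+
+
+def rerank(
+    candidates: list[Candidate],
+    graph_chunk_hashes: set[str],
+    keywords: list[str],
+    graph_weight: float = 0.3,
+    keyword_weight: float = 0.2,
+    top_k: int = 5,
+) -> list[Candidate]:
+    def score(c: Candidate) -> float:
+        text = c.text.lower()
+        # Fraction of requirement keywords that appear in the chunk text
+        keyword_hits = sum(k.lower() in text for k in keywords) / max(len(keywords), 1)
+        # Boost chunks that back at least one graph edge
+        graph_bonus = 1.0 if c.chunk_hash in graph_chunk_hashes else 0.0
+        return c.vector_score + graph_weight * graph_bonus + keyword_weight * keyword_hits
+
+    return sorted(candidates, key=score, reverse=True)[:top_k]
+```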
+
+------------------------------------------------------------------------
+
+# 🎯 Step 7 -- RFP Requirement Parsing
+
+Each RFP question is parsed into a structured requirement:
+
+``` json
+{
+  "requirement_type": "NON_FUNCTIONAL",
+  "subject": "authentication",
+  "expected_value": "MFA",
+  "keywords": ["authentication", "mfa"]
+}
+```
+
+This structure guides retrieval and evaluation.
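+
+A parsed requirement can be validated before it drives retrieval. The
+sketch assumes the LLM returns JSON shaped like the example above;
+`FUNCTIONAL` as a second `requirement_type` value and the `Requirement` /
+`parse_requirement` names are hypothetical.
+
+``` python
+# Minimal sketch: validate and normalize the structured requirement JSON.
+import json
+from dataclasses import dataclass
+from typing import Optional
+
+# NON_FUNCTIONAL comes from the example above; FUNCTIONAL is an assumption
+REQUIREMENT_TYPES = {"FUNCTIONAL", "NON_FUNCTIONAL"}
+
+
+@dataclass(frozen=True)
+class Requirement:
+    requirement_type: str
+    subject: str
+    expected_value: Optional[str]
+    keywords: list[str]
+
+
+def parse_requirement(raw_json: str) -> Requirement:
+    data = json.loads(raw_json)
+    req_type = str(data.get("requirement_type", "")).strip().upper()
+    if req_type not in REQUIREMENT_TYPES:
+        raise ValueError(f"unknown requirement_type: {req_type!r}")
+    subject = str(data.get("subject", "")).strip()
+    if not subject:
+        raise ValueError("subject is required")
+    # Normalize keywords: trim, lower-case, drop empties, de-duplicate in order
+    cleaned = (str(k).strip().lower() for k in data.get("keywords", []))
+    keywords = list(dict.fromkeys(k for k in cleaned if k))
+    return Requirement(req_type, subject, data.get("expected_value"), keywords)
+```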
+
+------------------------------------------------------------------------
+
+# 📊 Step 8 -- Deterministic Decision Engine
+
+The LLM returns its decision in this JSON format:
+
+``` json
+{
+  "answer": "YES | NO | PARTIAL",
+  "confidence": "HIGH | MEDIUM | LOW",
+  "justification": "Short factual explanation",
+  "evidence": [
+    {
+      "quote": "Exact document text",
+      "source": "Document reference"
+    }
+  ]
+}
+```
+
+Rules:
+
+-   If not explicitly stated → NO
+-   No inference
+-   Must provide documentary evidence
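+
+The rules can also be enforced in code after the LLM responds, so a
+malformed or unsupported answer never reaches the API caller. This
+hypothetical `enforce_decision` helper assumes the retrieved context is
+available as a single string and accepts a quote as evidence only if it
+appears verbatim in that context; anything else falls back to NO.
+
+``` python
+# Minimal sketch (hypothetical): enforce the decision rules on the LLM's JSON.
+import json
+
+ANSWERS = {"YES", "NO", "PARTIAL"}
+CONFIDENCES = {"HIGH", "MEDIUM", "LOW"}
+
+
+def enforce_decision(raw_json: str, context: str) -> dict:
+    try:
+        result = json.loads(raw_json)
+    except json.JSONDecodeError:
+        result = None
+    if not isinstance(result, dict):
+        result = {}  # treat non-object output as an empty answer
+
+    answer = str(result.get("answer", "")).strip().upper()
+    confidence = str(result.get("confidence", "")).strip().upper()
+    raw_evidence = result.get("evidence")
+    if not isinstance(raw_evidence, list):
+        raw_evidence = []
+    # Keep only evidence whose quote occurs verbatim in the retrieved context
+    evidence = [
+        e for e in raw_evidence
+        if isinstance(e, dict) and isinstance(e.get("quote"), str)
+        and e["quote"] and e["quote"] in context
+    ]
+
+    # Invalid answers, or YES/PARTIAL without verifiable evidence, become NO
+    if answer not in ANSWERS or (answer != "NO" and not evidence):
+        return {
+            "answer": "NO",
+            "confidence": "LOW",
+            "justification": "Requirement not explicitly supported by the retrieved documents.",
+            "evidence": [],
+        }
+    return {
+        "answer": answer,
+        "confidence": confidence if confidence in CONFIDENCES else "LOW",
+        "justification": str(result.get("justification", "")),
+        "evidence": evidence,
+    }
+```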
+
+------------------------------------------------------------------------
+
+# 🌐 Step 9 -- Running the Application
+
+Run preprocessing once:
+
+    python graphrag_rerank.py
+
+Run the web UI:
+
+    python app.py
+
+Open:
+
+    http://localhost:8100
+
+Or call the REST API:
+
+    curl -X POST http://localhost:8100/chat \
+      -H "Content-Type: application/json" \
+      -d '{"question": "Does the platform support MFA?"}'
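+
+The same call can be made from Python with only the standard library.
+The `/chat` path and `question` field come from the curl example above;
+treating the response body as JSON is an assumption, and
+`build_chat_request` / `ask` are hypothetical names.
+
+``` python
+# Minimal client sketch (standard library only); response format is assumed.
+import json
+import urllib.request
+
+BASE_URL = "http://localhost:8100"
+
+
+def build_chat_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
+    payload = json.dumps({"question": question}).encode("utf-8")
+    return urllib.request.Request(
+        f"{base_url}/chat",
+        data=payload,
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+def ask(question: str, timeout: float = 60.0) -> dict:
+    # Requires app.py to be running locally
+    with urllib.request.urlopen(build_chat_request(question), timeout=timeout) as resp:
+        return json.loads(resp.read().decode("utf-8"))
+```
+
+With `app.py` running, `ask("Does the platform support MFA?")` sends the
+request and returns the parsed response.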
+
+------------------------------------------------------------------------
+
+# 🧪 Example RFP Questions
+
+Typical question categories include security, SLA, performance,
+compliance, vendor lock-in, backup, and governance.
+
+The engine validates each question with the same deterministic rules.
+
+------------------------------------------------------------------------
+
+# 🔐 Design Principles
+
+-   Evidence-first
+-   Deterministic outputs
+-   Zero hallucination tolerance
+-   Enterprise auditability
+-   Structured graph reasoning
+
+------------------------------------------------------------------------
+
+# 🚀 Future Extensions
+
+-   Confidence scoring via graph density
+-   Weighted edge scoring
+-   SLA numeric comparison engine
+-   JSON-based filtering
+-   PGQL advanced reasoning
+-   Enterprise governance workflows
+
+------------------------------------------------------------------------
+
+# 📌 Conclusion
+
+Oracle GraphRAG RFP AI is not a chatbot.
+
+It is a compliance validation engine built for enterprise RFP
+automation, legal due diligence, and procurement decision support.
+
+Deterministic. Traceable. Expandable.