first commit

2026-03-06 18:21:02 +00:00 · 2026-01-08 18:26:09 -03:00
commit 557ea10653
3 changed files with 1093 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,325 @@
+
+# 🧠 Oracle GraphRAG for RFP Validation
+
+**GraphRAG-based AI system for factual RFP requirement validation using Oracle 23ai, OCI Generative AI, and Vector Search**
+
+---
+
+## 📌 Overview
+
+This project implements an **AI-driven RFP validation engine** designed to answer *formal RFP requirements* using **explicit, verifiable evidence** extracted from technical documentation.
+
+Instead of responding to open-ended conceptual questions, the system evaluates **whether a requirement is met**, returning **YES / NO / PARTIAL**, along with **exact textual evidence** and full traceability.
+
+The solution combines:
+
+- Retrieval-Augmented Generation (RAG) over PDFs
+- GraphRAG for structured factual relationships
+- Oracle 23ai Property Graph + Oracle Text
+- OCI Generative AI (LLMs & Embeddings)
+- FAISS vector search
+- Flask REST API
+
+This project is based on the article: [Analyze PDF Documents in Natural Language with OCI Generative AI](https://docs.oracle.com/en/learn/oci-genai-pdf)
+
+See that tutorial for details on how to set up and configure your development environment, Oracle Autonomous Database AI, and the other components.
+
+
+---
+
+## 🎯 Why RFP-Centric (and not Concept Q&A)
+
+Typical knowledge-base projects focus on explaining concepts, giving step-by-step instructions, and answering many kinds of questions about a subject. An RFP needs a different approach: each requirement has to be checked against the documentation and answered with a verdict.
+
+>**Note:** Traditional RAG systems are optimized for *conceptual explanations*. RFPs require **objective validation**, not interpretation.
+
+This project shifts the AI role from:
+
+❌ *“Explain how the product works”*  
+to  
+✅ *“Prove whether this requirement is met, partially met, or not met”*
+
+---
+
+## 🧩 Core Capabilities
+
+### ✅ RFP Requirement Parsing
+
+Each question is parsed into a structured requirement:
+
+```json
+{
+  "requirement_type": "COMPLIANCE | FUNCTIONAL | NON_FUNCTIONAL",
+  "subject": "authentication",
+  "expected_value": "MFA",
+  "decision_type": "YES_NO | YES_NO_PARTIAL",
+  "keywords": ["authentication", "mfa", "identity"]
+}
+```
+
+---
+
+### 🧠 Knowledge Graph (GraphRAG)
+
+Facts are extracted **only when explicitly stated** in documentation and stored as graph triples:
+
+```
+REQUIREMENT -[HAS_METRIC]-> messages per hour
+REQUIREMENT -[HAS_VALUE]-> < 1 hour
+REQUIREMENT -[SUPPORTED_BY]-> Document section
+```
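The triple notation above could be handled in Python roughly as follows. This is a minimal sketch, not the project's code: the `Triple` class, the `parse_triple` helper, and the regular expression are hypothetical names introduced here for illustration.

```python
import re
from dataclasses import dataclass

# Matches the "SUBJECT -[RELATION]-> OBJECT" notation used in this README.
TRIPLE_PATTERN = re.compile(r"^(.+?) -\[(\w+)\]-> (.+)$")


@dataclass(frozen=True)
class Triple:
    """One explicit fact: subject, relation name, object (hypothetical type)."""
    subject: str
    relation: str
    obj: str

    def render(self) -> str:
        # Serialize back to the arrow notation.
        return f"{self.subject} -[{self.relation}]-> {self.obj}"


def parse_triple(line: str) -> Triple:
    """Parse one line of arrow notation, raising ValueError if it does not fit."""
    match = TRIPLE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a triple: {line!r}")
    subject, relation, obj = match.groups()
    return Triple(subject.strip(), relation, obj.strip())


fact = parse_triple("REQUIREMENT -[HAS_VALUE]-> < 1 hour")
print(fact)
```

Keeping the relation name as a separate field makes it easy to filter facts by relation type before they are written to the graph tables.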
+
+This ensures:
+- No hallucination
+- No inferred assumptions
+- Full auditability
+
+---
+
+### 🔎 Hybrid Retrieval Strategy
+
+1. **Vector Search (FAISS)**
+2. **Oracle Graph + Oracle Text**
+3. **Graph-aware Re-ranking**
+
+---
+
+### 📊 Deterministic RFP Decision Output
+
+```json
+{
+  "answer": "YES | NO | PARTIAL",
+  "justification": "Short factual explanation",
+  "evidence": [
+    {
+      "quote": "Exact text from the document",
+      "source": "Document or section"
+    }
+  ]
+}
+```
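One way to keep this output well-formed is to validate the model's reply before returning it. The sketch below is an assumption about how that could be done, not the project's implementation: `normalize_decision` is a hypothetical helper, and downgrading an unsupported verdict to `NO` is one possible reading of the evidence-first principle.

```python
import json

ALLOWED_ANSWERS = {"YES", "NO", "PARTIAL"}


def normalize_decision(raw: str) -> dict:
    """Parse a decision payload and apply an evidence-first rule (hypothetical helper)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    answer = str(data.get("answer", "")).strip().upper()

    # Keep only evidence items that actually carry a non-empty quote.
    raw_evidence = data.get("evidence")
    if not isinstance(raw_evidence, list):
        raw_evidence = []
    evidence = [
        item for item in raw_evidence
        if isinstance(item, dict) and str(item.get("quote", "")).strip()
    ]

    # Assumption: an unknown verdict, or YES/PARTIAL without any quote, becomes NO.
    if answer not in ALLOWED_ANSWERS or (answer != "NO" and not evidence):
        answer = "NO"

    return {
        "answer": answer,
        "justification": str(data.get("justification", "")),
        "evidence": evidence,
    }


unsupported = normalize_decision('{"answer": "yes", "justification": "x", "evidence": []}')
print(unsupported["answer"])
```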
+
+---
+
+## 🏗️ Architecture
+
+```
+PDFs
+ └─► Semantic Chunking
+     └─► FAISS Vector Index
+         └─► RAG Retrieval
+             └─► GraphRAG (Oracle 23ai)
+                 └─► Evidence-based LLM Decision
+                     └─► REST API Response
+```
+
+---
+
+## 🚀 REST API
+
+### Health Check
+`GET /health`
+
+### RFP Validation
+`POST /chat`
+
+```json
+{
+  "question": "Does the platform support MFA and integration with corporate identity providers?"
+}
+```
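A client call to the validation endpoint could look like the sketch below, which uses only the Python standard library. The base URL is an assumption (Flask's default development port on localhost), and `build_chat_request` is a hypothetical helper; adjust both to the actual deployment.

```python
import json
import urllib.request

# Assumption: the Flask app runs locally on its default development port.
BASE_URL = "http://localhost:5000"


def build_chat_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /chat request carrying the question as a JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "Does the platform support MFA and integration with corporate identity providers?"
)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```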
+
+---
+
+## 🧪 Example Use Cases
+
+- Enterprise RFP / RFQ validation
+- Pre-sales technical due diligence
+- Compliance checks
+- SaaS capability assessment
+- Audit-ready AI answers
+
+---
+
+## 🛠️ Technology Stack
+
+- Oracle Autonomous Database 23ai
+- OCI Generative AI
+- LangChain / LangGraph
+- FAISS
+- Flask
+- Python
+
+---
+
+## 🔐 Design Principles
+
+- Evidence-first
+- Deterministic outputs
+- Zero tolerance for hallucination
+- Explainability
+
+---
+
+# GraphRAG for RFP Validation – Code Walkthrough
+
+> **Status:** Demo / Reference Implementation  
+> **Derived from:** Official Oracle Generative AI & GraphRAG learning material  
+> https://docs.oracle.com/en/learn/oci-genai-pdf
+
+---
+
+## 🎯 Purpose of This Code
+
+This code implements a **GraphRAG-based pipeline focused on RFP (Request for Proposal) validation**, not generic Q&A.
+
+>**Download** the code [graphrag_rerank.py](./files/graphrag_rerank.py)
+
+The main goals are to:
+- Extract **explicit, verifiable facts** from large PDF contracts and datasheets
+- Store those facts as **structured graph relationships**
+- Answer RFP questions using **YES / NO / PARTIAL** decisions
+- Always provide **document-backed evidence**, never hallucinations
+
+This represents a **strategic shift** from concept-based LLM answers to **compliance-grade validation**.
+
+---
+
+## 🧠 High-Level Architecture
+
+1. **PDF Ingestion**
+    - PDFs are read using OCR-aware loaders
+    - Large documents are split into semantic chunks
+
+2. **Semantic Chunking (LLM-driven)**
+    - Headings, tables, metrics, and sections are normalized
+    - Output is optimized for both vector search and fact extraction
+
+3. **Vector Index (FAISS)**
+    - Chunks are embedded using OCI Cohere multilingual embeddings
+    - Enables semantic recall
+
+4. **Knowledge Graph (Oracle 23ai)**
+    - Explicit facts are extracted as triples:
+        - `REQUIREMENT -[HAS_METRIC]-> RTO`
+        - `REQUIREMENT -[HAS_VALUE]-> 1 hour`
+    - Stored in Oracle Property Graph tables
+
+5. **RFP Requirement Parsing**
+    - Each user question is converted into a structured requirement:
+      ```json
+      {
+        "requirement_type": "NON_FUNCTIONAL",
+        "subject": "authentication",
+        "expected_value": "",
+        "keywords": ["mfa", "ldap", "sso"]
+      }
+      ```
+
+6. **Graph + Vector Fusion**
+    - Graph terms reinforce document reranking
+    - Ensures high-precision evidence retrieval
+
+7. **Deterministic RFP Decision**
+    - LLM outputs are constrained to:
+        - `YES`
+        - `NO`
+        - `PARTIAL`
+    - Always backed by quotes from source documents
+
+---
+
+## 🗂️ Key Code Sections Explained
+
+### Oracle Autonomous Database & Graph Setup
+- Creates entity and relation tables if not present
+- Builds an Oracle **PROPERTY GRAPH**
+- Uses Oracle Text indexes for full-text filtering of graph facts
+
+### `create_knowledge_graph()`
+- Uses an LLM to extract **ONLY explicit facts**
+- No inference, no assumptions
+- Inserts entities and relations using `MERGE`, so re-running the extraction does not create duplicates
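The `MERGE` step could be sketched as below. The table name `rfp_entities`, its `name` column, and the `merge_entity_statements` helper are all hypothetical; the real schema lives in `graphrag_rerank.py`. Each returned pair is shaped so it could be passed to a python-oracledb cursor as `cursor.execute(sql, binds)`.

```python
# Assumption: a table rfp_entities(name) holds one row per graph entity.
MERGE_ENTITY_SQL = """
MERGE INTO rfp_entities e
USING (SELECT :name AS name FROM dual) s
ON (e.name = s.name)
WHEN NOT MATCHED THEN
  INSERT (name) VALUES (s.name)
"""


def merge_entity_statements(triples):
    """Return (sql, binds) pairs for the distinct entities in (subject, relation, object) triples."""
    seen = []
    for subject, _relation, obj in triples:
        for name in (subject, obj):
            if name not in seen:  # keep first-seen order, skip repeats
                seen.append(name)
    return [(MERGE_ENTITY_SQL, {"name": name}) for name in seen]


statements = merge_entity_statements([
    ("REQUIREMENT", "HAS_METRIC", "RTO"),
    ("REQUIREMENT", "HAS_VALUE", "1 hour"),
])
for _sql, binds in statements:
    print(binds)
```

Because the statement only inserts when no matching row exists, running it twice for the same entity leaves a single row.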
+
+### `parse_rfp_requirement()`
+- Converts free-text questions into structured RFP requirements
+- Enforces strict JSON output using `<json>` tags
+- Includes safe fallback logic
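Extracting the tagged JSON with a fallback might look like this. It is a sketch under assumptions, not the function from the repository: `extract_requirement`, `fallback_requirement`, and the default values used in the fallback are hypothetical.

```python
import json
import re

# The LLM is asked to wrap its answer in <json>...</json>; grab the first such block.
JSON_TAG_PATTERN = re.compile(r"<json>(.*?)</json>", re.DOTALL)


def fallback_requirement(question: str) -> dict:
    """Assumed fallback: a generic requirement whose keywords come from the question."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return {
        "requirement_type": "FUNCTIONAL",
        "subject": "",
        "expected_value": "",
        "decision_type": "YES_NO_PARTIAL",
        "keywords": [word for word in words if len(word) >= 3],
    }


def extract_requirement(llm_output: str, question: str) -> dict:
    """Return the tagged JSON object, or the fallback when it is missing or invalid."""
    match = JSON_TAG_PATTERN.search(llm_output)
    if match:
        try:
            parsed = json.loads(match.group(1))
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict):
            return parsed
    return fallback_requirement(question)


reply = 'Here you go:\n<json>{"subject": "authentication", "keywords": ["mfa"]}</json>'
print(extract_requirement(reply, "Does it support MFA?"))
```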
+
+### `query_knowledge_graph()`
+- Uses Oracle Text (`CONTAINS`) with sanitized queries
+- Filters graph facts by RFP keywords
+- Returns only relevant evidence
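Sanitizing keywords for `CONTAINS` could be sketched as follows. The `build_contains_query` helper is hypothetical. It relies on two Oracle Text query features: braces escape a term so it is matched literally, and `OR` combines terms.

```python
import re


def build_contains_query(keywords):
    """Turn free-text keywords into an Oracle Text query string (hypothetical helper)."""
    terms = []
    for keyword in keywords:
        # Drop punctuation and operator characters such as %, &, |, {, }.
        cleaned = re.sub(r"[^\w\s]", " ", keyword)
        # Underscore is a wildcard in Oracle Text, so treat it like whitespace.
        cleaned = re.sub(r"[\s_]+", " ", cleaned).strip()
        if cleaned and cleaned not in terms:
            terms.append(cleaned)
    # Braces make each term literal, so words like AND or NEAR are not read as operators.
    return " OR ".join("{" + term + "}" for term in terms)


query = build_contains_query(["mfa", "single sign-on", "ldap%", "mfa"])
print(query)
```

The resulting string would be passed as a bind variable, for example `WHERE CONTAINS(text_column, :query) > 0`, where `text_column` stands for whichever indexed column the schema uses.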
+
+### Graph-aware Re-ranking
+- Combines:
+    - Vector similarity
+    - Graph-derived terms
+- Improves precision on contractual questions
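A simple form of graph-aware re-ranking is sketched below. The scoring formula, the `graph_weight` parameter, and the `graph_aware_rerank` name are assumptions made for illustration; the repository may combine the signals differently. The sketch assumes vector scores are similarities where higher means closer.

```python
def graph_aware_rerank(candidates, graph_terms, graph_weight=0.3):
    """Re-order (text, vector_score) pairs, boosting chunks that mention graph terms.

    Assumption: final score = vector_score + graph_weight * share of graph terms found.
    """
    terms = [term.lower() for term in graph_terms if term.strip()]
    scored = []
    for text, vector_score in candidates:
        lowered = text.lower()
        hits = sum(1 for term in terms if term in lowered)
        coverage = hits / len(terms) if terms else 0.0
        scored.append((vector_score + graph_weight * coverage, text))
    # Highest combined score first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _score, text in scored]


ranked = graph_aware_rerank(
    [
        ("The platform offers dashboards.", 0.80),
        ("MFA is supported through SSO with LDAP.", 0.70),
    ],
    graph_terms=["mfa", "ldap", "sso"],
)
print(ranked[0])
```

In this example the second chunk has the lower vector score but mentions all three graph terms, so it moves to the top.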
+
+### Final RFP Decision Chain
+- Implemented with LangChain `RunnableMap`
+- Clean separation of:
+    - Requirement parsing
+    - Context retrieval
+    - Decision generation
+
+---
+
+## ✅ Why This Is NOT a Generic RAG
+
+| Traditional RAG | This GraphRAG |
+|----------------|---------------|
+| Answers concepts | Validates requirements |
+| May hallucinate | Evidence-only |
+| Free-form text | Deterministic YES / NO / PARTIAL |
+| No structure | Knowledge graph |
+| Chatbot | RFP analyst |
+
+---
+
+## ⚠️ Important Design Principles
+
+- **Evidence-first**: If not explicitly stated → NO
+- **No inference**: The LLM is forbidden from making assumptions
+- **Auditability**: Every answer is traceable
+- **Enterprise-grade**: Designed for legal, procurement, compliance
+
+---
+
+## 📌 Intended Use Cases
+
+- RFP response automation
+- Vendor compliance validation
+- Contractual due diligence
+- Pre-sales technical qualification
+- Regulatory checks
+
+---
+
+## 🧪 Demo Disclaimer
+
+This code is:
+- A **demo / reference implementation**
+- Not production-hardened
+- Intended for education, experimentation, and architecture discussions
+
+---
+
+## 👤 Acknowledgments
+
+- **Author** - Cristiano Hoshikawa (Oracle LAD A-Team Solution Engineer)
+
+---
+
+## 📎 References
+
+[Analyze PDF Documents in Natural Language with OCI Generative AI](https://docs.oracle.com/en/learn/oci-genai-pdf)
+
+---
+
+## ⚠️ Disclaimer
+
+This is a demo / reference architecture.  
+Final answers depend strictly on indexed documentation.
+