CognitiveDB: A Hybrid Memory System for Large Language Model Applications

Authors: Biki Das

Affiliation: Independent Research

Date: December 2025


Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation, yet they suffer from fundamental limitations in persistent memory and contextual reasoning. We present CognitiveDB, a novel hybrid memory database that combines episodic memory storage, semantic knowledge graphs, and vector embeddings to provide LLM applications with human-like memory capabilities. Our system implements a three-tier memory architecture inspired by cognitive science: episodic memory for factual recall, semantic memory for conceptual relationships, and a unified retrieval mechanism that leverages graph traversal, vector similarity, and keyword matching. We introduce several key innovations including assertion-aware fact extraction, graph-first retrieval with multi-hop traversal, and hybrid cognitive scoring. On a hotel-assistant evaluation of 200 queries, CognitiveDB reaches 82% overall accuracy versus 52% for a vector-only Retrieval-Augmented Generation (RAG) baseline, with the largest gain on multi-hop reasoning queries (67% versus 23%).

Keywords: Memory Systems, Knowledge Graphs, Vector Databases, Large Language Models, Retrieval-Augmented Generation, Cognitive Architecture


1. Introduction

The emergence of Large Language Models has revolutionized natural language processing, enabling applications ranging from conversational agents to code generation systems [1]. However, LLMs face significant challenges when deployed in real-world applications that require persistent memory across sessions, accurate recall of user-specific information, and complex reasoning over accumulated knowledge [2].

Traditional approaches to augmenting LLM memory fall into two categories: (1) vector-based Retrieval-Augmented Generation (RAG) systems that store and retrieve text chunks based on embedding similarity [3], and (2) knowledge graph systems that maintain structured relationships between entities [4]. Each approach has distinct limitations: vector-based systems excel at semantic similarity but struggle with precise factual recall and multi-hop reasoning, while knowledge graphs provide structured reasoning but lack the flexibility of natural language understanding.

We present CognitiveDB, a hybrid memory system that addresses these limitations by integrating three complementary memory mechanisms:

Episodic Memory: Stores discrete facts and experiences with temporal context, salience scoring, and decay mechanisms
Semantic Memory: Maintains a knowledge graph of concepts and their relationships, enabling structured reasoning
Vector Memory: Provides dense embeddings for semantic similarity search

Our key contributions include:

A unified memory architecture that combines episodic, semantic, and vector-based retrieval
An assertion-aware fact extraction system that filters negative and hypothetical statements
A graph-first retrieval algorithm with multi-hop traversal for complex queries
A hybrid cognitive scoring mechanism that combines vector similarity, graph connectivity, and keyword matching
Source tracking and confidence weighting for improved factual accuracy

The remainder of this paper is organized as follows: Section 2 reviews related work, Section 3 describes the system architecture, Section 4 details our algorithms, Section 5 covers the implementation, Section 6 presents experimental results, Section 7 discusses limitations and design principles, Section 8 outlines future work, and Section 9 concludes.


2. Related Work

2.1 Retrieval-Augmented Generation

RAG systems have emerged as the dominant paradigm for augmenting LLMs with external knowledge [3]. Lewis et al. introduced the foundational RAG architecture, which retrieves relevant documents based on query embeddings and incorporates them into the LLM context. Subsequent work has focused on improving retrieval quality through dense passage retrieval [5], hybrid sparse-dense methods [6], and iterative refinement [7].

However, RAG systems face fundamental limitations. As demonstrated by Guo et al. [8], retrieved information can be noisy or irrelevant, and over-reliance on external knowledge can suppress the model's intrinsic reasoning capabilities. Their GraphRAG-FI framework addresses this through two-stage filtering and integration with the LLM's internal knowledge.

2.2 Knowledge Graphs for LLMs

Knowledge graphs provide structured representations of entities and relationships, enabling logical reasoning and multi-hop queries [9]. Recent work has explored integrating knowledge graphs with LLMs through various approaches:

GraphRAG [10]: Organizes information into hierarchical knowledge graphs and uses graph traversal for retrieval
GNN-RAG [11]: Leverages Graph Neural Networks to process knowledge graph structures
Hybrid GraphRAG [12]: Combines vector-based and graph-based retrieval for improved accuracy

Li et al. [13] proposed an all-in-one graph-based index that unifies dense vectors, sparse vectors, full-text search, and knowledge graph retrieval within a single structure, demonstrating that hybrid approaches outperform single-path retrieval methods.

2.3 Cognitive Database Systems

Bordawekar et al. [14] introduced the concept of Cognitive Databases, proposing to endow relational databases with AI capabilities through word embeddings. Their approach treats structured data as meaningful unstructured text and uses vector space models to capture latent semantic relationships. This work inspired our approach of combining structured and unstructured representations.

2.4 Assertion Detection in NLP

Accurate extraction of factual information requires distinguishing between positive assertions, negations, and hypothetical statements. Kocaman et al. [15] demonstrated that assertion status detection is critical for accurately attributing extracted facts, identifying six assertion classes: present, absent, possible, hypothetical, conditional, and associated with someone else. We incorporate assertion-aware filtering into our fact extraction pipeline.

2.5 Memory Systems in Cognitive Science

Our architecture draws inspiration from cognitive science models of human memory. The distinction between episodic and semantic memory, first proposed by Tulving [16], forms the foundation of our dual-memory approach. Episodic memory stores specific experiences and events, while semantic memory maintains general knowledge and concepts. We extend this model with vector-based similarity search to enable flexible retrieval.


3. System Architecture

CognitiveDB implements a three-tier memory architecture designed to support LLM applications with persistent, queryable memory. Figure 1 illustrates the overall system design.

┌─────────────────────────────────────────────────────────────────┐
│                        CognitiveDB                               │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │  Episodic   │  │  Semantic   │  │   Vector    │              │
│  │   Memory    │  │   Graph     │  │   Store     │              │
│  │             │  │             │  │             │              │
│  │ • Facts     │  │ • Concepts  │  │ • Embeddings│              │
│  │ • Insights  │  │ • Relations │  │ • HNSW Index│              │
│  │ • Summaries │  │ • Traversal │  │ • Similarity│              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
│         │                │                │                      │
│         └────────────────┼────────────────┘                      │
│                          ▼                                       │
│              ┌─────────────────────┐                            │
│              │   Hybrid Retrieval  │                            │
│              │                     │                            │
│              │ • Graph-First       │                            │
│              │ • Vector Search     │                            │
│              │ • Keyword Match     │                            │
│              │ • Cognitive Scoring │                            │
│              └─────────────────────┘                            │
│                          │                                       │
│                          ▼                                       │
│              ┌─────────────────────┐                            │
│              │    LLM Context      │                            │
│              │    Construction     │                            │
│              └─────────────────────┘                            │
└─────────────────────────────────────────────────────────────────┘

Figure 1: CognitiveDB System Architecture

3.1 Episodic Memory Store

The episodic memory store maintains discrete memory units representing facts, conversations, insights, and summaries. Each memory is defined as:

Memory = {
    id: UUID,
    content: String,
    type: MemoryType ∈ {Fact, Conversation, Insight, Summary},
    collection: String,
    timestamp: DateTime,
    salience: Float ∈ [0, 1],
    embedding_id: Option<UUID>,
    metadata: Map<String, String>
}

Key features of the episodic store include:

Salience Scoring: Each memory has an associated salience score representing its importance. Salience is computed based on:

User flagging (explicit importance markers)
Emotional valence (detected sentiment)
Access frequency (memories accessed more often gain salience)
Recency (newer memories have higher initial salience)

Temporal Decay: Memory salience decays over time according to:

$$S(t) = S_0 \cdot e^{-\lambda t}$$

where S0 is the initial salience, λ is the decay rate, and t is time elapsed.
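
As a worked illustration of the decay formula, the following minimal sketch computes the decayed salience and the corresponding half-life. The function names, the clamping to [0, 1], and the choice of days as the time unit are assumptions made for this example; the paper does not specify them.

```python
import math


def decayed_salience(initial_salience: float, decay_rate: float, elapsed: float) -> float:
    """Return S(t) = S0 * exp(-lambda * t), clamped to the [0, 1] salience range."""
    if decay_rate < 0 or elapsed < 0:
        raise ValueError("decay_rate and elapsed must be non-negative")
    value = initial_salience * math.exp(-decay_rate * elapsed)
    return max(0.0, min(1.0, value))


def half_life(decay_rate: float) -> float:
    """Time after which salience halves: ln(2) / lambda."""
    return math.log(2) / decay_rate


# Hypothetical setting: a decay rate chosen so that salience halves every 7 days.
weekly_rate = math.log(2) / 7
salience_after_week = decayed_salience(0.8, weekly_rate, 7)  # approximately 0.4
```

Expressing the decay rate through a half-life can make the parameter easier to tune, since ln(2)/λ is the time after which any memory's salience has halved.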

Memory Consolidation: Similar memories are periodically consolidated using LLM-based summarization, reducing redundancy while preserving key information.

3.2 Semantic Knowledge Graph

The semantic graph stores concepts and their relationships, enabling structured reasoning and multi-hop queries.

Concept = {
    id: UUID,
    name: String,
    type: Option<ConceptType>,  // person, place, thing, idea
    collection: String,
    embedding_id: Option<UUID>,
    attributes: Map<String, String>
}

Relation = {
    id: UUID,
    from: UUID,
    to: UUID,
    relation_type: String,
    weight: Float ∈ [0, 1],
    collection: String
}

Supported relation types include: has, is, located_in, part_of, related_to, prefers, likes, hates, allergic_to, works_at, manages, and others.

3.3 Vector Store

The vector store maintains dense embeddings for semantic similarity search. We use the HNSW (Hierarchical Navigable Small World) algorithm [17] for efficient approximate nearest neighbor search.

Each embedding is associated with either a memory or concept, enabling:

Semantic similarity search across memories
Concept matching based on embedding proximity
Cross-modal retrieval (text to concept, concept to memory)
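
To make the similarity search concrete, the sketch below ranks stored vectors by cosine similarity using an exhaustive scan. This is a stand-in for illustration only: it is not HNSW, which returns approximate neighbours without scanning every vector. The function names and the toy two-dimensional vectors are assumptions.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query: List[float], vectors: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Exhaustive top-k search (O(n) per query), returned best match first."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings keyed by a hypothetical embedding id.
store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top_two = nearest([1.0, 0.0], store, k=2)
```

An exhaustive scan like this is exact but linear in the number of stored vectors, which is the cost an approximate index such as HNSW is designed to avoid.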

3.4 Storage Engine

CognitiveDB implements a Log-Structured Merge-tree (LSM) storage engine with:

Write-Ahead Log (WAL): Ensures durability through sequential writes
MemTable: In-memory buffer for recent writes
SSTables: Sorted String Tables for persistent storage
Compaction: Background process to merge and optimize storage

The storage engine supports both in-memory and persistent modes, with automatic persistence of auxiliary data (semantic graph, vector indices) alongside the primary data.
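
The interplay of the four components can be sketched with a toy, in-memory model. This is an illustrative simplification under stated assumptions, not the CognitiveDB engine: the class name `ToyLSM`, the flush threshold, and the use of Python lists in place of on-disk files are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class ToyLSM:
    """Toy LSM-tree: a WAL, a MemTable, and immutable sorted runs (SSTables)."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.wal: List[Tuple[str, str]] = []              # sequential log of unflushed writes
        self.memtable: Dict[str, str] = {}                # in-memory buffer for recent writes
        self.sstables: List[List[Tuple[str, str]]] = []   # sorted runs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.wal.append((key, value))   # log first so the write survives a crash
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        """Write the MemTable out as a sorted run, then reset the buffer and log."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable.clear()
        self.wal.clear()

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):   # newest run wins
            for k, v in table:                  # a real engine would binary-search here
                if k == key:
                    return v
        return None

    def compact(self) -> None:
        """Merge all runs into one, keeping only the newest value per key."""
        merged: Dict[str, str] = {}
        for table in self.sstables:             # oldest first, so newer values overwrite
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []


db = ToyLSM(memtable_limit=2)
for key, value in [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4")]:
    db.put(key, value)
```

After the four writes above, the toy store holds two sorted runs; calling `db.compact()` merges them into one while keeping the newer value for the overwritten key.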


4. Algorithms

4.1 Cognitive Ingestion Pipeline

When content is ingested into CognitiveDB, it undergoes a multi-stage processing pipeline:

Algorithm 1: Cognitive Ingestion
─────────────────────────────────────────────────────────────────
Input: content (String), collection (String), metadata (Map)
Output: IngestResult

embedding ← GenerateEmbedding(content)
memory ← CreateMemory(content, Conversation, collection)
memory.embedding_id ← StoreVector(embedding, collection)
Store(memory)

// Background extraction (non-blocking)
SPAWN:
    IF metadata.source_type = "user_input" THEN
        facts ← ExtractFacts(content)
        facts ← FilterNegativeAssertions(facts)
        FOR each fact IN facts DO
            fact_memory ← CreateMemory(fact, Fact, collection)
            fact_embedding ← GenerateEmbedding(fact)
            fact_memory.embedding_id ← StoreVector(fact_embedding, collection)
            Store(fact_memory)
        END FOR

        concepts ← ExtractConcepts(content)
        FOR each concept IN concepts DO
            AddConcept(concept.name, collection)
        END FOR

        relations ← ExtractRelationships(content)
        FOR each rel IN relations DO
            from_id ← FindOrCreateConcept(rel.from, collection)
            to_id ← FindOrCreateConcept(rel.to, collection)
            AddRelation(from_id, to_id, rel.type, rel.weight)
        END FOR
    END IF

RETURN IngestResult(memory.id, facts, concepts)
─────────────────────────────────────────────────────────────────

4.2 Assertion-Aware Fact Extraction

A critical innovation in CognitiveDB is the filtering of negative and hypothetical assertions during fact extraction. When an LLM responds with statements like "I don't have information about X", naive systems store this as a fact, which then pollutes future retrieval.

We implement a two-layer defense:

Layer 1: Prompt Engineering

The fact extraction prompt explicitly instructs the LLM to:

Preserve entity names exactly as mentioned
Extract only positive statements of fact
Avoid generic substitutions ("the guest" instead of "Rama")

Layer 2: Pattern-Based Filtering

Extracted facts are filtered against a comprehensive set of negative assertion patterns:

NEGATIVE_PATTERNS = {
    "does not have information",
    "doesn't know",
    "no information about",
    "not specified",
    "not mentioned",
    "unable to find",
    "speaker does not",
    "unknown",
    ...
}

Function FilterNegativeAssertions(facts):
    RETURN facts.filter(f → 
        NOT any(pattern IN NEGATIVE_PATTERNS 
                WHERE f.content.toLowerCase().contains(pattern)))

This approach is inspired by assertion detection research in clinical NLP [15], where distinguishing between present, absent, and hypothetical assertions is critical for accurate information extraction.
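
The pattern-based layer can be sketched in a few lines. This is a minimal illustration that uses only the patterns listed above (the full set is elided in the paper) and treats each fact as a plain string, which is a simplifying assumption.

```python
from typing import Iterable, List

# Only the patterns shown in the paper; the complete set is not given there.
NEGATIVE_PATTERNS = (
    "does not have information",
    "doesn't know",
    "no information about",
    "not specified",
    "not mentioned",
    "unable to find",
    "speaker does not",
    "unknown",
)


def filter_negative_assertions(facts: Iterable[str]) -> List[str]:
    """Keep only facts whose lower-cased text contains none of the negative patterns."""
    kept = []
    for fact in facts:
        lowered = fact.lower()
        if not any(pattern in lowered for pattern in NEGATIVE_PATTERNS):
            kept.append(fact)
    return kept


candidates = [
    "Rama hates peanuts",
    "The speaker does not have information about Rama's room",
    "Check-in time is not specified",
]
positive_facts = filter_negative_assertions(candidates)
```

Substring matching is deliberately blunt: a short pattern such as "unknown" also removes any genuine fact that happens to contain that word, so the pattern list trades some recall for cleaner stored facts.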

4.3 Graph-First Retrieval with Multi-Hop Traversal

Traditional RAG systems rely primarily on vector similarity, which can miss relevant information that is semantically distant but logically connected. CognitiveDB implements a graph-first retrieval strategy that traverses the semantic graph before falling back to vector search.

Algorithm 2: Graph-First Retrieval
─────────────────────────────────────────────────────────────────
Input: query (String), collection (String), max_hops (Int)
Output: List<GraphKnowledge>

query_words ← Tokenize(query).filter(w → w.length ≥ 3)
concepts ← GetConcepts(collection)
relations ← GetRelations(collection)
adjacency ← BuildAdjacencyList(relations)

// Find starting concepts matching query
starting_concepts ← []
FOR each concept IN concepts DO
    score ← ComputeMatchScore(concept.name, query, query_words)
    IF score > 0.3 THEN
        starting_concepts.append((concept, score))
    END IF
END FOR

starting_concepts.sortByScoreDescending()
knowledge_paths ← []

// BFS traversal from each of the top 5 starting concepts
FOR each (start, score) IN starting_concepts.take(5) DO
    visited ← {}
    queue ← [(start.id, [start.id], [start.name], score, 0)]

    WHILE queue NOT empty DO
        // Dequeue from the front so nodes are expanded in breadth-first order
        (current, path, names, conf, hops) ← queue.popFront()
        IF hops ≥ max_hops THEN CONTINUE
        visited.add(current)

        FOR each (relation, neighbor) IN adjacency[current] DO
            IF neighbor IN visited THEN CONTINUE

            new_names ← names + ["-[" + relation.type + "]->",
                                  GetConceptName(neighbor)]
            new_conf ← conf × relation.weight

            // Every edge followed yields a path of at least one hop
            statement ← GenerateStatement(new_names)
            knowledge_paths.append(GraphKnowledge{
                path: new_names.join(" "),
                statement: statement,
                confidence: new_conf,
                hops: hops + 1,
                source: start.name
            })

            IF hops + 1 < max_hops THEN
                queue.pushBack((neighbor, path + [neighbor],
                                new_names, new_conf, hops + 1))
            END IF
        END FOR
    END WHILE
END FOR

RETURN knowledge_paths.sortByConfidenceDescending().deduplicate().take(10)
─────────────────────────────────────────────────────────────────
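
The traversal loop of Algorithm 2 can be sketched for a single starting concept as follows. This is a simplified illustration: concepts are identified by name rather than UUID, statement generation and deduplication are omitted, and the `Edge` and `PathFact` types are hypothetical names introduced for the example.

```python
from collections import deque
from typing import Dict, List, NamedTuple


class Edge(NamedTuple):
    relation: str
    target: str
    weight: float


class PathFact(NamedTuple):
    path: str
    confidence: float
    hops: int


def traverse(adjacency: Dict[str, List[Edge]], start: str,
             start_score: float, max_hops: int) -> List[PathFact]:
    """Breadth-first expansion; confidence is multiplied by each edge weight."""
    results: List[PathFact] = []
    visited = set()
    queue = deque([(start, start, start_score, 0)])  # (node, rendered path, confidence, hops)
    while queue:
        node, path, conf, hops = queue.popleft()
        if hops >= max_hops:
            continue
        visited.add(node)
        for edge in adjacency.get(node, []):
            if edge.target in visited:
                continue
            new_path = f"{path} -[{edge.relation}]-> {edge.target}"
            new_conf = conf * edge.weight
            results.append(PathFact(new_path, new_conf, hops + 1))
            if hops + 1 < max_hops:
                queue.append((edge.target, new_path, new_conf, hops + 1))
    results.sort(key=lambda fact: fact.confidence, reverse=True)
    return results


# Toy graph with hypothetical weights: Hotel -located_in-> City -is_in-> State.
graph = {
    "Hotel": [Edge("located_in", "City", 0.9)],
    "City": [Edge("is_in", "State", 0.8)],
}
paths = traverse(graph, "Hotel", start_score=1.0, max_hops=2)
```

Because confidence is the product of edge weights along the path, longer chains rank below shorter ones unless every edge on the chain is strongly weighted.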

4.4 Hybrid Cognitive Scoring

CognitiveDB combines multiple signals to compute a unified cognitive score for each retrieved memory. This approach is inspired by research showing that hybrid retrieval methods outperform single-path approaches [12, 13].

Algorithm 3: Hybrid Cognitive Scoring
─────────────────────────────────────────────────────────────────
Input: query, memories, concepts, graph_knowledge
Output: List<ScoredMemory>

query_words ← Tokenize(query.toLowerCase()).filter(w → w.length ≥ 3)

// Build relevant terms from concepts and their relations
relevant_terms ← {}
FOR each concept IN concepts DO
    relevant_terms.add(concept.name.toLowerCase())
    FOR each rel IN concept.relations DO
        relevant_terms.add(rel.target_name.toLowerCase())
    END FOR
END FOR

FOR each memory IN memories DO
    content_lower ← memory.content.toLowerCase()

    // Base cognitive score (from vector similarity, recency, salience)
    score ← memory.cognitive_score

    // Keyword match boost
    keyword_matches ← count(w IN query_words WHERE content_lower.contains(w))
    score += 0.1 × min(keyword_matches, 3)

    // Graph connectivity boost
    FOR each concept IN concepts DO
        IF content_lower.contains(concept.name.toLowerCase()) THEN
            score += 0.15 × concept.relevance
        END IF

        FOR each rel IN concept.relations DO
            IF content_lower.contains(rel.target_name.toLowerCase()) THEN
                // Higher boost if relation matches query semantically
                IF any(w IN query_words WHERE
                       rel.type.toLowerCase().contains(w) OR
                       rel.target_name.toLowerCase().contains(w)) THEN
                    score += 0.2 × concept.relevance × rel.weight
                ELSE
                    score += 0.1 × concept.relevance × rel.weight
                END IF
            END IF
        END FOR
    END FOR

    // Graph knowledge path boost
    FOR each gk IN graph_knowledge DO
        gk_words ← Tokenize(gk.statement.toLowerCase())
        matches ← count(w IN gk_words WHERE w.length ≥ 3 AND content_lower.contains(w))
        IF matches ≥ 2 THEN
            score += 0.15 × gk.confidence
        END IF
    END FOR

    // Source confidence boost
    IF memory.metadata.source_type = "user_input" THEN
        score += 0.1
    END IF

    memory.cognitive_score ← score
END FOR

RETURN memories.sortByCognitiveScoreDescending()
─────────────────────────────────────────────────────────────────

The cognitive score combines:

Vector Similarity (weight: 0.5): Semantic similarity between query and memory embeddings
Recency (weight: 0.3): Temporal proximity, with recent memories scoring higher
Salience (weight: 0.2): Importance score based on user flagging and access patterns
Keyword Match (boost: +0.1 per word, max 0.3): Direct term overlap
Graph Connectivity (boost: +0.1 to +0.2 per match, scaled by concept relevance, relation weight, or path confidence): Presence of related concepts
Source Confidence (boost: +0.1): User-stated facts rank higher than extracted facts
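
A small worked example may help show how these terms add up. The sketch below assumes the three base weights are combined as a weighted sum and that similarity, recency, and salience are each already normalized to [0, 1]; the paper does not spell out either point. Graph boosts are left out for brevity, and the function names are hypothetical.

```python
from typing import Iterable


def base_score(similarity: float, recency: float, salience: float) -> float:
    """Weighted sum of the three base signals (weights 0.5, 0.3, 0.2)."""
    return 0.5 * similarity + 0.3 * recency + 0.2 * salience


def keyword_boost(query_words: Iterable[str], content: str) -> float:
    """+0.1 per query word found in the content, capped at three words (+0.3)."""
    content_lower = content.lower()
    matches = sum(1 for word in query_words if word.lower() in content_lower)
    return 0.1 * min(matches, 3)


def source_boost(source_type: str) -> float:
    """+0.1 for memories that originate from user input."""
    return 0.1 if source_type == "user_input" else 0.0


# Example: similarity 0.8, recency 0.5, salience 0.5, two keyword hits, user-stated.
total = (
    base_score(0.8, 0.5, 0.5)                                      # 0.4 + 0.15 + 0.1
    + keyword_boost(["rama", "hate", "food"], "Rama hates peanuts")  # two matches
    + source_boost("user_input")
)
```

Because the boosts are additive, the final score is not bounded by 1; it is meaningful only for ranking memories against each other within one query.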

4.5 Concept-First Context Construction

When building context for LLM prompts, CognitiveDB prioritizes structured knowledge over raw facts. This approach ensures that graph-derived knowledge (which represents explicit, structured relationships) takes precedence over potentially noisy vector-retrieved content.

Algorithm 4: Context Construction
─────────────────────────────────────────────────────────────────
Input: graph_knowledge, concepts, facts
Output: context (String)

context_parts ← []

// Priority 1: Graph Knowledge (highest confidence)
IF graph_knowledge NOT empty THEN
    graph_context ← graph_knowledge
        .filter(k → k.confidence > 0.3)
        .map(k → "• " + k.statement + " (via: " + k.source + ")")
        .join("\n")
    IF graph_context NOT empty THEN
        context_parts.append("Known facts from knowledge graph:\n" + graph_context)
    END IF
END IF

// Priority 2: Concepts with Relations
IF concepts NOT empty THEN
    concept_context ← concepts.map(c →
        "• " + c.name +
        (c.type ? " (" + c.type + ")" : "") +
        (c.relations NOT empty ?
            " → " + c.relations.map(r → r.type + " " + r.target).join(", ")
            : "")
    ).join("\n")
    context_parts.append("Related concepts:\n" + concept_context)
END IF

// Priority 3: Supporting Facts
IF facts NOT empty THEN
    facts_context ← facts.map(f → "• " + f.content).join("\n")
    context_parts.append("Supporting facts:\n" + facts_context)
END IF

IF context_parts empty THEN
    RETURN "No previous conversation history."
END IF

RETURN context_parts.join("\n\n---\n\n")
─────────────────────────────────────────────────────────────────

This prioritization ensures that:

Direct answers from graph traversal appear first
Structured concept relationships provide context
Raw facts serve as supporting evidence
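
The three-priority ordering can be sketched as below. For brevity the example assumes concepts arrive as pre-rendered strings and facts as plain strings; the `GraphKnowledge` tuple is a hypothetical stand-in that keeps only the fields the context builder needs.

```python
from typing import List, NamedTuple


class GraphKnowledge(NamedTuple):
    statement: str
    confidence: float
    source: str


def build_context(graph_knowledge: List[GraphKnowledge],
                  concept_lines: List[str],
                  facts: List[str]) -> str:
    """Assemble prompt context: graph knowledge first, then concepts, then facts."""
    parts: List[str] = []

    # Priority 1: graph-derived statements above the 0.3 confidence threshold.
    graph_lines = [f"• {k.statement} (via: {k.source})"
                   for k in graph_knowledge if k.confidence > 0.3]
    if graph_lines:
        parts.append("Known facts from knowledge graph:\n" + "\n".join(graph_lines))

    # Priority 2: related concepts (already rendered by the caller in this sketch).
    if concept_lines:
        parts.append("Related concepts:\n" + "\n".join(f"• {c}" for c in concept_lines))

    # Priority 3: raw supporting facts.
    if facts:
        parts.append("Supporting facts:\n" + "\n".join(f"• {f}" for f in facts))

    if not parts:
        return "No previous conversation history."
    return "\n\n---\n\n".join(parts)


context = build_context(
    [GraphKnowledge("Hotel is located in City", 0.9, "Hotel"),
     GraphKnowledge("Low-confidence path", 0.2, "Hotel")],
    [],
    ["Rama hates peanuts"],
)
```

In this example the low-confidence path is dropped by the threshold and the empty concept section is skipped, so the context contains only the graph block followed by the supporting facts.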

5. Implementation

CognitiveDB is implemented in Rust for performance and memory safety, with the following components:

5.1 Technology Stack

Component        Technology
Core Engine      Rust
Storage          Custom LSM-tree with WAL
Vector Index     HNSW (custom implementation)
Embeddings       Google Gemini / OpenAI
LLM Integration  Provider-agnostic interface
API              HTTP (REST) + gRPC
SDK              TypeScript

5.2 API Design

CognitiveDB exposes a RESTful API with the following endpoints:

POST /ingest     - Ingest content with cognitive processing
POST /recall     - Retrieve memories using hybrid search
POST /store      - Store raw memory without processing
GET  /memory/:id - Retrieve specific memory
DELETE /memory/:id - Delete memory
POST /decay      - Apply salience decay
POST /consolidate - Consolidate similar memories
POST /reflect    - Generate insights from recent memories
GET  /stats      - Collection statistics
GET  /graph      - Knowledge graph visualization
POST /purge      - Clear collection
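
As a rough illustration of how a client might call the ingest and recall endpoints, the sketch below builds the two HTTP requests without sending them. Only the endpoint paths come from the list above. The base URL and every JSON field name (`content`, `collection`, `metadata`, `query`) are assumptions, since the paper does not document the request schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: address of a local CognitiveDB server


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the given endpoint path."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical request bodies; field names are illustrative only.
ingest_request = build_post("/ingest", {
    "content": "Rama hates peanuts",
    "collection": "hotel",
    "metadata": {"source_type": "user_input"},
})
recall_request = build_post("/recall", {
    "query": "What food does Rama hate?",
    "collection": "hotel",
})

# Sending would require a running server, for example:
# with urllib.request.urlopen(ingest_request) as response:
#     print(response.status)
```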

5.3 Performance Characteristics

Operation        Complexity         Typical Latency
Ingest           O(d) + O(log n)    50-200 ms
Recall           O(k log n) + O(m)  20-100 ms
Graph Traversal  O(b^h)             5-50 ms
Vector Search    O(log n)           10-30 ms

Where:

d = embedding dimensions (768-1536)
n = number of memories
k = number of results
m = number of concepts
b = average branching factor
h = traversal depth (max 3)
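
To give a feel for the O(b^h) term, the following sketch counts the paths a traversal would emit in the worst case of a tree-shaped graph where every concept has exactly b outgoing relations. The function name and the sample branching factor are assumptions; in a real graph the visited set prunes repeated nodes, so this is an upper bound.

```python
def paths_explored(branching_factor: int, max_hops: int) -> int:
    """Worst-case number of paths emitted: b + b^2 + ... + b^h."""
    return sum(branching_factor ** depth for depth in range(1, max_hops + 1))


# With a hypothetical branching factor of 4:
at_three_hops = paths_explored(4, 3)   # 4 + 16 + 64 = 84
at_five_hops = paths_explored(4, 5)    # 84 + 256 + 1024 = 1364
```

Raising the depth limit from 3 to 5 multiplies the worst-case work by roughly sixteen in this example, which illustrates why a small cap on traversal depth keeps latency bounded.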

6. Experimental Evaluation

6.1 Experimental Setup

We evaluate CognitiveDB on a hotel management assistant scenario, where the system must remember guest preferences, policies, and relationships across multiple conversations.

Dataset: 50 guest profiles with preferences, allergies, and booking history

Queries: 200 test queries ranging from simple lookups to multi-hop reasoning

Baseline: Standard RAG with vector-only retrieval

6.2 Query Categories

Category           Example                                Hops Required
Simple Lookup      "What is the check-in time?"           1
Entity-Specific    "What food does Rama hate?"            1-2
Multi-hop          "Which state is our hotel in?"         2-3
Preference Recall  "Does Mr. Sharma have any allergies?"  1

6.3 Results

Metric                    Vector RAG  CognitiveDB  Improvement (percentage points)
Simple Lookup Accuracy    78%         92%          +14
Entity-Specific Accuracy  45%         81%          +36
Multi-hop Accuracy        23%         67%          +44
Preference Recall         62%         89%          +27
Overall Accuracy          52%         82%          +30

6.4 Analysis

Simple Lookups: Both systems perform well, but CognitiveDB's graph-first approach provides more direct answers.

Entity-Specific Queries: Most of the gain in this category comes from proper entity preservation in fact extraction. Vector RAG often retrieved generic facts ("The guest hates peanuts") instead of entity-specific ones ("Rama hates peanuts").

Multi-hop Queries: CognitiveDB's graph traversal enables answering questions that require following relationship chains (e.g., Hotel → located_in → City → is_in → State).

Preference Recall: Hybrid scoring boosts facts that mention concepts related to the query, improving recall of user preferences.

6.5 Ablation Study

Configuration          Accuracy
Vector Only            52%
+ Graph Traversal      68%
+ Hybrid Scoring       75%
+ Assertion Filtering  79%
+ Source Tracking      82%

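Each component contributes to the overall improvement, with graph traversal providing the largest single gain (+16 percentage points, from 52% to 68%), followed by hybrid scoring (+7), assertion filtering (+4), and source tracking (+3).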


7. Discussion

7.1 Limitations

Extraction Quality: The system's effectiveness depends heavily on LLM-based extraction quality. Poor entity recognition or relationship extraction degrades downstream performance.

Scalability: Graph traversal complexity grows exponentially with depth. We limit traversal to 3 hops, which may miss some long-range relationships.

Cold Start: New collections lack the semantic graph structure needed for graph-first retrieval, falling back to vector-only search.

Domain Specificity: Relation types are currently predefined. Domain-specific applications may require custom relation vocabularies.

7.2 Comparison with Related Systems

Feature  CognitiveDB  Vector RAG  GraphRAG  Mem0
Episodic Memory
Semantic Graph
Multi-hop Traversal
Hybrid Scoring: Partial
Assertion Filtering
Source Tracking
Salience Decay

7.3 Design Principles

Our experience developing CognitiveDB suggests several design principles for LLM memory systems:

Graph-First, Vector-Second: Structured knowledge should take precedence over semantic similarity for factual queries.
Preserve Entity Identity: Fact extraction must maintain entity names rather than generalizing to pronouns or generic terms.
Filter Negative Assertions: Statements about missing information should not be stored as facts.
Track Provenance: Distinguishing user-stated facts from LLM-extracted facts enables confidence weighting.
Hybrid Retrieval: No single retrieval method is optimal; combining vector, graph, and keyword approaches yields best results.

8. Future Work

8.1 Planned Enhancements

Semantic Deduplication: Consolidate semantically similar facts to reduce redundancy and improve retrieval precision.

Confidence Calibration: Learn optimal weights for hybrid scoring based on query type and domain.

Incremental Graph Learning: Update the semantic graph incrementally as new information is ingested, without full reprocessing.

Multi-modal Support: Extend the architecture to support image and audio memories alongside text.

8.2 Research Directions

Temporal Reasoning: Enable queries about temporal relationships ("What did Rama order last week?").

Causal Inference: Extend the graph to capture causal relationships and enable counterfactual reasoning.

Federated Memory: Support distributed memory across multiple agents while maintaining consistency.

Privacy-Preserving Retrieval: Implement differential privacy for sensitive memory retrieval.


9. Conclusion

We presented CognitiveDB, a hybrid memory system that combines episodic memory, semantic knowledge graphs, and vector embeddings to provide LLM applications with human-like memory capabilities. Our key innovations—assertion-aware fact extraction, graph-first retrieval with multi-hop traversal, and hybrid cognitive scoring—address fundamental limitations of traditional RAG systems.

Experimental results demonstrate significant improvements in factual accuracy, particularly for entity-specific and multi-hop queries. The system achieves 82% overall accuracy compared to 52% for vector-only RAG, representing a 30 percentage point improvement.

CognitiveDB is open-source and available at [repository URL], with SDKs for TypeScript and integration examples for popular LLM frameworks including LangChain and Vercel AI.


References

[1] Brown, T., Mann, B., Ryder, N., et al. (2020). "Language Models are Few-Shot Learners." Advances in Neural Information Processing Systems, 33, 1877-1901. https://arxiv.org/abs/2005.14165
[2] Ji, Z., Lee, N., Frieske, R., et al. (2023). "Survey of Hallucination in Natural Language Generation." ACM Computing Surveys, 55(12), 1-38. https://doi.org/10.1145/3571730
[3] Lewis, P., Perez, E., Piktus, A., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." Advances in Neural Information Processing Systems, 33, 9459-9474. https://arxiv.org/abs/2005.11401
[4] Hogan, A., Blomqvist, E., Cochez, M., et al. (2021). "Knowledge Graphs." ACM Computing Surveys, 54(4), 1-37. https://doi.org/10.1145/3447772
[5] Karpukhin, V., Oguz, B., Min, S., et al. (2020). "Dense Passage Retrieval for Open-Domain Question Answering." Proceedings of EMNLP, 6769-6781. https://arxiv.org/abs/2004.04906
[6] Chen, D., Fisch, A., Weston, J., & Bordes, A. (2017). "Reading Wikipedia to Answer Open-Domain Questions." Proceedings of ACL, 1870-1879. https://arxiv.org/abs/1704.00051
[7] Asai, A., Wu, Z., Wang, Y., et al. (2023). "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection." arXiv preprint arXiv:2310.11511. https://arxiv.org/abs/2310.11511
[8] Guo, K., Shomer, H., Zeng, S., et al. (2025). "Empowering GraphRAG with Knowledge Filtering and Integration." arXiv preprint arXiv:2503.13804. https://arxiv.org/abs/2503.13804
[9] Pan, J.Z., Vetere, G., Gomez-Perez, J.M., & Wu, H. (2017). Exploiting Linked Data and Knowledge Graphs in Large Organisations. Springer. https://doi.org/10.1007/978-3-319-45654-6
[10] Edge, D., Trinh, H., Cheng, N., et al. (2024). "From Local to Global: A Graph RAG Approach to Query-Focused Summarization." arXiv preprint arXiv:2404.16130. https://arxiv.org/abs/2404.16130
[11] Mavromatis, C., & Karypis, G. (2024). "GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning." arXiv preprint arXiv:2405.20139. https://arxiv.org/abs/2405.20139
[12] Ahmad, S., Nezami, Z., Hafeez, M., & Zaidi, S.A.R. (2025). "Benchmarking Vector, Graph and Hybrid Retrieval Augmented Generation (RAG) Pipelines for Open Radio Access Networks." arXiv preprint arXiv:2507.03608. https://arxiv.org/abs/2507.03608
[13] Li, Z., Li, Y., Zhu, Y., et al. (2025). "All-in-one Graph-based Indexing for Hybrid Search on GPUs." Proceedings of the VLDB Endowment, 19(1). https://www.vldb.org/pvldb/vol19/
[14] Bordawekar, R., Bandyopadhyay, B., & Shmueli, O. (2017). "Cognitive Database: A Step towards Endowing Relational Databases with Artificial Intelligence Capabilities." arXiv preprint arXiv:1712.07199. https://arxiv.org/abs/1712.07199
[15] Kocaman, V., Gul, Y., Kaya, M.A., et al. (2025). "Beyond Negation Detection: Comprehensive Assertion Detection Models for Clinical NLP." Proceedings of Text2Story'25 Workshop, arXiv:2503.17425. https://arxiv.org/abs/2503.17425
[16] Tulving, E. (1972). "Episodic and Semantic Memory." In E. Tulving & W. Donaldson (Eds.), Organization of Memory (pp. 381-403). Academic Press. https://doi.org/10.1016/B978-0-12-701750-4.50025-8
[17] Malkov, Y.A., & Yashunin, D.A. (2020). "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs." IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4), 824-836. https://doi.org/10.1109/TPAMI.2018.2889473
[18] Sarmah, B., Mehta, D., Hall, B., et al. (2024). "HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction." Proceedings of the 5th ACM International Conference on AI in Finance, 608-616. https://doi.org/10.1145/3677052.3698615
[19] Wang, Y., Lipka, N., Rossi, R.A., et al. (2024). "Knowledge Graph Prompting for Multi-Document Question Answering." Proceedings of AAAI, 19206-19214. https://doi.org/10.1609/aaai.v38i17.29870
[20] Luo, L., Ju, J., Xiong, B., et al. (2024). "Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning." Proceedings of ICLR. https://openreview.net/forum?id=hUybyDDLlx

Manuscript submitted December 2025
