CognitiveDB: A Hybrid Memory System for Large Language Model Applications
Authors: Biki Das
Affiliation: Independent Research
Date: December 2025
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation, yet they suffer from fundamental limitations in persistent memory and contextual reasoning. We present CognitiveDB, a novel hybrid memory database that combines episodic memory storage, semantic knowledge graphs, and vector embeddings to provide LLM applications with human-like memory capabilities. Our system implements a three-tier memory architecture inspired by cognitive science: episodic memory for factual recall, semantic memory for conceptual relationships, and a unified retrieval mechanism that leverages graph traversal, vector similarity, and keyword matching. We introduce several key innovations including assertion-aware fact extraction, graph-first retrieval with multi-hop traversal, and hybrid cognitive scoring. Experimental results demonstrate that CognitiveDB significantly improves factual accuracy in LLM applications compared to traditional Retrieval-Augmented Generation (RAG) approaches, particularly for complex multi-hop reasoning queries.
Keywords: Memory Systems, Knowledge Graphs, Vector Databases, Large Language Models, Retrieval-Augmented Generation, Cognitive Architecture
1. Introduction
The emergence of Large Language Models has revolutionized natural language processing, enabling applications ranging from conversational agents to code generation systems [1]. However, LLMs face significant challenges when deployed in real-world applications that require persistent memory across sessions, accurate recall of user-specific information, and complex reasoning over accumulated knowledge [2].
Traditional approaches to augmenting LLM memory fall into two categories: (1) vector-based Retrieval-Augmented Generation (RAG) systems that store and retrieve text chunks based on embedding similarity [3], and (2) knowledge graph systems that maintain structured relationships between entities [4]. Each approach has distinct limitations: vector-based systems excel at semantic similarity but struggle with precise factual recall and multi-hop reasoning, while knowledge graphs provide structured reasoning but lack the flexibility of natural language understanding.
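We present CognitiveDB, a hybrid memory system that addresses these limitations by integrating three complementary memory mechanisms: an episodic memory store for facts, conversations, insights, and summaries; a semantic knowledge graph of concepts and their relationships; and a vector store for embedding-based similarity search.
Our key contributions include assertion-aware fact extraction, graph-first retrieval with multi-hop traversal, and hybrid cognitive scoring.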
The remainder of this paper is organized as follows: Section 2 reviews related work, Section 3 describes the system architecture, Section 4 details our algorithms, Section 5 describes the implementation, Section 6 presents experimental results, Section 7 discusses limitations and related systems, Section 8 outlines future work, and Section 9 concludes.
2. Related Work
2.1 Retrieval-Augmented Generation
RAG systems have emerged as the dominant paradigm for augmenting LLMs with external knowledge [3]. Lewis et al. introduced the foundational RAG architecture, which retrieves relevant documents based on query embeddings and incorporates them into the LLM context. Subsequent work has focused on improving retrieval quality through dense passage retrieval [5], hybrid sparse-dense methods [6], and iterative refinement [7].
However, RAG systems face fundamental limitations. As demonstrated by Guo et al. [8], retrieved information can be noisy or irrelevant, and over-reliance on external knowledge can suppress the model's intrinsic reasoning capabilities. Their GraphRAG-FI framework addresses this through two-stage filtering and integration with the LLM's internal knowledge.
2.2 Knowledge Graphs for LLMs
Knowledge graphs provide structured representations of entities and relationships, enabling logical reasoning and multi-hop queries [9]. Recent work has explored integrating knowledge graphs with LLMs through various approaches:
Li et al. [13] proposed an all-in-one graph-based index that unifies dense vectors, sparse vectors, full-text search, and knowledge graph retrieval within a single structure, demonstrating that hybrid approaches outperform single-path retrieval methods.
2.3 Cognitive Database Systems
Bordawekar et al. [14] introduced the concept of Cognitive Databases, proposing to endow relational databases with AI capabilities through word embeddings. Their approach treats structured data as meaningful unstructured text and uses vector space models to capture latent semantic relationships. This work inspired our approach of combining structured and unstructured representations.
2.4 Assertion Detection in NLP
Accurate extraction of factual information requires distinguishing between positive assertions, negations, and hypothetical statements. Kocaman et al. [15] demonstrated that assertion status detection is critical for accurately attributing extracted facts, identifying six assertion classes: present, absent, possible, hypothetical, conditional, and associated with someone else. We incorporate assertion-aware filtering into our fact extraction pipeline.
2.5 Memory Systems in Cognitive Science
Our architecture draws inspiration from cognitive science models of human memory. The distinction between episodic and semantic memory, first proposed by Tulving [16], forms the foundation of our dual-memory approach. Episodic memory stores specific experiences and events, while semantic memory maintains general knowledge and concepts. We extend this model with vector-based similarity search to enable flexible retrieval.
3. System Architecture
CognitiveDB implements a three-tier memory architecture designed to support LLM applications with persistent, queryable memory. Figure 1 illustrates the overall system design.
┌─────────────────────────────────────────────────────────────────┐
│                           CognitiveDB                           │
├─────────────────────────────────────────────────────────────────┤
│     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐     │
│     │  Episodic   │     │  Semantic   │     │   Vector    │     │
│     │   Memory    │     │    Graph    │     │    Store    │     │
│     │             │     │             │     │             │     │
│     │ • Facts     │     │ • Concepts  │     │ • Embeddings│     │
│     │ • Insights  │     │ • Relations │     │ • HNSW Index│     │
│     │ • Summaries │     │ • Traversal │     │ • Similarity│     │
│     └──────┬──────┘     └──────┬──────┘     └──────┬──────┘     │
│            │                   │                   │            │
│            └───────────────────┼───────────────────┘            │
│                                ▼                                │
│                     ┌─────────────────────┐                     │
│                     │  Hybrid Retrieval   │                     │
│                     │                     │                     │
│                     │ • Graph-First       │                     │
│                     │ • Vector Search     │                     │
│                     │ • Keyword Match     │                     │
│                     │ • Cognitive Scoring │                     │
│                     └──────────┬──────────┘                     │
│                                │                                │
│                                ▼                                │
│                     ┌─────────────────────┐                     │
│                     │     LLM Context     │                     │
│                     │    Construction     │                     │
│                     └─────────────────────┘                     │
└─────────────────────────────────────────────────────────────────┘
Figure 1: CognitiveDB System Architecture
3.1 Episodic Memory Store
The episodic memory store maintains discrete memory units representing facts, conversations, insights, and summaries. Each memory is defined as:
Memory = {
    id: UUID,
    content: String,
    type: MemoryType ∈ {Fact, Conversation, Insight, Summary},
    collection: String,
    timestamp: DateTime,
    salience: Float ∈ [0, 1],
    embedding_id: Option<UUID>,
    metadata: Map<String, String>
}
Key features of the episodic store include:
Salience Scoring: Each memory has an associated salience score representing its importance. Salience is computed based on:
Temporal Decay: Memory salience decays exponentially over time according to:
S(t) = S0 · e^(−λt)
where S0 is the initial salience, λ is the decay rate, and t is the time elapsed.
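As an illustration of the decay rule, the following Python sketch assumes the exponential form S(t) = S0 · e^(−λt) suggested by the symbol definitions above. The function names, the clamping to [0, 1], and the example decay rate are assumptions made for illustration and are not taken from the CognitiveDB implementation.

```python
import math


def decayed_salience(initial: float, decay_rate: float, elapsed: float) -> float:
    """Return S(t) = S0 * exp(-lambda * t), clamped to the [0, 1] salience range."""
    if elapsed < 0:
        raise ValueError("elapsed time must be non-negative")
    value = initial * math.exp(-decay_rate * elapsed)
    return max(0.0, min(1.0, value))


def half_life(decay_rate: float) -> float:
    """Time after which salience has dropped to half of its initial value."""
    return math.log(2) / decay_rate


# Hypothetical values: initial salience 0.8, decay rate 0.1 per day.
s0 = 0.8
rate = 0.1
for days in (0, 7, 30):
    print(days, round(decayed_salience(s0, rate, days), 3))
```

Under this assumed form, a decay rate of 0.1 per day halves a memory's salience roughly every 6.9 days (ln 2 / λ), which gives a concrete way to choose λ from a desired retention horizon.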
Memory Consolidation: Similar memories are periodically consolidated using LLM-based summarization, reducing redundancy while preserving key information.
3.2 Semantic Knowledge Graph
The semantic graph stores concepts and their relationships, enabling structured reasoning and multi-hop queries.
Concept = {
    id: UUID,
    name: String,
    type: Option<ConceptType>,  // person, place, thing, idea
    collection: String,
    embedding_id: Option<UUID>,
    attributes: Map<String, String>
}
Relation = {
    id: UUID,
    from: UUID,
    to: UUID,
    relation_type: String,
    weight: Float ∈ [0, 1],
    collection: String
}
Supported relation types include: has, is, located_in, part_of, related_to, prefers, likes, hates, allergic_to, works_at, manages, and others.
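To make the graph data model concrete, here is a minimal Python sketch of the Concept and Relation records together with an adjacency list of the kind a traversal would consume. The field names follow the definitions above, except that `from` and `to` become `from_id` and `to_id` because `from` is a Python keyword. The `build_adjacency` helper and the example weight are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Optional
from uuid import UUID, uuid4


@dataclass
class Concept:
    name: str
    collection: str
    type: Optional[str] = None  # person, place, thing, idea
    embedding_id: Optional[UUID] = None
    attributes: Dict[str, str] = field(default_factory=dict)
    id: UUID = field(default_factory=uuid4)


@dataclass
class Relation:
    from_id: UUID
    to_id: UUID
    relation_type: str
    collection: str
    weight: float = 1.0
    id: UUID = field(default_factory=uuid4)

    def __post_init__(self) -> None:
        # Enforce the weight range stated in the Relation definition.
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError("weight must be in [0, 1]")


def build_adjacency(relations):
    """Map each source concept id to its outgoing (relation, target id) pairs."""
    adjacency = defaultdict(list)
    for rel in relations:
        adjacency[rel.from_id].append((rel, rel.to_id))
    return adjacency


rama = Concept("Rama", "hotel", type="person")
peanuts = Concept("peanuts", "hotel", type="thing")
rel = Relation(rama.id, peanuts.id, "hates", "hotel", weight=0.9)
adjacency = build_adjacency([rel])
```

Storing edges by source id keeps neighbor lookup constant-time per concept, which is what a hop-limited traversal needs.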
3.3 Vector Store
The vector store maintains dense embeddings for semantic similarity search. We use the HNSW (Hierarchical Navigable Small World) algorithm [17] for efficient approximate nearest neighbor search.
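For intuition about what the index approximates, the sketch below performs exact top-k search by cosine similarity over a handful of vectors. It is a brute-force baseline, not HNSW. The three-dimensional vectors and their keys are made up for illustration, and real embeddings would have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=3):
    """Exact nearest neighbors: score every stored vector, keep the k best."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(key, score) for score, key in scored[:k]]


# Hypothetical toy embeddings keyed by a memory label.
vectors = {
    "allergy": [0.9, 0.1, 0.0],
    "checkin": [0.0, 0.2, 0.9],
    "peanuts": [0.8, 0.3, 0.1],
}
results = top_k([1.0, 0.0, 0.0], vectors, k=2)
```

The exact scan costs O(n·d) per query. An approximate index such as HNSW trades a small amount of recall for sub-linear search time, and a scan like this one can serve as a reference when measuring that recall.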
Each embedding is associated with either a memory or concept, enabling:
3.4 Storage Engine
CognitiveDB implements a Log-Structured Merge-tree (LSM) storage engine with a write-ahead log (WAL).
The storage engine supports both in-memory and persistent modes, with automatic persistence of auxiliary data (semantic graph, vector indices) alongside the primary data.
4. Algorithms
4.1 Cognitive Ingestion Pipeline
When content is ingested into CognitiveDB, it undergoes a multi-stage processing pipeline:
Algorithm 1: Cognitive Ingestion
─────────────────────────────────────────────────────────────────
Input: content (String), collection (String), metadata (Map)
Output: IngestResult

embedding ← GenerateEmbedding(content)
memory ← CreateMemory(content, Conversation, collection)
memory.embedding_id ← StoreVector(embedding, collection)
Store(memory)

// Background extraction (non-blocking)
SPAWN:
    IF metadata.source_type = "user_input" THEN
        facts ← ExtractFacts(content)
        facts ← FilterNegativeAssertions(facts)
        FOR each fact IN facts DO
            fact_memory ← CreateMemory(fact, Fact, collection)
            fact_embedding ← GenerateEmbedding(fact)
            fact_memory.embedding_id ← StoreVector(fact_embedding, collection)
            Store(fact_memory)
        END FOR

        concepts ← ExtractConcepts(content)
        FOR each concept IN concepts DO
            AddConcept(concept.name, collection)
        END FOR

        relations ← ExtractRelationships(content)
        FOR each rel IN relations DO
            from_id ← FindOrCreateConcept(rel.from, collection)
            to_id ← FindOrCreateConcept(rel.to, collection)
            AddRelation(from_id, to_id, rel.type, rel.weight)
        END FOR
    END IF

RETURN IngestResult(memory.id, facts, concepts)
─────────────────────────────────────────────────────────────────
4.2 Assertion-Aware Fact Extraction
A critical innovation in CognitiveDB is the filtering of negative and hypothetical assertions during fact extraction. When an LLM responds with statements like "I don't have information about X", naive systems store this as a fact, which then pollutes future retrieval.
We implement a two-layer defense:
Layer 1: Prompt Engineering
The fact extraction prompt explicitly instructs the LLM to:
Layer 2: Pattern-Based Filtering
Extracted facts are filtered against a comprehensive set of negative assertion patterns:
NEGATIVE_PATTERNS = {
    "does not have information",
    "doesn't know",
    "no information about",
    "not specified",
    "not mentioned",
    "unable to find",
    "speaker does not",
    "unknown",
    ...
}
Function FilterNegativeAssertions(facts):
    RETURN facts.filter(f →
        NOT any(pattern IN NEGATIVE_PATTERNS
                WHERE f.content.toLowerCase().contains(pattern)))
This approach is inspired by assertion detection research in clinical NLP [15], where distinguishing between present, absent, and hypothetical assertions is critical for accurate information extraction.
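The pattern filter can be sketched in a few lines of Python. This version uses only the patterns listed above, since the full set is elided in the text, and treats each fact as a plain string rather than a record with a content field. The function names and example facts are illustrative assumptions.

```python
# Only the patterns shown in the paper; the full set is elided there.
NEGATIVE_PATTERNS = (
    "does not have information",
    "doesn't know",
    "no information about",
    "not specified",
    "not mentioned",
    "unable to find",
    "speaker does not",
    "unknown",
)


def is_negative_assertion(text: str) -> bool:
    """True if the lowercased text contains any negative-assertion pattern."""
    lowered = text.lower()
    return any(pattern in lowered for pattern in NEGATIVE_PATTERNS)


def filter_negative_assertions(facts):
    """Keep only facts that do not match a negative-assertion pattern."""
    return [fact for fact in facts if not is_negative_assertion(fact)]


facts = [
    "Rama hates peanuts",
    "The speaker does not have information about the pool",
    "Check-in time is not specified",
]
kept = filter_negative_assertions(facts)
```

Substring matching is cheap but coarse: a broad pattern such as "unknown" would also drop a legitimate fact that happens to contain that word, which is one reason to keep the prompt-level layer as the first line of defense.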
4.3 Graph-First Retrieval with Multi-Hop Traversal
Traditional RAG systems rely primarily on vector similarity, which can miss relevant information that is semantically distant but logically connected. CognitiveDB implements a graph-first retrieval strategy that traverses the semantic graph before falling back to vector search.
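As a small illustration of hop-limited traversal, the Python sketch below walks the Hotel → located_in → City → is_in → State chain used in the evaluation and multiplies relation weights into a path confidence, in the spirit of Algorithm 2. It is a simplified variant: concepts are identified by name, nodes are marked visited when enqueued, and the edge weights are hypothetical.

```python
from collections import deque


def graph_paths(adjacency, start, max_hops=3):
    """Breadth-first traversal returning (path text, confidence, hops) tuples."""
    results = []
    visited = {start}
    queue = deque([(start, start, 1.0, 0)])  # node, path text, confidence, hops
    while queue:
        node, path, conf, hops = queue.popleft()  # FIFO order gives breadth-first
        if hops >= max_hops:
            continue
        for rel_type, weight, neighbor in adjacency.get(node, []):
            if neighbor in visited:
                continue
            visited.add(neighbor)
            new_path = f"{path} -[{rel_type}]-> {neighbor}"
            new_conf = conf * weight  # confidence shrinks with every hop
            results.append((new_path, new_conf, hops + 1))
            queue.append((neighbor, new_path, new_conf, hops + 1))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical weights on the example chain.
adjacency = {
    "Hotel": [("located_in", 0.9, "City")],
    "City": [("is_in", 0.8, "State")],
}
paths = graph_paths(adjacency, "Hotel", max_hops=2)
for path, conf, hops in paths:
    print(hops, round(conf, 2), path)
```

Because confidence is a product of weights in [0, 1], longer paths can never score higher than their prefixes, so ranking by confidence naturally favors direct relationships over distant ones.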
Algorithm 2: Graph-First Retrieval
─────────────────────────────────────────────────────────────────
Input: query (String), collection (String), max_hops (Int)
Output: List<GraphKnowledge>

query_words ← Tokenize(query).filter(w → w.length ≥ 3)
concepts ← GetConcepts(collection)
relations ← GetRelations(collection)
adjacency ← BuildAdjacencyList(relations)

// Find starting concepts matching query
starting_concepts ← []
FOR each concept IN concepts DO
    score ← ComputeMatchScore(concept.name, query, query_words)
    IF score > 0.3 THEN
        starting_concepts.append((concept, score))
    END IF
END FOR

starting_concepts.sortByScoreDescending()
knowledge_paths ← []

// BFS traversal from each starting concept
FOR each (start, score) IN starting_concepts.take(5) DO
    visited ← {}
    queue ← [(start.id, [start.id], [start.name], score, 0)]

    WHILE queue NOT empty DO
        // Dequeue from the front (FIFO) for breadth-first order
        (current, path, names, conf, hops) ← queue.popFront()
        IF hops ≥ max_hops THEN CONTINUE
        visited.add(current)

        FOR each (relation, neighbor) IN adjacency[current] DO
            IF neighbor IN visited THEN CONTINUE

            new_names ← names + ["-[" + relation.type + "]->",
                                 GetConceptName(neighbor)]
            new_conf ← conf × relation.weight

            // Every extended path spans at least one hop, so always record it
            statement ← GenerateStatement(new_names)
            knowledge_paths.append(GraphKnowledge{
                path: new_names.join(" "),
                statement: statement,
                confidence: new_conf,
                hops: hops + 1,
                source: start.name
            })

            IF hops + 1 < max_hops THEN
                queue.push((neighbor, path + [neighbor],
                            new_names, new_conf, hops + 1))
            END IF
        END FOR
    END WHILE
END FOR

RETURN knowledge_paths.sortByConfidenceDescending().deduplicate().take(10)
─────────────────────────────────────────────────────────────────
4.4 Hybrid Cognitive Scoring
CognitiveDB combines multiple signals to compute a unified cognitive score for each retrieved memory. This approach is inspired by research showing that hybrid retrieval methods outperform single-path approaches [12, 13].
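A reduced Python sketch of three of the additive boosts in Algorithm 3 (keyword match, graph knowledge path overlap, and source confidence) shows how the signals stack on top of a base score. The boost weights are those given in Algorithm 3. The tokenizer, function names, and example values are illustrative assumptions, and the graph connectivity boost is omitted for brevity.

```python
import re


def tokenize(text):
    """Lowercase word tokens of length >= 3 (a stand-in for the paper's Tokenize)."""
    return [w for w in re.findall(r"[a-z0-9_]+", text.lower()) if len(w) >= 3]


def boosted_score(base, content, query, graph_statements, source_type=None):
    """Add keyword, graph-path, and source boosts to a base cognitive score."""
    content_lower = content.lower()
    score = base

    # Keyword match boost: 0.1 per matching query word, capped at three matches.
    matches = sum(1 for w in tokenize(query) if w in content_lower)
    score += 0.1 * min(matches, 3)

    # Graph knowledge path boost: requires at least two overlapping words.
    for statement, confidence in graph_statements:
        overlap = sum(1 for w in tokenize(statement) if w in content_lower)
        if overlap >= 2:
            score += 0.15 * confidence

    # Source confidence boost for content that came directly from the user.
    if source_type == "user_input":
        score += 0.1
    return score


query = "What food does Rama hate?"
graph = [("Rama hates peanuts", 0.9)]  # (statement, confidence), hypothetical
relevant = boosted_score(0.5, "Rama hates peanuts", query, graph, "user_input")
unrelated = boosted_score(0.5, "Check-in time is 2 PM", query, graph)
```

Because the boosts are additive and uncapped, the final value can exceed 1, so it is best read as a ranking score rather than a probability.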
Algorithm 3: Hybrid Cognitive Scoring
─────────────────────────────────────────────────────────────────
Input: query, memories, concepts, graph_knowledge
Output: List<ScoredMemory>

// Lowercase the query so its words can be matched against lowercased content
query_words ← Tokenize(query.toLowerCase()).filter(w → w.length ≥ 3)

// Build relevant terms from concepts and their relations
relevant_terms ← {}
FOR each concept IN concepts DO
    relevant_terms.add(concept.name.toLowerCase())
    FOR each rel IN concept.relations DO
        relevant_terms.add(rel.target_name.toLowerCase())
    END FOR
END FOR

FOR each memory IN memories DO
    content_lower ← memory.content.toLowerCase()

    // Base cognitive score (from vector similarity, recency, salience)
    score ← memory.cognitive_score

    // Keyword match boost
    keyword_matches ← count(w IN query_words WHERE content_lower.contains(w))
    score += 0.1 × min(keyword_matches, 3)

    // Graph connectivity boost
    FOR each concept IN concepts DO
        IF content_lower.contains(concept.name.toLowerCase()) THEN
            score += 0.15 × concept.relevance
        END IF

        FOR each rel IN concept.relations DO
            IF content_lower.contains(rel.target_name.toLowerCase()) THEN
                // Higher boost if relation matches query semantically
                IF any(w IN query_words WHERE
                       rel.type.toLowerCase().contains(w) OR
                       rel.target_name.toLowerCase().contains(w)) THEN
                    score += 0.2 × concept.relevance × rel.weight
                ELSE
                    score += 0.1 × concept.relevance × rel.weight
                END IF
            END IF
        END FOR
    END FOR

    // Graph knowledge path boost
    FOR each gk IN graph_knowledge DO
        gk_words ← Tokenize(gk.statement.toLowerCase())
        matches ← count(w IN gk_words WHERE w.length ≥ 3 AND content_lower.contains(w))
        IF matches ≥ 2 THEN
            score += 0.15 × gk.confidence
        END IF
    END FOR

    // Source confidence boost
    IF memory.metadata.source_type = "user_input" THEN
        score += 0.1
    END IF

    memory.cognitive_score ← score
END FOR

RETURN memories.sortByCognitiveScoreDescending()
─────────────────────────────────────────────────────────────────
The cognitive score combines the base score (vector similarity, recency, and salience) with four additive boosts: keyword matches (up to +0.3), graph connectivity, graph knowledge path overlap, and source confidence (+0.1 for user input).
4.5 Concept-First Context Construction
When building context for LLM prompts, CognitiveDB prioritizes structured knowledge over raw facts. This approach ensures that graph-derived knowledge (which represents verified relationships) takes precedence over potentially noisy vector-retrieved content.
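The ordering can be illustrated with a compact Python sketch that assembles the three sections in priority order. The section headings, the 0.3 confidence threshold, the separator, and the fallback message follow Algorithm 4. The dictionary-based inputs, the function name, and the example values are simplifying assumptions, and concept relations are left out.

```python
def build_context(graph_knowledge, concepts, facts, min_confidence=0.3):
    """Assemble prompt context: graph knowledge first, then concepts, then facts."""
    parts = []

    # Priority 1: graph-derived statements above the confidence threshold.
    graph_lines = [
        f"• {k['statement']} (via: {k['source']})"
        for k in graph_knowledge
        if k["confidence"] > min_confidence
    ]
    if graph_lines:
        parts.append("Known facts from knowledge graph:\n" + "\n".join(graph_lines))

    # Priority 2: related concept names.
    if concepts:
        parts.append("Related concepts:\n" + "\n".join(f"• {c}" for c in concepts))

    # Priority 3: supporting facts retrieved from the memory store.
    if facts:
        parts.append("Supporting facts:\n" + "\n".join(f"• {f}" for f in facts))

    if not parts:
        return "No previous conversation history."
    return "\n\n---\n\n".join(parts)


context = build_context(
    [
        {"statement": "Rama hates peanuts", "source": "Rama", "confidence": 0.9},
        {"statement": "Rama likes jazz", "source": "Rama", "confidence": 0.2},
    ],
    ["Rama", "peanuts"],
    ["Rama asked for a nut-free breakfast"],
)
print(context)
```

Placing the highest-confidence material first matters in practice because prompts are often truncated from the end when the context budget is exceeded.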
Algorithm 4: Context Construction
─────────────────────────────────────────────────────────────────
Input: graph_knowledge, concepts, facts
Output: context (String)

context_parts ← []

// Priority 1: Graph Knowledge (highest confidence)
IF graph_knowledge NOT empty THEN
    graph_context ← graph_knowledge
        .filter(k → k.confidence > 0.3)
        .map(k → "• " + k.statement + " (via: " + k.source + ")")
        .join("\n")
    IF graph_context NOT empty THEN
        context_parts.append("Known facts from knowledge graph:\n" + graph_context)
    END IF
END IF

// Priority 2: Concepts with Relations
IF concepts NOT empty THEN
    concept_context ← concepts.map(c →
        "• " + c.name +
        (c.type ? " (" + c.type + ")" : "") +
        (c.relations NOT empty ?
            " → " + c.relations.map(r → r.type + " " + r.target_name).join(", ")
            : "")
    ).join("\n")
    context_parts.append("Related concepts:\n" + concept_context)
END IF

// Priority 3: Supporting Facts
IF facts NOT empty THEN
    facts_context ← facts.map(f → "• " + f.content).join("\n")
    context_parts.append("Supporting facts:\n" + facts_context)
END IF

IF context_parts empty THEN
    RETURN "No previous conversation history."
END IF

RETURN context_parts.join("\n\n---\n\n")
─────────────────────────────────────────────────────────────────
This prioritization ensures that graph-derived statements appear first in the prompt, followed by related concepts and then supporting facts.
5. Implementation
CognitiveDB is implemented in Rust for performance and memory safety, with the following components:
5.1 Technology Stack
| Component | Technology |
|---|---|
| Core Engine | Rust |
| Storage | Custom LSM-tree with WAL |
| Vector Index | HNSW (custom implementation) |
| Embeddings | Google Gemini / OpenAI |
| LLM Integration | Provider-agnostic interface |
| API | HTTP (REST) + gRPC |
| SDK | TypeScript |
5.2 API Design
CognitiveDB exposes a RESTful API with the following endpoints:
POST /ingest - Ingest content with cognitive processing
POST /recall - Retrieve memories using hybrid search
POST /store - Store raw memory without processing
GET /memory/:id - Retrieve specific memory
DELETE /memory/:id - Delete memory
POST /decay - Apply salience decay
POST /consolidate - Consolidate similar memories
POST /reflect - Generate insights from recent memories
GET /stats - Collection statistics
GET /graph - Knowledge graph visualization
POST /purge - Clear collection
5.3 Performance Characteristics
| Operation | Complexity | Typical Latency |
|---|---|---|
| Ingest | O(d) + O(log n) | 50-200ms |
| Recall | O(k log n) + O(m) | 20-100ms |
| Graph Traversal | O(b^h) | 5-50ms |
| Vector Search | O(log n) | 10-30ms |
Where:
6. Experimental Evaluation
6.1 Experimental Setup
We evaluate CognitiveDB on a hotel management assistant scenario, where the system must remember guest preferences, policies, and relationships across multiple conversations.
Dataset: 50 guest profiles with preferences, allergies, and booking history
Queries: 200 test queries ranging from simple lookups to multi-hop reasoning
Baseline: Standard RAG with vector-only retrieval
6.2 Query Categories
| Category | Example | Hops Required |
|---|---|---|
| Simple Lookup | "What is the check-in time?" | 1 |
| Entity-Specific | "What food does Rama hate?" | 1-2 |
| Multi-hop | "Which state is our hotel in?" | 2-3 |
| Preference Recall | "Does Mr. Sharma have any allergies?" | 1 |
6.3 Results
| Metric | Vector RAG | CognitiveDB | Improvement (percentage points) |
|---|---|---|---|
| Simple Lookup Accuracy | 78% | 92% | +14 |
| Entity-Specific Accuracy | 45% | 81% | +36 |
| Multi-hop Accuracy | 23% | 67% | +44 |
| Preference Recall | 62% | 89% | +27 |
| Overall Accuracy | 52% | 82% | +30 |
6.4 Analysis
Simple Lookups: Both systems perform well, but CognitiveDB's graph-first approach provides more direct answers.
Entity-Specific Queries: Most of the improvement in this category comes from preserving entity names during fact extraction. Vector RAG often retrieved generic facts ("The guest hates peanuts") instead of entity-specific ones ("Rama hates peanuts").
Multi-hop Queries: CognitiveDB's graph traversal enables answering questions that require following relationship chains (e.g., Hotel → located_in → City → is_in → State).
Preference Recall: Hybrid scoring boosts facts that mention concepts related to the query, improving recall of user preferences.
6.5 Ablation Study
| Configuration | Accuracy |
|---|---|
| Vector Only | 52% |
| + Graph Traversal | 68% |
| + Hybrid Scoring | 75% |
| + Assertion Filtering | 79% |
| + Source Tracking | 82% |
Each component contributes to the overall improvement, with graph traversal providing the largest single gain (16 percentage points, from 52% to 68%).
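The per-component gains can be read off the ablation table with a few lines of Python. The sketch assumes that the "+" rows are cumulative, so each row adds one component to the configuration above it. The accuracy figures are copied from the table.

```python
# Accuracy (%) per configuration, copied from the ablation table.
ablation = [
    ("Vector Only", 52),
    ("+ Graph Traversal", 68),
    ("+ Hybrid Scoring", 75),
    ("+ Assertion Filtering", 79),
    ("+ Source Tracking", 82),
]

# Gain of each step over the previous configuration, in percentage points.
gains = [
    (name, accuracy - previous)
    for (_, previous), (name, accuracy) in zip(ablation, ablation[1:])
]

for name, gain in gains:
    print(f"{name}: +{gain} pp")

total_gain = sum(gain for _, gain in gains)
```

Read this way, the four steps contribute 16, 7, 4, and 3 percentage points, which sum to the 30-point overall gap between vector-only retrieval and the full system.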
7. Discussion
7.1 Limitations
Extraction Quality: The system's effectiveness depends heavily on LLM-based extraction quality. Poor entity recognition or relationship extraction degrades downstream performance.
Scalability: Graph traversal complexity grows exponentially with depth. We limit traversal to 3 hops, which may miss some long-range relationships.
Cold Start: New collections lack the semantic graph structure needed for graph-first retrieval, falling back to vector-only search.
Domain Specificity: Relation types are currently predefined. Domain-specific applications may require custom relation vocabularies.
7.2 Comparison with Related Systems
| Feature | CognitiveDB | Vector RAG | GraphRAG | Mem0 |
|---|---|---|---|---|
| Episodic Memory | ✓ | ✓ | ✗ | ✓ |
| Semantic Graph | ✓ | ✗ | ✓ | ✗ |
| Multi-hop Traversal | ✓ | ✗ | ✓ | ✗ |
| Hybrid Scoring | ✓ | ✗ | Partial | ✗ |
| Assertion Filtering | ✓ | ✗ | ✗ | ✗ |
| Source Tracking | ✓ | ✗ | ✗ | ✗ |
| Salience Decay | ✓ | ✗ | ✗ | ✓ |
7.3 Design Principles
Our experience developing CognitiveDB suggests several design principles for LLM memory systems:
8. Future Work
8.1 Planned Enhancements
Semantic Deduplication: Consolidate semantically similar facts to reduce redundancy and improve retrieval precision.
Confidence Calibration: Learn optimal weights for hybrid scoring based on query type and domain.
Incremental Graph Learning: Update the semantic graph incrementally as new information is ingested, without full reprocessing.
Multi-modal Support: Extend the architecture to support image and audio memories alongside text.
8.2 Research Directions
Temporal Reasoning: Enable queries about temporal relationships ("What did Rama order last week?").
Causal Inference: Extend the graph to capture causal relationships and enable counterfactual reasoning.
Federated Memory: Support distributed memory across multiple agents while maintaining consistency.
Privacy-Preserving Retrieval: Implement differential privacy for sensitive memory retrieval.
9. Conclusion
We presented CognitiveDB, a hybrid memory system that combines episodic memory, semantic knowledge graphs, and vector embeddings to provide LLM applications with human-like memory capabilities. Our key innovations—assertion-aware fact extraction, graph-first retrieval with multi-hop traversal, and hybrid cognitive scoring—address fundamental limitations of traditional RAG systems.
Experimental results demonstrate significant improvements in factual accuracy, particularly for entity-specific and multi-hop queries. The system achieves 82% overall accuracy compared to 52% for vector-only RAG, representing a 30 percentage point improvement.
CognitiveDB is open-source and available at [repository URL], with SDKs for TypeScript and integration examples for popular LLM frameworks including LangChain and Vercel AI.
References
Manuscript submitted December 2025