NeuroMemory Architecture

Status

Concept Phase. This document outlines the proposed architecture for long-term memory and state persistence. Implementation is currently in the roadmap phase.

The Memory Problem

Traditional LLMs suffer from two critical flaws regarding memory:

Amnesia: They rely on a fixed "context window" (e.g., 4096 tokens). Once a conversation exceeds it, the earliest turns fall out of the window and the model can no longer see them.
Privacy Paradox: If you train a centralized model on user data to give it memory, you risk leaking one user's secrets to every other user. If you don't, the model remains impersonal.

NeuroShard addresses both problems with a Hybrid Memory Architecture that separates general intelligence (Global) from personal identity (Local).

The Hybrid Architecture

NeuroMemory uses a Three-Layer Compositional approach. Instead of a single monolithic set of weights, the model used at inference time is composed from three distinct layers.

Layer 1: The Cortex (Global Base Model)

Source: Genesis Dataset (Verified Public Data).
Nature: Never modified by user interactions. Contains logic, reasoning, grammar, and world knowledge.
Training: Global DiLoCo + PoNW.
Size: Full Model (e.g., 350M - 120B params).
Privacy: Public.

Layer 2: The Commons (Community Knowledge)

Source: Opt-in user contributions, sanitized and verified.
Nature: Shared cultural knowledge, news, real-time events.
Training: Aggregated "Fact Mining".
Size: Medium Adapter (LoRA).
Privacy: Public.

Layer 3: The Soul (Personal Memory)

Source: Private user conversations and preferences.
Nature: Secrets, biographical data, style preferences, long-term history.
Training: Local/Private LoRA updates.
Size: Small Adapter (e.g., 5-50MB).
Privacy: Strictly Private. Encrypted by user wallet.

Technical Implementation

1. Hierarchical Storage

Memory is not a single store; it is a pipeline from short-term to long-term storage.

Tier	Technology	Persistence	Capacity	Latency
Working Memory	Context Window	Seconds	~2048 Tokens	Instant
Episodic Memory	Vector DB (RAG)	Days/Weeks	Limited only by local storage	Low (Retrieval)
Procedural Memory	LoRA Weights	Permanent	Fixed Size	No retrieval step (In-weights)
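
The hand-off between the first two tiers can be sketched in a few lines. This is a minimal illustration, not the planned implementation: the `TieredMemory` class, its field names, and the whitespace "tokenizer" are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Toy model of the working -> episodic hand-off (hypothetical names)."""

    max_tokens: int = 2048  # working-memory budget, as in the table above
    working: deque = field(default_factory=deque)  # turns still in the context window
    episodic: list = field(default_factory=list)   # evicted turns, kept for retrieval

    @staticmethod
    def count_tokens(text: str) -> int:
        # Assumption: whitespace splitting stands in for a real tokenizer.
        return len(text.split())

    def used_tokens(self) -> int:
        return sum(self.count_tokens(turn) for turn in self.working)

    def add_turn(self, text: str) -> None:
        self.working.append(text)
        # Evict the oldest turns until the window fits the budget again.
        # The newest turn always stays, even if it alone exceeds the budget.
        while self.used_tokens() > self.max_tokens and len(self.working) > 1:
            self.episodic.append(self.working.popleft())


memory = TieredMemory(max_tokens=8)
memory.add_turn("I'm allergic to peanuts.")     # 4 tokens, fits
memory.add_turn("What should I cook tonight?")  # 5 more tokens, over budget
print(list(memory.working))  # only the newest turn remains in the window
print(memory.episodic)       # the allergy turn moved to episodic storage
```

Without the episodic tier, the evicted turn would simply be dropped, which is the "Amnesia" problem described earlier.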

2. The Learning Loop

How does a conversation become a permanent memory?

Interaction: User chats with the node.
Extraction: A background process analyzes the chat (within the user's private enclave) to extract facts.
- Input: "I'm allergic to peanuts."
- Extraction: Fact(subject="User", relation="allergy", object="peanuts", confidence=0.99)
Vector Storage: The fact is stored in a local Vector DB for immediate RAG retrieval.
Consolidation (The "Dream" Phase):
- Periodically (e.g., nightly), the node fine-tunes the Personal LoRA Adapter on the accumulated facts in the Vector DB.
- This moves memory from "searchable text" to "learned intuition".
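
The loop above can be sketched end to end with the standard library only. Everything here is a stand-in chosen for illustration: the regex replaces the real extraction model, bag-of-words cosine similarity replaces a learned embedding and Vector DB, and `consolidation_examples` only prepares training pairs rather than running a LoRA fine-tune. The `Fact` fields follow the example in the text; all other names are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    object: str
    confidence: float


# Assumption: a single regex stands in for the real extraction model.
ALLERGY_PATTERN = re.compile(r"\bI(?:'m| am) allergic to (\w+)", re.IGNORECASE)


def extract_fact(message: str) -> Optional[Fact]:
    """Extraction step: turn a chat message into a structured fact, if any."""
    match = ALLERGY_PATTERN.search(message)
    if match is None:
        return None
    return Fact("User", "allergy", match.group(1).lower(), confidence=0.99)


def embed(text: str) -> Counter:
    # Assumption: bag-of-words counts stand in for a learned embedding.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LocalFactStore:
    """Vector Storage step: an in-memory stand-in for the local Vector DB."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def add(self, fact: Fact) -> None:
        self.facts.append(fact)

    def search(self, query: str, top_k: int = 1) -> List[Fact]:
        query_vec = embed(query)

        def score(fact: Fact) -> float:
            return cosine(query_vec, embed(f"{fact.subject} {fact.relation} {fact.object}"))

        return sorted(self.facts, key=score, reverse=True)[:top_k]

    def consolidation_examples(self) -> List[dict]:
        # Consolidation step: pairs that a periodic LoRA fine-tune could consume.
        return [{"prompt": f"{f.subject} {f.relation}?", "completion": f.object} for f in self.facts]


store = LocalFactStore()
fact = extract_fact("I'm allergic to peanuts.")
if fact is not None:
    store.add(fact)
store.add(Fact("User", "favorite_color", "blue", confidence=0.9))
print(store.search("what allergy does the user have"))
print(store.consolidation_examples())
```

Until consolidation runs, facts are only reachable through `search`; afterwards the same information would also be encoded in the Personal LoRA Adapter.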

3. Low-Rank Adaptation (LoRA)

We cannot retrain the multi-gigabyte base model for every user. Instead, we use LoRA:

$$W_{\text{final}} = W_{\text{base}} + \Delta W_{\text{community}} + \Delta W_{\text{personal}}$$
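
For readers new to LoRA, the reason each update term is cheap is that it is stored as a product of two thin matrices rather than as a full matrix. In standard LoRA notation (the rank $r$ and the dimensions $d$, $k$ are not fixed by this document and are shown only for illustration):

```latex
\Delta W = B A, \qquad
B \in \mathbb{R}^{d \times r}, \quad
A \in \mathbb{R}^{r \times k}, \quad
r \ll \min(d, k)
```

Each adapted matrix therefore adds $r(d + k)$ trainable parameters instead of $d \cdot k$.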

Efficiency: A LoRA adapter for a 7B model can be on the order of 10MB, depending on the rank and on which weight matrices are adapted.
Portability: Users can carry their "Soul" (adapter) on a USB drive or store it encrypted on IPFS.
Security: The base model weights $W_{\text{base}}$ are never touched by private data.
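
The efficiency figure can be sanity-checked with simple arithmetic. A rank-`r` adapter on a `d x d` matrix stores two factors of shapes `d x r` and `r x d`, so it adds `r * (d + d)` parameters. The configuration below is an assumption picked for illustration (a LLaMA-7B-like shape with hidden size 4096 and 32 layers, rank 8, two attention projections adapted per layer, 16-bit storage); the document does not fix any of these values.

```python
def lora_adapter_bytes(hidden_size: int, num_layers: int, rank: int,
                       matrices_per_layer: int, bytes_per_param: int) -> int:
    """Estimate the stored size of a LoRA adapter over square d x d matrices."""
    # Factors B (d x r) and A (r x d) together hold r * (d + d) parameters.
    params_per_matrix = rank * (hidden_size + hidden_size)
    total_params = params_per_matrix * matrices_per_layer * num_layers
    return total_params * bytes_per_param


# Assumed configuration, not taken from this document.
size = lora_adapter_bytes(hidden_size=4096, num_layers=32, rank=8,
                          matrices_per_layer=2, bytes_per_param=2)
print(f"{size / 2**20:.1f} MiB")  # prints "8.0 MiB"
```

Raising the rank or adapting more matrices scales the size linearly, which is consistent with the 5-50MB range given for the personal layer.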

Privacy & Ownership Model

The "Sovereign Mind" Principle

In NeuroShard, you own your AI's memory.

Encryption: Personal adapters are encrypted under a key controlled by the user's wallet, so only the holder of the wallet's private key can decrypt them.
Portability: If you switch nodes, you simply authorize the new node to load your encrypted adapter.
The "Right to Forget":
- Delete Fact: Remove specific entry from Vector DB.
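- Wipe Memory: Delete the LoRA adapter. The AI immediately "forgets" what it learned about you and reverts to the shared weights (the base model plus the Community adapter). To clear episodic memory as well, the Vector DB entries must also be deleted.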
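
Why wiping is immediate follows from the composition formula in the LoRA section: the personal term is added at load time and never folded into the shared weights. The toy example below uses 2x2 matrices and plain Python lists; `compose`, `matmul`, and the adapter values are hypothetical and exist only to show the mechanics.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]
Adapter = Tuple[Matrix, Matrix]  # low-rank factors (B, A)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def compose(w_base: Matrix, community: Optional[Adapter], personal: Optional[Adapter]) -> Matrix:
    """W_final = W_base + dW_community + dW_personal, skipping absent adapters."""
    w_final = [row[:] for row in w_base]  # copy, so the base weights are never mutated
    for adapter in (community, personal):
        if adapter is not None:
            b, a = adapter
            w_final = add(w_final, matmul(b, a))
    return w_final


w_base = [[1.0, 0.0], [0.0, 1.0]]
community = ([[1.0], [0.0]], [[0.0, 2.0]])  # rank-1 update touching the top-right entry
personal = ([[0.0], [1.0]], [[3.0, 0.0]])   # rank-1 update touching the bottom-left entry

print(compose(w_base, community, personal))  # [[1.0, 2.0], [3.0, 1.0]]
print(compose(w_base, community, None))      # after wiping: [[1.0, 2.0], [0.0, 1.0]]
```

Deleting the personal adapter removes its term from the sum on the next load; nothing in the shared weights needs to be retrained or rolled back.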

Threat Model Mitigation

Threat	Mitigation
Model Poisoning	Malicious data only affects the user's own Personal Adapter. The Base Model is immune.
Data Leakage	Personal Adapters are never merged into the Global Base Model.
Node Compromise	Adapters are encrypted at rest. In-memory context is flushed after inference.

Economic Model

Memory requires storage and compute.

Storage Fees: Users pay a small amount of NEURO to nodes for hosting their Encrypted Personal Adapters and Vector indices.
Contribution Rewards: Users can choose to "publish" non-sensitive facts to the Community Layer (e.g., "The cafe on Main St is closed today").
- If verified by consensus, the user earns NEURO.
- This incentivizes the creation of a real-time, shared world model.
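
The document does not specify how consensus is reached, so the following is only one possible shape for the reward rule. It assumes a simple vote among verifying nodes with a minimum quorum and a two-thirds approval threshold; the `Submission` type, the thresholds, and the reward amount are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    fact: str
    contributor: str
    votes_for: int = 0
    votes_against: int = 0


def settle(submission: Submission, quorum: int = 3, reward: int = 10) -> int:
    """Return the NEURO reward (in hypothetical integer units) owed to the contributor."""
    total = submission.votes_for + submission.votes_against
    if total < quorum:
        return 0  # not enough verifiers have voted yet
    # Integer form of votes_for / total >= 2/3, avoiding float comparison.
    approved = submission.votes_for * 3 >= total * 2
    return reward if approved else 0


cafe = Submission("The cafe on Main St is closed today", contributor="alice",
                  votes_for=4, votes_against=1)
print(settle(cafe))  # 10, since 4 of 5 verifiers approved
```

A real protocol would also need to handle verifier incentives and disputes, which are outside the scope of this sketch.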

Implementation Roadmap

Phase 1: Session Memory (Short-Term)

[ ] Implement local Vector Store (e.g., Chroma/FAISS).
[ ] Enable RAG (Retrieval-Augmented Generation) for chat history.
[ ] Allow multi-turn conversations via API.

Phase 2: Personal Adapters (Long-Term)

[ ] Implement LoRA loading/unloading mechanism in inference engine.
[ ] Create "Fact Extraction" background worker.
[ ] Build encrypted storage format for adapters.

Phase 3: Community Consensus

[ ] Protocol for submitting facts to the Community Layer.
[ ] Verification mechanism for public facts.
