Introduction

NeuroShard is a decentralized architecture for collective intelligence: a protocol that turns a global network of heterogeneous devices into a single, continuously evolving optimizer that collaboratively trains a shared model, NeuroLLM, from random initialization.

The Problem with Centralized AI

The current landscape of artificial intelligence is defined by a small number of vertically integrated platforms that own the models, the data, and the distribution channels:

  • Centralized Control: Alignment, safety, and access policies are set unilaterally with limited transparency
  • Infrastructure Barriers: Training frontier models requires billion-dollar clusters, shutting out independent researchers
  • Compute Waste: Hyperscale data centers consume enormous energy while the world's edge compute sits largely idle
  • Data Privacy: Centralized training creates systemic privacy risks

The NeuroShard Solution

NeuroShard proposes a decentralized alternative: a protocol for collaborative model creation where the network, not any single operator, is where the model lives.

Core Principles

1. Organic Scalability

The model architecture is elastic rather than fixed. Using a Dynamic Layer Pool, the model expands as new nodes join, growing continuously from 85M (Nano) to 123B (XL) parameters rather than jumping between fixed tiers.
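
The growth rule can be pictured with a small sketch. Everything here is illustrative: the class name, the one-layer-per-16-nodes rule, and the parameter count formula are assumptions for the example, not the protocol specification.

```python
# Hypothetical sketch of an elastic layer pool: the model deepens as nodes
# join, so capacity tracks network size instead of fixed tiers.
# The growth rule and names below are illustrative, not the protocol spec.

def params_per_layer(d_model: int, ffn_mult: int = 4) -> int:
    """Rough transformer block parameter count: attention + FFN."""
    attn = 4 * d_model * d_model            # Q, K, V, O projections
    ffn = 2 * ffn_mult * d_model * d_model  # up + down projections
    return attn + ffn

class DynamicLayerPool:
    def __init__(self, d_model: int, min_layers: int = 8):
        self.d_model = d_model
        self.layers = min_layers

    def on_node_joined(self, total_nodes: int) -> None:
        # Illustrative rule: one extra layer per 16 nodes; depth never shrinks.
        self.layers = max(self.layers, 8 + total_nodes // 16)

    @property
    def total_params(self) -> int:
        return self.layers * params_per_layer(self.d_model)

pool = DynamicLayerPool(d_model=512)
pool.on_node_joined(total_nodes=160)
print(pool.layers, pool.total_params)
```

Because depth is a function of network size, capacity changes smoothly as nodes arrive instead of stepping between preset model sizes.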

2. Verifiable Computation

Through Proof of Neural Work (PoNW), the network cryptographically validates that participants are performing useful training operations: the gradients themselves serve as the proof of work. Chained PoNW links ECDSA-signed gradient commitments into tamper-evident epoch chains.
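
The chaining idea can be sketched as a hash chain over gradient commitments. This is a minimal illustration, not the network's code: the real protocol uses ECDSA signatures, while here an HMAC with a node-local key stands in, since Python's standard library has no ECDSA.

```python
# Illustrative sketch of chained gradient commitments forming a
# tamper-evident epoch chain. HMAC stands in for ECDSA signing here.
import hashlib
import hmac
import json

NODE_KEY = b"node-secret"  # stand-in for the node's ECDSA private key

def commit_gradient(prev_commit: str, epoch: int, grad_bytes: bytes) -> dict:
    # Commit to the gradient and to the previous epoch's commitment,
    # so any change to history invalidates every later commitment.
    grad_hash = hashlib.sha256(grad_bytes).hexdigest()
    payload = json.dumps(
        {"prev": prev_commit, "epoch": epoch, "grad": grad_hash},
        sort_keys=True,
    ).encode()
    commit = hashlib.sha256(payload).hexdigest()
    sig = hmac.new(NODE_KEY, payload, hashlib.sha256).hexdigest()
    return {"epoch": epoch, "prev": prev_commit, "grad_hash": grad_hash,
            "commit": commit, "sig": sig}

# Build a two-epoch chain: epoch 1 commits to epoch 0's commitment,
# so tampering with epoch 0's gradient breaks the link to epoch 1.
c0 = commit_gradient("genesis", 0, b"gradients-epoch-0")
c1 = commit_gradient(c0["commit"], 1, b"gradients-epoch-1")
```

Validators can then check both the signature and the `prev` linkage of each epoch without re-running the training step itself.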

3. Byzantine Tolerance

The training process is secured against malicious actors through robust statistics (Krum, Multi-Krum, Trimmed Mean, Geometric Median, Coordinate-wise Median) and a fraud-proof slashing mechanism.
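
Two of the robust aggregators named above can be sketched in a few lines over plain Python lists (one list per node gradient). This is a minimal illustration of the statistics, not the network's implementation.

```python
# Coordinate-wise median and trimmed mean: both bound the influence of
# any single malicious gradient on the aggregated update.

def coordwise_median(grads):
    out = []
    for vals in zip(*grads):  # iterate over coordinates across nodes
        s = sorted(vals)
        n = len(s)
        out.append(s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2)
    return out

def trimmed_mean(grads, trim=1):
    # Drop the `trim` largest and smallest values per coordinate, then average.
    out = []
    for vals in zip(*grads):
        s = sorted(vals)[trim:len(vals) - trim]
        out.append(sum(s) / len(s))
    return out

# The last node submits a poisoned gradient; both estimators ignore it.
grads = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [100.0, -100.0]]
print(trimmed_mean(grads))
print(coordwise_median(grads))
```

With the outlier present, both aggregates stay near the honest gradients (about [1.05, 1.95]), whereas a plain mean would be pulled far off by the poisoned update.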

4. Economic Alignment

The NEURO token aligns incentives, rewarding participants for verifiable contributions of compute, data, and uptime. Role multipliers (Driver 1.0×, Worker 0.8×, Validator 1.2×) and stake multipliers (up to 2×) shape earnings.
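
The multiplier arithmetic above composes simply. In this sketch, `base_reward` and the flat stake cap are assumptions for the example; only the role multipliers and the 2× stake ceiling come from the text.

```python
# Illustrative reward calculation from the stated multipliers.
ROLE_MULT = {"driver": 1.0, "worker": 0.8, "validator": 1.2}

def epoch_reward(base_reward: float, role: str, stake_mult: float) -> float:
    stake_mult = min(stake_mult, 2.0)  # stake multiplier is capped at 2x
    return base_reward * ROLE_MULT[role] * stake_mult

# A validator with a 1.5x stake multiplier on a base reward of 100 NEURO:
print(epoch_reward(100.0, "validator", 1.5))  # 100 * 1.2 * 1.5 = 180.0
```

A worker with any stake above the cap would earn at most `base * 0.8 * 2.0`, so staking boosts but never replaces verifiable contribution.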

NeuroLLM: The Model

NeuroLLM is not a fork or fine-tune of an existing model. It is a new transformer architecture built from scratch for distributed, verifiable training:

  • Efficient Gradients: Optimized for gradient compression and gossip transmission
  • Stable Training: Robust to asynchronous, noisy gradient updates
  • Scalable Architecture: Grows from 85M to 123B+ parameters as the network matures
  • Privacy-Compatible: Supports differentially private training

Architecture Components

  • RMSNorm — More stable for distributed training than LayerNorm
  • Rotary Position Embeddings (RoPE) — No fixed maximum sequence length
  • Grouped Query Attention (GQA) — 3× reduction in KV cache size
  • SwiGLU Activation — Better gradient flow than ReLU or GELU
  • Mixture of Experts (MoE) — 8 expert networks per layer, top-2 routing, scarcity bonuses for rare experts
  • NeuroMemory — Three-tier persistent memory (Cortex / Commons / Soul) with personal LoRA adapters
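
Of the components above, RMSNorm is the simplest to show concretely. This is a minimal pure-Python sketch for clarity, not the NeuroLLM implementation; the learnable gain weights default to 1 here.

```python
# RMSNorm: scale a vector by the inverse of its root-mean-square.
# Unlike LayerNorm, it skips mean-centering, which simplifies the
# computation and tends to behave stably under noisy gradients.
import math

def rms_norm(x, eps=1e-6, weight=None):
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    w = weight or [1.0] * len(x)
    return [wi * v / rms for wi, v in zip(w, x)]

y = rms_norm([3.0, 4.0])
# The output vector has (approximately) unit root-mean-square.
```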

How It Works

Key Differentiators

  • Model Ownership: NeuroShard is community-owned and trained from scratch; other systems are pre-trained by corporations
  • Architecture: NeuroShard is dynamic and grows with the network; others use fixed model sizes
  • Training: NeuroShard uses decentralized DiLoCo with 500× less communication; others train in centralized data centers
  • Experts: NeuroShard uses MoE per layer for efficient sparse activation; others are dense models
  • Memory: NeuroShard has persistent NeuroMemory with personal LoRA; others offer stateless inference
  • Rewards: NeuroShard uses Proof of Neural Work with role and stake multipliers; others use Proof of Stake / Output
  • Data: NeuroShard trains on the Genesis dataset (FineWeb-Edu, RedPajama); others have unknown training data
  • No Pre-mine: NeuroShard has zero pre-mine and zero ICO; founder allocations are common elsewhere

Whitepaper

The full technical whitepaper is available to registered members at neuroshard.com/whitepaper. Create a free account to access it.

Next Steps

Released under the Apache License 2.0.