The design of Madhuram centers on the minimum architectural complexity required for high-fidelity reasoning. This guide details the foundational choices that allow the model to operate efficiently across diverse environments.
- Parameters: 150M
- Context Window: 4,096 tokens
- Training Tokens: 275B
- Architecture: Perseus
Madhuram-v0.5 is optimized for a minimal memory footprint while delivering competitive performance. With 150 million parameters trained on 275 billion tokens, the model is designed to run natively on edge devices without relying on aggressive quantization.
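As a rough check of the edge-deployment claim, the raw weight memory of a 150M-parameter model can be estimated directly from parameter count and numeric precision. This is a back-of-envelope sketch only; it ignores activations, the KV cache, and runtime overhead:

```python
PARAMS = 150_000_000  # Madhuram-v0.5 parameter count

def weight_memory_mib(params: int, bytes_per_param: int) -> float:
    """Raw memory needed to hold the weights alone, in MiB."""
    return params * bytes_per_param / 1024**2

for dtype, size in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{dtype}: {weight_memory_mib(PARAMS, size):.0f} MiB")
```

At fp16 the weights occupy roughly 286 MiB, which is why a model of this size can run unquantized on typical edge hardware.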
A primary feature of Madhuram is its 4,096-token context window, supported by the Perseus architecture, which encodes dense structural information in its positional embeddings to capture long-range dependencies efficiently. This enables precise retrieval across the full context length, so the model maintains coherence and reference accuracy even in extended sequences.
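The exact positional-embedding scheme used by Perseus is not detailed here. As a generic illustration of how positional embeddings can encode long-range structure (not the Perseus implementation), the sketch below applies rotary embeddings (RoPE), whose key property is that a query–key score depends only on the relative distance between the two positions:

```python
import numpy as np

def rotary_embed(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Rotate feature pairs of x by position-dependent angles (RoPE-style)."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)  # one rotation frequency per pair
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

rng = np.random.default_rng(0)
q, k = rng.standard_normal(64), rng.standard_normal(64)

# The score depends only on the positional offset (here, 2 tokens apart),
# whether the pair sits at the start or the end of a 4,096-token window.
near = rotary_embed(q, 5) @ rotary_embed(k, 3)
far = rotary_embed(q, 4005) @ rotary_embed(k, 4003)
print(np.isclose(near, far))
```

This relative-position property is one common way small models keep retrieval precise across a full context window without storing absolute-position tables.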
Evaluated on standard NLP benchmarks, Madhuram-v0.5 demonstrates competitive performance against larger models in the same class. The following results reflect zero-shot performance across commonsense reasoning and language understanding tasks.
| Model | ARC-C | ARC-E | HellaSwag | OBQA | PIQA | WinoGrande | BoolQ | SIQA | Average |
|---|---|---|---|---|---|---|---|---|---|
| Madhuram-v0.5 | 31.40 | **58.96** | **45.94** | 34.00 | **70.84** | 52.17 | 56.57 | 38.54 | 48.55 |
| SmolLM2-135M | **34.50** | 58.90 | 43.60 | **41.10** | 68.90 | 52.80 | **60.50** | **43.50** | **50.48** |
| Gemma-3-270M-pt | 31.50 | 57.50 | 41.40 | 34.60 | 68.50 | **53.90** | 56.50 | 43.10 | 48.38 |
| MobileLLM125M-LS | 28.70 | 45.80 | 39.50 | **41.10** | 65.70 | 52.10 | 60.40 | 42.90 | 47.03 |
| MobileLLM-R1-140M | 32.50 | 47.30 | 32.90 | 31.50 | 62.50 | 51.00 | 57.20 | 42.60 | 44.69 |

Benchmarks evaluated using lm-evaluation-harness (zero-shot). The best score in each column is shown in bold.
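The Average column is the unweighted mean of the eight per-task scores. As a quick sanity check, the means can be recomputed from the table values directly:

```python
# Per-task scores copied from the benchmark table above
rows = {
    "Madhuram-v0.5":     [31.40, 58.96, 45.94, 34.00, 70.84, 52.17, 56.57, 38.54],
    "SmolLM2-135M":      [34.50, 58.90, 43.60, 41.10, 68.90, 52.80, 60.50, 43.50],
    "Gemma-3-270M-pt":   [31.50, 57.50, 41.40, 34.60, 68.50, 53.90, 56.50, 43.10],
    "MobileLLM125M-LS":  [28.70, 45.80, 39.50, 41.10, 65.70, 52.10, 60.40, 42.90],
    "MobileLLM-R1-140M": [32.50, 47.30, 32.90, 31.50, 62.50, 51.00, 57.20, 42.60],
}

# Recompute the Average column as the unweighted mean of the eight tasks
averages = {model: sum(s) / len(s) for model, s in rows.items()}
for model, avg in averages.items():
    print(f"{model}: {avg:.2f}")
```

Note that the unweighted mean treats each task equally regardless of its difficulty or score range, which is the usual convention in small-model comparisons.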
The model serves as a versatile baseline for distributed intelligence, targeting environments where low latency and privacy are paramount.
We continue to evaluate the model's performance on specialized benchmarks and expert reasoning tasks. For inquiries regarding specific evaluation results or research collaborations, please contact the lab.