Perseus: Efficient Language Modeling

June 30, 2025

Executive Summary

In an era where language models grow exponentially in size and computational requirements, Perseus introduces a paradigm shift: mathematical innovation over brute-force scaling. Our 73M-parameter model, Madhuram, matches the performance of significantly larger models while requiring up to 45% fewer parameters.

Key Results

The Efficiency Crisis in Language Models

Current language model development follows a "bigger is better" formula: more parameters plus more data yields better performance. This approach has led to:

Perseus challenges this paradigm by demonstrating that innovation can achieve similar results with dramatically fewer resources.

Perseus Architecture: The Core Innovation

Advanced Positional Embeddings

Traditional language models use simple sinusoidal patterns to encode token positions. Perseus replaces this with a sophisticated mathematical framework that captures richer structural information about token relationships.
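The post does not detail Perseus's embedding scheme, but for context, the sinusoidal baseline it replaces assigns each position a fixed pattern of sine and cosine values at geometrically spaced frequencies. A minimal dependency-free sketch of that baseline (the function name is ours, not from the Perseus codebase):

```python
import math

def sinusoidal_positions(seq_len: int, d_model: int) -> list[list[float]]:
    """Fixed sinusoidal positional encodings (Vaswani et al., 2017):
        PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
        PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
    Each position receives a unique vector; relative offsets correspond
    to linear transformations of these patterns.
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

pe = sinusoidal_positions(seq_len=16, d_model=8)
```

Because these encodings are fixed and purely position-dependent, they carry no learned structural information about token relationships, which is the gap Perseus's framework targets.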

Key Technical Advantages:

Architectural Components

Perseus integrates several specialized components:

Training Methodology

Dataset Composition (130B tokens)

Training Configuration

Technical Implementation Challenges

Developing Perseus posed several significant technical hurdles:

Benchmark Performance

We evaluated Madhuram with the lm-eval harness on standard NLP benchmarks, achieving an overall average accuracy of 43.82% across eight commonsense reasoning and language understanding tasks. The results show that our 73M-parameter model is competitive with substantially larger models.
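For reproducibility, evaluations like these are typically run with EleutherAI's lm-evaluation-harness. A sketch of such an invocation, where the checkpoint path is a placeholder (not a released artifact) and task names follow the harness's v0.4 naming, which may vary by version:

```shell
# Zero-shot evaluation over the benchmarks reported below.
# "path/to/madhuram" is a placeholder, not the actual checkpoint location.
lm_eval \
  --model hf \
  --model_args pretrained=path/to/madhuram \
  --tasks arc_easy,arc_challenge,hellaswag,openbookqa,piqa,social_iqa,winogrande,boolq \
  --batch_size 8
```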

| Task Type | Madhuram | SmolLM2-135M | MobileLLM-125M | OPT-350M |
|---|---|---|---|---|
| ARC (Average) | 36.66% | 43.90% | 37.25% | 33.80% |
| HellaSwag | 34.62% | 42.10% | 39.50% | 40.10% |
| OBQA | 33.00% | 34.60% | 41.10% | 33.30% |
| PIQA | 63.76% | 68.40% | 65.70% | 64.80% |
| SIQA | 38.43% | - | 42.90% | 42.60% |
| WinoGrande | 51.30% | 51.30% | 52.10% | 52.40% |
| BoolQ | 56.12% | - | 60.40% | 54.00% |
| Average | 43.82% | - | 47.03% | 43.90% |

Table 1: Performance comparison of Madhuram with SmolLM2-135M, MobileLLM-125M, and OPT-350M across different NLP tasks.
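The 43.82% overall figure appears to be an unweighted mean over eight tasks, with ARC counted as its Easy and Challenge splits separately. A quick sanity check using the zero-shot scores from Table 1 above and Table 2 below:

```python
# Madhuram zero-shot scores; ARC is split into Easy and Challenge.
scores = {
    "ARC-Easy": 47.47, "ARC-Challenge": 25.85,
    "HellaSwag": 34.62, "OBQA": 33.00,
    "PIQA": 63.76, "SIQA": 38.43,
    "WinoGrande": 51.30, "BoolQ": 56.12,
}
average = sum(scores.values()) / len(scores)
print(f"Overall average: {average:.2f}%")  # Overall average: 43.82%
```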

Madhuram vs. GPT3-Small

| Benchmark | Madhuram (Zero-shot) | GPT3-Small (Zero-shot) | GPT3-Small (Few-shot) |
|---|---|---|---|
| ARC-Easy | 47.47% | 43.60% | 42.70% |
| ARC-Challenge | 25.85% | 26.60% | 25.50% |
| HellaSwag | 34.62% | 33.70% | 35.50% |
| OBQA | 33.00% | 35.60% | 37.00% |
| PIQA | 63.76% | 64.60% | 64.30% |
| WinoGrande | 51.30% | 52.00% | 51.30% |
| BoolQ | 56.12% | 49.70% | 43.10% |
| Average | 44.59% | 43.69% | 42.77% |

Table 2: Zero-shot and few-shot performance of Madhuram against GPT3-Small.

Knowledge-Intensive Tasks: Madhuram vs. STAR-5/Quality

| Task | Madhuram | STAR-5/Quality | Difference |
|---|---|---|---|
| ARC (Easy) | 47.47% | 39.10% | +8.37% |
| HellaSwag | 34.62% | 29.20% | +5.42% |
| WinoGrande | 51.30% | 52.10% | -0.80% |
| PIQA | 63.76% | 62.10% | +1.66% |
| SciQ | 70.20% | 72.70% | -2.50% |
| Average | 53.47% | 51.04% | +2.43% |

Table 3: Performance against Liquid AI's STAR-5/Quality model demonstrates Madhuram's edge in knowledge-intensive tasks.

Overall Efficiency Comparison

| Model | Parameters | Training Tokens | Parameter Efficiency |
|---|---|---|---|
| Madhuram (Perseus) | 73M | 130B | 1.00x |
| SmolLM2-135M | 135M | 2T | 0.55x |
| MobileLLM-125M | 125M | 1T | 0.59x |
| GPT3-Small | 125M | 300B | 0.59x |

Parameter Efficiency = (Madhuram parameters / comparison model parameters); a value below 1.00x means the comparison model uses more parameters than Madhuram.
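As a quick check, the stated ratio computed from the rounded 73M figure gives 0.54x and 0.58x; the slightly higher table values suggest Madhuram's exact parameter count sits a bit above 73M. A one-liner per comparison model:

```python
# Parameter-efficiency ratios per the stated formula, using the rounded
# 73M headcount; unrounded counts likely explain the table's last digit.
madhuram_params = 73e6
comparisons = {
    "SmolLM2-135M": 135e6,
    "MobileLLM-125M": 125e6,
    "GPT3-Small": 125e6,
}
for name, params in comparisons.items():
    print(f"{name}: {madhuram_params / params:.2f}x")
```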

Real-World Impact & Applications

Limitations & Future Work

Current Limitations

Future Directions

Seeking Collaborators

We're actively seeking partnerships for:

Conclusion

Perseus demonstrates that the future of language models lies not in endless scaling, but in mathematical innovation and architectural efficiency. By rethinking fundamental components like positional embeddings, we can create models that are:

The implications extend beyond academic research to practical applications that can benefit from efficient, high-performance language understanding. As we continue developing Perseus, we invite the community to join us in building a more accessible and sustainable future for AI.


For collaboration opportunities or technical questions, please reach out to our team. Together, we can make lightweight, accessible language models a reality for everyone.
Contact us at [email protected].