Abstract
Fortytwo, a novel protocol using swarm inference and distributed pairwise ranking consensus, outperforms majority voting and demonstrates higher accuracy and resilience in decentralized AI systems.
As centralized AI hits compute ceilings and diminishing returns from ever-larger training runs, meeting demand requires an inference layer that scales horizontally in both capacity and capability. We present Fortytwo, a novel protocol that leverages swarm intelligence principles and distributed pairwise ranking consensus to achieve superior performance in AI inference. Our approach reimagines collaboration among AI nodes using swarm inference: a peer-ranked, reputation-weighted consensus across heterogeneous models that surfaces the highest-quality responses. Using pairwise ranking with a custom Bradley-Terry-style aggregation model, we demonstrate that swarm inference substantially outperforms majority voting, achieving 85.90% on GPQA Diamond versus 68.69% for majority voting with the same model set, an improvement of +17.21 percentage points (approximately +25.1% relative). The protocol incorporates on-chain reputation so node influence adapts to demonstrated accuracy over time, yielding a meritocratic consensus that filters low-quality or malicious participants. To resist Sybil attacks, Fortytwo employs proof-of-capability in its consensus: nodes must successfully complete calibration/test requests and stake reputation to enter ranking rounds, making multi-identity attacks economically unattractive while preserving openness. Across six challenging benchmarks, including GPQA Diamond, LiveCodeBench, and AIME, our evaluation indicates higher accuracy and strong resilience to adversarial and noisy free-form prompting (e.g., prompt-injection degradation of only 0.12% versus 6.20% for a monolithic single-model baseline), while retaining practical deployability. Together, these results establish a foundation for decentralized AI systems, democratizing access to high-quality inference through collective intelligence without sacrificing reliability or security.
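To make the idea of reputation-weighted pairwise ranking concrete, the following is a minimal sketch of a standard Bradley-Terry fit using the classic minorization-maximization update. It is not the paper's custom aggregation model, which the abstract does not specify. The function name `bradley_terry`, the `(winner, loser, weight)` tuple format, the use of a judge's reputation as the comparison weight, and the small symmetric `prior` pseudo-count are all assumptions made for illustration.

```python
def bradley_terry(comparisons, n_items, iters=200, prior=0.1):
    """Fit Bradley-Terry strengths from weighted pairwise comparisons.

    comparisons: iterable of (winner, loser, weight) tuples. Here `weight`
    stands in for the judging node's reputation (an assumption).
    prior: symmetric pseudo-count added to every pair so that each
    strength stays strictly positive (a simple regularizer).
    """
    # wins[i][j] = total weight of comparisons in which i beat j
    wins = [[prior if i != j else 0.0 for j in range(n_items)]
            for i in range(n_items)]
    for winner, loser, weight in comparisons:
        wins[winner][loser] += weight

    p = [1.0 / n_items] * n_items
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n_items) if j != i)
            new_p.append(total_wins / denom)
        norm = sum(new_p)
        p = [x / norm for x in new_p]  # normalize so strengths sum to 1
    return p


# Hypothetical round: three candidate responses (0, 1, 2) judged by nodes
# with reputations 0.9, 0.7 and 0.2.
judgments = [
    (0, 1, 0.9), (0, 2, 0.9), (1, 2, 0.9),  # high-reputation judge
    (0, 1, 0.7),                            # mid-reputation judge
    (1, 0, 0.2), (2, 0, 0.2), (2, 1, 0.2),  # low-reputation judge disagrees
]
scores = bradley_terry(judgments, n_items=3)
best = max(range(3), key=lambda i: scores[i])
print(best, [round(s, 3) for s in scores])
```

In this toy round the low-reputation judge's dissenting votes carry little weight, so response 0 still receives the highest strength. Unlike majority voting, which only counts which answer appears most often, the fit uses every pairwise preference and how strongly each is backed.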
Community
This paper introduces Fortytwo, a decentralized AI inference protocol that coordinates heterogeneous models through swarm inference: a peer-ranked, reputation-weighted consensus mechanism. We extend the Bradley–Terry aggregation framework for distributed ranking and show that collective consensus significantly improves inference quality over majority voting, achieving 85.9% on GPQA Diamond (+17.21 points, +25.1% relative). The approach demonstrates strong resilience to noisy and adversarial prompts, with only 0.12% degradation under prompt injection (extraneous info / CatAttack) versus 6.20% for single-model baselines.
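The absolute and relative gains quoted above follow directly from the two reported GPQA Diamond accuracies. A short check, with variable names chosen here for illustration:

```python
swarm_acc = 85.90     # swarm inference accuracy on GPQA Diamond, in percent
majority_acc = 68.69  # majority-voting accuracy with the same model set

abs_gain = swarm_acc - majority_acc        # gain in percentage points
rel_gain = abs_gain / majority_acc * 100   # gain relative to the baseline

print(f"{abs_gain:.2f} points, {rel_gain:.1f}% relative")
# prints "17.21 points, 25.1% relative"
```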
