Introduction
For most of 2025 the frontier of open‑weight language models has been defined not in Silicon Valley or New York City, but in Beijing and Hangzhou. Chinese research labs such as Alibaba's Qwen, DeepSeek, Moonshot, and Baidu have been rapidly pushing the envelope with large‑scale, open Mixture‑of‑Experts (MoE) models that combine permissive licensing with top‑tier benchmark performance. While OpenAI has also released its own open‑weight LLMs, gpt‑oss‑20B and gpt‑oss‑120B, their uptake has been slowed by a flood of equally capable or better‑performing alternatives.
Against this backdrop, a small U.S. company is taking a bold step to reclaim the narrative. Arcee AI, a startup that has previously focused on compact, enterprise‑grade models, has announced the release of Trinity Mini and Trinity Nano Preview, the first two models in its new “Trinity” family. These models are fully trained in the United States, built from scratch on American infrastructure, and released under an enterprise‑friendly Apache 2.0 license. The launch signals a rare attempt by a U.S. startup to build end‑to‑end open‑weight models at scale, offering businesses and developers the ability to own the weights and the training pipeline.
In this post we explore the technical innovations behind Trinity, the strategic motivations that drive Arcee’s vision of model sovereignty, and the broader implications for the open‑source AI ecosystem.
Main Content
Trinity Models and Architecture
Trinity Mini is a 26‑billion‑parameter MoE model that activates only 3 billion parameters per token. Trinity Nano Preview is a 6‑billion‑parameter model with roughly 800 million active non‑embedding parameters. Both models employ Arcee’s new Attention‑First Mixture‑of‑Experts (AFMoE) architecture, a custom MoE design that blends global sparsity, local and global attention, and gated attention techniques.
AFMoE departs from traditional MoE designs by tightly integrating sparse expert routing with an enhanced attention stack. It uses sigmoid routing, which scores each expert independently and blends several of them rather than forcing a winner‑take‑all choice, and it incorporates grouped‑query attention, gated attention, and an alternating local/global attention pattern that improves long‑context reasoning. The result is a model that is more stable during training, more efficient at scale, and better at following long conversations.
The architecture also uses depth‑scaled normalization and avoids auxiliary load‑balancing losses, allowing the model to scale to deeper stacks without divergence. With 128 experts and 8 active per token, Trinity Mini supports context windows of up to 131,072 tokens, depending on the provider.
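To make the routing idea concrete, here is a minimal, illustrative sketch of top‑k sigmoid routing in NumPy. The shapes, initialization, and gate renormalization are assumptions for illustration only; Arcee has not published Trinity's exact router implementation.

```python
import numpy as np

def sigmoid_router(hidden, router_weights, n_active=8):
    """Score all experts independently with a sigmoid and keep the top-k.

    hidden:         (d_model,) token representation
    router_weights: (n_experts, d_model) router projection (hypothetical shape)
    Returns the indices of the selected experts and their renormalized gates.
    """
    logits = router_weights @ hidden          # (n_experts,) one score per expert
    scores = 1.0 / (1.0 + np.exp(-logits))    # independent sigmoid, no softmax competition
    top = np.argsort(scores)[-n_active:]      # indices of the k highest-scoring experts
    gates = scores[top] / scores[top].sum()   # renormalize so the gates sum to 1
    return top, gates

# Toy configuration echoing Trinity Mini's 128 experts, 8 active per token.
rng = np.random.default_rng(0)
d_model, n_experts = 64, 128
W_router = rng.standard_normal((n_experts, d_model)) * 0.02
token = rng.standard_normal(d_model)

experts, gates = sigmoid_router(token, W_router, n_active=8)
# The token's output would then be the gate-weighted sum of the 8 experts' outputs.
```

Because each expert is scored independently, no auxiliary balancing loss is strictly required to keep the routing distribution from collapsing, which is one reported motivation for sigmoid routers in recent MoE work.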
Benchmark Performance and Practical Use
Trinity Mini’s performance on a range of reasoning tasks is competitive with larger models. On the SimpleQA benchmark, which tests factual recall and uncertainty handling, Trinity Mini outperforms gpt‑oss. On MMLU (zero‑shot), it scores 84.95, while on Math‑500 it achieves 92.10. The model also performs strongly on GPQA‑Diamond and BFCL V3, which evaluate multi‑step function calling and real‑world tool use.
Latency and throughput measurements across providers such as Together and Clarifai show over 200 tokens per second of throughput with sub‑three‑second end‑to‑end latency, making Trinity Mini viable for interactive applications and agent pipelines. Trinity Nano Preview, while smaller and less stable on edge cases, demonstrates the viability of sparse MoE architectures at under 1 billion active parameters per token.
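As a rough back‑of‑envelope check (the time‑to‑first‑token figure below is an assumed value, not a measured one), throughput translates into response time like this:

```python
def response_time_s(output_tokens, tokens_per_s=200.0, first_token_latency_s=0.5):
    """Estimate end-to-end time to stream a reply: assumed time to first token
    plus generation time at the measured throughput."""
    return first_token_latency_s + output_tokens / tokens_per_s

# A 400-token reply at 200 tok/s with an assumed 0.5 s to first token:
print(response_time_s(400))  # 2.5
```

At those rates even fairly long replies land comfortably inside the sub‑three‑second window reported above.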
Open‑Source Licensing and Ecosystem Integration
Both Trinity models are released under the permissive Apache 2.0 license, allowing unrestricted commercial and research use. Trinity Mini is available on Hugging Face, OpenRouter, and Arcee’s own chat interface at chat.arcee.ai. API pricing via OpenRouter is $0.045 per million input tokens and $0.15 per million output tokens, with a free tier for a limited time.
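At those rates, per‑request cost is easy to estimate. A minimal sketch, using the prices listed above (which are of course subject to change):

```python
def request_cost_usd(input_tokens, output_tokens,
                     input_price_per_m=0.045, output_price_per_m=0.15):
    """Cost of one API call given per-million-token prices in USD."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# A 2,000-token prompt with a 500-token reply:
print(f"${request_cost_usd(2_000, 500):.6f}")  # $0.000165
```

Even at millions of requests per month, inference at this price point stays in the tens of dollars per million calls of this size, which is part of what makes small‑activation MoE models attractive for production use.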
The models are already integrated into popular applications such as Benchable.ai, Open WebUI, and SillyTavern, and are supported by Hugging Face Transformers, vLLM, LM Studio, and llama.cpp. This wide ecosystem support lowers the barrier to entry for developers who want to experiment with or deploy the models.
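Since OpenRouter exposes an OpenAI‑compatible chat endpoint, calling Trinity Mini from Python looks roughly like the sketch below. The model slug `arcee-ai/trinity-mini` is an assumption based on OpenRouter's usual naming conventions; check the model page for the exact identifier before use.

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"  # OpenAI-compatible endpoint

def build_request(prompt, api_key, model="arcee-ai/trinity-mini"):
    """Assemble (but do not send) an OpenAI-style chat completion request.

    The model slug is illustrative; verify it on OpenRouter.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Summarize the AFMoE architecture in one sentence.", api_key="YOUR_KEY")
# resp = urllib.request.urlopen(req)  # uncomment to actually send the request
# print(json.load(resp)["choices"][0]["message"]["content"])
```

The same payload shape works against a local vLLM or llama.cpp server by swapping the URL, which is what makes the OpenAI‑compatible ecosystem support so convenient.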
Data Sovereignty with DatologyAI
A key differentiator for Arcee is its control over training data. Unlike many open models that rely on web‑scraped or legally ambiguous datasets, Arcee partnered with DatologyAI—a data curation startup founded by former Meta and DeepMind researcher Ari Morcos—to build a clean, high‑quality corpus. DatologyAI’s platform automates filtering, deduplication, and quality enhancement across modalities, ensuring the training data is free from noise, bias, and copyright risk.
For Trinity, DatologyAI helped construct a 10‑trillion‑token curriculum organized into three phases: 7 T of general data, 1.8 T of high‑quality text, and 1.2 T of STEM‑heavy material, including math and code. The same partnership powered Arcee's earlier AFM‑4.5B model, at a smaller scale. For Trinity Large, DatologyAI has produced over 10 T synthetic tokens paired with 10 T curated web tokens, forming a roughly 20 T‑token training corpus.
Infrastructure with Prime Intellect
Training a 26‑billion‑parameter MoE model in the United States requires significant compute. Arcee's infrastructure partner, Prime Intellect, provides a decentralized GPU marketplace and training stack that supports large‑scale runs. For Trinity Mini and Nano, Prime Intellect supplied the orchestration stack, a modified TorchTitan runtime, and a 512‑GPU H200 cluster running a bf16 pipeline with HSDP (hybrid sharded data parallel) parallelism. It also hosts the 2,048‑GPU B300 cluster used to train Trinity Large.
While Prime Intellect’s long‑term vision is decentralized compute, its short‑term value for Arcee lies in efficient, transparent training infrastructure that remains under U.S. jurisdiction, with known provenance and security controls.
The Thesis of Model Sovereignty
Arcee’s push into full pre‑training reflects a broader thesis: the future of enterprise AI will depend on owning the training loop, not just fine‑tuning. As systems evolve to adapt from live usage and interact with tools autonomously, compliance and control over training objectives will matter as much as raw performance.
“As applications get more ambitious, the boundary between ‘model’ and ‘product’ keeps moving,” CTO Lucas Atkins notes. “To build that kind of software you need to control the weights and the training pipeline, not only the instruction layer.”
This framing sets Trinity apart from other open‑weight efforts that patch existing base models. Arcee has built its own foundation—from data to deployment, infrastructure to optimizer—alongside partners who share its vision of openness and sovereignty.
Looking Ahead: Trinity Large
Training is currently underway for Trinity Large, a 420‑billion‑parameter MoE model that scales the AFMoE architecture to a larger expert set. The dataset includes 20 T tokens, split evenly between synthetic data from DatologyAI and curated web data. The model is expected to launch in January 2026, with a full technical report to follow.
If successful, Trinity Large would become one of the only fully open‑weight, U.S.‑trained frontier‑scale models, positioning Arcee as a serious player in the open ecosystem at a time when most American LLM efforts are either closed or based on non‑U.S. foundations.
Conclusion
Arcee’s Trinity launch is a rare and bold statement in a landscape dominated by Chinese research labs. By building the first U.S.‑trained open‑weight MoE models from scratch, Arcee demonstrates that small, lesser‑known companies can still push the boundaries of AI while maintaining transparency, data sovereignty, and an open‑source ethos. The combination of a robust AFMoE architecture, clean data pipelines, and U.S.‑based infrastructure gives Trinity a unique position in the ecosystem. Whether Trinity Large can match the capabilities of its better‑funded peers remains to be seen, but the momentum generated by Trinity Mini and Nano already shows that model sovereignty—owning the weights and the training loop—may well define the next era of AI.
Call to Action
If you’re a developer, researcher, or business leader looking to experiment with cutting‑edge open‑weight models, explore Trinity Mini on Hugging Face or try it out in the chat interface at chat.arcee.ai. For those interested in the underlying architecture, the AFMoE design and training pipeline are available for review, and the models can be fine‑tuned or integrated into your own applications under the Apache 2.0 license. Join the conversation on X, contribute to the open‑source community, and help shape a future where AI models are built, owned, and governed in the United States.