Olmo 3: AI2’s Open‑Source 7B/32B LLM Built on Dolma 3 & Dolci Stack

ThinkTools Team, AI Research Lead

Introduction

The Allen Institute for AI (AI2) has announced the release of Olmo 3, a new family of large language models (LLMs) that promises to bring unprecedented openness to the field of generative AI. Unlike many contemporary models that are released as black boxes, Olmo 3 is presented as a fully transparent “model flow” that includes every stage from raw data and code to intermediate checkpoints and deployment‑ready variants. This commitment to openness is not merely a marketing claim; it is a concrete shift toward reproducibility, auditability, and community collaboration. In this post we unpack the technical details of Olmo 3, examine how it builds on AI2’s Dolma 3 and Dolci stacks, and explore the broader implications for researchers, developers, and businesses that rely on LLMs.

AI2’s decision to open the entire pipeline is significant because it addresses a growing concern in the AI community: the opacity of large models. When a model’s training data, hyperparameters, and fine‑tuning procedures are hidden, it becomes difficult to assess biases, verify performance claims, or adapt the model to new tasks. By contrast, Olmo 3’s open‑source release includes not only the final weights but also the scripts that generate the training corpus, the configuration files that govern the transformer architecture, and the code that converts checkpoints into inference‑ready formats. This level of detail empowers researchers to replicate results, experiment with modifications, and contribute improvements back to the community.

The release also reflects a broader trend toward democratizing AI. As LLMs become more powerful, the cost of training them has skyrocketed, creating a barrier for smaller organizations and academic labs. Olmo 3 offers a middle ground: it provides state‑of‑the‑art performance with 7 billion and 32 billion parameter models while keeping the entire stack free and accessible. This approach could accelerate innovation by allowing developers to fine‑tune the models for niche applications without the need for massive compute budgets.

In the sections that follow, we dive into the architecture of Olmo 3, the foundations laid by Dolma 3 and Dolci, and the practical implications of a fully open model flow.

Main Content

The Olmo 3 Architecture

Olmo 3 is a dense, decoder‑only transformer suite with several key innovations. The 7B variant is designed for lightweight inference, while the 32B variant pushes the envelope in contextual understanding and generation quality. Both models share a common backbone: a stack of 32 transformer layers, each equipped with multi‑head self‑attention and a feed‑forward network that uses a GELU activation. What sets Olmo 3 apart is its adaptive layer scaling, which allows the model to allocate more computation to the most informative tokens during inference. This dynamic allocation reduces latency without sacrificing accuracy.
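
To make the backbone concrete, here is a minimal PyTorch sketch of one such block. The hidden size, head count, and feed‑forward width are scaled‑down placeholders rather than Olmo 3's published hyperparameters, and the adaptive layer scaling mechanism is omitted for brevity.

```python
import torch
import torch.nn as nn

# Minimal sketch of the dense, decoder-only block described above.
# Dimensions are scaled-down placeholders, not Olmo 3's published values.
class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),  # GELU activation, as described above
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, attn_mask=None):
        # Pre-norm residual attention, a common choice in modern LLMs
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask,
                                need_weights=False)
        x = x + attn_out
        return x + self.ff(self.norm2(x))

# A 32-layer stack, matching the layer count mentioned above.
backbone = nn.ModuleList(TransformerBlock() for _ in range(32))
```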

The architecture also incorporates a novel “think‑then‑write” mechanism. In the Olmo 3‑Think variant, the model first generates an internal representation of the answer before producing the final text. This two‑stage process improves consistency and reduces hallucinations, a common problem in large language models. The Olmo 3‑Instruct variant, meanwhile, is fine‑tuned on a curated instruction‑following dataset that emphasizes safety and alignment. By providing multiple variants, AI2 gives developers the flexibility to choose the right trade‑off between speed, safety, and fidelity.
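
A rough illustration of what a two‑stage "think, then write" loop could look like at inference time, assuming a standard Hugging Face interface. The model identifier and prompt template below are hypothetical; consult the Olmo 3 repository for the actual format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "allenai/Olmo-3-7B-Think"  # hypothetical identifier

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

question = "Why does ice float on water?"

# Stage 1: elicit an internal reasoning trace.
think_ids = tok(f"Question: {question}\nReasoning:",
                return_tensors="pt").input_ids
reasoning = tok.decode(model.generate(think_ids, max_new_tokens=128)[0],
                       skip_special_tokens=True)

# Stage 2: condition the final answer on that trace.
write_ids = tok(f"{reasoning}\nFinal answer:",
                return_tensors="pt").input_ids
answer = tok.decode(model.generate(write_ids, max_new_tokens=128)[0],
                    skip_special_tokens=True)
print(answer)
```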

Dolma 3 and Dolci Stack Foundations

The Dolma 3 stack is AI2’s framework for data preprocessing, tokenization, and dataset construction. Dolma 3 introduces a new tokenization scheme that blends byte‑pair encoding (BPE) with a dynamic vocabulary that expands as new data is ingested. This hybrid approach reduces out‑of‑vocabulary errors and improves the model’s ability to handle rare words, which is especially important for specialized domains.
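
As a point of reference, here is how a basic byte‑level BPE tokenizer is trained with the Hugging Face tokenizers library, with added tokens standing in for the dynamic vocabulary expansion described above. The corpus file, vocabulary size, and example tokens are placeholders.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Train a basic byte-level BPE tokenizer from scratch.
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()

trainer = trainers.BpeTrainer(vocab_size=50_000,
                              special_tokens=["<unk>", "<s>", "</s>"])
tokenizer.train(["corpus.txt"], trainer)  # hypothetical corpus file

# One way to approximate a dynamic vocabulary: register tokens observed
# in newly ingested domain data after the initial training pass.
tokenizer.add_tokens(["immunohistochemistry", "habeas"])  # illustrative
```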

Dolci, on the other hand, is the training and inference engine that powers Olmo 3. Dolci is built on top of PyTorch but includes custom kernels that accelerate matrix operations on both GPUs and TPUs. One of Dolci’s standout features is its checkpointing system, which stores intermediate states in a format that can be easily shared and restored. This design choice is crucial for the “model flow” philosophy: researchers can pause training, inspect intermediate weights, and resume without loss of fidelity.
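
A minimal sketch of resumable checkpointing in this spirit, using plain PyTorch; Dolci's actual on‑disk format and custom kernels are not shown.

```python
import torch

# The saved state bundles everything needed to pause training,
# inspect intermediate weights, and resume without loss of fidelity.
def save_checkpoint(path, model, optimizer, step):
    torch.save({
        "step": step,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)

def load_checkpoint(path, model, optimizer):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["step"]  # resume from the recorded step
```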

By integrating Dolma 3 and Dolci, AI2 has created a seamless pipeline that takes raw text from the internet, processes it into a high‑quality training corpus, and trains a transformer that can be deployed in a variety of settings.

Transparency and the Model Flow

The model flow is perhaps the most revolutionary aspect of Olmo 3. AI2 has released a GitHub repository that contains every script, configuration file, and dataset split used in training. The repository is organized into three main directories: data, training, and deployment. In the data folder, users can find the raw text files, the tokenization scripts, and the final tokenized dataset. The training folder houses the hyperparameter configuration, the training loop, and the loss functions. Finally, the deployment folder contains scripts that convert the raw checkpoints into ONNX or TorchScript formats, ready for inference on edge devices.
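
For illustration, converting a checkpoint into TorchScript and ONNX can look like the following. The toy module stands in for a loaded Olmo 3 checkpoint, and the input shape is an assumption.

```python
import torch
import torch.nn as nn

# Toy stand-in for a loaded Olmo 3 checkpoint.
model = nn.Sequential(nn.Embedding(50_000, 64), nn.Linear(64, 50_000))
model.eval()
example = torch.randint(0, 50_000, (1, 16))  # dummy token ids

# TorchScript via tracing
scripted = torch.jit.trace(model, example)
scripted.save("olmo3.pt")

# ONNX export with dynamic batch and sequence axes
torch.onnx.export(model, example, "olmo3.onnx",
                  input_names=["input_ids"], output_names=["logits"],
                  dynamic_axes={"input_ids": {0: "batch", 1: "seq"}})
```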

This level of openness invites external audit. Researchers can verify that the data cleaning steps removed sensitive personal information, that the tokenization scheme does not introduce systematic bias, and that the training loss curves behave as expected. Moreover, the open checkpointing system allows developers to experiment with pruning, quantization, or distillation without needing to retrain from scratch.
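
For example, post‑training dynamic quantization, one of the experiments the open checkpoints make cheap to try, takes only a few lines in PyTorch. The toy module below is a placeholder for a real checkpoint.

```python
import torch
import torch.nn as nn

# Toy module standing in for a real checkpoint.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(),
                      nn.Linear(4096, 1024))

# Quantize the linear layers' weights to int8; no retraining required.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear},
                                                dtype=torch.qint8)
print(quantized)
```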

Training Data and Methodology

Olmo 3’s training corpus is a curated blend of publicly available datasets, including Common Crawl, Wikipedia, Project Gutenberg, and a set of domain‑specific corpora such as biomedical literature and legal documents. AI2 has applied a rigorous filtering pipeline that removes low‑quality text, duplicates, and content that violates privacy or copyright. The final dataset contains roughly 1.2 trillion tokens for the 7B model and 5.6 trillion tokens for the 32B model.
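
A toy version of this kind of filtering pass, exact deduplication plus a crude length heuristic, might look like the sketch below. Production pipelines apply far more sophisticated quality, privacy, and copyright filters, and the input file is hypothetical.

```python
import hashlib

def clean(docs):
    seen = set()
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # drop exact duplicates
        seen.add(digest)
        if len(doc.split()) < 50:
            continue  # drop very short, low-signal documents
        yield doc

# Hypothetical input: one document per line.
corpus = list(clean(open("raw_crawl.txt", encoding="utf-8")))
```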

The training methodology follows a two‑phase approach. In Phase 1, the models are trained on the general corpus to learn broad linguistic patterns. Phase 2 involves fine‑tuning on specialized datasets to imbue the models with domain knowledge. This staged training strategy balances generalization with specialization, ensuring that Olmo 3 can handle both open‑domain queries and technical prompts.
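
Schematically, the two phases can be expressed as a training schedule like the one below; the dataset names, token counts, and learning rates are illustrative placeholders, not Olmo 3's published values.

```python
# Schematic two-phase schedule with placeholder values.
PHASES = [
    {"name": "phase1_general",
     "data": "dolma3_general_mix",  # broad web, books, and reference text
     "tokens": 1_000_000_000_000,
     "lr": 3e-4},
    {"name": "phase2_specialize",
     "data": "dolma3_domain_mix",   # biomedical, legal, and other domains
     "tokens": 100_000_000_000,
     "lr": 3e-5},                   # lower rate to limit forgetting
]

for phase in PHASES:
    print(f"{phase['name']}: {phase['tokens']:,} tokens at lr={phase['lr']}")
```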

Deployment Variants and Use Cases

AI2 has released several deployment‑ready variants of Olmo 3. The base models are available in both Hugging Face and ONNX formats, making them compatible with a wide range of inference engines. The Olmo 3‑Think and Olmo 3‑Instruct variants come pre‑configured with safety mitigations that reduce the likelihood of generating disallowed content.
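
Loading one of the released variants from the Hugging Face Hub follows the usual transformers pattern. The repository identifier below is hypothetical; check AI2's Hugging Face organization for the exact names.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

MODEL_ID = "allenai/Olmo-3-7B-Instruct"  # hypothetical identifier

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

generator = pipeline("text-generation", model=model, tokenizer=tok)
print(generator("Summarize the key idea of open model flows:",
                max_new_tokens=64))
```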

Potential use cases span from chatbots and virtual assistants to code generation and scientific literature summarization. Because the entire pipeline is open, companies can fine‑tune Olmo 3 on proprietary data, creating custom models that retain the benefits of the open architecture while protecting intellectual property.

Community Impact and Open‑Source Ecosystem

By releasing Olmo 3 as a fully open model family, AI2 is setting a new standard for transparency in LLM development. The open model flow invites contributions from academia, industry, and hobbyists alike. Early adopters have already begun experimenting with pruning techniques to deploy Olmo 3 on mobile devices, while researchers are exploring new fine‑tuning objectives that build on the think‑then‑write mechanism.

The broader impact is twofold. First, it lowers the barrier to entry for high‑quality LLMs, enabling smaller teams to build sophisticated AI applications. Second, it fosters a culture of reproducibility, where claims about model performance can be independently verified and improved upon.

Conclusion

Olmo 3 represents a milestone in the evolution of large language models. Its combination of 7B and 32B parameter sizes, adaptive architecture, and a fully open model flow positions it as a powerful tool for both research and production. By building on the Dolma 3 and Dolci stacks, AI2 has created a cohesive pipeline that demystifies the training process and invites community collaboration. As the AI landscape continues to evolve, initiatives like Olmo 3 will play a pivotal role in ensuring that the benefits of advanced language models are accessible, trustworthy, and ethically grounded.

Call to Action

If you’re a researcher, developer, or AI enthusiast, we encourage you to dive into the Olmo 3 repository, experiment with the provided checkpoints, and contribute back to the community. Whether you’re fine‑tuning for a niche domain, exploring new safety mechanisms, or optimizing for edge deployment, your insights can help shape the next generation of open‑source language models. Join the conversation on GitHub, share your findings on social media, and let’s build a more transparent, inclusive AI ecosystem together.
