Introduction
In the last few years the AI landscape has been dominated by a handful of gigantic language models that require massive GPU clusters for training and inference. For most enterprises, however, the reality of deploying AI is far from the laboratory conditions that fuel these research breakthroughs. Latency budgets, memory limits, and thermal constraints on laptops, tablets, and embedded devices mean that a model that performs well on a cloud‑based testbed may still be unusable in production. Liquid AI, a startup spun out of MIT in 2023, has taken a different tack. Rather than chasing ever larger parameter counts, the company has focused on building a family of small, efficient foundation models that can run in real‑world environments without sacrificing quality.
The company’s second generation of Liquid Foundation Models (LFM2) was unveiled in July 2025, and the team has now released a 51‑page technical report that goes beyond the usual weight‑and‑score narrative. The report lays out the entire design pipeline—from hardware‑in‑the‑loop architecture search to a curriculum‑driven training regime and a post‑training recipe that turns a modestly sized model into a reliable agent. For the first time, the details are available to anyone who wants to replicate or adapt the approach for their own hardware and data constraints.
The significance of this release extends beyond the academic community. Enterprises that have long been forced to rely on cloud inference for complex reasoning tasks now have a tangible, open‑source alternative that can be deployed locally on commodity CPUs or mobile SoCs. By making the entire process transparent, Liquid AI is effectively handing over a playbook that can be customized to fit the unique operational constraints of any organization, from a small startup to a multinational corporation.
In this post we unpack the key innovations behind LFM2, examine how they address the pain points of enterprise AI, and explore the broader implications for the future of hybrid edge‑cloud architectures.
Architecture Search on Real Hardware
The first pillar of LFM2’s success is its commitment to hardware‑centric design. Rather than performing architecture search in an abstract simulation, Liquid AI ran the search directly on the target devices that the models were intended for: Snapdragon mobile SoCs and Ryzen laptop CPUs. This approach ensured that the resulting architecture was not only theoretically efficient but also practically viable under real‑world conditions. The search converged on a minimal hybrid design that relies heavily on gated short convolution blocks and a modest number of grouped‑query attention (GQA) layers. These components were repeatedly chosen over more exotic linear‑attention or state‑space hybrids because they delivered a superior quality‑latency‑memory Pareto front when measured on actual hardware.
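To make the block structure concrete, here is a minimal PyTorch sketch of a gated short‑convolution block. The projection layout, gating functions, and kernel size are illustrative assumptions rather than the exact LFM2 operator; the point is that the core primitive is a cheap depthwise causal convolution wrapped in learned gates, not a full attention layer.

```python
import torch
import torch.nn as nn

class GatedShortConvBlock(nn.Module):
    """Illustrative gated short-convolution block (hypothetical layout,
    not the exact LFM2 operator)."""
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.in_proj = nn.Linear(dim, 3 * dim)  # value, input gate, output gate
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              groups=dim, padding=kernel_size - 1)  # depthwise
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        v, g_in, g_out = self.in_proj(x).chunk(3, dim=-1)
        v = v * torch.sigmoid(g_in)                         # input gating
        v = self.conv(v.transpose(1, 2))[..., : x.size(1)]  # causal short conv
        v = v.transpose(1, 2)
        return self.out_proj(v * torch.sigmoid(g_out))      # output gating
```

In the full model, a small number of GQA layers would be interleaved between stacks of blocks like this one.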
The implications for enterprise teams are threefold. First, the architecture’s simplicity translates into predictability; the same structural backbone scales smoothly from 350 M to 2.6 B parameters without introducing new bottlenecks. Second, because dense and mixture‑of‑experts variants share the same core, deployment across heterogeneous hardware fleets becomes straightforward. Finally, the models achieve roughly twice the CPU throughput of comparable open models, meaning that routine inference can be handled locally without resorting to cloud endpoints.
Training Pipeline Tailored for Enterprise
A model’s architecture is only part of the story; how it is trained determines its real‑world behavior. LFM2’s training pipeline is deliberately engineered to compensate for the smaller parameter budget through structure rather than brute force. The team began with 10–12 T tokens of pre‑training data, followed by a 32 K‑context mid‑training phase that expands the usable context window without a proportional increase in compute. This strategy allows the model to process longer documents—a critical requirement for many enterprise use cases—while keeping the training cost manageable.
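As a rough illustration of what such a curriculum looks like in configuration terms, the sketch below stages training by context length. Only the roughly 10–12 T pre‑training token budget and the 32 K mid‑training context come from the report; the shorter pre‑training context and the relative size of the mid‑training phase are placeholder assumptions.

```python
# Hypothetical staged-training schedule: a long pre-training run at a short
# context, then a smaller mid-training phase at 32K context. Values marked
# as assumptions are illustrative only.
training_schedule = [
    {"stage": "pre-training", "tokens": "10-12T", "context_length": 4_096},  # context is an assumption
    {"stage": "mid-training", "tokens": "much smaller (unspecified)", "context_length": 32_768},
]

for phase in training_schedule:
    print(f"{phase['stage']}: {phase['tokens']} tokens @ {phase['context_length']}-token context")
```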
The distillation strategy is another key differentiator. Instead of the conventional KL divergence that can become unstable when teachers provide only partial logits, LFM2 uses a decoupled Top‑K knowledge distillation objective. This approach focuses the student on the most informative logits, improving robustness and reducing training variance.
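A minimal sketch of a Top‑K restricted distillation loss is shown below. The report's decoupled formulation is more involved; this version only illustrates the core idea of renormalizing both teacher and student over the teacher's top‑k vocabulary entries instead of matching full distributions.

```python
import torch
import torch.nn.functional as F

def topk_kd_loss(student_logits, teacher_logits, k: int = 32, temperature: float = 1.0):
    """Illustrative Top-K distillation loss (not the exact decoupled objective
    from the report): match the student to the teacher only on the teacher's
    top-k vocabulary entries at each position."""
    # logits: (batch, seq_len, vocab)
    topk_vals, topk_idx = teacher_logits.topk(k, dim=-1)
    student_topk = student_logits.gather(-1, topk_idx)

    # Renormalize both distributions over the restricted top-k support.
    teacher_probs = F.softmax(topk_vals / temperature, dim=-1)
    student_logp = F.log_softmax(student_topk / temperature, dim=-1)

    # KL(teacher || student) over the top-k support.
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean") * temperature ** 2
```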
After pre‑training, the model undergoes a three‑stage post‑training sequence: supervised fine‑tuning (SFT), length‑normalized preference alignment, and model merging. The final step, model merging, blends multiple checkpoints to produce a single, more reliable model. Together, these stages produce a system that behaves more like a practical agent than a toy LLM. It can follow structured formats, adhere to JSON schemas, and manage multi‑turn conversations with a level of consistency that is often missing from similarly sized open models.
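Of the three post‑training stages, model merging is the easiest to picture in code. The sketch below shows plain weighted parameter averaging of several checkpoints; the report's actual merging recipe may differ, so treat this as the general idea rather than the specific method.

```python
import torch

def merge_checkpoints(state_dicts, weights=None):
    """Minimal checkpoint-merging sketch: a weighted average of parameters
    from several post-training checkpoints."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Usage (paths is a list of checkpoint files):
# merged = merge_checkpoints([torch.load(p, map_location="cpu") for p in paths])
```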
Multimodal Extensions for Edge
The LFM2 family is not limited to text. Liquid AI has released a vision‑language variant (LFM2‑VL) and an audio variant (LFM2‑Audio) that are both designed with token efficiency in mind. LFM2‑VL attaches a SigLIP2 encoder through a connector that aggressively reduces visual token count via PixelUnshuffle. High‑resolution inputs trigger dynamic tiling, ensuring that the token budget remains within the limits of mobile hardware. LFM2‑Audio, on the other hand, uses a bifurcated audio path—one for embeddings and one for generation—enabling real‑time transcription or speech‑to‑speech on modest CPUs.
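The token‑reduction step in the vision connector is easy to sketch. The example below folds each 2x2 neighbourhood of encoder tokens into a single token with PyTorch's PixelUnshuffle and projects the result into the language model's width; the dimensions and projection layout are assumptions, not the exact LFM2‑VL connector.

```python
import torch
import torch.nn as nn

class PixelUnshuffleConnector(nn.Module):
    """Illustrative vision-to-language connector: reduce the visual token
    count by a factor of f*f via PixelUnshuffle, then project to the LM width.
    Hypothetical layout, not the exact LFM2-VL module."""
    def __init__(self, vision_dim: int, lm_dim: int, factor: int = 2):
        super().__init__()
        self.unshuffle = nn.PixelUnshuffle(factor)
        self.proj = nn.Linear(vision_dim * factor * factor, lm_dim)

    def forward(self, tokens: torch.Tensor, grid_h: int, grid_w: int) -> torch.Tensor:
        # tokens: (batch, grid_h * grid_w, vision_dim), e.g. from a SigLIP2 encoder;
        # grid_h and grid_w must be divisible by the unshuffle factor.
        b, _, c = tokens.shape
        x = tokens.transpose(1, 2).reshape(b, c, grid_h, grid_w)  # to (B, C, H, W)
        x = self.unshuffle(x)                                     # (B, C*f*f, H/f, W/f)
        x = x.flatten(2).transpose(1, 2)                          # back to a token sequence
        return self.proj(x)                                       # f*f fewer tokens
```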
These multimodal designs demonstrate that it is possible to add perception capabilities to small models without blowing up the computational footprint. For enterprises, this means that document understanding can happen directly on field devices, audio transcription can be performed locally for privacy compliance, and multimodal agents can operate within fixed latency envelopes without streaming data off‑device.
Retrieval Models for Agent Systems
Retrieval‑augmented generation (RAG) is a cornerstone of many modern AI workflows, but most retrieval models are too large to run efficiently on edge devices. LFM2‑ColBERT brings late‑interaction retrieval down to a footprint small enough for enterprise deployments that need multilingual RAG without the overhead of specialized vector‑database accelerators. By running retrieval on the same hardware as the reasoning model, the system eliminates the need to transfer documents across network boundaries, reducing latency and improving governance.
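The late‑interaction scoring itself is compact enough to show directly. The sketch below is the standard ColBERT‑style MaxSim score between a query and a document; LFM2‑ColBERT's surrounding pipeline (tokenization, indexing, multilingual handling) is not shown, and the normalization choice here is an assumption.

```python
import torch
import torch.nn.functional as F

def late_interaction_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """ColBERT-style MaxSim scoring: each query token is matched to its most
    similar document token and the similarities are summed."""
    # query_emb: (num_query_tokens, dim), doc_emb: (num_doc_tokens, dim)
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    sim = q @ d.T                          # (num_query_tokens, num_doc_tokens)
    return sim.max(dim=-1).values.sum()    # MaxSim over doc tokens, summed over query tokens
```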
The combination of VL, Audio, and ColBERT variants illustrates that LFM2 is a modular ecosystem rather than a single monolithic model. Each component can be swapped in or out depending on the specific requirements of a given application.
Blueprint for Hybrid Enterprise AI
When viewed as a whole, the LFM2 report sketches a compelling vision for tomorrow’s enterprise AI stack: a hybrid local‑cloud orchestration where small, fast models handle time‑critical perception, formatting, and tool invocation, while larger cloud models provide heavyweight reasoning when needed. This architecture offers several tangible benefits.
Cost control is a major advantage; routine inference handled locally avoids unpredictable cloud billing. Latency determinism is another; on‑device inference eliminates network jitter, which is critical for agent workflows that rely on time‑sensitive decisions. Governance and compliance are simplified because data never leaves the device boundary, making it easier to satisfy privacy regulations. Finally, resilience is enhanced; if the cloud path becomes unavailable, the system can continue to operate using the local models.
In practice, enterprises adopting this hybrid approach will treat the small on‑device models as the control plane of agentic workflows, with large cloud models serving as on‑demand accelerators. LFM2 provides one of the clearest open‑source foundations for that control layer.
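A rough sketch of what that control plane can look like is shown below. The function names, the escalation heuristic, and the client interfaces are all hypothetical; the report describes the architecture pattern, not this code.

```python
# Hypothetical edge-cloud routing sketch: the on-device model handles routine
# turns and escalates only when the task appears to exceed its capability.
def handle_request(prompt: str, local_model, cloud_client, max_local_tokens: int = 512) -> str:
    draft = local_model.generate(prompt, max_new_tokens=max_local_tokens)
    if needs_heavy_reasoning(prompt, draft):
        # Fall back to the cloud model only for the hard cases.
        return cloud_client.complete(prompt)
    return draft

def needs_heavy_reasoning(prompt: str, draft: str) -> bool:
    # Placeholder heuristic: escalate on long, multi-step requests or when the
    # local draft signals low confidence. A production system would use
    # calibrated signals such as self-evaluation scores, tool errors, or
    # schema-validation failures.
    return len(prompt.split()) > 400 or "not sure" in draft.lower()
```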
Strategic Takeaway
For years, the prevailing wisdom in enterprise AI has been that real‑world applications require cloud inference. LFM2 challenges that assumption by delivering competitive performance across reasoning, instruction following, multilingual tasks, and RAG while achieving substantial latency gains over other open small‑model families. The result is a set of small, open, on‑device models that are strong enough to carry meaningful slices of production workloads.
While LFM2 will not replace frontier‑scale cloud models for the most demanding reasoning tasks, it offers enterprises a reproducible, open, and operationally feasible foundation for building agentic systems that must run anywhere—from phones to industrial endpoints to air‑gapped secure facilities. In the broader context of enterprise AI, LFM2 signals a shift from a cloud‑only paradigm to a hybrid edge‑cloud reality. Organizations that adopt this approach will be better positioned to meet cost, latency, governance, and resilience requirements in a rapidly evolving AI landscape.
Conclusion
Liquid AI’s LFM2 represents more than a new family of small models; it is a comprehensive blueprint that demystifies the process of building efficient, on‑device AI systems tailored to enterprise constraints. By openly sharing the architecture search, training curriculum, and post‑training pipeline, the company has lowered the barrier to entry for organizations that previously felt compelled to rely on cloud inference. The result is a practical, modular ecosystem that can be adapted to a wide range of hardware and use cases, from vision‑language perception on smartphones to real‑time audio transcription on industrial edge devices.
The broader implication is clear: the future of enterprise AI will be hybrid, with small, fast models acting as the control plane and larger cloud models providing heavy lifting when necessary. LFM2 gives enterprises the tools to design, deploy, and govern such systems with confidence, turning what was once a compromise into a strategic advantage.
Call to Action
If your organization is looking to bring AI capabilities closer to the user while maintaining strict latency, cost, and compliance requirements, it’s time to explore Liquid AI’s LFM2. Download the 51‑page technical report, experiment with the open‑source code, and start building your own on‑device foundation models today. By adopting a hardware‑centric, curriculum‑driven approach, you can unlock powerful AI that runs reliably on the devices your customers already use, without the overhead of cloud infrastructure.
Join the conversation on LinkedIn or Twitter, share your experiments, and help shape the next wave of enterprise AI. The blueprint is out there—now it’s up to you to build the future.