Introduction
LinkedIn’s launch of an AI‑powered people search this week marks a watershed moment for enterprise generative AI. The platform, which now boasts more than 1.3 billion members, has taken a concept that seemed almost inevitable—searching for people by intent rather than by keyword—and turned it into a production‑grade service that can serve a global user base with sub‑second latency. The journey from a handful of beta testers to a billion‑user product was anything but linear. It was a marathon of data engineering, policy design, model distillation, and relentless optimization, all carried out in an environment where every millisecond of latency translates into a measurable impact on user engagement and revenue.
For technical leaders and product managers, LinkedIn’s story is a masterclass in how to move generative AI from the lab to the real world at scale. The company’s approach—building a “cookbook” of best practices that can be replicated across products—provides a concrete blueprint for enterprises that want to deploy large language models (LLMs) without falling into the trap of hype and over‑engineering. In this post we unpack the key technical milestones, the strategic decisions that guided the rollout, and the practical lessons that can be applied to any AI‑driven product.
From Keywords to Intent: The Core Innovation
Traditional search on LinkedIn was built around keyword matching. A query such as “cancer” would return profiles that contained that exact word, ignoring the semantic richness of the user’s intent. To overcome this limitation, LinkedIn introduced a semantic layer powered by an LLM that can understand the conceptual relationships between terms. When a user types “Who is knowledgeable about curing cancer?”, the model recognizes that “cancer” is closely related to “oncology” and even to broader fields like “genomics research.” The result is a list of professionals that includes oncology leaders, researchers, and scientists whose profiles may not mention the word “cancer” explicitly.
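To make the shift from lexical to semantic matching concrete, here is a minimal sketch of embedding-based matching using the open-source sentence-transformers library. The model name and profile snippets are illustrative assumptions rather than details of LinkedIn's stack; the point is simply that a query about "curing cancer" scores highly against an oncology profile that never mentions the word "cancer."

```python
# Minimal sketch of semantic (embedding-based) matching, assuming the
# open-source sentence-transformers library. Model choice and profile
# text are illustrative, not details of LinkedIn's actual stack.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed general-purpose encoder

query = "Who is knowledgeable about curing cancer?"
profiles = [
    "Principal investigator in oncology and tumor immunology",  # no literal "cancer"
    "Genomics researcher focused on precision medicine",
    "Front-end engineer specializing in design systems",
]

# Encode the query and each profile into dense vectors.
query_vec = model.encode(query, convert_to_tensor=True)
profile_vecs = model.encode(profiles, convert_to_tensor=True)

# Cosine similarity captures conceptual relatedness, not keyword overlap.
scores = util.cos_sim(query_vec, profile_vecs)[0]
for profile, score in sorted(zip(profiles, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {profile}")
```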
But relevance alone is not enough. The system also weighs the usefulness of each candidate by considering the user’s network. A top oncologist who is a third‑degree connection may be less actionable than a first‑degree contact who is “pretty relevant.” This dual focus on semantic relevance and network proximity ensures that the search results are not only accurate but also actionable.
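LinkedIn has not published its ranking formula, but the idea of blending semantic relevance with network proximity can be sketched as a simple weighted score. The degree weights below are invented purely for illustration.

```python
# Toy illustration of combining semantic relevance with network proximity.
# The weights are invented for illustration; LinkedIn's actual formula is not public.
DEGREE_WEIGHT = {1: 1.0, 2: 0.6, 3: 0.3}  # 1st-, 2nd-, 3rd-degree connections

def actionability_score(relevance: float, connection_degree: int) -> float:
    """Blend how relevant a profile is with how reachable the person is."""
    return relevance * DEGREE_WEIGHT.get(connection_degree, 0.1)

# A "pretty relevant" 1st-degree contact can outrank a top expert
# who is only a 3rd-degree connection.
print(actionability_score(relevance=0.75, connection_degree=1))  # 0.75
print(actionability_score(relevance=0.95, connection_degree=3))  # 0.285
```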
The 1.3 Billion‑Member Challenge
Scaling a semantic search engine to 1.3 billion users is a monumental technical hurdle. The initial recipe that worked for LinkedIn’s AI job search—where the user base was in the tens of millions—had to be re‑engineered for a graph more than an order of magnitude larger. The first step was to create a “golden data set” of a few hundred to a thousand real query‑profile pairs. These pairs were meticulously scored against a detailed 20‑ to 30‑page product policy document that encoded relevance rules and user‑experience guidelines.
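The shape of such a golden dataset might look like the sketch below. The field names and grading scale are assumptions for illustration; the real schema is defined by LinkedIn's internal policy document.

```python
# Illustrative structure for a human-graded "golden" example; field names and
# the grading scale are assumptions, not LinkedIn's internal schema.
from dataclasses import dataclass

@dataclass
class GoldenExample:
    query: str            # real member query
    profile_id: str       # candidate profile shown for that query
    relevance_grade: int  # e.g. 0-3, graded against the product policy document
    rationale: str        # grader's note citing the policy rule applied

golden_set = [
    GoldenExample(
        query="Who is knowledgeable about curing cancer?",
        profile_id="urn:li:member:12345",  # placeholder identifier
        relevance_grade=3,
        rationale="Oncology researcher; policy treats adjacent fields as relevant.",
    ),
]
```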
To amplify this small set, LinkedIn leveraged a large foundation model to generate synthetic training data. The synthetic data fed a 7‑billion‑parameter “Product Policy” model that could judge relevance with high fidelity. However, this model was too slow for live production. The breakthrough came when the team distilled the 7B policy model into a 1.7B teacher model focused solely on relevance. They then paired it with separate teacher models that predicted specific member actions, such as job applications for the jobs product or connecting and following for people search. The ensemble of teachers produced soft probability scores that a student model learned to mimic via a KL‑divergence loss.
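A minimal PyTorch sketch of that distillation step follows: an ensemble of teachers supplies soft probabilities, and the student is trained to match them with a KL‑divergence loss. The way teacher scores are combined, the tensor shapes, and the two-class setup are assumptions for illustration, not LinkedIn's training code.

```python
# Sketch of distilling soft teacher scores into a student via KL divergence (PyTorch).
# Ensemble averaging, shapes, and the two-class setup are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_probs_list: list[torch.Tensor]) -> torch.Tensor:
    # Combine the teachers (relevance teacher + action-prediction teachers)
    # into one soft target distribution; a simple average is assumed here.
    teacher_probs = torch.stack(teacher_probs_list).mean(dim=0)

    # The student is trained to match the teachers' soft probability scores.
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Tiny usage example: a batch of 4 candidates, 2 classes (relevant / not relevant).
student_logits = torch.randn(4, 2, requires_grad=True)
relevance_teacher = torch.softmax(torch.randn(4, 2), dim=-1)
action_teacher = torch.softmax(torch.randn(4, 2), dim=-1)

loss = distillation_loss(student_logits, [relevance_teacher, action_teacher])
loss.backward()
```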
The final architecture is a two‑stage pipeline. An 8B‑parameter model performs broad retrieval, casting a wide net across the graph, and a highly distilled student model then performs fine‑grained ranking on those candidates. For people search, the student had to be aggressively compressed from 440M down to 220M parameters to reach the speed required for 1.3 billion users while keeping the loss in relevance under 1%.
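The two-stage shape of the system can be sketched roughly as follows, with numpy stand-ins for both models: a cheap, broad retrieval pass over the full embedding index, followed by a more expensive ranking pass applied only to the retrieved candidates. Dimensions, candidate counts, and scoring functions are assumptions.

```python
# Two-stage retrieve-then-rank sketch with numpy stand-ins for both stages.
# Dimensions, candidate counts, and scoring functions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
NUM_PROFILES, DIM = 100_000, 64  # toy stand-in for the 1.3B-profile graph
profile_embeddings = rng.standard_normal((NUM_PROFILES, DIM)).astype(np.float32)

def retrieve(query_vec: np.ndarray, k: int = 1000) -> np.ndarray:
    """Stage 1: broad, cheap retrieval by dot product over the whole index."""
    scores = profile_embeddings @ query_vec
    return np.argpartition(-scores, k)[:k]

def rank(query_vec: np.ndarray, candidate_ids: np.ndarray, top_n: int = 10) -> np.ndarray:
    """Stage 2: fine-grained ranking applied only to retrieved candidates
    (stands in for the distilled 220M-parameter student model)."""
    candidate_scores = profile_embeddings[candidate_ids] @ query_vec  # placeholder scorer
    order = np.argsort(-candidate_scores)[:top_n]
    return candidate_ids[order]

query_vec = rng.standard_normal(DIM).astype(np.float32)
top_profiles = rank(query_vec, retrieve(query_vec))
print(top_profiles)
```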
Retrieval vs. Ranking: A New Architectural Shift
People search introduced a new dimension to the problem: retrieval. The previous job‑search stack was built on CPU‑based indexing, which was adequate for a smaller search space. For a billion‑record graph, the latency demands were far higher. LinkedIn moved its indexing to GPU‑based infrastructure, a foundational shift that enabled the system to meet the snappy response times users expect from a modern search engine.
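LinkedIn has not named its indexing technology, but as one concrete illustration of what a GPU-resident embedding index looks like, the open-source FAISS library can move a flat inner-product index onto a GPU, which is roughly the capability a GPU-based retrieval tier provides.

```python
# Illustration of a GPU-resident vector index using the open-source FAISS library.
# This is a stand-in example; the article does not name LinkedIn's actual index technology.
import faiss
import numpy as np

DIM = 64
vectors = np.random.rand(1_000_000, DIM).astype("float32")  # profile embeddings (toy scale)

cpu_index = faiss.IndexFlatIP(DIM)      # exact inner-product index
res = faiss.StandardGpuResources()      # requires faiss-gpu and a CUDA device
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)
gpu_index.add(vectors)

query = np.random.rand(1, DIM).astype("float32")
distances, ids = gpu_index.search(query, 10)  # top-10 nearest profiles
print(ids)
```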
Organizationally, the project benefited from a cross‑team collaboration model. Initially, separate job‑search and people‑search teams worked in parallel. Once the job‑search team cracked the policy‑driven distillation method, leadership brought the architects of that success—product lead Rohan Rajiv and engineering lead Wenjing Zhang—into the people‑search effort. This transfer of knowledge accelerated the development cycle and ensured that the same proven techniques were applied to a new domain.
10× Throughput Gains Through Aggressive Optimization
After solving the retrieval problem, the team turned to ranking efficiency. One of the most significant optimizations was reducing the input size to the ranking model. An RL‑trained summarizer model was introduced to condense the context by a factor of 20 with minimal information loss. Coupled with the 220M‑parameter student model, this yielded a 10× increase in ranking throughput, allowing LinkedIn to serve the model to its massive user base without compromising latency.
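The summarizer itself is RL-trained and proprietary, but where it sits in the pipeline can be sketched simply: condense each candidate's context before it reaches the ranker, so the expensive model scores roughly one-twentieth of the original tokens. Both functions below are trivial placeholders, not LinkedIn's models.

```python
# Where the context-compression step sits in the ranking path (stand-in code).
# LinkedIn's summarizer is an RL-trained model; the truncation below is only a
# placeholder showing that the ranker sees ~20x fewer tokens per candidate.
COMPRESSION_FACTOR = 20

def summarize(profile_context: str) -> str:
    """Placeholder for the RL-trained summarizer: keep ~1/20th of the tokens."""
    tokens = profile_context.split()
    return " ".join(tokens[: max(1, len(tokens) // COMPRESSION_FACTOR)])

def rank_candidate(query: str, profile_context: str) -> float:
    """Placeholder for the 220M-parameter student ranker."""
    condensed = summarize(profile_context)  # ranker input is ~20x smaller
    return float(len(set(query.split()) & set(condensed.split())))  # dummy score

print(rank_candidate("oncology research", "oncology " * 200 + "research " * 200))
```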
Pragmatism Over Agentic Hype
Throughout the development cycle, LinkedIn’s leadership emphasized that the real value for enterprises lies in perfecting recommender systems, not in chasing the latest agentic hype. The new AI‑powered people search is a tool that will eventually feed into higher‑level agents, but the focus remains on building a robust, efficient search engine. An intelligent query‑routing layer, powered by an LLM, decides whether a user’s query should go to the semantic stack or the legacy lexical search. This pragmatic approach ensures that the system remains reliable while still embracing the benefits of generative AI.
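A routing layer of that kind can be sketched as a small classification step: a model labels each query as lexical (an exact name or keyword lookup) or semantic (an open-ended intent), and the request is dispatched accordingly. The prompt wording and the `call_llm` helper below are hypothetical placeholders, not LinkedIn's implementation.

```python
# Sketch of an LLM-based query router; `call_llm` is a hypothetical placeholder
# for whatever model-serving client is available, not LinkedIn's implementation.
ROUTER_PROMPT = (
    "Classify the search query as LEXICAL (exact name/keyword lookup) or "
    "SEMANTIC (open-ended intent). Answer with one word.\n\nQuery: {query}"
)

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; a trivial heuristic keeps the sketch runnable."""
    query = prompt.rsplit("Query:", 1)[-1]
    return "SEMANTIC" if len(query.split()) > 3 else "LEXICAL"

def semantic_search(query: str) -> str:  # stub for the new LLM-powered stack
    return f"[semantic results for: {query}]"

def lexical_search(query: str) -> str:   # stub for the legacy keyword index
    return f"[lexical results for: {query}]"

def route_query(query: str) -> str:
    label = call_llm(ROUTER_PROMPT.format(query=query)).strip().upper()
    return semantic_search(query) if label == "SEMANTIC" else lexical_search(query)

print(route_query("Who is knowledgeable about curing cancer?"))
print(route_query("Rohan Rajiv"))
```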
Conclusion
LinkedIn’s rollout of an AI‑powered people search for 1.3 billion users is more than a product launch; it is a case study in how to scale generative AI responsibly and efficiently. By starting with a single vertical, codifying a repeatable distillation pipeline, and relentlessly optimizing every layer of the stack, LinkedIn turned a complex, policy‑driven problem into a production‑grade service that balances relevance, usefulness, and speed. The lessons distilled from this journey—pragmatism, modularity, and a focus on pipeline engineering—are universally applicable to any enterprise looking to deploy large language models at scale.
Call to Action
If you’re building or managing AI products, take a page from LinkedIn’s playbook. Begin with a focused use case, build a small but high‑quality golden dataset, and use synthetic data to bootstrap a policy‑driven model. Then, iterate through distillation, pruning, and creative optimizations like RL‑summarization to meet your performance targets. Remember, the goal is not to create a flashy agent but to deliver a reliable, efficient tool that users can trust. Start small, document your process, and scale gradually—your next billion‑user launch could be just a few iterations away.