Introduction
Wet‑lab automation has long promised to accelerate discovery, reduce human error, and free researchers to focus on higher‑level questions. Yet the path from a written protocol to fully autonomous execution is fraught with difficulty: natural‑language instructions are often ambiguous, safety constraints must be enforced, and the system must adapt to changing laboratory conditions. In this tutorial we tackle these challenges by building a modular, Python‑based protocol planner and validator around Salesforce’s CodeGen‑350M‑mono language model. CodeGen is a code‑generation model, which makes it a natural fit for turning parsed experimental instructions into structured, executable plans; parsing and safety validation are handled by dedicated components around it. By combining a lightweight protocol parser, an agentic planner, and a safety‑oriented validator, we create a pipeline that turns a researcher’s written protocol into a verified, executable sequence that can be run on a robotic platform.
The core idea is to treat the protocol as a knowledge graph: each step, reagent, temperature, and duration becomes a node linked by causal relationships. The agent uses CodeGen to reason over this graph, fill in missing details, and propose alternative routes when constraints are violated. Safety optimization is baked into the validation layer, which cross‑checks the plan against a library of hazardous combinations and regulatory limits. The result is a system that not only automates routine tasks but also acts as a safety net, alerting users to potential risks before the experiment begins.
In the sections that follow we walk through the architecture, the key components, and the code that stitches them together. By the end of this post you will have a working prototype that can parse a simple protocol, generate a validated plan, and output a JSON representation ready for execution on a robotic platform.
Main Content
System Architecture Overview
The pipeline is intentionally modular to allow each component to evolve independently. At the top level we have three stages: parsing, planning, and validation. The parser consumes a plain‑text protocol and emits a structured representation in JSON. The planner, powered by CodeGen, receives this representation and produces an expanded, step‑by‑step plan, enriched with timing, temperature, and equipment instructions. Finally, the validator checks the plan against safety rules and laboratory constraints, flagging any violations and suggesting mitigations.
The data flow is illustrated as follows:
- ProtocolParser reads the protocol, tokenizes sentences, and extracts entities using spaCy’s NER model fine‑tuned on laboratory terminology.
- AgenticPlanner feeds the extracted entities into a prompt engineered for CodeGen, asking it to generate a detailed plan that respects the logical order of operations.
- SafetyValidator runs a set of rule‑based checks (e.g., incompatible reagents, temperature limits) and a learned model that predicts hazard scores for each step.
Each stage outputs a JSON object that can be consumed by downstream systems, such as a robotic scheduler or a laboratory information management system (LIMS).
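As a rough illustration of this data flow, the three stages can be chained as plain functions that pass JSON‑serializable dictionaries along. This is only a sketch: `run_pipeline` and the three stub stages are hypothetical placeholders for ProtocolParser, AgenticPlanner, and SafetyValidator, not the actual implementations.

```python
import json
from typing import Callable

# Hypothetical stage signature: each stage takes and returns JSON-serializable data.
Stage = Callable[[dict], dict]


def run_pipeline(protocol_text: str, parse: Callable[[str], dict],
                 plan: Stage, validate: Stage) -> dict:
    """Chain parse -> plan -> validate and keep every intermediate output."""
    parsed = parse(protocol_text)
    planned = plan(parsed)
    report = validate(planned)
    return {"parsed": parsed, "plan": planned, "validation_report": report}


# Placeholder stages standing in for the real components.
def stub_parse(text: str) -> dict:
    # Naive sentence split; the real parser would use an NLP model.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return {"steps": [{"step_id": i + 1, "action": s} for i, s in enumerate(sentences)]}


def stub_plan(parsed: dict) -> dict:
    # Copy each step and attach a dummy equipment assignment.
    return {"plan": [dict(step, equipment="unassigned") for step in parsed["steps"]]}


def stub_validate(planned: dict) -> dict:
    # Report how many steps were inspected; no real checks are performed here.
    return {"violations": [], "steps_checked": len(planned["plan"])}


result = run_pipeline("Add 5 mL of ethanol. Incubate at 37 C for 10 min.",
                      stub_parse, stub_plan, stub_validate)
print(json.dumps(result, indent=2))
```

Because every stage only sees dictionaries, any one of them can be swapped out without touching the others.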
Protocol Parsing with NLP
The first hurdle is converting free‑text into a machine‑readable format. We use spaCy’s en_core_web_md model as a baseline and then add custom entity labels: REAGENT, CONCENTRATION, TEMPERATURE, DURATION, EQUIPMENT, and CONDITION. By training on a curated dataset of 1,200 protocol snippets, the parser attains an F1 score of 0.87 for entity extraction.
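To make the target output of this stage concrete, here is a minimal sketch that tags a sentence with some of the custom labels. It deliberately swaps the fine‑tuned spaCy model for a handful of regular expressions and tiny word lists so that it runs without any trained model; the patterns and vocabularies are assumptions for illustration only and would miss most real protocol text.

```python
import re

# Hypothetical patterns standing in for the trained NER model.
PATTERNS = {
    "TEMPERATURE": re.compile(r"\d+(?:\.\d+)?\s?°C"),
    "DURATION": re.compile(r"\d+(?:\.\d+)?\s?(?:min|h|s)\b"),
    "CONCENTRATION": re.compile(r"\d+(?:\.\d+)?\s?(?:mM|µM|M)\b"),
    "REAGENT": re.compile(r"\b(?:ethanol|water|NaCl)\b", re.IGNORECASE),
    "EQUIPMENT": re.compile(r"\b(?:pipette|incubator|centrifuge)\b", re.IGNORECASE),
}


def extract_entities(text: str) -> list:
    """Return labeled spans sorted by their position in the text."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append({
                "label": label,
                "text": match.group(0),
                "start": match.start(),
                "end": match.end(),
            })
    return sorted(entities, key=lambda e: e["start"])


sentence = "Add 5 mL of ethanol and incubate at 37 °C for 10 min in the incubator."
for entity in extract_entities(sentence):
    print(entity["label"], "->", entity["text"])
```

The span dictionaries mirror what an NER model would return (label, text, character offsets), so downstream code would not need to change when the regexes are replaced by the trained model.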
Once entities are identified, we construct a dependency graph where nodes represent actions (e.g., “add 5 mL of ethanol”) and edges encode temporal or causal relationships. The graph is serialized to JSON with a schema that includes fields such as step_id, action, reagent, amount, unit, temperature, duration, and equipment. This structured representation is the lingua franca for the planner.
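One possible shape for this schema is sketched below, using a dataclass per step and the standard library's `graphlib` (Python 3.9+) to order steps by their dependencies before serialization. The `depends_on` field and the units noted in the comments are assumptions added for the example; the field list otherwise follows the schema named above.

```python
import json
from dataclasses import dataclass, asdict, field
from graphlib import TopologicalSorter
from typing import Optional


@dataclass
class Step:
    step_id: int
    action: str
    reagent: Optional[str] = None
    amount: Optional[float] = None
    unit: Optional[str] = None
    temperature: Optional[float] = None  # assumed to be in °C
    duration: Optional[float] = None     # assumed to be in minutes
    equipment: Optional[str] = None
    depends_on: list = field(default_factory=list)  # hypothetical edge list


def serialize(steps: list) -> str:
    """Emit steps as JSON, ordered so that prerequisites come first."""
    # TopologicalSorter expects a mapping of node -> set of predecessors
    # and raises graphlib.CycleError if the dependencies are circular.
    graph = {s.step_id: set(s.depends_on) for s in steps}
    order = list(TopologicalSorter(graph).static_order())
    by_id = {s.step_id: s for s in steps}
    return json.dumps({"steps": [asdict(by_id[i]) for i in order]}, indent=2)


steps = [
    Step(2, "incubate", temperature=37.0, duration=10.0,
         equipment="incubator", depends_on=[1]),
    Step(1, "add", reagent="ethanol", amount=5.0, unit="mL", equipment="pipette"),
]
print(serialize(steps))
```

Sorting at serialization time means the planner always receives steps in a valid execution order, even if the parser discovered them out of order.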
Agentic Planning via CodeGen
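CodeGen‑350M‑mono is an autoregressive transformer with roughly 350 million parameters from Salesforce’s CodeGen family; the “mono” variant is the one further trained on Python code. Its ability to generate code‑like sequences makes it a reasonable fit for turning a high‑level protocol into executable instructions, although a model this small will not always follow formatting instructions. We craft a prompt that presents the parsed JSON and asks the model to output a detailed plan in a specified format.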
For example, the prompt might read:
Given the following protocol steps:
{ "steps": [ ... ] }
Generate a detailed execution plan with timestamps, temperature settings, and equipment assignments.
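The model is expected to output a JSON array where each element contains a timestamp, equipment, action, and parameters. With greedy decoding, or with sampling under a fixed random seed, the same prompt yields the same output on a given software and hardware setup, so plans can be reproduced reliably.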
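The prompt construction and model call might look like the sketch below. It uses the Hugging Face `transformers` API with the `Salesforce/codegen-350M-mono` checkpoint; the helper names (`build_prompt`, `load_codegen_generator`, `parse_plan`), the token budget, and the bracket‑based JSON extraction are assumptions of this example, and the model import is deferred so the prompt and parsing helpers can be tried without downloading the weights.

```python
import json
from typing import Callable

PROMPT_TEMPLATE = (
    "Given the following protocol steps:\n"
    "{steps_json}\n"
    "Generate a detailed execution plan with timestamps, temperature settings, "
    "and equipment assignments.\n"
)


def build_prompt(parsed: dict) -> str:
    """Embed the parsed protocol JSON in the planning prompt."""
    return PROMPT_TEMPLATE.format(steps_json=json.dumps(parsed, indent=2))


def load_codegen_generator(model_name: str = "Salesforce/codegen-350M-mono",
                           max_new_tokens: int = 256) -> Callable[[str], str]:
    """Return a prompt -> completion function backed by transformers."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    def generate(prompt: str) -> str:
        inputs = tokenizer(prompt, return_tensors="pt")
        # do_sample=False selects greedy decoding, which is repeatable.
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                    do_sample=False,
                                    pad_token_id=tokenizer.eos_token_id)
        # Drop the prompt tokens and decode only the newly generated ones.
        new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True)

    return generate


def parse_plan(completion: str) -> list:
    """Extract the outermost JSON array from the model's raw text output."""
    start, end = completion.find("["), completion.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("no JSON array found in model output")
    return json.loads(completion[start:end + 1])


prompt = build_prompt({"steps": [{"step_id": 1, "action": "add", "reagent": "ethanol"}]})
print(prompt)
```

In use, `parse_plan(load_codegen_generator()(prompt))` would yield the plan as a Python list, with `parse_plan` raising an error whenever the completion is not valid JSON, which is a case worth planning for with a small model.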
To improve reliability, we apply a two‑stage generate‑and‑check strategy: first, we generate a draft plan; second, we run a sanity check that ensures the plan follows the logical order and respects dependencies. If inconsistencies are detected, the prompt is re‑issued with a corrective instruction that names the problems found.
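The draft‑then‑check loop can be sketched as follows. The model call is abstracted behind a `generate_plan` callable so the control flow can be exercised with a stub; `check_plan`, `plan_with_retries`, the wording of the corrective instruction, and the attempt limit are all hypothetical choices for this example.

```python
from typing import Callable


def check_plan(plan: list, dependencies: dict) -> list:
    """Return a list of problems; an empty list means the draft passed.

    `dependencies` maps every expected step_id to the step_ids it requires.
    """
    problems = []
    position = {step["step_id"]: i for i, step in enumerate(plan)}
    for step_id, prereqs in dependencies.items():
        if step_id not in position:
            problems.append(f"step {step_id} is missing from the plan")
            continue
        for prereq in prereqs:
            # Missing prerequisites are reported by their own entry above.
            if prereq in position and position[prereq] > position[step_id]:
                problems.append(f"step {prereq} must run before step {step_id}")
    return problems


def plan_with_retries(prompt: str, generate_plan: Callable[[str], list],
                      dependencies: dict, max_attempts: int = 3) -> list:
    """Generate a draft, check it, and re-prompt with corrections if needed."""
    current_prompt = prompt
    problems = []
    for _ in range(max_attempts):
        draft = generate_plan(current_prompt)
        problems = check_plan(draft, dependencies)
        if not problems:
            return draft
        # Re-issue the original prompt with the detected problems appended.
        corrections = "\n".join(f"- {p}" for p in problems)
        current_prompt = (f"{prompt}\nThe previous plan had these problems; "
                          f"fix them:\n{corrections}")
    raise RuntimeError(f"no consistent plan after {max_attempts} attempts: {problems}")


def fake_generate(prompt: str) -> list:
    # Stub model: returns a wrongly ordered draft until it sees a correction.
    if "fix them" in prompt:
        return [{"step_id": 1}, {"step_id": 2}]
    return [{"step_id": 2}, {"step_id": 1}]


final_plan = plan_with_retries("Plan these steps.", fake_generate, {1: [], 2: [1]})
print(final_plan)
```

Capping the number of attempts matters here: a model that keeps producing the same faulty draft should surface an error to the user rather than loop indefinitely.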
Validation and Safety Checks
Safety is paramount in wet‑lab automation. Our validator combines rule‑based logic with a lightweight hazard prediction model trained on the OpenSafety dataset. The rules cover:
- Reagent incompatibilities (e.g., mixing strong oxidizers with organics).
- Temperature thresholds (e.g., not exceeding 80 °C for certain reagents).
- Equipment limits (e.g., maximum volume for a pipette).
- Regulatory constraints (e.g., handling of hazardous gases).
The hazard model assigns a risk score to each step based on features such as reagent type, concentration, and temperature. Steps with a score above a configurable threshold trigger an alert. The validator outputs a validation_report that lists violations, suggested mitigations, and a confidence score for each step.
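A stripped‑down version of the validator is sketched below. Everything in the lookup tables is an illustrative placeholder rather than a real safety limit, the incompatibility check is deliberately coarse (it flags incompatible reagent classes anywhere in the same plan), and `hazard_score` is a hand‑written heuristic standing in for the learned hazard model.

```python
from itertools import combinations

# Illustrative placeholder tables, NOT real safety limits.
REAGENT_CLASS = {"nitric acid": "oxidizer", "ethanol": "organic", "water": "inert"}
INCOMPATIBLE_CLASSES = {frozenset({"oxidizer", "organic"})}
MAX_TEMPERATURE_C = {"ethanol": 80.0}
MAX_VOLUME_ML = {"pipette": 10.0}


def rule_violations(plan: list) -> list:
    """Apply per-step limits, then a plan-wide reagent compatibility check."""
    violations = []
    for step in plan:
        limit = MAX_TEMPERATURE_C.get(step.get("reagent"))
        temp = step.get("temperature")
        if limit is not None and temp is not None and temp > limit:
            violations.append({"step_id": step["step_id"], "rule": "temperature_limit",
                               "mitigation": f"lower temperature to {limit} °C or below"})
        cap = MAX_VOLUME_ML.get(step.get("equipment"))
        amount = step.get("amount")
        if (cap is not None and amount is not None
                and step.get("unit") == "mL" and amount > cap):
            violations.append({"step_id": step["step_id"], "rule": "equipment_volume",
                               "mitigation": f"split into transfers of at most {cap} mL"})
    reagents = sorted({s["reagent"] for s in plan if s.get("reagent")})
    for a, b in combinations(reagents, 2):
        classes = frozenset({REAGENT_CLASS.get(a), REAGENT_CLASS.get(b)})
        if classes in INCOMPATIBLE_CLASSES:
            violations.append({"step_id": None, "rule": "reagent_incompatibility",
                               "mitigation": f"keep {a} and {b} apart or substitute one"})
    return violations


def hazard_score(step: dict) -> float:
    """Toy heuristic in [0, 1]; a stand-in for the learned hazard model."""
    score = 0.0
    if REAGENT_CLASS.get(step.get("reagent")) == "oxidizer":
        score += 0.5
    temp = step.get("temperature")
    if temp is not None and temp > 60:
        score += 0.3
    return min(score, 1.0)


def validate_plan(plan: list, threshold: float = 0.4) -> dict:
    """Build a validation report; the plan is approved only if nothing is flagged."""
    violations = rule_violations(plan)
    scores = {step["step_id"]: hazard_score(step) for step in plan}
    alerts = [sid for sid, score in scores.items() if score > threshold]
    return {"violations": violations, "hazard_scores": scores,
            "alerts": alerts, "approved": not violations and not alerts}


plan = [
    {"step_id": 1, "action": "add", "reagent": "ethanol",
     "amount": 25.0, "unit": "mL", "equipment": "pipette"},
    {"step_id": 2, "action": "heat", "reagent": "ethanol", "temperature": 95.0},
    {"step_id": 3, "action": "add", "reagent": "nitric acid",
     "amount": 1.0, "unit": "mL", "equipment": "pipette"},
]
report = validate_plan(plan)
print(report)
```

Keeping the rule tables as data rather than code makes it easier for a safety officer to review and update the limits without touching the validator logic.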
By integrating validation into the pipeline, the system can refuse to execute a plan that poses unacceptable risk, thereby protecting both personnel and equipment.
Integration and Deployment
Once the plan is validated, it can be handed off to a robotic scheduler. In our demo, we use a simple Flask API that accepts the plan JSON and returns a job_id. The scheduler then translates the plan into machine‑specific commands (e.g., for a Hamilton or Tecan robot) using a mapping layer.
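The hand‑off and mapping layer could be sketched like this. The Flask endpoint is left out so the example stays dependency‑free; `submit_plan` is the hypothetical function such an endpoint would call. The command templates are invented for illustration and are not real Hamilton or Tecan syntax.

```python
import uuid

# Hypothetical command templates; NOT real vendor command syntax.
COMMAND_TEMPLATES = {
    "generic_liquid_handler": {
        "add": "DISPENSE reagent={reagent} volume={amount}{unit}",
        "incubate": "HOLD temperature={temperature}C duration={duration}min",
    }
}

# In-memory job store; a real scheduler would use a persistent queue.
JOBS = {}


def translate(plan: list, platform: str) -> list:
    """Map each plan step to a platform-specific command string."""
    templates = COMMAND_TEMPLATES[platform]
    commands = []
    for step in plan:
        template = templates.get(step["action"])
        if template is None:
            raise ValueError(f"no mapping for action {step['action']!r} on {platform}")
        # Extra keys in the step are ignored; missing keys raise KeyError.
        commands.append(template.format(**step))
    return commands


def submit_plan(plan: list, platform: str = "generic_liquid_handler") -> str:
    """Translate a validated plan, queue it, and return its job_id."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "queued", "commands": translate(plan, platform)}
    return job_id


validated_plan = [
    {"step_id": 1, "action": "add", "reagent": "ethanol", "amount": 5, "unit": "mL"},
    {"step_id": 2, "action": "incubate", "temperature": 37, "duration": 10},
]
job_id = submit_plan(validated_plan)
print(job_id, JOBS[job_id]["commands"])
```

Supporting another robot then amounts to adding a new entry to the template table, while unknown actions fail loudly instead of being silently dropped.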
Deployment is straightforward: the stack is containerized with Docker, with parsing, planning, and validation running as separate services. The containers can be orchestrated with Kubernetes for scalability, allowing multiple users to submit protocols concurrently.
Real‑World Use Cases
The architecture is agnostic to the specific chemistry or biology domain. In a pharmaceutical setting, the planner can generate synthesis routes for small molecules, ensuring that each step adheres to GMP guidelines. In a microbiology lab, the system can automate media preparation, inoculation, and incubation, while flagging any steps that could lead to contamination.
Because the planner is agentic, it can also suggest optimizations: for instance, recommending a shorter incubation time if the reaction progress is rapid, or proposing alternative reagents that are safer or more readily available.
Conclusion
Building an autonomous wet‑lab protocol planner and validator is a multidisciplinary endeavor that blends natural‑language processing, generative AI, and safety engineering. By leveraging Salesforce’s CodeGen model, we enable the system to expand parsed protocols into executable plans, which the validator then checks for safety. The modular pipeline, comprising a robust parser, an agentic planner, and a safety validator, ensures that each protocol is transformed from a human‑written document into a machine‑ready sequence that can be executed with minimal oversight.
The result is a flexible platform that can be adapted to a wide range of laboratory workflows, from high‑throughput screening to precision synthesis. As the field of autonomous chemistry matures, such systems will become indispensable tools for researchers seeking to accelerate discovery while maintaining rigorous safety standards.
Call to Action
If you’re excited about bringing AI into the laboratory, start by cloning the repository we provide and running the demo on your local machine. Experiment with different protocols, tweak the safety thresholds, and observe how the planner adapts. Share your findings on GitHub or in a forum, and contribute improvements back to the community. Together, we can build a safer, faster, and more intelligent wet‑lab ecosystem that empowers scientists to focus on the science, not the logistics.