
Measuring AI’s capability to accelerate biological research

This framework offers a blueprint for integrating AI into experimental research pipelines. It gives developers a way to evaluate models on tasks that demand precision and domain knowledge, not just text generation.

OpenAI Blog · December 16, 2025 · 1 min read


openai.com

What happened

OpenAI has introduced a new evaluation framework for measuring how well AI can accelerate real-world biological research. Instead of relying on benchmark tests, it assesses performance in wet lab settings. In a demonstration, the company used GPT-5 to optimize a molecular cloning protocol, a common lab task, and evaluated whether the model could generate executable instructions, avoid safety pitfalls, and produce valid experimental plans. The work highlights both the potential of AI-powered lab assistants and the risk that model errors could waste resources or create safety hazards. For developers building AI workflows, it signals a shift toward domain-specific, outcome-driven evaluation, in which models must prove their usefulness in complex, procedural environments rather than only on static datasets.

Key takeaways

OpenAI proposes a real-world evaluation framework for AI in biological research, focusing on wet lab tasks like molecular cloning.
In a test case, GPT-5 was used to optimize a cloning protocol; the framework measures correctness, safety, and efficiency.
The evaluation aims to quantify both the promise (faster research) and the risks (errors, safety hazards) of using AI for experimental design.
This approach moves beyond traditional NLP benchmarks to assess AI's practical impact in scientific workflows.
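To make the three criteria (correctness, safety, efficiency) concrete for developers, here is a minimal sketch of how a team might score a single model-generated protocol. Everything in it is an illustrative assumption: the hypothetical `ProtocolGrade` fields, the default weights, and the choice to treat any safety violation as an automatic fail are not OpenAI's published rubric. The sketch also assumes that human reviewers or lab runs supply the per-step judgments and time estimates.

```python
from dataclasses import dataclass


# Hypothetical grading record for one model-generated protocol.
# Field names and scales are illustrative assumptions, not OpenAI's rubric.
@dataclass
class ProtocolGrade:
    steps_total: int          # number of steps in the proposed protocol
    steps_valid: int          # steps a reviewer judged executable and correct
    safety_violations: int    # steps a reviewer flagged as unsafe
    baseline_hours: float     # duration of the reference (human) protocol
    proposed_hours: float     # estimated duration of the model's protocol


def score(grade: ProtocolGrade, w_correct: float = 0.7, w_eff: float = 0.3) -> float:
    """Return a score in [0, 1]; any safety violation scores 0 (hypothetical rule)."""
    if grade.steps_total <= 0 or not 0 <= grade.steps_valid <= grade.steps_total:
        raise ValueError("steps_valid must be between 0 and steps_total (> 0)")
    if grade.baseline_hours <= 0 or grade.proposed_hours <= 0:
        raise ValueError("durations must be positive")

    # Safety is a hard gate, not a weighted term: no speedup offsets an unsafe step.
    if grade.safety_violations > 0:
        return 0.0

    # Correctness: fraction of steps judged valid.
    correctness = grade.steps_valid / grade.steps_total
    # Efficiency: speedup over baseline, capped at 1.0 so any protocol
    # at least as fast as the baseline earns full efficiency credit.
    efficiency = min(1.0, grade.baseline_hours / grade.proposed_hours)
    return w_correct * correctness + w_eff * efficiency


if __name__ == "__main__":
    # Illustrative inputs only; these are not results from OpenAI's evaluation.
    good = ProtocolGrade(steps_total=10, steps_valid=9, safety_violations=0,
                         baseline_hours=6.0, proposed_hours=4.0)
    unsafe = ProtocolGrade(steps_total=10, steps_valid=10, safety_violations=1,
                           baseline_hours=6.0, proposed_hours=2.0)
    print(round(score(good), 2))    # 0.93
    print(score(unsafe))            # 0.0
```

Treating safety as a gate rather than as one more weighted term means a faster or more complete protocol can never compensate for an unsafe step. This matches the emphasis on avoiding safety pitfalls described above. A real harness would also need to aggregate scores across many protocols and runs, and would likely tune the weights to the lab's priorities.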