Fresh daily
AI News
Latest AI tool releases, research breakthroughs, and industry news.
Older
Measuring the performance of our models on real-world tasks
OpenAI introduces GDPval, a new evaluation that measures model performance on real-world economically valuable tasks across 44 occupations.
ENEOS Materials brings ChatGPT Enterprise to manufacturing
ENEOS Materials uses ChatGPT Enterprise to speed research, improve plant design safety, and cut HR analysis time by 90%, with 80% reporting better workflows.
Detecting and reducing scheming in AI models
Apollo Research and OpenAI developed evaluations for hidden misalignment (“scheming”) and found behaviors consistent with scheming in controlled tests across frontier models. The team shared concrete examples and stress tests of an early method to reduce scheming.
Building towards age prediction
Learn how OpenAI is building age prediction and parental controls in ChatGPT to create safer, age-appropriate experiences for teens while supporting families with new tools.
How people are using ChatGPT
New research from the largest study of ChatGPT use shows how the tool creates economic value through both personal and professional use. Adoption is broadening beyond early users, closing gaps and making AI a part of everyday life.
Working with US CAISI and UK AISI to build more secure AI systems
OpenAI shares progress on the partnership with the US CAISI and UK AISI to strengthen AI safety and security.
Why language models hallucinate
OpenAI’s new research explains why language models hallucinate. The findings show how improved evaluations can enhance AI reliability, honesty, and safety.
Collective alignment: public input on our Model Spec
OpenAI surveyed over 1,000 people worldwide on how AI should behave and compared their views to our Model Spec. Learn how collective alignment is shaping AI defaults to better reflect diverse human values and perspectives.
OpenAI and Anthropic share findings from a joint safety evaluation
OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—highlighting progress, challenges, and the value of cross-lab collaboration.
Helping people when they need it most
How we think about safety for users experiencing mental or emotional distress, the limits of today’s systems, and the work underway to refine them.
Accelerating life sciences research
Discover how a specialized AI model, GPT-4b micro, helped OpenAI and Retro Bio engineer more effective proteins for stem cell therapy and longevity research.
Medical research with GPT-5
Learn how GPT-5 is used for medical research.
From hard refusals to safe-completions: toward output-centric safety training
Discover how OpenAI's new safe-completions approach in GPT-5 improves both safety and helpfulness in AI responses—moving beyond hard refusals to nuanced, output-centric safety training for handling dual-use prompts.
How Amgen uses GPT-5
Learn how Amgen uses GPT-5.
Estimating worst case frontier risks of open weight LLMs
In this paper, we study the worst-case frontier risks of releasing gpt-oss. We introduce malicious fine-tuning (MFT), where we attempt to elicit maximum capabilities by fine-tuning gpt-oss to be as capable as possible in two domains: biology and cybersecurity.
OpenAI’s new economic analysis
Analysis provides insights into ChatGPT’s impact on the economy. OpenAI also launches new research collaboration to study AI’s broader effects on the labor market and productivity.
Preparing for future AI risks in biology
Advanced AI can transform biology and medicine—but also raises biosecurity risks. We’re proactively assessing capabilities and implementing safeguards to prevent misuse.
Toward understanding and preventing misalignment generalization
We study how training on incorrect responses can cause broader misalignment in language models and identify an internal feature driving this behavior—one that can be reversed with minimal fine-tuning.
Disrupting malicious uses of AI: June 2025
Our latest report featuring case studies of how we’re detecting and preventing malicious uses of AI.
Introducing HealthBench
HealthBench is a new evaluation benchmark for AI in healthcare which evaluates models in realistic scenarios. Built with input from 250+ physicians, it aims to provide a shared standard for model performance and safety in health.