Fresh daily
AI News
Latest AI tool releases, research breakthroughs, and industry news.
Older
OpenAI o1 System Card
This report outlines the safety work carried out prior to releasing OpenAI o1 and o1-mini, including external red teaming and frontier risk evaluations according to our Preparedness Framework.
Morgan Stanley is shaping the future of financial services
Morgan Stanley uses AI evals to shape the future of financial services
Advancing red teaming with people and AI
Data-driven beauty and creativity with ChatGPT
Data-driven beauty: How The Estée Lauder Companies unlocks insights with ChatGPT
Introducing SimpleQA
SimpleQA is a factuality benchmark that measures the ability of language models to answer short, fact-seeking questions.
Simplifying, stabilizing, and scaling continuous-time consistency models
We’ve simplified, stabilized, and scaled continuous-time consistency models, achieving comparable sample quality to leading diffusion models, while using only two sampling steps.
OpenAI and the Lenfest Institute AI Collaborative and Fellowship program
Evaluating fairness in ChatGPT
We've analyzed how ChatGPT responds to users based on their name, using AI research assistants to protect privacy.
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering.
Creating agent and human collaboration with GPT-4o
Altera uses GPT-4o to build a new area of human collaboration
Using GPT-4 to improve teaching and learning in Brazil
Improving teaching and learning in Brazil
Learning to reason with LLMs
Decoding genetics with OpenAI o1
Geneticist Catherine Brownstein demonstrates how OpenAI o1 can speed up the process of diagnosing rare medical challenges.
Answering quantum physics questions with OpenAI o1
Quantum physicist Mario Krenn uses OpenAI o1 to help answer life's biggest questions.
Disrupting a covert Iranian influence operation
Introducing SWE-bench Verified
We’re releasing a human-validated subset of SWE-bench that more reliably evaluates AI models’ ability to solve real-world software issues.
Improving Model Safety Behavior with Rule-Based Rewards
We've developed and applied a new method leveraging Rule-Based Rewards (RBRs) that aligns models to behave safely without extensive human data collection.
Prover-Verifier Games improve legibility of language model outputs
Discover how prover-verifier games improve the legibility of language model outputs, making AI solutions clearer, easier to verify, and more trustworthy for both humans and machines.
OpenAI and Los Alamos National Laboratory announce research partnership
OpenAI and Los Alamos National Laboratory are working to develop safety evaluations to assess and measure biological capabilities and risks associated with frontier models.
Finding GPT-4’s mistakes with GPT-4
CriticGPT, a model based on GPT-4, writes critiques of ChatGPT responses to help human trainers spot mistakes during RLHF.