research

gpt-oss-safeguard technical report

Builders can use these open-weight models to add customizable, policy-driven content labeling to their AI applications, improving safety and compliance without depending on closed APIs.

OpenAI Blog·October 28, 2025·1 min readresearch

researchgpt-oss-safeguard technical report

openai.com

What happened

OpenAI published a technical report introducing two open-weight reasoning models, gpt-oss-safeguard-20b and gpt-oss-safeguard-120b. According to the report, these models are post-trained from the earlier gpt-oss models to label content based on a user-provided policy. They work by reasoning from the policy text to classify whether content complies. The report includes baseline safety evaluations comparing these safeguard models to the underlying gpt-oss models. For developers and solopreneurs building AI workflows, this offers a customizable, open-weight component for automated content moderation and policy enforcement. Instead of relying on proprietary APIs, teams can fine-tune or deploy these models to align outputs with specific guidelines. However, the report likely also covers limitations and risks, which builders should study before integration. This release underscores the trend toward more transparent and controllable safety mechanisms in open-weight models.

Key takeaways

OpenAI released two open-weight reasoning models (20B and 120B) called gpt-oss-safeguard for policy-driven content labeling.
The models are post-trained from gpt-oss to reason from a provided policy and label content accordingly.
The technical report presents baseline safety evaluations comparing safeguard models to the base gpt-oss models.
These open-weight models can be integrated into AI workflows for automated content moderation and policy compliance.

Why it matters

Builders can use these open-weight models to add customizable, policy-driven content labeling to their AI applications, improving safety and compliance without depending on closed APIs.

This is an original editorial digest by AI Workflow Pro. Full reporting at the source:

Read the original on OpenAI Blog

Share this story

Share on X