Transfer of adversarial robustness between perturbation types

For AI developers and solopreneurs, relying on a single defense method can create a false sense of security; this research shows that robust AI workflows must evaluate and harden models against diverse perturbation types.

OpenAI Blog · May 3, 2019 · 1 min read

openai.com

What happened

OpenAI has published research on how adversarial robustness, a model's ability to resist maliciously crafted inputs, transfers between different types of perturbations. The study systematically tests whether defenses that work against one perturbation type (e.g., L∞-norm attacks) also hold against others (e.g., L2-norm or spatial perturbations). The findings indicate that robustness does not automatically generalize: a model trained to resist one attack can remain vulnerable to another. This challenges the common practice of optimizing defenses for a single attack vector. For developers deploying models in production, the takeaway is that protection should be tested across multiple perturbation types rather than assumed from results on one.
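
To make the distinction between perturbation types concrete, here is a minimal sketch using a toy linear classifier, where the worst case under each norm has a closed form: an L∞ budget of eps lowers the margin by eps times the L1 norm of the weights, and an L2 budget lowers it by eps times their L2 norm. The weights, input, and budgets below are hypothetical values chosen for illustration; this is not the method or the models used in the OpenAI study.

```python
import numpy as np

def worst_case_margin(w, b, x, y, eps, norm):
    """Smallest signed margin y * (w . x' + b) over all x' within eps of x.

    For a linear model the minimum is the clean margin minus eps times the
    dual norm of w: L1 for an L-inf budget, L2 for an L2 budget.
    """
    clean = y * (np.dot(w, x) + b)
    if norm == "linf":
        return clean - eps * np.sum(np.abs(w))
    if norm == "l2":
        return clean - eps * np.linalg.norm(w)
    raise ValueError(f"unsupported norm: {norm}")

# Hypothetical toy model and input (assumptions for illustration only).
w = np.array([1.0, 1.0, 1.0, 1.0])
b = 0.0
x = np.array([0.5, 0.5, 0.5, 0.5])
y = 1

# Hypothetical budgets: one per perturbation type.
budgets = {"linf": 0.3, "l2": 1.2}

# A point counts as robust if even the worst-case margin stays positive.
report = {
    name: bool(worst_case_margin(w, b, x, y, eps, name) > 0)
    for name, eps in budgets.items()
}
print(report)
```

With these numbers the input keeps a positive margin under the L∞ budget but not under the L2 budget, so a check run against only the first threat model would report it as safe.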

Key takeaways

OpenAI studied how adversarial robustness transfers across different perturbation types (e.g., L∞, L2, spatial).
Robustness to one type of attack does not reliably generalize to others.
Models trained against a single perturbation remained vulnerable to different attack types.
The work highlights the need for multi-faceted adversarial testing in production AI systems.
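
One way to act on the last takeaway is to report accuracy separately for each perturbation type, plus accuracy under their union (an input counts only if it survives every type). The sketch below does this with random sampling inside each norm ball. Random sampling is a weak attack, so the numbers are only an optimistic upper bound on true robust accuracy; gradient-based attacks would be stronger. The function names, toy classifier, data, and budgets are all hypothetical and are not taken from the OpenAI study.

```python
import numpy as np

def sample_linf(rng, shape, eps):
    # Uniform perturbations inside the L-inf ball of radius eps.
    return rng.uniform(-eps, eps, size=shape)

def sample_l2(rng, shape, eps):
    # Uniform perturbations inside the L2 ball of radius eps.
    n, d = shape
    direction = rng.normal(size=shape)
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    radius = eps * rng.uniform(size=(n, 1)) ** (1.0 / d)
    return direction * radius

def robust_accuracy(predict, X, Y, threat_models, trials=200, seed=0):
    """Per-threat-model accuracy under random perturbations, plus the union."""
    rng = np.random.default_rng(seed)
    survived_all = predict(X) == Y
    report = {"clean": float(survived_all.mean())}
    for name, (sampler, eps) in threat_models.items():
        survived = predict(X) == Y
        for _ in range(trials):
            # A point survives only if every sampled perturbation is classified correctly.
            survived &= predict(X + sampler(rng, X.shape, eps)) == Y
        report[name] = float(survived.mean())
        survived_all &= survived
    report["union"] = float(survived_all.mean())
    return report

def predict(X):
    # Hypothetical toy classifier: +1 inside the unit circle, -1 outside.
    return np.where(np.linalg.norm(X, axis=1) < 1.0, 1, -1)

# Hypothetical evaluation points, labeled by the clean prediction.
X = np.array([[0.0, 0.0], [0.6, 0.6], [2.0, 0.0], [0.9, 0.0]])
Y = predict(X)

report = robust_accuracy(
    predict, X, Y,
    {"linf": (sample_linf, 0.1), "l2": (sample_l2, 0.3)},
)
print(report)
```

The union figure can never exceed the lowest per-type figure, which is why reporting a single perturbation type alone can overstate how well a model is protected.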