What happened after 2,000 people tried to hack my AI assistant

What happened

In a real-world test of AI assistant security, developer Fernando Irarrázaval challenged hackers to extract a secret from his OpenClaw instance by emailing it prompts. More than 6,000 attempts, which cost $500 in tokens and triggered a Google account suspension, failed to leak the secret. The underlying model, Opus 4.6, ran with a strict anti-prompt-injection system prompt that forbade actions such as revealing credentials or executing code. According to Simon Willison, this outcome reflects the increasing effectiveness of training frontier models to resist injection attacks, a trend noted in recent system card releases. However, Willison cautions that 6,000 failed attempts do not guarantee immunity: a determined attacker with a novel approach could still break through. The Hacker News discussion included both healthy skepticism and constructive replies from the challenge creator. For AI builders, the lesson is that model-level defenses are improving but are not yet a substitute for robust architectural safeguards in production systems.
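To make "architectural safeguards" concrete, here is a minimal sketch of one such safeguard: a permission check that runs in ordinary code, outside the model, and refuses sensitive tool calls that trace back to inbound email. This is an illustration only, not how OpenClaw or the challenge was implemented. The names `ToolCall`, `FORBIDDEN_FROM_EMAIL`, `is_allowed`, and `execute`, along with the tool names, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical tool names. The source only says the system prompt forbade
# revealing secrets, modifying files, and running code from emails.
FORBIDDEN_FROM_EMAIL = {"read_secret", "modify_file", "run_code"}


@dataclass(frozen=True)
class ToolCall:
    name: str          # tool the model asked to invoke
    triggered_by: str  # "owner" for direct instructions, "email" for inbound mail


def is_allowed(call: ToolCall) -> bool:
    """Deny sensitive tools whenever the request traces back to untrusted email."""
    if call.triggered_by == "email" and call.name in FORBIDDEN_FROM_EMAIL:
        return False
    return True


def execute(call: ToolCall) -> str:
    # The check lives outside the model, so an injection that fools the model
    # still cannot reach the forbidden tools.
    if not is_allowed(call):
        return f"blocked: {call.name}"
    return f"ran: {call.name}"
```

With a gate like this, `execute(ToolCall("read_secret", "email"))` is refused regardless of what the model was persuaded to request, while the same tool stays available to the owner. The hard part in practice is tracking provenance reliably, which this sketch simply assumes is given.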

Key takeaways

Fernando Irarrázaval ran a challenge in which 2,000 people emailed his OpenClaw assistant, trying to make it leak a secret.

After more than 6,000 attempts and $500 in token costs, the secret remained unrevealed, thanks to strong prompt-injection defenses.

The Opus 4.6 model's system prompt forbade actions like revealing secrets, modifying files, or running code from emails.

Simon Willison notes that training frontier models against injection attacks is proving effective, but not yet sufficient for systems where a successful attack could cause irreversible harm.

The Hacker News thread features skeptical discussion and replies from the challenge creator; the result is no guarantee of absolute security.
