Using DSPy to evaluate and improve Datasette Agent's SQL system prompts

What happened

Simon Willison used the DSPy framework to evaluate and improve the system prompts of Datasette Agent, a tool that translates user questions into read-only SQL queries. Inspired by an AIE keynote on DSPy, he started an asynchronous research task in Claude Code, working with the latest Datasette alpha and the DSPy library. In tests against GPT-4.1 mini and nano, the evaluation surfaced several prompt flaws. For instance, the schema listing in the prompt showed only table names, while a separate instruction told the model to avoid calling describe_table if the information already exists. Together these pushed the model to guess column names (e.g., page_count, o.order_id), which led to errors and retry loops. The suggested fix was either to include column names in the prompt's schema listing or to soften that advice.

The experiment shows how DSPy can make prompt engineering for LLM-based agents more data-driven and less a matter of trial and error. For developers building AI workflows, it suggests a repeatable way to find and fix system prompt problems with fewer rounds of manual tweaking.
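To make the first suggested fix concrete, here is a minimal sketch of a schema listing that includes column names, so a model reading the prompt has no reason to guess them. It is an illustration only, not Datasette Agent's actual code: the `schema_listing` helper and the demo `books` and `orders` tables are hypothetical, and it uses only Python's standard `sqlite3` module.

```python
import sqlite3


def schema_listing(conn: sqlite3.Connection) -> str:
    """Return one line per table: the table name followed by its column names.

    Hypothetical helper for illustration; not taken from Datasette Agent.
    """
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    lines = []
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        quoted = table.replace('"', '""')
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{quoted}")')]
        lines.append(f"{table}({', '.join(columns)})")
    return "\n".join(lines)


# Hypothetical demo schema, chosen so that a guessed name like page_count would be wrong.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, pages INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, book_id INTEGER, quantity INTEGER);
    """
)
listing = schema_listing(conn)
print(listing)
```

With the column names in the listing, the model can see that the column is `pages` rather than `page_count` without an extra describe_table call.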

Key takeaways

Simon Willison applied DSPy to evaluate Datasette Agent's SQL system prompts.

DSPy identified that omitting column names from the schema listing led the LLM to guess them, causing errors and retry loops.

The framework recommended either including column names or relaxing the advice against calling describe_table.

The tests used GPT-4.1 mini and nano, with the research task run through Claude Code.

The approach provides a systematic alternative to manual prompt tuning for AI agents.
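One way to picture the systematic part is a scoring function that runs each generated query and checks its result, so a guessed column name shows up as a failed example rather than an anecdote. The sketch below is an assumption about how such a check could look, not the metric used in the experiment. It follows the `(example, pred, trace=None)` argument order commonly used for DSPy metric functions but does not import DSPy, and the field names `setup_sql`, `reference_sql`, and `sql` are hypothetical.

```python
import sqlite3
from types import SimpleNamespace


def sql_result_match(example, pred, trace=None) -> bool:
    """Return True when the predicted SQL yields the same rows as the reference SQL.

    Hypothetical metric for illustration; field names are assumptions.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(example.setup_sql)
        expected = conn.execute(example.reference_sql).fetchall()
        try:
            actual = conn.execute(pred.sql).fetchall()
        except sqlite3.Error:
            # A guessed table or column name fails here and scores as a miss.
            return False
        # Compare as unordered row sets, since neither query specifies an order.
        return sorted(expected, key=repr) == sorted(actual, key=repr)
    finally:
        conn.close()


# Hypothetical example: the real column is "pages", not "page_count".
example = SimpleNamespace(
    setup_sql=(
        "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, pages INTEGER);"
        "INSERT INTO books VALUES (1, 'A', 100), (2, 'B', 250);"
    ),
    reference_sql="SELECT title FROM books WHERE pages > 200",
)
good = SimpleNamespace(sql="SELECT title FROM books WHERE pages > 200")
guessed = SimpleNamespace(sql="SELECT title FROM books WHERE page_count > 200")

print(sql_result_match(example, good))
print(sql_result_match(example, guessed))
```

Averaging a score like this over a set of questions gives a number that can be compared before and after a prompt change.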
