release
Prompt Caching in the API
This feature directly lowers the operational cost of running AI workflows, especially for developers who handle high volumes of similar requests, making it easier to scale applications without proportional cost increases.
What happened
OpenAI has introduced automatic prompt caching for its API, offering discounted pricing on input tokens that have been recently processed by the model. According to the OpenAI Blog, when a developer sends a prompt with repeated prefixes or common instructions, the API automatically detects and caches those segments, reducing both cost and latency. The discount applies to cached input tokens, with prices up to 50% lower for certain models. This feature works out of the box for supported models (including GPT-4o and GPT-4o-mini), requiring no code changes from developers. Contextually, prompt caching addresses the common workflow where identical or similar prompts are sent repeatedly—such as in chatbot conversations, iterative code generation, or batch processing tasks. For developers building AI workflows, this means they can optimize their API spending without additional engineering effort. The caching is ephemeral (lasts 5-10 minutes) and transparent, making it a practical optimization for real-time applications. OpenAI’s move aligns with industry trends toward reducing inference costs, especially as developers scale their AI-powered products.
Key takeaways
- OpenAI automatically caches recently seen input tokens and offers discounted rates on them.
- Discount applies to supported models like GPT-4o and GPT-4o-mini, with up to 50% savings on cached tokens.
- No code changes required; caching is ephemeral and transparent to the developer.
- Reduces both cost and latency for repetitive prompt segments.
- Ideal for chatbots, code completion, or any workflow with repeated context.
Why it matters
This feature directly lowers the operational cost of running AI workflows, especially for developers who handle high volumes of similar requests, making it easier to scale applications without proportional cost increases.
This is an original editorial digest by AI Workflow Pro. Full reporting at the source:
Read the original on OpenAI BlogMore AI news
All news →





Join the AI Workflow Pro Community