Learning to play Minecraft with Video PreTraining

OpenAI Blog (openai.com) · 2 min read · research

What happened

OpenAI has published research on Video PreTraining (VPT), a method for training a neural network to play Minecraft from a large dataset of unlabeled human gameplay video, using only a small amount of labeled contractor data. The model acts through the native human interface of keypresses and mouse movements. With fine-tuning, the agent can learn to craft diamond tools, a task that typically takes proficient players over 20 minutes (about 24,000 actions).

VPT is a step toward agents that interact with software through the same interfaces people use, and so toward general-purpose computer-using AI. For developers building AI workflows, the research shows that unlabeled video can substantially reduce the need for costly labeled datasets when training complex behavioral models. The technique could plausibly extend to other domains where human demonstration videos are abundant, supporting more efficient development of autonomous agents for tasks like software testing, data entry, or game automation.

Key takeaways

  • OpenAI's VPT method pretrains on a massive unlabeled video dataset of human Minecraft play and needs only a small amount of labeled contractor data.
  • The model acts through native keyboard and mouse inputs, which keeps the approach general across computer interfaces.
  • After fine-tuning, the agent can craft diamond tools, a sequence that takes proficient human players over 20 minutes.
  • VPT reduces the need for expensive labeled training data by leveraging publicly available human gameplay videos.
  • The approach is a step toward creating agents that can learn arbitrary computer tasks from observing humans.
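
To make the "small labeled set, large unlabeled set" idea concrete, here is a toy sketch in Python. It rests on assumptions this digest does not spell out: as we understand the VPT paper, the labeled contractor data trains an inverse dynamics model that guesses which action happened between two frames, that model then pseudo-labels the unlabeled video, and a policy is trained on the result by behavioral cloning. The one-dimensional world, the function names, and the majority-vote "models" below are all hypothetical stand-ins for the neural networks used in the actual research.

```python
from collections import Counter, defaultdict

# Toy world (hypothetical): a "frame" is an integer position on a line,
# and the possible actions are "left", "stay", and "right".

def train_idm(labeled):
    """Fit a toy inverse dynamics model from (before, after, action) triples.

    Maps the observed change between two frames to the most common action.
    """
    votes = defaultdict(Counter)
    for before, after, action in labeled:
        votes[after - before][action] += 1
    return {delta: c.most_common(1)[0][0] for delta, c in votes.items()}

def pseudo_label(idm, video):
    """Attach an inferred action to each frame of an unlabeled video."""
    pairs = []
    for before, after in zip(video, video[1:]):
        delta = after - before
        if delta in idm:  # skip transitions the toy model never saw
            pairs.append((before, idm[delta]))
    return pairs

def behavioral_cloning(pairs):
    """Fit a toy policy: for each frame, imitate the most common action."""
    votes = defaultdict(Counter)
    for frame, action in pairs:
        votes[frame][action] += 1
    return {frame: c.most_common(1)[0][0] for frame, c in votes.items()}

# Small labeled set (stands in for the contractor data).
labeled = [(0, 1, "right"), (5, 4, "left"), (2, 2, "stay"), (3, 4, "right")]

# Larger unlabeled set (stands in for the gameplay videos): frames only.
videos = [[0, 1, 2, 3], [0, 1, 2, 3, 3], [1, 2, 3]]

idm = train_idm(labeled)
pairs = [p for video in videos for p in pseudo_label(idm, video)]
policy = behavioral_cloning(pairs)
print(policy)
```

The point of the sketch is the data flow, not the models: only four labeled transitions are needed to label every video, and the policy is learned entirely from the pseudo-labeled frames.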

Why it matters

For AI workflow builders, VPT shows how to build capable agents from unlabeled video, cutting data costs and enabling automation of complex multi-step tasks in software environments.

This is an original editorial digest by AI Workflow Pro. Full reporting at the source:

Read the original on OpenAI Blog