---
title: "Why Are Large Language Models so Terrible at Video Games?"
slug: "why-are-large-language-models-so-terrible-at-video-games"
date: 2026-03-29
category: tech-pub
tags: []
language: en
sources_count: 1
featured: false
publisher: AInauten News
url: https://news.ainauten.com/en/story/why-are-large-language-models-so-terrible-at-video-games
---

# Why Are Large Language Models so Terrible at Video Games?

**Published**: 2026-03-29 | **Category**: tech-pub | **Sources**: 1

---

## TL;DR

- LLMs have failed to improve at video games despite rapid progress elsewhere – a rare exception: Gemini 2.5 Pro beat Pokémon Blue in May 2025.

---

## Summary

- LLMs have failed to improve at video games despite rapid progress elsewhere – a rare exception: Gemini 2.5 Pro beat Pokémon Blue in May 2025.
- That win came with caveats: far slower than a human player, bizarre repetitive mistakes, and reliance on custom scaffolding software.
- Julian Togelius, director of NYU's Game Innovation Lab and co-founder of AI testing firm Modl.ai, examined these limitations in a recent paper.
- Games demand real-time decisions, spatial reasoning, and long-horizon planning – precisely the areas where current language models fall short.

---

## Why it matters

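Games demand real-time decisions, spatial reasoning, and long-horizon planning, which are precisely the areas where current language models fall short. The lack of progress in games suggests that rapid gains elsewhere have not addressed these weaknesses.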

---


## Nauti's Take

The Pokémon Blue win sounds impressive until you read that the model was slower than a first-grader with a Game Boy and kept repeating the same mistakes. That is not a breakthrough – it is a well-documented failure with an asterisk. Togelius is right: LLMs are text machines optimized for token probabilities, not game objectives. Spatial memory, long-horizon state tracking, reactive decision-making – these are not problems you solve by adding more parameters. Anyone expecting GPT-5 to simply get better at games is missing the point entirely.

---


## FAQ

**Q:** What is "Why Are Large Language Models so Terrible at Video Games?" about?

**A:** LLMs have failed to improve at video games despite rapid progress elsewhere – a rare exception: Gemini 2.5 Pro beat Pokémon Blue in May 2025.

**Q:** Why does it matter?

**A:** Games test real-time decision-making, spatial reasoning, and long-horizon planning, and current language models fall short on all three. Even the one notable win, Gemini 2.5 Pro beating Pokémon Blue, relied on custom scaffolding software.

**Q:** What are the key takeaways?

**A:** LLMs have failed to improve at video games despite rapid progress elsewhere – a rare exception: Gemini 2.5 Pro beat Pokémon Blue in May 2025. That win came with caveats: far slower than a human player, bizarre repetitive mistakes, and reliance on custom scaffolding software. Julian Togelius, director of NYU's Game Innovation Lab and co-founder of AI testing firm Modl.ai, examined these limitations in a recent paper.

---

## Sources

- [Why Are Large Language Models so Terrible at Video Games?](https://spectrum.ieee.org/ai-video-games-llms-togelius) - IEEE Spectrum AI

---

## About This Article

This article is a synthesis of 1 source, curated and summarized by AInauten News. We aggregate AI news from trusted sources and provide bilingual (German/English) coverage.

**Publisher**: [AInauten](https://www.ainauten.com) | **Site**: [news.ainauten.com](https://news.ainauten.com)

---

*Last Updated: 2026-03-30*