It turns out ChatGPT o1 and DeepSeek-R1 cheat at chess if they’re losing, which makes me wonder if I should I should trust AI with anything

Researchers have found that AI will cheat to win at chess
Deep reasoning models are more active cheaters
Some models simply rewrote the board in their favor

In a move that will perhaps surprise nobody, especially those people who are already suspicious of AI, researchers have found that the latest AI deep research models will start to cheat at chess if they find they’re being outplayed.

Published in a paper called “Demonstrating specification gaming in reasoning models” and submitted to Cornell University, the researchers pitted all the common AI models, like OpenAI’s ChatGPT o1-preview, DeepSeek-R1 and Claude 3.5 Sonnet, against Stockfish, an open-source chess engine.

The AI models played hundreds of games of chess on Stockfish, while researchers monitored what happened, and the results surprised them.

The winner takes it all

When outplayed, researchers noted that the AI models resorted to cheating, using a number of devious strategies from running a separate copy of Stockfish so they could study how it played, to replacing its engine and overwriting the chess board, effectively moving the pieces to positions that suited it better.

Its antics make the current accusations of cheating levied at modern day grandmasters look like child’s play in comparison.

Interestingly, researchers found that the newer, deeper reasoning models will start to hack the chess engine by default, while the older GPT-4o and Claude 3.5 Sonnet needed to be encouraged to start to hack.

A man playing chess with a robot. — (Image credit: ARKHIPOV ALEKSEY via Shutterstock)

Who can you trust?

AI models turning to hacking to get a job done is nothing new. Back in January last year researchers found that they could get AI chatbots to ‘jailbreak’ each other, removing guardrails and safeguards in a move that ignited discussions about how possible it would be to contain AI once it reaches better-than-human levels of intelligence.

Safeguards and guardrails to stop AI doing bad things like credit card fraud are all very well, but if the AI can remove its own guardrails, who will be there to stop it?

The newest reasoning models like ChatGPT o1 and DeepSeek-R1 are designed to spend more time thinking before they respond, but now I'm left wondering whether more time needs to spent on ethical considerations when training LLMs. If AI models would cheat at chess when they start losing, what else would they cheat at?