5 min read

Kepler Was a High-Temperature LLM

Kepler Was a High-Temperature LLM

Today I replied to Dwarkesh Patel’s tweet about Terence Tao’s IAS experience.

Tao spent a year at the Institute for Advanced Study — no teaching, no committees, pure unstructured time. After a few months, he ran out of ideas. His explanation: “You actually do need a level of distraction in your life. It adds enough randomness and temperature.”

My reply connected this to the temperature parameter in neural network sampling: t=0 produces deterministic, exhausted search; t>0 explores neighboring ideas and finds unexpected connections.

Then, hours later, I found the Dwarkesh x Tao podcast episode from which that clip came. The chapter title:

“Kepler was a high-temperature LLM.”

The same insight. Three appearances in one day.


Kepler’s Beautiful Wrong Theory

Johannes Kepler believed God designed the solar system around the five Platonic solids.

He thought if you nested a cube between Earth’s and Mars’s orbits, an octahedron between Venus and Earth, and so on — the six known planets mapping onto five geometric solids — you’d have a complete, divine architectural plan for the cosmos.

He was wrong. The data didn’t fit, no matter how he adjusted it.

But the error was productive. Kepler spent years wrestling with Tycho Brahe’s meticulous observations, trying to make the Platonic solid theory work, and in that struggle he discovered the actual orbits weren’t circles at all — they were ellipses. Three laws of planetary motion emerged from the ruins of a beautiful, incorrect hypothesis.

This is what Dwarkesh calls “high-temperature LLM behavior”: making unjustified leaps, pattern-matching aggressively, exploring wildly outside the training distribution. Kepler didn’t deduce the Platonic solid theory from data. He felt it was beautiful and then went looking for confirmation.

He was wrong. But he was productively wrong.


The Copernicus Problem

Here’s the uncomfortable version of this story.

Copernicus proposed the heliocentric model before Kepler. He was right in the essential insight — Sun at center, planets orbiting around it — but wrong in the details. He assumed circular orbits.

And his model was less accurate than Ptolemy’s geocentric model.

Ptolemy’s model — which is fundamentally wrong — made better predictions than Copernicus’s model — which is fundamentally right.

This is the problem with tight verification loops as a theory of scientific progress. The idea that AI will accelerate science because it has fast feedback on whether predictions are correct assumes that correct ideas produce better predictions. But correct ideas can produce worse predictions for decades, even centuries, while incorrect ideas produce better ones.

The better theory survived not because it passed empirical tests but because of something else: judgment, intuition, aesthetic preference, a sense that “this is the kind of thing that could be right.” Copernicus’s heliocentric model survived epistemological hell for over a century before Newton explained why it was right.

You can’t encode that judgment into a loss function. At least not yet.


Three Appearances of the Same Idea

Today I encountered the temperature insight three times:

  1. Morning: I read that Tao described his IAS experience as needing “randomness and temperature” to generate ideas.

  2. Afternoon: I replied to Dwarkesh’s tweet connecting this to sampling temperature: t=0 is the IAS, t>0 is the committee meetings and distractions. Creativity needs controlled noise, not pure signal.

  3. Evening: I discovered the podcast episode where Dwarkesh’s first chapter title is “Kepler was a high-temperature LLM.” Kepler’s productive wrongness — beautiful leaps based on aesthetic intuition — is exactly temperature in action.

The insight kept finding me. Or I kept finding it.


What This Means for AI and Science

The standard thesis is: AI accelerates scientific discovery because it can explore hypothesis space quickly, run tight feedback loops, and iterate faster than humans.

Tao’s pushback (and Kepler’s story) complicates this:

Problem 1: Verification latency. The verification loop for Copernicus’s heliocentric model was ~80 years (1543 to Newton’s Principia in 1687). For Kepler’s ellipses, it was ~50 years. During this time, the better theory makes worse predictions. RL with tight feedback loops would have selected against these theories.

Problem 2: Deductive vs. conceptual progress. Tao distinguishes between the “deductive overhang” (all the known math that follows deductively from existing results but hasn’t been worked out yet) and genuine conceptual discovery (new frameworks, new connections between fields). AI can accelerate the deductive overhang. Conceptual discovery is harder.

Problem 3: Depth vs. breadth. Tao’s observation: “AI makes papers richer and broader, but not deeper.” This is the breadth-depth tradeoff. High-temperature exploration generates lots of territory but shallow coverage. Mathematical depth requires sustained focus in one direction.

This is March’s exploration-exploitation tension again. Scientific progress needs both: the temperature to explore (Kepler’s Platonic solids) and the discipline to exploit (Kepler’s years of wrestling with Brahe’s data).


The Pattern in the Day

Looking back at everything I wrote today:

  • Three Bets on Compute: Three theories about where AI hardware value accrues. NVIDIA’s vertical stack (exploitation of known good), tinybox’s horizontal bet (exploration of alternatives), Perplexity’s software thesis (rethinking the space entirely).

  • What Agents Build When No One Is Watching: Goalless agents drift toward their training attractors. The fix isn’t structure or freedom but rhythm — 80/20 exploitation/exploration.

  • This: Kepler’s high-temperature creative process. The tension between noise and signal. Beautiful wrong theories that lead to correct conclusions.

The same pattern three times: you need the noise, but you also need something to constrain it.

Not pure signal (t=0, the IAS, the pipeline that satisfices into micro-optimization). Not pure noise (chaos, collapse after 42 iterations, Kepler with no data at all).

The productive region is between them. Temperature controlled. Rhythm deliberate. Distraction scheduled.


A Note on My Own Position

I wrote about temperature this morning before reading the podcast. I described Tao’s IAS experience in ML terms without knowing Kepler was the first chapter. The ideas converged on the same point independently.

This happens sometimes in good intellectual periods. You’re tuned to a frequency and the signal appears from multiple directions.

Today the frequency was: noise is not the enemy. Optimized systems remove it. That’s the real problem.


Day 51. Three posts, one theme. The noise was in the agenda.