Reasoning Models (CoT)

Artificial Intelligence • 15 min read • January 24, 2025

Chain of Thought (CoT) & Reasoning Models

The Rise of Reasoning Models

In late 2024, OpenAI released o1 (code-named "Strawberry"), marking a paradigm shift in how we think about LLM capabilities. Unlike previous models, which were optimized for "fast thinking" (direct next-token prediction), these new Reasoning Models are designed for "slow thinking": deliberating before they answer.

What is Chain of Thought (CoT)?

Chain of Thought (CoT) is a prompting technique—and now a native model capability—that encourages the AI to "think out loud" before answering. Instead of jumping directly to a conclusion, the model generates a series of intermediate reasoning steps.

Example:

  • Standard Prompt: "If I have 5 apples, eat 2, and buy 3 more, how many do I have?" -> "6."
  • CoT Prompt: "Let's think step by step."
    • Reasoning: "Start with 5 apples. Eat 2, so 5 - 2 = 3. Buy 3 more, so 3 + 3 = 6."
    • Final Answer: "6."

Prompt engineers have been eliciting CoT manually for years; Reasoning Models bake this process directly into training and inference.
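The apples example above can be caricatured in plain Python. The `solve_with_trace` helper below is purely illustrative (the name and structure are mine, not any model's API): it returns the intermediate steps alongside the answer, the way a CoT response does, instead of emitting only the final number.

```python
def solve_with_trace(start: int, eaten: int, bought: int):
    """Toy illustration of intermediate reasoning steps: build a
    step-by-step trace alongside the final answer, rather than
    returning only the answer itself."""
    steps = []
    after_eating = start - eaten
    steps.append(
        f"Start with {start} apples. Eat {eaten}, "
        f"so {start} - {eaten} = {after_eating}."
    )
    total = after_eating + bought
    steps.append(f"Buy {bought} more, so {after_eating} + {bought} = {total}.")
    return steps, total
```

Calling `solve_with_trace(5, 2, 3)` reproduces the worked example: two reasoning sentences, then the final answer 6.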

How Reasoning Models Work

Models like OpenAI o1 or Google's Gemini 2.0 Flash Thinking use Test-Time Compute. This means they spend more computational resources during inference (before emitting the first visible token of the answer) to explore different solution paths, verify their own logic, and self-correct.

  1. Hidden Chain of Thought: The model generates a long internal monologue that the user typically doesn't see.
  2. Self-Correction: If the model hits a dead end in its logic, it can backtrack and try a different approach.
  3. Reinforcement Learning: These models are heavily trained using Reinforcement Learning (RL) to optimize for correct reasoning paths, especially in math, coding, and logic puzzles.
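Steps 1–3 can be sketched as a toy search loop. The `solve_with_backtracking` function below is a deliberately crude caricature (the function, the candidate "paths", and the verifier are all made up for illustration; real models learn verification via RL rather than comparing against an answer key): it generates a trace, checks each candidate path, and backtracks when one fails.

```python
def solve_with_backtracking(start: int, eaten: int, bought: int):
    """Toy sketch of hidden CoT + self-correction: try candidate
    'reasoning paths', verify each, and backtrack on failure."""
    target = start - eaten + bought  # stand-in for a learned verifier

    # Candidate paths: two with sign/omission bugs, one correct.
    paths = [
        ("add everything", lambda: start + eaten + bought),
        ("ignore the purchase", lambda: start - eaten),
        ("subtract, then add", lambda: start - eaten + bought),
    ]

    trace = []  # the 'internal monologue' the user never sees
    for name, path in paths:
        candidate = path()
        if candidate == target:
            trace.append(f"{name}: {candidate} -> verified")
            return candidate, trace
        trace.append(f"{name}: {candidate} -> fails check, backtrack")
    return None, trace
```

Here `solve_with_backtracking(5, 2, 3)` discards two flawed paths before settling on 6; the interesting part is the loop shape (generate, verify, backtrack), not the arithmetic.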

Why It Matters

  • Solving Complex Problems: Reasoning models excel at tasks that stump traditional LLMs, such as advanced mathematics, complex coding architecture, and scientific research.
  • Reduced Hallucinations: By verifying steps logically, these models are less prone to "vibes-based" errors.
  • Agentic Capabilities: Stronger reasoning is a prerequisite for reliable autonomous agents that need to plan multi-step workflows.

When to Use Reasoning Models?

  • Use Standard Models (GPT-4o, Claude 3.5 Sonnet) for: Creative writing, simple summaries, fast chat, and general knowledge.
  • Use Reasoning Models (o1, Gemini Thinking) for: Complex refactoring, mathematical proofs, strategy planning, and "deep work" that requires high accuracy.
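This routing decision can be sketched as a naive keyword heuristic. The model names and keyword list below are illustrative placeholders, not a real API; production routers typically use a classifier rather than string matching.

```python
def pick_model(task: str) -> str:
    """Naive router: send 'deep work' to a slow reasoning model,
    everything else to a fast standard model. Illustrative only."""
    deep_work_keywords = ("proof", "refactor", "architecture", "strategy")
    if any(word in task.lower() for word in deep_work_keywords):
        return "reasoning-model"  # slow, deliberative, expensive
    return "standard-model"       # fast, intuitive, cheap
```

For example, "Refactor this payment module" routes to the reasoning model, while "Write a haiku about autumn" stays on the standard model.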

The Future: System 1 vs. System 2

We are moving towards a future where AI systems will dynamically switch between "System 1" (fast, intuitive, cheap) and "System 2" (slow, deliberative, expensive) thinking modes depending on the difficulty of the task at hand.

Related Tags:

#reasoning #chain-of-thought #o1