The Evolution of AI Agents: From 1.0 to 3.0, and Why I Don't Believe in Omnipotent AGI


By Ian Chou · AI Agents · Context Engineering · LLM · Edge AI

Over the past two years, the term “AI Agent” has shifted from niche terminology to a buzzword on the lips of every AI practitioner. But what stages has the “Agent” concept actually gone through? Many people only experienced one or two of these phases before jumping straight to declaring, “AGI is here.”

In this article, I want to clarify these developmental stages and share my perspective on the future—Agents 3.0—and why the true endgame will not be a single, omnipotent AGI.

Agents 1.0: The Hype and Disillusionment of Shallow Loops (Spring 2023)

The period around April 2023, when AutoGPT skyrocketed to fame overnight, typified the Agents 1.0 era.

The core architecture was essentially a single while loop (sketched in code right after the list below):

  • Feed a high-level goal to the LLM (e.g., “Help me start a company with $10M annual revenue”).
  • The LLM breaks down the task → Calls tools → Feeds results back into the prompt → Repeats.
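To make the point concrete, here is a minimal sketch of that architecture. `llm()` and `run_tool()` are hypothetical placeholders I made up (not any particular SDK); the thing to notice is that the prompt itself is the agent's entire state:

```python
def llm(prompt: str) -> dict:
    """Placeholder for a real model call (OpenAI, Anthropic, etc.)."""
    return {"done": True}  # stubbed so the sketch runs as-is

def run_tool(name: str, args: dict) -> str:
    """Placeholder tool dispatcher (web search, file write, shell, ...)."""
    return f"result of {name}"

goal = "Help me start a company with $10M annual revenue"
history = [f"Goal: {goal}"]            # the entire "memory" lives in the prompt

while True:
    step = llm("\n".join(history))     # ask the model for its next thought/action
    if step.get("done"):
        break
    result = run_tool(step["tool"], step["args"])
    history.append(f"Did {step['tool']}, got: {result}")
    # Nothing here prevents infinite repetition or goal drift; once `history`
    # overflows the context window, the run simply collapses.
```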

It sounded cool, but in practice, 90% of the runtime was spent:

  • Repeating the same actions (infinite loops).
  • Generating a pile of useless intermediate files.
  • Ultimately stalling until a human had to step in.

During that time, many (myself included) excitedly thought “AGI is just one commit away,” only to discover we were simply lengthening the chain of prompt engineering. The Context Engineering at this stage was primitive—basically “Prompt Stuffing” (cramming everything in)—and it would almost inevitably collapse on tasks requiring more than 15 steps.

Agents 1.5: Stronger Capabilities, Same Essence (Late 2023 to Present)

As models like GPT-4, Claude 3.5, and Gemini 1.5 arrived with longer context windows and stronger reasoning capabilities, many believed we had reached “True Agents.”

Characteristics of this stage:

  • You give a clear instruction, and the LLM generates longer, more structured content (long-form text, code, charts) in one go.
  • Tool calling became reliable; hallucinations decreased.
  • However, the interaction model remained unchanged: It was still fundamentally “Q&A” or “fire-and-forget” for a single large task.

Essentially, this is just an upgraded version of Agents 1.0—the models are smarter, and prompt engineering is more efficient, but the architecture is still that shallow loop. It just survives 50 steps now instead of 15. Tools like Cursor and Claude Projects currently operate mostly in this stage.

Agents 2.0: Deep Agents and Flow Engineering (Starting 2025)

Here, I must reference the core insights from Phil Schmid’s “Agents 2.0: From Shallow Loops to Deep Agents” and add my perspective on the technical implementation.

Agents 1.0 relied on the assumption that “LLM Context Window = Total State,” which leads directly to three failure modes: Context Overflow, Loss of Goal, and No Recovery.

Agents 2.0 (Deep Agents) represent an architectural leap, marking a shift in development from Prompt Engineering to Flow Engineering. The tech stack is moving from simple API calls to frameworks capable of state management, like LangGraph or LlamaIndex Workflows.
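Before going through the pillars, here is what that shift looks like in miniature. This is a framework-free sketch (the node names, state fields, and routing are illustrative, not LangGraph's or LlamaIndex's actual APIs): state lives in an explicit object, and control flow is a small graph of named steps rather than one implicit loop.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    plan: list[str] = field(default_factory=list)   # explicit, editable plan
    notes: list[str] = field(default_factory=list)  # results kept outside the prompt
    done: bool = False

def plan_step(state: AgentState) -> str:
    if not state.plan:
        state.plan = [f"research: {state.goal}", f"write up: {state.goal}"]
    return "execute"

def execute_step(state: AgentState) -> str:
    task = state.plan.pop(0)
    state.notes.append(f"finished {task}")          # real version: a sub-agent or tool call
    return "execute" if state.plan else "finish"

def finish_step(state: AgentState) -> str:
    state.done = True
    return "end"

GRAPH = {"plan": plan_step, "execute": execute_step, "finish": finish_step}

def run(state: AgentState, entry: str = "plan") -> AgentState:
    node = entry
    while node != "end":
        node = GRAPH[node](state)                   # each node names its successor explicitly
    return state

print(run(AgentState(goal="summarize Agents 2.0")).notes)
```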

The four core pillars are:

  1. Explicit Planning: No more implicit chain-of-thought. Instead, the agent maintains an editable dynamic plan (like a Markdown To-Do list), reviewing and updating it at every step.
  2. Hierarchical Delegation: The emergence of an Orchestrator + Sub-Agent structure. Specialized Researcher, Coder, and Writer sub-agents possess clean contexts and return only refined results (see the sketch after this list).
  3. Persistent Memory: Intermediate results are written to external storage (File Systems, Vector DBs). Here, Context Engineering evolves into designing a “Memory Schema”—deciding what should be remembered and what should be forgotten.
  4. Extreme Context Engineering: The system prompt is no longer a single sentence but contains detailed state machine definitions: When should it stop to plan? When should it spawn a sub-agent?
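As a toy illustration of pillars 2 and 3, the sketch below splits an orchestrator from two sub-agents and persists their distilled outputs to disk. Every name here is a placeholder invented for the example; a production system would use real model calls and a proper file or vector store:

```python
import json
import pathlib

MEMORY = pathlib.Path("agent_memory.json")  # stand-in for a real file system or vector DB

def remember(key: str, value: str) -> None:
    """Persist an intermediate result outside the context window."""
    data = json.loads(MEMORY.read_text()) if MEMORY.exists() else {}
    data[key] = value
    MEMORY.write_text(json.dumps(data, indent=2))

def recall(key: str) -> str | None:
    data = json.loads(MEMORY.read_text()) if MEMORY.exists() else {}
    return data.get(key)

def researcher(brief: str) -> str:
    # Real version: a model call with a research-only system prompt and a clean context.
    return f"three key findings about: {brief}"

def writer(brief: str, findings: str) -> str:
    # Real version: a model call that only ever sees the distilled findings, not raw sources.
    return f"draft on '{brief}', grounded in [{findings}]"

def orchestrator(goal: str) -> str:
    findings = recall("findings") or researcher(goal)  # resume cheaply if a run crashed mid-way
    remember("findings", findings)                     # the "memory schema": decide what is worth keeping
    draft = writer(goal, findings)
    remember("draft", draft)
    return draft

print(orchestrator("the shift from prompt engineering to flow engineering"))
```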

This architecture allows agents to genuinely handle tasks requiring hours or even days, rather than just seconds or minutes.

Agents 3.0: From Digital Embodiment to Empathetic Agents (My Current Research)

Building on Agents 2.0, the next phase is Embodied Empathetic Agents. This isn’t just about robotics; it’s the transition from “Digital Embodiment” to “Physical/Sensory Embodiment.”

Step 1: Digital Embodiment

This is the concept behind Claude 3.5’s “Computer Use” or LAMs (Large Action Models). The Agent no longer interacts solely via APIs but possesses “eyes” and “hands”: it can read a GUI, click the mouse, and operate legacy software that has no API. This transforms the Agent from a “Tool Caller” into a true “Operator.”
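The core loop of such an operator is surprisingly small. A minimal sketch, assuming pyautogui as the screen/input backend and a placeholder propose_action() instead of a real multimodal model call (this is not the actual Computer Use API):

```python
import pyautogui  # assumed GUI backend; any screenshot/click library would do

def propose_action(goal: str, screenshot) -> dict:
    """Placeholder: a real version sends the screenshot to a multimodal model."""
    return {"type": "done"}  # stubbed so the sketch runs without a model

def operate(goal: str, max_steps: int = 20) -> None:
    for _ in range(max_steps):
        screenshot = pyautogui.screenshot()        # the agent's "eyes"
        action = propose_action(goal, screenshot)
        if action["type"] == "done":
            return
        if action["type"] == "click":              # the agent's "hands"
            pyautogui.click(action["x"], action["y"])
        elif action["type"] == "type":
            pyautogui.typewrite(action["text"])

operate("export the quarterly report from the legacy ERP client")
```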

Step 2: Sensory Empathy

This is the future I find most fascinating.

  • Multimodal Input: Vision, audio (voice + tonal emotion), and eventually physiological signals from wearables.
  • Environment Modeling: This is the hardest level of Context Engineering because the AI needs to interpret not just text, but the “atmosphere of the moment.”
  • Proactive Prediction: Acting like a true partner. If you come home with a low tone of voice and heavy footsteps, an Agent 3.0 won’t wait for the command “Play music.” It will synthesize the data and suggest quiet, a cup of hot tea, or appropriate ambient music (a toy version of this logic is sketched below).
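To be concrete about what “synthesize the data” means, here is a deliberately naive sketch. The signals and thresholds are invented for illustration only; a real Agent 3.0 would learn this mapping rather than hard-code it:

```python
from dataclasses import dataclass

@dataclass
class Signals:
    voice_pitch: float      # relative to the user's baseline (1.0 = normal)
    footstep_weight: float  # e.g., from a wearable or smart floor (1.0 = normal)
    heart_rate: int         # beats per minute, from a wearable

def proactive_suggestion(s: Signals) -> str:
    tired_or_down = s.voice_pitch < 0.9 and s.footstep_weight > 1.2
    stressed = s.heart_rate > 95
    if tired_or_down and stressed:
        return "Dim the lights, start the kettle, and hold notifications for an hour."
    if tired_or_down:
        return "Play quiet ambient music and suggest a cup of hot tea."
    return ""  # say nothing: not every moment needs an intervention

print(proactive_suggestion(Signals(voice_pitch=0.85, footstep_weight=1.3, heart_rate=88)))
```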

Why I Don’t Believe in Omnipotent AGI

Many extrapolate Agents 3.0 linearly to conclude that “AGI must be a single, all-knowing, all-powerful superintelligence.”

I completely disagree, for two reasons: Thermodynamics and Edge Computing Logic.

1. Thermodynamic and Economic Constraints

An entity that “knows everything and can do everything” would need real-time access to hundreds of exabytes of human knowledge while staying at peak alertness; supplying that would take something on the order of a planet-sized data center. Mobilizing that level of compute just to answer “What’s the weather today?” or “Set an alarm” is absurd from both thermodynamic and economic standpoints.

2. Latency & Privacy

If we want an Agent 3.0 that reads micro-expressions in real time and responds empathetically, it cannot wait for signals to make a round trip to a cloud-based super-AGI (the latency is too high), and it cannot ship your entire private life to that cloud either (the privacy risk is unacceptable).

The true endgame is a “Cloud-Edge Synergy” distributed ecosystem (a routing sketch follows the list):

  • Cloud LLMs: Responsible for complex reasoning and scientific research. Trained every 3 years. They act like a library or a university.
  • Domain Agents: Specialized in law, medicine, or coding. Trained every 6-8 years.
  • Personal AI (Edge/SLM): Running on your phone or glasses. Real-time, low power, absolute privacy. It focuses solely on understanding you.
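The glue in that ecosystem is a router, not a god-model. A minimal sketch of the routing logic (the tier names and thresholds are mine, purely for illustration):

```python
from dataclasses import dataclass

@dataclass
class Request:
    text: str
    contains_personal_data: bool
    needs_deep_reasoning: bool
    latency_budget_ms: int

def route(req: Request) -> str:
    if req.contains_personal_data or req.latency_budget_ms < 200:
        return "edge-slm"      # on-device: real-time and private by default
    if req.needs_deep_reasoning:
        return "cloud-llm"     # the "library / university" tier
    return "domain-agent"      # specialized legal / medical / coding model

print(route(Request("Set an alarm for 7 a.m.", True, False, 100)))             # edge-slm
print(route(Request("Review this NDA clause", False, False, 5000)))            # domain-agent
print(route(Request("Design a novel battery chemistry", False, True, 60000)))  # cloud-llm
```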

Conclusion

From the hype and disillusionment of Agents 1.0 (Shallow Loops) to the architectural revolution of Agents 2.0 (Flow Engineering), we are now moving toward the era of Embodied Empathy in Agents 3.0.

The essence of Context Engineering is also evolving—from “fill-in-the-blanks,” to “state management,” and finally to “environment and emotional modeling.” This path leads not to an omnipotent god, but to countless specialized, energy-efficient, and understanding digital partners.

This is the AI future I believe is truly worth anticipating—and the one that aligns with physical reality.