
Paradoxical Insight: The more capable and self-designing AI models become, the less valuable the companies that design them become; and the more valuable the companies that control the physical atoms they run on – chips, energy, land – become.
A Brief History of AI
1. The Foundational Architectural Layer: The Transformer Paradigm
The publication of “Attention Is All You Need” in 2017, and its introduction of the transformer, is the genesis of the current AI super cycle. Before transformers, we were handcuffed by the sequential processing of recurrent neural networks and LSTMs. The paper’s seminal contribution was self-attention, a mechanism that could be perfectly parallelized on GPUs. It was a new kind of hardware-aware architecture that turned massive data into fuel, moving the bottleneck from computation to data and scale.
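To make the parallelism concrete, here is a minimal sketch of scaled dot-product attention in NumPy (the dimensions and random weights are illustrative assumptions of mine, not from the paper): every position attends to every other position via a few matrix multiplications, with no sequential loop over tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 16                       # sequence length, model dimension

X = rng.normal(size=(T, d))        # token embeddings for one sequence
W_q, W_k, W_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v             # queries, keys, values
scores = Q @ K.T / np.sqrt(d)                   # all T*T pairs scored at once
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
out = weights @ V                               # each row: weighted mix of values
```

Contrast this with an RNN or LSTM, where position t cannot be computed until position t-1 has finished: here the whole sequence is processed in one shot, which is exactly what GPUs are built for.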
2. The Methodology Layer: The “Pre-train then Fine-tune” Paradigm
With this new transformer engine, we needed a better driving strategy. The 2018 shift – epitomized by the decoder-only GPT and encoder-only BERT – proved that self-supervised pre-training created a deep, contextual understanding of language. This reusable “world model” could then be efficiently fine-tuned for specific tasks, breaking the cycle of building a new model for every single problem.
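As a sketch of the pattern (not any particular model’s API; the architecture, dimensions, and task below are stand-ins I chose for illustration), fine-tuning typically freezes the pretrained backbone and trains only a small task-specific head:

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained encoder (in practice, BERT/GPT weights).
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
    num_layers=4,
)

# Freeze the reusable "world model"...
for p in encoder.parameters():
    p.requires_grad = False

# ...and fine-tune only a small task head (e.g., binary sentiment).
head = nn.Linear(256, 2)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4)

x = torch.randn(8, 16, 256)             # batch of token embeddings
logits = head(encoder(x).mean(dim=1))   # pooled representation -> task logits
loss = nn.functional.cross_entropy(logits, torch.randint(0, 2, (8,)))
loss.backward()                         # gradients flow only into the head
optimizer.step()
```

The point of the paradigm is in the optimizer line: one expensive pre-training run amortizes across many cheap task-specific heads, instead of one bespoke model per problem.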
3. The Scaling Layer: The Emergent Property of Size
We had the engine and the driver; the next step was to supercharge our AI vehicle. GPT-3 was our experimental proof of the scaling laws. By aggressively increasing parameters and data, we saw “emergent abilities” – like in-context learning and reasoning – arise not from explicit programming, but from the model’s sheer complexity. This was the inflection point in the first S-curve of LLMs, where we stopped just teaching models and started observing what they could teach themselves.
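One standard formalization of these scaling laws, from Kaplan et al. (2020), fits test loss as a power law in parameter count N (the exponent below is their reported fit, included here as background):

$$L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076$$

The striking part is what the formula does not contain: no architectural cleverness, no task-specific tricks – just size. That predictability is what justified the enormous bet on GPT-3’s scale.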
4. The Interaction Layer: Alignment as a Product Feature
A powerful AI vehicle is useless if you don’t have an interface to interact with it, control it, and manage its state. The ChatGPT revolution ushered in by OpenAI in late 2022 was about adding this critical UI and control layer. Through reinforcement learning from human feedback (RLHF), we moved from treating the model as a text generator to shaping it as a conversational agent. This was a shift from raw capability to usability, forcing us to formalize the fuzzy concept of “helpfulness” into a concrete optimization target.
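In simplified form, that optimization target is usually written as follows (the InstructGPT-style objective: r is a learned reward model trained on human preferences, π_ref is the pre-RLHF policy, and β penalizes drifting too far from it):

$$\max_{\pi}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\big[\, r(x, y) \,\big] \;-\; \beta\, \mathbb{D}_{\mathrm{KL}}\!\big(\pi(\cdot \mid x)\,\|\,\pi_{\text{ref}}(\cdot \mid x)\big)$$

“Helpfulness” stops being a vibe and becomes a scalar reward; the KL term is what keeps the conversational agent tethered to the raw capability underneath it.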
5. The Accessibility Layer: The Efficiency & Open-Source Mandate
The current frontier isn’t about building a bigger engine; it’s about building a more efficient and accessible vehicle. Models like DeepSeek-R1, introduced in early 2025, represent this topmost layer – a strategic focus on performance-per-watt, long-context engineering, and a commitment to open weights. This layer is about dissolving the moat of exclusivity and empowering the ecosystem to build the next breakthrough on a foundation of state-of-the-art, efficient, and open technology.
A Digression into Dynamical Systems
LLM architecture at every layer of its evolution (at least across the five layers highlighted above) can be modeled as a dynamical system.
Each of the five layers represents a higher-order recurrence (a fundamental idea borrowed from computer science), carrying the model’s state forward across a broader temporal or systemic boundary.
2017: recurrence parallelized
2018-21: recurrence internalized into weights
2020-22: recurrence emerges as meta-learning
2022-23: recurrence closed via human feedback
2024-25: recurrence externalized into ecosystem memory
We can formalize an LLM as a dynamical system:

$$h_t = g(h_{t-1}, x_t), \qquad y_t = f(h_t)$$

$h_t$ = latent internal memory
$h_{t-1}$ = previous internal memory
$x_t$ = user input
$g$ = state transition function representing the neural architecture
$y_t$ = model response
$f$ = output function mapping the internal latent state to the response
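As a toy illustration of this recurrence (the weights, dimensions, and tanh nonlinearity below are arbitrary choices of mine, standing in for a real neural architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
D, X = 32, 16  # latent-state and input dimensions

# Stand-in weights for the "neural architecture" g and output map f.
W_h = rng.normal(size=(D, D)) * 0.1
W_x = rng.normal(size=(D, X)) * 0.1
W_y = rng.normal(size=(4, D)) * 0.1

def g(h_prev, x):
    """State transition: h_t = g(h_{t-1}, x_t)."""
    return np.tanh(W_h @ h_prev + W_x @ x)

def f(h):
    """Output function: y_t = f(h_t)."""
    return W_y @ h

h = np.zeros(D)                    # initial latent memory
for x in rng.normal(size=(5, X)):  # a short sequence of user inputs
    h = g(h, x)                    # update internal memory
    y = f(h)                       # map latent state to a response
```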
Why did I digress into this mathematical description of LLM architecture? Because LLMs are surprisingly structured, and each step of their evolution is simply the discovery of a new state transition function g, representing the neural architecture.
From Dynamical Systems to AGI
The progression of AI models can be viewed as a sequence of state transition functions g which, under some metric M, approaches a defined level α set by how we choose to define artificial general intelligence (AGI) – that is, M(g) approaches α.
My hypothesis – which can neither be proven nor disproven – is that α (the “superintelligence” level of AGI) represents a horizontal asymptote that we can never cross, due to some inherent property of reality itself (yes, a simulation-based hypothesis).
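Stated formally (my own rendering of the hypothesis, with $g_n$ denoting the n-th generation of state transition function):

$$M(g_n) < \alpha \;\;\text{for all } n, \qquad \lim_{n \to \infty} M(g_n) = \alpha$$

Each generation gets closer, but the strict inequality never breaks: the asymptote is approached, never crossed.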
I believe we are at a critical point in AI’s evolution: we are approaching the inflection point of its S-curve. This inflection point represents the potential discovery of a state transition function g* that can learn, understand its limitations, improve upon them, and continue to do so recursively.
The moment AI can improve itself will be a profound paradigm shift in human history. It will signal the end of software as a limiting constraint – an incredible technological inversion from software to hardware that will redefine our evolution.
When intelligence becomes a commodity, the only thing that limits its growth and application is the physical capacity to run it.
Chips to compute it.
Energy to power it.
Land to house it.
Ironically, the final and most valuable product of the digital revolution will be the re-sovereignty of the physical world.