The strange truth about today’s most powerful AI is that even the people who build it cannot fully explain why it works, which means much of modern technology now rests on tools we can use far better than we can understand.
Publish Date: 2026-06-07 18:35:00
Source Domain: spacedaily.com
The people who build today’s most capable artificial intelligence can describe exactly how they train it. They can write down the architecture, the optimisation procedure and the objective the system is rewarded for meeting. What they cannot do, in any complete way, is explain how a finished model arrives at many of its particular answers. The training is understood. The thing the training produces is not.
This is not a sensational claim, and it is not the same as saying nobody knows how AI works. It is a narrower and stranger point. We understand the process that grows these systems far better than we understand the systems themselves.
What is understood, and what is not
A large language model is not programmed in the ordinary sense. Engineers specify a network architecture, usually a transformer, and an objective, usually predicting the next token (a word or fragment of a word) in a piece of text, and then run an optimisation process over enormous amounts of data. The model’s internal settings, its billions of numerical parameters, are not written by anyone. They are found by the training process.
The result is closer to something grown than something built. The capabilities that emerge, and the internal representations the model uses to produce them, are a product of the optimisation rather than a design anyone laid out in advance. Reading a trained model’s weights tells you almost nothing directly, just as reading out the strength of every connection in a brain would not tell you what the person is thinking. The recipe is legible. The dish is not.
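The gap between recipe and dish can be seen even in a toy. The sketch below is an illustration only, not how any production model is trained: it swaps the transformer for a simple lookup table of scores and predicts the next character rather than the next token. The function names `softmax` and `train_bigram`, the learning rate and the step count are all assumptions made for the example. What it does share with the real thing is the shape of the process: the code states an objective and an optimisation loop, and the numbers in the table are found by that loop rather than typed in by anyone.

```python
import math
import random


def softmax(row):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def train_bigram(text, steps=200, lr=1.0, seed=0):
    """Fit a table W[prev][next] of scores by gradient descent.

    Objective: average cross-entropy of predicting each character
    from the one before it. This is a toy stand-in for next-token
    prediction, not a transformer.
    """
    rng = random.Random(seed)
    vocab = sorted(set(text))
    index = {ch: i for i, ch in enumerate(vocab)}
    n = len(vocab)

    # The parameters start as small random numbers. Nobody writes them.
    W = [[rng.uniform(-0.01, 0.01) for _ in range(n)] for _ in range(n)]

    # Training data: every (previous character, next character) pair.
    pairs = [(index[a], index[b]) for a, b in zip(text, text[1:])]

    losses = []
    for _ in range(steps):
        grad = [[0.0] * n for _ in range(n)]
        loss = 0.0
        for prev, nxt in pairs:
            probs = softmax(W[prev])
            loss -= math.log(probs[nxt])
            # Gradient of cross-entropy w.r.t. the scores: probs - one_hot
            for j in range(n):
                grad[prev][j] += probs[j] - (1.0 if j == nxt else 0.0)
        # One gradient-descent step on the averaged loss.
        for i in range(n):
            for j in range(n):
                W[i][j] -= lr * grad[i][j] / len(pairs)
        losses.append(loss / len(pairs))
    return vocab, W, losses


if __name__ == "__main__":
    vocab, W, losses = train_bigram("abababababab")
    print("loss at start:", losses[0])
    print("loss at end:  ", losses[-1])
    print("learned table:", W)
```

Everything in the listing is readable: the objective, the update rule, the data. The printed table is just numbers, and here it is small enough to interpret by eye. A large model follows a comparably legible recipe but ends up with billions of such numbers, which is where the difficulty of explanation begins.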
The field trying to open the box
There is a research programme aimed squarely at this gap, known as mechanistic interpretability, which tries to reverse-engineer the internal computations of neural networks into human-readable terms. Its modern form is closely associated with the researcher Chris Olah and the teams around him at Google, OpenAI and Anthropic. The early work, on image models, identified individual artificial neurons that…