Computing in the Age of Hallucination
Generative AI moves us away from replicable computing, computing that performs math, to story-based computing, computing that makes guesses.
In the early 1960s George Fuechsel, a programmer working at IBM, coined the term “garbage in, garbage out” to explain that a computer is only capable of processing what it is given. Or to put it another way, computers give answers based on the data with which they are presented, and if you present a computer with the same data again, you will somewhat predictably get the same answer again.
Fifty years on we saw this come back to bite us at the start of the Big Data era. The new cohort of data scientists, who mostly came from academic computing backgrounds, were used to the data they manipulated being correct. If data was in a database, in a computer, it was true. At least for some values of true. A lot of these new data scientists didn't really understand “garbage in, garbage out” and had problems dealing with real-world data, with all its natural variability and our own inability to control the variables that might affect the data being collected. The idea that out in the real world two measurements of the same thing, taken with the same instrument at almost the same time, might differ, substantially and statistically, became more than somewhat problematic. Because they presented the wrong data, they got the wrong answer. Predictably.
The arrival of large language models, generative artificial intelligence, moves us away from even that precarious place. We are moving away from computing that gives predictable answers to computing that gives approximate guesses; we're entering an age of story-based computing.
Large language models may give the appearance of reasoning, but that appearance is much less impressive than it seems at first glance. While there are some examples of models doing things that appear to require a model of the world around them, the ability to reason about how things should happen out here in the physical world, there are counterexamples of models failing to perform the same trick in cases where you would assume — as a human — that holding such a model of the world would inevitably produce the right answer.
That's because generative models do not hold a physical model of the world; that's not really what is happening. They aren't physical models. They're story models. The physics they deal with isn't necessarily the physics of the real world; it's the physics of the story world, a semiotic physics.
Models do not understand the world; instead they are prediction engines. A large language model is just a statistical model of language, not of the physical world around it. Given a prompt — a string of tokens — it predicts the next token, and then the next, based on the weights its training data has given to those tokens. Unfortunately for modern AI that data is the internet, and the internet is full of lies.
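As a rough illustration of that loop, here is a toy “story model” in Python, a sketch under heavy simplification: it predicts the next token purely from counts of which word followed which in a tiny training text. Real large language models use learned neural-network weights over enormous corpora rather than bigram counts, but the shape of the process, predict a token, append it, predict again, is the same, and nothing in it ever checks the output against the world.

```python
import random
from collections import Counter, defaultdict

# A deliberately tiny "story model": it only knows which token tends to
# follow which token in its training text. This is a toy bigram model,
# not how production LLMs work, but the generation loop is analogous.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(token):
    """Sample the next token in proportion to how often it followed `token`."""
    counts = following[token]
    tokens, weights = zip(*counts.items())
    return random.choices(tokens, weights=weights)[0]

# Generate a continuation one token at a time: predict, append, repeat.
prompt = ["the"]
for _ in range(8):
    prompt.append(predict_next(prompt[-1]))
print(" ".join(prompt))
# The output reads fluently, but it is only a statistical echo of the
# training text; nothing here verifies it against the real world.
```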
Models are tools of narrative and rhetoric. Not necessarily logic. We can tell this from the lies they tell, the invented facts, and their stubbornness in sticking with them. But models are stubborn not because they believe their own lies, but because their ground truth — the data they were trained on — is full of them. It's amazing that they are as useful as they are, let alone that we now seem at times to be relying on them to tell the truth.
We demonstrate here a dramatic breakdown of function and reasoning capabilities of state-of-the-art models trained at the largest available scales which claim strong function, using a simple, short, conventional common sense problem formulated in concise natural language, easily solvable by humans. The breakdown is dramatic, as models also express strong overconfidence in their wrong solutions, while providing often non-sensical "reasoning"-like explanations akin to confabulations to justify and backup the validity of their clearly failed responses, making them sound plausible. — Nezhurina et al., 2024
If your use case needs reasoning, especially reasoning that can be backed by facts and references, then foundation models are not the right, or perhaps even a particularly good, tool to use.
Yet foundation models have already proved themselves useful as assistive technologies when it comes to writing software. So it seems likely to me that all computer users — not just developers — may soon have the ability to develop small software tools from scratch. But also, perhaps more interestingly, to describe changes they'd like made to existing software that they might already be using.
Writing code is a huge bottleneck in the modern world. Software started out being custom-developed within companies, for their own use; it only became a mass-produced commodity later, when demand grew past available supply. Most code, in most companies, now lives in spreadsheets. Most data is consumed and processed in spreadsheets, by people who aren't traditional developers. Consumed and processed by end users.
If end users suddenly have the ability to make small but potentially significant changes to the software they use, using a model, then whether they have the source code to that software — so their model can make changes to it — might matter to the average user, not just to developers. This has the potential to create serious structural change not only in the way software is created, but also in the way it is owned.
But model hallucinations do matter, and it might well turn out that consequential changes are hard for end users to make. Model failures that are addressable by experienced developers, who can easily spot problems with generated code and fix them, could well be impenetrable to most users. Partially broken pieces of model-modified software could become prevalent in many workplaces. Today's uniform environments could quickly become bespoke, with nominally similar software performing similar jobs customised between companies or even between individual users.
We do see this sort of customisation even today, where one developer sitting down at another's machine curses as their colleague's environment inevitably has different macros and key bindings, and slightly different versions of needed dependencies. This problem may soon extend to most end users, not just to developers and our complicated and customised desktop environments. We return to garbage in, garbage out. Except now, this time, instead of our data, it might well be our software.
Yet despite this, the ready availability of models and their ability to manipulate software without any direct understanding of its function, or of the consequences of change, could well become a double-edged sword for maintainability.
While on the one hand models might cause a polyfurcation of software environments between users, on the other they might also help developers navigate an increasingly labyrinthine software stack.
We live in a world where understanding large pieces of software from the user interface all the way down is almost impossible. The age of the hero programmer has come, or in any case will shortly come, to an end. Today's world is one populated by programmer archaeologists, where at best we generally understand only the layering of systems. Systems in use tend to remain in use; the more critical they are, the more inertia they have.
Understanding the full stack is almost impossible in any large modern software system, as such systems generally consist of layers of legacy code encapsulating institutional knowledge. A legacy software system is years of undocumented corner cases, bug fixes, codified procedures, all wrapped inside software. Impossible to grasp as a whole, only in outline.
But models are not us; they do not have to understand a system to assist the developers working with it to refactor, track down bugs, or stick it and other disparate systems together with glue code. In the end models may be better suited to navigating large software systems than the humans who built them.
The use of models to assist the development of software will have a profound influence on the level of abstraction required to use and develop it. More worryingly, however, this added layer of abstraction is starting to appear not just in software, but also in the physical world. Models may not have a physical understanding of the world around them, but they can influence it.
That's because we're increasingly using software and models to “fix” legacy hardware, allowing mechanical dials and readouts to be remotely monitored. For now these fixes involve “classical” classifiers and other models where right and wrong have meaning. But soon generative models may well be employed in attempts to interpret the world.
The difference between classical machine learning models and generative models is the difference between error and hallucination. With generative models there is just no way to evaluate accuracy. Their semiotic worldview means that their responses can be plausible but still hallucinatory.
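To make that distinction concrete, here is a minimal sketch, using made-up labels and a made-up generated sentence rather than any real model: a classifier's predictions can be scored against ground truth, while a free-form generated sentence has no single reference answer to score against.

```python
# Toy illustration with invented data; no real models are involved.

# Classical classifier: every prediction has a ground-truth label,
# so right and wrong are well defined and error is countable.
labels      = ["cat", "dog", "cat", "bird"]
predictions = ["cat", "dog", "dog", "bird"]
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
print(f"classifier accuracy: {accuracy:.0%}")  # 75%, with a measurable error rate

# Generative model: the output is an open-ended string with no single
# reference answer to compare against, so there is no equivalent metric.
# A fluent, plausible sentence can still be entirely made up.
generated = "The pressure gauge reads 42 psi, well within the normal range."
# accuracy = ???  plausibility is not correctness
```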
Models are cognitive engines, story-driven bundles of non-reasoning, soon to be plumbed directly into the same interfaces that the apps on our phones use, that we ourselves use, to make changes in—and interact with—the world around us.
Which leaves us with the problem of what happens when the software and machine learning go wrong, or just aren't very good. Because change is coming, and it's coming quickly. A decade after software was declared to be “eating the world,” models may now be eating software.