Losing the imitation game

AI cannot develop software for you, but that's not going to stop people from trying to make it happen anyway. And that is going to turn all of the easy software development problems into hard problems.

A pair of figures at a carnival, wearing elaborate red costumes and unsettling, almost featureless white masks

If you've been anywhere near major news or social media in the last few months, you've probably heard repeatedly about so-called AI, ChatGPT, and large language models (LLMs). The hype surrounding these topics has been intense. And the rhetoric has been manipulative, to say the least. Proponents have claimed that their models are or soon will be generally intelligent, in the way we mean humans are intelligent. They're not. They've claimed that their AI will eliminate whole categories of jobs. And they've claimed that developing these systems further and faster is both necessary and urgent, justified by science fiction dressed up as arguments for some sort of "safety" that I find to be incoherent.

The outer layer of hype surrounding AI—and LLM chatbots in particular—is that they will become indispensable tools of daily work, and entirely replace people in numerous categories of jobs. These claims have included the fields of medicine, law, and education, among others. I think it's nonsense. They imagine self-teaching classrooms and self-diagnosing fitness gadgets. These things will probably not even work as well as self-driving cars, which is to say: only well enough to be dangerous. Of course, that's not stopping people from pushing these fantasies, anyway. But these fields are not my area of expertise. My expertise is in software engineering. We should know better, but software developers are falling victim to the same kind of AI fantasies.

A computer can never be held accountable. Therefore, a computer must never make a management decision.

While the capabilities are fantasy, the dangers are real. These tools have denied people jobs, housing, and welfare. All erroneously. They have denied people bail and parole, in such a racist way it would be comical if it wasn't real. And the actual function of AI in all of these situations is to obscure liability for the harm these decisions cause.

So-Called AI

Artificial Intelligence is an unhelpful term. It serves as a vehicle for people's invalid assumptions. It hand-waves an enormous amount of complexity regarding what "intelligence" even is or means. And it encourages people confuse concepts like cognition, agency, autonomy, sentience, consciousness, and a host of related ideas. However, AI is the vernacular term for this whole concept, so it's the one I'll use. I'll let other people push that boulder, I'm here to push a different one.

Those concepts are not simple ideas, either. Describing them gets into hard questions of psychology, neurology, anthropology, and philosophy. At least. Given that these are domains that the tech field has routinely dismissed as unimportant for decades, maybe it shouldn't be surprising that techies as a group are now completely unprepared to take a critical view of claims about AI.

The Turing Test

Certainly part of how we got here is the Turing test. That is, the pop science reduction of Alan Turing's imitation game. The actual proposal is more substantial. And taking it seriously produces some interesting reading. But the common notion is something like a computer is intelligent if it can reliably pass as human in conversation. I hope seeing it spelled out like that makes it clear how dramatically that overreaches. Still, it's the framework that people have, and it informs our situation. I think the bit that is particularly informative is the focus on natural, conversational language. And also, the deception inherent in the imitation game scenario, but I'll come back to that.

Our understanding of intelligence is a moving target. We only have one meaningful fixed point to work from. We assert that humans are intelligent. Whether anything else is, is not certain. What intelligence itself is, is not certain. Not too long ago, a lot of theory rested on our ability to create and use tools. But then that ability turned out to be not as rare as we thought, and the consensus about the boundaries of intelligence shifted. Lately, it has fallen to our use of abstract language. That brings us back to AI chatbots. We suddenly find ourselves confronted with machines that seem to have a command of the English language that rivals our own. This is unfamiliar territory, and at some level it's reasonable that people will reach for explanations and come up with pop science notions like the Turing test.

Language: any system of formalized symbols, signs, sounds, gestures, or the like used or conceived as a means of communicating thought, emotion, etc.

Language Models

ChatGPT and the like are powered by large language models. Linguistics is certainly an interesting field, and we can learn a lot about ourselves and each other by studying it. But language itself is probably less than you think it is. Language is not comprehension, for example. It's not feeling, or intent, or awareness. It's just a system for communication. Our common lived experiences give us lots of examples that anything which can respond to and produce common language in a sensible-enough way must be intelligent. But that's because only other people have ever been able to do that before. It's actually an incredible leap to assume, based on nothing else, that a machine which does the same thing is also intelligent. It's much more reasonable to question whether the link we assume exists between language and intelligence actually exists. Certainly, we should wonder if the two are as tightly coupled as we thought.

That coupling seems even more improbable when you consider what a language model does, and—more importantly—doesn't consist of. A language model is a statistical model of probability relationships between linguistic tokens. It's not quite this simple, but those tokens can be thought of as words. They might also be multi-word constructs, like names or idioms. You might find "raining cats and dogs" in a large language model, for instance. But you also might not. The model might reproduce that idiom based on probability factors instead. The relationships between these tokens span a large number of parameters. In fact, that's much of what's being referenced when we call a model large. Those parameters represent grammar rules, stylistic patterns, and literally millions of other things.

What those parameters don't represent is anything like knowledge or understanding. That's just not what LLMs do. The model doesn't know what those tokens mean. I want to say it only knows how they're used, but even that is over stating the case, because it doesn't know things. It models how those tokens are used. When the model works on a token like "Jennifer", there are parameters and classifications that capture what we would recognize as things like the fact that it's a name, it has a degree of formality, it's feminine coded, it's common, and so on. But the model doesn't know, or understand, or comprehend anything about that data any more than a spreadsheet containing the same information would understand it.

Mental Models

So, a language model can reproduce patterns of language. And there's no particular reason it would need to be constrained to natural, conversational language, either. Anything that's included in the set of training data is fair game. And it turns out that there's been a lot of digital ink spent on writing software and talking about writing software. Which means those linguistic patterns and relationships can be captured and modeled just like any other. And sure, there are some programming tasks where just a probabilistic assembly of linguistic tokens will produce a result you want. If you prompt ChatGPT to write a python function that fetches a file from S3 and records something about it in DynamoDB, I would bet that it just does, and that the result basically works. But then, if you prompt ChatGPT to write an authorization rule for a new role in your application's proprietary RBAC system, I bet that it again just does, and that the result is useless, or worse.

Programming as Theory Building

Non-trivial software changes over time. The requirements evolve, flaws need to be corrected, the world itself changes and violates assumptions we made in the past, or it just takes longer than one working session to finish. And all the while, that software is running in the real world. All of the design choices taken and not taken throughout development; all of the tradeoffs; all of the assumptions; all of the expected and unexpected situations the software encounters form a hugely complex system that includes both the software itself and the people building it. And that system is continuously changing.

The fundamental task of software development is not writing out the syntax that will execute a program. The task is to build a mental model of that complex system, make sense of it, and manage it over time.

To circle back to AI like ChatGPT, recall what it actually does and doesn't do. It doesn't know things. It doesn't learn, or understand, or reason about things. What it does is probabilistically generate text in response to a prompt. That can work well enough if the context you need to describe the goal is so simple that you can write it down and include it with the prompt. But that's a very small class of essentially trivial problems. What's worse is there's no clear boundary between software development problems that are trivial enough for an LLM to be helpful vs being unhelpful. The LLM doesn't know the difference, either. In fact, the LLM doesn't know the difference between being tasked to write javascript or a haiku, beyond the different parameters each prompt would activate. And it will readily do a bad job of responding to either prompt, with no notion that there even is such a thing as a good or bad response.

Software development is complex, for any non-trivial project. But complexity is hard. Overwhelmingly, when we in the software field talk about developing software, we've dealt with that complexity by ignoring it. We write code samples that fit in a tweet. We reduce interviews to trivia challenges about algorithmic minutia. When we're feeling really ambitious, we break out the todo app. These are contrivances that we make to collapse technical discussions into an amount of context that we can share in the few minutes we have available. But there seem to be a lot of people who either don't understand that or choose to ignore it. They frame the entire process of software development as being equivalent to writing the toy problems and code samples we use among general audiences.

Automating the Easy Part

The intersection of AI hype with that elision of complexity seems to have produced a kind of AI booster fanboy, and they're making personal brands out of convincing people to use AI to automate programming. This is an incredibly bad idea. The hard part of programming is building and maintaining a useful mental model of a complex system. The easy part is writing code. They're positioning this tool as a universal solution, but it's only capable of doing the easy part. And even then, it's not able to do that part reliably. Human engineers will still have to evaluate and review the code that an AI writes. But they'll now have to do it without the benefit of having anyone who understands it. No one can explain it. No one can explain what they were thinking when they wrote it. No one can explain what they expect it to do. Every choice made in writing software is a choice not to do things in a different way. And there will be no one who can explain why they made this choice, and not those others. In part because it wasn't even a decision that was made. It was a probability that was realized.

[A programmer's] education has to emphasize the exercise of theory building, side by side with the acquisition of knowledge of data processing and notations.

But it's worse than AI being merely inadequate for software development. Developing that mental model requires learning about the system. We do that by exploring it. We have to interact with it. We manipulate and change the system, then observe how it responds. We do that by performing the easy, simple programing tasks. Delegating that learning work to machines is the tech equivalent of eating our seed corn. That holds true beyond the scope of any team, or project, or even company. Building those mental models is itself a skill that has to be learned. We do that by doing it, there's not another way. As people, and as a profession, we need the early career jobs so that we can learn how to do the later career ones. Giving those learning opportunities to computers instead of people is profoundly myopic.

Imitation Game

If this is the first time you're hearing or reading these sentiments, that's not too surprising. The marketing hype surrounding AI in recent months has been intense, pervasive, and deceptive. AI is usually cast as being hyper competent, and superhuman. To hear the capitalists who are developing it, AI is powerful, mysterious, dangerous, and inevitable. In reality, it's almost none of those things. I'll grant that AI can be dangerous, but not for the reasons they claim. AI is complicated and misunderstood, and this is by design. They cloak it in rhetoric that's reminiscent of the development of atomic weapons, and they literally treat the research like an arms race.

I'm sure there are many reasons they do this. But one of the effects it has is to obscure the very mundane, serious, and real harms that their AI models are currently perpetuating. Moderating the output of these models depends on armies of low paid and precariously employed human reviewers, mostly in Kenya. They're subjected to the raw, unfiltered linguistic sewage that is the result of training a language model on uncurated text found on the public internet. If ChatGPT doesn't wantonly repeat the very worst of the things you can find on reddit, 4chan, or kiwi farms, that is because it's being dumped on Kenyan gig workers instead.

That's all to say nothing of the violations of intellectual property and basic consent that was required to train the models in the first place. The scale of the theft and exploitation required to build the data sets these models train with is almost inconceivable. And the energy consumption and e-waste produced by these systems is staggering.

All of this is done to automate the creation of writing or media that is designed to deceive people. It's intended to seem like people, or like work done by people. The deception, from both the creators and the AI models themselves, is pervasive. There may be real, productive uses for these kinds of tools. There may be ways to build and deploy them ethically and sustainably. But that's not the situation with the instances we have. AI, as it's been built today, is a tool to sell out our collective futures in order to enrich already wealthy people. They like to frame it as being akin to nuclear science. But we should really see it as being more like fossil fuels.


Cover photo by Helena Jankovičová Kováčová