Computing C Student — Vol. 02

The Confident Wrong Answer Is the Most Dangerous Feature in AI

Hallucination isn't a bug they're working on. It's structurally built into how large language models work. What that actually means for people using these tools every day.

In 2023, a lawyer named Steven Schwartz submitted a legal brief to a federal court in New York that cited six case precedents in support of his client's position. All six cases were fabricated. They had plausible names, plausible citation formats, plausible summaries. None of them existed. Schwartz had used ChatGPT to research the brief, and ChatGPT had generated case citations with the same formatting, the same tone, and the same apparent authority as real ones. Schwartz was sanctioned by the court. His client's case was not helped.

This story was covered extensively when it happened, and most of the coverage treated it as a story about AI's limitations — a cautionary tale about trusting a new technology before understanding it. That framing is accurate but incomplete. The more useful framing is that the story reveals something structural about how large language models work, something that is not going to be fixed in the next version, and something that people using these tools at work need to understand in order to use them without getting hurt.

The word the industry uses is hallucination. It is a misleading word. It implies a deviation from normal operation — a glitch, an error, an exception. What hallucination actually describes is the model doing exactly what it was designed to do, in a situation where doing exactly what it was designed to do produces a false output. The problem is not a malfunction. It is the function.

§

What a Language Model Is Actually Doing

A large language model is, at a fundamental level, a very sophisticated next-token predictor. Given a sequence of text, it produces the token — roughly speaking, a word or word-fragment — most likely to follow that sequence, based on patterns learned from an enormous corpus of training data. It does this recursively, producing each new token based on all the tokens before it, until the output is complete.
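To make that loop concrete, here is a deliberately toy sketch in Python. It swaps the neural network for a table of word-to-word frequencies built from a twelve-word corpus, so every detail here is an illustrative simplification rather than how a production model is implemented, but the generation loop has the same shape: look at the tokens so far, pick the most plausible continuation, append it, repeat.

```python
# Toy sketch of autoregressive generation. A real model scores every token
# in a large vocabulary with a neural network; here a bigram frequency table
# stands in for that, purely for illustration.
from collections import Counter, defaultdict

corpus = "the case was decided by the court and the court cited the case".split()

# Count which word tends to follow which (a crude stand-in for training).
next_word_counts: dict[str, Counter] = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def generate(prompt: str, max_tokens: int = 8) -> str:
    tokens = prompt.split()
    for _ in range(max_tokens):
        candidates = next_word_counts.get(tokens[-1])
        if not candidates:
            break  # nothing plausible follows; stop generating
        # Pick the most likely continuation -- plausibility, not truth.
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)

print(generate("the court"))
```

Nothing in that loop ever asks whether the output is true. The plausibility of the next token is the only criterion, which is the whole point of this section.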

This architecture is remarkably good at producing text that sounds correct. It is not designed to produce text that is correct. The distinction is important and persistent. A language model has no mechanism for checking whether the tokens it generates correspond to facts that exist in the world. It has a sophisticated sense of what factual-sounding text looks like, because it trained on an enormous amount of factual-sounding text. But "sounds like a fact" and "is a fact" are different properties, and the model optimizes for the first one.

When the model generates a citation for a case called Vargason v. Hardwick, 2019, 3rd Circuit, it is not lying. It is producing a sequence of tokens that follows the pattern of legal citations it has seen in its training data. The model has no awareness that the case does not exist. It has no category for "does not exist." It has patterns, and it is following them.

"The model doesn't know it's wrong. That's not a limitation they're going to engineer away. That's the architecture."
§

Why Confidence Makes It Worse

If language models produced uncertain-sounding output when generating potentially false information, the problem would be manageable. People are accustomed to handling uncertainty. We know how to act on "I think the meeting is at 2pm but you should confirm." We are less practiced at handling confident wrongness — output that presents itself with the same tone and formatting as verified truth.

Language models are calibrated toward confidence. The training process rewards outputs that are consistent with patterns in the data, and confident, authoritative text is a major pattern in that data. Legal documents are written confidently. Medical literature is written confidently. News articles, academic papers, professional communications — the corpus is dominated by confident prose. The model learned to write the way its training data is written.

The result is an output style that is systematically misleading about the reliability of its content. A language model summarizing a research paper will use the same confident, precise language whether it is accurately characterizing the paper's findings or generating plausible-sounding text that deviates from them. The user has no signal from the output itself to indicate which is happening.

This is not a solvable problem in the way that, say, a memory leak is a solvable problem. Researchers have made progress on calibration — on getting models to express uncertainty more accurately — but the fundamental architecture creates a ceiling on how well-calibrated the outputs can be. A model that generates tokens by predicting what should come next cannot perfectly distinguish between "I know this" and "I generated this."
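For readers who want a concrete picture of what "calibration" means here, the sketch below groups a model's answers by the confidence it reported and compares that to how often those answers were actually right. The numbers are invented purely for illustration (a real evaluation would use logged predictions and ground-truth labels); a well-calibrated model is one where the two columns roughly match.

```python
# Minimal sketch of a calibration check: bucket predictions by the model's
# stated confidence, then compare that confidence to the fraction of
# predictions in each bucket that were actually correct.
# The data below is invented for illustration only.

predictions = [
    # (model's stated confidence, whether the answer was actually correct)
    (0.95, True), (0.92, False), (0.97, True), (0.91, True),
    (0.72, True), (0.68, False), (0.75, False), (0.70, True),
    (0.55, False), (0.52, True), (0.58, False), (0.51, False),
]

def calibration_report(preds, bucket_size=0.2):
    buckets = {}
    for confidence, correct in preds:
        key = round(confidence // bucket_size * bucket_size, 2)
        buckets.setdefault(key, []).append((confidence, correct))
    for key in sorted(buckets):
        group = buckets[key]
        avg_conf = sum(c for c, _ in group) / len(group)
        accuracy = sum(ok for _, ok in group) / len(group)
        print(f"confidence ~{avg_conf:.2f} -> actually correct {accuracy:.0%}")

calibration_report(predictions)
```

When the reported confidence sits well above the measured accuracy, you are looking at exactly the failure mode this section describes.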

§

The Categories of Risk

Hallucination risk is not uniform across use cases. Understanding where it is highest and where it is lowest is more useful than the general warning to "always verify AI outputs" — a warning so broad that it either gets ignored or paralyzes the people who take it seriously.

The highest-risk category is factual specificity: specific names, specific numbers, specific dates, specific citations, specific quotations. These are the outputs where the model's pattern-matching produces something that looks precisely correct — a specific case name, a specific statistic, a specific quote attributed to a specific person — while having no particular grounding in whether that precise thing exists or occurred. This is the Schwartz problem. The outputs that look most like verified facts are often the ones most worth checking.

Lower-risk categories include structural tasks — organizing information, generating outlines, reformatting data, summarizing documents that you can directly compare to the summary. In these cases, you have a reference point. You can see whether the summary matches the document. You can check whether the structure makes sense. The model's potential to generate wrong outputs is constrained by the structure of the task.

Medium-risk is the large middle: analysis, synthesis, explanation. A model explaining how a photovoltaic cell works is probably mostly correct, because the basic physics is over-represented in training data and the model has a lot to draw on. The same model explaining a niche regulatory change in a specific state, or the outcome of a specific piece of litigation, is operating with less signal and more interpolation. The confidence of the output will be similar. The reliability will not.

§

Using These Tools Without Getting Burned

None of this means language models are not useful. They are useful in ways that were not possible before they existed, and I use them every day. The useful frame is not "can I trust this output" — a question that has no consistent answer — but "what would it cost me if this output is wrong."

For drafting, brainstorming, structural organization, explaining concepts you already understand: the cost of a wrong output is low because you have enough context to catch the errors. Use them freely. The leverage is real.

For specific factual claims — case citations, statistics, quotations, product specifications, regulatory requirements — treat every output as a draft that requires verification against a primary source. Not because the model is wrong most of the time, but because the cost of the times it is wrong is high, and the output gives you no reliable signal about which times those are.

The most dangerous use case is the one where you have less expertise than the model appears to have. A lawyer using ChatGPT for legal research has enough knowledge to recognize that a case citation should be verified. A non-lawyer doing the same task may not understand that case citations require verification at all — the output looks exactly like what they imagine verified legal research looks like. Confidence plus unfamiliarity is the combination that produces the Schwartz scenario.

Know what you know. Use these tools in areas where you know enough to catch the errors. Stay humble about the areas where you don't. The model will not tell you when it's guessing. That part is on you.
