Does ChatGPT really 'hallucinate'?

Aug 14, 2023

A debate has ignited over how to refer to nonsensical text generated by AI large language models like ChatGPT, with doctors leaning toward abandoning the term "hallucinations," according to letters published August 14 in JAMA Internal Medicine.

"As AI continues to integrate into the fabric of our society, we must not overlook the importance of language in describing its behavior," wrote clinical psychiatrist Joseph Thornton, MD, of the University of Florida in Gainesville.

Specifically, Thornton took issue with the use of the term in a perspective article by Teva Brender, MD, an internist at the University of California, San Francisco, that discussed the implications of available AI tools for medical practice.

"The word confabulation is a more appropriate term, consistent with clinical usage, heuristic in addressing the problem, and avoids further stigmatization of both AI and persons who experience hallucinations," Thornton wrote.

The term hallucinations to refer to chatbot responses has been attributed to New York Times reporter Kevin Roose, who used it in an article titled "A conversation with Bing's chatbot left me deeply unsettled" that was published on February 16, 2023.

The term is now commonly understood by academicians and the public to describe nonsensical text that these large language models sometimes produce, as described by radiologist Rajesh Bhayana, MD, of the University of Toronto, in a recent interview with AuntMinnie.com.

In his letter, Thornton added, "Confabulations are fabricated but they are usually logically generated semantic statements. For example, citing references that do not exist is a confabulation."

In response, Brender said he agreed that use of hallucination in the context of AI is stigmatizing and "should be abandoned," yet noted that the word confabulation also carries semantic baggage.

"Confabulation means 'to generate a false memory without the intention of deceit'," he wrote. "Given that intention implies goals and aims, confabulation arguably anthropomorphizes or at least animates AI by ascribing it lifelike properties and raising the troublesome question of sentience."

Without an arbiter on the issue, AuntMinnie.com wondered whether ChatGPT itself could help establish a ground truth. In response to the question, "Do you hallucinate?" the chatbot stated:

"No, I don't hallucinate. I'm a machine-learning model developed by OpenAI, specifically, GPT-3.5. I don't have sensory experiences, emotions, or consciousness. I generate responses based on patterns in the text data I was trained on and the input I receive from users."

Ultimately, the use of metaphors can be a slippery slope, Thornton noted, as they not only describe but can also shape our conception of reality.

"The language we choose to use as researchers and clinicians can shape public attitudes and prevent unintentionally harmful stereotypes," he wrote.

Brender agreed and concluded, "The harder problem, and our collective challenge, will be to develop a shared vocabulary that accurately describes the underlying technology while remaining accessible to those without AI expertise."

Both Thornton's letter and Brender's reply were published in JAMA Internal Medicine.