CHICAGO - A study of radiology reports generated by speech recognition software at a prestigious U.S. academic medical center found that even with fully trained users the technology was introducing errors into more than a third of signed, final reports, and that nearly one in five of those errors was significant enough to alter or obscure the report's meaning.
"We structured a study where we could look at the types of errors and frequency of errors in attending dictated reports using voice recognition software," said Dr. Ronald Dolin, who is from the department of radiology at Thomas Jefferson University Hospital in Philadelphia.
Dolin and his team retrospectively reviewed radiology reports dictated and signed by attending radiologists from February to March 2006 at Thomas Jefferson. According to Dolin, all radiology reports at the institution were generated using PowerScribe 4.7 from Dictaphone/Nuance Communications of Burlington, MA, which had been implemented for 16 months prior to the start of the study.
A total of 395 reports, consisting of five to 10 consecutive reports from each of the 41 attending radiologists, were reviewed, according to the researchers. They categorized the dictation errors into 10 subtypes, including missing words, wrong words, extra words, nonsense phrases, phrases with unclear meaning, and abnormal phrases with meaning intact.
"Errors were classified as significant if they altered or obscured the meaning of the sentence in which they appeared," Dolan said.
A total of 239 errors were identified in 146 of the 395 reviewed reports, for an overall error rate of 37% in final reports, he reported. In addition, he noted that at least one error was identified in the reports of 40 of the 41 attending radiologists at Thomas Jefferson.
The researchers found that missing or extra words that did not alter the meaning of a sentence constituted 113 of the 239 errors, or 47% of the total. This type of error was found in the reports of 33 (81%) of the attending radiologists, Dolin said.
Other common errors reported by the researchers included wrong words, which accounted for 21% of the total errors; nonsense phrases with unknown meaning, 11%; typographical or grammatical errors, 8.8%; and an incorrect dictation date, which appeared in 2.9% of the erroneous reports.
Most of the speech recognition technology errors, 83%, did not alter the meaning of the report, Dolin noted.
However, the remaining 17% of the errors could have affected patient care. Significant errors -- errors that could conceivably alter a patient outcome -- accounted for 40 of the 239 total errors. He said that significant errors were found in the reports of 54% of the attending radiologists, and that five radiologists had two or more significant errors.
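The rates quoted above follow directly from the raw counts reported in the study; a quick back-of-the-envelope check:

```python
# Counts as reported in the study.
total_reports = 395
reports_with_errors = 146
total_errors = 239
missing_or_extra = 113     # missing/extra words, meaning intact
significant_errors = 40    # could conceivably alter a patient outcome

# Share of final reports containing at least one error.
print(f"{reports_with_errors / total_reports:.0%}")   # 37%

# Share of all errors that were benign missing/extra words.
print(f"{missing_or_extra / total_errors:.0%}")       # 47%

# Share of all errors classified as significant.
print(f"{significant_errors / total_errors:.0%}")     # 17%
```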
Dolin acknowledged that he set the criteria for classification of significant errors and determined what was and was not a significant error. During a discussion of the research after the presentation, a few audience members vigorously expressed their belief that Dolin's standard for significance had not been inclusive enough.
Dolin believes his study is less a condemnation of the technology than a method for quality assurance and continuing education in the vagaries of speech recognition software. A periodic audit of a relatively small number of radiology reports, such as five to 10 reports per radiologist, can identify significant voice recognition error patterns across the group and for individuals, and can assist in efforts to mitigate these problems, he said.
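The audit Dolin describes amounts to tallying error types across a small sample of reports and flagging individuals with repeated significant errors. A minimal sketch of that bookkeeping, with an illustrative record format and sample data that are assumptions, not the study's:

```python
from collections import Counter

# Hypothetical audit records: one tuple per error found while reviewing a
# sample of 5-10 reports per radiologist. The categories and data are
# illustrative only.
findings = [
    # (radiologist, error_category, significant?)
    ("A", "missing word", False),
    ("A", "wrong word", True),
    ("B", "nonsense phrase", True),
    ("B", "wrong word", True),
    ("C", "extra word", False),
]

# Group-level pattern: which error types dominate across the department.
by_category = Counter(category for _, category, _ in findings)
for category, n in by_category.most_common():
    print(category, n)

# Individual-level pattern: flag radiologists whose sample contains two or
# more significant errors, as a trigger for targeted follow-up.
significant_counts = Counter(r for r, _, sig in findings if sig)
flagged = sorted(r for r, n in significant_counts.items() if n >= 2)
print(flagged)  # ['B']
```

The same tally run against a real month of reports would surface both the dominant error subtypes and the individuals who might benefit from retraining.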