Speech recognition vendors tackle dialect challenges

Dec 16, 2008

Talking to a computer instead of typing on a keyboard to input patient reports may seem like science fiction but clearly is science fact -- and it's becoming standard operating procedure as radiologists strive for improved workflow and patient throughput.

But talking to a computer and having it understand everything you say without error the first time, regardless of accent, dialect, or cultural language patterns, has yet to migrate from reel life to real life.

Speech recognition vendors acknowledge, by and large, that current software may not recognize, record, and transcribe speech correctly the first time it hears a user, regardless of dialect or language. But they say the back-end editing process takes up the slack.

Burlington, MA-based Nuance Communications, which recently acquired Philips Speech Recognition Systems and its SpeechMagic system, built improved support for accents into the newest version of its Dragon Medical engine, according to Peter Durlach, senior vice president of marketing and product strategy at the company's healthcare division.

"Almost immediately after a clinician starts using [Dragon Medical] 10 and completes the enrollment, Dragon Medical will analyze his or her speech patterns and load the accent model, which closely matches what is heard," Durlach said. "Dragon Medical 10 will be much more accurate out of the box than version 9 for users who have an accent from growing up in the Deep South, or someone with a heavy Indian, English, or Hispanic accent."

Viztek of Jacksonville, FL, has integrated server-based software from M*Modal of Pittsburgh to transcribe dictations by learning speech patterns from every user, assembling words and entire sentences via technology M*Modal calls "speech understanding," said Steve Deaton, vice president of sales.

"Once a sentence is completed, the system reviews the entire sentence overall for medical and grammatical relevance," Deaton said. "If a word does not relate medically to the sentence, a secondary word is placed and the sentence is reviewed again for medical and grammatical accuracy."
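
The review loop Deaton describes can be illustrated with a toy sketch. This is not M*Modal's algorithm: the `CONTEXT` table, the `fits_context` check, and the candidate lists are all hypothetical stand-ins for whatever medical and grammatical models a real system uses. Each position in the sentence carries candidates ordered best acoustic guess first, and a secondary word is swapped in only when the primary word does not relate to the rest of the sentence.

```python
# Toy illustration only: the table and the scoring rule are invented for
# this example and do not reflect any vendor's actual method.

# Hypothetical table of clinical terms and the words they commonly appear with.
CONTEXT = {
    "pleural": {"effusion", "thickening"},
    "effusion": {"pleural", "pericardial"},
}


def fits_context(word, sentence_words):
    """True if `word` is known to co-occur with another word in the sentence."""
    related = CONTEXT.get(word, set())
    return any(other in related for other in sentence_words if other != word)


def review_sentence(slots):
    """Pick one word per slot; `slots` is a list of candidate lists,
    each ordered best acoustic guess first."""
    chosen = [candidates[0] for candidates in slots]
    for i, candidates in enumerate(slots):
        for candidate in candidates:
            trial = chosen[:i] + [candidate] + chosen[i + 1:]
            if fits_context(candidate, trial):
                chosen = trial  # first candidate that relates to the sentence
                break
        # If no candidate fits, the top acoustic guess is kept unchanged.
    return " ".join(chosen)


heard = [["small"], ["plural", "pleural"], ["effusion"]]
print(review_sentence(heard))
```

In this sketch "plural" is the top acoustic guess, but it has no relation to "effusion," so the secondary candidate "pleural" replaces it and the sentence is checked again with the new word in place.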

David Talton, marketing manager for reporting/radiology PACS solutions at Agfa HealthCare of Greenville, SC, defended current speech recognition software products for how they handle dialect and language challenges, attributing some of the so-called system failures to fundamental habits.

"We have found that microphone placement, pacing, and consistency of enunciation are frequently more important than dialect in an individual user's success," Talton said. "And we have offered an Indo-Asian enrollment option in our most recent release, which has provided good success to users with this background."

"With that said, there are always individual users who, for whatever reason, do not experience good success with speech recognition," he continued. "So for those users we offer options of straight digital dictation, or so-called 'correctionist' workflow. With correctionist workflow, the recognized audio and text is sent to a transcriptionist for correction and then back to the user for sign-off. So whether users have inherent difficulties because of an accent, or because they happen to have a cold, they can choose to send reports for correction -- even on a report-by-report basis."

At Siemens Healthcare of Malvern, PA, Rik Primo, director of marketing and strategic relationships in the company's Image and Knowledge Management Division, said speech recognition technology is becoming more forgiving of pronunciation issues. It does so through sophisticated spectral sound analysis techniques, such as the fast Fourier transform, combined with lookup tables and elaborate statistics that recognize similar properties between different words or between the same words pronounced in a different fashion.
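
A rough idea of what spectral comparison means can be given in a few lines of code. The sketch below is an assumption-laden toy, not how any commercial engine works: it uses a textbook radix-2 fast Fourier transform on synthetic sine tones rather than recorded speech, and the helper names (`magnitude_spectrum`, `cosine_similarity`, `tone`) are made up for the example. It shows that two signals with the same frequency content but different timing have nearly identical magnitude spectra, while a signal with different frequency content does not.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def magnitude_spectrum(samples):
    """Magnitudes of the first half of the spectrum (the rest mirrors it)."""
    return [abs(c) for c in fft(samples)[: len(samples) // 2]]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tone(cycles, phase=0.0, n=64):
    """Synthetic stand-in for audio: a sine wave with `cycles` periods in n samples."""
    return [math.sin(2 * math.pi * cycles * i / n + phase) for i in range(n)]


reference = magnitude_spectrum(tone(5))
shifted = magnitude_spectrum(tone(5, phase=1.0))   # same content, different timing
different = magnitude_spectrum(tone(12))           # different frequency content

print(cosine_similarity(reference, shifted) > cosine_similarity(reference, different))
```

Real speech is far messier than a pure tone, which is where the statistics and the processing power Primo mentions come in.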

But he cautioned that such capabilities require hardware and software with more horsepower, in that "faster and more high-performance [central processing units (CPUs)] are critical for running these advanced algorithms in real-time and with high sampling rates."

Marcel Wassink, vice president of Nuance's SpeechMagic Solutions business (formerly Philips Speech Recognition Systems) in Vienna, Austria, dismissed accents as an issue, particularly with the SpeechMagic product, because "it builds an individual voice profile for every author and learns from corrections made in the recognized text."

"The only variable is the time it takes for the system to achieve individual recognition accuracy -- users with a very strong accent or very unclear pronunciation might need a bit more patience," he added.
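
Learning from corrections can be pictured, in a much simplified form, as a per-author tally. The sketch below is a hypothetical illustration, not SpeechMagic's design: the `AuthorProfile` class and its methods are invented for this example, and a real engine adapts acoustic and language models rather than swapping words from a lookup.

```python
from collections import Counter, defaultdict


class AuthorProfile:
    """Hypothetical per-author record of corrections made to recognized text."""

    def __init__(self):
        # recognized word -> tally of the words the author corrected it to
        self._corrections = defaultdict(Counter)

    def learn(self, recognized, corrected):
        """Record one correction; identical words carry no information."""
        if recognized != corrected:
            self._corrections[recognized][corrected] += 1

    def apply(self, words):
        """Replace each word with this author's most frequent correction, if any."""
        result = []
        for word in words:
            seen = self._corrections.get(word)
            result.append(seen.most_common(1)[0][0] if seen else word)
        return result


profile = AuthorProfile()
profile.learn("ileum", "ilium")
profile.learn("ileum", "ilium")
print(profile.apply(["fracture", "of", "the", "ileum"]))
```

Because each author has a separate profile, one user's corrections do not affect another's, which matches the "individual voice profile" idea in spirit only.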

Horizon scanning

So what's next in the development of speech recognition applications?

For Nuance, it's bridging the gap between current and future products and those of the past, according to Durlach. That includes linking "front-end" speech recognition with traditional digital dictation systems and workflow.

"Dragon Medical 10 will allow a clinician, for example, to dictate a note and send his or her Dragon-dictated draft and voice files to a medical transcriptionist for checking," Durlach said. "We've also built a similar workflow into our Enterprise Express digital dictation solution, whereby Dragon Medical 10 delivers a 'draft' directly to medical transcriptionists without the text being seen by the physician. The doctors dictate in Enterprise Express just as they normally do into any wall phone they've always used -- we've just inserted speech recognition into the process to make the medical transcriptionists more effective."

Nuance's acquisition of Needham, MA-based eScription earlier this year enabled the company to offer an application service provider (ASP)-hosted version of this capability.

Structured reports that store data in the database in sections, rather than as one cumulative report, will drive future refinements, Viztek's Deaton predicted. "The next step is to touch other components in a facility in an automated fashion," he added.

Agfa also supports structured reporting for radiology based on the company's success with the applications in cardiology, according to Talton. "Of course, radiology offers a much more challenging situation, with many radiologists producing upward of 150 reports per day, so new developments need to improve rather than reduce a radiologist's productivity," he said.

Wassink from Nuance concurred, noting that such integration can help reduce medical errors. "Here's an example: Acetylsalicylic acid is a synonym for aspirin. A SpeechMagic-enabled system should recognize both terms as the same medication and -- in a step further -- trigger an alert if prescribed to a patient with gastrointestinal ulcers," he said. "In this case, the system should ideally suggest stomach-friendly alternatives with an antiplatelet or anti-inflammatory effect." But he admitted that "there is still a long way to go."
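
Wassink's aspirin example amounts to two lookups: map a dictated term to a canonical medication name, then check that name against the patient's conditions. The sketch below assumes two tiny hand-written tables, `SYNONYMS` and `CONTRAINDICATIONS`, which are hypothetical and contain only the example from the quote; they are not a clinical reference, and the function names are invented.

```python
# Hypothetical tables holding only the example from the quote above.
SYNONYMS = {"acetylsalicylic acid": "aspirin"}
CONTRAINDICATIONS = {"aspirin": {"gastrointestinal ulcer"}}


def normalize(term):
    """Map a dictated term to its canonical medication name."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


def check_prescription(drug, patient_conditions):
    """Return one alert string per condition that conflicts with the drug."""
    name = normalize(drug)
    conditions = {condition.strip().lower() for condition in patient_conditions}
    conflicts = CONTRAINDICATIONS.get(name, set()) & conditions
    return [f"ALERT: {name} prescribed to patient with {c}" for c in sorted(conflicts)]


print(check_prescription("Acetylsalicylic acid", ["Gastrointestinal ulcer"]))
```

Suggesting alternatives, the further step Wassink mentions, would need a real drug knowledge base and is outside this sketch.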

In fact, Agfa has developed Impax Veriphy, a PACS-based critical test results management (CTRM) system in conjunction with Nuance, Talton said.

"Integrating the CTRM into the PACS rather than into the reporting solution creates enterprise-wide access to radiology-based critical test results," Talton noted. Furthermore, the company's Impax ConnectED application is designed to improve communication between the radiology and emergency departments through bidirectional exchange of critical test results.

Siemens' Primo pointed to a plethora of developments that he believes will redefine speech recognition in the near future and virtually eliminate manual keyboard entry, including structured reporting, specialized vocabularies for various applications, and automatic transfer of biomeasurements and numerical values from clinical systems, combined with data extraction from nonstructured data, such as paper documents. He also indicated that medical forms and medical documents would be "perfect applications for advanced speech recognition technology."

Yet all of this is predicated on further improvements in CPU performance, he added, both to run these more elaborate and increasingly sophisticated speech recognition algorithms on commercial, off-the-shelf, nonproprietary hardware and to further improve ease of use, reliability, and tolerance for pronunciation issues.

By Rick Dana Barlow
AuntMinnie.com contributing writer
December 17, 2008
