Speech recognition users need patience, training to achieve optimal results

Jun 14, 2001

By Joseph Marion
Superior Consultant Company

Automated speech recognition has been touted as a means of cutting costs and improving radiology report turnaround times. At first glance, radiology seems particularly well-suited to speech recognition applications due to its complex, structured reporting nature and high volume of reports.

Yet despite the promise and growing number of successful speech recognition installations, implementation requires preparation and a realistic attitude regarding the capabilities of the technology. Based on five years of experience directly implementing speech recognition systems in numerous radiology departments, here is my Top 10 list of lessons learned.

1. Don’t underestimate the importance of training

The most important factor for successfully applying speech recognition is the quality of training. I've often been called to troubleshoot existing installations in which inadequate training was a significant factor in all usage problems.

Training should be viewed as an ongoing commitment to the success of the application. As the staff gains experience, additional questions arise. Follow-up training also helps eliminate any bad habits that users may have picked up through experimentation or an incomplete knowledge of the system. Initial training is best accomplished in multiple phases: for example, once during clinical start-up, followed by another session a few weeks later.

Annual, or better yet, semi-annual refresher training should be factored into operating expenses to ensure that users remain current and new staff is adequately trained. Vendors usually offer software updates at least annually, and these, too, can be addressed by periodic training.

Training radiology staff, particularly radiologists, requires specific skills. Radiologists usually operate under time and workload constraints and quickly become creatures of habit. Teaching them a new way of doing their jobs in a way that becomes second nature requires patience on the part of both the trainer and the radiologists.

Radiologists have varying degrees of computer skills; therefore, training must be individualized to ensure a working knowledge. And many users may have heavy accents that require extra attention during the enrollment process, as the application may have difficulty interpreting word pronunciations.

Third-party trainers can sometimes be more effective than internal staff because they are less threatening. However, internally trained support staff are crucial to address ongoing training, support, and other issues.

2. Understand the technology

Achieving success with speech recognition applications requires a thorough knowledge of the technology and how to apply it. Because the technology is relatively new, prospective users frequently rely on vendors for an understanding of its requirements and implementation approaches.

If you don’t have internal knowledge of speech recognition technology, consider using a consultant who is familiar with it and can provide you with an objective, third-party opinion of your requirements and vendor proposals. As with any acquisition of this magnitude, be sure to do your homework. Make sure you have a clear understanding of where to apply speech recognition, how it will be interfaced to your existing systems, the expected impact on resources, and any prerequisites, such as defining macro language prior to implementation. Doing so will avoid surprises later.

3. Validate interface requirements up front

A successful speech recognition implementation in radiology requires some form of interface to other systems for patient demographics. Staff productivity would be adversely affected by duplicating entry of patient demographics for use in a speech recognition application.

Speech recognition applications are usually interfaced to the RIS or an orders application that contains patient demographics. The interface also allows the signed report to be uploaded for easy and timely access. Vendors typically have a lot of experience with such interfaces, and have probably already interfaced to your RIS or orders application if it comes from a well-known company.

However, systems from lesser-known firms and homegrown systems represent a challenge to speech recognition providers. Be sure to explore interface requirements thoroughly with prospective vendors before committing to a speech recognition system. If you have these types of systems, either secure reference sites with similar RIS or orders system implementations, or get a firm commitment (with performance penalties) from the vendor.

4. Get the radiologists' support

While radiologists are the primary users, they may not perceive themselves as the primary beneficiaries. The radiology group must be an integral part of the planning process for speech recognition implementation.

Setting the radiologists’ expectations and agreeing on workflow protocols up front will prevent considerable start-up problems. Bear in mind that implementation problems may significantly affect users' impression of the system -- and negatively affect their willingness to use or expand the system.

5. Have realistic cost-savings expectations

While speech recognition applications would ideally produce quick paybacks, it is often difficult to achieve projected savings. In many facilities, it is not realistic to expect transcription staff to be terminated. Rather, it is likelier that the number of employees will be reduced through attrition and reassignment, particularly into support positions for PACS, RIS, and speech recognition.

Part of the staff justification also depends on the transcription skills available. For example, an experienced transcription staff might focus on transcribing more complex reports, while speech recognition is used for simpler reports. Conversely, an inexperienced transcription staff may take care of simpler dictation, while speech recognition is used for the more complex reports.

6. Include multiple workflow options

The greatest benefits of speech recognition are achieved when radiologists agree to do their own editing. Most systems accommodate multiple approaches to workflow, including allowing radiologists to edit reports and allowing for batch editing following completion of dictation.

Because some radiologists won't want to edit their own reports initially, a transcription "editor" could be employed to edit the reports transcribed with speech recognition. To achieve maximum acceptance, it is advisable to consider all workflow options and accommodate different staff requirements until there is sufficient experience to consider a change.

7. Plan for obsolescence

Speech recognition technology is rapidly changing. In the past two years, vendors have incorporated significant improvements in recognition accuracy and user enrollment processes.

Failure to recognize the evolving nature of the technology could have an adverse financial impact, however. One facility that we worked with failed to take advantage of a vendor’s upgrade that was a fraction of the cost of new licenses. Later, the facility faced a ten-fold increase in upgrade costs. Any cost justification of speech recognition should include the vendor’s software maintenance contract or an estimate for periodic upgrades.

8. Thoroughly define integration priorities

Early speech recognition software usually consisted of stand-alone applications and required another computer device at dictation stations. In an environment where space is usually at a premium, this often affected where the technology could be applied. In addition, most speech recognition applications are Windows-based, and they conflicted with early, Unix-based PACS workstations.

The migration of PACS workstations to the Windows operating system has accelerated the integration of PACS, radiology information systems (RIS), and speech recognition applications to a single platform. Careful attention should be given to system implementation priorities and integration goals to avoid redundant hardware expenditures. PACS and RIS integration should be factored into speech recognition system selection.

9. Users need to adapt to the technology

Users often expect the technology to adapt to their style of dictation. While speech recognition has advanced rapidly, it has not come far enough to be quite this accommodating. Computers follow very precise rules and, unlike humans, are very rigid in their interpretation. Consequently, users must be willing to adapt their dictation habits to the computer to achieve optimal results.

For example, a radiologist who currently dictates without punctuation would need to learn to include punctuation, as speech recognition applications require punctuation to properly interpret the spoken word. Similarly, despite promising advances in filtering, the dictator who pauses with "ahs" and "ums" will be forced to edit extraneous words unless he/she changes his/her dictation habits.
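To make the filtering idea concrete, here is a minimal sketch of how hesitation sounds might be stripped from recognized text. It is an illustration only, not any vendor's actual method: the `FILLER_WORDS` list and the `strip_fillers` function are hypothetical, and real products filter at the audio level with far more sophistication.

```python
# Hypothetical list of hesitation sounds; a real system would need a broader,
# user-specific set and would have to avoid removing legitimate words.
FILLER_WORDS = {"ah", "um", "uh", "er"}

def strip_fillers(transcript):
    """Return the transcript with stand-alone filler words removed."""
    kept = []
    for word in transcript.split():
        # Compare in lowercase and ignore trailing punctuation ("Um," -> "um").
        bare = word.lower().strip(",.")
        if bare not in FILLER_WORDS:
            kept.append(word)
    return " ".join(kept)

print(strip_fillers("the um lungs are ah clear"))
```

Even a simple filter like this shows the limitation: it cannot tell whether a short sound was a hesitation or part of a word, which is why changing the dictation habit remains the more reliable fix.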

10. Allow adequate time for results

It typically takes users at least a week or two of continuous system use to achieve the expected accuracy. Speech recognition applications rely on both the sound of the pronounced word and the context in which it is used. Most applications continuously learn from previous dictation, and as the system improves its accuracy with every new dictated report, users will notice an improvement over time.
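The interplay of sound and context can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a description of how any particular product works: the `ACOUSTIC` and `CONTEXT` tables, their numbers, and the `pick_word` function are all invented for the example. It chooses between two words that sound alike by combining a sound score with a score for how often each word follows the previous one.

```python
import math

# Hypothetical sound scores: both words match the audio equally well.
ACOUSTIC = {"ileum": 0.5, "ilium": 0.5}

# Hypothetical context scores: how likely each word is after the previous word,
# as if learned from earlier dictated reports.
CONTEXT = {
    ("terminal", "ileum"): 0.9,
    ("terminal", "ilium"): 0.1,
    ("left", "ilium"): 0.8,
    ("left", "ileum"): 0.2,
}

def pick_word(previous_word, candidates):
    """Pick the candidate with the best combined sound and context score."""
    def score(word):
        acoustic = ACOUSTIC[word]
        # Unseen word pairs get a small default so the logarithm is defined.
        context = CONTEXT.get((previous_word, word), 0.01)
        return math.log(acoustic) + math.log(context)
    return max(candidates, key=score)

print(pick_word("terminal", ["ileum", "ilium"]))
print(pick_word("left", ["ileum", "ilium"]))
```

In this sketch, each additional report would refine the context table, which is one way to picture why accuracy improves with continued use.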

Overall, speech recognition applications are effective tools for improving the timeliness, quality, and cost of dictation/transcription processes. By learning from others' experiences, prospective implementers can increase their likelihood of success. Experience suggests that users should:

Thoroughly understand the technology before starting, obtaining knowledge either internally or through consultative services.

Carefully plan an installation by defining requirements and expectations, and by securing staff commitment.

Ensure that funding includes adequate support, especially training, during start-up and throughout the life of the application.

By Joe Marion
AuntMinnie.com contributing editor
June 15, 2001
Joe Marion is an executive director with Superior Consultant Company in Southfield, MI. He has extensive background in PACS and speech applications, and is responsible for integrating PACS and speech and voice recognition applications into Superior’s e-Health initiatives.
