Business

Speech impediment: Technology getting slow start

■ The market for speech-recognition software is growing, but not as much as expected. Here are some theories on what's keeping physicians from using it, and what has to happen before they do.

By Tyler Chin — Posted June 14, 2004

Here's a tale explaining why speech-recognition software -- which promised the ease of using your voice to enter information rather than the hassle of typing on a keyboard -- hasn't caught on like its proponents thought it would.

Internist Jeffrey Clode, MD, of Spokane, Wash., remembers a colleague at his 20-physician practice who told his computer that a female patient had come in after choking on peanuts.

Except on screen, the software didn't get the dictation quite right. Instead of "peanuts," the software entered -- well, something more anatomical.

"It was funnier than hell because our nurse practitioner found the error and was just in hysterics," Dr. Clode said. "You have to watch out [and proof carefully] when you use it."

Although speech-recognition systems have been around for 20 years, fewer than 10% of doctors today use the technology that lets users speak into a microphone and see their speech converted into text on a computer screen in real time, said Bill DeStefanis, vice president of marketing at Voicebrook, a Lake Success, N.Y., company that sells speech-recognition services to hospitals.

In 2003, the speech-recognition software market for dictation was about $300 million worldwide, including about $100 million for just the software and $200 million for value-added services such as training and integration, DeStefanis estimated.

Health care, which makes up about 60% of that market, has been growing about 12% annually for the past three years, he estimated. By comparison, electronic medical records and practice management software and services for physicians totaled $1.7 billion in 2003, according to Forrester Research Inc.

Technology-savvy physicians and some industry experts identified three key impediments that must be overcome before doctors start widely using speech-recognition software, which, with a specialty medical dictionary, costs between $500 and $1,000.

Hurdle: Doctors must change how they work

Potential solution: Doctors have to be willing to adapt to the technology, and vendors have to make it easier for them to use it.

There are two models of speech-recognition technology on the market. One is the transcriptionist-editor model, which hospitals are using and which doesn't require doctors to change their work flow. Instead of using conventional dictation and transcription services, in which transcriptionists transcribe every word a doctor dictates, hospitals use the transcriptionist-editor model, in which machines initially transcribe a doctor's dictation and medical transcriptionists check the reports, making corrections as necessary. This model can cut up to 40% from the cost of conventional dictation and transcription.

The other model, in which the physician is the editor, wipes out transcription costs but has been minimally adopted so far because it dramatically changes physicians' work habits.

Doctors not only have to spend considerable time learning the software but also must "adapt to the technology," said Joe Marion, an executive director with Superior Consultant Co., Southfield, Mich. "If they are not willing to do that and [think] that the computer is going to figure them out, then they will struggle," he said.

Speech-to-text software doesn't make mistakes on multisyllable medical terms but tends to misfire on two- or three-letter or single-syllable words. For example, the spoken phrase "she has had" might come across as "she is at," said David Heiman, MD, a gastroenterologist at a two-doctor group in Tampa, Fla. So he pauses slightly when he hits spots where those errors are likely to occur, to improve the odds that the system will recognize his words correctly.

Physicians must have a certain personality to embrace speech-recognition technology, said Dr. Clode. He uses it to input data into his group's electronic medical records software system because he is a self-described lousy typist and "it would drive me insane" to use a keyboard.

Speech recognition is "not for everybody," he said. "One of my partners who tried it would yell into the microphone, 'Damn it! I said ...' Well, if you're kind of rigid like that, don't want to do proofing and are the type of individual who can't stand any errors, it's not going to work."

Hurdle: Too many mistakes

Potential solution: System has to be 99% to 99.9% accurate.

Speech recognition didn't become viable -- or practical -- until vendors introduced continuous speech-recognition technology in the late 1990s. Before that, users relied on discrete speech systems that forced them to speak unnaturally, inserting pauses between every word.

With continuous speech-recognition technology, doctors could speak normally, but the technology wasn't ready for prime time in terms of accuracy and speed.

When continuous speech technology was initially rolled out, vendors claimed its accuracy rate was in the 90% range, but the reality was that most people experienced low- to mid-80% accuracy, DeStefanis said. "Now in real-world situations, accuracy rates are in mid- to high 90s."

Incremental improvements in speech-recognition technology, noise-cancellation technology, and more powerful and faster computers in the past five to seven years all have led to systems that are 95% to 98% accurate. Regardless of what accent doctors speak with, the accuracy is the same because of the way the technology works.

The technology is ready to take off in health care, observers say. But adoption is occurring primarily in the hospital market, because the technology offers hospitals a way to substantially reduce transcription costs.

In the outpatient setting, however, physicians have been slower to adopt the technology.

A 95% to 98% accuracy rate isn't good enough to convince most physicians to use the technology, even though it can be cheaper than conventional dictation and transcription, said Richard F. Gibson, MD, chief medical information officer at Providence Health System in Portland, Ore. One reason hospitals have been more willing to use speech recognition is that their transcription costs are higher, and unlike physicians, they can afford to pay medical transcription companies or transcriptionists to double-check the computer-transcribed reports.

Speech technology has to reach 99% or higher to reel in doctors, said Dr. Gibson, a family and emergency physician who uses and likes the technology. "That would mean I would come across one error, whereas right now I'm correcting six to eight words out of 250. That's really very good."
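The accuracy figures in this section can be put on a common footing with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the only inputs taken from the article are the 250-word note length, the six to eight corrections Dr. Gibson describes, and the accuracy percentages quoted above.

```python
def accuracy_from_errors(errors, words):
    """Return recognition accuracy as a percentage, given errors per word count."""
    return round(100 * (1 - errors / words), 2)


def expected_errors(words, accuracy_pct):
    """Return the expected number of misrecognized words at a given accuracy."""
    return round(words * (100 - accuracy_pct) / 100, 2)


NOTE_LENGTH = 250  # words, the figure Dr. Gibson cites

# Accuracy implied by six to eight corrections per 250 words.
for errors in (6, 8):
    print(errors, "errors ->", accuracy_from_errors(errors, NOTE_LENGTH), "% accurate")

# Expected corrections per 250 words at the accuracy rates discussed above.
for pct in (95, 98, 99, 99.9):
    print(pct, "% accurate ->", expected_errors(NOTE_LENGTH, pct), "errors")
```

By this arithmetic, six to eight corrections per 250 words works out to roughly 97% accuracy, inside the 95% to 98% range cited earlier, while a 99% rate would leave about 2.5 errors in a note of the same length.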

A 99% accuracy rate is achievable, but 100% is not. People don't have a 100% accuracy rate in understanding each other's speech, even when they talk in person, said William Meisel, PhD, president of TMA Associates, a speech-recognition consulting firm in Tarzana, Calif.

"Speech is too ambiguous, and there're too many words that sound the same that mean different things," agreed Barry Hieb, MD, a research director at Gartner Inc., a market research firm in Stamford, Conn. "I think the question is not really anymore a question of the accuracy of the speech recognizer. The real question is: When is it good enough that it's more valuable for the physician to use it than not."

For Dr. Heiman, that point occurred in 2000 when he got fed up with paying $700 to $1,000 monthly to a transcription service, and dealing with long turnaround times.

Now, he said, "I see my [visit] note and fax it to my referring physicians the day I write it. When I first started using it, guys were calling me -- sometimes while the patient was still here in my office -- asking me how it was that I was getting the dictation done and out so quickly."

Besides eliminating his transcription cost and waiting time for reports, Dr. Heiman believes that the technology helps him win referrals and improve care because every physician involved in treating his patients can quickly know what he has done and what his treatment recommendations are. "There's so much upside to speech recognition, it's ridiculous," Dr. Heiman said.

Hurdle: Still only of limited value to doctors

Potential solution: Make it faster and easier to use; add intelligence and integrate it with billing and electronic medical records software systems.

Unless physicians dictate heavily, have to quickly provide copies of charts to other physicians, or have an EMR but can't efficiently input data into the system with a keyboard, there isn't a compelling incentive for them to use speech recognition, some say.

Based on its experience with the technology, Providence Health System has found that the speech-recognition user sees two fewer patients and works 15 to 30 minutes longer per day than the one doing conventional dictation.

"The thing is, you can make up for the transcription cost by being able to see a couple more patients per day. It's not a clear win," Dr. Gibson said. Also, whether speech recognition saves doctors money depends on the volume of their dictation, their transcription costs and their facility with speech technology, he said.
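The trade-off Dr. Gibson describes can be sketched as a simple break-even calculation. This is a rough illustration, not a model from the article: the function names are hypothetical, the number of clinic days per month is an assumed value, and the example pairs Providence's "two fewer patients" finding with the $700 to $1,000 monthly transcription bill that Dr. Heiman reported, figures that come from different practices.

```python
def monthly_net_savings(transcription_cost, fewer_patients_per_day,
                        revenue_per_visit, clinic_days_per_month):
    """Transcription savings minus revenue lost to seeing fewer patients."""
    lost_revenue = fewer_patients_per_day * revenue_per_visit * clinic_days_per_month
    return transcription_cost - lost_revenue


def break_even_revenue_per_visit(transcription_cost, fewer_patients_per_day,
                                 clinic_days_per_month):
    """Per-visit revenue above which the lost visits outweigh the savings."""
    return transcription_cost / (fewer_patients_per_day * clinic_days_per_month)


CLINIC_DAYS = 20  # ASSUMPTION: hypothetical clinic days per month, not from the article

# Transcription costs from the article: $700 to $1,000 per month.
for cost in (700, 1000):
    print(cost, "->", break_even_revenue_per_visit(cost, 2, CLINIC_DAYS))
```

Under these assumptions, the lost visits cancel out the transcription savings once a visit brings in more than about $17.50 to $25, which helps explain why the physician-as-editor model is "not a clear win" unless the doctor can keep up his or her usual patient volume.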

But a different technology is emerging that promises to make speech-recognition systems more valuable to physicians. With natural language processing technology, users will be able to verbally command their systems to do more than just convert speech into text. That capability, coupled with linking speech systems to billing and EMR systems, will enable physicians to automatically generate bills, referral letters and other forms using verbal commands, DeStefanis said.

"The real value will come when speech recognition is integrated with the EMR and gets more accurate," Dr. Gibson said. "If you could just start speaking without even wearing a headset, put your finger where you want [the system] to go so that you don't need a mouse and keyboard, and don't need to correct much, every doctor would use it. That's what they are waiting for.

"It will happen, but I'm not going to hazard a guess as to when. We've always said five years from now, and for 20 years we've been wrong."

ADDITIONAL INFORMATION

Speech to text

Speech-recognition systems break down the sound of every word into phonemes -- the basic elements of pronunciation -- and digitize them.
Next, the computer compares the sounds of your voice against phonemes that have been prerecorded in a database in an attempt to find the best match for the word using statistical and probability analysis.
The system constantly looks at the context in which the word is used by examining the words that come before and after it, a technique known as "trigram." By using trigrams along with context analysis, statistical analysis or both, the computer determines, for example, whether you said "to," "two" or "too."

Sources: IBM Pervasive Computing; Superior Consultant Co.
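The trigram idea described above can be illustrated with a toy example. This is a minimal sketch, not how any commercial product works: the tiny corpus, the function names and the simple counting rule are all hypothetical, and a real system would combine far larger statistical models with the acoustic match.

```python
from collections import Counter

# ASSUMPTION: a hypothetical toy corpus standing in for a real training set.
CORPUS = [
    "take two tablets daily",
    "refer her to cardiology",
    "she went to surgery",
    "pain is too severe",
    "dose was too high",
    "give two units now",
]

HOMOPHONES = ("to", "too", "two")


def build_trigram_counts(sentences):
    """Count every (previous word, word, next word) triple in the corpus."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i in range(1, len(words) - 1):
            counts[(words[i - 1], words[i], words[i + 1])] += 1
    return counts


def pick_homophone(prev_word, next_word, counts, candidates=HOMOPHONES):
    """Pick the candidate seen most often between the two neighbors.

    Returns None when the context was never seen for any candidate.
    """
    best = max(candidates, key=lambda w: counts[(prev_word, w, next_word)])
    return best if counts[(prev_word, best, next_word)] > 0 else None


counts = build_trigram_counts(CORPUS)
print(pick_homophone("take", "tablets", counts))
print(pick_homophone("is", "severe", counts))
print(pick_homophone("her", "cardiology", counts))
```

The sketch scores each sound-alike word only by how often it appeared between the same two neighbors, which is why it has no answer for contexts it has never seen.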

Maximum performance

To get the most out of speech-recognition software:

Get a computer that has a Pentium 3 or 4 processor running at 300 MHz or faster. It also should have at least 512 MB of memory but preferably 1 GB or more.
Buy a high-quality digital microphone that plugs into a USB port, as opposed to an analog microphone that plugs into the PC's audio input.
Invest in a good sound card.
Go through the enrollment process, which requires you to spend up to 15 minutes reading a few paragraphs so that the system can start recognizing your voice and how you pronounce words.
Be aware the system is continuously learning your voice every time you dictate.
Make corrections verbally to get better accuracy and performance out of the system.
Enunciate clearly and avoid stuttering and saying um, ah, etc.
Slightly hesitate or change your cadence when you speak short words in succession, especially those you know the system tends to, in the nomenclature of the industry, "misrecognize."
Avoid dictating in noisy environments.
Create templates or macros whenever you dictate something that is long and boilerplate in nature.

Sources: Jeffrey Clode, MD; Richard F. Gibson, MD; David Heiman, MD; IBM Pervasive Computing; Superior Consultant Co.; TMA Associates