Windows 7's speech-recognition tools

By Nate Anderson
The blog Ars Technica says Microsoft Windows 7 is "good enough" at speech-recognition technology.
STORY HIGHLIGHTS
  • Microsoft has shipped speech-recognition software for years
  • Microsoft, the maker of Windows, has not publicized this fact much
  • Ars Technica finds Windows 7 speech technology is "good enough"
  • But it's not quite as sophisticated as voice-specific applications

(Ars Technica) -- Microsoft has pumped out voice recognition software for years, but the company has a curious aversion to publicizing the fact. With Windows 7, Microsoft's speech recognition has become a decent productivity tool and one that the company should be proud to proclaim as an OS feature. For the casual speech recognition user, nothing beats free -- especially when one considers the $100+ price points for third-party software.

But is it powerful enough for serious users? One long-running criticism of Microsoft's bundled Windows software is that it strives only to be "good enough" without ever achieving excellence. Ars Technica's Editor-in-chief Ken Fisher and I put Win 7's built-in recognition engine to the test for a couple of months to find out how well it serves the needs of the hardcore word jockey. We'll spare you the suspense: serious users will want to look elsewhere, but this is a great way to show any colleague with a Win 7 machine that speech recognition is real, it's here, and it works.

Navigation

Microsoft rolled out a speech recognition engine in Office XP; after installing the suite, users who opted for the speech recognition engine could dictate into Word and other apps.

It wasn't until Windows Vista, though, that speech recognition was baked right into the operating system, and baked in competently. Back in 2007, the New York Times' David Pogue wrote, "I don't find it quite as accurate as my beloved Dragon NaturallySpeaking 9, which is freakishly, 'Star Trek'-ishly accurate. But it's awfully cool ... Speech Recognition is an unsung bright spot in Windows Vista."

With Win 7, Microsoft's speech recognition has come into its own. Starting the program is simple -- the "Speech Recognition" control panel applet allows you to set your microphone and toggle the recognition engine on. It couldn't be simpler, and there's nothing to install. In moments, you'll be dictating ... right into a tutorial.

An attractive but severe-looking young woman will guide you through the initial tutorial, which introduces all the basic commands and provides plenty of practice in using basic tools like the corrections features. As tutorials go, this one is excellent, and there's a big reveal partway through -- the tutorial isn't just teaching you, it's adapting to your voice as you work through each section.

When complete, it's time to control Windows using only the sheer power of your voice. Navigation and OS control are the best features of the built-in recognition engine, and they worked almost flawlessly. "Start Word" worked. Bam. Window open. "Switch to Explorer." Bam. I'm in Explorer. "Double-click Odd Donkey Facts." Bam. "Odd Donkey Facts" folder opens.

You can say just about any scrap of text visible on the screen, from menus to filenames to dialog box options, and the software correctly clicks, selects, or opens. Opening, switching, and controlling programs was simple, easy enough to figure out without even glancing through the printable speech recognition cheat sheet. And when you don't know what to say or there's nothing in particular to say -- like when trying to click some icon in Word's ribbon interface -- there's still no need to resort to the mouse.

Instead, a simple "Show Numbers" command will overlay the current window with a host of blue rectangles, each placed above a clickable object and each containing a number. Once the rectangles are displayed, say the number and the computer clicks for you.

You can even navigate things like Explorer this way, saying the names of folders and telling the system to "double-click World_Domination_Plan."

Better still, the floating voice recognition widget that runs by default when speech recognition is active will tell you how to do the same thing using an actual voice command. For instance, use the "Show Numbers" command to click the Back arrow in Internet Explorer and Windows helpfully informs you that saying "Back" achieves the same effect. It's a terrific system, and one that's been present in rival programs like Dragon NaturallySpeaking for a few versions now -- but it's perfected here.
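For readers curious about the mechanics, the "Show Numbers" idea can be pictured with a minimal Python sketch. This is an illustration only, not Microsoft's implementation: the Clickable class and the show_numbers and say_number functions are hypothetical names invented here, and no real Windows API is used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clickable:
    """A hypothetical on-screen control; not a real Windows API type."""
    label: str                           # text shown on screen, e.g. "Back"
    voice_command: Optional[str] = None  # direct spoken shortcut, if any


def show_numbers(controls):
    """Assign a 1-based number to every clickable control in the window."""
    return {number: control for number, control in enumerate(controls, start=1)}


def say_number(overlay, spoken):
    """Resolve a spoken number to a control, plus a shortcut hint if one exists."""
    if not spoken.isdigit():
        return None, None
    control = overlay.get(int(spoken))
    if control is None:
        return None, None
    hint = None
    if control.voice_command:
        hint = f'Next time, just say "{control.voice_command}"'
    return control, hint


# Two made-up controls: one with a spoken shortcut, one without.
controls = [Clickable("Back", voice_command="Back"), Clickable("Ribbon icon")]
overlay = show_numbers(controls)
clicked, hint = say_number(overlay, "1")
print(clicked.label, "|", hint)
```

The point of the design is that numbering gives every control a name you can say, even when it has no visible text, while the hint nudges you toward the faster direct command the next time around.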

There are limits to navigation and control, and you'll see them most in third-party apps like Chrome. Use the "Show Numbers" command in Chrome and the difference from IE is plain: the control widgets are still detected, but the actual page text is not. Unlike IE, Chrome Web pages can't be browsed by voice.

Recognition

The same simplicity attends corrections -- making them is simple and natural. Selecting words is a matter of saying them, and the correction box brings up suggested alternatives and lets users spell out their own if none of the suggestions fits.

Which is great, because the built-in correction tools get a pretty decent workout, and herein lies the main issue with Win 7's speech recognition: it's just not as good as alternatives from companies like Nuance, which makes Dragon NaturallySpeaking.

That's not to say it's bad, but several years of experience with voice recognition have convinced me that it needs to be superb before it will be widely used. Without confidence in the tool, users instinctively reach for the mouse and keyboard whenever they're about to do something that might easily be misinterpreted. Short notes, especially if they will require corrections, just don't seem worth donning the headset for.

With versions 9 and 10, Dragon NaturallySpeaking earned this confidence. You could teach the program custom phrases like "Ars Technica" and have reasonable certainty that they would not be mangled into "Mars technical" while dictating. You could speak at tremendous speed. Even short words, the hardest for speech recognition to parse (long and difficult words are so unique that they tend to be easy), were untangled correctly, and this confidence led to more use.

Win 7's system comes close. Soon after running through the tutorial, I dictated a passage from Barbara Kingsolver's Animal, Vegetable, Miracle to gauge accuracy. I made an effort to speak clearly and picked a passage without proper names or other words likely to trip up a non-customized recognition engine. Here's what I got, errors and all:

"Our culture is not unacquainted with the idea of food as a spiritually loaded commodity. We're just to killer about which spiritual arguments will accept as valid for declining certain foods. Generally unacceptable reasons: environmental destruction, energy waste, the poisoning of workers. Acceptable: it's prohibited by it all the text. Send out a platter of country ham in front of our rabbi, and you ma'am, and a Buddhist monk, and you may have just conjured three different visions of damnation. Guests with high blood pressure may add 1/4. Is it such a stretch, then, to make moral choices about food based on a global consequences of its production and transport? In a country where 5% of the world's population clogs down 1/4 all the fuel, also belching out that much of the world's waste and pollution, we've apparently made big choices about consumption. They could be up for review."

Not bad -- and probably about as many errors as I'd make just typing out the passage. But it's not stellar, either, something that was borne out by longer use. For someone who plans to use voice recognition with regularity, investment in a third-party program would be preferable -- especially since Win 7 includes very limited tools for customizing its dictionary and none at all for setting up things like voice macros.

Ken Fisher used the software for weeks, relying on it to do much of his work. Initially it seemed like a good tool, but it was never quite accurate or customizable enough, and it seemed to be a slow learner after corrections were made. In the end, he switched to NaturallySpeaking.

Good enough

Win 7 has plenty going for it as an OS, and a competent voice recognition engine only adds to its appeal. This is "good enough" recognition that one can actually use to dictate letters, reports, and e-mails, and the superb navigation and control tools make it simple to cut down on mouse work.

We were especially impressed with how natural the software's commands could be -- at no point did we have to consult reference cards, except to capitalize words. If you want to select the previous paragraph, just say "select previous paragraph" -- it works.

But the core of any speech recognition engine is the speech recognition, and Win 7's implementation looks like it needs one more release to really knock it out of the park. For the patient cheapskate or the casual user, Win 7's built-in tools will be fine; everyone who works with words more regularly would benefit from doling out some additional cash.

COPYRIGHT 2011 ARSTECHNICA.COM