Microsoft's speech recognition engine is as good as a person

The technology that powers Cortana has now achieved 'human parity'

Microsoft's speech recognition engine is now as good at recognising speech as a human, the company has claimed.

The tech giant has been working on its speech transcription systems, which are used in products like its digital assistant Cortana, with the goal of achieving 'human parity': the point at which an AI can interpret speech with the same error rate as a real person.

"Today, I'm excited to announce that our research team reached that 5.1% error rate with our speech recognition system, a new industry milestone, substantially surpassing the accuracy we achieved last year," wrote Microsoft technical fellow Xuedong Huang in a blog post.

"Reaching human parity with an accuracy on par with humans has been a research goal for the last 25 years. Microsoft's willingness to invest in long-term research is now paying dividends for our customers in products and services such as Cortana, Presentation Translator, and Microsoft Cognitive Services. It's deeply gratifying to our research teams to see our work used by millions of people each day."

The company had previously measured a human error rate of 5.9%, but subsequent investigations by rival researchers at IBM measured a lower human error rate of 5.1%. IBM's own system achieved a rate of 5.5%.

Both systems are benchmarked against the Switchboard corpus, a dataset of recorded telephone conversations that speech research technologists have been using for over two decades to measure the capability of transcription systems.
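
For readers unfamiliar with how such figures are produced: transcription benchmarks of this kind are conventionally scored by word error rate, the number of word substitutions, deletions and insertions needed to turn the system's output into a reference transcript, divided by the length of the reference. The sketch below illustrates that general metric only. It is not Microsoft's or IBM's scoring code, the function name and sample sentences are invented for illustration, and real evaluations also normalise the text before scoring, which is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example: one dropped word out of six reference words.
rate = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"{rate:.1%}")
```

On this measure, an error rate of 5.1% means roughly one word in twenty is transcribed incorrectly, dropped or wrongly added.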

Huang said that Microsoft reached this milestone, a 12% reduction in error rate compared with last year's system, by modifying its neural net-based acoustic and language models and by using the entire history of a conversation as context to predict which word is likely to come next.

The next focus for Microsoft's speech recognition research will be to improve the system's ability to recognise accented speech, dialects and conversations in noisy environments. The company will also work on improving its ability to understand the meaning and intent behind speech, saying: "Moving from recognizing to understanding speech is the next major frontier for speech technology."
