Microsoft's speech recognition engine is as good as a person

The technology that powers Cortana has now achieved 'human parity'

Speech recognition

Microsoft's speech recognition engine is now as good at recognising speech as a human, the company has claimed.

The tech giant has been working on its speech transcription systems, which are used in products like its digital assistant Cortana, with the goal of achieving 'human parity': the point at which an AI can interpret speech with the same error rate as a real person.

"Today, I'm excited to announce that our research team reached that 5.1% error rate with our speech recognition system, a new industry milestone, substantially surpassing the accuracy we achieved last year," wrote Microsoft technical fellow Xuedong Huang in a blog post.

"Reaching human parity with an accuracy on par with humans has been a research goal for the last 25 years. Microsoft's willingness to invest in long-term research is now paying dividends for our customers in products and services such as Cortana, Presentation Translator, and Microsoft Cognitive Services. It's deeply gratifying to our research teams to see our work used by millions of people each day."

The company had previously measured a human error rate of 5.9%, but subsequent investigations by rival researchers at IBM measured a lower human error rate of 5.1%. IBM's own system achieved a rate of 5.5%.

Both systems are benchmarked against the Switchboard corpus, a dataset of recorded telephone conversations that speech research technologists have been using for over two decades to measure the capability of transcription systems.
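
For readers wondering what an error rate like 5.1% actually counts, benchmarks of this kind are normally scored by word error rate: the number of word substitutions, insertions and deletions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below is only an illustration of that calculation. The function name and the sample sentences are made up for this example, and it is not the scoring tool used by Microsoft or IBM, which would also normalise the text before comparing it.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + insertions + deletions) / reference word count.

    Illustrative only: splits on whitespace and compares words exactly,
    with no text normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one wrong word out of six.
    rate = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"{rate:.1%}")
```

On this measure, an error rate of 5.1% means roughly one word in twenty is transcribed incorrectly, missed or wrongly added.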

Huang said that Microsoft reached this milestone, which represents a reduction in error rate of about 12% compared with last year's system, by modifying the neural net-based acoustic and language models it uses, and by using the entire history of a conversation as context to predict which word is likely to come next.

The next focus for Microsoft's speech recognition research will be to improve the system's ability to recognise accented speech, dialects and conversations in noisy environments. The company will also work on improving its ability to understand the meaning and intent behind speech, saying: "Moving from recognizing to understanding speech is the next major frontier for speech technology."
