Microsoft's speech recognition engine is as good as a person
The technology that powers Cortana has now achieved 'human parity'
Microsoft's speech recognition engine is now as good at recognising speech as a human, the company has claimed.
The tech giant has been working on its speech transcription systems, which are used in products like its digital assistant Cortana, with the goal of achieving 'human parity', the point at which an AI can interpret speech with the same error rate as a real person.
"Today, I'm excited to announce that our research team reached that 5.1% error rate with our speech recognition system, a new industry milestone, substantially surpassing the accuracy we achieved last year," wrote Microsoft technical fellow Xuedong Huang in a blog post.
"Reaching human parity with an accuracy on par with humans has been a research goal for the last 25 years. Microsoft's willingness to invest in long-term research is now paying dividends for our customers in products and services such as Cortana, Presentation Translator, and Microsoft Cognitive Services. It's deeply gratifying to our research teams to see our work used by millions of people each day."
The company had previously measured a human error rate of 5.9%, but subsequent investigations by rival researchers at IBM measured a lower human error rate of 5.1%, while IBM's own system achieved a rate of 5.5%.
Both systems are benchmarked against the Switchboard corpus, a dataset of recorded telephone conversations that speech researchers have been using for over two decades to measure the capability of transcription systems.
Huang said that Microsoft managed to reach this milestone, which represents a 12% reduction in error rate compared with the system's performance last year, by modifying the neural net-based language and acoustic models it uses, and by letting the system draw on the entire history of a conversation as context when predicting what the next word is likely to be.
The next focus for Microsoft's speech recognition research will be to improve the system's ability to recognise accented speech, dialects and conversations in noisy environments. The company will also work on improving its ability to understand the meaning and intent behind speech, saying: "Moving from recognizing to understanding speech is the next major frontier for speech technology."