Amazon Lex lets users build their own chatbots
Amazon announces image recognition, voice recognition, and text-to-speech
Amazon Web Services (AWS) is sharing the smarts behind its voice assistant Alexa so customers can bake the AI into their own applications.
First launched with the Echo speaker just last month, Alexa is another example of Amazon and AWS sharing each other's technology (last year's data transfer device, the Snowball, uses Kindles as e-ink shipping labels), and puts the technology the tech giant uses itself into the hands of customers.
Speaking at re:Invent 2016 today, AWS CEO Andy Jassy told delegates: "We have always tried to take whatever we can use ourselves and make it exposable for you."
The latest example of this is the forthcoming release of Amazon Lex, the automatic speech recognition (ASR) technology that underpins Alexa.
Normally, developers would need huge amounts of text and audio to train a piece of software before it became adept at understanding human intent, and would also have to build and adjust deep learning algorithms. AWS has already done all of this for them, and they can simply access it via an API.
"[Lex] will allow you to build all kinds of conversational applications," said Jassy. "You'll submit a piece of text or a piece of audio, you'll specify a response and it will return that response."
Those apps could do anything from booking a flight to checking the weather. Developers simply enter a few sample phrases into the AWS Management Console showing how the command or question might be posed, along with a few parameters that define how Lex responds, such as 'when do you want to fly?' and 'where do you want to go?'.
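To make the idea of sample phrases and follow-up questions concrete, here is a minimal Python sketch of how the two might fit together. It is a toy illustration under stated assumptions, not the Lex API: the `Intent` class, the slot names and the simple substring matching are all hypothetical, and the real service uses trained speech and language models rather than string comparison.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Hypothetical stand-in for a bot intent; not a real Lex type."""
    name: str
    sample_phrases: list
    # Maps a slot name to the question used to ask the user for it.
    slot_prompts: dict = field(default_factory=dict)


def match_intent(intents, utterance):
    """Return the first intent with a sample phrase contained in the utterance."""
    text = utterance.lower()
    for intent in intents:
        if any(phrase.lower() in text for phrase in intent.sample_phrases):
            return intent
    return None


def next_prompt(intent, filled_slots):
    """Return the question for the first unfilled slot, or None if all are filled."""
    for slot, prompt in intent.slot_prompts.items():
        if slot not in filled_slots:
            return prompt
    return None


# Assumed example data, based on the flight-booking scenario in the article.
book_flight = Intent(
    name="BookFlight",
    sample_phrases=["book a flight", "i want to fly"],
    slot_prompts={
        "destination": "Where do you want to go?",
        "date": "When do you want to fly?",
    },
)
```

Calling `match_intent([book_flight], "Please book a flight for me")` picks the flight intent, and repeated calls to `next_prompt` walk through the questions until every slot has a value, which is roughly the shape of a multi-step conversation.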
"You can build multi-step conversations [based on] ASR algorithms and models. We have a lot of data we're constantly going through that we'll continue to refine to make the models even better," said Jassy.
Lex already connects with Salesforce, Microsoft Dynamics and QuickBooks, to name a handful of enterprise products, as well as boasting integrations with Facebook Messenger, Slack and Twilio.
The tool also scales automatically with the number of users it has, and developers only pay for the number of calls made to the Lex API.
Customers can sign up for the Amazon Lex preview starting today, but it is not known when Lex will be generally available.
The second of three AI announcements, Polly lets AWS customers add "natural-sounding speech" to applications like newsreading services, or to create brand new speech-enabled apps.
Developers use the SDK or the AWS Management Console to send text to Polly via AWS Lambda. Polly then turns the text into MP3 audio and streams it back to Lambda, where it can be played or stored.
Developers have 47 voices and 24 languages to choose from and pay per character every time they convert text, but they can store Polly's responses offline or in an S3 bucket and reuse them as much as they please.
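Because billing is per character converted, one obvious pattern is to convert each phrase once and replay the stored audio afterwards. The Python sketch below shows that idea with a local file cache. It assumes a client object exposing boto3's `synthesize_speech` call (for example one created with `boto3.client("polly")`); the `speak` helper, the cache layout and the default voice are illustrative choices, not anything AWS prescribes.

```python
import hashlib
from pathlib import Path


def speak(polly_client, text, cache_dir, voice_id="Joanna"):
    """Return the path to an MP3 of `text`, calling Polly only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Key the cache on voice and text so the same phrase is only paid for once.
    key = hashlib.sha256(f"{voice_id}:{text}".encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.mp3"

    if not path.exists():
        # Assumption: polly_client behaves like a boto3 Polly client.
        response = polly_client.synthesize_speech(
            Text=text, OutputFormat="mp3", VoiceId=voice_id
        )
        path.write_bytes(response["AudioStream"].read())
    return path
```

The same approach would work with an S3 bucket in place of the local directory; the local version is used here only to keep the sketch self-contained.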
Potential use cases for Polly include adding speech capabilities to IoT devices, automating call centres, or building language learning tools.
Amazon Polly is available today in the US and EU, with further expansion over the coming months.
Picture credit: Amazon Web Services
Lastly, Rekognition is AWS's own version of Facebook and Google's image recognition software, and analyses photos to understand what's in them.
Jassy said: "It allows our customers to build advanced applications with capabilities like: I want to do a search for women driving a car. We do facial recognition so it allows you to tell what's happening with the face - what's the sentiment, is that person smiling or frowning, are there glasses that the person is wearing?"
Having trained it on "millions and millions of images", AWS adds a 'confidence score' next to the tool's labels, to show how likely it is that it's correct.
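In practice, an application would probably use that confidence score to decide which labels to trust. The Python sketch below filters a response by a threshold. It assumes the response is shaped like the output of Rekognition's label detection call, with a `Labels` list whose entries carry a `Name` and a `Confidence` between 0 and 100; the sample values and the 80 per cent cut-off are made up for illustration.

```python
def confident_labels(response, min_confidence=80.0):
    """Return label names at or above the threshold, most confident first."""
    labels = [
        label
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]
    labels.sort(key=lambda label: label["Confidence"], reverse=True)
    return [label["Name"] for label in labels]


# Made-up response for illustration; real scores come from the service.
sample_response = {
    "Labels": [
        {"Name": "Lake", "Confidence": 91.2},
        {"Name": "Forest", "Confidence": 97.5},
        {"Name": "Boat", "Confidence": 54.0},
    ]
}

print(confident_labels(sample_response))  # prints ['Forest', 'Lake']
```

Where to set the threshold is a judgement call: a photo search can tolerate a lower bar than something like unlocking a device.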
"Because we have so much data and we'll have so many customers using this service, to continually fine tune these models and because they live in the cloud you get to enjoy those improvements for free," said Jassy.
One use case AWS cited was face matching and facial recognition, with potential uses for the security industry, such as using this method to unlock personal devices.
Using the services together
While the three services can work separately, AWS made a point of illustrating two of them at work together.
Matt Wood, GM of product strategy, said on stage: "Although individual services like Polly and Lex can be used independently, putting them together allows you to build some pretty novel, sophisticated, category-defining applications."
Taking that book-a-flight example again, Wood this time suggested making a text enquiry through Lex, rather than a speech enquiry, using Slack or Facebook Messenger.
"You can say [you want to go] 'somewhere with forests and lakes'. Maybe a city with some great architecture," he said.
Instead of speaking to you about possible holiday destinations, Rekognition throws up any photos that match your search criteria, just like a Google image search does.
Wood added: "We can take an individual image from the billions that are created every single day, pass that image into Rekognition, and have Rekognition tell us what's inside the photo. We can say there's a forest and a lake.
"You can do this in real time with Rekognition or you can apply Rekognition to all the images, millions of them stored inside buckets inside S3, and they'll be processed in batch."
Amazon Rekognition is available in the US and EU now, with more regions due to launch soon.
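Wood's "forests and lakes" search can be sketched as a simple filter over labels that have already been detected for a batch of images. The Python example below is an illustration only: it assumes each image's labels are available as a list of `Name` and `Confidence` entries, and the `matching_images` helper, the image keys and the scores are all hypothetical.

```python
def matching_images(labels_by_image, wanted, min_confidence=80.0):
    """Return the keys of images whose confident labels include every wanted term."""
    wanted = {term.lower() for term in wanted}
    matches = []
    for image_key, labels in labels_by_image.items():
        found = {
            label["Name"].lower()
            for label in labels
            if label["Confidence"] >= min_confidence
        }
        # Keep the image only if all search terms were found.
        if wanted <= found:
            matches.append(image_key)
    return sorted(matches)


# Made-up batch results, as if produced by processing images stored in S3.
batch = {
    "photos/alps.jpg": [
        {"Name": "Forest", "Confidence": 96.0},
        {"Name": "Lake", "Confidence": 88.4},
    ],
    "photos/city.jpg": [
        {"Name": "Architecture", "Confidence": 93.1},
    ],
}

print(matching_images(batch, ["forest", "lake"]))  # prints ['photos/alps.jpg']
```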
Playing catch up in AI or leading the way?
AWS acknowledged it hasn't shouted about its AI services before now, but denied falling behind competitors like Microsoft Azure or Google DeepMind in the race to develop AI services.
Jassy said thousands of developers at AWS are working on AI, and the company told IT Pro it pays "trainers" to keep improving Rekognition, honing and refining it on a constant diet of images.
"If you actually think about it, AWS has a very deep heritage in this space," Jassy argued. "One of the earliest features on the internet that people started using was '[Amazon] customers who bought this item might also like these items', that was machine learning-driven."
He said what has changed is customers asking to use this technology themselves, which is why AWS is opening it all up to them.
Dan Scholnick, general partner at early stage investment fund Trinity Ventures, told IT Pro he thinks AWS is in fact taking a different tack from its rivals.
"AWS is changing the nature of the competition," he said. "Both Microsoft and Google offer low level AI platforms. They are very powerful but require a lot of expertise to use. With its announcements today, AWS is offering pre-built AI solutions like image recognition and speech recognition. You could build those things on Google's TensorFlow platform, but it's a lot of work. Amazon seems to be taking a different approach."
Picture credits: Joe Curtis (unless stated)