Amazon Echo is one of the best smart home gadgets around, and it's about to get a whole lot better. Thanks to a set of tweaks to Amazon's Alexa voice assistant, users will be able to interact with the AI in a more human-like way.
Amazon announced on Thursday that it is adding a series of new speaking skills to its virtual assistant, allowing it to sound more human when it speaks. According to the company's official developer blog, Alexa will be able to whisper, take a breath to pause for emphasis, adjust the rate, pitch and volume of its speech, and bleep out words you don't want your kids to hear.
The new features are exposed through Speech Synthesis Markup Language (SSML) tags, and are currently limited to the U.S., U.K., and Germany.
"Speech Synthesis Markup Language, or SSML, is a standardized markup language that allows developers to control pronunciation, intonation, timing, and emotion. SSML support on Alexa allows you to control how Alexa generates speech from your skill's text responses," Liz Myers, Sr. Business Development Manager, Alexa at Amazon, explained.
"You can add pauses, change pronunciation, spell out a word, add short audio snippets, and insert speechcons (special words and phrases) into your skill. These SSML features provide a more natural voice experience," she added.
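As a rough illustration of what Myers describes, the sketch below builds SSML strings for a pause, a spelled-out word, and a speechcon. The helper names (`pause`, `spell_out`, `speechcon`, `speak`) are hypothetical and not part of any Amazon SDK. The tags themselves (`<break>`, `<say-as interpret-as="spell-out">`, `<say-as interpret-as="interjection">`) follow Alexa's SSML conventions, and the speechcon word is only an example.

```python
from xml.sax.saxutils import escape


def pause(seconds: float) -> str:
    """Return a <break> tag that pauses for the given number of seconds."""
    milliseconds = int(round(seconds * 1000))
    return f'<break time="{milliseconds}ms"/>'


def spell_out(word: str) -> str:
    """Ask the voice to read a word letter by letter."""
    return f'<say-as interpret-as="spell-out">{escape(word)}</say-as>'


def speechcon(phrase: str) -> str:
    """Mark a phrase as an interjection (what Amazon calls a speechcon)."""
    return f'<say-as interpret-as="interjection">{escape(phrase)}</say-as>'


def speak(*parts: str) -> str:
    """Wrap already-built SSML fragments in the required <speak> root element."""
    return "<speak>" + " ".join(parts) + "</speak>"


if __name__ == "__main__":
    # Hypothetical skill response: a speechcon, a half-second pause, then a spelled-out word.
    print(speak(speechcon("bravo"), pause(0.5), "That is spelled", spell_out("SSML")))
```

A skill would return a string like this as its speech output instead of plain text.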
It is now up to Skill developers to take advantage of these new SSML tags and put them to use. This might take some time, so it's best not to hold your breath on this one. Still, it is exciting: when these features actually roll out to users, they will be an interesting addition to the virtual assistant space, where Alexa competes with the likes of Siri, Google Now, and now Bixby.
The five main SSML tags available for Skill developers to leverage are:
Whispers: Makes Alexa speak softly.
Expletive beeps: Bleeps out profanities and words you don't want your kids to hear.
Sub: Makes Alexa say a substitute word or phrase in place of the written text, such as the full form of an abbreviation.
Emphasis: Changes the rate and volume of Alexa's speech.
Prosody: Controls volume, pitch and rate of speech.
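
For developers curious what the five tags look like in practice, here is a minimal sketch that wraps text in each of them. The function names are hypothetical helpers, not an Amazon API. The tag and attribute names (`amazon:effect name="whispered"`, `say-as interpret-as="expletive"`, `sub alias`, `emphasis level`, `prosody rate/pitch/volume`) are the ones Alexa's SSML uses, and the sample sentence is made up.

```python
from xml.sax.saxutils import escape, quoteattr

# Emphasis levels accepted by the <emphasis> tag.
EMPHASIS_LEVELS = ("strong", "moderate", "reduced")


def whisper(text: str) -> str:
    """Whispers: make Alexa speak softly."""
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def bleep(word: str) -> str:
    """Expletive beeps: bleep out the wrapped word."""
    return f'<say-as interpret-as="expletive">{escape(word)}</say-as>'


def substitute(written: str, spoken: str) -> str:
    """Sub: say `spoken` in place of the `written` text."""
    return f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"


def emphasize(text: str, level: str = "strong") -> str:
    """Emphasis: change the rate and volume of the wrapped text."""
    if level not in EMPHASIS_LEVELS:
        raise ValueError(f"level must be one of {EMPHASIS_LEVELS}")
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def prosody(text: str, rate: str = "medium", pitch: str = "medium",
            volume: str = "medium") -> str:
    """Prosody: control the rate, pitch and volume of the wrapped text."""
    return (f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
            f"volume={quoteattr(volume)}>{escape(text)}</prosody>")


if __name__ == "__main__":
    # Hypothetical response combining several tags inside one <speak> element.
    response = "<speak>" + " ".join([
        whisper("I have a secret."),
        "The symbol", substitute("Al", "aluminum"), "is",
        emphasize("really", "strong"),
        prosody("easy to remember.", rate="slow", volume="loud"),
    ]) + "</speak>"
    print(response)
```

The tags can be mixed freely inside a single response, which is what lets a skill whisper one sentence and slow down the next.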