I attended a few sessions by Amazon about its Alexa devices during the last few months. Voice represents the next major disruption in computing. These devices provide a VUI (Voice User Interface). We had hands-on sessions and interactive Q&As. Let me cover some of the relevant URLs and an overview in this blog post for readers of "Express YourSelf !"


Here is a quick comparison of all major devices:

                  Echo Dot      Echo Dot      Echo          Echo Plus     Echo Spot     Echo Show
                                Kids Edition
Price (India)     Rs. 4000      N/A           Rs. 10,000    Rs. 15000     Rs. 13000     N/A
Price (US)        $50           $80           $100          $150          $130          $230
US price in Rs.   Rs. 3341      Rs. 5346      Rs. 6682      Rs. 10023     Rs. 8687      Rs. 15369
Microphones       7             7             7             7             4             8
Misc.                                                       Smart hub     2.5" screen   7" screen

Apart from the regular devices from Amazon, a few smart cars and smart TVs also have built-in Alexa support. Amazon also launched the "Alexa 7-Mic Far-Field Dev Kit", hardware that can be made part of any product. One can also add display support, as in Echo Spot and Echo Show; however, that requires going through a rigorous certification process from Amazon.

Comparison of Mobile App with Alexa Skill

Mobile App ~ Alexa Skill
Mobile App icon ~ Invocation Name

Many mobile apps also have an Alexa skill, e.g. Ola, Goibibo, Cricinfo, Zomato, etc.

How it works

The Alexa software has two major components:

1. ASK (Alexa Skills Kit), to build a new skill

2. AVS (Alexa Voice Service), to integrate Alexa with a device such as a Raspberry Pi (RPi).

The hardware is quite simple, with a microphone array and a speaker. The microphone array is used for noise cancellation. The spoken sentence is divided into:

1. Wake word
2. Launch
3. Invocation name. It should be two words.
4. Utterance
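
To make the four parts concrete, here is a toy sketch that splits an already transcribed sentence into them. This is only an illustration: the real devices work on audio, and the split happens in Amazon's cloud. The function name `split_sentence`, the list of launch words, and the invocation name "daily horoscope" are my assumptions for the example.

```python
# Toy illustration only: real Alexa performs this on audio in the cloud.
WAKE_WORDS = {"alexa", "computer", "echo", "amazon"}
LAUNCH_WORDS = {"ask", "open", "launch", "tell", "start"}  # assumed, illustrative subset


def split_sentence(sentence, invocation_names):
    """Split a transcribed sentence into wake word, launch word,
    invocation name and the remaining utterance."""
    words = sentence.lower().replace(",", "").split()
    if not words or words[0] not in WAKE_WORDS:
        return None  # the device stays asleep without a wake word
    wake, rest = words[0], words[1:]
    launch = rest[0] if rest and rest[0] in LAUNCH_WORDS else None
    if launch:
        rest = rest[1:]
    text = " ".join(rest)
    for name in invocation_names:
        if text == name or text.startswith(name + " "):
            return {
                "wake_word": wake,
                "launch": launch,
                "invocation_name": name,
                "utterance": text[len(name):].strip(),
            }
    # No known skill was named in the sentence.
    return {"wake_word": wake, "launch": launch,
            "invocation_name": None, "utterance": text}


parts = split_sentence(
    "Alexa, ask daily horoscope what is the horoscope for Leo",
    ["daily horoscope"],  # hypothetical two-word invocation name
)
print(parts)
```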


step 1. The wake word can be Alexa / Computer / Echo / Amazon

This wakes up the device and triggers beamforming to listen.

step 2. The utterance (captured audio) goes to the cloud

step 3. In the cloud, the real magic happens with

3.1 speech processing
3.2 NLP

step 4. The invocation name is detected. With the invocation name, the execution flow goes to the specific skill. The skill has all the logic and algorithms to further understand the utterance, to access cloud services, databases, etc., and finally to produce the response.

Here the front-end is developed and tested with simulator using

step 5. As per the trained model, Alexa translates the utterance to an intent. The developer needs to create custom intents, each mapped to a function implementation that provides the response. Alexa also provides standard built-in intents that the developer can implement.
There is a set of built-in intent libraries for various use cases.

In Alexa terms, a "slot" is like an argument to a function. Alexa also has built-in slot types.

There is a many-to-one mapping between utterances and an intent. There is a one-to-one mapping between an intent and a function.
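
The two mappings can be sketched as two plain dictionaries. This is a minimal sketch, not the ASK SDK: the intent name `GetWeatherIntent`, the sample utterances, and the handler functions are hypothetical, while `AMAZON.HelpIntent` is one of the standard built-in intents. The `event` dictionary follows the shape of the JSON request that Alexa sends to the back-end.

```python
# Many utterances -> one intent. In a real skill this mapping lives in the
# interaction model on Amazon's side, not in the back-end code.
SAMPLE_UTTERANCES = {
    "what is the weather": "GetWeatherIntent",
    "how is the weather today": "GetWeatherIntent",
    "tell me the weather": "GetWeatherIntent",
    "help": "AMAZON.HelpIntent",
}


def handle_get_weather(request):
    return "It is sunny today."  # placeholder response


def handle_help(request):
    return "You can ask me for the weather."


# One intent -> one function.
INTENT_HANDLERS = {
    "GetWeatherIntent": handle_get_weather,
    "AMAZON.HelpIntent": handle_help,
}


def dispatch(event):
    """Route an incoming Alexa request to the function for its intent."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        return "Welcome."
    if request["type"] == "IntentRequest":
        handler = INTENT_HANDLERS.get(request["intent"]["name"])
        if handler is not None:
            return handler(request)
    return "Sorry, I did not understand that."


event = {"request": {"type": "IntentRequest",
                     "intent": {"name": "GetWeatherIntent"}}}
print(dispatch(event))
```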

There is a many-to-one mapping between utterances and a custom slot value. There is a one-to-one mapping between a custom slot value and the argument value passed to the function.
So one can say "A.C" or "Air Conditioner" and it still maps to the same enumerated value as the argument to the function. Such synonyms are detected using "Entity Resolution".
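
Here is a small sketch of how the back-end could read the resolved value from the slot in the request JSON. The field names (`resolutions`, `resolutionsPerAuthority`, `ER_SUCCESS_MATCH`) follow the Alexa request format as I understand it; the slot name `appliance`, the authority string, and the id `AC` are made up for the example.

```python
def resolved_slot_value(slot):
    """Return the canonical (resolved) value of a slot, falling back to
    the raw spoken value when entity resolution found no match."""
    authorities = slot.get("resolutions", {}).get("resolutionsPerAuthority", [])
    for authority in authorities:
        if authority.get("status", {}).get("code") == "ER_SUCCESS_MATCH":
            return authority["values"][0]["value"]["name"]
    return slot.get("value")


# Hypothetical slot as it might arrive when the user says "A.C".
slot = {
    "name": "appliance",
    "value": "a.c.",
    "resolutions": {
        "resolutionsPerAuthority": [{
            "authority": "example-authority",  # placeholder
            "status": {"code": "ER_SUCCESS_MATCH"},
            "values": [{"value": {"name": "Air Conditioner", "id": "AC"}}],
        }]
    },
}
print(resolved_slot_value(slot))
```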

The back-end function can be implemented at any HTTPS-terminated endpoint or as an AWS Lambda function. The AWS Lambda service is, at present, available only in these regions:
1. US East (N. Virginia)
2. EU (Ireland)

A professional skill can use session attributes for a better user experience and also for data analytics.
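
As a rough sketch, session attributes arrive in the request under `session.attributes` and are sent back in the response under `sessionAttributes`. The attribute name `turn_count` and the function name `handle_turn` are my own choices for the example.

```python
def handle_turn(event):
    """Count the turns of a conversation using session attributes."""
    # Attributes sent back in the previous response come in with the request.
    attributes = dict(event.get("session", {}).get("attributes") or {})
    attributes["turn_count"] = attributes.get("turn_count", 0) + 1
    return {
        "version": "1.0",
        "sessionAttributes": attributes,  # carried over to the next request
        "response": {
            "outputSpeech": {
                "type": "PlainText",
                "text": f"This is turn {attributes['turn_count']}.",
            },
            "shouldEndSession": False,  # keep the session open
        },
    }


first = handle_turn({"session": {"new": True}})
second = handle_turn({"session": {"attributes": first["sessionAttributes"]}})
print(second["response"]["outputSpeech"]["text"])
```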

step 6. The response can be
6.1 Speech: SSML, local lingo, TTS, audio stream, small MP3 files
6.2 Cards: title, subtitle (skill name), text (content), image.
Cards are optional. We can use rich text with different fonts, including Unicode, on a card. It is built using various BodyTemplate and ListTemplate templates.
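
A minimal sketch of a response carrying both speech and an optional card is shown below. It uses the simplest card type (`Simple`, with a title and content) rather than the display templates; the helper name `build_response` and the sample text are assumptions for illustration.

```python
import json


def build_response(ssml_text, card_title=None, card_text=None, end_session=True):
    """Build an Alexa response with SSML speech and an optional simple card."""
    response = {
        "outputSpeech": {"type": "SSML", "ssml": f"<speak>{ssml_text}</speak>"},
        "shouldEndSession": end_session,
    }
    if card_title is not None:  # cards are optional
        response["card"] = {
            "type": "Simple",
            "title": card_title,
            "content": card_text or "",
        }
    return {"version": "1.0", "response": response}


reply = build_response("It is sunny today.", "Weather", "Sunny, 30 degrees")
print(json.dumps(reply, indent=2))
```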

The speech output goes to the speaker. The card output goes to
1. Alexa Companion App
2. Echo Spot and
3. Echo Show

One can check the device capability before including a card/video in the response.
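
One way to do this check, as far as I know, is to look at the `supportedInterfaces` object that comes with every request under `context.System.device`. The helper name `supports` and the sample events are mine; `Display` and `VideoApp` are the interface names I am assuming here.

```python
def supports(event, interface):
    """Return True if the requesting device reports the given interface."""
    device = event.get("context", {}).get("System", {}).get("device", {})
    return interface in device.get("supportedInterfaces", {})


# Hypothetical requests: one from a device with a screen, one without.
show_event = {"context": {"System": {"device": {
    "supportedInterfaces": {"AudioPlayer": {}, "Display": {}, "VideoApp": {}}}}}}
dot_event = {"context": {"System": {"device": {
    "supportedInterfaces": {"AudioPlayer": {}}}}}}

print(supports(show_event, "Display"), supports(dot_event, "Display"))
```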

An Alexa skill can be built using one of these pre-built models:

1. Custom: For unique need
2. Flash briefing : For RSS feed
3. Smart Home : For home automation
4. Video : For video application

Questions - Answers

Let me highlight a few learnings about the Alexa Echo ecosystem and the devices

* The Alexa companion app can be connected to only one device. So it is not possible to push the same image/content/card to all companion apps running on mobiles using a single Alexa device.
* Amazon allows the same invocation name to be used for multiple skills developed by the same or different people. All such skills can be configured for a given device. However, the skill that was configured last will be invoked for the duplicate invocation name.
* It is possible to enable/disable a specific skill on the device using the mobile app.
* It may be possible to develop a smart home device using a Raspberry Pi for a single user, with a skill that is not published. One can use the Smart Home pre-built model. Let the intent invoke code running on the Raspberry Pi that turns home appliances on/off using a GPIO pin and a relay.
* None of the Alexa devices has a built-in battery.
* "Alexa for Business" can have features like allowing access to only a very specific, limited set of skills.
* Alexa does not have any adult content, so parental control is not needed.
* One can change the wake word and replace "Alexa". However, it will still be a female voice only. The Alexa devices do not support responses in a male voice.
* An Alexa device cannot be used for dictation or speech-to-text conversion. One can use the AWS Transcribe service for that.
* One can develop (1) one-shot dialogue and (2) multi-turn dialogue skills.
* To design multi-turn dialogue skills, one can use (1) graph UI or (2) frame UI.
* Alexa can prompt for a missing slot.
* Amazon is coming up with Notifications, which will be triggered by a skill to the Alexa device. However, until the end user asks to get notifications, the Alexa device will not start talking by itself to inform about a notification.
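
Regarding the point about prompting for a missing slot in a multi-turn dialogue: one way a skill can do it from the back-end is to return a `Dialog.ElicitSlot` directive and keep the session open. The sketch below assumes the skill has a dialog model defined; the intent handler `handle_book_cab` and the slot name `destination` are hypothetical.

```python
def handle_book_cab(event):
    """Answer directly if the slot is filled; otherwise ask for it."""
    slots = event["request"]["intent"].get("slots", {})
    destination = slots.get("destination", {}).get("value")
    if destination:
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "PlainText",
                                 "text": f"Booking a cab to {destination}."},
                "shouldEndSession": True,
            },
        }
    # Slot is missing: prompt the user and wait for the next turn.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText",
                             "text": "Where would you like to go?"},
            "directives": [{"type": "Dialog.ElicitSlot",
                            "slotToElicit": "destination"}],
            "shouldEndSession": False,
        },
    }


missing = handle_book_cab({"request": {"type": "IntentRequest", "intent": {
    "name": "BookCabIntent", "slots": {"destination": {"name": "destination"}}}}})
filled = handle_book_cab({"request": {"type": "IntentRequest", "intent": {
    "name": "BookCabIntent",
    "slots": {"destination": {"name": "destination", "value": "airport"}}}}})
print(missing["response"]["outputSpeech"]["text"])
print(filled["response"]["outputSpeech"]["text"])
```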


Now, let's have a look at the important URLs:

* Design of Voice Experience
* Join the Amazon developer community & check in for the event in India. It has details about all meetups, hackathons, webinars, Slack channel, etc.
* Online learning resources and getting started with skill development
* Alexa public sample code repository
* Getting Started in India

An Alexa response can be further enhanced in the skill using

1. SSML. SSML is Speech Synthesis Markup Language. More details:

Alexa specific SSML :

2. Speechcon
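
As a small sketch of both items: a speechcon is written in SSML with the `say-as` tag and `interpret-as="interjection"`. The helper names below are mine, and the word "bravo" is only an example; the list of supported speechcons depends on the locale, so please check it for your language.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def speechcon(word):
    """Wrap a word so that Alexa speaks it as an expressive interjection."""
    return f'<say-as interpret-as="interjection">{escape(word)}</say-as>'


def build_ssml(*parts):
    """Join already escaped SSML fragments inside a <speak> root element."""
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = build_ssml(
    speechcon("bravo"),                # example word, locale dependent
    '<break time="500ms"/>',           # short pause
    escape("You scored 5 & won."),     # plain text must be XML-escaped
)
root = ET.fromstring(ssml)  # raises ParseError if the SSML is not well-formed
print(ssml)
```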


Sohan Maheshwar

