Dmitry Lozitskiy
3 min read · Aug 30, 2018

Building “Language Teacher” — Alexa skill powered by AWS Polly and AWS Translate

Building Alexa skills is fun. It gets even more interesting once you realize how many AWS services can easily be integrated with a Lambda function. My recent experiment was to build something that uses the best of both the Alexa and AWS worlds, so I came up with the idea of using AWS Polly and AWS Translate to give the familiar Alexa voice user interface (VUI) a new spin.

Test drive

“Language Teacher” has recently passed certification and has been published by Amazon. Currently the skill can talk back to you in 7 languages (Russian, Italian, French, German, Portuguese, Japanese and Spanish) and output the transcription of the speech to your Alexa app.

Give it a test drive to see how it works: https://www.amazon.com/dp/B07GYS433J/

Skill Architecture

Here is what the architecture of the skill looks like; we will go through the steps in a moment:

“Language Teacher” Alexa skill architecture

Step 1: The user activates the Echo device by using the Alexa skill invocation name “language teacher” with any of the available intent utterances.

Step 2: The Echo device sends the request as audio to Alexa Voice Service (AVS), which uses its ASR and NLU capabilities to recognize the skill and intent names.

Step 3: AVS invokes the skill endpoint, which in our case is defined as an AWS Lambda function ARN. In the request to Lambda, AVS passes the identified intent name along with all the slot values derived in Step 2.

Step 4: The Lambda function takes the device ID from the request and checks a DynamoDB table to see whether the user has previously set a preferred translation language. If no language has been set, it uses a default language.

Step 5: Lambda passes the slot value and a language code to AWS Translate as parameters. AWS Translate returns the translated text in the response.

Step 6: Lambda passes the AWS Translate response to AWS Polly along with the Voice ID that should be used to produce speech from the text. AWS Polly returns an audio stream to the Lambda function in its response.

Step 7: Lambda writes the stream as an mp3 file to an S3 bucket, using the hashed device ID as the prefix.

Step 8: Lambda prepares the response, using an SSML audio tag to reference the audio file location, and sends it back to AVS.

Step 9: AVS sends the response to the Echo device.

Steps 10 and 11: The Echo device plays the response back to the user, with the audio file being streamed as part of the response.
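Steps 3 and 4 can be sketched roughly as follows. This is only an illustration, not the code from the repo: the slot name `phrase`, the stored attribute name `language` and the default `'es'` are all assumptions made for the example.

```javascript
// Hypothetical default; the skill's real default language is not stated in this post.
const DEFAULT_LANGUAGE = 'es';

// Pull out the pieces of the Alexa request that the later steps need.
// `slotName` is whatever the interaction model calls the slot (assumed here).
function readRequest(requestEnvelope, slotName) {
  const intent = requestEnvelope.request.intent || {};
  const slot = (intent.slots || {})[slotName];
  return {
    deviceId: requestEnvelope.context.System.device.deviceId,
    intentName: intent.name,
    phrase: slot ? slot.value : undefined,
  };
}

// Fall back to the default when nothing has been stored for this device yet.
// `storedAttributes` stands for whatever was loaded from DynamoDB, for example
// via the ASK SDK's attributesManager.getPersistentAttributes().
function resolveLanguage(storedAttributes, defaultLanguage = DEFAULT_LANGUAGE) {
  return (storedAttributes && storedAttributes.language) || defaultLanguage;
}
```

Keeping the fallback in one small function means every intent handler resolves the language the same way.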

Skill Implementation

The implementation is quite straightforward. For the full version and the interaction model, check out the source code here: https://github.com/Dlozitskiy/language-teacher/

Because our Lambda function deals with asynchronous calls, the intent code is structured as a chain of promises:
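As a rough illustration of that shape (not the code from the repo), the sketch below chains four stages, each returning a promise. The `steps` object and its `translate`, `synthesize`, `store`, `respond` and `fail` functions are hypothetical placeholders for the stages that follow.

```javascript
// Hypothetical orchestration: every entry in `steps` is a function returning a promise.
function runIntent(steps, phrase, languageCode) {
  return steps
    .translate(phrase, languageCode) // resolves to the translated text
    .then((translated) =>
      steps
        .synthesize(translated, languageCode) // resolves to the audio data
        .then((audio) => steps.store(audio)) // resolves to the URL of the stored mp3
        .then((audioUrl) => steps.respond(audioUrl, translated)))
    .catch((err) => steps.fail(err)); // one place to handle a failure in any stage
}
```

The inner chain is nested so that `translated` stays in scope when the response is built, since the translated text is needed again for the card in the Alexa app.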

Translate:
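A minimal sketch of this step, assuming `translateClient` is an `AWS.Translate` instance from the AWS SDK for JavaScript v2 and that the user speaks English to Alexa. The function name `translatePhrase` is made up for the example; see the repo for the actual code.

```javascript
// Assumption: `translateClient` is created elsewhere, e.g. `new AWS.Translate()` (SDK v2).
function translatePhrase(translateClient, phrase, targetLanguageCode) {
  const params = {
    SourceLanguageCode: 'en', // assumption: the spoken phrase is in English
    TargetLanguageCode: targetLanguageCode,
    Text: phrase,
  };
  // .promise() turns the SDK request into a promise so it can be chained
  return translateClient
    .translateText(params)
    .promise()
    .then((data) => data.TranslatedText);
}
```

Passing the client in as an argument keeps the function easy to exercise with a stub instead of a real AWS call.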

Synthesize:
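A sketch of the synthesis step, assuming `pollyClient` is an `AWS.Polly` instance from the AWS SDK for JavaScript v2. The voice mapping below uses real Polly voice IDs, but which voices the skill actually uses is a guess, and `synthesizePhrase` is a hypothetical name.

```javascript
// Illustrative language-to-voice mapping for the seven supported languages.
// The voice IDs exist in Polly; the choice of voice per language is an assumption.
const VOICES = {
  ru: 'Tatyana',
  it: 'Carla',
  fr: 'Celine',
  de: 'Marlene',
  pt: 'Ines',
  ja: 'Mizuki',
  es: 'Conchita',
};

// Assumption: `pollyClient` is created elsewhere, e.g. `new AWS.Polly()` (SDK v2).
function synthesizePhrase(pollyClient, text, languageCode) {
  const voiceId = VOICES[languageCode];
  if (!voiceId) {
    return Promise.reject(new Error(`No voice configured for language "${languageCode}"`));
  }
  const params = { OutputFormat: 'mp3', Text: text, VoiceId: voiceId };
  // In Node.js the response's AudioStream holds the synthesized audio data
  return pollyClient
    .synthesizeSpeech(params)
    .promise()
    .then((data) => data.AudioStream);
}
```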

Write to S3:
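A sketch of the upload, assuming `s3Client` is an `AWS.S3` instance from the AWS SDK for JavaScript v2. The function name, the key layout (`prefix/fileName`) and the URL form are assumptions; the right URL can depend on the bucket's region.

```javascript
// Assumption: `s3Client` is created elsewhere, e.g. `new AWS.S3()` (SDK v2).
// `prefix` is the hashed device ID; `fileName` is a hypothetical object name.
function writeAudio(s3Client, bucket, prefix, fileName, audioStream) {
  const key = `${prefix}/${fileName}`;
  const params = {
    Bucket: bucket,
    Key: key,
    Body: audioStream,
    ContentType: 'audio/mpeg',
  };
  return s3Client
    .putObject(params)
    .promise()
    // Resolve to an HTTPS URL that the SSML audio tag can reference
    .then(() => `https://${bucket}.s3.amazonaws.com/${key}`);
}
```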

Building the response:
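A sketch of the response step, assuming `handlerInput` comes from the ASK SDK v2 for Node.js. The helper names and the card title are made up for the example. The URL is XML-escaped because it ends up inside an SSML attribute.

```javascript
// Escape characters that are special in XML so the URL is safe inside an attribute.
function escapeXml(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build the SSML audio tag that points at the mp3 stored in S3.
function buildAudioSsml(audioUrl) {
  return `<audio src="${escapeXml(audioUrl)}"/>`;
}

// Assumption: `handlerInput.responseBuilder` is the ASK SDK v2 ResponseBuilder.
// The card carries the text of the translation to the Alexa app.
function buildResponse(handlerInput, audioUrl, translatedText) {
  return handlerInput.responseBuilder
    .speak(buildAudioSsml(audioUrl))
    .withSimpleCard('Language Teacher', translatedText)
    .getResponse();
}
```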

To hash the Echo device ID for use as an S3 object prefix, you can use the MD5 hash function:

const crypto = require('crypto');
const devId = handlerInput.requestEnvelope.context.System.device.deviceId;
// MD5 only derives a stable, URL-safe prefix here; it is not used for security
const prefix = crypto.createHash('md5').update(devId).digest('hex');

Notes

A couple of things to note before you start building:

  1. Even though the skillBuilder had “withAutoCreateTable” set to “true”, the table failed to be created, so I had to pre-create an empty table manually with the primary key “id”.
  2. Make sure your S3 bucket has a public-read policy so that the Echo device can play the mp3 file produced by AWS Polly.
  3. To test your skill you can use tools such as https://echosim.io/ or https://reverb.ai/
  4. For icons you can use the Alexa Skill icon builder https://developer.amazon.com/docs/tools/icon-builder.html, which I didn’t end up using as I couldn’t find a globe icon 😊
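For note 2, one common way to allow public reads is a bucket policy that grants `s3:GetObject` to everyone. The sketch below builds such a policy as a plain object. The bucket name is a placeholder, and this is not necessarily the exact policy used for the skill.

```javascript
// Build a bucket policy that lets anyone read objects in the given bucket.
function buildPublicReadPolicy(bucketName) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Sid: 'PublicReadGetObject',
        Effect: 'Allow',
        Principal: '*',
        Action: 's3:GetObject',
        Resource: `arn:aws:s3:::${bucketName}/*`,
      },
    ],
  };
}

// Policies are attached to a bucket as a JSON string.
// 'my-skill-audio-bucket' is a placeholder name.
const policyJson = JSON.stringify(buildPublicReadPolicy('my-skill-audio-bucket'), null, 2);
console.log(policyJson);
```

The resulting JSON can be pasted into the bucket's permissions in the S3 console. Since it exposes every object in the bucket, it makes sense to keep that bucket for the generated audio only.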

That’s pretty much it. You can find the interaction model and the Lambda function in the repo and try building it yourself!

Please give the post a thumbs up if you found it interesting and would like to see more posts like it.
