Introduction
This article is a continuation of Alexa Custom Skill 1 with Visual Studio + C#: Greeting on Twitter. The content of the previous article serves as the foundation here.
For writing Lambda functions, I chose C#, which is (at least in my opinion) an uncommon choice for this purpose, and used it to create a custom skill.
Overview
In the previous implementation, everything was completed in a single exchange: the user said what to tweet, and Alexa posted it immediately.
This time, I improved it a bit so that the conversation continues: Alexa first asks for confirmation, and the process branches depending on the user’s Yes or No response.
Implementation
Compared with the previous version, almost the only change is in the Lambda function (the conversation model was also modified slightly). To keep a conversation going with Alexa, you need to use sessions, so that is the part I implemented. For details on session management, I referred to Alexa Skill Development Training Series Chapter 3: Designing Voice User Interfaces, and I created the C# code based on the Node.js sample found here.
The Lambda project is available here.
Conversation Model
I only changed the intent schema slightly: AMAZON.NoIntent and AMAZON.YesIntent were added so that Yes and No responses can be accepted.
Custom slot types and sample utterances are the same as in the previous article.
{
  "intents": [
    {
      "slots": [
        {
          "name": "Word",
          "type": "GREETING_WORD"
        }
      ],
      "intent": "TwitterIntent"
    },
    {
      "intent": "AMAZON.NoIntent"
    },
    {
      "intent": "AMAZON.YesIntent"
    },
    {
      "intent": "AMAZON.HelpIntent"
    },
    {
      "intent": "AMAZON.StopIntent"
    }
  ]
}
Lambda
As before, I coded in an environment with AWS Toolkit for Visual Studio installed.
I added the following two libraries via NuGet:
- Alexa.Net
- CoreTweet
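One thing not shown here (it was presumably handled by the project template in the previous article, so treat this as my assumption rather than something stated in the article): a C# Lambda needs a JSON serializer registered so that the incoming request can be deserialized into SkillRequest. With the AWS-provided packages this is typically done with an assembly attribute like the following.

using Amazon.Lambda.Core;

// Registers the JSON serializer used to deserialize SkillRequest / serialize SkillResponse.
// (Assumption: the project uses the Amazon.Lambda.Serialization.Json package,
// as the AWS Toolkit templates did at the time.)
[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]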
Initialization
As before, Twitter access information is obtained from environment variables. This time, I also implemented state management, with a different function called for each state.
Although you could branch with if statements, I cached delegates in anticipation of the number of states increasing in the future.
// Twitter access information (read from environment variables)
private readonly string APIKey;
private readonly string APISecret;
private readonly string AccessToken;
private readonly string AccessTokenSecret;
// Handler delegate cached per conversation state
private readonly Dictionary<string, Func<IntentRequest, Session, SkillResponse>> stateHandlers;

public Function()
{
    APIKey = Environment.GetEnvironmentVariable("API_KEY");
    APISecret = Environment.GetEnvironmentVariable("API_KEY_SECRET");
    AccessToken = Environment.GetEnvironmentVariable("ACCESS_TOKEN");
    AccessTokenSecret = Environment.GetEnvironmentVariable("ACCESS_TOKEN_SECRET");

    // Register one handler per conversation state
    stateHandlers = new Dictionary<string, Func<IntentRequest, Session, SkillResponse>>
    {
        { EConversationState.StartState.ToString(), FunctionHandler_StartState },
        { EConversationState.ConfirmState.ToString(), FunctionHandler_ConfirmState }
    };
}
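The EConversationState enum itself does not appear in the article; based on the dictionary keys above, a minimal sketch would look like this (my reconstruction, not the author's code):

// Sketch: the enum is not shown in the article; these two values are implied
// by the stateHandlers dictionary and the "STATE" session attribute.
public enum EConversationState
{
    StartState,   // waiting for the user to say "Tweet △△△"
    ConfirmState  // waiting for a Yes/No confirmation
}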
This mechanism changes which function is called for each state, and the state itself is stored in the session attributes (see input.Session.Attributes["STATE"] in the code below). In this function, the state is read from the session attributes and the cached delegate for that state is retrieved to handle the request.
public SkillResponse FunctionHandler(SkillRequest input, ILambdaContext context)
{
    // Only intent requests are handled
    var requestType = input.GetRequestType();
    if (requestType != typeof(IntentRequest)) return null;
    var intentRequest = input.Request as IntentRequest;

    // Read the current state from the session attributes (default: StartState)
    var state = input.Session.Attributes.ContainsKey("STATE")
        ? input.Session.Attributes["STATE"] as string
        : EConversationState.StartState.ToString();

    // Dispatch to the handler cached for that state
    var handler = stateHandlers[state];
    return handler(intentRequest, input.Session);
}
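One thing to be careful about (my note, not from the article): on the very first request of a session Alexa may not send any attributes at all, in which case Session.Attributes can be null and the ContainsKey call above would throw. A slightly more defensive version of the state lookup could look like this:

// Defensive variant (sketch): guard against a null Attributes dictionary
// before looking up the "STATE" key.
var attributes = input.Session.Attributes;
var state = attributes != null && attributes.TryGetValue("STATE", out var stored)
    ? stored as string
    : EConversationState.StartState.ToString();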
FunctionHandler_StartState
This handles the StartState. The user is expected to say “Tweet △△△”, so I first get △△△ from the Word slot. The obtained phrase needs to be remembered, so it is stored in the session attributes under the key “Word”. Alexa then responds, but with Ask instead of Tell: Ask keeps the session open, and Alexa immediately waits for the user’s next utterance.
private SkillResponse FunctionHandler_StartState(IntentRequest intentRequest, Session session)
{
    // Ignore intents other than TwitterIntent
    if (intentRequest.Intent.Name.Equals("TwitterIntent") == false) return ResponseBuilder.Tell("Unexpected request. Cancelling.");

    // Get the value of the Word slot
    var wordSlotValue = intentRequest.Intent.Slots["Word"].Value;

    // Response from Alexa
    Reprompt rep = new Reprompt();
    rep.OutputSpeech = new PlainTextOutputSpeech() { Text = "Is it okay to tweet this?" };

    session.Attributes = new Dictionary<string, object>();
    // Remember the phrase to tweet
    session.Attributes["Word"] = wordSlotValue;
    // Change state to ConfirmState
    session.Attributes["STATE"] = EConversationState.ConfirmState.ToString();

    return ResponseBuilder.Ask($"Is it okay to tweet '{wordSlotValue}'?", rep, session);
}
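As far as I understand Alexa.NET, Ask builds a response whose shouldEndSession flag is false and sends the session attributes back with it, which is why the “STATE” and “Word” values set above come back on the next request; the Reprompt text is what Alexa says if the user stays silent.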
FunctionHandler_ConfirmState
This function handles the ConfirmState. It expects the user to answer either “Yes” or “No”, and changes the behavior depending on the response.
Note: The built-in intents for “Yes” and “No” are used here. As expected, built-in intents have better recognition accuracy than custom ones.
- If “Yes”: retrieve the remembered phrase from the session attributes and post it to Twitter.
- If “No”: cancel the post.
Regardless of the answer, the session should end after this, so Alexa responds with Tell. To use the function again, start from the beginning.
private SkillResponse FunctionHandler_ConfirmState(IntentRequest intentRequest, Session session)
{
    // If NO is returned, cancel the process
    if (intentRequest.Intent.Name.Equals("AMAZON.NoIntent"))
    {
        return ResponseBuilder.Tell("Okay, I won't post it.");
    }
    // If it is not YES, treat as unexpected and cancel
    if (intentRequest.Intent.Name.Equals("AMAZON.YesIntent") == false)
    {
        return ResponseBuilder.Tell("Unexpected response. Cancelling.");
    }

    // Retrieve the remembered phrase from the session attributes
    var wordSlotValue = session.Attributes["Word"] as string;

    // Generate the credentials required for the Twitter API
    var tokens = CoreTweet.Tokens.Create(APIKey, APISecret, AccessToken, AccessTokenSecret);

    // Post to Twitter
    tokens.Statuses.UpdateAsync(new { status = wordSlotValue }).Wait();

    // Report the result
    return ResponseBuilder.Tell($"I posted '{wordSlotValue}' on Twitter.");
}
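One refinement I would consider (not part of the original code): the blocking UpdateAsync(...).Wait() call surfaces a Twitter API failure as an unhandled Lambda error, so the user gets no spoken feedback. Wrapping the post in a try/catch, roughly like this, lets Alexa report the failure instead:

// Sketch: drop-in replacement for the last lines of FunctionHandler_ConfirmState.
try
{
    // Post to Twitter (still synchronously, as in the original code)
    tokens.Statuses.UpdateAsync(new { status = wordSlotValue }).Wait();
}
catch (Exception)
{
    // Report the failure instead of letting the Lambda invocation fail
    return ResponseBuilder.Tell("Sorry, I couldn't post that to Twitter.");
}
return ResponseBuilder.Tell($"I posted '{wordSlotValue}' on Twitter.");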
Testing on a Real Device
It didn’t work perfectly at first and needed several adjustments, but in the end it behaved as intended. When Alexa handles everything in a single exchange, misrecognition can be a problem; adding a user confirmation step improved the effective accuracy. On the other hand, always requiring confirmation reduces usability, so in actual skill development you should design according to the use case.
Summary
By using sessions in a skill created with C# in Visual Studio, I was able to have a back-and-forth conversation with Alexa. This greatly expands the range of possible features.
This time, there were only two states, so it wasn’t a problem, but as the number of states increases or you need to handle nested states, you may need to be more creative with the code structure.
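For example, adding a third state would only require another enum value, another handler method, and another dictionary entry in the constructor; the sketch below uses a hypothetical EditState purely for illustration.

// Hypothetical: EditState and FunctionHandler_EditState do not exist in the
// real project; they only illustrate how the pattern scales.
stateHandlers = new Dictionary<string, Func<IntentRequest, Session, SkillResponse>>
{
    { EConversationState.StartState.ToString(),   FunctionHandler_StartState   },
    { EConversationState.ConfirmState.ToString(), FunctionHandler_ConfirmState },
    { EConversationState.EditState.ToString(),    FunctionHandler_EditState    }
};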