Introduction

This article continues from "Alexa Custom Skill 1 with Visual Studio + C#: Greeting on Twitter" and builds on the content of that article.

As before, I wrote the Lambda function for the custom skill in C#, which (at least in my opinion) is a less common choice for this purpose.

Overview

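In the previous implementation, the process completed in a single round of conversation: the user told Alexa what to tweet, and the skill posted it right away.

This time, I improved it a bit so that the conversation continues: Alexa first asks the user to confirm the phrase, and the process then branches to posting or cancelling depending on whether the user answers Yes or No.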

Implementation

For this implementation, I changed little besides the Lambda function from the previous version. (The conversation model was also slightly modified.) To continue a conversation with Alexa, you need to use sessions, so I implemented that part. For details on session management, I referred to Alexa Skill Development Training Series Chapter 3: Designing Voice User Interfaces. I wrote the C# code based on the Node.js sample here.

The Lambda project is available here.

Conversation Model

I changed the intent schema only slightly: I added AMAZON.NoIntent and AMAZON.YesIntent to accept Yes and No responses.

Custom slot types and sample utterances are the same as in the previous article.

{
  "intents": [
    {
      "slots": [
        {
          "name": "Word",
          "type": "GREETING_WORD"
        }
      ],
      "intent": "TwitterIntent"
    },
    {
      "intent": "AMAZON.NoIntent"
    },
    {
      "intent": "AMAZON.YesIntent"
    },
    {
      "intent": "AMAZON.HelpIntent"
    },
    {
      "intent": "AMAZON.StopIntent"
    }
  ]
}

Lambda

As before, I coded in an environment with AWS Toolkit for Visual Studio installed.

I added the following two libraries via NuGet:

  • Alexa.NET
  • CoreTweet

Initialization

As before, Twitter access information is obtained from environment variables. This time, I also implemented state management, with a different function called for each state.

Although you could branch with if statements, I stored the handlers as delegates in a dictionary keyed by state name, anticipating that the number of states will grow in the future.

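 // using directives needed by the code fragments in this article
 using System;
 using System.Collections.Generic;
 using Alexa.NET;
 using Alexa.NET.Request;
 using Alexa.NET.Request.Type;
 using Alexa.NET.Response;
 using Amazon.Lambda.Core;

 // Members of the Function class: Twitter credentials and the handler for each state
 private readonly string APIKey;
 private readonly string APISecret;
 private readonly string AccessToken;
 private readonly string AccessTokenSecret;
 private readonly Dictionary<string, Func<IntentRequest, Session, SkillResponse>> stateHandlers;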

 public Function()
 {
     APIKey = Environment.GetEnvironmentVariable("API_KEY");
     APISecret = Environment.GetEnvironmentVariable("API_KEY_SECRET");
     AccessToken = Environment.GetEnvironmentVariable("ACCESS_TOKEN");
     AccessTokenSecret = Environment.GetEnvironmentVariable("ACCESS_TOKEN_SECRET");
     stateHandlers = new Dictionary<string, Func<IntentRequest, Session, SkillResponse>>
     {
         { EConversationState.StartState.ToString(), FunctionHandler_StartState },
         { EConversationState.ConfirmState.ToString(), FunctionHandler_ConfirmState }
     };
 }
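The EConversationState enum used as the dictionary key is not shown in this article. A minimal sketch, assuming it contains only the two members referenced above, might look like the following. The member names come from the code above; the comments describing each state are my own reading of the flow.

```csharp
// Hypothetical definition: the article does not show this enum.
// Only the two member names are taken from the code above.
public enum EConversationState
{
    StartState,   // waiting for the user to say what to tweet
    ConfirmState  // waiting for the user's Yes/No confirmation
}
```

Calling ToString() on an enum member returns the member name, so the dictionary keys and the value stored in the session are the strings "StartState" and "ConfirmState".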

This mechanism switches the function that is called for each state, and the state itself is stored in the session attributes. (See the "STATE" key of input.Session.Attributes in the code below.)

In this function, the state is read from the session attributes, and the cached delegate is retrieved to handle processing for each state.

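 public SkillResponse FunctionHandler(SkillRequest input, ILambdaContext context)
 {
     // Only IntentRequest is handled; other request types get no response
     var requestType = input.GetRequestType();
     if (requestType != typeof(IntentRequest)) return null;
     var intentRequest = input.Request as IntentRequest;

     // Attributes can be null when the request carries no session attributes,
     // so fall back to StartState in that case
     var attributes = input.Session.Attributes;
     var state = attributes != null && attributes.ContainsKey("STATE")
         ? attributes["STATE"] as string
         : EConversationState.StartState.ToString();

     // Dispatch to the handler registered for the current state
     var handler = stateHandlers[state];
     return handler(intentRequest, input.Session);
 }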

FunctionHandler_StartState

This function handles the StartState. The user is expected to say “Tweet △△△”, so I first get △△△ from the Word slot. The phrase needs to be remembered for the next turn, so it is stored in the session attributes under the key "Word". Alexa then responds, but here I use Ask instead of Tell. Using Ask keeps the session open, so Alexa immediately waits for the next utterance.

private SkillResponse FunctionHandler_StartState(IntentRequest intentRequest, Session Session)
{
    // Ignore intents other than TwitterIntent
    if (intentRequest.Intent.Name.Equals("TwitterIntent") == false) return ResponseBuilder.Tell("Unexpected request. Cancelling.");

    // Get the value of the Word slot
    var wordSlotValue = intentRequest.Intent.Slots["Word"].Value;

    // Reprompt that Alexa speaks if the user does not answer
    Reprompt rep = new Reprompt();
    rep.OutputSpeech = new PlainTextOutputSpeech() { Text = "Is it okay to tweet this?" };

    Session.Attributes = new Dictionary<string, object>();
    // Remember the phrase to tweet
    Session.Attributes["Word"] = wordSlotValue;
    // Change state to ConfirmState
    Session.Attributes["STATE"] = EConversationState.ConfirmState.ToString();

    // Ask keeps the session open and sends the attributes back with the response
    return ResponseBuilder.Ask($"Is it okay to tweet '{wordSlotValue}'?", rep, Session);
}
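To make the role of the session attributes concrete, the response built by Ask should serialize to JSON roughly like the following. This is a sketch based on the Alexa response format, not output captured from the skill, and the phrase "good morning" is a made-up example value.

```json
{
  "version": "1.0",
  "sessionAttributes": {
    "Word": "good morning",
    "STATE": "ConfirmState"
  },
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "Is it okay to tweet 'good morning'?"
    },
    "reprompt": {
      "outputSpeech": {
        "type": "PlainText",
        "text": "Is it okay to tweet this?"
      }
    },
    "shouldEndSession": false
  }
}
```

Alexa sends the contents of sessionAttributes back in the session of the next request, which is how the ConfirmState handler can read "Word" and "STATE" on the following turn.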

FunctionHandler_ConfirmState

This function handles the ConfirmState. It expects the user to answer either “Yes” or “No”, and changes the behavior depending on the response.

Note: The built-in intents for “Yes” and “No” are used here. In my experience, built-in intents are recognized more reliably than custom ones.

If “Yes”

Retrieve the remembered phrase from session attributes and post it to Twitter.

If “No”

Cancel the post.

Regardless of the answer, the session should end after this, so Alexa responds with Tell. To use the function again, start from the beginning.

private SkillResponse FunctionHandler_ConfirmState(IntentRequest intentRequest, Session Session)
{
    // If NO is returned, cancel the process
    if (intentRequest.Intent.Name.Equals("AMAZON.NoIntent"))
    {
        return ResponseBuilder.Tell("Okay, I won't post it.");
    }

    // If it is not YES, treat as unexpected and cancel
    if (intentRequest.Intent.Name.Equals("AMAZON.YesIntent") == false)
    {
        return ResponseBuilder.Tell("Unexpected response. Cancelling.");
    }

    // Retrieve the remembered phrase from session attributes
    var wordSlotValue = Session.Attributes["Word"] as string;

    // Create the CoreTweet tokens from the credentials read in the constructor
    var tokens = CoreTweet.Tokens.Create(APIKey, APISecret, AccessToken, AccessTokenSecret);

    // Post to Twitter (blocks until the request completes)
    tokens.Statuses.UpdateAsync(new { status = wordSlotValue }).Wait();

    // Report the result
    return ResponseBuilder.Tell($"I posted '{wordSlotValue}' on Twitter.");
}

Testing on a Real Device

It didn’t work perfectly at first, and I had to make several adjustments, but in the end, it worked as intended. If you let Alexa handle everything in a single round, a misrecognized phrase gets posted as-is; adding a user confirmation step lets the user catch such mistakes before anything is posted. However, always requiring confirmation can reduce usability, so in actual skill development you should decide according to the use case.

Summary

By using sessions in a skill created with C# in Visual Studio, I was able to have a back-and-forth conversation with Alexa. This greatly expands the range of possible features.

This time, there were only two states, so the structure above was sufficient. As the number of states increases, or if you need to handle nested states, you may need to put more thought into the code structure.