AWS GoT Hack: 48-hour Status Update

To recap, I'm working on a 48-hour hack to help learn about AWS.  My goal is to:

Create a "Game of Thrones" Alexa Skill that allows the user to ask if a character is alive or dead, and responds with an accurate answer.

In the previous post, I got the target intent working. Kinda. I can now ask Alexa if a character is alive or dead, but my code currently has no way of figuring this out, so Alexa always responds that she doesn't know.

Update

It has been roughly 48 hours since I started this 48-hour hack. Or, more accurately, it has been roughly 2 days.  Not nearly 48 hours, because, well, life. My Alexa skill is far from complete, but the real goal of this exercise was to experiment with as many AWS services as possible, so let's take a look at the current list:

It's interesting to note that I used some of the services (IAM, CloudWatch) because they are either required by, or enhanced the functionality of, another service. That is, I didn't choose them independently to solve my problem.

Other Interesting AWS Services

As I've worked on this, I've encountered lots of other AWS services that may be helpful in finishing my Alexa skill.  I'm going to enumerate them here and compare the list to what I actually use. Given my experience so far, I'm guessing the final list will be quite a bit longer than this.

S3

My rough high-level design for the skill is to use Lambda to copy changed web content from the Wikia and store it in S3. I can then use S3 triggers to use a different Lambda function to parse the new web content and store it in a DB. Storing a copy of the web content also means I can enhance my parsing without having to hit the Wikia site over and over again.
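A minimal sketch of what the parsing side of that design might look like, in Python. It only shows how a Lambda function triggered by S3 could work out which stored pages changed. The bucket name, key layout and the handler name are assumptions for illustration, and the actual parsing and DB write are left out.

```python
from urllib.parse import unquote_plus


def changed_objects(event):
    """Return (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event, context):
    """Hypothetical Lambda entry point: report which stored pages need parsing."""
    pairs = changed_objects(event)
    for bucket, key in pairs:
        # Assumed next step (not shown): fetch the page, parse it, store the result.
        print(f"would parse s3://{bucket}/{key}")
    return {"parsed": len(pairs)}
```

Keeping the event unpacking in its own function means it can be tried locally with a hand-built event before anything is wired up in AWS.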

RDS or DynamoDB

I don't want to embed the Wikia parsing logic in my Alexa skill. Why parse every time someone asks a question? Also, it seems like the weakest part of the system will be the parsing, as I'll be parsing human-edited content. So I will parse the content only when the content changes, or when I change the parser. It seems obvious to store the output of the parser in an easy-to-query database, so the Alexa skill logic can be kept minimal and fast.

RDS seems sort of the default choice, right? A relational DB is how this has been done in the past. And Amazon provides Aurora, a MySQL-compatible RDB, so I don't even have to choose or manage an RDB product.

But I'd have to live in a box to be unaware of the "SQL vs. NoSQL" Holy War, and one of the big differences has me leaning towards NoSQL: It's schema-less, meaning I don't have to come up with a schema up front and I don't have to worry about migration as the schema changes. But not all is rosy with NoSQL: I'm concerned about reduced ability to query, and the lack of a schema means that my DB operations aren't automatically checked against a schema for validity.
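One way to soften the "no schema check" concern is to validate each item in code before writing it. The sketch below is only an illustration: the field names (name, status) and the allowed status values are assumptions, not a settled design for the skill's data.

```python
# Hypothetical shape of a parsed character record.
REQUIRED_FIELDS = {"name": str, "status": str}
ALLOWED_STATUSES = {"alive", "dead", "unknown"}


def validate_character(item):
    """Return a list of problems; an empty list means the item looks OK."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in item:
            problems.append(f"missing field: {field}")
        elif not isinstance(item[field], expected_type):
            problems.append(f"wrong type for {field}")
    status = item.get("status")
    if isinstance(status, str) and status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {status}")
    return problems
```

The parser would call validate_character on each record and skip (or log) anything that comes back with problems, which gives back some of the safety a relational schema would have provided.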

Lucky for me, AWS provides both relational (RDS) and NoSQL (DynamoDB) solutions, so whichever way I go, I'm covered.

SQS

I want to split the implementation across as many self-contained services as possible. I've been doing this more and more in my iOS projects, and have seen lots of benefits in the stability and adaptability of my code. Check out "microservices" for more description of what I mean.

I'm hoping to use AWS triggers as much as possible to interface between my services, but one place I don't think I can do that is in the interface between the Wikia change monitoring service and the Wikia copying service.  But AWS has a queueing service, the "Simple Queue Service" (SQS), that I hope to use for this. My basic idea is to have the change monitoring service add each changed page as a message into the queue, and have that trigger the copying service to copy the new content to S3.
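A rough sketch of the producer side of that idea, in Python. The message fields (title, url) and the function names are made up for illustration. The client is passed in so the code can be exercised without AWS; in real use it would be something like boto3.client("sqs"), whose send_message call takes QueueUrl and MessageBody.

```python
import json


def build_change_message(page_title, page_url):
    """Serialize one changed page as a JSON message body (hypothetical format)."""
    return json.dumps({"title": page_title, "url": page_url})


def enqueue_change(sqs_client, queue_url, page_title, page_url):
    """Send one changed-page message to the queue.

    sqs_client is assumed to behave like boto3.client("sqs").
    """
    return sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=build_change_message(page_title, page_url),
    )
```

The copying service would read the same JSON back out of each message, so the message format is the only thing the two services need to agree on.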

CodeDeploy, CodePipeline or what?

One of my biggest gripes about Alexa Skill development is that so many of the processes are manual. If the Intent Schema or Sample Utterances change, I have to go to a web form, log in and copy and paste them. Whenever my Lambda code changes, I have to zip the code up and go to a different web site, log in and upload it.  That is going back in time from where all my other development toolsets are today.

I was hoping that one of Amazon's other developer tools, CodeDeploy or CodePipeline, would help me here, but it looks like they only work on EC2 instances. This might be an area where I'll have to create my own tools. Not really a horrible thing, as it's a chance to use the experience of all those years as an SCM Process engineer.
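As a starting point for such a home-grown tool, the zip-and-upload step for the Lambda code can be scripted. This is a sketch under assumptions: the function and file names are placeholders, and the client is passed in (in real use, something like boto3.client("lambda"), which offers update_function_code with FunctionName and ZipFile). It does nothing about the Intent Schema or Sample Utterances.

```python
import io
import zipfile


def zip_source(files):
    """Build an in-memory zip from a {archive_name: source_text} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


def deploy(lambda_client, function_name, files):
    """Upload new code for an existing Lambda function.

    lambda_client is assumed to behave like boto3.client("lambda").
    """
    return lambda_client.update_function_code(
        FunctionName=function_name,
        ZipFile=zip_source(files),
    )
```

Run from a build script, this would replace the "zip it up, log in, upload" routine with a single command.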

Amazon Alexa

No, not that Alexa. The most unfortunately named AWS service, which I can't even remember how I stumbled upon. It looks like a query-able DB of the web. It might be that I can use this in some way to monitor for Wikia changes. But I'm concerned that Amazon isn't supporting this any longer, as I can't believe they would have two concurrent services named "Alexa". Or maybe it has just changed names.

EMR

The Elastic MapReduce service uses Hadoop to process big data.  My data isn't all that big, but I like the idea of using MapReduce to parse the web pages. EMR would probably be overkill, but it would be interesting to experiment with it.

SNS

I have a couple of ideas for using the Simple Notification Service. The first is simply as a notification tool - I can use it to notify me if anything strange happens to my service. It can also be used as a "message bus" between AWS services, so I might use it instead of, or in addition to, SQS. I'm also wondering if it might be extended to Alexa. Imagine if my skill remembers what a user has asked it. If, in the future, the answer changes, e.g. a character dies, perhaps my skill could directly notify the user of this.  Of course, it might freak people out if Alexa "randomly" told them that Tyrion Lannister just died. If they are die-hard GoT fans, and they hadn't yet watched the show, this might also lead to the destruction of Echo devices... ;-)

CloudSearch

CloudSearch is interesting to me because it makes whatever data you give it searchable. It seems to be intended for user-generated searches, so perhaps it wouldn't give me the specific querying capability that I need. But, if it did, it would potentially eliminate several of my services and my S3 and DB requirements. The other possibility is that it could simplify those services by allowing my parser to only parse pages that have the correct content.

SES

The Simple Email Service might be a way to help me monitor for Wikia changes. The obvious way to do this is to periodically ask Wikia if it has any changes. But I really hate polling. Wikia can definitely send me an email when a page changes. What if that email went to SES instead, and that triggered my monitor Lambda function? The monitor function would then just queue a message to the copy service, and wouldn't have to poll at all.
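If that works, the monitor function mostly needs to pull something useful out of the incoming mail event. The sketch below, in Python, reads the subject line from the event that SES passes to a Lambda function. What a Wikia notification email actually looks like is unknown here, so turning a subject into a page name is left as an assumed later step.

```python
def email_subjects(event):
    """Return the subject line of each email in an SES-to-Lambda event."""
    subjects = []
    for record in event.get("Records", []):
        headers = record["ses"]["mail"]["commonHeaders"]
        subjects.append(headers.get("subject", ""))
    return subjects


def handler(event, context):
    """Hypothetical monitor entry point: one queue message per notification email."""
    subjects = email_subjects(event)
    for subject in subjects:
        # Assumed next step (not shown): map the subject to a page and queue it.
        print(f"change notification: {subject}")
    return {"notifications": len(subjects)}
```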

SWF

Simple Workflow is used to manage and coordinate applications. I'm guessing it is targeted at much larger applications than mine, but I certainly have multiple tasks that need management and coordination. My services can do this on their own, and with other AWS services, but SWF deserves a look to see if it could bring this management into one place, further separating and isolating the services.

CloudFormation

With CloudFormation, I could potentially manage my AWS resources in a defined way. Maybe I can use it to control the bring up or bring down of the services. Or maybe I could use it to deploy my Lambda code. Again, I think it is targeted at much more complex applications, but it is certainly worth a look.

CloudTrail

Once I roll out my Skill to broader use, I want to be able to see how it is being used. What questions are the users asking? Which ones work, and which ones don't? How can I extend my Sample Utterances to cover more of the questions as the users are phrasing them? CloudTrail lets me look at my AWS API call history. I'm not sure that on its own it will give me the level of detail I require, but coupled with other logging I should be able to learn a lot. I can also use CloudWatch to watch the CloudTrail logs and send SNS notifications when something "interesting" happens.

Config

Having worked at Tripwire, and been the owner of several complex IT infrastructures, I really value change management. Config looks like it can provide this service. I'll just need to see if it can provide this service across the other AWS services I am using.

OpsWorks

I like the idea of using tools like Puppet or Chef to configure and manage my infrastructure. I'm just not sure if these tools support AWS services other than EC2.

Inspector

At the moment, I'm not too concerned about security for my skill. The data is open-source, and I don't expect to make any money off of the skill. However, my experience has been that security issues tend to creep up on you, and any thought put in ahead of those issues is usually valuable. What if, for example, someone hacked my skill and put in a bunch of foul language in the responses? That could tarnish my brand, even though the skill is "free". 

Conclusion (for now)

So, in the words of the late, great Richard Dawson, "Survey says?":

  • 6 AWS services in use
  • 17 AWS services to investigate

Amazon rolls out new services at a pretty heady clip, so the number to investigate will probably continue to rise.

I've run out of time for this exercise, but this is far from the end. I'll have to challenge myself with another "2 day" hack to see if I can get the skill working enough to submit it to Amazon.  

Coming to an Echo near you?