
With Right for mimeType comes Responsibility too

A few weeks back, Amazon SWF started validating the content type of incoming requests. As a result, my until-then-working application started failing with this exception:

{"Output":{"__type":"com.amazon.coral.service#UnknownOperationException","message":null},"Version":"1.0"}

To get past the error, I had to set the contentType to  ‘application/x-amz-json-1.0’ instead of ‘application/json’.
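
For illustration, the raw call looks roughly like this (a minimal sketch using Python’s requests library; the operation, endpoint and payload are just examples, and the SigV4 signing headers a real request needs are omitted):

    import requests

    # Sketch of the header change only. A real SWF request must also carry
    # AWS Signature Version 4 authentication headers, omitted here.
    endpoint = "https://swf.us-east-1.amazonaws.com/"
    headers = {
        # 'application/json' is now rejected with UnknownOperationException;
        # SWF expects its vendor-specific media type instead.
        "Content-Type": "application/x-amz-json-1.0",
        "X-Amz-Target": "SimpleWorkflowService.ListDomains",
    }
    payload = '{"registrationStatus": "REGISTERED"}'

    response = requests.post(endpoint, data=payload, headers=headers)
    print(response.status_code, response.text)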

Not a big deal (just a few hours of troubleshooting) until I put my REST glasses on (note: neither Amazon SWF nor I are claiming that Amazon SWF is REST compliant). This incident gave me a chance to stop and ponder API design in general and REST API design in particular. The spirit of a REST-compliant architecture is to enable independent evolvability of resources (like Amazon SWF) and consumers (my application). I believe these are a few points to bear in mind:

Describe your representation (mimeType)

Representations are the medium through which consumers modify a resource. Hence, if you roll a custom mimeType, it is your responsibility to provide a detailed definition of it, the way application/json and text/vcard are defined.

Amazon did not do that with ‘application/x-amz-json-1.0’. Currently, I am not sure what it is. You and I can guess what it is, but that is not how things should be.

Further, REST does not specify where your custom representation definition should be stored. You could create your own “well known” location, akin to the IANA media type registry, where your consumers can look it up.

Don’t “alias” existing mimeTypes

REST is all about relying on standards, on the premise that doing so reduces coupling. If there is an existing standard that fits your needs, like application/json, then use it. It is not responsible to roll out a mimeType that is exactly like an existing one. It looks as though you might be hedging against changes to JSON, or preparing to roll your own variant. In either case, you are guaranteed to break your clients in the future, because your clients and you are not working off the same standard.

Roll your own mimeType only when it has something special to offer. mimeTypes/representations are “generic” skills (reading and writing them) that your client needs to master. For example, browsers need to know how to render ‘text/html’, ‘image/jpeg’, and so on.

Preserve backward compatibility

This is the most obvious one. Find a way to keep your clients up and running, especially through a seemingly superficial change like replacing ‘application/json’ with ‘application/x-amz-json-1.0’.

Amazon: Simple Workflow

It is amazing how much free time we have after the football season ends :-). My football season ended two weeks ago when the New England Patriots were eliminated (it still hurts – next year will be different). I made good use of this additional time by taking a geekation I had been wanting to take for a long time.

I decided to take a trip to Amazon’s SWF land. I have been hearing a lot about it, at work and in the community. After all, it is on THE CLOUD.

With the destination picked, I needed a theme to make the most of my stay there. I chose to build a very rudimentary digital asset management system (DAM). The idea is to touch upon the major moving parts of a digital asset management system and focus on the happy path. I eventually hope to evolve this into a reference implementation. I don’t expect anyone (including myself) to put this code in production; the point is just to have some fun and learn.

I selected these cloud solutions to build my reference application on.

  1. Amazon SWF
  2. Amazon S3
  3. Amazon RDS
  4. Amazon CloudSearch
  5. Encoding.com

The idea is to figure out how to stitch these cloud solutions together.

On the back of a paper napkin, the activity workflow to upload a video file into a digital asset management system could look like this:

[Activity workflow diagram]

To implement the above workflow in the SWF framework, I needed to build three classes of applications:

  1. Activity Workers
  2. Workflow Deciders
  3. Workflow Triggers

Activity Worker

An Activity Worker is an application that hosts the logic to perform an activity (in the above diagram). It pulls work from an SWF task queue (called a task list), works on it, and finally reports the result back to SWF.

  • Typically, one Activity Worker does one Activity. This is ideal for several reasons.
  • In a few rare cases, one worker can handle more than one activity.

Since workers in SWF work solely on a “pull” (asynchronous) model, three scalability options are available to us:

  • Demand smoothing: a worker picks up work only when it has capacity available.
  • Scale out: stand up more instances of workers as demand increases.
  • Scale up: throw more hardware at a single worker.

As mentioned earlier, a worker pulls activities off a task list in SWF.

  • Typically, a one-to-one relationship exists between a task list and an activity type.
  • Alternatively (but rarely),
    • A task list can contain more than one activity type: the case where a single worker handles more than one activity.
    • A single activity type (not activity instance) can appear in more than one task list: the case where activity instances need to be prioritized.

Workers are typically implemented as daemon processes, as mine are. However, there is nothing in the SWF architecture that prevents deciders and workers from being interactive applications, like web applications, or even console applications.
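
Here is a minimal sketch of such a daemon loop for the Upload Worker described in the next list, using boto3; the domain, task list, identity and the do_upload helper are placeholders, and error handling and heartbeats are left out:

    import boto3

    # Minimal activity worker loop (sketch). The domain, task list, identity
    # and do_upload() are placeholders for illustration.
    swf = boto3.client("swf", region_name="us-east-1")

    def do_upload(file_location):
        # ... copy the file into S3 and return its location ...
        return "s3://dam-bucket/" + file_location  # placeholder result

    while True:
        # Long-poll the task list; SWF returns an empty taskToken if nothing
        # arrives within roughly 60 seconds.
        task = swf.poll_for_activity_task(
            domain="dam-domain",
            taskList={"name": "upload-task-list"},
            identity="upload-worker-1",
        )
        if not task.get("taskToken"):
            continue  # the poll timed out; go back to waiting

        result = do_upload(task.get("input", ""))

        # Report the outcome back to SWF so the decider can move the workflow on.
        swf.respond_activity_task_completed(
            taskToken=task["taskToken"],
            result=result,
        )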

The activity workers I have in my application are:

  1. Upload Worker: Responsible for uploading the digital files into the DAM.
  2. Transcode Worker: Responsible for creating variation(s) of the uploaded digital file (uses Encoding.com).
  3. Asset Management Worker: Responsible for recording the metadata of the uploaded digital file.
  4. Asset Index Worker: Responsible for indexing the metadata in a search engine (Amazon CloudSearch).

To leverage all the benefits of SWF (asynchronous activities), we need to achieve the following in our design:

  1. We can only assume the order in which activities are handed to workers.
  2. We cannot assume the order in which activities will be completed (As one worker might be slower than the other)
  3. The activities on a task list should not have any dependencies on each other; such dependencies would severely limit how we can scale our application.
  4. The workers should be idempotent.
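
To make the last point concrete, here is one way the Asset Management Worker could be made idempotent; the table, column names, and the DB-API style connection are hypothetical:

    # Idempotency sketch: re-running the same activity must not create a
    # duplicate record. Table and column names are hypothetical.
    def record_metadata(db_conn, asset_id, metadata_json):
        with db_conn.cursor() as cur:
            cur.execute("SELECT 1 FROM assets WHERE asset_id = %s", (asset_id,))
            if cur.fetchone():
                return  # an earlier attempt already recorded it; nothing to do
            cur.execute(
                "INSERT INTO assets (asset_id, metadata) VALUES (%s, %s)",
                (asset_id, metadata_json),
            )
        db_conn.commit()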

Workflow Deciders

While the workers do the heavy lifting, the deciders orchestrate them: they decide when an activity should be done. A decider also sets the policy for an activity, which SWF then enforces.

In many respects, deciders are like workers. They, too, pull decision tasks (created by SWF) from a task list, make a decision, and report the result back to SWF.

Deciders can afford to be stateless. SWF, as part of the decision task, provides the workflow event log. By retracing the log, the decider can figure out what should happen next. This type of decision-making can get tricky quite quickly, but that is the fun part.

I believe it is possible to scale out the deciders, but we need to be extremely careful with our design. Imagine two decider instances making a decision on the same workflow at the same time. Fortunately, SWF gives each a different workflow event log, and workflow event logs are strictly sequential and append-only. It is then up to us to design the deciders to be idempotent.
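
A bare-bones sketch of such a decider loop with boto3; the domain, task list and activity names are placeholders, and a real decider would replay the full event history rather than peeking at the most recent interesting event:

    import boto3

    # Minimal decider loop (sketch). Domain, task list, and activity names
    # are placeholders for illustration.
    swf = boto3.client("swf", region_name="us-east-1")

    while True:
        task = swf.poll_for_decision_task(
            domain="dam-domain",
            taskList={"name": "single-submission-decisions"},
            identity="single-submission-decider-1",
        )
        if not task.get("taskToken"):
            continue  # the long poll timed out; poll again

        events = task["events"]  # the append-only workflow event log
        # Skip the DecisionTask* bookkeeping events SWF adds around each decision.
        interesting = [e for e in events if not e["eventType"].startswith("Decision")]
        latest = interesting[-1]["eventType"]

        decisions = []
        if latest == "WorkflowExecutionStarted":
            # First decision: schedule the upload activity. Timeouts fall back
            # to the defaults registered on the activity type.
            started = events[0]["workflowExecutionStartedEventAttributes"]
            decisions.append({
                "decisionType": "ScheduleActivityTask",
                "scheduleActivityTaskDecisionAttributes": {
                    "activityType": {"name": "Upload", "version": "1.0"},
                    "activityId": "upload-1",
                    "taskList": {"name": "upload-task-list"},
                    "input": started.get("input", ""),
                },
            })
        elif latest == "ActivityTaskCompleted":
            # Placeholder: a real decider would schedule transcode, metadata,
            # and indexing activities here before closing the workflow.
            decisions.append({
                "decisionType": "CompleteWorkflowExecution",
                "completeWorkflowExecutionDecisionAttributes": {"result": "done"},
            })

        swf.respond_decision_task_completed(
            taskToken=task["taskToken"],
            decisions=decisions,
        )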

I have two deciders:

  1. Single Submission Decider: A workflow to control the submission of a single digital file
  2. Bulk Submission Decider: A workflow to control a bulk submission. This piggybacks on the Single Submission Decider.

Workflow Triggers

Workflow-triggering applications are usually consumer-facing applications that accept work and set the “workflow” ball rolling. Unlike workers and deciders, these applications have messages pushed to them.

I have two applications on the triggering side:

  1. Submission Management Service: A REST service that accepts bulk submission requests and triggers the “Bulk Submission Workflow”
  2. Bulk Submission App: A command line application installed on the end user’s computer to submit digital assets. This consumes the Submission Management Service’s API.
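
Kicking off an execution from the Submission Management Service boils down to a single call; a sketch with boto3, where the domain, workflow type and input shape are placeholders:

    import json
    import uuid

    import boto3

    # Trigger sketch: start one "BulkSubmission" workflow execution per
    # submission request. Names and the input format are placeholders.
    swf = boto3.client("swf", region_name="us-east-1")

    def trigger_bulk_submission(file_locations):
        run = swf.start_workflow_execution(
            domain="dam-domain",
            # workflowId must be unique among the domain's open executions.
            workflowId="bulk-submission-" + str(uuid.uuid4()),
            workflowType={"name": "BulkSubmission", "version": "1.0"},
            taskList={"name": "bulk-submission-decisions"},
            input=json.dumps({"files": file_locations}),
        )
        return run["runId"]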

What’s left for SWF to do?

Good question, right?

The right question to ask is “what didn’t we have to do to build a distributed, scalable, and asynchronous digital asset management system?”

  1. Durable Task Queues
  2. Task Timeout
  3. Task Retries
  4. Workflow Traceability
  5. Policy-based control of activities
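
Most of that is declared rather than coded. For instance, the timeouts that SWF enforces on the transcode activity can be registered once as defaults on the activity type; a sketch, where the names and values are arbitrary examples:

    import boto3

    # Registration sketch: SWF itself enforces these timeouts, so neither the
    # workers nor the deciders need their own timers. Values are examples.
    swf = boto3.client("swf", region_name="us-east-1")

    swf.register_activity_type(
        domain="dam-domain",
        name="Transcode",
        version="1.0",
        defaultTaskList={"name": "transcode-task-list"},
        defaultTaskScheduleToStartTimeout="600",   # max seconds waiting in the task list
        defaultTaskStartToCloseTimeout="1800",     # max seconds a worker may spend on a task
        defaultTaskScheduleToCloseTimeout="2400",  # overall budget per task, in seconds
        defaultTaskHeartbeatTimeout="120",         # worker must heartbeat within this interval
    )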

Post Geekation Blues

This was a good geekation. Why do I think so? Because I already have a list of “things to check out” for my next break:

  1. Enhance the reference implementation to handle unhappy paths
  2. Can the workflow logic be externalized and put in the hands of business users?
  3. Project the operating cost of this reference implementation in some hypothetical business settings.