Tuesday, April 24, 2012

Cloudy With Metrics

I’m working on a team that is responsible for a Windows desktop application for displaying and manipulating medical images. Medical images are just what you would imagine: x-rays, ultrasound, MRIs, computed tomography, and so on.

These images are large – hundreds of megabytes each (more for ultrasound video) – and therein lies one of our big challenges: high performance for users everywhere. Many of our users sit in Southern India or Europe, while our images live in a datacenter on the East Coast of the United States.

Now, while we do have the luxury of nice big pipes connecting these locations, they inevitably suffer from fairly high latency (~300 ms for India) due to the physical distance between them. That level of latency and TCP do not mix well.
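Why not? TCP will only keep one window’s worth of unacknowledged data in flight at a time, so round-trip time puts a hard cap on the throughput of any single connection. As a back-of-the-envelope illustration (assuming the classic 64 KB receive window, with no window scaling): 64 KB per 300 ms round trip works out to about 213 KB/s, or roughly 1.7 Mbit/s.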

This leads to a very disappointing rate of throughput even when ample bandwidth is available. Improving this situation can be boiled down to the application of just three techniques:
  1. Move the data faster
  2. Move less data
  3. Move the data before you need it

For us, the option of moving the data before it was needed was already off the table. Pre-caching image data is an approach we had used for years with a legacy product. Although it worked respectably well, it was far from transparent to end users. With some regularity, data that people expected to reach remote users had not done so in a timely fashion, and much gnashing of teeth and submitting of helpdesk tickets ensued. Caching was a four-letter word, so to speak.

So, we focused on moving the data faster, and moving less of it to boot. Exactly how we did that is not really the focus of this post (but so as not to leave you hanging, scaling of images and UDP were key).

What this post is really about is how we made use of Amazon’s cloud-based SimpleDB as a super-simple means to capture in-the-field metrics on the performance and usage of our application.

In our earliest pilot phase we displayed time-to-load and throughput in the application’s status bar, and users could relay that information to us (yeah, that was as unsatisfactory as it sounds). It was equally possible to log things to a user’s machine, e.g. with log4net or the Windows Event Log, but that approach makes access to, and consolidation of, the captured data more involved than we wanted it to be. So it’s not that we couldn’t capture this data without a cloud-based solution; it’s that a single, centralized place to capture it was an appealing option.

One way to get a centralized means of capturing this data would have been to build ourselves a small database, front it with a set of web services, and use that. But, as in many organizations, the effort involved in getting that built and deployed is non-trivial, and we were looking to slip this functionality in ASAP.

By comparison, Amazon’s SimpleDB is exactly that – simple.

It’s “schema-less”, so there’s no DDL to create tables and so forth. Instead, data is stored in domains (akin to tables; you do need to create these ahead of time), which are made up of items (like rows), each holding a set of attributes (a bit like columns, but better thought of as name-value pairs).
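Creating a domain up front is itself trivial. The main listing below calls a little helper, CreateDomainIfNeeded, for exactly this; here’s a minimal sketch of what such a helper can look like (the implementation shown is illustrative, not our production code):

        // _db is an Amazon SimpleDB client; with the v1 AWS SDK for .NET it can
        // be created like this (credentials elided):
        //     AmazonSimpleDB _db = AWSClientFactory.CreateAmazonSimpleDBClient(accessKey, secretKey);

        private readonly HashSet<string> _createdDomains = new HashSet<string>();

        private void CreateDomainIfNeeded(string domain)
        {
            if (_createdDomains.Contains(domain))
                return;

            // CreateDomain is idempotent, so re-creating an existing domain is
            // harmless; the HashSet just saves a network round trip per metric.
            _db.CreateDomain(new CreateDomainRequest().WithDomainName(domain));
            _createdDomains.Add(domain);
        }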

With that housekeeping out of the way, here’s the C# that actually stashes a metric into SimpleDB. It could in fact be even simpler than this, since here I’m using the asynchronous approach to putting the data in.

        private void LogMetricToCloud(IMetric metric)
        {
            string domain = metric.GetType().FullName;
            CreateDomainIfNeeded(domain);

            try
            {
                // Each metric becomes one item, keyed by a fresh GUID, in the
                // domain named after the metric's type.
                PutAttributesRequest request = new PutAttributesRequest()
                    .WithDomainName(domain)
                    .WithItemName(Guid.NewGuid().ToString());

                foreach (var attribute in metric.Attributes())
                {
                    request.WithAttribute(new ReplaceableAttribute()
                        .WithName(attribute.Name)
                        .WithValue(attribute.Value));
                }

                // A human-readable summary is passed as async state, so the
                // callback can report exactly what failed to log.
                string state = metric.GetType().FullName + ":" + metric.CSVData();
                _db.BeginPutAttributes(request, new AsyncCallback(SimpleDBCallBack), state);
            }
            catch (Exception e)
            {
                TryLogError(e.Message);
            }
        }

        private void SimpleDBCallBack(IAsyncResult result)
        {
            string state = result.AsyncState as string;
            try
            {
                // If there was an error during the put attributes operation it
                // will be thrown as part of the EndPutAttributes method.
                _db.EndPutAttributes(result);
            }
            catch (Exception e)
            {
                TryLogError(string.Format("Exception: {0}\nFailed to log: {1}", e.Message, state));
            }
        }


I think that what the code does and how is fairly self-evident, but that could just be because I’ve been working with it. Just to make sure it’s clear, here’s a breakdown:
  • I have a number of different classes of metric, all conforming to the IMetric interface (see the sketch after this list).
  • I have a domain (think: table) for each different kind of metric.
  • As a concrete example, one of my metrics measures feature utilization; features are things like flipping and masking images. For feature utilization I record who, where, when, what project, and what feature they used.
  • The LogMetricToCloud method simply iterates through all the attributes of a metric and creates a request to persist them in my SimpleDB domain.
  • The request is made asynchronously, with the creatively named SimpleDBCallBack method being invoked upon completion. The call there to EndPutAttributes will let us know if any problems occurred with the original request.
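
To make the breakdown concrete, here’s a sketch of what a metric class might look like. The IMetric members (Attributes() and CSVData()) are inferred from the listing above; MetricAttribute, FeatureUsageMetric, and their property names are my own illustrative inventions:

        // Hypothetical shape of the IMetric contract, inferred from the listing above.
        public class MetricAttribute
        {
            public string Name { get; set; }
            public string Value { get; set; }
        }

        public interface IMetric
        {
            IEnumerable<MetricAttribute> Attributes();
            string CSVData();
        }

        // One concrete metric: feature utilization (who, where, when,
        // what project, and what feature was used).
        public class FeatureUsageMetric : IMetric
        {
            public string User { get; set; }
            public string Site { get; set; }
            public string Project { get; set; }
            public string Feature { get; set; }
            public DateTime When { get; set; }

            public IEnumerable<MetricAttribute> Attributes()
            {
                yield return new MetricAttribute { Name = "User", Value = User };
                yield return new MetricAttribute { Name = "Site", Value = Site };
                yield return new MetricAttribute { Name = "Project", Value = Project };
                yield return new MetricAttribute { Name = "Feature", Value = Feature };
                // SimpleDB stores everything as strings, so use a sortable timestamp.
                yield return new MetricAttribute { Name = "When", Value = When.ToString("o") };
            }

            public string CSVData()
            {
                return string.Join(",", new[] { User, Site, Project, Feature, When.ToString("o") });
            }
        }

Logging a feature use then boils down to constructing a FeatureUsageMetric and handing it to LogMetricToCloud; each metric type lands in its own domain, named after the type’s full name.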

All of this seems to be working pretty well, and I hope that in the near future we can start to make even more use of SimpleDB. Next on the list for me is implementing some “feature toggle” type functionality for our application. With that we can entertain some of the ideas described in this great article by Jez Humble about core practices for continuous delivery.
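
For what it’s worth, reading a toggle back out of SimpleDB could be as simple as the sketch below. To be clear, none of this exists yet: the “FeatureToggles” domain, its one-item-per-feature layout with an “Enabled” attribute, and the helper itself are all hypothetical:

        // Hypothetical: one item per feature in a "FeatureToggles" domain,
        // each item carrying an "Enabled" attribute of "true" or "false".
        private bool IsFeatureEnabled(string featureName)
        {
            GetAttributesRequest request = new GetAttributesRequest()
                .WithDomainName("FeatureToggles")
                .WithItemName(featureName);

            GetAttributesResponse response = _db.GetAttributes(request);

            foreach (var attribute in response.GetAttributesResult.Attribute)
            {
                if (attribute.Name == "Enabled")
                    return bool.Parse(attribute.Value);
            }

            // Unknown features default to off.
            return false;
        }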