Monday, August 20, 2012

A (very) brief foray into Spring 3.1

Introduction

As mentioned in my last post, I'm reacquainting myself with Java web application development. This post covers my very brief foray into using Spring 3.1.

I created a basic starter app using a Maven archetype. The generated starter app included Spring MVC 3.1, Spring Security (didn't need this, but it was easy to control what URLs were protected or not), Tiles and JPA/Hibernate/HSQLDB for persistence.

In terms of content and code, there was an index page and its corresponding controller class. Most useful to me though was a User POJO (set up as an entity that worked with persistence) and the corresponding UserRepository and UserService classes. The basic pattern seems to be that controllers interact with services to get done what they need done. In turn, Services (may) talk to Repositories (Repository being Spring-speak for a good old fashioned DAO) to perform the usual create, read, update and delete operations.

The way this starter app was configured was primarily Java based (rather than the more traditional XML based configuration). This was interesting to see, although when googling around for examples one tends not to yet find so many using this approach.

OK, moving on. As mentioned in my original post, I am going to just put together a page that lists job data from a database, as well as providing a form to add more job data.


Building a page to display a table of information retrieved from a database

I needed to create several items in the project to achieve this:

  • A Job POJO – Job.java
  • A controller corresponding to that page – JobController.java
  • A service for the controller to interact with – JobService.java
  • A repository (i.e. DAO) to manage the persistence of jobs – JobRepository.java
  • And of course a web page. I went with a regular JSP called jobs.jsp

The Job POJO had to be annotated appropriately with JPA annotations, thus:

After which it just worked. (Schema creation was covered by the hibernate.hbm2ddl.auto=create setting in persistence.properties – good enough for this experiment).

The JobController had to be annotated as a controller and instructed what request to kick in for with the @RequestMapping annotation. Note also the @Autowired JobService – this is how we inject the JobService here:

Note the index method also has a @RequestMapping annotation. At this point I could have annotated just the class, or just the method, with both the value and method attributes like this: @RequestMapping(value = "/jobs", method = RequestMethod.GET). Instead they are separated in anticipation of a method that will handle the post from the form later on.

Another key thing here is to note our list of jobs is added to that Model. This is how it gets into scope for use on the view (JSP page) you’ll see below.

Moving on to the JobService, which shook out like this:

Note the @Service annotation, which I understand is just a specialized form of @Component, though obviously the intent is clearer. Its presence ensures the class gets picked up during initialization and can participate in dependency injection. (Additionally, @Controller and @Repository fulfill the same duty, ensuring classes marked as such are picked up during the class path scanning of initialization).

Just as the JobController used @Autowired to inject JobService, so JobService uses @Autowired to bring in JobRepository.

The @PostConstruct annotation is an EE annotation that is pretty neat. You can mark one method of a class with this and, after dependency injection is complete, it will be called, allowing constructor-like behavior. Here I am using it in a slightly wacky way, as a means to stuff a few sample jobs into my database (only one example shown above for brevity).

Finally, the listJobs() method simply gets a list of Jobs from the repository (i.e. the DAO).

This brings us to the JobRepository which looks like this:

First off, note the @Repository class annotation, which indicates that this is basically a DAO (and also ensures it gets picked up and can be part of the whole DI setup), and the @Transactional class annotation, which lets it participate in Spring transaction handling.

Next, the JPA EntityManager is defined with the @PersistenceContext annotation.

Then the two DAO methods we need follow; one for saving Jobs and one for getting a list of them all. The listJobs() method is annotated with @Transactional again, with the readOnly attribute set to true. As I understand it, this is a hint that lets the transaction machinery optimize for reads rather than a hard guarantee that the operation can never modify data.
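Stripped of Spring, the controller, service and repository layering described so far can be sketched in plain Java. This is only a rough illustration under assumptions: constructor arguments stand in for @Autowired injection, an in-memory list stands in for the JPA-backed database, and the Job fields (title, company) and the LayeringDemo class are hypothetical names that do not come from the starter app.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Wires the three layers together by hand, the way Spring's DI would do for us.
final class LayeringDemo {
    public static void main(String[] args) {
        JobService service = new JobService(new JobRepository());
        service.addJob(new Job("Java Developer", "Example Corp"));
        JobController controller = new JobController(service);
        for (Job job : controller.index()) {
            System.out.println(job.getTitle() + " at " + job.getCompany());
        }
    }
}

// Hypothetical fields; a real entity would also carry JPA annotations.
final class Job {
    private final String title;
    private final String company;

    Job(String title, String company) {
        this.title = title;
        this.company = company;
    }

    String getTitle() { return title; }
    String getCompany() { return company; }
}

// Repository: the DAO, owning the save and list operations.
final class JobRepository {
    private final List<Job> store = new ArrayList<>();

    void save(Job job) { store.add(job); }

    // Returns a read-only snapshot so callers cannot modify the store.
    List<Job> listJobs() {
        return Collections.unmodifiableList(new ArrayList<>(store));
    }
}

// Service: what the controller talks to; delegates persistence to the repository.
final class JobService {
    private final JobRepository repository;

    JobService(JobRepository repository) { this.repository = repository; }

    void addJob(Job job) { repository.save(job); }

    List<Job> listJobs() { return repository.listJobs(); }
}

// Controller: answers a "request" by asking the service for the data to show.
final class JobController {
    private final JobService service;

    JobController(JobService service) { this.service = service; }

    // A Spring controller would add this list to the Model instead of returning it.
    List<Job> index() { return service.listJobs(); }
}
```

The point of the layering is that the controller never touches the repository directly, so the persistence mechanism can change without the controller noticing.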

Finally there’s the JSP which looks like this (just showing the table used for presenting the list of jobs, obviously there’s a little more than this):

I don’t think there’s really anything to explain here, assuming familiarity with JSTL.

Adding a form to allow the user to add new jobs

Mostly as an excuse to write a little JavaScript I wanted the form for adding jobs to be hidden until a user chose to add one. At that point it should be revealed, they could fill it out, submit and the page would refresh with the new job added to the list and the form hidden again.

To achieve this I ended up adding:

  • A form with appropriate fields and a submit button
  • A small JavaScript function to toggle the visibility of the form
  • A model attribute to transport the form’s values over to my controller
  • An addJob method to handle the form post

Here are two fragments from the JSP with the JavaScript and the form for adding jobs:

The JavaScript is completely unexciting as I just knocked out the first thing that came to mind.

More interesting is the use of the Spring form taglib (declared with <%@ taglib uri="http://www.springframework.org/tags/form" prefix="f"%>). The first key thing is the modelAttribute attribute of the <f:form> tag which marries up with the following in the JobController:

Through this we have an empty Job POJO ready and waiting to collect the values entered into the form.

The second key thing is that the path attributes of the <f:input> tags match up with the property accessors on the Job POJO.

Finally, here’s the method in the JobController to handle the form post:

This seems pretty straightforward. Note the @RequestMapping annotation indicating this should handle POSTs. The method simply takes the incoming Job and uses the JobService to persist it before redirecting to the jobs page (redirecting after a POST stops a browser refresh from resubmitting the form).

Thoughts...

Initially I was a little frustrated with Spring, since my starter app was clearly geared to be done in the latest and greatest style, with little to no configuration taking place in XML, whereas the vast majority of tutorials out there aren't trying to teach you that. Eventually though, after finding enough resources to get me going, I really quite liked how it shook out. There's a lot of power in those annotations, leading to very little cruft and pretty simple, readable code. Granted, I suppose there are a lot of older applications still heavy with dense XML and other complications, but if it's headed away from that I like it quite a lot.

Sunday, August 19, 2012

Reacquainting myself with Java web application development

After a few years in a technical management role, I am reacquainting myself with hands-on Java web application development. After all, nothing says "can do" like "have done" -- however there's an almost bewildering amount of choice when it comes to picking frameworks.

I decided I would start by looking at Spring, Tapestry, Wicket and maybe plain EE6 as that would cover the most popular setups people seem to be using. Additionally I have chosen to use NetBeans as my IDE (I always had a soft spot for NetBeans, and it's looking pretty good these days), Maven as my build tool and JPA/Hibernate for persistence.

I decided to build a web app for managing information about job opportunities. Something that might help keep one on top of things amid the phone calls from recruiters, talent acquisition managers and the various phone screens and interviews.

Minimally, this app would have a way to enter information about jobs, and review the list of jobs. This would all be on a single page. Beyond that there are a few other things I may add once I have the basics in the various different frameworks figured out including:

  • Making the form submit / addition of a job AJAX based to eliminate page refresh 
  • Editing job information 
  • Changing status of jobs (applied, phone screen, in person interview etc.) 
  • Bringing in jQuery and looking at some sexy transitions to reveal and hide the "add job" form, sortable table etc. 
  • Paging of the jobs table (granted I hope nobody needs to apply for so many jobs that paging is required…but interested to see how this pans out) 
  • An entirely gratuitous web service of some kind…

OK, so first up is Spring. A blog post on my experiences there will follow shortly.

Thursday, June 7, 2012

Explaining different types of testing


I was recently involved in a software release where some rather unfortunate defects made their way to production. This resulted in some (not unreasonable) tough questioning from our stakeholders.

One particular area they wanted to ask about was our testing practices. Were we not testing our product the same way their user community used it?

This highlighted an important disconnect in understanding the variety of types of testing that a responsible, professional team would usually employ to thoroughly test a product.

It was true that our testing wasn’t good enough for that release. But there was much more to it than simply “testing the software in the way users use it.”

In a bid to help them understand our weak areas (and what we were doing to remedy those) I went hunting for one of those “test pyramids” modeled on the same idea as the USDA food guide pyramid (http://www.nal.usda.gov/fnic/Fpyr/pmap.htm)

There are plenty out there, but I wanted to assemble my own, which is what you can see here:


If I were to do this again I’d probably separate the different types of testing in the lower blue section a bit more, to give each better emphasis, but you can get the gist of things from it as is.

As you can see, I’ve classified the testing appropriate for our product in quite a few different ways. Understanding what these all were, how they differed and why each kind was necessary was (it became apparent) difficult for our non-technical audience.

Reflecting upon this later, I came up with the following analogies that I hope help make these different kinds of test more understandable to non-developers (and even some developers...)

Why so many different kinds of test?

This too was a puzzle for some. The idea I came up with to explain this was that, just as a doctor may need a wide variety of tests to diagnose a patient, so too we need a variety of tests to check the "health" of our software.

The analogies below are all based around the idea of building cars. They might be a bit stretched, but I hope they're useful.

Unit tests

Inspecting each component in isolation. Confirm the building blocks are fit for use in the bigger system (car).

Feature/acceptance tests

Checking all features behave as expected, e.g. does it steer properly, doors and trunk open properly, seats adjust etc. 

Performance tests

Driving around a racetrack, time trials, checking how fast you can get from 0 to 60.

Stability and reliability tests

Driving the car non-stop until you've done 200,000+ miles without (too many/too severe) problems.

System tests

Driving around the neighborhood, highways, driving with passengers, kids etc. (i.e. using it the way real people would do)

User acceptance testing (UAT)

Getting the public to try it.

Exploratory testing

Driving around town, down the highways, mountain roads, 4WD roads, dirt roads, etc. You hope to find unexpected things.

Smoke testing

Giving each car that comes off the production line a quick drive.

Thursday, May 10, 2012

If you could point someone to just one link...

...as a quick summary of what agile is, which one would it be?

This is a question I just got asked. And I'm honestly not sure. It would depend a bit on who that "someone" was. But here's a very general definition for agile software development (ignoring the application of agile to other disciplines such as marketing) that I think might be OK.

Agile is an approach to developing software that:
  • Focuses on prioritizing the most valuable features first
  • Organizes work into short (typically 2 – 4 week) chunks of time known as iterations
  • Builds and tests one or more complete new features inside each iteration
  • Shows and discusses what they accomplished with the customer at the end of each iteration allowing regular feedback and guiding of the effort
  • Acknowledges that it is nearly impossible to figure out completely up front what people want from a software system, and that their opinions and business needs will change as they start to see it built. 
  • Relies on cross functional teams (developers, testers, whatever else is required) working as a self-organizing group to accomplish their goals. A good example of a self-organizing team outside software is a sports team, e.g. think how soccer players collaborate with the end objective of scoring goal(s).
  • Employs a number of technical disciplines to allow incremental development, with a particular emphasis on the importance of regularly integrating and testing code with as high a degree of automation as is sensible

I'm sure there are much better quick'n'simple overviews along the lines of "What is agile?", but it's surprising what you get if you google that. Based on the couple of top links I visited, the results weren't ideal as an explanation for a general audience.

Tuesday, April 24, 2012

Cloudy With Metrics

I’m working on a team that is responsible for a Windows desktop application that allows the display and manipulation of medical images. Medical images are just what you would imagine: x-rays, ultrasound, MRIs, computed tomography etc.

These images are large – hundreds of megabytes (more for ultrasound video) – and therein lies one of our big challenges: high performance for users everywhere. And many of our users are sat in Southern India or Europe, while our images are in a datacenter on the East Coast of America.

Now while we do have the luxury of nice big pipes connecting these locations, they inevitably suffer from fairly high latency (~300ms for India) due to the physical distance between them. That level of latency and TCP do not mix well.
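The reason is that a single TCP connection can have at most one window of unacknowledged data in flight per round trip, so throughput is capped at roughly window size divided by round-trip time, however fat the pipe. The small calculation below illustrates the cap. It is a sketch under assumptions: the 64 KiB window (no window scaling) is a textbook figure rather than a measurement from our network, the ~300 ms figure is treated as the round-trip time, and the class and method names are made up for the example.

```java
// Upper bound on single-connection TCP throughput: one window per round trip.
final class TcpThroughputBound {

    /** Maximum throughput in bits per second for a given window and round-trip time. */
    static double maxBitsPerSecond(int windowBytes, double rttSeconds) {
        if (windowBytes <= 0 || rttSeconds <= 0) {
            throw new IllegalArgumentException("window and RTT must be positive");
        }
        return windowBytes * 8.0 / rttSeconds;
    }

    public static void main(String[] args) {
        int windowBytes = 65_535;   // assumption: classic 64 KiB window, no window scaling
        double rttSeconds = 0.300;  // assumption: the ~300 ms India figure is round-trip time
        double mbps = maxBitsPerSecond(windowBytes, rttSeconds) / 1_000_000.0;
        System.out.printf("At most %.2f Mbit/s per connection%n", mbps);
    }
}
```

With those assumed numbers the ceiling works out to roughly 1.75 Mbit/s per connection, regardless of how much bandwidth the link itself offers.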

This leads to a very disappointing rate of throughput even when ample bandwidth is available. Improving this situation can be boiled down to the application of just three techniques:
  1. Move the data faster
  2. Move less data
  3. Move the data before you need it

For us, the option of moving the data before it was needed was already off the table. Pre-caching image data is an approach we had been using for years with a legacy product. Although it worked respectably well, it was far from transparent to end users. With some regularity, data that people expected to reach remote users had not done so in a timely fashion, and much gnashing of teeth and submitting of helpdesk tickets ensued. Caching was a four letter word, so to speak.

So, we focused on moving the data faster, and moving less of it to boot. Exactly how we did that is not really the focus of this post (but so as not to leave you hanging, scaling of images and UDP were key).

What this post is really about is how we made use of Amazon’s cloud-based SimpleDB as a super-simple means to capture in-the-field metrics on the performance and usage of our application.

In our earliest pilot phase we displayed time to load and throughput in the application’s status bar, and users could obviously relay that information to us (yeah, that was as unsatisfactory as it sounds). And of course it was equally possible to log things to a user’s machine, e.g. with Log4Net or using the Windows Event Log. But that approach makes access to, and consolidation of, the captured data a bit more involved than we wanted it to be. So although it’s not that we couldn’t capture this data without a cloud based solution, it’s clear that some single, centralized capture of this data was an appealing option.

One way we might have got a centralized means to capture this data would be to have built ourselves a small database, fronted it with a set of web services and used that. But the effort involved in getting that done and deployed is, as with many organizations, non-trivial and we were looking to slip this functionality in ASAP.

By comparison, Amazon’s SimpleDB is exactly that – simple.

It’s “schema-less”, so there’s no DDL to create tables and so forth. Instead, data is stored in domains (akin to tables; you do need to create these ahead of time). A domain holds items (just like rows), and each item is a set of attributes (a bit like columns, but better thought of as name-value pairs).
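To make the domain, item and attribute vocabulary concrete, here is a tiny in-memory analogy written in Java. It is not the AWS SDK and talks to nothing; the class, its methods and the sample names ("FeatureUtilization", "user", "feature") are all hypothetical. It only mirrors the shape of the data: domains must exist before use, values are strings, and an attribute name can hold more than one value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory analogy of the SimpleDB data model (not the real service or SDK).
final class SimpleDbModelSketch {
    // domain name -> item name -> attribute name -> values
    private final Map<String, Map<String, Map<String, List<String>>>> domains = new HashMap<>();

    // Domains have to be created before anything can be stored in them.
    void createDomain(String domain) {
        domains.putIfAbsent(domain, new HashMap<>());
    }

    // Adds one name-value pair to an item, creating the item on first use.
    void putAttribute(String domain, String itemName, String name, String value) {
        Map<String, Map<String, List<String>>> items = domains.get(domain);
        if (items == null) {
            throw new IllegalStateException("No such domain: " + domain);
        }
        items.computeIfAbsent(itemName, k -> new HashMap<>())
             .computeIfAbsent(name, k -> new ArrayList<>())
             .add(value);
    }

    // Returns the attributes of an item, or an empty map if the item is unknown.
    Map<String, List<String>> getAttributes(String domain, String itemName) {
        Map<String, Map<String, List<String>>> items = domains.get(domain);
        if (items == null) {
            throw new IllegalStateException("No such domain: " + domain);
        }
        Map<String, List<String>> item = items.get(itemName);
        return item == null ? new HashMap<String, List<String>>() : item;
    }

    public static void main(String[] args) {
        SimpleDbModelSketch db = new SimpleDbModelSketch();
        db.createDomain("FeatureUtilization");
        db.putAttribute("FeatureUtilization", "item-1", "user", "alice");
        db.putAttribute("FeatureUtilization", "item-1", "feature", "flip");
        System.out.println(db.getAttributes("FeatureUtilization", "item-1"));
    }
}
```

Note there is no schema anywhere: two items in the same domain are free to carry completely different attribute names.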

But enough English, here’s some C# to show just how easy it can be to stash some data into SimpleDB. It can in fact be even easier than this, since here I’m using the asynchronous approach to putting data in.

        // _db is the SimpleDB client, declared elsewhere in the class.
        private void LogMetricToCloud(IMetric metric)
        {
            // One domain per metric type, named after the metric's class.
            string domain = metric.GetType().FullName;
            CreateDomainIfNeeded(domain);

            try
            {
                // Each metric becomes a new item with a unique name.
                PutAttributesRequest request = new PutAttributesRequest()
                    .WithDomainName(domain)
                    .WithItemName(Guid.NewGuid().ToString());

                foreach (var attribute in metric.Attributes())
                {
                    request.WithAttribute(new ReplaceableAttribute()
                        .WithName(attribute.Name)
                        .WithValue(attribute.Value));
                }

                // The state travels with the async call so a failure can report what was lost.
                string state = domain + ":" + metric.CSVData();
                _db.BeginPutAttributes(request, new AsyncCallback(SimpleDBCallBack), state);
            }
            catch (Exception e)
            {
                TryLogError(e.Message);
            }
        }

        private void SimpleDBCallBack(IAsyncResult result)
        {
            string state = result.AsyncState as string;
            try
            {
                // If there was an error during the put attributes operation, it will be
                // thrown as part of the EndPutAttributes method.
                _db.EndPutAttributes(result);
            }
            catch (Exception e)
            {
                TryLogError(string.Format("Exception: {0}\nFailed to log: {1}", e.Message, state));
            }
        }


I think that what the code does and how is fairly self-evident, but that could just be because I’ve been working with it. Just to make sure it’s clear, here’s a breakdown:
  • I have a number of different classes of metric, all conforming to the IMetric interface
  • I have a domain (think: table) for each different kind of metric
  • As a concrete example, one of my metrics measures feature utilization; features are things like flipping and masking images. For feature utilization I record who, where, when, what project and what feature they used.
  • The LogMetricToCloud method simply iterates through all the attributes of a metric and creates a request to persist this in my SimpleDB instance.
  • The request is made asynchronously, with the creatively named SimpleDBCallBack method being invoked upon completion. The call here to EndPutAttributes will let us know if any problems occurred with the original request.

All of this seems to be working pretty well, and I hope that in the near future we can start to make even more use of SimpleDB. Next on the list for me is implementing some “feature toggle” type functionality for our application. With that we can entertain some of the ideas described in this great article by Jez Humble about core practices for continuous delivery.



Tuesday, June 7, 2011

DICOM non-technical introduction: mind map

Earlier today I went through the non-technical introduction to DICOM on the RSNA (Radiological Society of North America) website written by Steven C Horii, MD. As I was doing so I compiled a mind map using the excellent SimpleMind tool to help jog my memory on this stuff in the future.

Click the image below for the full size PNG or it's also available here as a PDF.


Monday, June 6, 2011

Understanding progress: on points, velocity and when to add new stories

Building software takes time. Usually enough time that people are interested in monitoring progress and understanding when it will be done. Agile teams often use User Stories as a unit of work. Typically these are estimated in points enabling a team to record their velocity, that is, the number of points completed per sprint.

With this data, teams have a simple means to show progress. Interested parties can follow along quite easily, seeing how each sprint eats away at the features in the backlog and how many points are left to complete the product. Using the team’s average velocity will give a nice indication of how many more sprints are required to complete the features in the backlog: points remaining ÷ average velocity = sprints remaining.
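The arithmetic is simple enough to sketch in a few lines of Java. The sprint figures below are invented for illustration, and rounding up to whole sprints is my own assumption about how one would report the result; the class and method names are hypothetical.

```java
// Forecasting sprints remaining from backlog size and average velocity.
final class SprintForecast {

    /** Average points completed per sprint over the team's history. */
    static double averageVelocity(int[] completedPointsPerSprint) {
        if (completedPointsPerSprint.length == 0) {
            throw new IllegalArgumentException("need at least one completed sprint");
        }
        int total = 0;
        for (int points : completedPointsPerSprint) {
            total += points;
        }
        return (double) total / completedPointsPerSprint.length;
    }

    /** points remaining / average velocity, rounded up to whole sprints. */
    static int sprintsRemaining(int pointsRemaining, double averageVelocity) {
        if (averageVelocity <= 0) {
            throw new IllegalArgumentException("velocity must be positive");
        }
        return (int) Math.ceil(pointsRemaining / averageVelocity);
    }

    public static void main(String[] args) {
        int[] history = {18, 22, 20};   // made-up points completed in past sprints
        int pointsRemaining = 130;      // made-up backlog size
        double velocity = averageVelocity(history);
        System.out.println("Average velocity: " + velocity);
        System.out.println("Sprints remaining: " + sprintsRemaining(pointsRemaining, velocity));
    }
}
```

With an average velocity of 20 and 130 points left, the division gives 6.5, which rounds up to 7 sprints.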

There is one wrinkle to this otherwise simple scheme. As anyone who has developed software can tell you, the devil is in the detail. Something that initially looks simple can end up being more involved than originally estimated. Case in point: my team recently had a story like the following for a desktop client image processing product they are building:
 
     As an imaging assistant
     I want to be able to open JPG images
     So that I can process them

The story seemed to be straightforward and was implemented easily and quickly. All was well until we tried some particularly large JPG files. At this point things blew up with out of memory errors and the like.

So was handling large JPGs a new feature and thus a new story? Or was it obviously part and parcel of the original? In other words, the question is how do you deal with this and still report easily understandable progress?

The short answer is it doesn’t really matter. You can add new stories to the product to cover the new work that you discover. Or you can just do the work that was implied but not necessarily obvious from the original story. The formula for figuring out how many sprints remain still works either way. Taking the former approach your velocity is likely to remain fairly stable. Taking the latter approach you may see it bounce around a little bit more, or see it dip from historic levels if the team had a velocity established through more predictable types of work (e.g. maintenance on a well known product.)

The longer answer is that each approach has its own pros and cons. Depending on your situation one may be better than the other.

Adding a story, pros:
  • Stakeholders can see the amount of work we originally imagined was involved has grown and have more chance to comment on the necessity of these items – perhaps the complexities and edge cases discovered aren’t that valuable.
  • It probably shouldn't matter, but velocity remains stable and nobody feels the need to cross-examine the team and ask "Why has your velocity dropped?" Done this way velocity may even serve as a crude indicator of team performance and improvements can be seen in higher velocity.
Adding a story, cons:
  • It probably shouldn't matter, but there's likely a class of stakeholder that will question why all this "extra" work is emerging and ask how come we failed to identify it in the first place.
  • If the team has the pleasure of using a computerized agile lifecycle management tool then each additional story is another thing to enter, estimate, prioritize, track and update status on etc.
  • The number of points to complete the originally envisaged release keeps growing making some people anxious: "We don't know how much work is left to do, how can you ever hope to predict when it will be done?"
Not adding a story, pros:
  • Nothing extra to track (though we probably need to clarify the acceptance criteria of the original story)
  • Number of points to complete original feature set for the release remains the same (unless we discover a need for genuinely new features)
Not adding a story, cons:
  • The potential roasting of the team: "Why is your velocity erratic/dropping?"
  • The team might miss an opportunity to push low value work down the backlog.

Personally I'm most strongly drawn to the idea of not adding in extra stories. I like the simplicity and minimalism of this. The more items there are in the backlog the harder it is to grok the thing as a whole and more busywork goes into managing it all. I don’t think the “pros” of adding in extra stories are powerful enough to make it the preferred approach. And although there is the potential “con” of the team getting questioned about why their velocity is erratic or dropping, I believe this can be explained quickly in simple terms. I also think it’s a lot more straightforward, even comforting, for stakeholders to see that the size of the backlog remains fairly stable unless things they too understand as new work (features) get added.