Wednesday, December 10, 2014

I was on call for a week...and it didn't entirely suck!

Yup, it didn't suck. In fact it was actually pretty good -- I'd go so far as to call it kind of exciting and certainly educational. If you've been (or imagined being) on 24/7 call for a week for a high traffic internet service and think I sound insane, allow me to explain...

Earlier this year I started a new job as a software engineer at Sovrn in Boulder, CO.
“sovrn is an advocate of and partner to almost 20,000 publishers across the independent web, representing more than a million sites, who use our tools, services and analytics to grow their audience, engage their readers, and monetize their site.”
In plainer English this means we offer technology so websites can display ads and earn money. Although I personally think it would be great if more content-based websites could sustain themselves through contributions from their visitors, as Wikipedia and Brainpickings do, the reality is that advertising is a more pragmatic revenue model for many website operators. In fact, advertising revenue helps power much of the internet you and I use every day.

We at sovrn work at serious scale: according to Quantcast, in October of 2014 sovrn ranked as the 4th largest ad network in the world and the 3rd largest in the US.

This means we move billions of records all day, every day, from multiple datacenters to a central collection point for processing and analysis. This constant flow of data has to cope with the inevitable network and hardware issues that arise, and it ultimately ends up in warehouse and near-realtime reporting data stores.

Interruptions to any part of the system can cause a variety of issues, from delayed data through capacity challenges to loss of revenue for our customers. One facet of maximizing uptime and dealing with service interruptions is a sophisticated monitoring and alerting system that functions continually, necessitating on call engineers with software development, data management and enterprise IT skill sets.

My first exposure to being on call recently ended and I really did enjoy it. Although serving adverts might sound straightforward, there's a fascinating degree of sophistication involved, and when doing it at scale the problems only get more interesting.

Up until my on call week my understanding of the "big picture" of our operations was limited, having focused primarily on the needs of the team I'm part of. One of the great things about my on call experience was how it gave me a much greater appreciation for how everything fits together, and some exposure to the operational aspects of the data processing pipeline and big data toolset we employ at sovrn. (For the curious, we're using Kafka, Mirrormaker, Zookeeper, Camus, Storm, Cassandra, Hadoop and more.)

Besides getting to see all of this stuff hum along in production, there's a definite air of excitement to dealing with an incident. We use a small set of tools to manage our on call duties including Icinga (for system monitoring), VictorOps (for managing on-call schedules and messaging on-call engineers), HipChat (we use a dedicated channel for production issues, which helps keep all interested parties informed and allows multiple participants to work an incident collaboratively) as well as a wiki for knowledge-base articles.

I've worked in jobs before where the software engineers didn't get anywhere near production -- primarily due to regulatory considerations necessitating a strong separation between development and operations. Although those separations may help address fraud and other similar concerns, they inhibit other very positive things besides the excitement and "big picture" comprehension I've already mentioned.

First, there's a definite camaraderie that emerges from trying to figure out what's going on when you're getting alert after alert one evening on a weekend and have to ask colleagues to help. This necessitates a level of communication and cooperation across teams that might not otherwise happen all that often and is definitely a very positive thing.

Secondly, seeing how your code responds in production is a phenomenal feedback loop for software engineers. You have a lot more skin in the game when you and your colleagues will be receiving alerts for failing systems. Suddenly great logging and debugging characteristics are first class concerns. Nothing will focus the mind on the need for writing high quality, easy to support code quite like this.

Hopefully now that explains my viewpoint and you no longer think I'm completely mad...

Wednesday, June 11, 2014

Oh CenturyLink, thou art a source of much mirth...

Had some trouble getting a connection over VPN for my new job. This resulted in much to-ing and fro-ing with their elite "Internet Escalation Team" (like the Apple store's Genius Bar with less genius, if you see what I mean...)

This exchange yielded several hilarious gems, not least of which:

Honestly Jon, you are FAR more the expert than any of us here, we have no training except to help customers find the application forwarding section in the modem under the advanced setup.

We don't have folks “in the know” about VPN stuff. Compare it to the car dealer not really having a way to trouble shoot an “aftermarket” NOS kit someone installed on a car. Like how a realtor can't troubleshoot the new outdoor pool construction being done on a house they sold, that didn't have one at the time of sale. Does your helpdesk have a specific item to address? I don't think we can help you with your customization of the service. VPN is beyond our knowledge.

Thank goodness I had some idea what I was doing and was able to figure out that the problem was the modem I had. After proving that my VPN connection could be established from the house of a neighbor who also used CenturyLink as their ISP, I managed to get them to send me a replacement and all was well.

Friday, May 30, 2014

Note to self: bash and git setup

Some notes on my bash and git setup

As much as I like IntelliJ and its git integration, it makes me feel like I'm just dealing with the tip of the git-iceberg. So I like to try and use the command line as much as possible as a way to increase my understanding of git and all the mad-clever things it can do. That's led me to want command line completion (including for git-flow; and if you're not using git-flow, why not? ;-), visibility of which branch I'm on in my prompt, and a few aliases and other configuration items to make things easier. It's actually not all that much but it's made a big difference to me.

Here's what the salient parts of my ~/.bash_profile and ~/.gitconfig look like:

The git ls alias is a succinct answer to, "What's been committed lately?" at a high level. It simply lists the hash, how long ago it was committed, by whom and the commit message. I find it useful with --no-merges (which unsurprisingly omits merge commits from the output) and --author=<author> to limit things to just my work. It helps you answer the questions, "When did I commit feature X?" and "What did I do lately?" The git ll variation gives a little more detail by listing the files changed along with an indication of the type and size of change. Useful when the commit message alone doesn't help me answer "What did I do lately?" ;-)
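
As a rough sketch, aliases matching that description could look something like the following in ~/.gitconfig. The format string and the --stat flag are assumptions chosen to fit the description above (hash, relative date, author, subject, plus per-file change sizes), not necessarily the exact definitions I use:

```ini
[alias]
    # %h = abbreviated hash, %ar = relative author date,
    # %an = author name, %s = commit subject
    ls = log --pretty=format:'%h %ar %an %s'
    # Same summary line, followed by the files changed and the size of each change
    ll = log --pretty=format:'%h %ar %an %s' --stat
```

With something like that in place, `git ls --no-merges --author=<author>` narrows the list to one person's non-merge commits.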

Spending more time in a console made me want to be more adept at editing commands there too; I had for years made do with Home and End and the arrow keys. I even increased my key repeat rate to speed up moving back to correct errors with the left-arrow key. Now I've added the couple of things I really needed to improve things:
  • Enabling vi mode; I knew about this, but hadn't found myself on the command line quite enough to care about it until recently. (Vi is another thing I like to "make" myself use, because proficiency in it just seems to really pay dividends.)
  • Figuring out how to navigate back and forward one word at a time while editing a command -- crucial for crazy long commands
All that's required is a few lines in your ~/.inputrc file:
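
A minimal sketch of what such a ~/.inputrc might contain follows. The two escape sequences are an assumption: they are what many xterm-style terminals send for Ctrl-Left and Ctrl-Right, but your terminal may send something different (pressing Ctrl-V followed by the key at a prompt shows what it actually sends):

```ini
# Use vi-style line editing in readline-based programs such as bash
set editing-mode vi

# Apply the bindings below while in vi insert mode
set keymap vi-insert

# Assumed key sequences for Ctrl-Left / Ctrl-Right; adjust for your terminal
"\e[1;5D": backward-word
"\e[1;5C": forward-word
```

New shells pick the file up when they start. In vi command mode, b and w already move back and forward a word at a time.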

Monday, May 19, 2014

GREP_OPTIONS='--color=always' is evil

Changing jobs and finding myself doing a lot more in the shell has had me tweaking my working environment a fair bit lately. Part of this involved wanting to highlight matches when using grep since I found myself grepping more in the last few weeks than the entire year prior to that.

The most pedestrian way of doing this is to invoke grep with the --color option, which according to man grep may be set to "never", "always" or "auto". What it fails to point out is the important difference between "always" and "auto" (more on which later), but having experimented briefly with "always" and seen what I expected, I proceeded with the assumption that it was a perfectly reasonable way to do things.

The prospect of typing --color=always all the time was not appealing however, and so after a quick search I hit upon the fact that one can set the environment variable GREP_OPTIONS to contain options to be passed when grep is invoked. My .bash_profile was swiftly modified to include export GREP_OPTIONS='--color=always'. And everything was good.

Except not really.

Later that week I was experimenting with some things from an old BASH book I had hanging around the house for years but never really dug into. One of said things was a function to create an ls command that filtered by date. Call it lsd (the book did) and it'd work thus:

> lsd 'may 11'
May 11 10:01 foo-report
May 11 10:42 cats.db

Except the weird part is that it didn't quite work right for me. The function was defined like this:

ls -l | grep -i "^.\{38\}$date" | cut -c39-

For the uninitiated, that cut command is saying to cut the output from column 39 on through to the end of the line, the idea being to transform the regular output from ls -l which looks like this:

-rw-r--r--  1 jarcher  530219169   76 May 11 10:01 foo-report
-rwxr-xr-x  1 jarcher  530219169  127 May 11 10:42 cats.db

to the shortened date/filename form, column 39 being where the date part starts. What was weird though was that my output didn't seem to be cutting in the right place. I found I had to cut further to the right, which at the time puzzled me, but not enough to spend any cycles thinking about why this might be.
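
The column arithmetic can be checked against the two sample lines shown earlier. This is only a sketch: the variable names sample, date and result are illustrative, and it assumes grep writes plain text into the pipe.

```shell
# Two sample lines in `ls -l` format, copied from the listing above.
sample='-rw-r--r--  1 jarcher  530219169   76 May 11 10:01 foo-report
-rwxr-xr-x  1 jarcher  530219169  127 May 11 10:42 cats.db'

date='may 11'   # illustrative search term; -i makes the match case-insensitive

# Skip exactly 38 characters, require the date to start at column 39,
# then keep everything from column 39 to the end of the line.
result=$(printf '%s\n' "$sample" | grep -i "^.\{38\}$date" | cut -c39-)
printf '%s\n' "$result"
```

With plain input like this, both lines come back trimmed to the date, time and filename.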

The following week I was tooling around with git flow since it seems like feature branches are the order of the day at the new gig (and doing it all through IntelliJ makes me feel like a charlatan). It seems pretty damn good too, although there was a rather obnoxious amount of typing involved such as git flow feature start foo-feature just to get working on a new feature. I suspected some command line completion magick was available and indeed it is.

Here's when things got weird, although I had no clue this was related to --color=always at this point. It seemed as though, suddenly, my git command completion was hosed, big time. One or two things worked, but much did not. Typing git followed by some double-tab action to show possible completions revealed the reason why:

jarcher-mbp15r:bash $ git
Display all 106 possibilities? (y or n)
^[[01;31m^[[K                  flow
a^[[m^[[Kdd                    g^[[m^[[Kc
a^[[m^[[Km                     g^[[m^[[Ket-tar-commit-id
a^[[m^[[Knnotate               g^[[m^[[Krep
a^[[m^[[Kpply                  g^[[m^[[Kui
a^[[m^[[Krchimport             h^[[m^[[Kash-object

Yup, almost all the subcommands were funky like this. No wonder I couldn't choose between checkout and cherry-pick. Neither of them began with ch anymore...

Believing git flow completion to be the culprit, I googled along those lines. Luckily I turned up one (and only one!) page of interest, where somebody else reported the same symptoms, and a day later that they'd determined the following line in their bash configuration was the culprit: alias egrep='egrep --color=always'

Alarm bells rang. I remembered that I'd recently set things up so my grep command would be invoked with --color=always; I even had this vague recollection that I'd read always meant that when the output was piped to another command the color-inducing control characters would be passed along too. By contrast, auto would only include those color control characters when the results of grepping were destined for immediate output on screen.
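
The difference is easy to see by counting the bytes grep writes into a pipe under each setting. A small sketch, assuming a grep that supports --color (GNU and BSD grep both do); the variable names are just illustrative:

```shell
# The input "checkout" plus a newline is 9 bytes.
# With auto, grep sees that stdout is a pipe and adds nothing.
plain=$(printf 'checkout\n' | grep --color=auto 'ch' | wc -c | tr -d ' ')

# With always, the escape sequences around the match go down the pipe too.
coloured=$(printf 'checkout\n' | grep --color=always 'ch' | wc -c | tr -d ' ')

echo "auto:   $plain bytes"
echo "always: $coloured bytes"
```

Anything downstream that expects the text to begin with ch, as a completion script might, no longer finds it there in the always case.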

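I unset my GREP_OPTIONS variable et voilà, suddenly it all worked as it should. A quick look at the git completion script confirmed that grep is used in there, explaining to my satisfaction why --color=always was screwing things up. It most likely accounts for the misbehaving lsd function too: the escape sequences grep added at the start of each matching line would have pushed the date past column 39.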
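
One way to keep the highlighting without the trap is sketched below. It is a suggestion rather than a record of exactly what ended up in my .bash_profile. It also sidesteps GREP_OPTIONS altogether, which newer GNU grep releases treat as obsolete.

```shell
# In ~/.bash_profile: drop the global option...
unset GREP_OPTIONS

# ...and ask for colour only when the output is going to a terminal.
alias grep='grep --color=auto'
alias egrep='egrep --color=auto'
```

With auto, interactive searches are still highlighted, while anything that pipes grep's output elsewhere receives plain text.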

Since this was such an evil little trap I thought it was worth blogging about. Maybe it'll save somebody else from suffering this confusing problem.