Not long ago, my mother was looking into buying an iPad. Despite the fact that I don’t own a tablet of any kind, she had some questions to ask me about it. This wasn’t all that surprising; I end up getting tech questions about all sorts of things from family and friends who assume I’ve done some amount of research on All Things Tech (and, generally, this assumption is correct). What was at least moderately interesting to me is that all of her questions were centered around the iPad itself: “How much storage should I get?”, “3G or no 3G?”, “Should I get this or that accessory with it?” At no point did she ask me “Would I be better off buying a different kind of tablet?” or even “Should I be buying this?”
…so I took the liberty of presenting these questions to her as the ones she probably should be asking.
For the former – “Should I be looking at the Xoom, TouchPad, or any one of the other tablets available on the market today?” – my answer (for my mother) was “No.” She shrewdly flipped this question around on me: “What would you buy?” My laughing reply: “Not an iPad.” Well, she was kind of taken aback at this, so I trotted out the old apples and oranges metaphor. I don’t have a strong use case for buying a tablet other than “They’re kinda cool” so I can’t really say what features I would be looking for in one, but I know what I don’t want: overpriced proprietary hardware (right down to the charging connector), a proprietary OS and software that I can’t fiddle with, and an app store over which Apple has complete and total control. For my mother, the use case is similar – i.e., “They’re kinda cool” – but feature-wise I have a hunch that none of the above matter all that much to her. She just wants it to look sleek, work well, and integrate with all of the accessories she was also planning on getting. Well, it’s hard to beat the iPad on that front.
My mother doesn’t want a utilitarian piece of hardware to play around with, one that’s easy to get under the hood and work on. She doesn’t want a Chevy or a Ford. She wants a Mercedes. And in many ways, that’s what the iPad is: a luxury product for those who have the disposable income to be able to afford it. In fact, I would argue that all of Apple’s products are luxury products. Why else would a MacBook cost $1,200 when I can get nearly identical – even superior! – hardware in a PC for under $500? Because you’re paying a premium for that Apple symbol on the lid.
This brings us to the second question: Should she buy it? Well, that question is kind of moot at this point; she already did. …and I don’t really see anything wrong with that. She had a picture in her mind of what she wanted, and that’s what she got. I wouldn’t go so far as to call her a fanboy – I’m pretty sure this is the only Apple product she owns – but she definitely, at least this one time, caught “Apple Fever” and I don’t think she would’ve been satisfied with anything else. Had she bought the Chevy, all she would’ve thought about while driving it is “How much nicer would this have been if I were driving that Mercedes I was looking at?”
What’s my point? Well, I don’t know that I really had one to begin with, but if I did it’s probably something like this: If you’re going to buy something like this, try to have a well-defined use case. If you can’t do that, at least go into it with your eyes open as to what your motivations are for deciding on this technology or that. If you can’t do that…well…fuck it, it’s your money I guess; spend it however you like…but don’t come to me when your Mercedes breaks down; I don’t have the know-how or the equipment to work on it, and you’re just gonna have to take it to the dealership.
Well, I got the “itch” once again to get the ol’ website up-and-running: cliftonsnyder.net. I’ll be using it mostly just as a place to tinker around and post code that someone (hopefully) might find useful. Check it out!
One of the things that I found really interesting at Velocity 2010 was the prevalence of the use of continuous deployment. I know I’ve mentioned the Facebook operations talk previously, but it’s worth mentioning again as a good example of this. In it, Tom Cook – a Facebook engineer (sorry, couldn’t find a link for Tom) – talks about deploying code at least daily, with feature releases once a week. This flies in the face of the “deploy every 2-3 months” model that I’m familiar with. It also requires significantly more developer involvement, with the developer doing the actual deploy and sticking around to support it rather than throwing it over the wall to ops to put in place once the QA cycle is complete.
So, how is this accomplished? Well, without getting into the technical details of the tools they use (watch the video! really!), it essentially demonstrates a completely different culture than a “quarterly installs” sort of model. Obviously, this sort of thing can’t work in a “get every level of management to authorize the install in triplicate” shop. It requires a DevOps-y sort of environment where there is a tight integration between the folks who know the code and the folks who understand the systems it’s running on. It requires what I heard referred to at the conference as a strong “immune system” – basically, a set of tools (change management, anyone?) and a communication structure that affords a high degree of confidence that a particular install is (a) unlikely to break anything, and (b) can be rolled back quickly with minimal impact if it goes haywire.
I was a bit skeptical of this sort of thing at first, but John Allspaw said something in his Ops Meta-Metrics session that really resonated with me. He said (paraphrasing): “As an ‘ops guy’, I prefer smaller changes more often to big changes less often. Taken to its extreme, consider this: what if the change is only 5 lines of code? Does that feel safer? …because it should.” A light turned on inside my head when I heard that. It’s not about deploying fast “because we can”; it’s about deploying fast because it’s the safest thing to do.
Another interesting thing about this is the sorts of deployment models that can be used to mitigate impact if a 5-line code change does happen to break something. One of the most prevalent: not deploying code changes to all servers at once. Why not deploy it on a handful of servers – or on every server, but with the feature/change/bug-fix only “turned on” for a handful of users? In essence, why not use a relatively small portion of your userbase as unwitting beta testers for your change? Paul Hammond gave some interesting examples of how to handle this sort of deployment inside the code itself in his Always Ship Trunk session.
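To make the “turned on for a handful of users” idea concrete, here’s a rough sketch of a percentage-based feature flag (my own toy illustration, not taken from Paul’s talk; the function and feature names are made up):

```python
import hashlib

def feature_enabled(feature_name, user_id, rollout_percent):
    """Deterministically enable a feature for a fixed slice of users.

    Hashing (feature, user) together means each user gets a stable
    yes/no answer between requests, and different features roll out
    to different slices of the userbase.
    """
    digest = hashlib.md5(f"{feature_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # a stable bucket in 0-99
    return bucket < rollout_percent

# Serve the new code path to ~5% of users; everyone else gets the old one.
if feature_enabled("new_search", user_id=12345, rollout_percent=5):
    pass  # new code path here
else:
    pass  # old code path here
```

The nice part of hashing rather than random sampling is that a given user is consistently in or out of the test group, so breakage is both contained and reproducible.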
Whether Instrumentation & Metrics was the focus of the talk or just a portion of what was covered by the speaker, two major rules of thumb seemed to present themselves:
“Instrument everything.” Collect as much data about as many things as you can. If it reports, collect it and store it. If it doesn’t report, make it report. And store it. Make sure you keep around as much historical data as you feasibly can. “But Cliff,” I hear you asking, “won’t you just end up with a whole pile of data that doesn’t really mean anything and just takes up disk space?” Well, read on, because that brings me to the second rule of thumb:
“Data ain’t information!” (Direct quote from a talk on modeling and metrics). So…what does it mean? Well, a couple of things. One of the speakers who gave the presentation linked above would have you believe that data + a model is information. Modeling is critical in that it may allow you to extrapolate information from data points that would be otherwise meaningless. Note that I said may; as this presenter noted, “Data is from the devil, models are from God,” in a nod to the fact that real-world data rarely adheres to the nice, uniform curve generated by the model.
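Here’s a toy example of the “data + model = information” idea (entirely my own illustration, not from the talk): a pile of disk-usage samples is just data, but fitting even a dumb linear model to it yields something actionable, like when the disk will fill up.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# (day, GB used) -- raw data; on its own it tells you nothing.
samples = [(0, 100), (1, 102), (2, 105), (3, 106), (4, 109)]
slope, intercept = fit_line(samples)

# The model turns it into information: days until a 500 GB disk fills.
days_left = (500 - intercept) / slope
```

And, per the “data is from the devil” quip, the residuals between the real samples and that nice straight line are exactly where the model will lie to you.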
The other piece to this is an emphasis on the importance of visualization – i.e., understanding key metrics and how to display them such that interesting/important trends are elucidated. Some examples of this were given in the Ops Meta-Metrics talk, in which John Allspaw demonstrated that code installs and service downtime don’t always have a 1-to-1 correlation…but you will never know that if you don’t track both of those metrics and understand how important it is to compare them over time.
As a side note on metric monitoring, one of the really cool tools people were talking about at the conference was cucumber-nagios, a monitoring tool that allows you to specify configs in natural language. Slick!
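For a taste of what that looks like, a cucumber-nagios check is written as a Gherkin feature, something like the snippet below (this is from memory, so treat the exact step wording as an approximation and check the project’s docs):

```gherkin
Feature: example.com
  It should be up

  Scenario: Visiting the home page
    When I go to "http://example.com/"
    Then the request should succeed
```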
Change Management was another huge theme at the conference. I actually heard more than one speaker say something to the effect of “If you don’t have change management in place in some form, you should leave the conference right now, put it in place, and then come back.” Developers (almost) always have some manner of change management in place – source control, peer review, approval processes, release schedules, etc. – in order to…errm…manage changes to the codebase on which they are working. …but what about the systems – the machines on which that code is running? …and why is that important? Why isn’t it okay to have a sysadmin fire up emacs, slam in a config change, and send out an email saying “all’s well”?
Well…let me give a non-tech example here. Let’s say you took your car to the mechanic. He takes a look-see at it, sort of twiddles about a bit, and hits you with a bill for a few hundred bucks. (I realize that this is almost exactly how most folks’ visits to the mechanic go, but bear with me here). Now suppose he can’t tell you exactly what he did or document it in any way, but it’s running so “we’re good, right?” Oh…and suppose he tells you that your car may or may not start the next time you turn the key; “Just bring ‘er on back and we’ll have another look!” How comfortable would you be with the arrangement?
Okay, so the car analogy doesn’t carry over so well (and is rather unfair to sysadmins, I might add)…but I can tell you that I’ve seen this sort of thing happen in the datacenter many a time. As a Linux admin, I was terrified of rebooting machines, largely due to the inverse relationship between the uptime of a system and the likelihood of it actually coming back up correctly after a reboot. Having a change management tool like Chef (very cool; demoed at the conference), cfengine, puppet – pick your poison – backed by version control (of course) is a means of raising the level of confidence about changes being made to systems.
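The core idea behind all of these tools is the same: describe the state you want the machine to be in, check that description into version control, and let the tool converge the machine to it. As a rough illustration (Puppet syntax; the ntp example is my own, not from the conference):

```puppet
# Desired state lives in version control, not in someone's memory
# of what they typed into emacs at 2am.
package { 'ntp':
  ensure => installed,
}

file { '/etc/ntp.conf':
  ensure  => file,
  source  => 'puppet:///modules/ntp/ntp.conf',
  require => Package['ntp'],
}

service { 'ntp':
  ensure    => running,
  enable    => true,
  subscribe => File['/etc/ntp.conf'],  # restart on config change
}
```

Because the manifest is versioned, “what changed on this box, and when?” becomes a question for `git log` rather than for whichever admin happens to remember.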
Oh, and how about auditing? Suppose your method for determining what’s going on with your systems is to walk around to all the sysadmins who might have touched Machine 7 of 3,956 and ask politely “Have you changed anything on this sum’bitch in the past decade?” Repeatability? “Can you build Machine 3,957 and make it look just like Machine 7?” CYA during a postmortem witch hunt? “Prove to me that you didn’t have a hand in bringing down our production database this afternoon.” These are just a few of the reasons for implementing change management – and a good change management system goes a step beyond the typical “get all of management to sign off on the change before putting it in” approach.
DevOps – summarised reasonably concisely here – boils down to “tighter integration between devs and ops”. For those of you not in a technical field, pay special attention to the “siloisation” section; I’d wager that anyone who works for a large company in any industry has seen this sort of “us vs. them” mentality between departments/divisions. The idea behind DevOps is to foster more of a “we’re all on the same team” sort of mindset.
I’d go so far as to say that DevOps was the theme at the conference. The entire three days were essentially a pep rally designed to promote making things “fast by default” not only by using whiz-bang technologies and tweaking your code, but also by culture change within and among IT organizations. (Note: DevOps Day – which I was unable to attend – took place on the Friday after Velocity 2010 in Mountain View and was mentioned several times by the presenters.)
Okay, so I’d meant to do a day-by-day breakdown of Velocity 2010, but [insert lame excuse here], so…I didn’t. However, now that I’ve had a week or so to “let it simmer”, I’d like to sum up a few of the major themes and undercurrents from the conference. Note that the conference was roughly divided into three flavors of discussion – “Ops”, “Web Performance”, and “Culture”. Of course, being an “ops guy” I focused on the ops-related sessions and tried to fit in as much of the “culture” as I could. (A Day in the Life of Facebook Operations was one of the best talks given, imo, and really touches on a lot of the themes I’m about to talk about below. If you watch no other video from Velocity 2010, watch this one.)
I was going to sum all of it up in one post, but I decided to break it out by theme rather than taking the “wall of text” approach. Hopefully, I’ll get all of it posted in the next couple of days here.
- Change Management
- Instrumentation & Metrics
- Continuous Deployment
- K-V Stores, memcached, etc.