Repeated Capturing and Parsing in Perl

When I checked my email after arriving at the office today, I found a query that had been sent to our internal Perl mail list. The questioner was trying to match a pattern repeatedly, capturing all of the results in an array. But, it wasn’t doing quite what he expected. The message, with minor edits, went a little something like the following.

I’m trying to extract key/value pairs from a file with the following contents:

- name = gcc_xo_src_clk, type = rcg
+ name = cxo_clk, type = xo, fgroup = xo, wt = 10, bloo = blah
? type = hm_mnd_rcg, name = bo : type = rcg_mn
+ name = pxo_clk

I was hoping to do something like this:

@list = $_ =~ m{ ^[-+?] \s* (\S+) \s* = \s* (\S+) \s* (?:, \s* (\S+) \s* = \s* (\S+) \s*)* }xms;

Thinking @list would be assigned the alternating key/value pairs. But the above doesn’t extract anything sane. Adding the /gc modifiers doesn’t make any difference.

If I do the following, it extracts the first two key/value pairs correctly (if the line has more than one pair).

@list = $_ =~ m{
    ^[-+?] \s* (\S+) \s* = \s* (\S+) \s*
    , \s* (\S+) \s* = \s* (\S+) \s*
}xms;

If I keep repeating the pattern in the second line, it keeps matching more key/value pairs.

I would expect using (?: )* should mean zero or more instances of match inside the parentheses, but obviously it’s not working. What am I doing wrong?

When I’m presented with a problem like this, involving some kind of structured data, I immediately think of writing a parser. I’ll get back to that in a bit, but first I want to address the confusion about capturing in the pattern. In fact, that’s how the discussion on the mail list proceeded.

Repeated Capturing

First, let’s simplify the example to demonstrate why our seeker of wisdom isn’t getting back the list of items he expected.

my @matches = 'a b c d e' =~ /^(a) \s* (?: ([bcde]) \s* )*/xms;

say "(@matches)";   # prints "(a e)"

Capturing parentheses in Perl are treated somewhat like registers. Most Perl programmers are familiar with the $n variables, which hold the substrings captured by a successful pattern match. For example, $1 holds the text matched by the first set of parentheses, $2 holds the text matched by the second set, and so on.

When a pattern is matched in list context, as above, it’s effectively the same as writing,

'a b c d e' =~ /^(a) \s* (?: ([bcde]) \s* )*/xms;

my @matches = ( $1, $2 );

These pattern match variables are scalars and, as such, can hold only a single value: whatever the corresponding capturing parentheses matched last. So, in our simplified example, $1 holds a, which is obvious enough. As the group repeats, $2 is set to b, then c, and so on until the final match of e.
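You can watch the overwriting happen by recording what $2 holds each time the repeated group matches. The sketch below uses an embedded code block, (?{ ... }), which Perl runs at that point in the match. The @seen and $last names are only for illustration.

```perl
use strict;
use warnings;

my @seen;

# The (?{ ... }) block runs right after ([bcde]) matches on each
# repetition, so it sees $2 before the next repetition overwrites it.
'a b c d e' =~ /^(a) \s* (?: ([bcde]) (?{ push @seen, $2 }) \s* )*/x;

# Once the match is over, only the last value of $2 remains.
my $last = $2;

print "seen: @seen\n";
print "last: $last\n";
```

Every intermediate value was captured at some point, but only e survives into the list returned by the match.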

That explains why the pattern match wasn’t returning the expected list. What can be done about it?

Capturing Along the Way

If we break down the sample data, we see that it generalizes to,

prefix key = value[, ...] [: key = value[, ...]]

The first approach that came to mind is to split the data into multiple lines. Each line can then have its initial prefix removed and saved, then parsed for its key/value pairs. That’s starting to look a lot like parsing, which I promised to get to later. For the purposes of this discussion, I wanted to be able to accomplish the task with a single regular expression.
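For comparison, here is one way to sketch the line-by-line approach I set aside. It assumes the data arrives as a single string. The @records layout, with a prefix and separate hashes for the pairs before and after a colon, is my own hypothetical choice.

```perl
use strict;
use warnings;

# Sample data from the question.
my $string = <<'END';
- name = gcc_xo_src_clk, type = rcg
+ name = cxo_clk, type = xo, fgroup = xo, wt = 10, bloo = blah
? type = hm_mnd_rcg, name = bo : type = rcg_mn
+ name = pxo_clk
END

my @records;    # hypothetical: one hash per line

for my $line ( split /\n/, $string ) {
    # Remove and save the prefix; skip any line without one.
    my ( $prefix, $rest ) = $line =~ /\A\s*([-+?])\s*(.*)\z/ or next;

    # Split off the optional ": key = value, ..." section.
    my ( $main, $extra ) = split /\s*:\s*/, $rest, 2;

    # In list context, //g returns key, value, key, value, ...
    my %pairs   = $main =~ /(\w+)\s*=\s*(\w+)/g;
    my %options = defined $extra ? ( $extra =~ /(\w+)\s*=\s*(\w+)/g ) : ();

    push @records, { prefix => $prefix, pairs => \%pairs, options => \%options };
}

for my $r (@records) {
    my @kv = map { $_ . '=' . $r->{pairs}{$_} } sort keys %{ $r->{pairs} };
    print "$r->{prefix} @kv\n";
}
```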

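To capture all of the values we want, we need to remove the repeating set of non-capturing parentheses. However, we still need to repeat the match, ideally returning all of the captured values in one statement. The /g regular expression modifier handles that: in list context, m//g keeps matching and returns the captures from every match as one flat list. (The /c modifier, which I also tacked on, only keeps pos() from being reset when the final match attempt fails, so it isn’t strictly needed here.)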

my @list = $string =~ m{ ([-+?,:]) \s* (\w+) \s* = \s* (\w+) \s* }xmsgc;

I’ve done two things here. First, I replaced the \S character classes, used to match the key and value, with \w. The + quantifier in a Perl regular expression is greedy, and since a comma isn’t whitespace, \S+ was also matching the comma used to separate key/value pairs in the data. That left the literal comma in the pattern with nothing to match, which was one source of confusion.

Second, I noted that the initial prefix, while syntactically important, could be viewed in the same way as the comma and colon separators. I combined all of these separators and added a capture around them so we can later make sense of the parsed data.
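The effect of the first change is easy to see on one line of the sample data. In this small sketch the variable names are only for illustration.

```perl
use strict;
use warnings;

my $line = 'name = cxo_clk, type = xo';

# \S+ is greedy and a comma is not whitespace, so the value swallows it.
my ( $k1, $v1 ) = $line =~ / (\S+) \s* = \s* (\S+) /x;

# \w+ stops at the comma, leaving it for a separator to match.
my ( $k2, $v2 ) = $line =~ / (\w+) \s* = \s* (\w+) /x;

print "with \\S+: '$v1'\n";
print "with \\w+: '$v2'\n";
```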

When matched against the data, the pattern results in a list like,

("-", "name", "gcc_xo_src_clk", ",", "type", "rcg", "+", "name", "cxo_clk", ...)
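Running the tokenizer against the full sample data makes the shape of that list concrete. This sketch assumes the data is in one string. I kept only /g, since /c doesn’t change the list that is returned.

```perl
use strict;
use warnings;

my $string = <<'END';
- name = gcc_xo_src_clk, type = rcg
+ name = cxo_clk, type = xo, fgroup = xo, wt = 10, bloo = blah
? type = hm_mnd_rcg, name = bo : type = rcg_mn
+ name = pxo_clk
END

# In list context, m//g returns the captures of every successive match as
# one flat list: separator, key, value, separator, key, value, ...
my @list = $string =~ m{ ([-+?,:]) \s* (\w+) \s* = \s* (\w+) \s* }xg;

# Walk the list three items at a time to show its structure.
for my $n ( 0 .. @list / 3 - 1 ) {
    my ( $sep, $key, $value ) = @list[ 3 * $n .. 3 * $n + 2 ];
    print "$sep  $key = $value\n";
}
```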

Now we can process the data using a simple state machine.

my $state = undef;

while ( defined( my $token = shift @list ) ) {
    if ( $token eq '-' ) { $state = 'dash'; next; }
    # ...
    if ( $token eq ',' ) { next; }

    # Anything that isn't a separator is a key; its value comes next.
    my $key   = $token;
    my $value = shift @list;

    if ( $state eq 'dash' ) {
        # ...
    }
}

Even though we did all of the data extraction using a single pattern match, it looks remarkably like … a parser! The pattern is simply the tokenizer used to feed tokens into our state machine, the parser.
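Here is a filled-in, runnable variant of that state machine. It takes separator/key/value triples with splice instead of shifting one token at a time. The record layout is hypothetical: a prefix, plus separate hashes for the pairs before and after a colon (called "options" here).

```perl
use strict;
use warnings;

my $string = <<'END';
- name = gcc_xo_src_clk, type = rcg
+ name = cxo_clk, type = xo, fgroup = xo, wt = 10, bloo = blah
? type = hm_mnd_rcg, name = bo : type = rcg_mn
+ name = pxo_clk
END

# The tokenizer: a flat list of separator, key, value, ...
my @list = $string =~ m{ ([-+?,:]) \s* (\w+) \s* = \s* (\w+) \s* }xg;

my @records;    # hypothetical: one hash per input line
my $section;    # which hash the next pair goes in: 'pairs' or 'options'

while (@list) {
    # Every separator is followed by exactly one key and one value.
    my ( $token, $key, $value ) = splice @list, 0, 3;

    if ( $token =~ /\A[-+?]\z/ ) {
        # A prefix starts a new record.
        push @records, { prefix => $token, pairs => {}, options => {} };
        $section = 'pairs';
    }
    elsif ( $token eq ':' ) {
        # A colon switches the rest of the line to options.
        $section = 'options';
    }
    # A comma leaves the state unchanged.

    $records[-1]{$section}{$key} = $value;
}

print scalar(@records), " records; line 3 option type=$records[2]{options}{type}\n";
```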

Parsing

I stated at the outset that I looked at this as a parsing problem, so the solution I would use is most likely a parser. For simple, one-off scripts, I’d use a technique similar to the one I described in the previous section. However, for more complex data or a more complex script, I’d turn to a real parser.

In fact, one of my contributions to the thread that led me to compose this post included an example of using the $^R and $^N variables in embedded code blocks to demonstrate a rudimentary parser that allowed a simulated form of capturing within a repeated non-capturing group. I won’t go into any detail beyond showing what I wrote. As this was from an early point in the thread, the prefix is ignored in this example.

my @list = ();

my $kv = qr{
    (\w+) (?{ $^N; })           # capture the key
    \s* = \s* (\w+)
    (?{ $^R = [ $^R, $^N ]; })  # capture the value, saving the key
    (?{ push @list, @{ $^R } }) # push the key/value onto @list
}xms;

$data =~ m{ (?: ^[-+?] \s* $kv \s* (?:[,:] \s* $kv \s* )* )* }xms;
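Because I skipped the details, here is a much smaller standalone illustration of the two variables, using the made-up string 'a=1 b=2 c=3'. $^N holds whatever the most recently closed capture group matched. $^R holds the value of the most recently executed (?{ ... }) block. Side effects such as push are not undone on backtracking, so the pattern pushes a pair only after both halves have matched.

```perl
use strict;
use warnings;

my @pairs;

'a=1 b=2 c=3' =~ m{
    (?:
        (\w+) (?{ $^N })              # remember the key
        = (\w+) (?{ [ $^R, $^N ] })   # pair it with the value
        (?{ push @pairs, @{ $^R } })  # record the completed pair
        \s*
    )*
}x;

print "@pairs\n";
```

The comments inside the pattern avoid sigils on purpose: with /x, comments are still subject to variable interpolation.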

Fortunately for us, there are parsing modules on the CPAN.

Before Perl 5.10, Damian Conway had written Parse::RecDescent. When 5.10 introduced grammar-friendly regular expression features such as named captures and named backreferences, Damian built on his original work and presented the Perl community with Regexp::Grammars.

What does a parser for this data built with Regexp::Grammars look like?

use Regexp::Grammars;

my $parser = qr{
    <[Line]>+

    <token: Prefix>   <MATCH= ([-+?]) >
    <token: Key>      <MATCH= (\w+) >
    <token: Value>    <MATCH= (\w+) >

    <rule: Line>      <Prefix> <Pairs> <Options>?
    <rule: Pairs>     <[Pair]>* % ,
    <rule: Pair>      <Key> = <Value>
    <rule: Options>   : <[Option]>* % ,
    <rule: Option>    <Key> = <Value>
}x;

if ( $data =~ $parser ) {
    # Do something with %/
}

This is a trivial example: all of the real work is left to code that inspects the parse tree in %/. However, the module also supports embedded code that is called when a token or rule matches, which can be used to process the data as it’s parsed.

Looking Ahead to 2012

One year ago in this space I attempted to start a new tradition for myself. Though I’d rarely bothered in the past, I jumped on the bandwagon, mostly as an excuse to write a post, and composed a set of resolutions for 2011. Now, in the waning hours of 2011, I wanted to take some time to review my resolutions and update them for 2012. Let’s take them from the top.

Spend more time with my daughter.

That should now read daughters, plural. In June we welcomed our second child, Brenna Rose, into the world. I’d like to think I’ve done well on this resolution. I spend every day looking forward to getting home to see my girls.

Read more.

I added this to the list because every year I would read a little less than the year before. I’ve managed to reverse this trend. Sure, I’ve added feeds to Google Reader, but I’ve removed a few as well. I’ve also opted to read actual books in my free time, reducing the amount of television I watch to a mere couple of hours per week. I’ve read through two James Herriot books so far, with one remaining. I received a Barnes & Noble Nook for Christmas and purchased my first book for it on Thursday, The Ghosts of Belfast, which I’ve been rapidly devouring.

Write more.

While I frequently sat in front of my computer, pondering topics about which to write, I rarely found the necessary inspiration to put metaphorical pen to paper. Over the last 12 months, I composed a mere 16 posts. Four of those, a quarter of them, were about OSCON, which is far fewer than in years past when I would write an individual post for each session I attended. In addition, there are four drafts I never got around to completing. Looking them over, it appears that at least three of them are mostly done. I just need to give them a quick edit and post. Sadly, one of them is about Kaylee’s summer camp, so it’s a bit stale at this point.

Having read several blogs over the last year, I’ve developed a desire to be a more prolific writer. Not only personal posts, like this one, but anything that comes to mind. I can write about my programming work or even short fiction. I probably won’t delve into the realm of political commentary, but only because I lack any real desire to study the issues in enough depth to do them justice.

Be more Paleo.

More of the same here. I can count on one hand the number of bites of grains I had during the year. While my sweet tooth has been difficult to suppress, I really don’t eat very much sugar. One beer per week, to end Friday on a high note, is all I indulge in anymore. I managed to get my weight down to 158 pounds, which hasn’t budged in a few months. I would like to drop a few more pounds of body fat, so perhaps that is where I’ll focus my efforts for 2012.

Join a CrossFit Box.

I thought about it a few times, but never bothered. I have been going to the gym at work regularly and have been making decent strength gains. I’m mostly pleased with my progress on fitness and even earned my third degree black belt this year, so not accomplishing this particular resolution isn’t bothering me very much.

Get into MovNat.

The closest I got to this was our recent trip to Grape Day Park in Escondido. As a fitness program, it’s still something I want to do.

Actually use Facebook.

I added this one as a joke, but somehow I ended up using it. Sort of. I rarely post anything on Facebook, preferring Twitter and Google Plus, but I do try to keep tabs on what my friends are doing.

Looking Ahead to 2012

Aside from the above, what else should I look forward to doing in 2012?

I’ve recently been putting an emphasis on being more organized and getting things done in a more timely fashion. I’d like to keep this going into the new year, particularly with respect to the things that need doing around the house.

Segueing from organization into preparation, the recent power outage in San Diego made me take a serious look at our disaster preparations. My wife pokes fun at me for treating the entire situation as preparation for the zombie apocalypse, but I find it a fun way to stay motivated. Besides, when zombies show up and start eating people, who will be laughing then?

So that’s my theme for 2012: organization and preparation.

Grape Day Park

Over the weekend my wife and I took the girls to meet up with some friends. Our original plan was, after having lunch, to spend some time at the San Diego Children’s Discovery Museum in Escondido. However, after a couple of changes in that plan, we ended up at Grape Day Park, which happens to be adjacent to the museum.

Jump!

Action shot of me leaping from the grape slide.

The distinguishing features of the park are the slide, designed to look like a bunch of purple grapes, and Vinehenge. The latter feature is pretty awesome. It is a sculpture, by artists Valerie Salatino and Nancy Moran, of giant grape vines, which serves as an intricate climbing structure.

I couldn’t help myself. As soon as I saw the grapes and vines, I was all over them. I climbed from the bottom to the top, walking with all four limbs like a monkey. I leaped from the ground to the high vines, hauling myself up to perch atop them. I may have gotten more out of the vines than the kids did. Although, my daughter dubbed the tangle of vines the Spooky Forest, which from that point on was only to be entered with caution, not climbed upon.

We’ll definitely have to make this park a regular stop.

2011 Boot Camp Challenge

A little over a week ago, on Saturday, 24 September 2011, I participated in the MCRD San Diego 2011 Boot Camp Challenge. This is a short, three-mile race that, according to the website, has over 40 obstacles, including hay jumps, tunnel crawls, log hurdles, a six-foot wall, trenches, cargo net crawls, and push-up stations. In addition, United States Marine Corps drill instructors are positioned at each station to make sure each obstacle is properly completed.

It’s a fun course, as depicted by the map over on the left. The numbers on the map represent: (1) hay stacks; (2) hay stacks; (3) hay stacks; (4–19) jump over logs, crawl under logs, wall; (20) tunnels; (21) push-ups; (22) wall; (23) bayonet; (24) trenches; (25) tunnels; (26) low crawl; (27) planks; (28) push-ups; (29–43) jump over logs, crawl under logs, wall; (44) hay stacks; (45) hay stacks.

There are a lot of hay stacks on the course, and for good reason. Nothing breaks up your pace and depletes your muscles of glycogen quite like explosively leaping over hay stacks. Then come the obstacles. My group, the individual men, started the race early enough that I never had to stand in line to wait for an obstacle (I saw several lines in the photos of later groups). This is good for time, but exhausting as you sprint from one obstacle to the next only to do something that is very much not running.

If you’re anything like me, you saw number 23 and thought, “Bayonets?! Awesome!” At least, that’s what I thought when I first looked at the map. Yeah, not so much. We were allowed to run past the bayonet targets, and that’s it.

I had done this race once before, in 2004, but it has taken me seven years to finally do it again. That year I raced in a team of three with a couple of my friends. I don’t remember how we placed and the results are nowhere to be found, so I’ll just assume we didn’t do very well. Probably a good assumption, as the race was a week after my honeymoon and I recall being some 40 or so pounds heavier at the time.

I entered the race this year mostly as a test of my fitness. Was this paleo lifestyle thing really working out for me? With nothing more than three weeks of eating well, twice-weekly body weight workouts, and once-weekly sprints behind me, I ran the race. I hadn’t run more than a quarter mile since, well, probably the last time I ran this race.

The results are here. Just in case that page vanishes, as most of the past results seem to have done, I’ve mirrored the page on my website, highlighting my result. I came in 47 out of 91 in my division and 340 out of 1,117 overall. Interestingly, my time of 26:44 would have put me 36 out of 48 in the men’s elite division. Although, given that the first 10 finishers in that division all came in under 20 minutes, I don’t think I’ll ever race in it.

I’m sure I could have done better with more training, but my time was lower than I expected it to be. On the rare occasion that I do use a treadmill, it tells me that I run a 12-minute mile. So an average mile time of 8:55 for the race surprised me a bit. Back in high school, I could run a six-minute mile, so I’ll consider that my new goal.

When I chose to wear shorts for the race, my wife asked me if I was worried about injuring myself on the obstacle course. Of course, I told her I wasn’t. I took the picture over on the right shortly after I got home from the race. I would call that relatively uninjured. Worst by far were my calves, which were sore for days, having run the race in my Vibram FiveFingers KSOs.

Overall, it was an incredibly fun race, and I can’t wait to do it again next year. I already have my calendar marked for Saturday, 6 October 2012. That’s over two months before the world ends, so I’m confident things will go off without a hitch. Next year I’ll try to get my time down to around 20 minutes.

San Diego Goes Dark

At approximately 3:35 this afternoon, I was standing in the hallway outside my office, talking to my boss and a coworker. It’s a very odd feeling when the power to the entire building goes out. Everything goes absolutely silent. I never appreciate how much noise the air conditioning, the computers, and even the vending machines make until it’s gone and the stillness sets in. A few seconds later, having given us enough time to pause and understand what was happening, the backup power kicked in, restoring light to the hallway. Looking at the time, I immediately decided to catch the 3:45 shuttle, which would get me on the 4:06 northbound COASTER home. As I sit here writing, reflecting on the afternoon, I’m grateful that I didn’t hesitate.

I received a text message from my wife as I was leaving the building, informing me that power was out at home, 20 miles away from my office. As I sat on the shuttle, listening to the chatter of the San Diego MTS radio, I learned that power was out across the county. I was relatively confident that the trains would continue to run, as they are self-powered and the railroads have radio procedures they follow when the signals lose power. Still, I was relieved when my train pulled into the Sorrento Valley station right on time. The trip home took much longer than usual, while the train proceeded slowly and waited for clearance over the radio. During the ride, I followed news about the power outage, and kept my dad up to date on my status, on Twitter. Fortunately, Verizon’s cell towers remained online.

Traffic was abysmal around the county by the time I arrived at the Carlsbad Poinsettia station, around 5:00 PM. I was fortunate, in that I only had a rough time until I crossed over the I-5 freeway. Most of my short, 7.5-mile trip between the train station and my house is done on less-traveled roads. Once I was in San Marcos, the traffic signals were operating on battery power, so the final few lights were practically normal for me.

Finally arriving home shortly before 6:00 PM, I unplugged the computers and appliances, to safeguard against possible surges when the power was restored, and prepared dinner. Fortunately, I had intended all along to use the propane grill for dinner, so I didn’t have to alter our plans. We did end up eating our dinner by candlelight, which is something we haven’t done in quite a while. After dinner, we finished off the chocolate ice cream, which was rapidly melting in the freezer. We spent the remainder of the evening listening to the news on one of our Eton crank-powered radios, all of which I’d selected as pledge gifts over the years from KPBS.

Our power was restored at 10:25 PM, at which point I plugged everything back in, set the few clocks we still have that don’t set themselves, and verified that the temperature of the refrigerator was okay. Then, as my brain wouldn’t let me go to sleep until I purged its thoughts into print, I sat down to compose this post. What did I do right today, and what lessons have I learned?

Know how you’re getting home. I was very lucky today. The power dropped minutes before the first MTS shuttle this afternoon, and I didn’t hesitate to take it. Further, the trains were still running. I do not have a backup plan for how I would get home otherwise. My parents live near my workplace so, barring a grave emergency, I’d likely wait things out at their house. Or, as I told my wife, chill out on the patio at Karl Strauss, knocking back a few pints.

Have portable radios where you need them. We succeeded here, having three of the aforementioned Eton radios. I had one in my car, but didn’t need it, since Verizon’s data network remained up the entire time. My wife has one in her car and we have one in the house, so she was able to listen to the news about the power outage. Our oldest radio doesn’t hold a charge for very long, so it may be time to replace it.

Have several flashlights in several locations, with batteries. Also, candles. We keep large Maglite flashlights in each of the cars and small tactical flashlights both in the cars and throughout the house. Between those and two boxes of candles from IKEA, we had plenty of light. Recently, we’d started moving to using rechargeable batteries for everything, which work great, when you have power to recharge them. I plan to purchase bulk packs of batteries in various sizes to store in an emergency kit, to be used only in emergencies. Speaking of which…

Have an emergency kit. We don’t really have one, though we didn’t suffer for it this time. Creating a plan and organizing a kit has been on my to-do list for a long time and it’s about time I take care of it.

Keep non-perishable food on hand. We’re somewhat okay on this. We have bottled water and canned food, though I don’t think we have enough for three days. I intend to remedy this on our next trip to Costco. In fact, I’ve been making a mental checklist of food items to stock for a while. Canned meats are high on the list, followed by dried fruits, and water. Lots of water.

Know where to get news. I was fortunate that Verizon’s data service remained online. Between listening to KOGO and reading Twitter, I had a pretty good idea of what was going on. The Twitter accounts I found the most useful today were @SDGE, @SanDiegoCounty, @ReadySanDiego, @SDSheriff, @GoNCTD, @KPBSnews, and @nctimes.

Have your bug out bag (BOB) packed and your cars fueled. While we don’t have bug out bags, we do keep the cars fueled. I decided long ago to never let the fuel tanks drop below half, because you never know when you’ll need to drive somewhere without the opportunity to refill. Obviously, these precautions were unnecessary today, but non-emergencies like a widespread power outage give us the opportunity to think about what we need and test our preparedness without great risk. What if this had been a wildfire or an earthquake? Would we have been ready to evacuate at a moment’s notice? I’m sad to say, probably not. That leads to my most important lesson…

Know what to do. I need to make sure my family and I are on the same page if a disaster occurs, even if we are unable to communicate. Under what circumstances do we evacuate? Where do we go? What if our primary choice is unreachable? What if, as is likely the case in San Diego, the roads are jammed? As important as knowing when to go is knowing when to stay put and for how long.

There are a lot of considerations that go into designing an emergency plan and I know I didn’t go into all of them here, nor did I intend this to be a comprehensive list. These were just the main things I’ve been thinking about generally lately and specifically today. When I do put together our emergency plan, I’ll likely follow up with another post. If nothing else, it will serve as documentation for my immediate and extended family. Now that I’ve put my thoughts into print, maybe my brain will let me sleep.

Farewell Rubio’s

Rubio’s, once one of my favorite restaurants, you and I simply don’t get along anymore.

I had food from Rubio’s for lunch today, brought in by the company hosting my colleagues and me for some technical training. Not long after lunch, my asthma began to act up. Since going Paleo several months ago, my daily inhaler and I have practically parted ways. However, this evening I felt that I needed it.

A few weeks ago, I had dinner at Islands, where I indulged in some corn chips and salsa. The following day, I needed my inhaler. I, perhaps wrongly, concluded that grains, at least corn, were a trigger and have been much happier avoiding them ever since.

It should go without saying that I passed on the tortillas served alongside lunch today. Nevertheless, not long afterwards I felt that all-too-familiar tightness in my chest, and tonight I used my inhaler for the first time in two weeks.

If I had to guess, I’d say it was the liberal use of soy in the cooking (is it to save money or to demonstrate that the food is supposedly heart healthy?) that proved today’s trigger. I don’t know specifically what caused it, as it could be any one of soy’s negative properties, or even several in combination. In the end, this is just further encouragement to avoid eating out.

Except for Elevation Burger (their site appears to be Flash-based, sorry). That place is awesome.

OSCON 2011: Friday

Friday marked the last day of the O’Reilly Open Source Convention (OSCON), and my last day in Portland, Oregon. Unlike previous trips, I traveled home on Friday night instead of Saturday morning. In the past, I’ve sat around my hotel on Friday night with nothing to do except finish posts about OSCON. There is one drawback, though. I’m finally finishing this post 20 days later, which means it probably won’t be as fleshed out as my posts about Wednesday and Thursday.

After my near complete lack of interest in the keynotes I saw on Wednesday and Thursday mornings, I paid little attention to those on Friday. I thought the message Karen Sandler had about open health was good, but that’s about all I can say about them.

I was by far most pleased with the sessions I attended on Friday. First was Kevin Falcone’s Shipwright: Application Distribution Simplified. Kevin works for Best Practical, a company with the best shirts. I plan on doing some evangelizing of Shipwright at work, as it would help a lot of people, including me, to better develop and deploy their applications.

I wasn’t planning on attending OSCON this year. I was perfectly happy skipping it and staying home during the last week of July. Then I happened to be looking over the list of Perl sessions and saw, at the very end of the list, Easy Distributed Computing with Perl and Grid::Request. It seems that Victor Felix has released a module that does exactly the same thing as some of the modules I’ve maintained at work, only the design is much better. However, it doesn’t support the batch system we use. I emailed Victor to discuss some collaboration and registered for OSCON so I could meet him. So yeah, I attended OSCON for one session. But it was worth it. The module looks great and Victor seems happy that I’m interested in contributing. It will be a much better use of my time to contribute to a module on the CPAN than to continue pouring effort into what we have today.

Since, after chatting for a bit with Victor, I was already standing outside the room well into the next time slot, I popped into Git for Ages 4 & Up. Michael Schwern and Ricardo Signes demonstrated the Git commands everyone should know to get started with the version control system. As an added bonus, they used tinkertoys to help the audience visualize what Git’s internal representation of the repository looked like after each command. It was definitely a different and entertaining talk.

Prior to the closing keynote, Piers Cawley was invited to sing his library song, which I mentioned in Thursday’s post, again for the benefit of all OSCON attendees.

Paul Fenwick delivered the closing keynote. If you haven’t seen one of his talks, shame on you. Here, to help you fix that, I’ll refer you to his keynote, All Your Brains Suck—Known Bugs and Exploits in Wetware.

After three days in Portland, I finally ate at Burgerville. Eating at this regional chain is something I look forward to every time I’m in the area. Though, I suppose my change in diet may have suppressed my eagerness and led me to put it off until Friday. In any case, I ordered a cheeseburger with grilled onions (ditching the bun) and a large raspberry shake. While I prefer their blackberry shakes when available, the meal was delicious.

The high point of the conference happened, oddly enough, after it had ended. For whatever reason, I happened to wander into a different area of the convention center, in which a sock knitting conference was taking place. Outside of their expo hall was the Sockgate, a cardboard replica of a Stargate. As we were waiting to take pictures with it, Paul Fenwick happened by and offered to take some photos. He’s a really nice guy and I enjoyed finally getting the chance to meet him. After the photo op, he headed into the knitting expo hall. In retrospect, I should have done the same. It would have been interesting to see what it was like.

Sockgate

Photo Credit: Paul Fenwick

Finally, I learned that when I attend OSCON, I really do need to go for the entire week. Apparently, it takes me about two days to acclimate myself to the environment and really start interacting with people. Of course, by arriving Tuesday night, I was ready to interact on Friday, just as everyone was heading home. It didn’t help that I was staying in a hotel way out by the airport, with MAX service ending before 11:00 PM. With a new baby at home, I certainly don’t regret my choice to be away for a shorter period of time, but if I go next year, I’ll probably go for the entire week.

OSCON 2011: Thursday

Thursday was the second day of sessions at the O’Reilly Open Source Convention (OSCON) and my third day in Portland, Oregon. Overall, the sessions I attended were arguably more relevant to my work than those I attended on Wednesday. Still, the day left me feeling unsatisfied. At past OSCONs, I ended each day with my mind brimming with new ideas, scarcely able to wait until I could put some of them into practice. So far, this year’s conference hasn’t had the same effect on me.

In any case, the Thursday morning keynotes were far better than those foisted upon us on Wednesday morning. Gabe Zichermann’s talk, in particular, caught my attention. In Game theory applied to user engagement in Open Source, he talked about using so-called gamification techniques to draw people into using Open Source software. Many of his examples had to do with using game theory to alter real life behavior, such as a lottery to reward good drivers in Sweden or the use of consumption graphs in hybrid vehicles. On a separate note, I tend to grow annoyed at the latter, having been stuck behind too many hypermiling drivers.

Getting into the sessions, I favored those more in line with the work I do as a Perl programming system administrator. Also, it didn’t hurt that The Conway Channel 2011 happened to take place during the first time slot of the day. I’m a bit sorry I passed up DIY Clinical Trials (Or: How to Guinea Pig Your Way to Scientific Truth and Better Health), if only for the reason that it would have been completely different from anything I normally do. But, I attended those types of sessions on Wednesday, so it was back to business, so to speak. Damian Conway was in his usual top form, as entertaining as he is educational. I won’t go into too much detail, only to note that he demonstrated four of his modules, using a theme I’m sure most will recognize. First, something old, updates to the Regexp::Grammars module. He then introduced something new, the IO::Prompter module, which supersedes his older IO::Prompt. There was something borrowed, the Data::Show module, which serves as a convenience wrapper around the Data::Dump module. And finally, something blue, the Acme::Crap module, which seems oddly cathartic.

I like to think I’m a halfway decent Perl programmer, but that doesn’t mean I think I can ignore things like Jacinta Richardson’s Perl Programming Best Practices 2011. The talk was a round-up of the tools and modules that are generally considered to be the best practices by the Perl community today. Yes, generally. People will have their differences of opinion, and I don’t always agree with the advertised best practices. However, if followed, the practices will lead to better code, and if violating a practice, I like to be able to back that up with a well thought out reason (it doesn’t necessarily have to be a good reason). The first of two, possibly pithy, examples of this is local::lib and its default use of ~/perl5 as its include path. I prefer to use ~/local/lib/perl5 and, sure, the module allows me to do that easily enough, but it’s an extra, non-standard step. Second, cpanm has been touted as the best way to install modules from CPAN. As a control freak with a highly customized CPAN configuration, I’ve never liked the way cpanm seems to do things its way. Admittedly, it may be customizable, but I’ve never had the need to look into it.

There’s been some noise around the office about testing Amazon’s EC2 offering. To that end, I thought James Loope’s Utility and Automation: Low Overhead Operations with Amazon & Puppet would be educational, possibly giving me some ideas about how to manage our own potential EC2 environment. Unfortunately, it didn’t work out that way for me. The talk focused heavily on how their web application was designed and which pieces of Amazon’s infrastructure it used. We’re not creating or running web applications, so none of it was beneficial to me. There was nothing about Puppet aside from an explanation that using it (or another configuration management tool) is vital for keeping everything running.

At this point, I was turned off from any cloud talks at OSCON. There seems to be, probably with good reason, an inextricable tangling of cloud and web applications. Because of this, I decided to pass on Achieving Hybrid Cloud Mobility with OpenStack and XCP and instead attended Piers Cawley’s Polymorphic Dispatch—It’s Not Just a Good Idea, It’s the Law. I’m glad I did, because there were definitely some very useful ideas presented. The idea, taken from Smalltalk, of passing messages to objects has a lot of merit. Combining this with polymorphism, sending a message and allowing different objects to act on it differently, vastly simplifies code. Simple code, of course, is easier to test and easier to debug when things go horribly wrong (and is actually less likely to go horribly wrong in the first place). Of particular interest to me were the Null Object pattern and what Piers referred to as the key tenet of object-oriented programming: tell, don’t ask. That is to say, if I understood correctly, instead of querying an object for information and using it to decide which action to perform, give the information to the object and have it perform the action. Finally, Smalltalk Best Practice Patterns was recommended as the best book on good coding practices out there. According to Piers, it “will change the way you think about programming.”
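To make those two ideas a little more concrete, here is a minimal Perl sketch of my own, not code from Piers’s talk. The class names FileLogger, NullLogger, and Job are hypothetical. The Job object never asks whether it has a logger; it simply tells whatever logger it holds to log, and a NullLogger accepts the message and quietly does nothing.

```perl
use strict;
use warnings;

# Hypothetical logger that records every message it is told about.
package FileLogger;
sub new         { my ($class) = @_; return bless { lines => [] }, $class; }
sub log_message { my ($self, $msg) = @_; push @{ $self->{lines} }, $msg; return; }
sub lines       { my ($self) = @_; return @{ $self->{lines} }; }

# Null Object: same interface as FileLogger, but every method is a no-op.
package NullLogger;
sub new         { my ($class) = @_; return bless {}, $class; }
sub log_message { return; }

# Hypothetical client class that relies on polymorphic dispatch.
package Job;
sub new {
    my ($class, %args) = @_;
    # Fall back to a NullLogger so callers never have to check for undef.
    my $logger = $args{logger} // NullLogger->new;
    return bless { logger => $logger }, $class;
}
sub run {
    my ($self) = @_;
    # Tell, don't ask: no "if ($self->{logger})" guard, just send the message.
    $self->{logger}->log_message('job started');
    return 'done';
}

package main;
my $file_logger = FileLogger->new;
Job->new(logger => $file_logger)->run;
Job->new->run;    # no logger given; the NullLogger absorbs the message
print join(', ', $file_logger->lines), "\n";    # prints "job started"
```

Without the Null Object, every method in Job that logs would need its own conditional, which is exactly the kind of asking the talk argued against.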

I was in way over my head in Tom Christiansen’s Unicode in Perl Regexes. The only thing I managed to learn is that I don’t know nearly enough about Unicode to use it properly. I’ll leave it at that. It was a very information-dense session, and it’s possible that Tom knows more about Unicode than those who designed it. Other choices during this time slot, which may have been better for me, were Connecting iOS to the Real World with Arduino, presented by my friend Alasdair Allan, or, venturing again into the realm of health geekery, Open Source Preventive Medicine: Citizen Science Genomics.

The last session I attended on Thursday had so much potential, but, for me, it fell flat. I expected A. Sinan Unur’s Visualizing Economic Data Using Perl and HTML5’s Canvas to focus far more on visualization than it did. Instead, the majority of the presentation was about the difficulty of parsing public data published by the United States government. For this, Sinan uses Spreadsheet::ParseExcel, and he explained a few of the techniques he uses to extract data from tables designed primarily for visual consumption. Unfortunately, very little time was spent showing how Canvas was used. We were given one static example and an explanation that there is no method available for determining the height of text in a Canvas element. I had hoped to return to work with some ideas for using Canvas to visualize data from our batch scheduling system, but ultimately left disappointed.

After the last session, I met up with a coworker, an old friend, and a new friend to have dinner at Chipotle. Normally, I like to avoid chain restaurants (national chains in particular) when traveling, preferring to sample the local cuisine. But we wanted a quick dinner and it was nearby. My opinion was requested on the relative healthfulness of pinto versus black beans. I simply stated that I would be ordering my carnitas bowl without any beans.

After dinner, we returned to the convention center for the Perl Lightning Talks and the State of the Onion. As always, the talks were quite entertaining. Of note was a juggling demonstration illustrating various programming languages and databases. Near the end, Ricardo Signes recounted a conversation he had with a couple of women from the knitting convention sharing the convention center with us. Its presence provided a wonderful juxtaposition. While OSCON is male-dominated and many of us don’t know how to act when women brave their way into our midst, the knitting convention is the complete opposite. Ricardo’s message to us was to take the time to look up from our laptops and chat with those around us. We might just have a better time and make new friends.

Finally, Piers Cawley favored us, as he does every year, with a song. This year, however, he did not bear a tale of levity, but a message of deadly seriousness. The United Kingdom is closing libraries in an attempt to reduce public spending. As a protest, Piers wrote a song, “Child of the Library”. There doesn’t appear to be any video (yet) of Piers performing at OSCON, but I’ve gone ahead and embedded one that I found. It’s catchy; I had it stuck in my head for a couple of days after the conference.

We could easily see the same thing happen in the United States—and in fact I have already seen it proposed in San Diego. I’ll first admit that I have not set foot inside a library since college, over a decade ago (high school, if only counting public libraries). Do libraries still matter, or is the concern over their closing merely the knee-jerk nostalgia of those of us who came of age in a world that didn’t yet know the Internet? I can’t, and won’t, take a side on this issue until I’ve taken the time to visit my local library. If I can recognize it as something I saw in my childhood, perhaps it should be closed. If it has adapted to the so-called Information Age, maybe it’s worth funding.

As a final, humorous note, I almost didn’t make it back to my hotel. At least, not without finding an alternate method of transportation. At 10:22 PM, excusing myself and apologizing for staying so far away from the conference, I left the Media Temple party at the Jupiter Hotel, arriving at the convention center MAX station at 10:32 PM. The schedule at the station listed 10:42 PM as the last red line train to the airport, with Google Maps concurring that a train was 10 minutes away. About two minutes later, an unmarked blue line train arrived at the station, traveling east. At this point, Google Maps had decided it would rather show me its trip planner than the previous screen, which had shown the impending arrival of the red line. Forced to make a split-second decision, I hopped on the train. I knew that I could take it at least as far as the Gateway station, where I could transfer to the red line if it was still behind me. Around 11:00 PM I arrived at Gateway, after spending the ride thinking about how much a cab would cost. This station had a real-time display with train arrival times. The last red line of the day was only three minutes out. Whew.

OSCON 2011: Wednesday

Today marks my first day of the O’Reilly Open Source Convention, since I chose to attend only the sessions this year. I will also depart from my tradition of writing a post for every session I attend. I enjoyed it in the past, but it adds more stress and distraction than I’d like this year. Instead, I plan to relax and enjoy each session I attend. I’ll still take a few notes, but I’ll limit myself to recapping an entire day in a single post.

I had breakfast in my hotel’s restaurant this morning, a mistake I won’t make again (over half the plate was potatoes and toast, leaving little room for the eggs and sausage). It was an easy walk to the Cascades MAX station, right up until I saw the train arriving ahead of me. I likely could have made it onto the train had I sprinted, but I also had to buy a ticket, so I let it go. Fortunately, it was the beginning of the morning commute, so another train was not far behind.

This morning’s keynotes were dry. At least, I didn’t find them at all interesting. Well, except for one. I enjoyed Ariel Waldman’s brief talk about Hacking Space Exploration. It reminded me that I don’t spend nearly enough time on Galaxy Zoo.

The final keynote was a so-called surprise announcement. We were first treated to a video in which a bunch of big names in technology (Bill Joy, Tim O’Reilly, and Al Gore, to name a few) gushed over the possibilities of commodity cloud computing. All that buildup ended up being nothing more than a lead-in to an overblown advertisement for something called Nebula. While the idea of open and commodity elastic compute is cool, I have difficulty taking something seriously when it’s surrounded by as much hype as I saw during the keynote. Maybe I’m alone in this, but OSCON doesn’t really seem like the right venue to go heavy on marketing and light on technical detail. Maybe those of us sitting in the ballroom weren’t the real audience for the announcement. Perhaps they were just using the large and popular conference as a way of getting media attention.

So, what sessions did I attend?

About halfway through OSCON last year, I realized that attending Perl sessions was mostly a waste of my time. They tended to fall into two categories: stuff I already knew and web development (which I don’t do). So where do I end up for the first session of this conference? In Perl 5.14 for Pragmatists, presented by Ricardo Signes. For anyone who has read the Perl release notes (perl*delta), very little of what was presented will be novel. However, it was very useful to see the relative emphasis placed on different features by someone as familiar with Perl as Ricardo. In particular, fully half of the session was dedicated to Perl’s improved Unicode support. As Ricardo stated, Unicode isn’t going away, so we need to get better at working with it.

After attending a session of some relevance to my profession, I wanted to take advantage of a series of back-to-back sessions of a more personal interest. My passions of late have leaned towards health, fitness, and, in particular, a more so-called primal lifestyle. So I was excited to see Geeking in a Cabin in the Woods, presented by Ryo Chijiiwa, on the schedule. Previously employed as a software engineer at Yahoo! and then Google, Ryo took us through the history and motivation behind quitting his job, buying 60 acres of barren land in northern California, and simplifying his life by living on it. It was a fascinating tale of overcoming challenges. Part of me would love to do exactly what he did. Ryo has a blog (with a really cool domain name) where he writes about his experiences.

Following in the same basic genre, I next attended Sarah Sharp’s talk on Growing Food with Open Source. Sarah is a Linux kernel hacker who also enjoys gardening. Being a lazy hacker (I can relate), she wants to automate all of the mundane, tedious work that comes with a hobby like gardening. She’s written code to manage planting calendars, which she hopes to eventually integrate with a service like Remember the Milk, as well as an Android app that alerts her to impending weather conditions that could affect her garden. The most impressive piece was the work she’s done to create an automatic watering system, using home-made moisture sensors and Arduinos. More information can be found on a site I will soon be spending a lot of time on, Garden Geek.

My earliest computer-related memory is playing text adventures on our Apple Macintosh, circa 1984. For that reason, I was excited to attend Ben Collins-Sussman’s talk on The Unexpected Resurgence of Interactive Fiction. So excited, in fact, that I passed up a session r0ml was presenting. Ben took us through a brief history of interactive fiction, from Adventure to the present day. He talked about both the science and the art of the genre as they have evolved over the years. He focused primarily on the Inform language and the Glulx virtual machine (not to mention current efforts to produce a web browser-based player), which leads me to think that there isn’t much point in putting any more effort into playing with TADS. He also mentioned the annual Interactive Fiction Competition, which I love and in which I have participated as a judge for the last several years. This session has gotten me excited about interactive fiction again, after mostly ignoring it as a hobby for the last few years. I have a couple of ideas for games that I’d like to enter into the competition, which I should finally get started on.

For the final two sessions of the day, I decided to return to my core competency, and arguably the whole reason I’m here, and sat down in the Perl room. Damian Conway talked about (Re)Developing in Perl 6. I’ve previously attended his six-hour class on this topic, but it was a nice refresher, since I don’t use Perl 6 regularly. He guided us through porting a handful of his modules (Acme::Don't, IO::Insitu, IO::Prompter, and Smart::Comments) from Perl 5 to Perl 6. Each module was chosen to represent a different technique for porting the code. In the simplest case, a basic transliteration can be used. For some modules, new features of Perl 6 can replace long pieces of code; argument lists are a great example. Finally, the ability to extend the grammar removes the need for source filters and allows the programmer to seamlessly add language features.

I ended my day with a session on improving code performance: Sooner, Cheaper, Better — Optimization on a Budget, presented by Eric Wilhelm. I didn’t find it very well organized or delivered, which is a shame, because I’ve seen him present before and he was rather good. After introducing us to the Rules of Optimization Club, Eric took us through a number of real-world examples in which optimization might prove to be a waste of time. Old hat for a lot of people, I know. In fact, many people just wait for computers to get faster. However, he then switched gears to a more interesting problem. With today’s advances coming in the form of more cores rather than more speed, the focus shifts from optimization to parallelization. The same rules apply, and it’s good to remember that.

Following the last session of the day, a booth crawl was held in the expo hall. This involved setting up food and drink tables at the booths of various vendors, the idea being to bribe attendees to approach them. There was beer, possibly wine, and the food leaned heavily towards cookies and grain-wrapped items. I wandered around, played a Mario Kart-like Pac-Man multi-player racing game on an Android tablet at the QuIC booth, ate a bunch of cheese, and left at 7:00 PM …

To attend the .vimrc birds of a feather (BOF) session. A .vimrc, oft pronounced vim-wreck, is the name of the configuration file Vim uses. It’s more than a configuration file, though; it’s a full scripting engine, which provides quite a bit of potential for customization of one’s editor. Damian Conway, famed teacher of Vim, Perl, and myriad other topics, was in attendance. As expected, the entirety of the session was spent learning about some of the neat, as yet unreleased, scripts Damian has been working on for Vim.

I didn’t have it in me to attend any of the evening events. I was aware of two parties, but I neither wanted to drink nor stay out late. Unlike years past, I haven’t been very social this year, either. Instead, I made the relatively long trip back to my hotel, where I wrote this post (well, just the first draft; I finished it on Thursday morning over the lousy coffee provided by the Oregon Convention Center) and turned in early.

OSCON 2011: Tuesday

This marks the fourth time in five years I’ve attended the O’Reilly Open Source Convention (OSCON). I skipped it in 2009, when it took place in San Jose. This year the convention is back in Portland, Oregon, as it was last year. So I’m here, too.

Unlike in previous years, I didn’t show up on Sunday to explore Portland and attend the Monday tutorials. I didn’t want to spend an entire week away from home, but at the same time, nothing I saw on the tutorial schedule interested me. So I flew up Tuesday afternoon and plan to return on Friday night.

Most of the hotels near the Oregon Convention Center (OCC) were booked up, and I left my itinerary planning to someone else (who is unfamiliar with Portland), so I’m staying at the Courtyard Marriott by the airport. This wouldn’t be so bad, but, according to Google Maps, it’s a 1.2 mile walk to the Cascades MAX station.

Anyway, after getting settled in my hotel room, I headed to the OCC to meet up with my friend, Jonathan. I made it in time to register, pick up my swag, and grab some cheese and beer on the expo floor. I wandered over to the QuIC booth to chat and saw a nice demo of Android and HTML 5 applications running on Qualcomm demonstration hardware. It really showed off the power of the platform.

We decided not to stick around for the so-called OSCON Carnival, so we hopped across the river on the MAX and looked around for dinner. In our wanderings, we dropped into Bailey’s Taproom to use the bathroom and have a beer. The bartender recommended the Davis Street Tavern for a good burger paired with a good tap list. I ended up having seared scallops, which were quite good. After dinner, we wandered over to the Puppet Labs party, where I got a souvenir Open Source Lab beer mug.

Bailing fairly early on the party, I caught the MAX red line back over the river and on to the Cascades station. The hotel’s shuttle driver had warned me against the walk, pointing out that there are no sidewalks. However, Google directed me away from the main road and through a business park. I don’t know why people are so averse to walking more than a couple of blocks. I found the walk to be quite pleasant, and there are blackberry brambles growing wild along the streets, providing snacking opportunities. It takes me back to childhood trips to the Pacific Northwest, when I would pick wild blackberries with my grandfather.

Back at the hotel, I grimaced at what they call a fitness center, swam a bit in the poor excuse for a pool, and soaked in the hot tub. Then it was off to bed, because, unlike the lucky folks staying near the OCC, I have to wake up in time for a 20-minute walk followed by a 25-minute MAX ride.