Repeated Capturing and Parsing in Perl

When I checked my email after arriving at the office today, I found a query that had been sent to our internal Perl mail list. The questioner was trying to match a pattern repeatedly, capturing all of the results in an array. But, it wasn’t doing quite what he expected. The message, with minor edits, went a little something like the following.

I’m trying to extract key/value pairs from a file with the following contents:

- name = gcc_xo_src_clk, type = rcg
+ name = cxo_clk, type = xo, fgroup = xo, wt = 10, bloo = blah
? type = hm_mnd_rcg, name = bo : type = rcg_mn
+ name = pxo_clk

I was hoping to do something like this:

@list = $_ =~ m{ ^[-+?] \s* (\S+) \s* = \s* (\S+) \s* (?:, \s* (\S+) \s* = \s* (\S+) \s*)* }xms;

Thinking @list would be assigned the alternating key/value pairs. But the above doesn’t extract anything sane. Adding the /gc modifiers doesn’t make any difference.

If I do the following, it extracts the first two key/value pairs correctly (if the line has more than one pair).

@list = $_ =~ m{
    ^[-+?] \s* (\S+) \s* = \s* (\S+) \s*
    , \s* (\S+) \s* = \s* (\S+) \s*
}xms;

If I keep repeating the pattern in the second line, it keeps matching more key/value pairs.

I would expect using (?: )* should mean zero or more instances of match inside the parentheses, but obviously it’s not working. What am I doing wrong?

When I’m presented with a problem like this, that is some kind of structured data, I immediately think of writing a parser. I’ll get back to that in a bit, but I wanted to address the confusion about capturing in the pattern. And, in fact, that’s how the discussion on the mail list proceeded.

Repeated Capturing

First, let’s simplify the example to demonstrate why our seeker of wisdom isn’t getting back the list of items he expected.

my @matches = 'a b c d e' =~ /^(a) \s* (?: ([bcde]) \s* )*/xms;

say "(@matches)";   # prints "(a e)"

Capturing parentheses in Perl are treated somewhat like registers. Most Perl programmers are familiar with the $n variables, which hold the values of a successful pattern match. For example, $1 holds the value matched by the first set of parentheses, $2 holds the value of the second set, and so on.

When a pattern is matched in list context, as above, it’s effectively the same as writing,

'a b c d e' =~ /^(a) \s* (?: ([bcde]) \s* )*/xms;

my @matches = ( $1, $2 );

These pattern match variables are scalars and, as such, will only hold a single value. That value is whatever the capturing parentheses matched last. So, in our simplified example, $1 matches a, which is obvious enough. As the pattern repeats, $2 would be set to b, then c, and so on until the final match of e.
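To watch the overwriting happen, here is a small sketch that records each value the second group holds as the pattern repeats. It relies on an embedded code block and $^N (the text matched by the most recently closed group), which are not part of the original question; @seen is a name made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper array: every value group 2 held along the way.
# The embedded code block below runs once per repetition and saves the
# text the second group just captured.
my @seen;

my @matches = 'a b c d e' =~ m{
    ^(a) \s*
    (?:
        ([bcde])
        (?{ push @seen, $^N })
        \s*
    )*
}xms;

print "seen along the way: @seen\n";
print "returned: (@matches)\n";
```

The group matched four times, but only the last value survives in $2 and in the returned list.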

That explains why the pattern match wasn’t returning the expected list. What can be done about it?

Capturing Along the Way

If we break down the sample data, we see that it generalizes to,

prefix key = value[, ...] [: key = value[, ...]]

The first approach that came to mind is to split the data into multiple lines. Each line can then have its initial prefix removed and saved, then parsed for its key/value pairs. That’s starting to look a lot like parsing, which I promised to get to later. For the purposes of this discussion, I wanted to be able to accomplish the task with a single regular expression.

To capture all of the values we want, we need to remove the repeating set of non-capturing parentheses. However, we still need to repeat the match, ideally returning all of the captured values in one statement. We can do that with the /g and /c regular expression modifiers.

my @list = $string =~ m{ ([-+?,:]) \s* (\w+) \s* = \s* (\w+) \s* }xmsgc;

I’ve done two things here. First, I replaced the \S character classes, used to match the key and value, with \w. The + quantifier in a Perl regular expression is greedy, so \S+ was also matching the comma used to separate key/value pairs in the data. This left the literal comma with nothing to match, which was one source of confusion.
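The difference between the two character classes is easy to check against the first line of the sample data. This is only a sketch; the variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $line = '- name = gcc_xo_src_clk, type = rcg';

# With \S+, the value capture runs up to the next whitespace, comma included.
my ($greedy_value) = $line =~ m{ ^[-+?] \s* \S+ \s* = \s* (\S+) }xms;

# With \w+, the value capture stops in front of the comma.
my ($word_value) = $line =~ m{ ^[-+?] \s* \w+ \s* = \s* (\w+) }xms;

print "non-whitespace class captured: '$greedy_value'\n";
print "word class captured:           '$word_value'\n";
```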

Second, I noted that the initial prefix, while syntactically important, could be viewed in the same way as the comma and colon separators. I combined all of these separators and added a capture around them so we can later make sense of the parsed data.

When matched against the data, the pattern results in a list like,

("-", "name", "gcc_xo_src_clk", ",", "type", "rcg", "+", "name", "cxo_clk", ...)
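For anyone who wants to try this, here is a self-contained sketch that runs the pattern over the sample data from the question and prints each separator/key/value triple. The heredoc and the @tokens copy are additions for the sake of a runnable example.

```perl
use strict;
use warnings;

# Sample data copied from the question.
my $string = <<'END_DATA';
- name = gcc_xo_src_clk, type = rcg
+ name = cxo_clk, type = xo, fgroup = xo, wt = 10, bloo = blah
? type = hm_mnd_rcg, name = bo : type = rcg_mn
+ name = pxo_clk
END_DATA

my @list = $string =~ m{ ([-+?,:]) \s* (\w+) \s* = \s* (\w+) \s* }xmsgc;

# Every match contributes three items: separator, key, value.
# Work on a copy so @list stays intact.
my @tokens = @list;
while ( my ( $sep, $key, $value ) = splice @tokens, 0, 3 ) {
    print "$sep $key=$value\n";
}
```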

Now we can process the data using a simple state machine.

my $state = undef;

while ( defined( my $token = shift @list ) ) {
    if ( $token eq '-' ) { $state = 'dash'; next; }
    # ...
    if ( $token eq ',' ) { next; }

    # Anything that isn't a separator is a key; its value follows it.
    my $key   = $token;
    my $value = shift @list;

    if ( defined $state && $state eq 'dash' ) {
        # ...
    }
}

Even though we did all of the data extraction using a single pattern match, it looks remarkably like … a parser! The pattern is simply the tokenizer used to feed tokens into our state machine, the parser.
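As a rough illustration of where this can go, the sketch below consumes the token list three items at a time and builds one record per input line. The record layout (prefix, pairs, options) and the $section variable are assumptions made for this example, not something from the original thread.

```perl
use strict;
use warnings;

# A shortened token list in the shape the tokenizer produces.
my @list = (
    '-', 'name', 'gcc_xo_src_clk', ',', 'type', 'rcg',
    '?', 'type', 'hm_mnd_rcg',     ':', 'type', 'rcg_mn',
);

my @records;              # one hash per input line (hypothetical layout)
my $section = 'pairs';    # which part of the line we are in

while ( my ( $sep, $key, $value ) = splice @list, 0, 3 ) {
    if ( $sep =~ /^[-+?]$/ ) {
        # A prefix starts a new record.
        push @records, { prefix => $sep, pairs => {}, options => {} };
        $section = 'pairs';
    }
    elsif ( $sep eq ':' ) {
        # A colon switches to the trailing options.
        $section = 'options';
    }
    # A comma leaves the current state unchanged.

    next unless @records;    # ignore pairs that appear before any prefix
    $records[-1]{$section}{$key} = $value;
}

for my $record (@records) {
    print "$record->{prefix}: ",
        join( ', ', map { "$_=$record->{pairs}{$_}" } sort keys %{ $record->{pairs} } ),
        "\n";
}
```

Keeping the separator with its key and value means the loop never has to guess which kind of token it is looking at.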

Parsing

I stated at the outset that I looked at this as a parsing problem, so the solution I would use is most likely a parser. For simple, one-off scripts, I’d use a technique similar to the one I described in the previous section. However, for more complex data or a more complex script, I’d turn to a real parser.

In fact, one of my contributions to the thread that led me to compose this post included an example of using the $^R and $^N variables in embedded code blocks to demonstrate a rudimentary parser that allowed a simulated form of capturing within a repeated non-capturing group. I won’t go into any detail beyond showing what I wrote. As this was from an early point in the thread, the prefix is ignored in this example.

my @list = ();

my $kv = qr{
    (\w+) (?{ $^N; })           # capture the key
    \s* = \s* (\w+)
    (?{ $^R = [ $^R, $^N ]; })  # capture the value, saving the key
    (?{ push @list, @{ $^R } }) # push the key/value onto @list
}xms;

$data =~ m{ (?: ^[-+?] \s* $kv \s* (?:[,:] \s* $kv \s* )* )* }xms;

Fortunately for us, there are parsing modules on the CPAN.

Prior to Perl 5.10, Damian Conway had written Parse::RecDescent, but with the introduction of grammar-like facilities like named captures and named backreferences, Damian improved upon his original work and presented the Perl community with Regexp::Grammars.

What does a parser for this data built with Regexp::Grammars look like?

my $parser = qr{
    <[Line]>+

    <token: Prefix>   <MATCH= ([-+?]) >
    <token: Key>      <MATCH= (\w+) >
    <token: Value>    <MATCH= (\w+) >

    <rule: Line>      <Prefix> <Pairs> <Options>?
    <rule: Pairs>     <[Pair]>* % ,
    <rule: Pair>      <Key> = <Value>
    <rule: Options>   : <[Option]>* % ,
    <rule: Option>    <Key> = <Value>
}x;

if ( $data =~ $parser ) {
    # Do something with %/
}

This is a trivial example and all the work is left to be done by inspecting the parse tree in %/. However, the module supports embedded code that will be called when a token or rule matches, which can be used to process the data as it’s parsed.

OSCON 2011: Friday

Friday marked the last day of the O’Reilly Open Source Convention (OSCON), and my last day in Portland, Oregon. Unlike previous trips, I traveled home on Friday night instead of Saturday morning. In the past, I’ve sat around my hotel on Friday night with nothing to do except finish posts about OSCON. There is one drawback, though. I’m finally finishing this post 20 days later, which means it probably won’t be as fleshed out as my posts about Wednesday and Thursday.

After my near complete lack of interest in the keynotes I saw on Wednesday and Thursday mornings, I paid little attention to those on Friday. I thought the message Karen Sandler had about open health was good, but that’s about all I can say about them.

By far I was the most pleased by the sessions I attended on Friday. First, Kevin Falcone’s Shipwright: Application Distribution Simplified. Kevin works for Best Practical, a company with the best shirts. I plan on doing some evangelizing of Shipwright at work, as it would help a lot of people, including me, to better develop and deploy their applications.

I wasn’t planning on attending OSCON this year. I was perfectly happy skipping it and staying home during the last week of July. Then I happened to be looking over the list of Perl sessions and saw, at the very end of the list, Easy Distributed Computing with Perl and Grid::Request. It seems that Victor Felix has released a module that does exactly the same thing as some of the modules I’ve maintained at work, only the design is much better. However, it doesn’t support the batch system we use. I emailed Victor to discuss some collaboration and registered for OSCON so I could meet him. So yeah, I attended OSCON for one session. But it was worth it. The module looks great and Victor seems happy that I’m interested in contributing. It will be a much better use of my time to contribute to a module on the CPAN than to continue pouring effort into what we have today.

Since, after chatting for a bit with Victor, I was already standing outside the room well into the next time slot, I popped into Git for Ages 4 & Up. Michael Schwern and Ricardo Signes demonstrated the Git commands everyone should know to get started with the version control system. As an added bonus, they used tinkertoys to help the audience visualize what Git’s internal representation of the repository looked like after each command. It was definitely a different and entertaining talk.

Prior to the closing keynote, Piers Cawley was invited to sing his library song, which I mentioned in Thursday’s post, again for the benefit of all OSCON attendees.

Paul Fenwick delivered the closing keynote. If you haven’t seen one of his talks, shame on you. Here, to help you fix that, I’ll refer you to his keynote, All Your Brains Suck—Known Bugs and Exploits in Wetware.

After three days in Portland, I finally ate at Burgerville. Eating at this regional chain is something I look forward to every time I’m in the area. Though, I suppose my change in diet may have suppressed my eagerness and led me to put it off until Friday. In any case, I ordered a cheeseburger with grilled onions (ditching the bun) and a large raspberry shake. While I prefer their blackberry shakes when available, the meal was delicious.

The high point of the conference happened, oddly enough, after it had ended. For whatever reason, I happened to wander into a different area of the convention center, in which a sock knitting conference was taking place. Outside of their expo hall was the Sockgate, a cardboard replica of a Stargate. As we were waiting to take pictures with it, Paul Fenwick happened by and offered to take some photos. He’s a really nice guy and I enjoyed finally getting the chance to meet him. After the photo op, he headed into the knitting expo hall. In retrospect, I should have done the same. It would have been interesting to see what it was like.

Sockgate

Photo Credit: Paul Fenwick

Finally, I learned that when I attend OSCON, I really do need to go for the entire week. Apparently, it takes me about two days to acclimate myself to the environment and really start interacting with people. Of course, by arriving Tuesday night, I was ready to interact on Friday, just as everyone was heading home. It didn’t help that I was staying in a hotel way out by the airport, with MAX service ending before 11:00 PM. With a new baby at home, I certainly don’t regret my choice to be away for a shorter period of time, but if I go next year, I’ll probably go for the entire week.

OSCON 2011: Thursday

Thursday was the second day of sessions at the O’Reilly Open Source Convention (OSCON) and my third day in Portland, Oregon. Overall, the sessions I attended were arguably more relevant to my work than those I attended on Wednesday. Still, the day left me feeling unsatisfied. At past OSCONs, I ended each day with my mind brimming with new ideas, scarcely able to wait until I could put some of them into practice. So far, this year’s conference hasn’t had the same effect on me.

In any case, the Thursday morning keynotes were far better than those foisted upon us on Wednesday morning. Gabe Zichermann’s talk, in particular, caught my attention. In Game theory applied to user engagement in Open Source he talked about using so-called gamification techniques to draw people into using Open Source software. Many of his examples had to do with using game theory to alter real life behavior, such as a lottery to reward good drivers in Sweden or the use of consumption graphs in hybrid vehicles. On a separate note, I tend to grow annoyed at the latter, having been stuck behind too many hypermiling drivers.

Getting into the sessions, I favored those more in line with the work I do as a Perl programming system administrator. Also, it didn’t hurt that The Conway Channel 2011 happened to take place during the first time slot of the day. I’m a bit sorry I passed up DIY Clinical Trials (Or: How to Guinea Pig Your Way to Scientific Truth and Better Health), if only for the reason that it would have been completely different from anything I normally do. But, I attended those types of sessions on Wednesday, so it was back to business, so to speak. Damian Conway was in his usual top form, as entertaining as he is educational. I won’t go into too much detail, only to note that he demonstrated four of his modules, using a theme I’m sure most will recognize. First, something old, updates to the Regexp::Grammars module. He then introduced something new, the IO::Prompter module, which supersedes his older IO::Prompt. There was something borrowed, the Data::Show module, which serves as a convenience wrapper around the Data::Dump module. And finally, something blue, the Acme::Crap module, which seems oddly cathartic.

I like to think I’m a halfway decent Perl programmer, but that doesn’t mean I think I can ignore things like Jacinta Richardson’s Perl Programming Best Practices 2011. The talk was a round-up of the tools and modules that are generally considered to be the best practices by the Perl community today. Yes, generally. People will have their differences of opinion, and I don’t always agree with the advertised best practices. However, if followed, the practices will lead to better code, and if violating a practice, I like to be able to back that up with a well thought out reason (it doesn’t necessarily have to be a good reason). The first of two, possibly pithy, examples of this is local::lib and its default use of ~/perl5 as its include path. I prefer to use ~/local/lib/perl5 and, sure, the module allows me to do that easily enough, but it’s an extra, non-standard step. Second, cpanm has been touted as the best way to install modules from CPAN. As a control freak with a highly customized CPAN configuration, I’ve never liked the way cpanm seems to do things its way. Admittedly, it may be customizable, but I’ve never had the need to look into it.

There’s been some noise around the office about testing Amazon’s EC2 offering. To that end, I thought James Loope’s Utility and Automation: Low Overhead Operations with Amazon & Puppet would be educational, possibly giving me some ideas about how to manage our own potential EC2 environment. Unfortunately, it didn’t work out that way for me. The talk was heavily focused on the way the web application was designed and how pieces of Amazon’s infrastructure were used. We’re not creating or running web applications, so none of it was beneficial to me. There was nothing about Puppet aside from explaining that using it (or another configuration management tool) is vital for keeping everything running.

At this point, I was turned off from any cloud talks at OSCON. There seems to be, with probably good reason, an inextricable tangling of cloud and web applications. Because of this, I decided to pass on Achieving Hybrid Cloud Mobility with OpenStack and XCP and instead attended Piers Cawley’s Polymorphic Dispatch—It’s Not Just a Good Idea, It’s the Law. I’m glad I did, because there were definitely some very useful ideas presented. The idea, taken from Smalltalk, of passing messages to objects has a lot of merit. Combining this with polymorphism, sending a message and allowing different objects to act on it differently, vastly simplifies code. Simple code, of course, is easier to test and easier to debug when things go horribly wrong (and actually is less likely to go horribly wrong in the first place). Of particular interest to me were the Null Object pattern and what Piers referred to as the key tenet of object-oriented programming: tell, don’t ask. That is to say, if I understood correctly, instead of querying an object for information and using it to determine which action to perform, give the information to the object and have it perform the action. Finally, Smalltalk Best Practice Patterns was recommended as the best book on good coding practices out there. According to Piers, it “will change the way you think about programming.”

I was in way over my head in Tom Christiansen’s Unicode in Perl Regexes. The only thing I managed to learn is that I don’t know nearly enough about Unicode to actually understand using it. I’ll leave it at that. It was a very information-dense session and it’s possible that Tom knows more about Unicode than those who designed it. Other choices during this time slot, which may have been better for me, were Connecting iOS to the Real World with Arduino, presented by my friend Alasdair Allan, or, venturing again into the realm of health geekery, Open Source Preventive Medicine: Citizen Science Genomics.

The last session I attended on Thursday had so much potential, but, for me, it fell flat. I expected A. Sinan Unur’s Visualizing Economic Data Using Perl and HTML5’s Canvas to focus far more on visualization than it did. Instead, the majority of the presentation was about the difficulty of parsing public data published by the United States government. For this, Sinan uses Spreadsheet::ParseExcel and explained a few of the techniques he uses to extract data from tables designed primarily for visual consumption. Unfortunately, very little time was spent showing how Canvas was used. We were given one static example and an explanation that there is no method available for determining the height of text in a Canvas element. I had hoped to return to work with some ideas for using Canvas to visualize data from our batch scheduling system, but ultimately left disappointed.

After the last session, I met up with a coworker, an old friend, and a new friend to have dinner at Chipotle. Normally, I like to avoid chain restaurants—national chains in particular—when traveling, preferring to sample the local cuisine. But, we wanted a quick dinner and it was nearby. My opinion was requested, on the relative healthfulness of pinto versus black beans. I simply stated that I would be ordering my carnitas bowl without any beans.

After dinner, we returned to the convention center for the Perl Lightning Talks and the State of the Onion. As always, the talks were quite entertaining. Of note was a juggling demonstration, illustrating various programming languages and databases. Near the end, Ricardo Signes recounted a conversation he had with a couple of women from the knitting conference sharing the convention center with us. Its presence provided a wonderful juxtaposition. While OSCON is male-dominated and many don’t know how to act when women brave their way into our midst, the knitting convention is completely opposite. Ricardo’s message to us was, take the time to look up from our laptops and chat with those around us. We might just have a better time and make new friends.

Finally, Piers Cawley favored us, as he does every year, with a song. This year, however, he did not bear a tale of levity, but a message of deadly seriousness. The United Kingdom is closing libraries in an attempt to reduce public spending. As a protest, Piers wrote a song, “Child of the Library”. There doesn’t appear to be any video (yet) of Piers performing at OSCON, but I’ve gone ahead and embedded one that I found. It’s catchy; I had it stuck in my head for a couple of days after the conference.

We could easily see the same thing happen in the United States—and in fact I have already seen it proposed in San Diego. I’ll first admit that I have not set foot inside a library since college, over a decade ago (high school, if only counting public libraries). Do libraries still matter, or is the concern over their closing merely the knee-jerk nostalgia of those of us who came of age in a world that didn’t yet know the Internet? I can’t, and won’t, take a side on this issue until I’ve taken the time to visit my local library. If I can recognize it as something I saw in my childhood, perhaps it should be closed. If it has adapted to the so-called Information Age, maybe it’s worth funding.

As a final, humorous note, I almost didn’t make it back to my hotel. At least, not without finding an alternate method of transportation. At 10:22 PM, excusing myself and apologizing for staying so far away from the conference, I left the Media Temple party at the Jupiter Hotel, arriving at the convention center MAX station at 10:32 PM. The schedule at the station listed 10:42 PM as the last red line train to the airport, with Google Maps concurring that a train was 10 minutes away. About two minutes later an unmarked blue line train arrived at the station, traveling east. At this point, Google Maps had decided it would rather show me its trip planner instead of the previous screen which showed the impending arrival of the red line. Forced to make a split-second decision, I hopped on the train. I knew that I could take it at least as far as the Gateway station, where I could transfer to the red line if it was still behind me. Around 11:00 PM I arrived at Gateway, after spending the ride thinking about how much a cab would cost. This station had a real-time display with train arrival times. The last red line of the day was only three minutes out. Whew.

For Want of a Newline

Today I had the pleasure of spending three hours debugging an obscure bug. An obscure bug I caused by introducing a newline. That little punk, 0x0A.

I released a new version of a command line program. It’s an elegant piece of work, combining a marvelously complex-but-intuitive configuration for system administrators with an absolutely simple interface for users. To use the command, the user runs it with a couple of arguments and it prints out a single line of useful text derived from the marvelously complex configuration.

But, it doesn’t print a newline.

Anyone who has encountered a command like this knows well my irritation. You end up with something like this:

my awesome prompt> some_lame_command
my awesome prompt>e answer

Argh!

The workaround most of us use is to see the above, face-palm, then run something like this:

my awesome prompt> echo `some_lame_command`
42 is obviously the answer
my awesome prompt>

Being the arrogant bastard programmer that I am, I decided to fix this. Since all commands print newlines, everyone should already be assuming that this one does too and should already be handling it in the proper manner. When writing a shell script, the distinction between newline-printing and non-newline-printing commands is irrelevant. In either Bourne shell,

FROBBED=`frobnosticate`

or C shell,

setenv FROBBED `frobnosticate`

the shell is benevolent enough to remove the newline, if it exists. After all, this is the most commonly desired behavior when assigning command output to a variable. However, things are a bit different when switching to a programming language, like Perl:

$ENV{FROBBED} = `frobnosticate`; # Caution, newline ahead!

Sure, it looks more or less the same, but veteran Perl programmers will immediately grimace when reading the above. Unlike the shell, Perl, like other programming languages, will preserve the output of the command. In this case, preserving data and letting the programmer decide how to use it is the most commonly desired behavior. Since everything coming from an external command ends with a newline, the environment variable being set in this case will have a newline. This will almost always cause a problem. One that, as I’ve learned, is not always easy to find. Since stripping input of newlines is just as common as the desire to preserve data, Perl makes this easy and most Perl programmers will habitually write this:

chomp( $ENV{FROBBED} = `frobnosticate` );

Now it doesn’t matter if the command prints a newline or not, the chomp function has your back. It’s just like being in the warm embrace of the shell, only with a little extra syntax.
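Here is a small sketch of the difference, assuming a Unix-like shell. The echo and printf commands stand in for the real program, and server01 is a made-up value.

```perl
use strict;
use warnings;

# `echo` stands in for a command that ends its output with a newline
# (assumes a Unix-like shell; "server01" is a made-up value).
my $raw = `echo server01`;

# The habitual form: capture the output and trim the newline in one step.
chomp( my $clean = `echo server01` );

# `printf` stands in for a command that prints no newline; chomp leaves
# its output alone.
chomp( my $bare = `printf server01` );

printf "raw:   %d characters\n", length $raw;
printf "clean: %d characters\n", length $clean;
printf "bare:  %d characters\n", length $bare;
```

With chomp in place, the caller gets the same string whether or not the command prints a trailing newline.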

So it turns out that one of the engineering groups I support was using a Perl script that set an environment variable as in the first example. The value of this environment variable was then being passed off to the batch system and used by an engineering program as a network address to connect to. Of course, the program made the fatal mistake of trusting user input and, in a spectacular fashion, failed to connect to the server whose name just happened to contain a newline.

After chasing down a couple of red herrings which left me flummoxed, one of the affected users shared with me an error log and the script that generated it. There, in all its syntax highlighted, monospaced glory was the environment variable being set without attempting to trim off the newline. I quickly released an update that reverted the newline behavior and the problem went away. My engineers—at least, the subset using this particular script—could once again get their work done.

By far, this isn’t the worst thing I’ve done to our batch system. One time I caused all jobs that launched on Solaris hosts to immediately fail. Whoops.

Anyway, what’s the lesson to be learned from today’s experience?

Never—and I’ll repeat that, never—assume everyone will be doing the right thing. Inevitably, someone won’t be.

There’s a corollary to today’s lesson. When coming across something that could be improved with a small change, don’t. Seriously, just don’t. Inevitably, someone will be depending on the current behavior, no matter how right or wrong it may seem.

OSCON 2010: Tuesday

I returned from the O’Reilly Open Source Convention three weeks ago, and I’ve had drafts for my Tuesday through Friday travel posts sitting around since then. I’ve finally found a moment on a lazy Sunday afternoon to enjoy a pint of ale while writing. Although, it is a beautiful day, which I’d be spending outdoors if my family weren’t sick (and I’m not convinced I’m altogether healthy).

Tuesday was the second and final day of the tutorial sessions. In the morning I attended a tutorial on PostgreSQL’s new hot stand-by and streaming replication features; and, in the afternoon I attended part of a tutorial on Cassandra. Why only part? I’ll get to that.

I didn’t feel like going across the river to the food trucks for lunch, so I joined Debbie for lunch at Burgerville. Aside from the delicious food made from local ingredients, there are two things that struck me about Burgerville. The first I noticed when I walked in the door: for the first time, disposing of my trash would require me to read instructions. Burgerville uses three bins for trash: one for recyclable materials, one for compost, and finally one for trash that can neither be recycled nor composted. I thought this was neat, though I did get a kick out of the soft drink cup. It’s from the Coca-Cola company and advertises itself as something that can be composted; with the footnote that this was only possible in a large facility capable of composting such cups. Not something one can throw into their garden compost pile, I guess. The second thing I noticed caused me immediate regret: the receipt lists the calorie count of the foods ordered, along with carbohydrate and fiber content. Looking over the details of the burger, onion rings, and raspberry milkshake I ordered, I decided that it would not be a very paleo day for me. Oh well, the milkshake was very good.

While enjoying our carb-loaded, calorie-filled lunch, Debbie noticed someone wearing a pair of Vibram FiveFingers that we hadn’t seen before. From a distance, they looked almost like normal shoes and appeared to be made with a dark brown suede. With both of us deciding that a post-lunch, calorie-burning walk was called for, and sharing a desire to buy a new pair of FiveFingers, we set out for Portland’s REI store. A trip on the MAX, a walk, a few blocks on the trolley, and another walk brought us to the store.

The shoes turned out to be the KSO Trek. They’re very nice and I’m considering purchasing a pair for hiking. Unfortunately, I struck out on the trip. REI has been having a hard time keeping FiveFingers in stock, so I wasn’t able to find or buy a pair of the Classic version. Fortunately, I’m still satisfied with my KSOs, which I was wearing at the time.

Our impromptu quest for footwear took us well beyond the allotted time for lunch. Fortunately, this time was not wasted. While walking, we had received a call from our coworker back in the expo hall, who needed help setting up the QuIC booth. For some reason, it was fun being allowed into the expo hall while booths were still being constructed. Not sure why, other than that I enjoy seeing things taken apart and (sometimes) being put back together. After getting the booth set up, I made it to the second half of the Cassandra tutorial. I’m told by those who attended the first half that I didn’t miss much.

We had some time to kill between the end of the day’s sessions and the evening’s Ignite talks, so we walked a few blocks to a place called rontoms. Had I not been looking for the specific address, I would have walked right past, not noticing that this was either a restaurant or a bar. The cavernous interior was devoid of anyone save the bartender and a waitress, who would disappear as quickly as she appeared. The photographs on the wall, most of which featured a man in an animal costume, ranged from strange to disturbing. After a moment’s hesitation, we ventured out back to find a patio crowded with patrons enjoying food, beer, and spirits. With what appeared to be only a single waitress working and not having particularly strong appetites, we went back inside, obtained pints directly from the bartender, and found a comfortable area to sit and chat. Twice we encountered people entering the restaurant, looking for people they didn’t know by sight. Both times my colleagues convinced them that we were those people; one girl even sat down with us for a few minutes before we let her in on the joke. After a while, I received a page from Jonathan that there was beer, salami, and cheese being served outside the ballroom at the convention center. This sounded like an excellent and delicious dinner to me, so I made my way back.

I hadn’t been to an Ignite session before, so I was looking forward to this one. Right off the bat we were warned that we would likely enjoy some talks and dislike others. Fortunately, each talk would only last five minutes, so we were free to use the time to retrieve another beer. By the time we returned, the talk would be over. I don’t believe I took advantage of this, instead waiting for the break, during which some awards were being presented.

Two talks stand out in my memory. The first, perhaps appropriately, was the first in the lineup: Paul Fenwick talking about Maximum XP: Optimising life for adventure (which he gave again, at a much better pace, at the Perl Lightning Talks). Presented in song, Paul’s message seemed to be to enjoy travel and to take advantage of opportunities to meet people and have fun. Based on what I’ve read on his Twitter stream, I’d say he’s been successful.

The other talk, Your Infinite Do-Loop Exercises Bores Me, struck a chord with me. John Scott and Jim Stogdill paired up for this talk: one would perform exercises while the other spoke, and they switched places at the halfway mark. Not only was it refreshing to see a talk about fitness at a convention populated by a class of people not known for their physical exertion, but it was about a method of fitness I’ve recently become interested in. While I don’t practice CrossFit myself, I frequently look at the exercises on the site and prefer it to the typical, repetitive gym workout. They also mentioned the paleo diet, which, along with the primal lifestyle, I’ve become a big fan of.

My coworkers all turned in early, so I hopped back on the MAX and headed downtown to have drinks with Kevin at Bailey’s Tap Room. I had a wonderful sour beer, which I no longer remember the name or origin of, and had the pleasure of meeting Steve, Jeff, and Michael Schwern. Jeff and Schwern were discussing the use of the Log4perl module in the latter’s gitpan project.

After last call at Bailey’s, I caught the last yellow line across the river and turned in myself.

OSCON 2010: State of the Onion

The Thursday sessions are over, but before I head out to the parties, I’m attending the 14th State of the Onion address. This is the always well-attended update on the universe of Perl. I immediately noticed that Larry is surrounded by his wife and his son, the former dressed as an angel, the latter as a devil.

Larry claims that he so rarely talks about Perl in his State of the Onion addresses that he has brought his conscience with him today to prod him in the right direction (the aforementioned angel and devil).

The current state of the onion is segmented into left, central, and right sections. It can be labeled, say, 5 and 6. They can also be labeled 0 and 1, for false and true. Larry then asked a series of boolean questions, asking the audience to weigh in on the veracity.

Do you think Perl 5 and Perl 6 are really the same language?

Do you think Perl 5 and Perl 6 are really different languages?

As the angel and the devil argued, Larry pointed out that an important skill for a language designer is to be able to stay on the fence long enough until he can determine which side the grass is greener on. Sometimes you discover that you’re sitting on the wrong fence and the voices in your head start to argue about which side has the greener grass.

When the voices in your head start arguing if the purple cow eats greener grass than the brown fence, it’s time to see a doctor. Or find a better drug dealer.

— Larry Wall

This is, of course, a metaphor for being a language designer. Sometimes you sit on the fence for language features, without ever knowing which direction is the better one.

Next up is a live demo of Perl 6; or, more specifically, of Rakudo Star, which is scheduled to be released next week. Some of the demos, without comment:

.say if 6 %% $_ for 1..^6
[+] gather { take $_ if 6 %% $_ for 1..^6 }
[+] grep { 6 %% $_ }, 1..^6
~[+] grep 6 %% *, 1..^6
-> $n { $n == [+] grep $n %% *, 1 ..^ $n }
-> $n { $n == [+] grep $n %% *, 1 ..^ $n }(6)

At this point, the examples scrolled off the screen due to a “whatever” example being run. That’s good news, though. It means Rakudo Star supports lazy lists and, as such, we finally have those infinite lists we’ve been promised:

0, 1, ... *

In addition to marking the end of an infinite series, the whatever star can be used to curry a function:

(1, 1, *+* ... *)[^20]    # Fibonacci
(0, !* ... *)[^20]        # 0 1 0 1 0 1 ...

In a recent video interview, Larry was asked, if he were hit by a bus, has he designated anyone to be his successor as the leader of the Perl 6 project? His response was that he trusts the Perl community to choose the right person.

Onions can make you cry; so can disruptive technologies or innovations. Almost everyone has labeled their technology as disruptive. As such, the phrase has lost most of its meaning.

A disruptive technology simultaneously does something worse and does something better than its competitors. In a time of the Unix philosophy of “do one thing and do it well,” Perl came along and attempted to do everything, but didn’t necessarily do any of it well. The Unix philosophy was broken by its own utilities. No one knew what a “thing” was, and no utility of the time did it well. By the time Perl 4 turned into Perl 5, it demonstrated that a tool that was itself an entire tool shed could run circles around shell scripts.

In California, we once had many, many colonies of ants. Now, most of California is populated by a single colony of Argentine ants. This is because the colonies have forgotten how to fight with each other. Perl 6 has benefited from multiple teams creating multiple implementations, in the end working together to create a better product, even if that product takes longer to complete.

If you don’t like Camelia, you can just fork off.

— Larry Wall

The takeaway, I think: It is up to all of us to determine what Perl 6 will be. What kind of disruptive technology will it be?

OSCON 2010: Awesome Things You’ve Missed in Perl

Paul Fenwick (Perl Training Australia)

Ever since I saw An Illustrated History of Failure two years ago, I’ve made it a point to see @pjf’s talks. That’s how I find myself in his mid-afternoon session, Awesome Things You’ve Missed in Perl. Judging by the size of the crowd, I’m not the only one. However, I won’t attempt to pass along his humour in this post. I’d never do it justice.

In his introduction, Piers Cawley asked that we go wild when Paul took the stage, so the folks in the Google Wave session next door would be taken aback, and realize that Perl is not, in fact, dead.

People are still out there writing Perl as if still in the dark ages of 2008. Paul doesn’t want us to write old Perl, but only new and shiny Perl. This talk only covers practices that have come about since Perl Best Practices was released.

Object-oriented Perl is not awesome. Not even close. If you look at the old ways of doing it, all of them are either wrong, stupid, or both. The rest are too hard. There’s a simple way to fix this: use Moose. This module does so much of the infrastructure work of composing classes, it makes object-oriented programming enjoyable again.

Paul spent a lot of time giving a humorous, high-level overview of the features available in Moose.

Moose also has a huge number of extension modules available in the MooseX namespace.

When I have a problem, I go down to the pub with other Perl mongers and bitch.

One limitation of Perl that carries over to Moose is that not everything is an object. This means methods like push() or isa() can’t be called on everything. And checking types defeats the purpose of polymorphism. Enter the autobox module, which turns everything into an object. As a bonus, it operates in lexical scope. Moose exposes autobox through the Moose::Autobox module.

Paul wrote the autodie module, which is now included in core. This lexically scoped module removes all of the boilerplate code that goes along with trapping errors from built-in functions such as open() and close().
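To give a feel for what that looks like, here is a minimal sketch of my own, not taken from Paul's slides. The first_line() subroutine and the file path are hypothetical, and the path is assumed not to exist so that the failure can be shown.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first line of a file.
sub first_line {
    my ($path) = @_;

    use autodie;    # lexically scoped: only affects this subroutine body

    open my $fh, '<', $path;    # no "or die ..." needed
    my $line = <$fh>;
    close $fh;
    return $line;
}

# The path below is a placeholder that is assumed not to exist.
my $ok    = eval { first_line('/no/such/dir/no-such-file.txt'); 1 };
my $error = $@;

# autodie throws autodie::exception objects, which can be inspected.
if ( !$ok && ref $error && $error->isa('autodie::exception') ) {
    print "open failed as expected\n" if $error->matches('open');
}
```

Outside of first_line(), open() and close() keep their usual return-false behaviour, which is the point of the lexical scoping.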

Not only is Perl 5.10 awesome, but Perl 5.10 regular expressions are awesome. In particular, the introduction of named captures (via %+) made regular expressions extremely awesome.
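As a quick illustration of named captures, here is a small sketch of my own. The sample line and the %pairs hash are assumptions made up for the example, and the pattern only handles simple word-like keys and values.

```perl
use strict;
use warnings;

# Hypothetical sample input: comma-separated "key = value" pairs.
my $line = '+ name = cxo_clk, type = xo, fgroup = xo, wt = 10';

my %pairs;

# Each /g iteration matches one "key = value" pair; %+ holds the
# named captures from the most recent successful match.
while ( $line =~ m{ (?<key> \w+ ) \s* = \s* (?<value> \w+ ) }xmsg ) {
    $pairs{ $+{key} } = $+{value};
}

print "$_ = $pairs{$_}\n" for sort keys %pairs;
```

Compared with $1 and $2, the names make it obvious which capture is which, and the pattern can be rearranged without renumbering anything.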

Perl 5.10 also brings grammar-like features to the regular expression engine, such as named and recursive subpatterns. This is the basis for Damian Conway’s Regexp::Grammars module.
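The core building blocks here are the (?(DEFINE)...) block and the (?&name) recursion syntax documented in perlre. As a rough sketch of my own (this is plain Perl 5.10 regex syntax, not Regexp::Grammars, and the rule name "group" is made up), a pattern can match balanced parentheses by calling itself:

```perl
use strict;
use warnings;

# A one-rule "grammar": a group is an opening parenthesis, any mix of
# non-parenthesis text and nested groups, then a closing parenthesis.
my $balanced = qr{
    \A (?&group) \z
    (?(DEFINE)
        (?<group> \( (?: [^()]++ | (?&group) )* \) )
    )
}xms;

for my $text ( '(a(b)c)', '(a(b)c' ) {
    printf "%-8s %s\n", $text, $text =~ $balanced ? 'balanced' : 'unbalanced';
}
```

The rule inside the DEFINE block never matches on its own; it only runs when (?&group) calls it, which is what makes the pattern read like a grammar.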

Paul referred to an article on SweeperBot in The Perl Review. There is, however, the problem of distributing a program that uses half of CPAN to users of inferior operating systems, such as Microsoft Windows. That’s where the PAR module comes in. It will pack up all of the modules used by the program, including the Perl interpreter itself if necessary, so a single, self-reliant file can be distributed to users who need it.

Remember to never optimize code. Programmer time is far more valuable than CPU time. However, when you must optimize code, profile first. The Devel::NYTProf module makes profiling awesome.

Code reviews are important, but Perl programmers are lazy. Fortunately, the Perl::Critic module has read Perl Best Practices for you and will complain about where your code violates the practices in the book. At my day job, it does about half the work of code reviews for me, loudly announcing violations of the coding standards that I enforce with an iron fist.

If you find an awesome module, buy the author a beer if you have the opportunity. You can also leave feedback on CPAN Ratings or use perlthanks, which ships with recent versions of Perl.

OSCON 2010: 21st Century Systems Perl

Matt Trout (Shadowcat Systems Limited)

The full title of this session is, 21st Century Systems Perl – the New Perl Enlightenment for sysadmins

Introduction

While Perl isn’t dying, “PERL” most certainly is dying. This is a good thing, because it includes all the really crappy stuff, such as Matt’s Script Archive. Thank goodness for that. To be fair, this code would have been horrible written in any language. Remember, blame the artist, not the tool.

We have a very mature community, which means we also have very mature practices. We are also converging on a standard platform, even if there is more than one way to do something.

Part 1: Minimising Developer Fatalities

As developers, we should do what we can to make our sysadmins’ lives easier.

Right off the bat, we should use the local::lib module, which allows an application to use custom library areas without polluting the system installation areas. It can even work with /etc/skel. Matt is a big fan of using a local library path, included with the application, so it can be maintained separately from both the operating system vendor’s modules and even other applications.
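The idea of a library path that travels with the application can be sketched with nothing but core modules. To be clear, this is not local::lib itself: I'm swapping in the core lib and FindBin modules, and the local/lib/perl5 directory name is only an assumption about how an application might be laid out.

```perl
use strict;
use warnings;

use FindBin qw($Bin);              # directory containing this script
use lib "$Bin/local/lib/perl5";    # hypothetical bundled library path

# "use lib" prepends to @INC, so modules installed under the bundled
# path are found before the system-wide ones.
print "Searching first in: $INC[0]\n";
```

local::lib goes further than this, setting up the environment so that newly installed modules land in the private directory as well.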

Improve module installation using Module::Install.

Package modules for your distribution of choice using cpan2dist.

Improve the CPAN experience using App::cpanminus, which is amazingly easy to bootstrap:

> wget cpanmin.us
> ./cpanm

Start using all of the modules associated with best practices by installing Task::Kensho.

Vendors are getting better at distributing Perl and keeping up with module releases. The Debian Perl team is the strongest, with Fedora lagging quite far behind. Fedora is finally getting better, now that members of the Perl community have a say in the packaging of Perl and the modules.

After many debug sessions, Matt has come to the conclusion that mod_$lang is evil. Jamming languages into the web server is a bad, bad idea. However, actually hooking into the different handlers can be useful. Matt’s preference is now FastCGI.

Part 2: Maximising Automation Banality

“In the systems world, shiny and exciting is not good.”

Use the autodie (in core as of 5.10.1) and the IPC::System::Simple modules to reduce the repetitiveness and the common errors of systems programming.

Use IO::All to fix the syntax and semantics of I/O operations.

System scripts shouldn’t need to be deployed. It should be possible to just drop the script onto a host and it will Just Work. That’s where PAR::Packer comes in.

OSCON 2010: Dist::Zilla

Ricardo Signes (Pobox.com)

The full title of this talk is, Dist::Zilla – Maximum Overkill for CPAN Distributions.

Every CPAN distribution contains a significant amount of crap. It’s infrastructure used for the distribution tools.

ExtUtils::MakeMaker has been the traditional way to work on the infrastructure code. By necessity, it contains a lot of legacy, which can be cumbersome to maintain. Enter Module::Install, which can look in the expected places for the necessary information, such as the author name. But, the author still must write all the boilerplate. Module::Starter was written to address this, composing all the boilerplate on behalf of the author. There is so much boilerplate that, by default, Module::Starter also provides a boilerplate test to detect it.

Why are we doing all of this? How much repetitive work are we doing?

What can Dist::Zilla do for us? For starters, we can remove some files:

  • LICENSE
  • MANIFEST.SKIP
  • Makefile.PL
  • README
  • t/pod.t
  • t/pod-coverage.t

Leaving us with only our Changes file, our code, and our tests. The non-infrastructure parts. On top of that, Dist::Zilla does all of the boring distribution bits for us. It only handles the make dist command. It does not handle the make install command, which means the users who install the module don’t need all of the dependencies.

Dist::Zilla puts all of its functionality into plugins, which will be the meat of the rest of this session. It also uses a very simple INI-style configuration file.

The main command provided by the module is dzil build. This bundles the distribution, which will contain all of the infrastructure necessary for users to install the module. When building, it follows a simple workflow:

  1. Gather files
  2. Munge files
  3. Collect metadata
  4. Write out

There is no default configuration, but there is a Basic plugin bundle that will include all of the most common plugins.
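For reference, a minimal dist.ini using that bundle might look something like the following. This is a sketch of my own rather than one of the slides; the distribution name, author, and version are placeholders.

```ini
; dist.ini (all values below are placeholders)
name             = My-Module
author           = A. U. Thor <author@example.com>
license          = Perl_5
copyright_holder = A. U. Thor
version          = 0.001

; Pull in the Basic plugin bundle
[@Basic]
```

With a file like this in the distribution's root directory, running dzil build should produce the tarball along with the generated infrastructure files.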

What followed were examples of what the plugins can do. Of course, all of them are designed to reduce cruft—the non-code, non-documentation bits that we’re forced to maintain. The philosophy is the same one I advocate to anyone who will listen: computers are good at doing boring, repetitive tasks with derived data; why don’t we let them do more of that stuff?

I’ve followed @rjbs on Twitter for a while, and I’ve seen him talk about Dist::Zilla. I’ve wanted to try it out for a while, to simplify my distributions—both for CPAN and for my day job—but I didn’t realize until this session just how awesome the tool is. It’s a complete framework for managing Perl module distributions. Dist::Zilla will give my Laziness score a huge bump.

OSCON 2010: Smalltalk-style Traits

Curtis “Ovid” Poe (BBC)

After a long break, an apple, a cup of coffee, and a beer, I’m back in the Perl track.

The full title of this session is, Scratching the 40-Year Itch of Inheritance with Smalltalk-style Traits.

This is not a tutorial. How to use traits is easy, but why to use them is a more complex discussion.

Inheritance is a very complex problem and an easy one to get wrong. Then people start doing things with multiple inheritance and, even if they’re not doing something deliberately stupid, they end up with diamond inheritance. Not only is this a problem, but it’s been a problem for a very long time—40 years, in fact.

Complex systems can lead to deep class hierarchies. When hierarchies are deep, in particular with a dynamic language like Perl, it becomes difficult to determine where a method came from. Even when it’s known where a method comes from, undesired behavior may be inherited. This becomes worse when multiple inheritance is used.

As systems grow, the problem becomes two-fold:

  1. Class responsibility – larger classes are desired
  2. Class reuse – smaller classes are desired

Inheritance, by itself, cannot solve this problem. So the solution is to decouple the sub-problems.

Several solutions have been tried:

  • Interfaces
  • Delegation
  • Mixins – incredibly popular

As expected by the name of this session, traits (or roles in the nomenclature of Moose) solve the problem far better than any of the above solutions. Much of the session involved showing real-world application of roles to clean up code at the BBC.
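To make the idea a little more concrete, here is a hand-rolled sketch of my own in plain Perl. It is not how Moose implements roles and not code from the session. The apply_role() helper, the Programme class, and both role definitions are hypothetical. It shows three properties commonly attributed to traits: a role can require methods from the class, its methods are flattened into the class rather than inherited, and two roles that supply the same method are a composition error instead of a silent override.

```perl
use strict;
use warnings;

my %provided_by;    # "Class::method" => name of the role that supplied it

# Hypothetical helper, not Moose's implementation: flatten a role's
# methods into a class, refusing to guess when two roles collide.
sub apply_role {
    my ( $class, $role_name, $role ) = @_;

    for my $required ( @{ $role->{requires} || [] } ) {
        die "$class must provide '$required' to consume $role_name\n"
            unless $class->can($required);
    }

    for my $name ( sort keys %{ $role->{methods} || {} } ) {
        my $full = "${class}::${name}";
        if ( my $other = $provided_by{$full} ) {
            die "Conflict: '$name' comes from both $other and $role_name\n";
        }
        no strict 'refs';
        next if defined &{$full};    # the class's own method wins
        *{$full} = $role->{methods}{$name};
        $provided_by{$full} = $role_name;
    }
    return;
}

{
    package Programme;
    sub new   { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub title { return $_[0]{title} }
}

# Two made-up roles that both supply a describe() method.
my %describable = (
    requires => ['title'],
    methods  => { describe => sub { 'Programme: ' . $_[0]->title } },
);
my %summarisable = (
    methods => { describe => sub { 'Summary of ' . $_[0]->title } },
);

apply_role( 'Programme', 'Describable', \%describable );
my $description = Programme->new( title => 'Doctor Who' )->describe;
print "$description\n";

# Composing a second role with the same method name is a hard error.
my $composed = eval { apply_role( 'Programme', 'Summarisable', \%summarisable ); 1 };
my $error = $@;
print "Second role rejected: $error" if !$composed;
```

In real code I would reach for Moose roles rather than anything like this; the sketch only exists to show why composition with explicit conflicts is easier to reason about than a deep inheritance chain.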