“Ninja” Code

The Amazon booth at OSCON 2008 is advertising heavily that they are hiring. They are also holding a raffle. To enter, simply look over some Perl code they have written out on some poster board and tell them what it does. It looks a little something like this (transcribing from memory):

my $code = qq{
    print 1+1 . "\n";
    $code =~ m/(\d+)\+(\d+)/;
    $new = $1 + $2;
    $code =~ s/\d+\+(\d+)/$2+$new/;
};

for ( 1 .. 10 ) {
    eval($code);
}

What’s the first bug? Yes, it should use q{}, or the variables will interpolate on the initial assignment to $code. To their credit, they initially used single quotes, but people said it was too hard to read.
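
For anyone who hasn't been bitten by this before, here is a minimal sketch of the difference between the two quoting operators. The variable $new and its value are stand-ins I made up for illustration; they are not from the poster.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a variable that already holds a value at the
# moment the code string is built.
my $new = 42;

# qq{} interpolates: $new is replaced by its current value right away.
my $interpolated = qq{print $new;};

# q{} does not: the text survives untouched until it is eval'd later.
my $literal = q{print $new;};

print "qq{}: $interpolated\n";
print "q{}:  $literal\n";
```

With qq{}, the string handed to eval no longer mentions the variable at all, which is why the poster's code can't rewrite itself as intended.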

I wasn’t content with just figuring out what the code did and fixing a small bug. I think it can be written better.

eval($code = q{
    print 1+1 . "\n";
    $code =~ s/(\d+)(\+)(\d+)/"$3$2" . ($1 + $3)/e;
    eval $code;
});

Much better. Not only is it more concise, I was able to remove that pesky loop, so I wouldn’t be bothered by any silly upper bounds.

So what does it do? Should be obvious. Head over to the Amazon booth and let them know.

OSCON 2008: Strawberry Perl: Achieving Win32 Platform Equality

My first session of the day is Strawberry Perl: Achieving Win32 Platform Equality, presented by Adam Kennedy. Originally, I had considered a Parrot talk, but I saw a similar talk at SCALE6x, and I happened upon Adam on IRC this morning. I chatted briefly with him about his talk, and he happens to be in communication with a friend of mine, who is working on Camelbox, a Windows build of Perl originally targeted as a way to easily distribute applications written with Gtk front ends (I hope I got the motivation correct).

Recently, Adam has been funded by The Perl Foundation, Perl in Israel, and Stonehenge to use Perl from nothing but his flash drive. This provides an excellent motivation to get Strawberry Perl working in a highly portable way.

Originally, Perl was awesome and worked everywhere—except Windows. That was okay, because Windows didn’t matter. No one did any real work on Windows. Then, around 1995, Windows started to matter. A brief history of Perl on Windows followed, resulting in what is today ActiveState.

Much of what Adam wrote for PPI does not work in ActivePerl, which makes it a non-starter for him, as he tends to work on Windows. Anything depending on Scalar::Util or List::MoreUtils modules will not work with the ActivePerl build system. This led to an embarrassing problem for Adam when he gave a talk three years ago at OSCON. He couldn’t give his demo, because PPI would not build in ActivePerl. In fact, ActiveState’s package manager has gotten so much worse that almost any module that is at all useful does not exist—and thus nothing useful can be done on Windows (big surprise).

Moving away from ActiveState, this talk is essentially about Adam trying to get his own laptop to work. That’s really all he wants. It’s a modest desire. More importantly, the CPAN module has to work. Without that, what’s the use of Perl?

So Adam offered a prize: a yard-high stack of cases of any beer desired by the first person who could provide a fully-installable and working (by the above definition of working) version of Perl for Windows. After six months and no sign of a winner, he changed the prize to “craploads” of beer. In 24 hours, he received two entries. The winner cheated a lot, but the loser was Vanilla Perl, which has become a testing ground for experimentation.

Strawberry Perl is the Perl for Windows designed for people who don’t use Windows. That is, the people who do all of their work on Unix or Unix-like systems—Linux, Solaris, and Mac OS X. The main goal of the project is to make it easy—it is Perl, after all.

In the future will come Chocolate Perl, completing the holy trinity of Neapolitan flavors, for people who know Windows but don’t know Perl, and thus don’t know the Unix-like characteristics of Perl.

The target of Adam’s financial support is Portable Perl: Perl for flash drives. Carry it around, install CPAN modules onto, or from, the flash drive. It’s network-aware, does the right thing, and juliennes fries. An excellent standard being developed for portable apps is, in fact, PortableApps.com, where applications such as Firefox or PuTTY can be downloaded and installed to those ever-growing flash drives.

Available Thursday at the Perl Foundation’s booth in the expo hall will be branded flash drives with Portable Perl on them. At least, I think I heard that correctly.

I really like the work Adam is doing. He’s accomplished so much to get Perl everywhere. That’s a cause I can get behind.

“The main problem today is Vista.”
— Adam Kennedy

Okay, I took that out of context, but I couldn’t resist capturing the quote. What he really means is that changes made to Windows in Vista have made things not work, in particular the access control. It’s not an unusual problem when upgrading to new systems, but it is more difficult with proprietary platforms, which Open Source authors have very little access to.

OSCON 2008: Perl Worst Practices

I’m sitting in Portland 252 for my first tutorial of the day, Perl Worst Practices with Damian Conway. He’s started off by complimenting us on our intelligence and our ability to convince our bosses or significant others that paying for a worst practices course was a good idea.

Most of us are, of course, aware of the concept of best practice when coding. Writing code that’s maintainable, predictable, and follows the rules. Oh, and uses Java.

Worst practice is, by contrast, code that is obfuscated, unmaintainable, and breaks all of the rules. Today, we will be studying code that Damian has submitted to the Obfuscated Perl contest. This promises to be very, very scary.

Damian’s entry to this contest was SelfGOL, a program capable of self-replication, rewriting other Perl programs so that they, too, self-replicate, detecting un-rewritable programs, playing Conway’s “Game of Life,” and, as if that wasn’t enough, animating any text as a cycling marquee banner. The main constraint of the contest is that the entry must be under 1,000 bytes of code, so it shouldn’t be too difficult to understand. Obviously it doesn’t use any modules, because that would be too easy. Not only that, but it doesn’t use a single control structure. This is going to be great.

Following an amusing demonstration of SelfGOL, we moved into treating it as a case study for a set of principles. Principles that will focus on the very practices SelfGOL embodies, and why they should never, ever be used. As I intend to enjoy the discussion, I won’t spend much time writing about the discussion and examples accompanying these principles, but rather simply note the principles for my own benefit (documentation for the win). After all, sharing all my new tips and tricks would suck all the fun out of it.

Principle 1: Sane and consistent layout makes code more maintainable (but it isn’t a magic bullet if the code itself is beyond help).

Principle 2: Using built-in features isn’t necessarily smarter or cleaner (even though fellow developers’ futile struggles to recall those features can be highly amusing).

Principle 3: Obscure obsolete features are obscure and obsolete for a reason (and retasking them for even more obscure purposes is not helping).

Principle 4: Each statement should do one thing only (since that’s the upper limit most brains can comprehend).

Principle 5: Relying on default behavior makes code very slightly easier to write and vastly harder to read (because most readers can see better than they can think).

Principle 6: Randomly placed subroutine definitions are static (in the radio interference sense).

Principle 7: Choose data structures that simplify your task (even if the task is to make those data structures incomprehensible).

Principle 8: Just because you use some operation frequently doesn’t mean it should be in a utility function (especially if it’s in a function merely to abbreviate its name).

Principle 9: Encapsulating the familiar can decrease maintainability (refactoring isn’t a substitute for sanity).

Principle 10: Treat any clever one-line solution as an alarm bell (or as an antipersonnel mine with a six-month delay fuse).

Principle 11: Familiarity breeds comprehension (it breeds contempt (but hey, what doesn’t?)).

Principle 12: Table-driven solutions are clean, efficient, and extensible (as long as you don’t mind losing a little comprehensibility).

Principle 13: Building a messy data structure and then cleaning it up is often easier than building it cleanly in the first place (and to hell with the purists).

Principle 14: Some code is better compiled at run-time (but the urge to use eval is Nature’s way of letting you know there’s not yet enough pain or misery in your life).

Principle 15: Parentheses are our friends (cos, if you can remember all 24 levels of Perl’s precedence, you gotta get a life, dude!).

Principle 16: Edge cases suck (and edge cases of familiar constructs suck worst of all).

Principle 17: Code should do what it seems to be doing (especially when it seems to be doing something subtle).

Principle 18: Conceptual elegance is no guarantee of actual maintainability (nor a good substitute for it).

Principle 19: If you’re going to have default values, define them near the place they may actually be used (or, at least, somewhere they have a slim chance of being discovered).

Principle 20: No matter how good you think your error messages are, they’re still too brief, too obscure, and too hard to decipher (even if you’ve already taken Principle 20 into account).

Principle 21: Avoid using obsolete and arcane magic punctuation variables with unfamiliar default values and unexpected global effects (even if you happen to enjoy a little self-inflicted pain in other recreational activities).

Principle 22: The fundamental complexity of any problem is irreducible (optimizations merely redistribute the pain differently).

Principle 23: Code that breaks when it’s reformatted is already broken (though on a much more profound and interesting level).

Principle 24: If it’s impossible to understand, it’ll be impossible to maintain (on the bright side, of course, such code is highly stable).

This last one should, but often doesn’t, go without saying.

Principle 25: Phenomimetic retrodeterministic nominativism generally does not improve code comprehension (then again, did it sound like it would?).

Principle 26: Don’t allow dynamic behavior to violate static expectations (and the easiest way to do that is reusing over-scoped variables for unrelated purposes).

Principle 27: Explicit behaviors are better than implicit behaviors (especially when the specification of the implicit behavior is syntactically baroque and hard-to-spot, and the behavior itself is unknown to the majority of developers).

At this late point of the tutorial, Brad pointed out to me that all of these principles are in the included materials. Now that I’ve already transcribed so much from the slides, I don’t have the heart to delete it all. Of course, since I haven’t been commenting on all of the black magic to this point, there would then be very little in the end to post. Brad also has a much better post about this tutorial, since he actually took real notes.

Principle 28: Code that pre-caches or precomputes its data is much easier to maintain than code that caches or computes on-the-fly (when you’re running at multiple gigahertz, acquiring your data a few thousand operations early is still plenty JIT enough).

Principle 29: Coding is an art, but code shouldn’t be art (evolution made programmers boring, pedestrian, and aesthetically challenged for good reasons).

It’s mesmerizing to listen to the thought process behind Damian’s obfuscated code. I can’t help but wonder if this well-organized, well-thought-out explanation is anything close to how Damian designed this program. Or, rather, if there are extremely convoluted, scary, and most importantly, evil gears grinding away inside his head. In fact, I suspect this entire tutorial may have been designed purely as a way of documenting SelfGOL so Damian himself can remember how it works. Clever.

This kind of programming is silly and fun, but it serves a real purpose. Pushing the limits of a language teaches about its dark places. The understanding that comes from it vastly improves the skills of the programmer, even if—especially if—the bad things are never, ever used. Perl, even more than other languages, encourages this kind of play, thanks to its rich diversity and culture.

Important safety tip: keep these tricks and contrivances for recreational purposes only.

I don’t know what’s more disturbing, how much of the tutorial I understood, or how much I already knew coming in.


OSCON 2008: Perl Security

After lunch, I wandered over to Portland 255 with Brad and Al for the Perl Security tutorial, presented by Paul Fenwick. Straight away I can tell that he’s going to be a lively and entertaining presenter. His slides go by quickly, as they are merely short counterpoints to his commentary. His commentary is also very quick and slightly witty. I don’t expect to have any trouble paying attention. If anything, I’m worried that I’ll fail to pay attention to my writing and, of course, to the #oscon IRC channel.

“A computer is secure if you can depend on it and its software to behave as you expect.”
—Simson Garfinkel and Gene Spafford in Practical UNIX & Internet Security

In a nutshell, that’s what security is. If a computer behaves as expected, it is secure. That is, unless it’s expected to be insecure, I suppose. I’m not sure I’d enjoy that situation, so I’ll assume the assumption of expected behavior is both expected and secure.

Most security boils down to common sense. Unfortunately, this mythical state of being is far less common than its name would imply. Sad, but true. People are often lazy or distracted, and these usually lead to really stupid mistakes.

There is a key acronym when thinking about security: CIA. No, not that CIA. Yes, I thought so, too, at first. What it really means is Confidentiality, Integrity, and Availability. Confidentiality, because information will not remain secure if it does not remain confidential. Integrity, because information must remain known and trusted to remain secure. Availability, because denial of access to information may result in insecurity. I may not have done justice to this acronym, because the tutorial moved on quickly after this point. I’m sure there are web sites dedicated to security that can better define the term.

Perhaps the most important piece of advice for the unwitting Perl programmer is to always perform data validation. Never, ever trust input, regardless of where it came from. Fortunately, Perl provides Taint Mode, which forces the program to mistrust input.

Paul shared with us a variety of examples to demonstrate why input should not be trusted, as well as a number of examples of how to properly untaint data. As with anything, it’s easy to become lazy when untainting data, which can sometimes be as bad as not using Taint Mode at all.
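
To make the untainting idea concrete, here is a minimal sketch of my own (not one of Paul's examples). Under Taint Mode (perl -T), capturing from a regular expression match is the sanctioned way to clean a value, so the pattern should describe exactly what is allowed. The function name and the rule for what counts as a valid user name are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical validator: accept only short, lowercase, alphanumeric user
# names. Returns the captured (and, under -T, untainted) value, or undef
# when the input does not match.
sub untaint_username {
    my ($input) = @_;
    return undef unless defined $input;

    # \A and \z anchor the whole string; unlike $, \z does not tolerate a
    # trailing newline.
    my ($clean) = $input =~ /\A([a-z][a-z0-9_]{0,31})\z/;
    return $clean;
}

for my $candidate ('alice', 'alice; rm -rf /') {
    my $name = untaint_username($candidate);
    print defined $name ? "accepted: $name\n" : "rejected: $candidate\n";
}
```

The lazy version of this is a pattern like /(.*)/, which untaints everything and validates nothing. That is the kind of shortcut that makes Taint Mode pointless.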

The tutorial continued into what is essentially a list of best practices to follow when programming securely with files.

  • Do: Use the three argument version of open(), to prevent attacks using file names with magic characters in them.
  • Do: Use sysopen() instead of open() when creating files. Its O_EXCL flag refuses to overwrite an existing file, which helps prevent symlink attacks that exploit race conditions.
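
Both items in the list can be sketched in a few lines. This is my own illustration rather than Paul's slide code; the helper names and the file name are made up, and the temporary directory is only there so the example is safe to run.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/report.txt";    # hypothetical file name

# Create a file only if nothing exists at that path. O_EXCL makes the
# check and the creation a single atomic step, so there is no window in
# which someone can slip a symlink into place.
sub create_new_file {
    my ($file, $text) = @_;
    sysopen(my $fh, $file, O_WRONLY | O_CREAT | O_EXCL, 0600) or return 0;
    print {$fh} $text;
    close($fh) or die "Cannot close $file: $!";
    return 1;
}

# Three-argument open: the mode is separate from the name, so characters
# such as '>' or '|' in the name have no special meaning.
sub read_file {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    local $/;                    # slurp the whole file
    my $text = <$fh>;
    close($fh);
    return $text;
}

print "first attempt:  ", (create_new_file($path, "first\n")  ? "created" : "refused"), "\n";
print "second attempt: ", (create_new_file($path, "second\n") ? "created" : "refused"), "\n";
print "contents: ", read_file($path);
```

The second attempt is refused because the file already exists, so the original contents survive.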

The common attack vector in so many of the examples given so far has been via file names. Wouldn’t it be great if we could write programs without file names at all? Well, when working in a Unix-like environment, we can. Perl has the ability to use anonymous files by passing undef as the third argument to open(). He was even kind enough to provide us with a way of passing these anonymous file handles to child processes, by disabling the close-on-exec flag prior to calling system(). Sorry, the slide went by too quickly for me to transcribe the method. It, along with all the other examples, is available online.
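
Since I missed the slide, here is my own reconstruction of the general idea, which may well differ from what Paul showed. The anonymous file part is documented behavior of open(). Clearing the close-on-exec flag with fcntl() is my assumption about how one would let a child process inherit the handle.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC SEEK_SET);

# Passing undef as the third argument asks Perl for an anonymous temporary
# file: there is no name on disk for an attacker to guess or replace.
open(my $tmp, '+>', undef) or die "Cannot create anonymous file: $!";

print {$tmp} "scratch data\n";
seek($tmp, 0, SEEK_SET) or die "Cannot rewind: $!";
my $line = <$tmp>;
print "read back: $line";

# Assumption: clearing FD_CLOEXEC keeps the descriptor open across exec,
# so a child started with system() could use it by number.
my $flags = fcntl($tmp, F_GETFD, 0) or die "Cannot read descriptor flags: $!";
fcntl($tmp, F_SETFD, $flags & ~FD_CLOEXEC)
    or die "Cannot change descriptor flags: $!";

printf "descriptor %d will stay open in child processes\n", fileno($tmp);
```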

Calling system() and using backticks make Paul really, really angry. Why? Because doing it right is hard. In fact, just correctly checking the result in $? requires 10 lines of code, according to the documentation for system() in the perlfunc manual page. So, 10 lines just to verify that a single line of code executed successfully.
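
For reference, the checks in question look roughly like this. The three cases follow the system() entry in perlfunc; wrapping them in a function called describe_status is my own choice for the sake of a compact example.

```perl
use strict;
use warnings;

# Turn a wait status (the value of $? after system() or backticks) into a
# readable description, following the three cases described in perlfunc.
sub describe_status {
    my ($status, $error) = @_;

    # -1 means the command could not be started at all.
    if ($status == -1) {
        return "failed to execute: " . ($error // 'unknown error');
    }

    # The low seven bits hold the signal number; bit 8 flags a core dump.
    if ($status & 127) {
        return sprintf "died with signal %d, %s coredump",
            $status & 127, ($status & 128) ? 'with' : 'without';
    }

    # Otherwise the exit value lives in the high byte.
    return sprintf "exited with value %d", $status >> 8;
}

# The list form of system() bypasses the shell; $^X is the running perl.
system($^X, '-e', 'exit 3');
print describe_status($?, $!), "\n";
```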

I briefly became distracted by news of a fire back home. However, what I was able to get is that Paul has written a module, IPC::System::Simple, which, as the name implies, makes the process of calling system commands quite simple.

After the mid-afternoon break, we ventured into setuid and setgid programs. Perl provides ways to determine who is really running the program ($<, $() and who is effectively running the program ($>, $)). Perl is, however, ignorant of the saved UID, which is the third UID in Unix, along with real and effective. Unfortunately, the standard for setuid scripts is confusing and implemented differently on various systems, so don’t use it. Really.
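
A quick sketch of reading those four variables on a Unix-like system. The helper name is mine. The check only compares real and effective IDs, so it says nothing about the saved UID that Perl can't see.

```perl
use strict;
use warnings;

# $< and $> are the real and effective user IDs. $( and $) are the real
# and effective group IDs; each holds a space-separated list whose first
# entry is the primary group.
my ($real_uid, $effective_uid) = ($<, $>);
my ($real_gid)      = split ' ', $(;
my ($effective_gid) = split ' ', $);

# True when the process is running with somebody else's user or group ID,
# as a setuid or setgid program would.
sub running_with_borrowed_ids {
    return $real_uid != $effective_uid || $real_gid != $effective_gid;
}

printf "real uid=%d gid=%d, effective uid=%d gid=%d\n",
    $real_uid, $real_gid, $effective_uid, $effective_gid;
print running_with_borrowed_ids()
    ? "running setuid or setgid\n"
    : "real and effective IDs match\n";
```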

Even worse, the $< and $> variables are cached by Perl, so they may lie to the program, especially when using the setresuid() system call to properly drop privileges, as recommended. Fortunately, another useful module from Paul, Proc::UID, provides a solution to this caching problem.

Now we move into DBI security. SQL injection attacks are very similar to the file name or shell attacks covered previously. Any database programmer worth his salt should be aware of the hazards of composing SQL, so I won’t go into the examples here. Programmers should, of course, use placeholders if they’re available. The DBI module itself provides its own Taint Mode, both for input and output, adding all the benefits of Perl Taint Mode to database interface code. Even better, it can be controlled on a per-statement basis.

All of this careful taint checking we’ve done and Perl may end up sabotaging us anyway. When presented with files on the command line, Perl is happy to just open them using the simplistic, dangerous, two-argument open() call. Typically, this is done when using the <> operator in a while loop. Also, everyone forgets to use Taint Mode in cron jobs. Don’t do that. Really.
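
One way around the magic is to walk @ARGV by hand and open each name with the three-argument form. This is a sketch of my own, with a made-up function name. Perl 5.22 and later also offer the <<>> operator, which behaves like <> but opens its files safely.

```perl
use strict;
use warnings;

# Read every line from the named files using three-argument open(), so a
# name such as "echo pwned |" is treated as a (nonexistent) file and never
# as a command. Returns the lines read and the names that would not open.
sub read_named_files {
    my @names = @_;
    my (@lines, @failed);

    for my $name (@names) {
        my $fh;
        unless (open($fh, '<', $name)) {
            push @failed, $name;
            next;
        }
        push @lines, <$fh>;
        close($fh);
    }
    return (\@lines, \@failed);
}

my ($lines, $failed) = read_named_files(@ARGV);
print @$lines;
warn "Could not open: $_\n" for @$failed;
```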

Because Perl is written in C, the null byte becomes very interesting. While it is a perfectly valid character in Perl strings, it marks the end of a C string. In most circumstances, this is not a problem. However, it can mean bad things when making system calls, which are written in C. Null bytes don’t normally occur in input typed at a terminal, but input from the Web is another matter: there, a null byte is trivially represented by the %00 escape sequence.
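
Here is a small illustration of the mismatch. The helper functions and the file name are mine, and as_c_string() only simulates what a C function would see; it doesn't call into C.

```perl
use strict;
use warnings;

# Decode %XX escapes the way a web framework would before handing the
# value to the program.
sub url_decode {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Simulate the view of a C function expecting a NUL-terminated string:
# everything from the first null byte onward is invisible.
sub as_c_string {
    my ($text) = @_;
    my ($prefix) = $text =~ /\A([^\0]*)/;
    return $prefix;
}

# A simple validation check: refuse any input containing a null byte.
sub has_null_byte {
    my ($text) = @_;
    return index($text, "\0") >= 0;
}

my $requested = url_decode('report.txt%00.jpg');
printf "Perl sees %d characters; C sees '%s'\n",
    length($requested), as_c_string($requested);
print "rejecting input with a null byte\n" if has_null_byte($requested);
```

A check that the name ends in .jpg passes in Perl, while the C side would happily work with report.txt instead.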

I need to go through the list of Paul’s modules, since they appear to be ideal for the type of programming I tend to do, as an IT developer. In fact, he’d like to see some Solaris patches for Proc::UID, so I can probably help him with that.

I noticed during the tutorial that Paul must read the Fail Blog and I Can Has Cheezburger, or at least knows someone who does. Quite a few of the images that have appeared on his slides have graced the pages of those web sites.

As an added bonus, the tutorial ended 40 minutes early, and Paul had bonus material. What a guy.

The tutorial, and with it the day, is now over. It’s time for dinner, then maybe a BOF session or maybe just a trip to a pub.


OSCON 2008: Mastering Perl

It’s early on Monday morning and I’m in my first tutorial session of the day, following the continental breakfast provided in Convention Hall E. I wasn’t overly impressed with the tutorial options this year. So, being who I am, I mostly opted for the Perl track. That brings me to where I sit now: D136, listening to brian d foy teach us about Mastering Perl. I almost didn’t attend this tutorial, since I’ve read the book and, while I found it excellent, I learned very little from it. I took this to mean that I’ve already mastered Perl. But, like I said, my options are limited—I’m not very interested in the introductory Python tutorial.

The idea behind Mastering Perl is not to talk about Perl to a group of Perl masters. Instead, it’s about mastering Perl in the guild sense (and not of the SCA variety). Back in the day, and still existing in some professions today, there was an apprentice system. A neophyte—in today’s nomenclature, a noob—would begin acquiring skills under a master of the art. As he progressed, he would be entrusted with more and more responsibility, until finally he became a master himself and took people under his own wings.

This apprenticeship system, somewhat unfortunately, does not exist in the computing world. That’s where brian d foy feels that Mastering Perl fits. Lacking true masters, the book acts as a substitute. Someday, we may even create a guild system. But then we’d probably have to pay dues and follow rules, and that’s not very attractive. That said, it’s the model I’m hoping to use at my own place of work. I’d like to hire one or two developers whom I can take under my own wing and mentor in the ways of Perl and the grid.

The first two topics covered are tools for optimization: profiling and benchmarking. “Premature optimization is the root of all evil,” runs the maxim usually credited to Donald Knuth and sometimes attributed to Tony Hoare. What this means is that one should never assume what requires optimization. Let the testing be the guide.

Profiling is objective. Benchmarks, like statistics, are not always so: everyone has an agenda, and what gets measured is a choice. Often, benchmarks are short-sighted. For example, benchmarking code run time and attempting to optimize for it may not be worth the expense of the developer time required to make the requisite changes. It’s worth analysing what is important before blindly following benchmarks.
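
For the curious, a benchmark with the core Benchmark module looks something like this. The two subroutines being compared are my own toy example, not anything from the tutorial, and the iteration count is arbitrary.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

my @words = map { "word$_" } 1 .. 200;

# Two ways of building the same comma-separated string.
sub with_join {
    return join ',', @words;
}

sub with_loop {
    my $out = '';
    for my $word (@words) {
        $out .= ',' if length $out;
        $out .= $word;
    }
    return $out;
}

# Run each candidate a fixed number of times. The 'none' style keeps
# timethese() quiet so that only the comparison table is printed.
my $results = timethese(20_000, {
    join => \&with_join,
    loop => \&with_loop,
}, 'none');

cmpthese($results);
```

The table says which version is faster on this machine, with this data. Whether that difference matters is exactly the judgment call that benchmarks can't make for you.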

I’ve been on the receiving end of misplaced premature optimization. I worked with a development group that put far too much emphasis on achieving perfect results on their Devel::Cover reports. This led to strange bugs in their code, and a strong belief that “new() doesn’t work that way.” As it turns out, their test suite was calling new() in two ways. I forget what the second method was, but it was not used anywhere else in their code. However, in order to get this test code to run, and get 100% coverage, they added code to the constructor for every class. Code that prevented inheritance of the method. The team then convinced themselves that constructors could not be inherited in Perl, rather than realizing that their own habits were the problem.
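
For the record, constructors inherit just fine as long as they bless into the class they were called on. A minimal sketch, with class names invented for the occasion:

```perl
use strict;
use warnings;

package Animal;

sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name} // 'unnamed' };

    # Two-argument bless: the object joins whichever class new() was
    # called on, so subclasses inherit the constructor unchanged.
    return bless $self, $class;
}

sub describe {
    my ($self) = @_;
    return ref($self) . " named $self->{name}";
}

package Dog;

our @ISA = ('Animal');    # Dog defines no new() of its own

package main;

my $dog = Dog->new(name => 'Rex');
print $dog->describe, "\n";
```

The object comes back as a Dog, even though the only new() in sight belongs to Animal.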

After the mid-morning break, we wrapped up the discussion on profiling and benchmarking, and moved into configuration. This is a vital topic for anyone who desires the ability to pass a program off to users without being bothered to modify it later in response to users’ desire to customize the program for a slightly different use.

External configuration, particularly via the command line, is something I depend on heavily, even in very simple Perl or Bourne shell scripts. I almost always create command line options for performing a dry run or printing debugging information. Not only are these useful for development, they can live on in the final program, providing help to the final user, who more often than not is me. Sometimes I will even add configuration to values that never change, just for when they eventually do.
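
My usual starting point looks roughly like the following, built on the core Getopt::Long module. The option names and the default spool directory are placeholders I picked for this sketch.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse a list of command line arguments. Returns a hash of settings and
# the arguments left over once the options have been removed.
sub parse_options {
    my @args = @_;

    # Defaults, including a value that "never changes" (until it does).
    my %opt = (
        dry_run   => 0,
        debug     => 0,
        spool_dir => '/var/spool/example',    # hypothetical default
    );

    GetOptionsFromArray(
        \@args,
        'dry-run|n'   => \$opt{dry_run},
        'debug|d'     => \$opt{debug},
        'spool-dir=s' => \$opt{spool_dir},
    ) or die "Usage: $0 [--dry-run] [--debug] [--spool-dir DIR] [FILE ...]\n";

    return (\%opt, \@args);
}

my ($opt, $rest) = parse_options(@ARGV);
warn "debug: spool dir is $opt->{spool_dir}\n" if $opt->{debug};
print "dry run: would process @$rest\n"        if $opt->{dry_run};
```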

Jumping past configuration, we move on to logging. It’s really easy to add to a program, and it’s really useful to leave in a program when it’s released. The ability to enable logging on the fly sure beats adding a bunch of print() calls in the code when it inevitably breaks at three in the morning. The Log::Log4perl module is a particularly powerful method of adding logging to programs. It’s well worth investigating for anyone who wants to easily add logging functionality to their code.

The final topic of the day is lightweight persistence. It’s always nice to have data stick around between program invocations. The easy way (and everything in the second half of the tutorial is easy) to add persistence to code is to not use DBI. While DBI is powerful, it also tends to require a database server (ignoring SQLite for the moment). Modules such as Data::Dumper, YAML or Storable are ideal for easily storing and retrieving data in code.
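
As a rough sketch of how little code this takes with Storable (which ships with Perl), here is a round trip through a file. The data and the file name are invented, and the temporary directory is only there to keep the example tidy.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

my $dir        = tempdir(CLEANUP => 1);
my $state_file = "$dir/state.sto";    # hypothetical location

# First "invocation": save a nested data structure to disk.
my %state = (
    runs      => 1,
    last_user => 'alice',
    seen      => [qw(a b c)],
);
store(\%state, $state_file) or die "Cannot save state to $state_file\n";

# Later "invocation": load it back, update it, and save it again.
my $restored = retrieve($state_file);
$restored->{runs}++;
store($restored, $state_file) or die "Cannot save state to $state_file\n";

printf "runs so far: %d, last user: %s\n",
    $restored->{runs}, $restored->{last_user};
```

Given the previous tutorial, it's worth remembering that Storable files should only be read back from sources you trust.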

After the tutorial, brian will be available at the Powell’s Books mini store, located near the registration desk, to sign copies of Mastering Perl. I already have a copy, thanks to my local Perl Mongers group, but it’s all marked up with the group name, and I wouldn’t mind having a signed copy.

Now it’s time for lunch, which is good, because I’m quite hungry. I hope the conference-provided lunch is decent during the tutorials, as it was last year.


I Need Minions

My development group at work, for the last couple of years, has been composed of three senior level programmers—two highly experienced (including myself) and one hard-working, but not as experienced. This week, the other highly experienced developer left our group for supposedly greener pastures.

A couple of things resulted from this change. First and foremost, we have a lot of slack to take up, so the rest of the year will be very busy for us. Second, I am now the de facto lead developer in the group. A group for which we need to hire two more developers (we had an open position before the loss of our comrade).

Two fresh, new, dreamy eyed developers. For me to lead, to teach, to mold. I like to think of these potential developers as my minions, willing to do my bidding.

For a while, we filled our open developer position with a temporary employee. We tasked this person with the creation of a process work flow for our development efforts. Something we could use to identify tasks, categorize them, prioritize them, assign them, and sometimes even work on them. The final result of this effort looks something like this:

[Image: Old, poorly-designed process flow]

No, no, no. This will never do. I can’t use this. Look at how many boxes there are. Not only that, look at the sheer complexity introduced by all those decision branches! I could never trust my minions with so much independent thought. Also, I have no desire to confuse my minions any more than they already are. So I designed a new process flow, which I believe is far simpler and easier to remember.

[Image: New, easy-to-follow decision flow]

Yes, this is more like it. I suspect even the simplest of minions can effectively follow this process. And if they can’t, well, we have ways of dealing with them.

So I need minions. There are a few requirements, however.

  • Familiarity with Perl (other programming languages are acceptable—except Python)
  • Experience administering Linux (or another Unix-like system, I guess)
  • Fascination for grid computing
  • Misplaced enthusiasm for supporting users
  • Blind devotion to me

Not necessarily in that order.

Falsifying Data

One of the many expensive products we use at work is Platform LSF License Scheduler. Essentially, it’s designed to coordinate the use of even more expensive licenses in one or more LSF clusters. However, like a lot of proprietary software, it has its share of bugs.

My task this week was to compensate for one of these bugs. Basically, the request was to somehow lie to License Scheduler’s data collection process, convincing it that the license counts are different from reality. The collection process uses Macrovision’s lmstat(1) command to gather license counts. Okay, no problem. Twenty lines of Perl later, and I have my own lmstat command, which behaves identically to the real version (which I simply execute) except the license counts have been altered.
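
I can't share the real script, but the shape of it is roughly as follows. Everything specific here is an assumption for illustration: the feature name, the override value, the exact wording of the "Users of" summary line, and the function names.

```perl
use strict;
use warnings;

# Hypothetical overrides: feature name => total to report as issued.
my %reported_total = (example_feature => 8);

# Rewrite the issued count on a summary line when the feature has an
# override. The line format matched here is assumed, not verified.
sub falsify_line {
    my ($line, $override) = @_;
    if ($line =~ /^Users of (\S+?):/ && exists $override->{$1}) {
        my $total = $override->{$1};
        $line =~ s/Total of \d+ (licenses? issued)/Total of $total $1/;
    }
    return $line;
}

# Run the real command with the caller's arguments and pass its output
# through the filter. The list form of open() avoids the shell.
sub run_wrapper {
    my ($real_lmstat, @args) = @_;
    open(my $out, '-|', $real_lmstat, @args)
        or die "Cannot run $real_lmstat: $!";
    print falsify_line($_, \%reported_total) while <$out>;
    close($out);
    return $? >> 8;    # hand back the real command's exit value
}

# Demonstration on a sample line instead of a live license server.
my $sample = "Users of example_feature:  (Total of 10 licenses issued;  Total of 3 licenses in use)\n";
print falsify_line($sample, \%reported_total);
```

In the real thing, the script's last act would be something like exit run_wrapper('/path/to/real/lmstat', @ARGV), with the path pointing at wherever the genuine binary was moved.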

In my group, we’re supposed to be working primarily on projects. All of these projects are assigned awkward, forgettable acronyms. So I decided that this project needed an acronym, too. Not just any old acronym, either, but something memorable. After a bit of searching through /usr/share/dict/words, I finally settled on Project FALSE: Falsifying Answers in the License Scheduler Environment.

So with my quick hack, I’ve both defeated an expensive piece of software and won the prize for the best project name so far.