OSCON 2010: State of the Onion

The Thursday sessions are over, but before I head out to the parties, I’m attending the 14th State of the Onion address. This is the always well-attended update on the universe of Perl. I immediately noticed that Larry is surrounded by his wife and his son, the former dressed as an angel, the latter as a devil.

Larry claims that he so rarely talks about Perl in his State of the Onion addresses that he has brought his conscience with him today (the aforementioned angel and devil) to prod him in the right direction.

The current state of the onion is segmented into left, central, and right sections. It can be labeled, say, 5 and 6. They can also be labeled 0 and 1, for false and true. Larry then asked a series of boolean questions, asking the audience to weigh in on the veracity.

Do you think Perl 5 and Perl 6 are really the same language?

Do you think Perl 5 and Perl 6 are really different languages?

As the angel and the devil argued, Larry pointed out that an important skill for a language designer is the ability to stay on the fence long enough to determine which side has the greener grass. Sometimes you discover that you’re sitting on the wrong fence and the voices in your head start to argue about which side has the greener grass.

When the voices in your head start arguing if the purple cow eats greener grass than the brown fence, it’s time to see a doctor. Or find a better drug dealer.

— Larry Wall

This is, of course, a metaphor for being a language designer. Sometimes you sit on the fence for language features, without ever knowing which direction is the better one.

Next up is a live demo of Perl 6; or, more specifically, of Rakudo Star, which is scheduled to be released next week. Some of the demos, without comment:

.say if 6 %% $_ for 1..^6
[+] gather { take $_ if 6 %% $_ for 1..^6 }
[+] grep { 6 %% $_ }, 1..^6
~[+] grep 6 %% *, 1..^6
-> $n { $n == [+] grep $n %% *, 1 ..^ $n }
-> $n { $n == [+] grep $n %% *, 1 ..^ $n }(6)
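
For readers who don’t read Perl 6: every demo above tests whether a number equals the sum of its proper divisors (a perfect number). `%%` is the “is divisible by” operator, `1..^6` is the range 1 through 5, and `[+]` sums a list. A rough Python rendering of the same logic follows. The function names are chosen here purely for illustration and are not part of the talk:

```python
def proper_divisors(n):
    """Return the divisors of n that are smaller than n (like grep n %% *, 1..^n)."""
    return [d for d in range(1, n) if n % d == 0]


def is_perfect(n):
    """True when n equals the sum of its proper divisors (like n == [+] ...)."""
    return n == sum(proper_divisors(n))


print(proper_divisors(6))  # [1, 2, 3]
print(is_perfect(6))       # True
```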

At this point, the examples scrolled off the screen due to a “whatever” example being run. That’s good news, though. It means Rakudo Star supports lazy lists and, as such, we finally have those infinite lists we’ve been promised:

0, 1, ... *

In addition to marking the end of an infinite series, the whatever star can be used to curry a function:

(1, 1, *+* ... *)[^20]    # Fibonacci
(0, !* ... *)[^20]        # 0 1 0 1 0 1 ...
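
The idea behind these lazy lists is that terms are computed only when something asks for them, so an infinite series costs nothing until it is sliced. A minimal sketch of the same two series using Python generators, offered as an analogy and not as a description of how Rakudo implements laziness:

```python
from itertools import islice


def fibonacci():
    """Yield 1, 1, 2, 3, 5, ... forever; each term is computed on demand."""
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b


def alternating():
    """Yield 0, 1, 0, 1, ... forever; each term is the negation of the last."""
    value = 0
    while True:
        yield value
        value = int(not value)


# islice plays the role of the [^20] subscript: take a prefix of the series.
print(list(islice(fibonacci(), 20)))
print(list(islice(alternating(), 6)))  # [0, 1, 0, 1, 0, 1]
```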

In a recent video interview, Larry was asked whether, if he were hit by a bus, he had designated anyone to succeed him as the leader of the Perl 6 project. His response was that he trusts the Perl community to choose the right person.

Onions can make you cry; so can disruptive technologies or innovations. Almost everyone has labeled their technology as disruptive. As such, the phrase has lost most of its meaning.

A disruptive technology simultaneously does something worse and does something better than its competitors. In a time of the Unix philosophy of “do one thing and do it well,” Perl came along and attempted to do everything, but didn’t necessarily do any of it well. The Unix philosophy was broken by its own utilities. No one knew what a “thing” was, and no utility of the time did it well. By the time Perl 4 turned into Perl 5, it demonstrated that a tool that was itself an entire tool shed could run circles around shell scripts.

In California, we once had many, many colonies of ants. Now, most of California is populated by a single colony of Argentine ants. This is because the colonies have forgotten how to fight with each other. Perl 6 has benefited from multiple teams creating multiple implementations, in the end working together to create a better product, even if that product takes longer to complete.

If you don’t like Camelia, you can just fork off.

— Larry Wall

The takeaway, I think: It is up to all of us to determine what Perl 6 will be. What kind of disruptive technology will it be?

OSCON 2010: Awesome Things You’ve Missed in Perl

Paul Fenwick (Perl Training Australia)

Ever since I saw An Illustrated History of Failure two years ago, I’ve made it a point to see @pjf’s talks. That’s how I find myself in his mid-afternoon session, Awesome Things You’ve Missed in Perl. Judging by the size of the crowd, I’m not the only one. However, I won’t attempt to pass along his humour in this post. I’d never do it justice.

In his introduction, Piers Cawley asked that we go wild when Paul took the stage, so the folks in the Google Wave session next door would be taken aback, and realize that Perl is not, in fact, dead.

People are still out there writing Perl as if still in the dark ages of 2008. Paul doesn’t want us to write old Perl, but only new and shiny Perl. This talk only covers practices that have come about since Perl Best Practices was released.

Object-oriented Perl is not awesome. Not even close. If you look at the old ways of doing it, all of them are either wrong, stupid, or both. The rest are too hard. There’s a simple way to fix this: use Moose. This module does so much of the infrastructure work of composing classes, it makes object-oriented programming enjoyable again.

Paul spent a lot of time giving a humorous, high-level overview of the features available in Moose.

Moose has a huge number of extension modules available in the MooseX namespace.

When I have a problem, I go down to the pub with other Perl mongers and bitch.

One limitation of Perl that Moose exposes is that not everything is an object. This means methods like push() or isa() can’t be called on everything. And checking types defeats the purpose of polymorphism. Enter the autobox module, which turns everything into an object. As a bonus, it operates in lexical scope. Moose exposes autobox through the Moose::Autobox module.

Paul wrote the autodie module, which is now included in core. This lexically scoped module removes all of the boilerplate code that goes along with trapping errors from subroutines.

Not only is Perl 5.10 awesome, but Perl 5.10 regular expressions are awesome. In particular, the introduction of named captures (via %+) made regular expressions extremely awesome.

Perl 5.10 also provides grammars in the regular expression engine. This is the basis for Damian Conway’s Regexp::Grammars module.

Paul referred to an article on SweeperBot in The Perl Review. There is, however, the problem of distributing a program that uses half of CPAN to users of inferior operating systems, such as Microsoft Windows. That’s where the PAR module comes in. It will pack up all of the modules used by the program, including the Perl interpreter itself if necessary, so a single, self-reliant file can be distributed to users who need it.

Remember to never optimize code. Programmer time is far more valuable than CPU time. However, when you must optimize code, profile first. The Devel::NYTProf module makes profiling awesome.

Code reviews are important, but Perl programmers are lazy. Fortunately, the Perl::Critic module has read Perl Best Practices for you and will complain about where your code violates the practices in the book. At my day job, it does about half the work of code reviews for me, loudly announcing violations of the coding standards that I enforce with an iron fist.

If you find an awesome module, buy the author a beer if you have the opportunity. You can also leave feedback on CPAN Ratings or use perlthanks in recent versions of Perl.

OSCON 2010: 21st Century Systems Perl

Matt Trout (Shadowcat Systems Limited)

The full title of this session is, 21st Century Systems Perl – the New Perl Enlightenment for sysadmins.

Introduction

While Perl isn’t dying, “PERL” most certainly is dying. This is a good thing, because it includes all the really crappy stuff, such as Matt’s Script Archive. Thank goodness for that. To be fair, this code would have been horrible written in any language. Remember, blame the artist, not the tool.

We have a very mature community, which means we also have very mature practices. We are also converging on a standard platform, even if there is more than one way to do something.

Part 1: Minimising Developer Fatalities

As developers, we should do what we can to make our sysadmins’ lives easier.

Right off the bat, we should use the local::lib module, which allows an application to use custom library areas without polluting the system installation areas. It can even work with /etc/skel. Matt is a big fan of using a local library path, included with the application, so it can be maintained separately from both the operating system vendor’s modules and even other applications.

Improve module installation using Module::Install.

Package modules for your distribution of choice using cpan2dist.

Improve the CPAN experience using App::cpanminus, which is amazingly easy to bootstrap:

> wget cpanmin.us
> ./cpanm

Start using all of the modules associated with best practices by installing Task::Kensho.

Vendors are getting better at distributing Perl and keeping up with module releases. The Debian Perl team is the strongest, with Fedora lagging quite far behind. Fedora is finally getting better, now that members of the Perl community have a say in the packaging of Perl and the modules.

After many debug sessions, Matt has come to the conclusion that mod_$lang is evil. Jamming languages into the web server is a bad, bad idea. However, actually hooking into the different handlers can be useful. Matt’s preference is now FastCGI.

Part 2: Maximising Automation Banality

“In the systems world, shiny and exciting is not good.”

Use the autodie (in core as of 5.10.1) and IPC::System::Simple modules to reduce the repetitiveness and the common errors of systems programming.

Use IO::All to fix the syntax and semantics of I/O operations.

Systems scripts shouldn’t need to be deployed. It should be possible to just drop the script onto a host and have it Just Work. That’s where PAR::Packer comes in.

OSCON 2010: Dist::Zilla

Ricardo Signes (Pobox.com)

The full title of this talk is, Dist::Zilla – Maximum Overkill for CPAN Distributions.

Every CPAN distribution contains a significant amount of crap. It’s infrastructure used for the distribution tools.

ExtUtils::MakeMaker has been the traditional way to work on the infrastructure code. By necessity, it contains a lot of legacy code, which can be cumbersome to maintain. Enter Module::Install, which can look in the expected places for the necessary information, such as the author name. But the author still must write all the boilerplate. Module::Starter was written to address this, composing all the boilerplate on behalf of the author. There is so much boilerplate that, by default, Module::Starter also provides a boilerplate test to detect it.

Why are we doing all of this? How much repetitive work are we doing?

What can Dist::Zilla do for us? For starters, we can remove some files:

  • LICENSE
  • MANIFEST.SKIP
  • Makefile.PL
  • README
  • t/pod.t
  • t/pod-coverage.t

That leaves us with only our Changes file, our code, and our tests: the non-infrastructure parts. On top of that, Dist::Zilla does all of the boring distribution bits for us. It only handles the make dist command. It does not handle the make install command, which means the users who install the module don’t need all of the dependencies.

Dist::Zilla puts all of its functionality into plugins, which will be the meat of the rest of this session. It also uses a very simple INI-style configuration file.

The main command provided by the module is dzil build. This bundles the distribution, which will contain all of the infrastructure necessary for users to install the module. When building, it follows a simple workflow:

  1. Gather files
  2. Munge files
  3. Collect metadata
  4. Write out

There is no default configuration, but there is a Basic plugin bundle that will include all of the most common plugins.

What followed were examples of what the plugins can do. Of course, all of them are designed to reduce cruft—the non-code, non-documentation bits that we’re forced to maintain. The philosophy is the same one I advocate to anyone who will listen: computers are good at doing boring, repetitive tasks with derived data; why don’t we let them do more of that stuff?

I’ve followed @rjbs on Twitter for a while, and I’ve seen him talk about Dist::Zilla. I’ve wanted to try it out for a while, to simplify my distributions—both for CPAN and for my day job—but I didn’t realize until this session just how awesome the tool is. It’s a complete framework for managing Perl module distributions. Dist::Zilla will give my Laziness score a huge bump.

OSCON 2010: Thursday Plenary Session

This morning’s plenary session, based on the scheduled speakers, is focused on the nebulous cloud. The cloud is what everyone in technology talks about, but everyone defines differently. It’s the section of the flow chart where magic happens. Somehow, we will send our data into the cloud and all our wishes will be fulfilled.

To be fair, this vagueness and my pessimism are precisely why these speakers have been invited to the O’Reilly Open Source Convention. Tim O’Reilly has a grand vision for the cloud, for ubiquitous computing, and the use of technology to help solve the world’s problems. I commend him for that. I hope this morning’s speakers do justice to his vision and that, if there are valuable lessons to be learned, we learn them.

Today’s LAMP Stack

David Recordon (Facebook)

Over the last decade, the LAMP stack hasn’t been fundamentally updated. A cache, such as memcached, has been added. Different languages (Perl, Python, Ruby) have been used in place of the original PHP. Even different web servers have been used in place of Apache.

Facebook created HipHop for PHP, which compiles PHP into C++. Creating native executables in this way reduces CPU use by a large factor (a number I didn’t catch).

There are alternatives for the database component in the stack, too. MySQL is ubiquitous at this point. Facebook doesn’t really use the relational bits of MySQL very much. So they have been using databases from the NoSQL family—Hadoop, according to the presentation.

David made a point I think a lot of people miss. When evaluating databases, or any other software, first look at what problem needs solving, then find a product that solves it in the correct way. Too often I see people advocating their preferred solution without even looking at the problem.

Data is the lifeblood of Facebook (and we all have our own opinions about that). They are able to use a plethora of Open Source tools to store the data, scale the data, and analyze the data.

This talk wasn’t very focused on the cloud, aside from Facebook being a nebulous site where people store their data without really knowing where it goes or how it’s used. I expect this was more for public relations, given all the bad press they’ve had. Not that anyone stops using Facebook.

Open SETIQuest – It Will Be What You Make It!

Jill Tarter (SETI Institute)

Jill started her talk by explaining what SETI is and why it exists, which I won’t go into here, since it’s just a Google search away. I used to run SETI@home a bit over a decade ago when I was in college and felt like using my computer as a space heater.

Jill is here, representing SETI, because she wants to involve the world in their search. SETI has classroom materials covered, but they are lacking in the social networking world. Jill wants people to first identify themselves as Earthlings, recognizing our place in the Universe.

SETI, with the development of the setiQuest community, hopes to use the vast resources available in the Open Source world to improve the project. These range from physical resources, such as cloud storage and compute cycles, to human resources, such as programmers and analysts.

Cloudant has created a SETI stack on the Amazon AWS infrastructure.

Open Cloud, Open Data

Jean Paoli (Microsoft)

I’m always a little concerned when I see a speaker from Microsoft at OSCON. While I can imagine that there are employees at the company who genuinely embrace Open Source—and, presumably from this talk, open data—I can’t lay aside my suspicion. Microsoft does not have a benevolent history with competition, so when a representative shows up to talk about an open cloud with open data, I instinctively look for the company’s angle. What is their nefarious plan?

Jean talked about open standards and open data: data portability, standards, easy migration and deployment, and developer choice. For some reason, when he talks about the “open cloud,” I think of Microsoft’s Office Open XML move a few years ago. Sure, parts of it were open, but the format as a whole was useless for non-Microsoft tools.

He claimed that Microsoft Windows Azure is an open and interoperable platform. I have a hard time swallowing that. The #oscon IRC channel was not kind in its commentary. A selection from the channel logs:

<b3gl> "Microsoft totally agrees..." as long as you pay your Windows, Azure and MSSQL license fees

<alapapa> standards are great…as long as they're ours

<dbrewer> wow, thanks Microsoft.  You think I should be able to use any language I want, I appreciate your permissions.

<b3gl> dbrewer: Notice he didn't say "We believe if you want to use Linux ...."

Public Static Void

Rob Pike (Google, Inc.)

Programming languages used to be relatively simple, but still fairly powerful. They’ve gotten considerably more complex and confusing. The C++ language was used as an easy target during the talk. Rob went on to bash various programming languages (in most cases deservedly) as a way to lead into what he called the renaissance of language design.

Many of the emerging languages are dynamic and interpreted, and there’s a false dichotomy in which they are considered good while the static and compiled languages are considered bad. Part of the problem is that the languages in the latter class are old, designed in a different age of computing.

Obviously, the end goal of this talk was to talk about the Go language, which tries to bridge the gap between the dynamic interpreted languages and the static compiled languages.

Toward an Open Cloud

Lew Moorman (Rackspace.com)

Lew’s talk was to introduce OpenStack. Rackspace took the internal software that powers their cloud and donated it to OpenStack. I wonder if this is something we can use at my day job to build an internal cloud. The stack is licensed under the Apache 2 license and they don’t use a dual licensing model, which sounds nice.


I was wrong. The talks weren’t really about demonstrating the wonder of ubiquitous computing and how we can move in that direction so much as a showcase of organizations in the cloud or using the cloud. It was really just one long commercial.

OSCON 2010: Hands-on Cassandra

The second tutorial I attended on Tuesday, and the last one of the conference, was Hands-on Cassandra. Actually, I missed the first half of this tutorial, for reasons which I explain in my Tuesday recap post.

I’ve been told by those who attended the full tutorial that the first half wasn’t really worth attending. In fact, when I arrived at the beginning of the second half, I caught the tail end of the presenter demonstrating how he recreated Twitter using Cassandra, something he dubbed Twissandra. This seems to be the exercise of choice for any distributed system. In a way, that’s smart. Take a highly distributed system everyone is familiar with, explain the challenges faced by such a system, then demonstrate the effectiveness with which the software in question can solve the problem.

In any case, the second half of the tutorial was mostly dedicated to an explanation of how Cassandra distributes its data. The details and, frankly, the delivery weren’t that interesting for me, so I didn’t follow the discussion. It was too high level to keep my interest.

I still think that Cassandra is deserving of some investigation. I have a project in mind that it may be perfect for. At my day job, we have what is essentially a distributed, key-based data store. We’ve had to implement all of the data replication functionality. If Cassandra can alleviate the need to design and implement our own data replication and integrity systems, we can put more effort into the final delivery of the data, instead of its transmission.

OSCON 2010: Environmental Monitoring with Arduino

Russell Nelson (Open Source Initiative)

For my final session of the day, I’m in Environmental Monitoring with Arduino and Compatibles. Since I attended the Arduino tutorial on Monday, I thought it would be fun to attend a session on using them.

The take-away points, presented up front for our convenience:

  • Environmental monitoring is important
  • Arduino is cheap and easy
  • Small computers are fun

The Arduino is not just the chip and board, but the IDE used to program the board. It also, as I learned on Monday, has a very shallow learning curve.

Russell works for a company doing water monitoring of the Hudson River. He’s using his domain knowledge from his job to explain how one would do something similar on a smaller scale. The values he describes detecting, and the circuits used to take the measurements, are:

  • Temperature
  • Turbidity
  • Salinity – can’t measure this directly, but salt water conducts electricity, so we can measure its resistance
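
The salinity measurement can be sketched as a voltage-divider calculation. The circuit from the talk isn’t recorded here, so what follows is an assumption: a probe wired in series with a known resistor between the supply and ground, with a 10-bit analog input (readings from 0 to 1023, as on common Arduino boards) sampling the voltage across the probe. The resistor value and function names are hypothetical, and the arithmetic is written in Python only for readability; the same formula ports directly to an Arduino sketch.

```python
ADC_MAX = 1023               # full-scale reading of a 10-bit ADC
KNOWN_RESISTOR = 10_000.0    # ohms; hypothetical series resistor value


def probe_resistance(adc_counts, known_resistor=KNOWN_RESISTOR):
    """Resistance of the probe, from the divider equation
    V_probe / V_supply = R_probe / (R_known + R_probe)."""
    if not 0 < adc_counts < ADC_MAX:
        raise ValueError("reading is at a rail; resistance cannot be determined")
    return known_resistor * adc_counts / (ADC_MAX - adc_counts)


def conductance(adc_counts, known_resistor=KNOWN_RESISTOR):
    """Conductance in siemens: the reciprocal of the probe resistance."""
    return 1.0 / probe_resistance(adc_counts, known_resistor)


print(probe_resistance(341))  # one third of full scale
print(conductance(341))
```

Turning a conductance into an actual salinity figure would still need calibration against solutions of known concentration, which this sketch does not attempt.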

Now I just need to figure out what I want to monitor at home.

OSCON 2010: Smalltalk-style Traits

Curtis “Ovid” Poe (BBC)

After a long break, an apple, a cup of coffee, and a beer, I’m back in the Perl track.

The full title of this session is, Scratching the 40-Year Itch of Inheritance with Smalltalk-style Traits.

This is not a tutorial. How to use traits is easy, but why to use them is a more complex discussion.

Inheritance is a very complex problem and an easy one to get wrong. Then people start doing things with multiple inheritance and, even if they’re not doing something deliberately stupid, they end up with diamond inheritance. Not only is this a problem, but it’s been a problem for a very long time—40 years, in fact.

Complex systems can lead to deep class hierarchies. When hierarchies are deep, in particular with a dynamic language like Perl, it becomes difficult to determine where a method came from. Even when it’s known where a method comes from, undesired behavior may be inherited. This becomes worse when multiple inheritance is used.

As systems grow, the problem becomes two-fold:

  1. Class responsibility – larger classes are desired
  2. Class reuse – smaller classes are desired

Inheritance, by itself, cannot solve this problem. So the solution is to decouple the sub-problems.

Several solutions have been tried:

  • Interfaces
  • Delegation
  • Mixins – incredibly popular

As expected by the name of this session, traits (or roles in the nomenclature of Moose) solve the problem far better than any of the above solutions. Much of the session involved showing real-world application of roles to clean up code at the BBC.
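
One commonly cited difference between traits and mixins (stated here from general knowledge, not taken from the session) is that a trait’s methods are flattened into the class, and a name supplied by two traits is reported as a conflict at composition time instead of being silently resolved by inheritance order. A minimal Python sketch of that behavior follows. The `compose` decorator and the example classes are hypothetical and are not Moose’s API:

```python
def compose(*traits):
    """Class decorator: copy methods from traits, refusing silent conflicts."""
    def decorator(cls):
        own = set(vars(cls))   # methods the class defines itself
        provided = {}          # method name -> trait that supplied it
        for trait in traits:
            for name, member in vars(trait).items():
                if name.startswith("__") or not callable(member):
                    continue
                if name in own:
                    continue   # the class's own method takes precedence
                if name in provided:
                    raise TypeError(
                        f"{name!r} is provided by both "
                        f"{provided[name].__name__} and {trait.__name__}; "
                        f"{cls.__name__} must define it itself"
                    )
                provided[name] = trait
                setattr(cls, name, member)
        return cls
    return decorator


class Serializable:
    def describe(self):
        return "serializable"

    def to_text(self):
        return repr(vars(self))


class Loggable:
    def describe(self):
        return "loggable"

    def log(self, message):
        return f"[{type(self).__name__}] {message}"


@compose(Serializable, Loggable)
class Programme:
    def __init__(self, title):
        self.title = title

    def describe(self):  # resolves the describe() conflict explicitly
        return f"programme {self.title}"


p = Programme("News")
print(p.describe())      # programme News
print(p.log("created"))  # [Programme] created
```

Removing `describe` from `Programme` makes the decorator raise a `TypeError` when the class is defined, which is the point: the ambiguity surfaces where the class is written, not at some later method call.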

OSCON 2010: Building Applications with the Simple Cloud API

Doug Tidwell (IBM)

http://www.oscon.com/oscon2010/public/schedule/detail/13976

I finally left the Perl track. I attended Tim Bunce’s presentation on Devel::NYTProf at OSCON two years ago and, while there have been many enhancements made to the module since that time, I expect this year’s talk won’t differ much from the previous one.

This session on Simple Cloud is being presented by IBM’s Cloud Computing Evangelist. The drivers behind this product (is it a product?) are the development and promotion of a standard cloud API. There is some relevance to my day job, not only because of the possibility of using cloud services, but as a way of getting ideas for the API I develop for our engineers to interact with the batch compute system.

There are several levels of where we can work. The levels start at the wire, where we have to generate and parse data ourselves. From there, we have vendor-specific APIs, service-specific APIs, and finally service-neutral APIs. This last level is where we want to be.

The Simple Cloud API covers three areas: file storage, document storage, and simple queues. Once thought of in these simplified concepts, there really isn’t any reason the interface used by a program can’t be standardized. A program should no more need to concern itself with the implementation details of an individual cloud provider than it does the details of the file system of the computer on which it runs.

The API uses the Factory and Adapter design patterns, with a configuration file used by the Factory object to determine which Adapter should be created. These patterns are exactly what I’ve been looking at for the API I work on at my day job.
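
The combination of the two patterns can be sketched as follows. This is a generic illustration in Python, not the Simple Cloud API itself: every class, function, and configuration key here is hypothetical, and an in-memory adapter stands in for a real vendor backend.

```python
from abc import ABC, abstractmethod


class StorageAdapter(ABC):
    """Service-neutral interface the application codes against."""

    @abstractmethod
    def store(self, name, data): ...

    @abstractmethod
    def fetch(self, name): ...

    @abstractmethod
    def list_items(self): ...


class InMemoryAdapter(StorageAdapter):
    """Stand-in for a vendor adapter; a real one would call the vendor's API."""

    def __init__(self, options):
        self.options = options
        self._items = {}

    def store(self, name, data):
        self._items[name] = data

    def fetch(self, name):
        return self._items[name]

    def list_items(self):
        return sorted(self._items)


# Registry mapping a configuration value to an adapter class.
ADAPTERS = {"memory": InMemoryAdapter}


def storage_factory(config):
    """Build the adapter named by config['adapter']."""
    try:
        adapter_class = ADAPTERS[config["adapter"]]
    except KeyError as exc:
        raise ValueError(f"unknown or missing adapter: {exc}") from None
    return adapter_class(config.get("options", {}))


storage = storage_factory({"adapter": "memory", "options": {"region": "example"}})
storage.store("hello.txt", b"hello")
print(storage.list_items())  # ['hello.txt']
```

Switching providers then means changing the configuration and registering another adapter; the calling code that uses `store`, `fetch`, and `list_items` stays the same.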

A demo of the Simple Cloud API followed. There wasn’t much to these demos. The first showed listing data stored at two different providers. The second showed queue manipulation.

After the demo, Doug mentioned Apache libcloud, which is getting a good deal of vendor support.

OSCON 2010: PostgreSQL Reloaded

The full title of this session is PostgreSQL Reloaded – Hot Standby, Streaming Replication & More! It was presented by Chander Ganesan, who, even before the tutorial started, demonstrated his skill as a presenter. Reading his biography, I noted that he appears to be a professional trainer, which is a nice sign. He started out by waiting for a whiteboard to be delivered. Good! That means pictures will be drawn and audience interaction may take place. I really appreciate his dynamic personality and presenting style. Although I had gotten little sleep the night before, he was able to keep me awake and focused.

Unlike Monday, I chose tutorials on Tuesday that held some relevance to the work I’m doing. At my day job, we have a MySQL database backing a critical production system. We have spent years fighting with it and dealing with its failures and instability. I have a bias towards PostgreSQL, having used it in the past, and finding it a superior database to MySQL. That, however, is beside the point. What is pertinent is that I have been considering a complete redesign of the system, using PostgreSQL as the data source, and a tutorial on the built-in standby and replication capabilities coming with the release of PostgreSQL 9.0 is timely.

The slides for this tutorial were distributed to us when we registered. They are intended to stand on their own, serving as documentation if we later work on implementing the concepts presented here. That said, the information density of the slides didn’t at all detract from the presentation. As a hands-on demonstration, Chander didn’t project the slides very often and, when he did, only referenced them as he spent time explaining the material.

In order to better understand how PostgreSQL implements hot standby and replication, Chander first gave us an overview of how PostgreSQL manages the data in a database. I’ll be brief, so this is probably not entirely correct. For efficiency, data is manipulated in 8 kilobyte pages stored in memory, in what is called the shared buffer pool. These pages remain in memory until the pool is exhausted, at which point one or more infrequently used pages will have any changes written to disk and be purged from the pool. This means that while updates exist only in the pool, there is a (potentially long) window of time in which a crash would cause data loss. To prevent data loss, all update operations are first written to the write-ahead log (WAL) files. During a recovery operation, these WAL files can be used to play back any transactions that were lost in the crash.
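
The log-first idea can be shown with a toy model. This is a deliberately simplified sketch in Python, not PostgreSQL’s implementation: the class and method names are hypothetical, a list stands in for the WAL files, dictionaries stand in for pages, and a single `checkpoint` call stands in for pages being flushed from the buffer pool.

```python
class TinyDatabase:
    """Toy key-value store: log first, then change the in-memory pages."""

    def __init__(self):
        self.wal = []      # stands in for WAL files on durable storage
        self.disk = {}     # pages already flushed to disk
        self.buffer = {}   # dirty pages held only in memory

    def update(self, key, value):
        self.wal.append((key, value))  # 1. record the change in the log
        self.buffer[key] = value       # 2. only then modify the cached page

    def checkpoint(self):
        """Flush dirty pages; earlier log entries are no longer needed."""
        self.disk.update(self.buffer)
        self.buffer.clear()
        self.wal.clear()

    def crash(self):
        """Lose everything that lived only in memory."""
        self.buffer.clear()

    def recover(self):
        """Replay the log in order to rebuild the lost changes."""
        for key, value in self.wal:
            self.buffer[key] = value

    def read(self, key):
        return self.buffer.get(key, self.disk.get(key))


db = TinyDatabase()
db.update("a", 1)
db.checkpoint()
db.update("b", 2)
db.crash()
print(db.read("b"))  # None: the change existed only in memory
db.recover()
print(db.read("b"))  # 2: restored by replaying the log
```

Replaying the same log on a second machine instead of the crashed one is, in this toy model, all that a standby amounts to.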

Having these WAL files means that, from a given point in time, the database can be reconstructed. It’s not a stretch to shift the playback of these WAL files into real time on a secondary system. This automatically creates the possibility of a live replicated database, which can be queried in place of the primary database.

The rest of the tutorial was devoted to demonstrating how to set up and use warm standby databases, hot standby databases, and streaming replication.