OSCON 2010: Dist::Zilla

Ricardo Signes (Pobox.com)

The full title of this talk is, Dist::Zilla – Maximum Overkill for CPAN Distributions.

Every CPAN distribution contains a significant amount of crap. It’s infrastructure used for the distribution tools.

ExtUtils::MakeMaker has been the traditional way to work on the infrastructure code. By necessity, it contains a lot of legacy, which can be cumbersome to maintain. Enter Module::Install, which can look in the expected places for the necessary information, such as the author name. But, the author still must write all the boilerplate. Module::Starter was written to address this, composing all the boilerplate on behalf of the author. There is so much boilerplate that, by default, Module::Starter also provides a boilerplate test to detect it.

Why are we doing all of this? How much repetitive work are we doing?

What can Dist::Zilla do for us? For starters, we can remove some files:

  • LICENSE
  • MANIFEST.SKIP
  • Makefile.PL
  • README
  • t/pod.t
  • t/pod-coverage.t

Leaving us with only our Changes file, our code, and our tests. The non-infrastructure parts. On top of that, Dist::Zilla does all of the boring distribution bits for us. It only handles the make dist command. It does not handle the make install command, which means the users who install the module don’t need all of the dependencies.

Dist::Zilla puts all of its functionality into plugins, which will be the meat of the rest of this session. It also uses a very simple INI-style configuration file.

The main command provided by the module is dzil build. This bundles the distribution, which will contain all of the infrastructure necessary for users to install the module. When building, it follows a simple work flow:

  1. Gather files
  2. Munge files
  3. Collect metadata
  4. Write out

There is no default configuration, but there is a Basic plugin bundle that will include all of the most common plugins.

What followed were examples of what the plugins can do. Of course, all of them are designed to reduce cruft—the non-code, non-documentation bits that we’re forced to maintain. The philosophy is the same one I advocate to anyone who will listen: computers are good at doing boring, repetitive tasks with derived data; why don’t we let them do more of that stuff?

I’ve followed @rjbs on Twitter for a while, and I’ve seen him talk about Dist::Zilla. I’ve wanted to try it out for a while, to simplify my distributions—both for CPAN and for my day job—but I didn’t realize until this session just how awesome the tool is. It’s a complete framework for managing Perl module distributions. Dist::Zilla will give my Laziness score a huge bump.

OSCON 2010: Thursday Plenary Session

This morning’s plenary session, based on the scheduled speakers, is focused on the nebulous cloud. The cloud is what everyone in technology talks about, but everyone defines differently. It’s the section of the flow chart where magic happens. Somehow, we will send our data into the cloud and all our wishes will be fulfilled.

To be fair, this vagueness and my pessimism are precisely why these speakers have been invited to the O’Reilly Open Source Convention. Tim O’Reilly has a grand vision for the cloud, for ubiquitous computing, and the use of technology to help solve the world’s problems. I commend him for that. I hope this morning’s speakers do justice to his vision and that, if there are valuable lessons to be learned, that we learn them.

Today’s LAMP Stack

David Recordon (Facebook)

Over the last decade, the LAMP stack hasn’t been fundamentally updated. A cache, such as memcached has been added. Different languages (Perl, Python, Ruby) have been used in place of the original PHP. Even different web servers have been used in place of Apache.

Facebook created HipHop for PHP, which compiles PHP into C++. Creating native executables in this way reduces CPU use by a large factor (a number I didn’t catch).

There are alternatives for the database component in the stack, too. MySQL is ubiquitous at this point. Facebook doesn’t really use the relational bits of MySQL very much. So they have been using databases from the NoSQL family—Hadoop, according to the presentation.

David made a point I think a lot of people miss. When evaluating databases, or any other software, first look at what problem needs solving, then find a product that solves it in the correct way. Too often I see people advocating their preferred solution without even looking at the problem.

Data is the lifeblood of Facebook (and we all have our own opinions about that). They are able to use a plethora of Open Source tools to store the data, scale the data, and analyze the data.

This talk wasn’t very focused on the cloud, aside from Facebook being a nebulous site where people store their data without really knowing where it goes or how it’s used. I expect this was more for public relations, given all the bad press they’ve had. Not that anyone stops using Facebook.

Open SETIQuest – It Will Be What You Make It!

Jill Tarter (SETI Institute)

Jill started her talk by explaining what SETI is and why it exists, which I won’t go into here, since it’s just a Google search away. I used to run SETI@home a bit over a decade ago when I was in college and felt like using my computer as a space heater.

Jill is here, representing SETI, because she wants to involve the world in their search. SETI has classroom materials covered, but they are lacking in the social networking world. Jill wants people to first identify themselves as Earthlings, recognizing our place in the Universe.

SETI, with the development of the setiQuest community, hopes to use the vast resources available in the Open Source world to improve the project. These include physical resources, such as cloud storage and compute cycles, to human resources, such as programmers and analysts.

Cloudant has created a SETI stack on the Amazon AWS infrastructure.

Open Cloud, Open Data

Jean Paoli (Microsoft)

I’m always a little concerned when I see a speaker from Microsoft at OSCON. While I can imagine that there are employees at the company who genuinely embrace Open Source—and, presumably from this talk, open data—I can’t lay aside my suspicion. Microsoft does not have a benevolent history with competition, so when a representative shows up to talk about an open cloud with open data, I instinctively look for the company’s angle. What is their nefarious plan?

Jean talked about open standards and open data. Data portability, standards, easy migration and deployment, and developer choice. For some reason, when he talks about the “open cloud,” I have thoughts about Microsoft’s OpenDocument move a few years ago. Sure, parts of it were open, but the format as a whole was useless for non-Microsoft tools.

He claimed that Microsoft Windows Azure is an open and interoperable platform. I have a hard time swallowing that. The #oscon IRC channel was not kind in its commentary. A selection from the channel logs:

<b3gl> "Microsoft totally agrees..." as long as you pay your Windows, Azure and MSSQL license fees

<alapapa> standards are great…as long as they're ours

<dbrewer> wow, thanks Microsoft.  You think I should be able to use any language I want, I appreciate your permissions.

<b3gl> dbrewer: Notice he didn't say "We believe if you want to use Linux ...."

Public Static Void

Rob Pike (Google, Inc.)

Programming languages used to be relatively simple, but still fairly powerful. They’ve gotten considerably more complex and confusing. The C++ language was used as an easy target during the talk. Rob went on to bash various (in most cases deservedly) programming languages as a way to lead into what he called the renaissance of language design.

Many of the emerging languages are dynamic and interpreted, and there’s a false dichotomy that they are considered good while the static and compiled languages are considered bad. Part of the problem is that the latter class of languages are old, designed in a different age of computing.

Obviously, the end goal of this talk was to talk about the Go language, which tries to bridge the gap between the dynamic interpreted languages and the static compiled languages.

Toward an Open Cloud

Lew Moorman (Rackspace.com)

Lew’s talk was to introduce OpenStack. Rackspace took the internal software that powers their cloud and donated it to OpenStack. I wonder if this is something we can use at my day job to build an internal cloud. The stack is licensed under the Apache 2 license and they don’t use a dual licensing model, which sounds nice.


I was wrong, the talks weren’t really about demonstrating the wonder of ubiquitous computing and how we can move in that direction so much as a showcase of organizations in the cloud or using the cloud. It was really just one long commercial.

OSCON 2010: Hands-on Cassandra

The second tutorial I attended on Tuesday, and the last one of the conference, was Hands-on Cassandra. Actually, I missed the first half of this tutorial, for reasons which I explain in my Tuesday recap post.

I’ve been told by those that attended the full tutorial that the first half wasn’t really worth attending. In fact, when I arrived at the beginning of the second half, I caught the tail end of the presenter demonstrating how he recreated Twitter using Cassandra, something he dubbed Twissandra. This seems to be the exercise of choice for any distributed system. In a way, that’s smart. Take a highly distributed system everyone is familiar with, explain the challenges faced by such a system, then demonstrate the effectiveness with which the software in question can solve the problem.

In any case, the second half of the tutorial was mostly dedicated to an explanation of how Cassandra distributes its data. The details and, frankly, the delivery weren’t that interesting for me, so I didn’t follow the discussion. It was too high level to keep my interest.

I still think that Cassandra is deserving of some investigation. I have a project in mind that it may be perfect for. At my day job, we have what is essentially a distributed, key-based data store. We’ve had to implement all of the data replication functionality. If Cassandra can alleviate the need to design and implement our own data replication and integrity systems, we can put more effort into the final delivery of the data, instead of its transmission.

OSCON 2010: Environmental Monitoring with Arduino

Russell Nelson (Open Source Initiative)

For my final session of the day, I’m in Environmental Monitoring with Arduino and Compatibles. Since I attended the Arduino tutorial on Monday, I thought it would be fun to attend a session on using them.

The take-away points, presented up front for our convenience:

  • Environmental monitoring is important
  • Arduino is cheap and easy
  • Small computers are fun

The Arduino is not just the chip and board, but the IDE used to program the board. It also, as I learned on Monday, has a very shallow learning curve.

Russell works for a company doing water monitoring of the Hudson River. He’s using his domain knowledge from his job to explain how one would do something similar on a smaller scale. The values he describes detecting, and the circuits used to take the measurements, are,

  • Temperature
  • Turbidity
  • Salinity – can’t measure this directly, but salinity conducts and we can measure resistance

Now I just need to figure out what I want to monitor at home.

OSCON 2010: Smalltalk-style Traits

Curtis “Ovid” Poe (BBC)

After a long break, an apple, a cup of coffee, and a beer, I’m back in the Perl track.

The full title of this session is, Scratching the 40-Year Itch of Inheritance with Smalltalk-style Traits.

This is not a tutorial. How to use traits is easy, but why to use them is a more complex discussion.

Inheritance is a very complex problem and an easy one to get wrong. Then people start doing things with multiple inheritance and, even if they’re not doing something deliberately stupid, they end up with diamond inheritance. Not only is this a problem, but it’s been a problem for a very long time—40 years, in fact.

Complex systems can lead to deep class hierarchies. When hierarchies are deep, in particular with a dynamic language like Perl, it becomes difficult to determine where a method came from. Even when its known where a method comes from, undesired behavior may be inherited. This becomes worse when multiple inheritance is used.

As systems grow, the problem becomes two-fold:

  1. Class responsibility – larger classes are desired
  2. Class reuse – smaller classes are desired

Inheritance, by itself, cannot solve this problem. So the solution is to
decouple the sub-problems.

Several solutions have been tried:

  • Interfaces
  • Delegation
  • Mixins – incredibly popular

As expected by the name of this session, traits (or roles in the nomenclature of Moose) solve the problem far better than any of the above solutions. Much of the session involved showing real-world application of roles to clean up code at the BBC.

OSCON 2010: Building Applications with the Simple Cloud API

Doug Tidwell (IBM)

http://www.oscon.com/oscon2010/public/schedule/detail/13976

I finally left the Perl track. I attended Tim Bunce’s presentation on Devel::NYTProf at OSCON two years ago and, while there have been many enhancements made to module since that time, I expect this year’s talk won’t differ much from the previous one.

This session on Simple Cloud is being presented by IBM’s Cloud Computing Evangelist. The drivers behind this product (is it a product?) are the development and promotion of a standard cloud API. There is some relevancy with my day job, not only because of the possibility of using cloud services, but as a way of getting ideas for the API I develop for our engineers to interact with the batch compute system.

There are several levels of where we can work. The levels start at the wire, where we have to generate and parse data ourselves. From there, we have vendor-specific APIs, service-specific APIs, and finally service-neutral APIs. This last level is where we want to be.

The Simple Cloud API covers three areas: file storage, document storage, and simple queues. Once thought of in these simplified concepts, there really isn’t any reason the interface used by a program can’t be standardized. A program should no more need to concern itself with the implementation details of an individual cloud provider than it does the details of the file system of the computer on which it runs.

The API uses the Factory and Adapter design patterns, with a configuration file used by the Factory object to determine which Adapter should be created. These patterns are exactly what I’ve been looking at for the API I work on at my day job.

A demo of the Simple Cloud API followed. There wasn’t much to these demos. The first showed listing data stored at two different providers. The second showed queue manipulation.

After the demo, the Apache libcloud, which is getting a good deal of vendor support.

OSCON 2010: PostgreSQL Reloaded

The full title of this session is PostgreSQL Reloaded – Hot Standby, Streaming Replication & More! It was presented by Chander Ganesan, who, even before the tutorial started, demonstrated his skill as a presenter. Reading his biography, I noted that he appears to be a professional trainer, which is a nice sign. He started out by waiting for a whiteboard to be delivered. Good! That means pictures will be drawn and audience interaction may take place. I really appreciate his dynamic personality and presenting style. Having gotten little sleep the night before, he was able to keep me awake and focused.

Unlike Monday, I chose tutorials on Tuesday that held some relevance to the work I’m doing. At my day job, we have a MySQL database backing a critical production system. We have spent years fighting with it and dealing with its failures and instability. I have a bias towards PostgreSQL, having used it in the past, and finding it a superior database to MySQL. That, however, is beside the point. What is pertinent is that I have been considering a complete redesign of the system, using PostgreSQL as the data source, and a tutorial on the built-in standby and replication capabilities coming with the release of PostgreSQL 9.0 is timely.

The slides for this tutorial were distributed to us when we registered. They are intended to stand on their own, serving as documentation if we later work on implementing the concepts presented here. That said, the information density of the slides didn’t at all detract from the presentation. As a hands-on demonstration, Chander didn’t project the slides very often and, when he did, only referenced them as he spent time explaining the material.

In order to better understand how PostgreSQL implements hot standby and replication, Chander first gave us an overview of how PostgreSQL manages the data a database. I’ll be brief, so this is probably not entirely correct. For efficiency, data is manipulated in 8 kilobyte pages stored in memory, in what is called the shared buffer pool. These pages remain in memory until the pool is exhausted, at which point one or ore infrequently used pages will have any changes written to disk and purged from the pool. This means that while the updates are stored in the pool, there is a (potentially long) window of time in which a crash will cause data loss. To prevent data loss, all update operations are first written to the write-ahead log (WAL) files. During a recovery operation, these WAL files can be used to play back any transactions that were lost in the crash.

Having these WAL files means that, from a given point in time, the database can be reconstructed. It’s not a stretch to shift the playback of these WAL files into real time on a secondary system. This automatically creates the possibility of a live replicated database, which can be queried in place of the primary database.

The rest of the tutorial was devoted to demonstrating how to set up and use warm standby databases, hot standby databases, and streaming replication.

OSCON 2010: Cool Perl 6 Today

Patrick Michaud (pmichaud.com)

I’m just back from lunch at Burgerville with Juan and Jonathan. On the way back into the convention center, I ran into Alasdair, who has been attending the hardware hacking sessions. That made me think that I may want to try to find non-Perl sessions to attend. After all, I tend to keep up with Perl news, so the sessions are of marginal usefulness. Unfortunately, nothing on the schedule looked very interesting to me. I was curious about the session on Open Source Tool Chains for Cloud Computing until I read the description. While it looked cool, it wouldn’t be useful for me in my work. The session would go through provisioning, setup, and maintenance of hosts, all of which we already have well-entrenched solutions for in my day job. So, I ended up back in the Perl track. My friends in the San Diego Perl Mongers group will appreciate that, I think.

Anyway, on to the session.

The name Perl 6 is a language specification, rather than any particular implementation. All of the references and links off-handedly mentioned in this post are available from the Perl 6 website.

Patrick is the lead developer of Rakudo Perl, which is the most feature complete and up-to-date.

Perl 6 has a language specification and a test suite. There are still many places in Perl 6 that are not being tested yet.

Rakudo * (Star) is scheduled to be released a week from tomorrow, targeted at being a useful, usable, early adopter distribution.

At this point, Patrick began to enumerate the new language features and how they work in Perl 6, such as variables, loops, interpolation, and so on. I won’t go into these here, since there are numerous places on the Web where this has been documented.

About half way through this session, I realized that “r0ml” was presenting in another room. If I’d noticed that before, I would have attended that session.

OSCON 2010: Perl 5.12

Jesse Vincent (Best Practical)

This talk could be titled something along the lines of “Lessons Learned from Project Management.”

Jesse Vincent is the current Perl 5 pumpking, which for the moment can be thought of as the project janitor.

People who say “Perl is dead” or that Perl hackers are “desperate” are behind the times.

There are a lot of exiting things happening that are not in the Perl core. Audrey Tang has said that “CPAN is the language, Perl is the syntax.” Like Piers in the previous session, Jesse enumerated a handful of things that make Perl awesome:

While some of the coolest new things happening in the CPAN world, it merely scratches the surface of what is available.

About three months ago, Jesse uploaded Perl 5.12. Amazingly, no one has reported any critical regressions.

Jesse has been assured that Rakudo * will be out next week, on 29 July. However, Perl 6 will not replace Perl 5, which has paid Jesse’s mortgage for many, many years. Also, thanks to Perl 5.12, Perl 5.10 is no longer “too new to use.”

Perl 5.12 marks the latest release in the process of cleaning up the inernals and adding much desired features. Some of the highlights:

  • Deprecations warn by default
  • suidperl is dead
  • package Foo::Bar 1.0; – better version import syntax
  • Y2038 compliant – thanks to Schwern
  • Unicode improvements; upgrade to 5.2
  • Pluggable keywords
  • Overridable function lookup
  • Dtrace support
  • Deprecated modules – Class::ISA, Pod::Plainer, Shell, Switch (but still on CPAN)
  • Yadda, yadda, yadda operator

Jesse believes the best new thing in Perl 5.12 is the release process, including him as the pumpking. Twenty years ago, Perl didn’t use version control. He recommends learning from this mistake.

It took five years to release Perl 5.10, after burning through two pumpkings.

Before 5.12, maintenance releases contained all sorts of bug fixes and updates, but could not break binary compatibility. Doing so was a huge task, was very difficult, and, contrary to its name, is unmaintainable. Even without all this work, the pumpking’s job is a lot of work. Jesse really doesn’t want to burn out after a release of Perl.

Traditionally, the process of turning someone with the necessary skills to be the pumpking involves preventing them from using those skills and replacing them with management skills. It’s a shame.

The system is broken and Perl 5 isn’t going anywhere, so how can it be fixed? We can reinvent it, but that’s already being done by Perl 6. Alternatively, we can refactor it. There is no reason many of the skills and duties required of the pumpking can’t be delegated out to people with those skills. In effect, the most important skill and duty for the pumpking is project management.

The 5.9 releases, leading up to 5.10, were haphazard. The 5.11 releases, leading up to 5.12, have settled into a new release every month on the twentieth, with a couple of exceptions. The 5.13 series has followed suit. One of the reasons this was possible was documenting the entire release process.

Releases in the 5.12 series are on a fixed schedule, every three months. A release schedule has been created for 5.14, too.

One of the things I’ve learned working in an enterprise and my observations of the Fedora Project is that good project management is vital. Jesse Vincent is exactly what Perl needed and he continues to demonstrate that, with regular, high quality releases of Perl. What’s more, he is a good spokesman for the project, being able to come to OSCON and give a session on all of this detail in a cojent and interesting format.

OSCON 2010: New Beginnings in Perl 5

Piers Cawley (BBC)

After reviewing today’s session schedule, I quickly came to the conclusion that I will spend my entire day sitting in the room “Portland 256.” This is, apparently, where the Perl track is located.

Paul Fenwick introduced Piers in song, to the tune of Gilligan’s Island.

Piers switched from Perl to Ruby a while back and swore that he wouldn’t return to Perl until 6. Facetiously, the reason he switched to Ruby was the handsome community associated with it and he reason he switched back to Perl was the amazingly supportive community associted with it. He began with a point about programming style. We think of code as describing what we are doing, but in reality the majority of our code actually describes how we are doing it. This infrastructure code is noise.

More seriously, he absolutely hated unrolling the @_ variable in every function. In such a high level language like Perl, why must we pop arguments off the stack in the same manner we would in an assembly language? This leads to long subroutines, every single one containing anti-patterns designed to implement the language infrastructure, instead of the language doing the work for us.

Moose does a lot to improve writing classes, using a more declarative syntax. However, even within Moose methods we need to write the infrastructure code. The MooseX::Declare module solves this problem, giving method syntax a more declarative style. By moving the infrastructure code out of sight, we can better focus on what we are trying to do, rather than how we are doing it.

Piers proceeded to list the modules that “rock” and brought him back to Perl:

Perl’s object-orientation absolutely “sucks.” However, this turns out to be a good thing. It allows very clever people to create modules that extend the semantics of the language. In a language like Ruby, which has a good object-orientation built-in, it’s essentially stuck. If, in the future better ideas of object-orientation are developed, they can be implemented in Perl far more easily than in Ruby. An interesting point: sometimes when the tool sucks, things are better. People develop layers of tools that enhance and extend the original.

It also helps that the Perl release schedule has accelerated.

Piers continued with a high-level, hand-waving explanation of how MooseX::Declare works. While not informative, it was entertaining. Including a video of Matt Trout attempting to hypnotize the room.

Piers ended by thanking the Perl community and expressing how good it feels to be back into it and developing in Perl again.