OSCON 2010: Thursday Plenary Session

This morning’s plenary session, based on the scheduled speakers, is focused on the nebulous cloud. The cloud is what everyone in technology talks about, but everyone defines it differently. It’s the section of the flow chart where magic happens. Somehow, we will send our data into the cloud and all our wishes will be fulfilled.

To be fair, this vagueness and my pessimism are precisely why these speakers have been invited to the O’Reilly Open Source Convention. Tim O’Reilly has a grand vision for the cloud, for ubiquitous computing, and the use of technology to help solve the world’s problems. I commend him for that. I hope this morning’s speakers do justice to his vision and that, if there are valuable lessons to be learned, we learn them.

Today’s LAMP Stack

David Recordon (Facebook)

Over the last decade, the LAMP stack hasn’t been fundamentally updated. A cache, such as memcached, has been added. Different languages (Perl, Python, Ruby) have been used in place of the original PHP. Even different web servers have been used in place of Apache.

Facebook created HipHop for PHP, which compiles PHP into C++. Creating native executables in this way reduces CPU use by a large factor (a number I didn’t catch).

There are alternatives for the database component in the stack, too. MySQL is ubiquitous at this point. Facebook doesn’t really use the relational bits of MySQL very much. So they have been using databases from the NoSQL family—Hadoop, according to the presentation.

David made a point I think a lot of people miss. When evaluating databases, or any other software, first look at what problem needs solving, then find a product that solves it in the correct way. Too often I see people advocating their preferred solution without even looking at the problem.

Data is the lifeblood of Facebook (and we all have our own opinions about that). They are able to use a plethora of Open Source tools to store the data, scale the data, and analyze the data.

This talk wasn’t very focused on the cloud, aside from Facebook being a nebulous site where people store their data without really knowing where it goes or how it’s used. I expect this was more for public relations, given all the bad press they’ve had. Not that anyone stops using Facebook.

Open SETIQuest – It Will Be What You Make It!

Jill Tarter (SETI Institute)

Jill started her talk by explaining what SETI is and why it exists, which I won’t go into here, since it’s just a Google search away. I used to run SETI@home a bit over a decade ago when I was in college and felt like using my computer as a space heater.

Jill is here, representing SETI, because she wants to involve the world in their search. SETI has classroom materials covered, but they are lacking in the social networking world. Jill wants people to first identify themselves as Earthlings, recognizing our place in the Universe.

SETI, with the development of the setiQuest community, hopes to use the vast resources available in the Open Source world to improve the project. These range from physical resources, such as cloud storage and compute cycles, to human resources, such as programmers and analysts.

Cloudant has created a SETI stack on the Amazon AWS infrastructure.

Open Cloud, Open Data

Jean Paoli (Microsoft)

I’m always a little concerned when I see a speaker from Microsoft at OSCON. While I can imagine that there are employees at the company who genuinely embrace Open Source—and, presumably from this talk, open data—I can’t lay aside my suspicion. Microsoft does not have a benevolent history with competition, so when a representative shows up to talk about an open cloud with open data, I instinctively look for the company’s angle. What is their nefarious plan?

Jean talked about open standards and open data: data portability, standards, easy migration and deployment, and developer choice. For some reason, when he talks about the “open cloud,” I have thoughts about Microsoft’s Office Open XML move a few years ago. Sure, parts of it were open, but the format as a whole was useless for non-Microsoft tools.

He claimed that Microsoft Windows Azure is an open and interoperable platform. I have a hard time swallowing that. The #oscon IRC channel was not kind in its commentary. A selection from the channel logs:

<b3gl> "Microsoft totally agrees..." as long as you pay your Windows, Azure and MSSQL license fees

<alapapa> standards are great…as long as they're ours

<dbrewer> wow, thanks Microsoft.  You think I should be able to use any language I want, I appreciate your permissions.

<b3gl> dbrewer: Notice he didn't say "We believe if you want to use Linux ...."

Public Static Void

Rob Pike (Google, Inc.)

Programming languages used to be relatively simple, but still fairly powerful. They’ve gotten considerably more complex and confusing. The C++ language was used as an easy target during the talk. Rob went on to bash various programming languages (in most cases deservedly) as a way to lead into what he called the renaissance of language design.

Many of the emerging languages are dynamic and interpreted, and there’s a false dichotomy in which they are considered good while the static and compiled languages are considered bad. Part of the problem is that the latter class of languages is old, designed in a different age of computing.

Obviously, the end goal of this talk was to talk about the Go language, which tries to bridge the gap between the dynamic interpreted languages and the static compiled languages.

Toward an Open Cloud

Lew Moorman (Rackspace.com)

Lew’s talk was to introduce OpenStack. Rackspace took the internal software that powers their cloud and donated it to OpenStack. I wonder if this is something we can use at my day job to build an internal cloud. The stack is licensed under the Apache 2 license and they don’t use a dual licensing model, which sounds nice.


I was wrong. The talks weren’t really about demonstrating the wonder of ubiquitous computing and how we can move in that direction so much as a showcase of organizations in the cloud or using the cloud. It was really just one long commercial.

OSCON 2010: Hands-on Cassandra

The second tutorial I attended on Tuesday, and the last one of the conference, was Hands-on Cassandra. Actually, I missed the first half of this tutorial, for reasons which I explain in my Tuesday recap post.

I’ve been told by those that attended the full tutorial that the first half wasn’t really worth attending. In fact, when I arrived at the beginning of the second half, I caught the tail end of the presenter demonstrating how he recreated Twitter using Cassandra, something he dubbed Twissandra. This seems to be the exercise of choice for any distributed system. In a way, that’s smart. Take a highly distributed system everyone is familiar with, explain the challenges faced by such a system, then demonstrate the effectiveness with which the software in question can solve the problem.

In any case, the second half of the tutorial was mostly dedicated to an explanation of how Cassandra distributes its data. The details and, frankly, the delivery weren’t that interesting for me, so I didn’t follow the discussion. It was too high level to keep my interest.

I still think that Cassandra is deserving of some investigation. I have a project in mind that it may be perfect for. At my day job, we have what is essentially a distributed, key-based data store. We’ve had to implement all of the data replication functionality. If Cassandra can alleviate the need to design and implement our own data replication and integrity systems, we can put more effort into the final delivery of the data, instead of its transmission.

OSCON 2010: Environmental Monitoring with Arduino

Russell Nelson (Open Source Initiative)

For my final session of the day, I’m in Environmental Monitoring with Arduino and Compatibles. Since I attended the Arduino tutorial on Monday, I thought it would be fun to attend a session on using them.

The take-away points, presented up front for our convenience:

  • Environmental monitoring is important
  • Arduino is cheap and easy
  • Small computers are fun

The Arduino is not just the chip and board, but the IDE used to program the board. It also, as I learned on Monday, has a very shallow learning curve.

Russell works for a company doing water monitoring of the Hudson River. He’s using his domain knowledge from his job to explain how one would do something similar on a smaller scale. The values he describes measuring, along with the circuits used to measure them, are:

  • Temperature
  • Turbidity
  • Salinity – can’t be measured directly, but salt water conducts electricity, so we can measure resistance
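
As a rough illustration of the salinity bullet, the sketch below turns a raw analog reading into a probe resistance and a conductance. The wiring (a voltage divider with a 1 kΩ series resistor), the 10-bit reading range, and the function names are all my own assumptions, not details from Russell’s talk. It is written in Python for readability rather than as Arduino code, and it stops at conductance because turning that into a salinity figure would need a calibration I don’t have.

```python
def probe_resistance(adc_reading, series_ohms=1000.0, adc_max=1023):
    """Estimate the resistance of a probe sitting in the water.

    Assumed wiring (hypothetical): Vcc -> series resistor -> analog pin
    -> probe -> ground, so the pin sees the voltage across the probe.
    """
    if not 0 < adc_reading < adc_max:
        raise ValueError("reading is at a rail; resistance cannot be inferred")
    # Voltage divider: V_pin / Vcc = R_probe / (R_series + R_probe)
    ratio = adc_reading / adc_max
    return series_ohms * ratio / (1.0 - ratio)


def conductance_microsiemens(resistance_ohms):
    """Conductance is the reciprocal of resistance; saltier water conducts better."""
    return 1e6 / resistance_ohms


if __name__ == "__main__":
    for reading in (200, 500, 800):
        ohms = probe_resistance(reading)
        print(reading, round(ohms, 1), round(conductance_microsiemens(ohms), 1))
```

A lower reading means a lower probe resistance, which in turn means higher conductance.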

Now I just need to figure out what I want to monitor at home.

OSCON 2010: Smalltalk-style Traits

Curtis “Ovid” Poe (BBC)

After a long break, an apple, a cup of coffee, and a beer, I’m back in the Perl track.

The full title of this session is, Scratching the 40-Year Itch of Inheritance with Smalltalk-style Traits.

This is not a tutorial. How to use traits is easy, but why to use them is a more complex discussion.

Inheritance is a very complex problem and an easy one to get wrong. Then people start doing things with multiple inheritance and, even if they’re not doing something deliberately stupid, they end up with diamond inheritance. Not only is this a problem, but it’s been a problem for a very long time—40 years, in fact.

Complex systems can lead to deep class hierarchies. When hierarchies are deep, in particular with a dynamic language like Perl, it becomes difficult to determine where a method came from. Even when it’s known where a method comes from, undesired behavior may be inherited. This becomes worse when multiple inheritance is used.
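
To make the diamond problem concrete, here is a small sketch of my own in Python rather than Perl. The class names are made up, and Python’s method resolution rules are not necessarily the ones Perl applies, so treat it only as an illustration of how hard it can be to tell where a method comes from.

```python
class Base:
    def greet(self):
        return "Base"


class Left(Base):
    pass  # inherits greet from Base unchanged


class Right(Base):
    def greet(self):
        return "Right"


class Bottom(Left, Right):
    pass  # the bottom of the diamond: two paths lead back to Base


def defining_class(cls, method_name):
    """Walk the method resolution order to find where a method is defined."""
    for klass in cls.__mro__:
        if method_name in vars(klass):
            return klass.__name__
    return None


if __name__ == "__main__":
    print([k.__name__ for k in Bottom.__mro__])
    print(Bottom().greet(), defining_class(Bottom, "greet"))
```

Even though Left is listed first and only knows about Base, Bottom ends up with the greet method from Right. Nothing in the definition of Bottom or Left hints at that.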

As systems grow, the problem becomes two-fold:

  1. Class responsibility – larger classes are desired
  2. Class reuse – smaller classes are desired

Inheritance, by itself, cannot solve this problem. So the solution is to decouple the sub-problems.

Several solutions have been tried:

  • Interfaces
  • Delegation
  • Mixins – incredibly popular

As expected by the name of this session, traits (or roles in the nomenclature of Moose) solve the problem far better than any of the above solutions. Much of the session involved showing real-world application of roles to clean up code at the BBC.
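
The idea of composing behavior instead of inheriting it can be sketched roughly as follows. This is my own toy in Python, not Moose and not anything shown in the session: the compose helper, the requires attribute, and the class names are all hypothetical. The point it tries to capture is that role methods are copied flat into the class, that a name clash between two roles is an error rather than something silently settled by ordering, and that a role can state what it needs from the class.

```python
def compose(name, base, *roles):
    """Build a new class from `base` plus methods copied from each role.

    Hypothetical helper: a clash between roles raises instead of being
    resolved by the order in which the roles are listed.
    """
    provided = {}
    for role in roles:
        for attr, value in vars(role).items():
            if attr.startswith("__") or not callable(value):
                continue
            if attr in provided:
                other = provided[attr][0]
                raise TypeError(f"{attr} is provided by both {other} and {role.__name__}")
            provided[attr] = (role.__name__, value)
    namespace = {attr: value for attr, (_, value) in provided.items()}
    cls = type(name, (base,), namespace)
    # Each role may list the methods it expects the class to supply.
    for role in roles:
        for required in getattr(role, "requires", ()):
            if not hasattr(cls, required):
                raise TypeError(f"{role.__name__} requires {required}")
    return cls


class Comparable:
    requires = ("key",)

    def less_than(self, other):
        return self.key() < other.key()


class Describable:
    def describe(self):
        return f"<{type(self).__name__} {self.key()}>"


class Episode:
    def __init__(self, number):
        self.number = number

    def key(self):
        return self.number


SortableEpisode = compose("SortableEpisode", Episode, Comparable, Describable)
```

With this, SortableEpisode(1).less_than(SortableEpisode(2)) is true, and composing two roles that both define describe fails immediately instead of picking a winner.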

OSCON 2010: Building Applications with the Simple Cloud API

Doug Tidwell (IBM)

http://www.oscon.com/oscon2010/public/schedule/detail/13976

I finally left the Perl track. I attended Tim Bunce’s presentation on Devel::NYTProf at OSCON two years ago and, while there have been many enhancements made to the module since that time, I expect this year’s talk won’t differ much from the previous one.

This session on Simple Cloud is being presented by IBM’s Cloud Computing Evangelist. The drivers behind this product (is it a product?) are the development and promotion of a standard cloud API. There is some relevance to my day job, not only because of the possibility of using cloud services, but as a way of getting ideas for the API I develop for our engineers to interact with the batch compute system.

There are several levels of where we can work. The levels start at the wire, where we have to generate and parse data ourselves. From there, we have vendor-specific APIs, service-specific APIs, and finally service-neutral APIs. This last level is where we want to be.

The Simple Cloud API covers three areas: file storage, document storage, and simple queues. Once thought of in these simplified concepts, there really isn’t any reason the interface used by a program can’t be standardized. A program should no more need to concern itself with the implementation details of an individual cloud provider than it does the details of the file system of the computer on which it runs.

The API uses the Factory and Adapter design patterns, with a configuration file used by the Factory object to determine which Adapter should be created. These patterns are exactly what I’ve been looking at for the API I work on at my day job.
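
Here is a minimal sketch of how those two patterns fit together, written in Python. It is not the Simple Cloud API itself: the two fake providers, the adapter classes, the storage_adapter configuration key, and the storage_factory function are all names I made up for illustration, and a plain dictionary stands in for the configuration file.

```python
from abc import ABC, abstractmethod


class ProviderA:
    """Stand-in for one vendor's native client (hypothetical)."""
    def __init__(self):
        self._objects = {}

    def put_object(self, key, body):
        self._objects[key] = body

    def get_object(self, key):
        return self._objects[key]

    def keys(self):
        return sorted(self._objects)


class ProviderB:
    """Stand-in for a second vendor with a different native API (hypothetical)."""
    def __init__(self):
        self._files = {}

    def upload(self, record):
        self._files[record["name"]] = record["data"]

    def download(self, name):
        return self._files[name]

    def ls(self):
        return sorted(self._files)


class StorageAdapter(ABC):
    """The service-neutral interface the application codes against."""
    @abstractmethod
    def store(self, name, data): ...

    @abstractmethod
    def fetch(self, name): ...

    @abstractmethod
    def list_items(self): ...


class ProviderAAdapter(StorageAdapter):
    def __init__(self):
        self._client = ProviderA()

    def store(self, name, data):
        self._client.put_object(name, data)

    def fetch(self, name):
        return self._client.get_object(name)

    def list_items(self):
        return self._client.keys()


class ProviderBAdapter(StorageAdapter):
    def __init__(self):
        self._client = ProviderB()

    def store(self, name, data):
        self._client.upload({"name": name, "data": data})

    def fetch(self, name):
        return self._client.download(name)

    def list_items(self):
        return self._client.ls()


ADAPTERS = {"provider_a": ProviderAAdapter, "provider_b": ProviderBAdapter}


def storage_factory(config):
    """Pick the adapter named in the configuration."""
    name = config["storage_adapter"]
    if name not in ADAPTERS:
        raise ValueError(f"unknown storage adapter: {name!r}")
    return ADAPTERS[name]()
```

The calling code only ever sees store, fetch, and list_items. Switching providers means changing one configuration value, which is the property I want for the API at my day job.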

A demo of the Simple Cloud API followed. There wasn’t much to these demos. The first showed listing data stored at two different providers. The second showed queue manipulation.

After the demo, Doug turned to Apache libcloud, which is getting a good deal of vendor support.

OSCON 2010: PostgreSQL Reloaded

The full title of this session is PostgreSQL Reloaded – Hot Standby, Streaming Replication & More! It was presented by Chander Ganesan, who, even before the tutorial started, demonstrated his skill as a presenter. Reading his biography, I noted that he appears to be a professional trainer, which is a nice sign. He started out by waiting for a whiteboard to be delivered. Good! That means pictures will be drawn and audience interaction may take place. I really appreciate his dynamic personality and presenting style. Even though I had gotten little sleep the night before, he was able to keep me awake and focused.

Unlike Monday, I chose tutorials on Tuesday that held some relevance to the work I’m doing. At my day job, we have a MySQL database backing a critical production system. We have spent years fighting with it and dealing with its failures and instability. I have a bias towards PostgreSQL, having used it in the past, and finding it a superior database to MySQL. That, however, is beside the point. What is pertinent is that I have been considering a complete redesign of the system, using PostgreSQL as the data source, and a tutorial on the built-in standby and replication capabilities coming with the release of PostgreSQL 9.0 is timely.

The slides for this tutorial were distributed to us when we registered. They are intended to stand on their own, serving as documentation if we later work on implementing the concepts presented here. That said, the information density of the slides didn’t at all detract from the presentation. As a hands-on demonstration, Chander didn’t project the slides very often and, when he did, only referenced them as he spent time explaining the material.

In order to better understand how PostgreSQL implements hot standby and replication, Chander first gave us an overview of how PostgreSQL manages the data in a database. I’ll be brief, so this is probably not entirely correct. For efficiency, data is manipulated in 8 kilobyte pages stored in memory, in what is called the shared buffer pool. These pages remain in memory until the pool is exhausted, at which point one or more infrequently used pages will have any changes written to disk and be purged from the pool. This means that while the updates are stored only in the pool, there is a (potentially long) window of time in which a crash would cause data loss. To prevent data loss, all update operations are first written to the write-ahead log (WAL) files. During a recovery operation, these WAL files can be used to play back any transactions that were lost in the crash.
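
To convince myself I understood the idea, I put together the toy model below in Python. It has nothing to do with PostgreSQL’s actual code: the ToyDatabase class and its methods are invented, Python lists and dictionaries stand in for the WAL files and the on-disk pages, and recovery naively replays the whole log instead of starting from a known-good point.

```python
class ToyDatabase:
    """Toy write-ahead logging model; every name here is hypothetical."""

    def __init__(self, wal=None, disk=None):
        self.wal = wal if wal is not None else []      # survives a crash
        self.disk = disk if disk is not None else {}   # survives a crash
        self.buffer_pool = {}                          # lost in a crash

    def update(self, key, value):
        # Log the change first, then modify the in-memory copy.
        self.wal.append((key, value))
        self.buffer_pool[key] = value

    def read(self, key):
        if key in self.buffer_pool:
            return self.buffer_pool[key]
        return self.disk.get(key)

    def flush(self):
        # Write changed pages to disk and evict them from the pool.
        self.disk.update(self.buffer_pool)
        self.buffer_pool.clear()

    @classmethod
    def recover(cls, wal, disk):
        # After a crash only the log and the disk remain; replay the log
        # in order so the disk catches up with every logged update.
        db = cls(wal, dict(disk))
        for key, value in wal:
            db.disk[key] = value
        return db


if __name__ == "__main__":
    db = ToyDatabase()
    db.update("a", 1)
    db.flush()
    db.update("b", 2)  # logged, but only in memory when the crash hits
    recovered = ToyDatabase.recover(db.wal, db.disk)
    print(recovered.read("a"), recovered.read("b"))
```

The update to "b" never reached the disk before the simulated crash, yet it comes back because it was in the log.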

Having these WAL files means that, from a given point in time, the database can be reconstructed. It’s not a stretch to shift the playback of these WAL files into real time on a secondary system. This automatically creates the possibility of a live replicated database, which can be queried in place of the primary database.
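
The same idea can be sketched for replication. Again this is a toy of my own in Python with invented class names, not how PostgreSQL ships or applies its log: a primary appends every change to a log, and a standby applies whatever part of the log it has not yet seen.

```python
class Primary:
    """Accepts writes and records each one in a log (hypothetical model)."""

    def __init__(self):
        self.data = {}
        self.wal = []

    def write(self, key, value):
        self.wal.append((key, value))
        self.data[key] = value


class Standby:
    """Replays the primary's log and answers read-only queries."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log records applied so far

    def catch_up(self, wal):
        # Apply only the records that arrived since the last call.
        pending = wal[self.applied:]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)
        return len(pending)

    def query(self, key):
        return self.data.get(key)


if __name__ == "__main__":
    primary, standby = Primary(), Standby()
    primary.write("x", 1)
    print(standby.query("x"))  # the standby lags until it catches up
    standby.catch_up(primary.wal)
    print(standby.query("x"))
```

The gap between a write on the primary and the next catch_up on the standby is the replication lag. The more often the log is shipped and replayed, the closer the standby gets to real time.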

The rest of the tutorial was devoted to demonstrating how to set up and use warm standby databases, hot standby databases, and streaming replication.

OSCON 2010: Cool Perl 6 Today

Patrick Michaud (pmichaud.com)

I’m just back from lunch at Burgerville with Juan and Jonathan. On the way back into the convention center, I ran into Alasdair, who has been attending the hardware hacking sessions. That made me think that I may want to try to find non-Perl sessions to attend. After all, I tend to keep up with Perl news, so the sessions are of marginal usefulness. Unfortunately, nothing on the schedule looked very interesting to me. I was curious about the session on Open Source Tool Chains for Cloud Computing until I read the description. While it looked cool, it wouldn’t be useful for me in my work. The session would go through provisioning, setup, and maintenance of hosts, all of which we already have well-entrenched solutions for in my day job. So, I ended up back in the Perl track. My friends in the San Diego Perl Mongers group will appreciate that, I think.

Anyway, on to the session.

The name Perl 6 refers to a language specification, rather than any particular implementation. All of the references and links off-handedly mentioned in this post are available from the Perl 6 website.

Patrick is the lead developer of Rakudo Perl, which is the most feature-complete and up-to-date implementation.

Perl 6 has a language specification and a test suite. There are still many places in Perl 6 that are not being tested yet.

Rakudo * (Star) is scheduled to be released a week from tomorrow, targeted at being a useful, usable, early adopter distribution.

At this point, Patrick began to enumerate the new language features and how they work in Perl 6, such as variables, loops, interpolation, and so on. I won’t go into these here, since there are numerous places on the Web where this has been documented.

About halfway through this session, I realized that “r0ml” was presenting in another room. If I’d noticed that before, I would have attended that session.

OSCON 2010: Perl 5.12

Jesse Vincent (Best Practical)

This talk could be titled something along the lines of “Lessons Learned from Project Management.”

Jesse Vincent is the current Perl 5 pumpking, which for the moment can be thought of as the project janitor.

People who say “Perl is dead” or that Perl hackers are “desperate” are behind the times.

There are a lot of exciting things happening that are not in the Perl core. Audrey Tang has said that “CPAN is the language, Perl is the syntax.” Like Piers in the previous session, Jesse enumerated a handful of things that make Perl awesome:

While these are some of the coolest new things happening in the CPAN world, they merely scratch the surface of what is available.

About three months ago, Jesse uploaded Perl 5.12. Amazingly, no one has reported any critical regressions.

Jesse has been assured that Rakudo * will be out next week, on 29 July. However, Perl 6 will not replace Perl 5, which has paid Jesse’s mortgage for many, many years. Also, thanks to Perl 5.12, Perl 5.10 is no longer “too new to use.”

Perl 5.12 marks the latest release in the process of cleaning up the internals and adding much-desired features. Some of the highlights:

  • Deprecations warn by default
  • suidperl is dead
  • package Foo::Bar 1.0; – better version import syntax
  • Y2038 compliant – thanks to Schwern
  • Unicode improvements; upgrade to Unicode 5.2
  • Pluggable keywords
  • Overridable function lookup
  • DTrace support
  • Deprecated modules – Class::ISA, Pod::Plainer, Shell, Switch (but still on CPAN)
  • Yadda, yadda, yadda operator

Jesse believes the best new thing in Perl 5.12 is the release process, including him as the pumpking. Twenty years ago, Perl didn’t use version control. He recommends learning from this mistake.

It took five years to release Perl 5.10, after burning through two pumpkings.

Before 5.12, maintenance releases contained all sorts of bug fixes and updates, but could not break binary compatibility. Doing so was a huge and very difficult task and, contrary to the name, was unmaintainable. Even without all this work, the pumpking’s job is a lot of work. Jesse really doesn’t want to burn out after a release of Perl.

Traditionally, the process of turning someone with the necessary skills to be the pumpking involves preventing them from using those skills and replacing them with management skills. It’s a shame.

The system is broken and Perl 5 isn’t going anywhere, so how can it be fixed? We can reinvent it, but that’s already being done by Perl 6. Alternatively, we can refactor it. There is no reason many of the skills and duties required of the pumpking can’t be delegated out to people with those skills. In effect, the most important skill and duty for the pumpking is project management.

The 5.9 releases, leading up to 5.10, were haphazard. The 5.11 releases, leading up to 5.12, have settled into a new release every month on the twentieth, with a couple of exceptions. The 5.13 series has followed suit. One of the reasons this was possible was documenting the entire release process.

Releases in the 5.12 series are on a fixed schedule, every three months. A release schedule has been created for 5.14, too.

One of the things I’ve learned from working in an enterprise and from observing the Fedora Project is that good project management is vital. Jesse Vincent is exactly what Perl needed and he continues to demonstrate that with regular, high-quality releases of Perl. What’s more, he is a good spokesman for the project, being able to come to OSCON and give a session on all of this detail in a cogent and interesting format.

OSCON 2010: New Beginnings in Perl 5

Piers Cawley (BBC)

After reviewing today’s session schedule, I quickly came to the conclusion that I will spend my entire day sitting in the room “Portland 256.” This is, apparently, where the Perl track is located.

Paul Fenwick introduced Piers in song, to the tune of Gilligan’s Island.

Piers switched from Perl to Ruby a while back and swore that he wouldn’t return to Perl until 6. Facetiously, the reason he switched to Ruby was the handsome community associated with it and the reason he switched back to Perl was the amazingly supportive community associated with it. He began with a point about programming style. We think of code as describing what we are doing, but in reality the majority of our code actually describes how we are doing it. This infrastructure code is noise.

More seriously, he absolutely hated unrolling the @_ variable in every function. In a high-level language like Perl, why must we pop arguments off the stack in the same manner we would in an assembly language? This leads to long subroutines, every single one containing anti-patterns designed to implement the language infrastructure, instead of the language doing the work for us.

Moose does a lot to improve writing classes, using a more declarative syntax. However, even within Moose methods we need to write the infrastructure code. The MooseX::Declare module solves this problem, giving method syntax a more declarative style. By moving the infrastructure code out of sight, we can better focus on what we are trying to do, rather than how we are doing it.

Piers proceeded to list the modules that “rock” and brought him back to Perl:

Perl’s object-orientation absolutely “sucks.” However, this turns out to be a good thing. It allows very clever people to create modules that extend the semantics of the language. A language like Ruby, which has good object-orientation built in, is essentially stuck with it. If, in the future, better ideas of object-orientation are developed, they can be implemented in Perl far more easily than in Ruby. An interesting point: sometimes when the tool sucks, things are better. People develop layers of tools that enhance and extend the original.

It also helps that the Perl release schedule has accelerated.

Piers continued with a high-level, hand-waving explanation of how MooseX::Declare works. While not informative, it was entertaining, and it included a video of Matt Trout attempting to hypnotize the room.

Piers ended by thanking the Perl community and expressing how good it feels to be back into it and developing in Perl again.

OSCON 2010: Wednesday Morning Keynotes

I haven’t had a chance to compose my Tuesday blog posts. Hopefully, I’ll find time throughout the day to work on them. All that really means is that my posts will be chronologically out of order.

It’s Wednesday morning at the O’Reilly Open Source Convention, which means it’s time for the introductory keynotes. The first thing I’ve noticed this morning is how crowded it is. Certainly more so than when I was last here in 2008. I don’t know if that’s just because we aren’t being given breakfast in the expo hall this year, so everyone is crowded into the area outside the ballroom. Another thing I’ve noticed is the gender makeup of the attendees. While still overwhelmingly male, I have noticed more women in attendance this year. Diversity is good.

Without any further ado, we’re getting started.

Welcome

Allison Randal, Edd Dumbill (O’Reilly Media Inc.)

This year’s co-chairs welcomed us and talked a bit about OSCON this year. Obviously, there wasn’t a lot of content, but they did mention the Android Hands-on event being sponsored by Google tonight. I did register for that, since it sounds like it will be fun.

Keynote

Tim O’Reilly (O’Reilly Media Inc.)

First up is the namesake of the convention. Every year he presents his vision, not just for the conference, but for the future he wants to see. He has been steering his company away from being just a book publisher or a content producer and toward being a company that tries to make the world a better place. He urges the Open Source community to think about the cloud. Don’t just think about Linux, or whatever project, but about the bigger picture and where we’re going as a society.

He is fascinated by the ability of technology to reinvent government, a concept he’s dubbed “Gov 2.0.” We fall into the cycle of thinking of government as a vending machine, something we simply get things out of, and get frustrated when we don’t. Over the last few years, he has been talking about government as a platform.

We shouldn’t think just about selling to the enterprise, but about building a better world. We all benefit when that happens.

Coding the Next Generation of American History

Jennifer Pahlka (Code for America)

The government doesn’t have to be this obscure, opaque thing we get stuff from. It can be a platform for us to work together. Currently, the majority of the municipal workforce is over 40, and a significant percentage will retire soon. This creates a huge age gap, which leads to a technology gap.

In Oakland, California, the city workers can’t search city council meeting notes online. The method of entering the data in the computer is to scan the written notes, which are impossible for them to index.

Code for America was created to encourage younger, technologically-savvy individuals to apply their talents to government. It’s designed to create technology to open up government, to make it more accessible to the citizens. It’s a little like the iPhone or Android ecosystems. Government provides the platform, essentially the data. We, the citizens, build the apps.

Keynote

Bryan Sivak (Government of the District of Columbia)

Those in the government of DC are big fans of Open Source, running Linux among other projects. They’ve long talked about being committed to Open Source, partly to save the taxpayers’ money. Unfortunately, much of this commitment is all talk.

For any project used in DC, forms are required to be filled out, justifying the choice and the expense. On this form is the question, “What Open Source projects were considered?” This is often left blank and still slips through without comment.

Proprietary solutions tend to come with copious documentation and an implementation plan. Open Source projects are more open-ended, which requires people within the government to have that vision and that creativity. This goes back to the age and technology gaps mentioned previously.

It’s good that these challenges have been identified and are being addressed.

Got MeeGo?

Dirk Hohndel (Intel Corporation)

MeeGo is the result of the unification of Moblin and Maemo. It targets netbooks, handsets, tablets, and just about anything designed to be more mobile than a traditional notebook. It offers a full client Linux Open Source stack, from the kernel all the way up to the user interface, including the flexibility to support proprietary devices.

Dirk went over the primary goals and philosophy of the project (to be completely open), then went on to describe the organization of MeeGo at a high level. This included both the technical building blocks and the relationship with upstream projects.

Is Your Data Free?

Stormy Peters (GNOME Foundation)

Many of us use completely Free software on our computers, some even insist on it. However, when it comes to online services, we’ve gotten lazy.

Free software was driven by two types of people. There were those who advocated that all software should be Free, that it should be available to all people, regardless of their means. There were others who used and advocated Free software because they wanted something that didn’t crash. It’s this latter It Just Works motivation that Stormy believes has caused us to get lazy about demanding Freedom from our Web services.

She asks how many of us control our own email or have alternative ways to access it if something should happen to the primary service. What if Twitter or Facebook decides to delete your account? What happens to your data? She then went through a few examples of alternative services that have open data policies, such as Identica and Tomboy Online (it’s funny, I don’t use Tomboy because I won’t use Mono).

How many of us have read the agreements when signing up for Web services? Do we know who owns our data? Can we back it up ourselves? Who owns it, both while we’re using the service and if or when we decide to delete our data?

Keynote

Marten Mickos (Eucalyptus Systems)

The shift to the cloud is causing computing to scale, both up and out, far faster and far larger than any of the previous trends (mainframes, minicomputers, or client/server).

Many of the Open Source licenses were designed in an environment where everyone runs software on their own computers, software that requires distribution to be useful. Today we’re seeing more services being offered by companies running software within their own grids. Users never run the software themselves but rather send data in and get data out.

Eucalyptus is designed to be a highly scalable platform for on-premise use. As someone who supports many thousands of hosts in many data centers, I have been intrigued by this product for a while. Unfortunately, I’ve never taken the time to investigate it. It’s nice to see that those behind the company are committed to Open Source, using the split model. Users are free to download and use the software, while the company sells a supported version to enterprises.


The keynote sessions at OSCON tend to drag on for a while, making it difficult to pay attention the whole time. But they are finally over for now. We have a break before the first session of the day. I’m going to try to get some work done on yesterday’s posts before starting on my long day of Perl sessions.

I’m really impressed with the wireless network today. It had its problems during the tutorials on Monday and Tuesday. Traditionally, the network becomes almost unusable on Wednesday morning. This year, however, I have been able to connect to the Internet and write this blog post without any frustration.