OSCON 2010: Hands-on Cassandra

The second tutorial I attended on Tuesday, and the last one of the conference, was Hands-on Cassandra. Actually, I missed the first half of this tutorial, for reasons which I explain in my Tuesday recap post.

I’ve been told by those that attended the full tutorial that the first half wasn’t really worth attending. In fact, when I arrived at the beginning of the second half, I caught the tail end of the presenter demonstrating how he recreated Twitter using Cassandra, something he dubbed Twissandra. This seems to be the exercise of choice for any distributed system. In a way, that’s smart. Take a highly distributed system everyone is familiar with, explain the challenges faced by such a system, then demonstrate the effectiveness with which the software in question can solve the problem.

In any case, the second half of the tutorial was mostly dedicated to an explanation of how Cassandra distributes its data. The details and, frankly, the delivery weren’t that interesting for me, so I didn’t follow the discussion. It was too high level to keep my interest.

I still think that Cassandra is deserving of some investigation. I have a project in mind that it may be perfect for. At my day job, we have what is essentially a distributed, key-based data store. We’ve had to implement all of the data replication functionality. If Cassandra can alleviate the need to design and implement our own data replication and integrity systems, we can put more effort into the final delivery of the data, instead of its transmission.

OSCON 2010: PostgreSQL Reloaded

The full title of this session is PostgreSQL Reloaded – Hot Standby, Streaming Replication & More! It was presented by Chander Ganesan, who, even before the tutorial started, demonstrated his skill as a presenter. Reading his biography, I noted that he appears to be a professional trainer, which is a nice sign. He started out by waiting for a whiteboard to be delivered. Good! That means pictures will be drawn and audience interaction may take place. I really appreciate his dynamic personality and presenting style. Having gotten little sleep the night before, he was able to keep me awake and focused.

Unlike Monday, I chose tutorials on Tuesday that held some relevance to the work I’m doing. At my day job, we have a MySQL database backing a critical production system. We have spent years fighting with it and dealing with its failures and instability. I have a bias towards PostgreSQL, having used it in the past, and finding it a superior database to MySQL. That, however, is beside the point. What is pertinent is that I have been considering a complete redesign of the system, using PostgreSQL as the data source, and a tutorial on the built-in standby and replication capabilities coming with the release of PostgreSQL 9.0 is timely.

The slides for this tutorial were distributed to us when we registered. They are intended to stand on their own, serving as documentation if we later work on implementing the concepts presented here. That said, the information density of the slides didn’t at all detract from the presentation. As a hands-on demonstration, Chander didn’t project the slides very often and, when he did, only referenced them as he spent time explaining the material.

In order to better understand how PostgreSQL implements hot standby and replication, Chander first gave us an overview of how PostgreSQL manages the data a database. I’ll be brief, so this is probably not entirely correct. For efficiency, data is manipulated in 8 kilobyte pages stored in memory, in what is called the shared buffer pool. These pages remain in memory until the pool is exhausted, at which point one or ore infrequently used pages will have any changes written to disk and purged from the pool. This means that while the updates are stored in the pool, there is a (potentially long) window of time in which a crash will cause data loss. To prevent data loss, all update operations are first written to the write-ahead log (WAL) files. During a recovery operation, these WAL files can be used to play back any transactions that were lost in the crash.

Having these WAL files means that, from a given point in time, the database can be reconstructed. It’s not a stretch to shift the playback of these WAL files into real time on a secondary system. This automatically creates the possibility of a live replicated database, which can be queried in place of the primary database.

The rest of the tutorial was devoted to demonstrating how to set up and use warm standby databases, hot standby databases, and streaming replication.