A version of this article is also published on my Perl blog.
Early in February, I received a jury summons for the United States District Court, Southern District of California. Prospective jurors for federal jury service (at least in this court) are placed on call for a period of about 30 days. I was to call for instructions on April 1 and potentially proceed to do so periodically until May 4 (assuming I wasn’t instructed to report).
Since my initial instruction date was nearly two months away, I created an entry for it in Google Calendar, and promptly forgot about it. On Monday, April 2, I was riding the train to work when I realized that I hadn’t remembered to check my instructions. Fortunately, when I arrived at my office and checked, I found that I had been deferred to the next day.
So I added a new entry in Google Calendar, this time with an SMS reminder. I proceeded to do this for most of April, checking my instructions and duplicating the calendar entry with another SMS reminder.
I’m embarrassed to admit that it wasn’t until the last week of April that it occurred to me that I could automate the whole process. After all, isn’t automating drudgery the whole reason I ended up programming Perl in an engineering support group at my day job?
In addition to a telephone recording, jury instructions can be obtained online. In fact, this is the method I used all month. The form uses the HTTP POST method, so it wasn’t a simple matter of constructing a URL to fetch my instructions. While I could construct a POST request with curl(1) or the LWP module, it’s so much easier to do with the WWW::Mechanize module.

    my $mech = WWW::Mechanize->new();
    $mech->get('http://jury.casd.uscourts.gov/AppearWeb/Default.aspx');
    $mech->submit_form(
        form_name => 'Form1',
        fields    => {
            'ctl02$txtPart' => 'PARTICIPANT_ID',
            'ctl02$txtZip'  => 'ZIP_CODE',
        },
        button => 'ctl02$btnInstructions',
    );

When I’m not supposed to report, the following message appears in the returned content:

    <span id="ctl02_lblMsg">Please check again Sunday, April 29, after 6:00pm for further reporting instructions. Do NOT report at this time.</span>

Given how simple this is, I could parse it with a regular expression. But, I figured it was worth trying to do it right, so I searched CPAN and found the HTML::DOM module. I’ve worked a bit with DOM in JavaScript, so the module appealed to me. Annoyingly, the parse method only supports file names or file handles. Fortunately, this isn’t terribly difficult to work around and the whole thing isn’t much more verbose than using a regular expression.

    my $dom = HTML::DOM->new;
    $dom->parse_file( IO::Scalar->new( do { my $c = $mech->content; \$c } ) );
    my $message = $dom->getElementById('ctl02_lblMsg')->innerHTML;

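For comparison, the regular-expression route mentioned above might look roughly like the following. This is only a sketch: `extract_message` is a hypothetical helper name, the sample HTML is the fragment quoted earlier, and the pattern assumes the span contains no nested tags.

```perl
use strict;
use warnings;

# Sample response fragment, copied from the message shown earlier.
my $html = '<span id="ctl02_lblMsg">Please check again Sunday, April 29, '
         . 'after 6:00pm for further reporting instructions. '
         . 'Do NOT report at this time.</span>';

# Hypothetical helper: returns the text inside the message span,
# or undef when the span is not present in the content.
sub extract_message {
    my ($content) = @_;
    my ($text) = $content =~ m{<span\s+id="ctl02_lblMsg"[^>]*>(.*?)</span>}s;
    return $text;
}

my $message = extract_message($html);
if ( defined $message ) {
    print "$message\n";
}
else {
    print "message span not found\n";
}
```

The regex is shorter, but it breaks as soon as the markup changes shape (extra attributes before `id`, nested elements), which is the usual argument for going through a DOM instead.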
Now that I have the message, what does it say? Thus far my instructions have always been to check again on another day, so I’ll need to work with what I know and code defensively for the exceptions.

    if ( $message !~ /Do NOT report at this time/ ) {
        # We didn't see the message we wanted to see, so we'd better alert...
    }

If I don’t see the known message, I send myself an alert (I happened to use the Email::Sender module in the script) and exit. If this happens, I’ll need to address it as it probably means I need to report (or I’m no longer on call).
However, if I do see the above message, I need to figure out when I’m supposed to check again. If this fails for some reason (e.g., I don’t know what the format looks like if the day is a single digit), I go through the alert process again. It’s rather important that this script be noisy, given the nature of what I’m doing and the limited knowledge I’m working with.

    if ( $message !~ /Please check again (?<weekday>\w+), (?<month>\w+) (?<day>\d+)/ ) {
        # We couldn't parse the next date to check, so we'd better alert...
    }
    my $dt = DateTime::Format::DateParse->parse_datetime("$+{'weekday'}, $+{'day'} $+{'month'} 18:15");

I’ve hard-coded the time to check as 6:15 PM, because the instructions are always updated at 6:00 PM.
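The parse-or-alert step can also be sketched with core modules only. The version below is an illustration, not the script from the Gist: `next_check_time` is a hypothetical helper, it uses Time::Piece rather than DateTime::Format::DateParse, and because the message carries no year, the caller has to supply one (an assumption). It returns undef whenever the message does not match or the date fails to parse, so the caller can raise the alert.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: find the "check again" date in the message and
# return a Time::Piece set to 18:15 on that day, or undef on any failure.
# The year is an assumption supplied by the caller; the message omits it.
sub next_check_time {
    my ( $message, $year ) = @_;
    $message =~ /Please check again (?<weekday>\w+), (?<month>\w+) (?<day>\d{1,2})\b/
        or return;
    my ( $month, $day ) = ( $+{month}, $+{day} );

    # strptime dies on input it cannot parse, so trap that and return undef.
    my $t = eval {
        Time::Piece->strptime( "$day $month $year 18:15", '%d %B %Y %H:%M' );
    };
    return $t;
}

for my $msg (
    'Please check again Sunday, April 29, after 6:00pm for further reporting instructions.',
    'Please check again Tuesday, May 1, after 6:00pm for further reporting instructions.',
    'Some message in a format the script has never seen.',
) {
    my $t   = next_check_time( $msg, 2012 );
    my $out = defined $t ? $t->strftime('%Y-%m-%d %H:%M') : 'ALERT: could not parse';
    print "$out\n";
}
```

The second sample message exercises the single-digit day case that the text above worries about; `\d{1,2}` and `%d` both accept it.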
Finally, the script schedules itself to run again at the time indicated. Here I’ve broken out of Perl to use the at(1) command. Since I’m running the script on my Linode VPS, this seemed an easy way to accomplish the task of rescheduling.

    open my $at, '|-', 'at', $dt->strftime('%R'), $dt->strftime('%F');
    say {$at} "$0 2>/dev/null";    # $0 must be fully qualified or in PATH
    close $at;

Running this script once sets the rescheduling process in motion, relieving me of the need to run it again. If I’d thought of this at the beginning of April, I could have forgotten about the whole bother of checking for instructions several times per week. Oh well, live and learn.
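Since the script is meant to be noisy about failures, the rescheduling step could check its own errors too. The following is a sketch under assumptions: `at_command` is a hypothetical helper, Time::Piece stands in for the DateTime object, and the lines that would queue a real job are left commented out with a placeholder script path.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: build the argument list for at(1).
# '%H:%M' and '%Y-%m-%d' produce the same output as '%R' and '%F'.
sub at_command {
    my ($t) = @_;
    return ( 'at', $t->strftime('%H:%M'), $t->strftime('%Y-%m-%d') );
}

my $when = Time::Piece->strptime( '2012-04-29 18:15', '%Y-%m-%d %H:%M' );
my @cmd  = at_command($when);
print "would run: @cmd\n";

# To schedule for real (commented out because it queues an actual job;
# the script path below is a placeholder):
# open my $at, '|-', @cmd or die "cannot run at: $!";
# print {$at} "/full/path/to/script 2>/dev/null\n";
# close $at or warn "at exited with status $?";
```

Checking the return values of `open` and `close` means a missing or failing at(1) shows up immediately instead of silently ending the chain of reschedules.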
I’ve posted the full script as a Gist on GitHub.
As a way of outsourcing the work and perhaps offering this type of service to a wider audience, I looked at ifttt and Yahoo! Pipes. Unfortunately, the former doesn’t appear to have a way to trigger on scraping an arbitrary web page, and the latter doesn’t appear to support the HTTP POST method. If anyone knows of an approach using existing services, I’m open to suggestions.
Updated on 30 April 2012.
Mojo::UserAgent
Posting this on blogs.perl.org resulted in a few comments (including a rant I completely agree with, but that’s beside the point). There was one suggestion that I try using Mojo::UserAgent instead of WWW::Mechanize.
My first attempt at doing so wasn’t particularly successful. After a while, I realized that I needed to manually do some of the work that WWW::Mechanize was doing for me: fetch the page, extract the hidden fields, and submit the form with these fields included (there’s a cookie involved, but it’s taken care of behind the scenes by both modules).
Because of this, the Mojo::UserAgent version is a bit more annoying to write, but I think this is more than made up for by the built-in access to the DOM.

    my $ua  = Mojo::UserAgent->new;
    my $url = 'http://jury.casd.uscourts.gov/AppearWeb/Default.aspx';
    my $res = $ua->get($url)->res;    # initial fetch to get cookie and form fields
    my $tx  = $ua->max_redirects(3)->post_form(
        $url => {
            '__VIEWSTATE'           => $res->dom('form#Form1 > input#__VIEWSTATE')->[0]->attrs('value'),
            '__EVENTVALIDATION'     => $res->dom('form#Form1 > input#__EVENTVALIDATION')->[0]->attrs('value'),
            'ctl02$txtPart'         => 'PARTICIPANT_ID',
            'ctl02$txtZip'          => 'ZIP_CODE',
            'ctl02$btnInstructions' => 'Reporting Instructions',
        }
    );
    $res = $tx->success or die $tx->error;
    my $message = $res->dom('span#ctl02_lblMsg')->[0]->text;

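To make the "extract the hidden fields, then submit them with the form" step concrete without any network access, here is a rough sketch using only core modules. Everything in it is illustrative: `hidden_fields` is a hypothetical helper, the sample page and its field values are made up, and the regex is only adequate for simple, well-formed markup. It builds the POST body that would be sent but does not send it.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: collect name => value for every hidden <input>.
# Values are not HTML-entity decoded here; a DOM parser is the safer choice.
sub hidden_fields {
    my ($html) = @_;
    my %fields;
    while ( $html =~ /<input\b([^>]*)>/gi ) {
        my $attrs = $1;
        next unless $attrs =~ /\btype="hidden"/i;
        my ($name)  = $attrs =~ /\bname="([^"]*)"/i;
        my ($value) = $attrs =~ /\bvalue="([^"]*)"/i;
        next unless defined $name;
        $fields{$name} = defined $value ? $value : '';
    }
    return %fields;
}

# Made-up sample page; real __VIEWSTATE values are much longer.
my $page = <<'HTML';
<form name="Form1" method="post" action="Default.aspx" id="Form1">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtMTA=" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWBAK=" />
<input name="ctl02$txtPart" type="text" id="ctl02_txtPart" />
</form>
HTML

# Hidden fields first, then the fields a person would fill in by hand.
my %form = (
    hidden_fields($page),
    'ctl02$txtPart'         => 'PARTICIPANT_ID',
    'ctl02$txtZip'          => 'ZIP_CODE',
    'ctl02$btnInstructions' => 'Reporting Instructions',
);

# Encode as application/x-www-form-urlencoded, the body of the POST.
my $body = HTTP::Tiny->new->www_form_urlencode( \%form );
print "$body\n";
```

This is the bookkeeping that WWW::Mechanize's `submit_form` handles automatically, which is why the first version of the script never had to mention `__VIEWSTATE` at all.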
WWW::Scripter
As I was working on the Mojo::UserAgent version of my script, I kept thinking how perfect it would be if WWW::Mechanize gave me access to the DOM in the same way. Well, as I was pushing the new jury-mojo.pl script to my Gist, cpansprout left a comment to not only tell me how I could remove my IO::Scalar hack, but that WWW::Scripter does exactly what I had just been wishing for. It’s like he read my mind.

    my $mech = WWW::Scripter->new();
    $mech->get('http://jury.casd.uscourts.gov/AppearWeb/Default.aspx');
    $mech->submit_form(
        form_name => 'Form1',
        fields    => {
            'ctl02$txtPart' => 'PARTICIPANT_ID',
            'ctl02$txtZip'  => 'ZIP_CODE',
        },
        button => 'ctl02$btnInstructions',
    );
    my $message = $mech->document->getElementById('ctl02_lblMsg')->innerHTML;

I like this last version the most and have updated my Gist accordingly. Also, my automation worked and emailed me tonight to inform me that my jury service has concluded.
Chris – I would like to link to your article and if you have any additional comments for a more general technical audience I would be happy to post them too. And please feel free to write to me directly. The court automation community definitely needs to see this. Thanks, Jim
Please feel free to link to this article.
I’d love to see more work done to streamline things like jury service. It strikes me as odd that the system can call to remind me to check my service on a given day, but it can’t go the extra step to call if I actually need to report. Offering email or SMS options would be even more convenient.