Single Threading Good

Paul Brown @ 2006-05-09T05:39:00Z

Perhaps surprisingly, a post from Robert Sayre, who's been playing with Erlang on a SUN Fire T2000 (lucky bum on both counts), doesn't surprise me: best performance is achieved when the number of Erlang schedulers is equal to the number of hardware threads (i.e., “CoolThreads” in the case of the SUN box). Experience has borne out that using a single low-level thread under lightweight higher-level structures to manage concurrency is usually a good plan, and at least intuitively, the pigeonhole principle says that this should apply to additional low-level threads (or cores) if the higher-level structures are designed properly. (More threads than cores would mean that at any point in time one of the cores was supporting multiple threads and thus slower than it would be if it were only supporting one.)

(comment bubbles) 0 comments

Scalability and Data Partitioning

Paul Brown @ 2006-05-09T05:16:00Z

Dion made a post about “minimal shared data” as a key to scaling an application like Basecamp that got me thinking. This is indeed a pattern in scalability — partitioning. If the data will cooperate, then partitioning makes a good alternative to either fancy hardware (e.g., a SAN) or fancy software (e.g., a big multi-master database cluster).

A few references and tidbits on the topic come to mind:

  • Brad Fitzpatrick's presentation on the LiveJournal backend: “Can't join. Don't care. Never join user data w/ other user data.” He goes on to talk about how they move users between partitions and how they built the individual partitions.
  • A great paper from Microsoft Research about migrating TerraServer from a SAN to commodity hardware in 2003. The paper goes into gory detail, all the way to data loading and backup strategies, rack configurations, and network topology. (Thanks to Dragos for the pointer to the paper.)

Gory detail on system loads, network topology, and hardware configuration is what you'd expect to find in a discussion of scalability, as opposed to the language(s) used to write the software.

If you aren't stuck in a relational rut, there are very cool things out there. There are the online storage services like Amazon S3 and StrongSpace, academic projects like OceanStore and OpenDHT, whatever Microsoft and Google field in the space, and plenty more. In terms of things to either download and play with (as opposed to access over the net), the Bamboo Distributed Hash Table (“DHT”) comes to mind, and the accompanying paper on churn is a good read. In the same vein, there is (yet another) interesting paper from Microsoft Research on a system called “BitVault” designed for reliable long-term storage of massive amounts of data like images (e.g., every check processed or every X-Ray taken in the US) and media that uses a DHT under the covers.

(comment bubbles) 0 comments

Greenies

Paul Brown @ 2006-05-06T19:25:00Z

I read an article somewhere (NYT, maybe?) recently about the fact that even with tax breaks and subsidies, hybrid cars are not a winning economic proposition. Nonetheless, people buy them for non-economic reasons, as a social and political statement. This is certainly not a new behavior, as most people who own fast cars in big cities don't actually drive them fast from stoplight to stoplight. Come to think of it, it's about the opposite — the people I remember driving like maniacs on the Chicago freeways were usually in low-end Japanese cars with lots of primer spots and great big spoilers to metaphorically show their manhood.

Since we're a Northwest-living, organic-food-eating, hybrid-car-contemplating, trash-recycling, sandal-wearing, co-sleeping, photovoltaic array-considering, eco-touring family that would otherwise fall into the yuppie demographic, that now makes us “greenies”. (“Guppie” is becoming an official term, but I like "greenie" better...) I was thinking about greenies as an economic force yesterday when we were at REI with the kid. There were a lot of other families shopping, too, with kids and strollers, except that unlike the usual mix of Graco and the occasional Maclaren in the rest of Seattle, every stroller was a Bugaboo.

I'm happy with “less of better” being a mantra for my generation, but I can't help but think that the majority are in the “more of whatever is on TV” WalMart Nation instead.

(comment bubbles) 0 comments

More on Meeting-Making for Google Calendar

Paul Brown @ 2006-05-06T05:55:00Z

After having posted about how it would be possible to take the Atom feeds from Google Calendar and make a collaborative appointment scheduler (meeting time picker for multiple people), I decided to give it a shot using the Atom parsing library for Ruby from Martin Traverso and Brian McCallister.

The Atom library is slick, and doing some simple extensions to the basic binding to support the Google Data elements is straightforward. For example, here's a Ruby snippet that will read the start time, end time, and reminder settings from the feed:

require 'atom'
require 'xmlmapping'
require 'time'
require 'date'
require 'net/http'
require 'uri'

module GoogleData

  NAMESPACE = 'http://schemas.google.com/g/2005'
  
  def GoogleData.int_or_nil(s)
    if s.nil?
      nil
    else
      s.to_i
    end 
  end
  
  def GoogleData.date_or_datetime(s)
    if s.length == 10
      Date.parse(s)
    else
      Time.iso8601(s)
    end  
  end
  
  class Reminder
    include XMLMapping
    
    namespace NAMESPACE
    
    has_attribute :absolute_time, :name => 'absoluteTime',
      :transform => lambda { |t| Time.iso8601(t) }
    has_attribute :days, :name => 'days',
      :transform => lambda { |s| GoogleData.int_or_nil(s) }
    has_attribute :hours, :name => 'hours',
      :transform => lambda { |s| GoogleData.int_or_nil(s) }
    has_attribute :minutes, :name => 'minutes',
      :transform => lambda { |s| GoogleData.int_or_nil(s) }
  end
  
  class When 
    include XMLMapping

    namespace NAMESPACE
    
    # The following little hack is required because the
    # datatype switches between xs:date for all-day
    # appointments and xs:dateTime for non-all-day
    # appoinments.

    has_attribute :start_time, :name => 'startTime',
      :transform => lambda { |s| GoogleData.date_or_datetime(s) }
    has_attribute :end_time, :name => 'endTime',
      :transform => lambda { |s| GoogleData.date_or_datetime(s) }
    has_attribute :valueString
    
    has_many :reminders, :name => 'reminder', :type => Reminder
  end

  class Entry < Atom::Entry
    namespace NAMESPACE
    has_one :when, :name => 'when', :type => When
  end
  
  class Feed < Atom::Feed
    has_many :entries, :name => 'entry', :type => Entry
  end

The two key tricks above are extending Atom::Feed and Atom::Entry to add explicit handling for the extension elements that we're after. (Without any changes, Atom::Entry does capture an array of extension elements, but I'd prefer to work with objects.) Similar approaches can be applied to the other "kinds" of things in the feed. As an editorial comment, I'm lukewarm about the datatype of an attribute value determining its semantics; normally the semantics would determine the datatype.

To grab the data from the Atom feed of the calendar:

response = Net::HTTP.get_response(URI.parse(GCAL_FULL_URL))

# TODO: Limit the number of redirects to follow.
# TODO: Gracefully handle other non-200's here, too.
while response.kind_of? Net::HTTPRedirection
  response = Net::HTTP.get_response(URI.parse(response['location']))
end

feed = GoogleData::Feed.new(response.body)

feed.entries.each { |event|
  puts '---'
  puts event.title
  puts event.when.start_time.to_s + ' -- ' + event.when.end_time.to_s
}

Back to the original goal of building a "meeting maker" for Google Calendar based on the Atom feeds for participants' calendars, the additional work to properly handle recurrence and recurrenceException makes the problem look quite a bit more complicated (and interesting). (Fortunately, there does appear to be an iCalendar (RFC2445) library available as well...) So this is turning into more than a one-evening project.

With the added complexity of supporting recurring events and exceptions, there is probably a tidy approach that augments the list merge I suggested before with generators and sequence comprehensions for the recurring events — just enumerate possible meeting times from the complement of the merged list of "busy" times for non-recurring meetings and test for overlaps in the union (i.e., "or") of the sequences for each participant. (If I recall correctly, the meeting makers in the usual Exchange clients don't support optimal scheduling of recurring meetings, so that would be a nice feature as well, i.e., schedule the recurring meeting at the time with the fewest conflicts or at least minimize the conflicts for some subset of the participants.)

(comment bubbles) 0 comments

Enjoying (Temporary) Retirement

Paul Brown @ 2006-05-03T21:18:00Z

I left my position at Amazon at the end of March, and after pretty much not taking a break for almost seven years, I've been enjoying some time off. (I'm serious about not taking a break — for example, I was hitting Internet cafes during our honeymoon in Italy to check the status of FiveSight's Series A negotiations and read drafts of agreements.)

The odd thing is, like people who are genuinely retired, I find my time filled anyway, albeit with reconnecting with people I haven't talked to recently, checking out the local parks, tinkering with software, or grabbing breakfast at a sidewalk bakery/cafe while the kid runs around. (FWIW, I've decided that referring to my daughter by name in my blog either is or should be irrelevant to my readers who aren't otherwise friends in the real world.) It's entirely pleasant.

Nonetheless, I think that 60 days of this is about all I can stand, so my “v.next” kicks off in June.

(comment bubbles) 0 comments

Piled Higher and Deeper

Paul Brown @ 2006-04-28T19:17:00Z

Last year in the US, credit card issuers sent on the order of one “pre-approved” offer for every man, woman, and child on the planet. I know that they sent at least one to a child, as the kid got an invitation after she signed up as a frequent flier on United. It might be scarier that 0.3% of the offers (6x108 · 3x10-3 = 1.8x106) were accepted.

Surely there are better reasons to kill trees than just to fill landfills and recycling bins...

(comment bubbles) 2 comments

Some Homebrew Sauce for Google Calendar

Paul Brown @ 2006-04-15T03:50:00Z

As I looked at some cool ideas from Elias Torrez (via James Snell), it occurred to me that I could make some homebrew sauce for Google Calendar to address one of my wants, namely a meeting time chooser for one or more participants. Here's how it could work.

In the Atom feed for the calendar, there are elements in a Google namespace like so:

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/"
      xmlns:gd="http://schemas.google.com/g/2005">
  [...]
  <entry>
    [...]
    <gd:when startTime="2006-04-03T16:00:00.000Z"
             endTime="2006-04-03T17:00:00.000Z"/>
  </entry>
</feed>

We'd want a query that accepted a tuple of URLs for Atom calendar feeds and performed an iterative merge:

  1. Copy the <gd:when> elements from the first feed on the list into working storage of some kind. We'll write a @startTime and @endTime pair as (s,e) in what follows.
  2. Iterate through the elements of the next feed in the list; suppose that the current one is (x,y). If there is an element (w,z) in the scratch list such that z<x and w<y, replace (x,y) by (min(x,w),max(y,z)). Otherwise, add (x,y) to the scratch list.
  3. Repeat #2 with each feed.

As additional sugar, one additional feed could be used to represent desired days or time ranges by exclusion. The combined feed would contain the collective busy times for the group, and the publicly visible Atom feeds for each calendar would be all that would be needed.

This could be done in a browser with a script written in E4X, with the caveat of having to perform date arithmetic. (The XML Schema variant of ISO 8601 date-times compare cleanly as strings, but the JavaScript Date object is based on a different syntax.) XQuery supports operations and comparisons on dateTime and duration values, so it would be another good candidate. As would Ruby, thanks to Atom support and date support in a compatible format.

(comment bubbles) 0 comments

All Posts contains 397 items in 57 pages of 7 items each:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57