Seattle OpenCoffee Club

Paul Brown @ 2007-04-18T01:33:22Z

Andy Sack (of Judy's List) is hosting a weekly open forum for entrepreneurs, and I stopped by the first meeting this morning. I came late and talked with a few folks, but it seems like a good mix of people at different stages of their careers and businesses. It was a welcome dose of nostalgia — back in 1999-2000 when FiveSight was just raising an angel round and rolling-up its first customers, I was a regular attendee at monthly founder-only networking dinners (usually at Reza's downtown) hosted by Paul Doyle (of ProofSpace, then and now). The Chicago dinners scaled up to about 70 people at the peak but ended when the bubble burst and implosion of Divine Interventures dragged-down most of the companies.

I don't know that I'll go weekly, but I'll attend regularly.

(comment bubbles) 0 comments

Security Plus Obscurity

Paul Brown @ 2007-04-18T00:56:57Z

I've been getting a new bank account set up for use on-line, and for security, Bank of America's process is to send an initial password through the mail. Here's what they sent:

So, is that last character a one, a lowercase ell, a capital ell, or a pipe? (And, for that matter, what about people who are visually impaired?) Maybe a little more thought about the font is warranted. After getting a "our system is down, try again later" error while trying the possibilities, I tried again later only to be told that I'd exhausted my guesses. So I called them for assistance, as instructed, and once I'd navigated through several nodes of IVR menus, I was connected to a "Your call cannot be completed as dialed..." message.

I'll point out that it would be simpler (and $0.30 cheaper) to assign a password to or collect a password from people as they open their accounts.

Time for a new bank.

(comment bubbles) 0 comments

Reduce, Reuse, and as a last resort, Recycle

Paul Brown @ 2007-04-08T15:52:43Z

The bagel in a CD spindle case image floated by twice last week, once in TreeHugger and once in Assaf's new tumblog.

The mantra "reduce, reuse, recycle" is in my head from someplace, and maybe it comes from growing up in a house where we put out a couple of cans of garbage per year. Recycling has a green halo, but it's not as important as reduce or reuse. The best possible thing is simply not to consume in the first place other than to meet your realistic needs, and reuse comes second. Recycling is a distant third because it consumes resources — items must be collected and processed before being used as raw materials for something new. Not everyone can be no-impact-man (NYT, registration required), but I'm not even going to list "just throwing things away" as a choice. (Here in the Emerald City, it's illegal.)

Any health considerations of the CD spindle case aside (see thread on reddit, or read elsewhere on why plastic is the new lead), packaging is wasteful, and there should be a way to reuse it.

For example, I've wondered about the economics of an old school glass milk bottle-style exchange system applied to restaurant take-out containers. (For younger readers, the milk bottle system worked like this: pay $1 for a heavy glass milk bottle once, bring it back to the store when empty, pay for new milk and trade for a new bottle; the dairy cleans and refills the bottles, etc.) The consumer would pay a non-refundable deposit (e.g., $15) on the take-out containers, and old containers could be traded for new. The little details include what the containers would be made of, how they could be cleaned, what inventories would have to be like for a restaurant in terms of different sizes and form factors, and of course, whether consumers would go for it. There is also the question of whether or not reuse of glass containers (with the initial cost of manufacturing and ongoing impact of water use, detergents, transportation, etc.) would be lower-impact than single-use paper containers. The lightweight DIY version would be to bring empty containers to the restaurant to be filled, but that doesn't work with phone or delivery orders.

(comment bubbles) 0 comments

Trackback Spam and iptables Recipes

Paul Brown @ 2007-03-23T02:46:06Z

It was interesting to read the conclusions in the Chen-Ma-Niu-Wang paper "Spam Double-Funnel: Connecting Web Spammers with Advertisers", specifically about how most of the page spam involves permissive content posting sites and a relatively small group of IP addresses. (There is also a NYT article about the paper.) It's similar to my experiences with trackback spam.

The comment system in the version of typo that I'm using (for the time being) relies on Javascript, and it's apparently not widespread enough that someone's decided to crack it. (In fact, based on the fact that most commenters post several of the same comment, I expect that it either doesn't work well or is confusing for human users...) I get almost no comment spam. On the other hand, the trackback system is an unadorned HTTP POST, so it's trivial to automate trackback spamming. Akismet's great service correctly marks most of the spam, which is good, given that the ratio between spam and ham trackbacks is something like 5000:1, but it's still a pain for me to wade through gobs of spam looking for possible ham. Even worse, it's a safe assumption that the spambots are gobbling up bandwidth and CPU when they spider through my content looking for trackback links. Nonetheless, I'm not willing to give up and turn the feature off, even for old posts.

With the server logs from the last several months of traffic, I used grep, awk, sort, uniq, and column to get a listing, sorted by count, of the originating IP numbers for trackbacks with rolled-up counts for B and C subnets. (It wouldn't be useful to eliminate the few ham trackbacks, since they'll be insignificant in the overall numbers.) The results were useful: most of the spam trackback POST traffic was coming from a few individual IP addresses and a couple of subnets. From there, a little iptables magic from the httpd tools that accompany the (very good) book Apache Security by Ivan Rustic, and the gush of 2000-3000 trackback spam/day was down to a trickle of <100/day. (Ivan's blacklist tool doesn't support IP ranges yet, so I added a rule manually to block the 72.232 class B subnet and a couple of class C subnets.)

The heuristic is satisfactory for the time being, but it does seem like trackbacks are doomed without some kind of authority or authentication system in place. As alternatives, the Technorati linkcount and the Yahoo! Site Explorer API and badge (Y! ID required) look promising.

(comment bubbles) 0 comments

Not Quite No Transactions

Paul Brown @ 2007-03-20T02:56:00Z

I enjoyed Dan Pritchett's presentation on The EBay Architecture, but now it seems like people are getting carried away, where the intersection of people with my blogroll contains Martin Fowler. Martin is well-known and rightly regarded as a smart guy, but this fragment from his post on "no transactions" isn't quite right:

The rationale for not using transactions was that they harm performance at the sort of scale that eBay deals with. This effect is exacerbated by the fact that eBay heavily partitions its data into many, many physical databases. As a result using transactions would mean using distributed transactions, which is a common thing to be wary of.

(Unfortunately, Martin's bliki doesn't support comments, or I would be posting this over there.) The relevant bullet from Dan's presentation at SDForum (not the one at QCon) is:

Absolutely no client-side transactions.
∗ Single database transactions managed through anonymous PL/SQL blocks.
∗ No distributed transactions.

This is very different than "no transactions", but the game of telephone is already underway.

The reality is that attempts at being "for sure, for sure" (multi-master replication, two-phase commit, locking, aggressive retry, etc.) can and do inhibit scale. There are also plenty of stupid things you can do to hobble an application without leaving a single database, usually by inadvertently abusing locking. (One classic is a SELECT ... FOR UPDATE that includes a join across a code table.)

In terms of things that can be done to scale, it's a good idea to take the shortest path to a working system. It's a good idea to be intimately familiar with the performance characteristics of a system in production, and that means really, truly understanding how all of the pieces (load balancer, web server, application tier, database, physical network, hardware, etc.) work individually and collectively. It's important to be able to disentangle design choices from business constraints, and this is more or less the point that Assaf made in the original round of discussion about Dan Pritchett's presentation. As Dare Obasanjo and Charlie Wood point out, life without transactions isn't necessarily a picnic, either. Thinking about "what can I live without" in the abstract will lead down the path toward something like what Pat Helland describes in his Life Beyond Distributed Transactions paper, but that journey should be taken in pragmatic steps driven by experiment and experience. (And now that you're out of the jungle, Pat, can you hurry up and go back to blogging?)

(comment bubbles) 3 comments

Java Brainteaser on Regular Expressions

Paul Brown @ 2007-03-20T00:32:11Z

Suppose that you have a Java web application where regular expressions are used deep down in the implementation to do some work, but you observe that the an array index exception is occurring sporadically where the regular expressions are being used.

What's causing the exceptions? What's the solution?

(comment bubbles) 1 comment

STM and IO Redux

Paul Brown @ 2007-03-13T13:05:00Z

Pepe Iborra left a comment on my entry on STM and IO about the use unsafeIOToSTM that spurred me to do some more reading and ask a few questions by email. (Better yet, people who knew the answers were kind enough to respond.)

Better without unsafeIOToSTM

The consensus was to avoid the use of unsafeIOToSTM and just combine the IO actions in the IO monad. This changes things around a bit but in a good way; refactoring (if the word applies in this scenario) only took about 15 minutes.

Disregarding the suggestion to use TMVar for the moment, here are some revisions. (If you look at TMVar, the source is more informative than the documentation.)

First, check_out and check_in need to change, and storing an entry to disk can get simpler:

check_out :: TVar Holder -> IO Holder
check_out h = atomically ( do { h' <- readTVar h
                              ; if locked h'
                                then retry
                                else writeTVar h (lock h')
                              ; return h' } )

check_in :: TVar Holder -> Holder -> IO ()
check_in h' h = atomically ( do { h'' <- readTVar h'
                                ; if (locked h'')
                                  then writeTVar h' (unlock h)
                                  else error "Internal error." } )

store :: Holder -> IO Holder
store h = do { let e = entry h
              ; do writeFile (fname e) ((show e) ++ "\n")
              ; return h }

And two additional utility functions:

onHolders :: (Entry -> Entry) -> (Holder -> Holder)
onHolders f = \ (Holder e l) -> Holder (f e) l

s_apply :: (Entry -> Entry) -> TVar Holder -> IO ()
s_apply f h' = check_out h'
               >>= (store' . onHolders f)
               >>= (check_in h')

(The ">>=" in IO is sequencing where the output of each step is passed to the next as input.) The flow of s_apply is as follows:

  1. "Check out" the entry by setting the locked field to True and then pass the Holder to the next step.
  2. Apply the function f to the Entry wrapped in the Holder, write the result to disk, and pass it along.
  3. "Check in" the entry by setting the locked field to True and writing the new value into the TVar.

Publishing and unpublishing a persistent Entry now has the appealingly simple form:

s_publish :: TVar Holder -> IO ()
s_publish = s_apply publish

s_unpublish :: TVar Holder -> IO ()
s_unpublish = s_apply unpublish

Even Better with bracket

An anonymous commenter pointed out bracket, a function that has the same semantics as try { ... } finally { ... } in Java. The bracket function has the signature:

bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c

In the analogy with try/finally from Java, the IO a would occur before the try, like lock'ing a Lock in the usual idiom. The result of the initial computation is passed both to the inner computation and to the final computation, so the application of the function (e.g., publish) would need to be grouped with the check_out operation if the published Entry was to be the one checked back in. For my purposes, bracketOnError is preferable, since it only executes the fallback action if the inner action (i.e., the last argument) fails. With bracketOnError added and a little more clean-up from another pass over the code, everything gets a little simpler yet:

store :: Entry -> IO Entry
store e = do { writeFile (fname e) ((show e) ++ "\n")
             ; return e }

check_out :: TVar Holder -> IO Entry
check_out h = atomically ( do { h' <- readTVar h
                              ; if locked h'
                                then retry
                                else writeTVar h (lock h')
                              ; return $ entry h' } )

check_in :: TVar Holder -> Entry -> IO ()
check_in h' e = atomically ( do { h'' <- readTVar h'
                                ; if (locked h'')
                                  then writeTVar h' (Holder e False)
                                  else error "Programmer error." } )

s_apply :: (Entry -> Entry) -> TVar Holder -> IO ()
s_apply f h' = bracketOnError (check_out h')
               (\e -> (store.f) e >>= (check_in h'))
               (check_in h')

With s_publish as before, this does the right thing in the event of an error while writing:

*Main> th <- atomically ( newTVar (Holder (Entry "foo" False) False ))
*Main> s_show th
"Holder {entry = Entry {entry_id = \"foo\", published = False}, locked = False}"

*Main> :! chmod -w entry-foo.hb
*Main> do s_publish th
*** Exception: entry-foo.hb: openFile: permission denied (Permission denied)
*Main> s_show th
"Holder {entry = Entry {entry_id = \"foo\", published = False}, locked = False}"

*Main> :! chmod +w entry-foo.hb
*Main> do s_publish th
*Main> s_show th
"Holder {entry = Entry {entry_id = \"foo\", published = True}, locked = False}"

Whither TMVar?

From the TMVar source code comments:

A 'TMVar' is a synchronising variable, used for communication between concurrent threads. It can be thought of as a a box, which may be empty or full.

The idea would be that when an Entry was locked, its TMVar "box" would be empty, to be filled with a TVar wrapping the new value after the operation was completed. Other threads (e.g., threads rendering page views or feeds) need to be able to read values while output is being performed, so I don't think that a TMVar is what I'm after in this case.

(comment bubbles) 0 comments

All Posts contains 399 items in 57 pages of 7 items each:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57