Spamilestone

Paul Brown @ 2004-10-03T20:47:30Z

My spam corpus has just crossed 70,000 messages, so it's time for a quick retrospective on the last 10,000 messages. It looks like spam is picking up speed, with ~240/day (peaks over 300/day) for the last 10,000 versus ~215/day for the 10,000 before last and ~205/day for the 10,000 before that. With this last set of messages, the distribution of spam probabilities (as rated by bogofilter) is starting to flatten out a bit, so I'll probably have to tune the parameters or add another type of filter.

Blackjack, Roulette, and Beating the House

Paul Brown @ 2004-10-02T16:38:07Z

Although I don't get much chance to read, I've started a new non-fiction book, Bringing Down the House, about a group of MIT grad students who used refined card-counting techniques to beat the odds in blackjack.

If memory serves, the classic card counting techniques provide about a 1% advantage over the house. That is, there is an asymptotic 1% advantage assuming an infinite bank; a simulation shows that actually realizing that advantage requires a substantial bank and patience -- something like 100 times the bet as a bank to reduce the chance of going bankrupt to 1%. A fancier heuristic can make a big difference, however. When I was a graduate student and had an opportunity to get over to Reno from Berkeley, I experimented with a four-slot card sorting technique that used both the weighted sum of the different category counts and the shape of the histogram ("buck-toothed" or "gap-toothed" shapes added additional information) to make decisions. I never had the time to run a real simulation, but the heuristic worked relatively well in practice even if I occasionally made stupid moves (intentionally, of course) to make it look like I was just being lucky. At any rate, coming up with a heuristic that provides useful information in practice (multi-deck shoe, different seating positions at the table, etc.) is an interesting challenge.
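The bank-versus-edge trade-off mentioned above can be sketched with the textbook gambler's-ruin formula. This is only a toy model, not the simulation referred to: it assumes flat one-unit bets that pay even money, an infinitely rich house, and an edge expressed as a win probability of (1 + edge) / 2. The function names are made up for illustration.

```python
import math


def ruin_probability(bank_units, edge):
    """Chance of eventually losing the whole bank with flat, even-money bets.

    Classic gambler's-ruin result against an infinitely rich house:
    with win probability p > 1/2 the ruin probability is (q / p) ** bank_units.
    """
    p = (1.0 + edge) / 2.0  # assumption: a 1% edge means p = 0.505
    q = 1.0 - p
    if p <= q:
        return 1.0  # without an edge, ruin is certain in the long run
    return (q / p) ** bank_units


def bank_for_risk(target_risk, edge):
    """Smallest whole number of betting units that keeps ruin at or below target_risk."""
    if edge <= 0:
        raise ValueError("a positive edge is required")
    p = (1.0 + edge) / 2.0
    q = 1.0 - p
    return math.ceil(math.log(target_risk) / math.log(q / p))
```

In this toy model a 100-unit bank with a 1% edge still leaves roughly a 13% chance of ruin, and getting down to 1% takes about 230 units. Real blackjack has different payouts, variance, and bet sizing, so the numbers are illustrative only.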

Another interesting read on the subject of beating the house is The Eudaemonic Pie by Thomas Bass, about a group that built tiny toe-controlled, radio-linked computers into their shoes to solve the dynamics of roulette on the fly. (The book was out of print, though available if you knew where to look, but it looks like it's been reprinted.) While their efforts with roulette didn't pay off directly, the team later applied their ideas to building models for trading derivatives.

Steve Shu Joins the Blogosphere

Paul Brown @ 2004-09-23T22:30:14Z

Steve Shu, who was FiveSight's "utility infielder" for almost four years (and, although I always kidded about him being the only guy who didn't code, an indispensable part of our team), has moved to Texas with his family, embarked on a new joint venture with his wife, and started blogging his thoughts about management and business.

It's a little odd to be a "startup" that's old enough to have alumni, but I hope Steve's time with us will serve him well in his new endeavors.

And the Name is... WS-BPEL 2.0

Paul Brown @ 2004-09-21T10:01:00Z

Votes have just been counted, and the first real version of the BPEL standard, i.e., the one currently in-process at OASIS, will be called "WS-BPEL 2.0".

(There is still no word on how to pronounce it, however...)

More Thorough Validation of WSDL documents and schemaLocation

Paul Brown @ 2004-09-17T23:59:55Z

The schemaLocation attribute and its cousin noNamespaceSchemaLocation are used to "suggest" the location or locations of a schema per namespace for validation purposes. The value of the schemaLocation attribute consists of a list of whitespace-separated pairs of a namespace URI and a location of a schema for elements in that namespace.

For modular documents like WSDL (extensibility via <xsd:any namespace="##other" .../>), this means that the schemaLocation attribute will contain several items. For example, for a WSDL document without extensions, the pair:

http://schemas.xmlsoap.org/wsdl/ http://schemas.xmlsoap.org/wsdl/2003-02-11.xsd

should be present in the schemaLocation attribute. Pairs should also be present for the namespaces used for any extensibility elements. For example, for BPEL partnerLink definitions, the pair

http://schemas.xmlsoap.org/ws/2003/05/partner-link/ http://schemas.xmlsoap.org/ws/2003/05/partner-link/

should also be present. (The first use of the URI is the namespace; the second is a URL for the schema.) The same applies to the SOAP binding, HTTP binding, etc.

This is still only a partial solution, since it doesn't prevent extensibility elements from being used in inappropriate locations, but it's better than letting the extensibility elements fall silently through the ##other hole.
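To make the pair structure concrete, here is a minimal sketch that reads the schemaLocation hints back out of a WSDL document using Python's standard xml.etree.ElementTree. The helper names and the tiny example document are made up for illustration; only the two namespace/location pairs come from the text above.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def parse_schema_location(value):
    """Split a schemaLocation value into a {namespace URI: schema location} dict."""
    tokens = value.split()  # pairs are separated by arbitrary whitespace
    if len(tokens) % 2 != 0:
        raise ValueError("schemaLocation must hold namespace/location pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))


def schema_hints(xml_text):
    """Return the schemaLocation hints declared on the root element, if any."""
    root = ET.fromstring(xml_text)
    return parse_schema_location(root.get("{%s}schemaLocation" % XSI_NS, ""))


# Hypothetical, empty WSDL document carrying the two pairs discussed above.
WSDL_EXAMPLE = """\
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
      http://schemas.xmlsoap.org/wsdl/ http://schemas.xmlsoap.org/wsdl/2003-02-11.xsd
      http://schemas.xmlsoap.org/ws/2003/05/partner-link/ http://schemas.xmlsoap.org/ws/2003/05/partner-link/"/>
"""

for namespace, location in schema_hints(WSDL_EXAMPLE).items():
    print(namespace, "->", location)
```

A validating parser would use the same mapping to decide which schema to load for each namespace it encounters; the sketch only extracts the hints and does no validation itself.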

Number Theory Flashback

Paul Brown @ 2004-09-17T23:18:17Z

I am probably the last person to stumble across the Google challenge relating to 10-digit prime numbers and the digits of e. I still have fond memories of the number theory course I took from Joe Roberts as an undergraduate, complete with mimeographed hand-written notes. I didn't spend much time with number theory after that, but I know the two ingredients that I'd start with:

  • The digits of e can be computed by a number of efficient algorithms.
  • Strings of digits that pass reasonable heuristics (e.g., not an even number, residue mod 30 relatively prime to 30, etc.) can be attacked with Solovay-Strassen, Miller-Rabin, or something fancier.

I haven't tried to solve the problem. Why bother, since it's been spoiled anyway...?
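For the curious, the two ingredients combine into a short sketch. It assumes the digits are generated from the series e = sum of 1/k! using scaled integer arithmetic with a few guard digits, that the leading 2 counts as the first digit, and that Miller-Rabin is run with the first twelve primes as fixed bases (known to be deterministic far beyond 10-digit numbers). All function names are made up for illustration.

```python
from math import gcd

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def e_digits(n, guard=10):
    """First n decimal digits of e as a string, starting with the leading 2."""
    scale = 10 ** (n + guard)
    total, term, k = 0, scale, 0
    while term:  # term is floor(scale / k!); stop once it underflows to 0
        total += term
        k += 1
        term //= k
    return str(total)[:n]


def is_prime(n):
    """Miller-Rabin with fixed bases; deterministic for n far above 10 digits."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in SMALL_PRIMES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # base a is a witness that n is composite
    return True


def first_prime_in_digits(digits, width=10):
    """First width-digit prime among consecutive digits, or None if absent."""
    for i in range(len(digits) - width + 1):
        window = digits[i:i + width]
        if window[0] == "0":
            continue  # a leading zero is not a true width-digit number
        n = int(window)
        if gcd(n % 30, 30) != 1:
            continue  # cheap filter: divisible by 2, 3, or 5 (fine for width >= 2)
        if is_prime(n):
            return n
    return None
```

Calling first_prime_in_digits(e_digits(200)) runs the whole search; a couple of hundred digits is enough here.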

Software, Services, and Avoiding O(n)

Paul Brown @ 2004-09-07T23:33:22Z

Bryan Cantrill has an interesting and detailed discussion of the economics of software on his weblog, and I've been trying to reconcile Bryan's thoughtful exposition with what I know about the costs -- in the current market -- of traditional sales and distribution of complex enterprise software. I'm using "distribution" to mean the set of activities that occur between making a sale and having a customer. For relatively simple software, that could be sending a CD or DVD in the mail.

Here's a snapshot of my mental scratch paper:

Presuming that you have a team that understands product development as opposed to project development, the production of two units of an enterprise software product costs the same as the production of one unit, and thus has zero variable cost. (If you have a project team as opposed to a product team, the production of n units actually costs n+O(n) times as much as producing one unit.) Selling and distributing that software requires targeted marketing, inside sales, outside sales, sales engineers for proofs of concept, and a bench of implementation staff at the ready. A straightforward analysis will show that the cost of selling and distributing n units is O(n) times the cost of selling and distributing one unit; thus the softwareness (and profitability) of an enterprise software company directly depends on how difficult it is to sell and implement the product.

So, what gives? Enterprise software is software, isn't it? If the cost of sales and distribution is ~0.1n, then it's not software, but it's not so bad. On the other hand, if the cost gets close to ~1.0n, then it's bad unless you're an established player preserving your market share. (This is why it is important to be first to market.) In an off-blog email exchange with Bryan, I think he put his finger on the reason, which I'll paraphrase: "That sounds like services, not software."
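The difference between ~0.1n and ~1.0n can be made concrete with a back-of-the-envelope sketch. It rests on an assumption that is not stated above: the per-unit cost of sales and distribution is measured as a fraction of the per-unit price, so 0.1 means a tenth of each sale is spent making it. The fixed engineering cost and all function names are hypothetical.

```python
import math


def profit(n, sales_cost_per_unit, fixed_cost, price=1.0):
    """Profit on n units: O(n) revenue minus O(n) selling cost minus O(1) engineering."""
    return n * (price - sales_cost_per_unit) - fixed_cost


def break_even_units(sales_cost_per_unit, fixed_cost, price=1.0):
    """Units needed to recover the fixed cost, or None if no volume ever does."""
    margin = price - sales_cost_per_unit
    if margin <= 0:
        return None  # every sale costs at least as much as it brings in
    return math.ceil(fixed_cost / margin)


# Hypothetical fixed engineering cost worth 50 units of revenue.
for cost in (0.1, 0.5, 1.0):
    print(cost, break_even_units(cost, fixed_cost=50))
```

Under these assumptions the business at ~0.1n breaks even at a modest volume and then grows with n, while at ~1.0n no volume ever recovers the engineering cost.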

Building enterprise software products and supporting enterprise software products are both situations where the cost of n units is O(1), so a key element in selecting a business strategy is figuring out how to make n sufficiently large while keeping the cost of sales and distribution manageable.

Widely used open source is one way to do it, since n is large and sales and distribution are essentially free. Loosely:

  • The cost of sales is the cost of maintaining a community, which is arguably O(1).
  • The cost of distribution, if handled by professional services partners, is a manageable O(n) (like ~0.1n) for the overhead of partner management and training.
  • The O(n) revenue produced by the support business dominates the O(1) cost of engineering and manufacturing.

And that sounds like a software business.

There are other ways to do it, too, related to packaging, but I'll leave those for another entry.
