Speaking at OSCON 2009

Paul R. Brown @ 2009-05-29T05:08:55Z

With Dan Diephouse, I'll be speaking at OSCON on July 23.

Taking the abstract literally, the talk looks like it is about building a Twitter clone with open source components, but it is not at all intended to be armchair quarterbacking about Twitter's early problems with availability. (We should all have these problems!) Rather, the talk is intended to be about some of the current crop of interesting open source distributed storage technologies — Cassandra, Voldemort, Redis (where the folks have already done some thinking about Twitter-like apps), CouchDB, HBase, Dynomite — as well as how to attack some of the operational problems (e.g., deployment, instrumentation, application updates) that come with using new tools in multi-node environments.

That's obviously quite a bit to fit into a relatively short speaking slot, but Dan and I plan to blog or otherwise publish material that won't fit.


Integrating Github and Redmine

Paul R. Brown @ 2009-05-27T05:37:21Z

I've been a fan and user of Atlassian's excellent Jira since the company was founded back in 2002, but I needed the ability to set up some quick-hit bug/task/wiki sites for smaller consulting projects, and neither the month-to-month hosted model nor the enterprise license made good economic sense. I opted for an install of Redmine, and while it's no Jira, I've been reasonably happy with it. (The one big headache was getting SMTP over TLS working.)

Redmine supports integration with Git repositories on a per-project basis and will link commits to issues based on the presence of keywords and issue identifiers (e.g., "refs #123"). The way the integration is implemented works well if the Git repository is hosted on the same machine as the Redmine instance, but I host all customer and internal work on GitHub. Here's a quick recipe to bridge the gap.

First, add an SSH key for the redmine user to your GitHub account.

Next, create a home for the following shell script (e.g., /opt/redmine_extras/bin) and a home for Git repositories on the server (e.g., /var/redmine/git_repositories), and ensure that the redmine user has write privileges for the repositories directory. Here's the pull_git script (make it executable, since cron will run it directly):

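#!/bin/bash
# Refresh local bare mirrors of the GitHub repositories, then have Redmine
# read any new changesets from them.
export REPOS=/var/redmine/git_repositories
export REDMINE_HOME=/opt/redmine-0.8.2
export LOGFILE=/var/log/redmine_extras.log   # the crontab entry appends output here

# Print a timestamp and this script's PID, without a newline, as a log prefix.
function log_prefix {
        echo -n "$(date '+%Y/%m/%d %H:%M:%S') [$$] "
}

for i in "${REPOS}"/*.git; do
  cd "$i" || continue
  log_prefix && echo "Processing git repository from ${i}..."
  # An empty source side in the refspec means the remote's HEAD, so this
  # updates the local master branch that Redmine reads.
  /usr/local/bin/git --bare fetch origin :master
done

cd "${REDMINE_HOME}" || exit 1
log_prefix && echo 'Updating Redmine...'
/usr/local/bin/ruby script/runner "Repository.fetch_changesets" -e production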

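Then (I'm logged in as root) add the command to the redmine user's crontab. Note that piping into crontab -u redmine - replaces any existing crontab for that user, and that the redmine user needs write access to the log file: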

# touch /var/log/redmine_extras.log && chown redmine /var/log/redmine_extras.log
# echo '*/10 * * * *    /opt/redmine_extras/bin/pull_git >> /var/log/redmine_extras.log 2>&1'\
 | crontab -u redmine -

Now, for each repository you will track from Redmine (say, repository foo belonging to GitHub user bar), do:

# cd /var/redmine/git_repositories
# sudo -u redmine -H git clone --bare git@github.com:bar/foo.git
# cd foo.git
# sudo -u redmine -H git --bare remote add origin git@github.com:bar/foo.git

Ensure that the Redmine project points to the local copy of the Git repository, and the revisions should start getting synchronized every ten minutes.


How to Monitor Java Applications on EC2 with Cacti

Paul R. Brown @ 2009-05-19T05:53:21Z

As part of a scale-out effort for a customer moving from a single node hosted on Slicehost to a multi-node environment hosted in the US and EU on Amazon EC2, I wanted a way to introduce a combination of application and host-level monitoring for the nodes. I settled on the combination of RRDTool graphs served by Cacti and an alive check provided by a third party (Monitis), but there was no immediately obvious way to bridge the gap between the Java services and Cacti's convenience wrapper around RRDTool.

This was before Amazon's recent announcement of monitoring functionality for EC2 nodes, but that service wouldn't meet the primary use case of application-level (as opposed to host-level) monitoring. A tool like JConsole didn't make sense because I was interested in getting a single portal view across the fleet and in having retrospective data to make visual day-to-day or week-to-week comparisons.

This post describes how to bring the pieces together, and the technique is equally applicable to non-Java systems — any system that can serve HTTP requests can be instrumented. In the end, about a day's worth of experimentation and work was enough to get me the level of instrumentation I was after.

Host Configuration Requirements

Each of the nodes in the fleet runs on a slightly modified CentOS 5.2 AMI (based on ami-1363877a, provided by RightScale), and getting basic host information exposed over SNMP is straightforward:

$ yum install net-snmp
[... lots of output ...]
$ mv /etc/snmp/snmpd.conf /etc/snmp/snmpd.conf-old
$ echo 'rocommunity public' > /etc/snmp/snmpd.conf
$ /etc/init.d/snmpd restart

The underlying assumption, of course, is that the instance was launched under a security group that exposes UDP ports 161 and 162 to the host that will be running Cacti. This can all be made to work without assigning Elastic IP addresses to the nodes and to the Cacti host, but it's easier with them.

For the Cacti host, more or less any modern Linux distribution (e.g., Ubuntu or CentOS) will do, and I'd recommend following Eric Hammond's very nice tutorial about setting up MySQL on an EBS volume before doing the Cacti install. For the same reasons that it makes sense to keep MySQL on the attached EBS volume (surviving instance termination, supporting backups, etc.), it makes sense to store RRDTool's backing data there as well.

Instrumentation and Collection

The Java application in question (SmartFox) has no explicit support for exporting metrics and no MBeans exposed for access via JMX, but it does provide some API-level support for basic information and an embedded servlet container (Jetty, of course). (SmartFox does bundle a Flash-based administrative tool, but like JConsole it's single-node and does not provide much in the way of retrospective data.)

After some poking around (i.e., reading PHP source code) in Cacti, I found that Cacti's standard "Script/Command" data input method consumes data as space-separated name/value pairs on a single line:

name1:value1 name2:value2 ...

So I put together a simple servlet to grab the server singleton object from the SmartFox API and print metrics out on a text/plain response. This could just as easily be done with an MBean instance looked up via the JVM's default JMX infrastructure or a metric facade injected into the servlet as part of the overall web application — the point is that the single line of name/value pairs is the required interface to Cacti.
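The servlet itself is specific to SmartFox, but the contract it satisfies is small enough to meet from almost anything. As a minimal sketch (not the servlet described above), the bash function below, whose name is hypothetical, formats its arguments into that single line; the /proc/loadavg data source is an illustrative assumption that only applies on Linux.

```shell
#!/bin/bash
# Hypothetical helper: print name/value arguments as one line of
# space-separated name:value pairs, the shape Cacti's "Script/Command"
# data input method reads. An unpaired trailing argument is ignored.
cacti_line() {
  local out=""
  while [ "$#" -ge 2 ]; do
    out="${out:+$out }$1:$2"   # add a separating space after the first pair
    shift 2
  done
  printf '%s\n' "$out"
}

# Illustrative data source (an assumption, not SmartFox): on Linux, the first
# three fields of /proc/loadavg are the 1-, 5-, and 15-minute load averages.
if [ -r /proc/loadavg ]; then
  read -r load1 load5 load15 _ < /proc/loadavg
  cacti_line load1 "$load1" load5 "$load5" load15 "$load15"
fi
```

The output fields defined for the data input method would then be named load1, load5, and load15 to match.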

The data is then accessed via a curl invocation templated for variables:

curl -s "http://<host>:<port>/<webapp>/sfs-status?zone=<zone>"

The fields in angle brackets are input fields that will be filled in by other objects in Cacti, and the output fields for the data input method should be named to match the names in the name/value pairs from above.

The downside of this approach is that there is quite a bit of configuration that goes on top of this one-liner (graphs instantiate graph templates and pull from data sources that reference data templates that in turn reference data input methods, or something like that), but it more or less just works. (Even so, it is less painful and more forgiving than some other tools I've worked with, e.g., Zenoss.) A couple of hours of experimentation should be enough to get a decent set of basic graphs customized for the application at hand.

[an RRD graph]

As with SNMP, the EC2 security groups for the instances need to be set up so that this data is not generally accessible but can still be reached by the Cacti host.

Tips and Tricks

The only real issues that I encountered in the process were some disconnects between what Cacti allows you to enter and what RRDTool accepts as input. If your graphs either don't appear or disappear after the initial setup or a later tweak, there's a good chance that RRDTool doesn't like what Cacti is asking it to do. In that case, turn on the "graph debug" option to see what Cacti is sending to RRDTool and adjust your configuration accordingly.
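A related check, upstream of RRDTool, is to confirm by hand that the status URL really returns one line of name:value pairs. Below is a minimal bash sketch; the rules it enforces (names made of letters, digits, and underscores; plain decimal values) are my assumption rather than Cacti's documented grammar, and the function name, host, port, webapp, and zone in the example are placeholders.

```shell
#!/bin/bash
# Hypothetical checker: succeed only if the argument is a single line of
# space-separated name:value pairs with (optionally negative) decimal values.
is_cacti_line() {
  local line="$1" pair
  local re='^[A-Za-z0-9_]+:-?[0-9]+(\.[0-9]+)?$'
  local -a pairs
  case "$line" in
    *$'\n'*) return 1 ;;              # more than one line
  esac
  read -r -a pairs <<< "$line"        # split on whitespace without globbing
  [ "${#pairs[@]}" -gt 0 ] || return 1  # empty or all-blank input
  for pair in "${pairs[@]}"; do
    [[ $pair =~ $re ]] || return 1
  done
  return 0
}

# Example against one node (all URL parts are placeholders):
#   is_cacti_line "$(curl -s 'http://node1:8080/myapp/sfs-status?zone=lobby')" \
#     && echo 'looks OK' || echo 'unexpected format'
```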


Product Management for the Busy Entrepreneur

Paul R. Brown @ 2009-03-16T18:59:23Z

I was talking with a budding entrepreneur in the open source "big data" space (with my Entreprementor hat on) about his sales pipeline and potential customers. He had a list of a couple dozen companies that had expressed some interest, and that much was great. Just the same, I asked about the elephant in the room: Interest in what? Interest in your wonderful open source doohickey might get you Internet Famous like the Star Wars Kid, but it isn't likely to pay your bills, let alone be something to build a company on.

This brings me to the subject of product management for the busy entrepreneur.

Product Versus Business

It's easy to get confused about the difference between a product and a business. A business is a machine that turns something you have into money. (One that produces more money than it consumes is a good business...) A product is a describable, sellable thing that your business can produce over and over and customers can consume over and over without too many changes. Products can be broad, like professional services, or very specific, like machine parts, but the aspects of commonality in description and delivery need to be there.

The point of a product is that the commonalities enable scale in a business's internal processes, from production to sales to accounting. That commonality is also what makes a business an investment prospect, because you can make reasonable inferences about capital in versus capital out (ergo value). Defining a product involves a bit of intuition and guesswork, but refining that definition is simple: Ask potential customers if they would spend money on it. I've never understood why some entrepreneurs are reluctant to get on the phone or hit the street with an idea (maybe they'd rather not see it trashed by reality), but the customer's money is the one source of Truth. If it's difficult to explain, if it's not something that the customer's business can readily consume, or if the customer doesn't "get" it, then it needs to change.

Rows and Columns

Once you have a panel of potential customers assembled, it's time to sit down with a spreadsheet and figure things out. Customers go down column "A", and potential products go across row 1. Put an "x" and maybe a note in a cell if that customer would pay for that product, and then look for the column with the most x's. Alternatively, you can use the price that the customer would pay as the value for the cell in the column and try to make a more refined decision based on the profitability of the offerings, but the idea is the same — Get real data on what customers want.
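If the spreadsheet is exported to CSV, the tally can be scripted. Here is a minimal bash/awk sketch, assuming row 1 holds product names, column A holds customer names, cells contain no quoted commas, and a cell counts when it starts with an x (a note may follow); the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read the survey spreadsheet as CSV on stdin and print
# "product<TAB>count" for each product column, most x's first.
tally_products() {
  awk -F',' '
    BEGIN { OFS = "\t" }
    # Row 1: remember product names from column B onward.
    NR == 1 { for (c = 2; c <= NF; c++) name[c] = $c; ncol = NF; next }
    # Customer rows: count cells that start with an "x" (case-insensitive).
    { for (c = 2; c <= NF; c++) if (tolower($c) ~ /^ *x/) count[c]++ }
    END { for (c = 2; c <= ncol; c++) print name[c], count[c] + 0 }
  ' | sort -t "$(printf '\t')" -k2,2nr
}
```

Switching to the price-per-cell variant would mean summing the numeric cell values instead of counting x's.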

There's an aspect to survey design that's important when interviewing a potential customer. To get a real response, ask specific questions and set the expectation with that potential customer that you'll very likely be back to get a check from them. You should expect to iterate on the process a few times, as customers may help you add columns to the spreadsheet, but you should avoid open-ended questions.

Real product management is quite a bit more involved and detailed, and it becomes just as necessary as your product and customer base grow and evolve. Nonetheless, this should be enough to get started.

Basic product management is one of those times where stating the obvious is useful: Data is helpful in making decisions.


Up to What I Am

Paul R. Brown @ 2009-02-25T06:47:16Z

After a short stint at Amazon.com in 2005, I took some time off, where "time off" means not being an entrepreneur or having open-ended commitments (other than at home). I had a lot of fun getting back to basics as a software developer building applications with some great customers (learning some new industries in the process), working closely with a few entrepreneurs, and being a hands-on Dad for my kids.

At the start of 2009, with kid #2 up and crawling, kid #1 finally sleeping decently (most of the time, knock wood), and a fresh calendar year ahead, I sat down to think about what to do next. I made a list of things that I think are interesting and things that I do well and/or enjoy. Next, I brainstormed and ranked concepts for businesses with rank roughly defined by the combination of my level of interest, the impact of the idea, and the value both to and from my network and experience in making it a success.

All of the business concepts were subject to the following constraints:

  • Minimal startup costs. Getting going should take no more than an LLC (~$200), some assorted licenses (<$100), a domain (~$10), Google Apps for email, and a modicum of non-free infrastructure if required (no more than $50/month total: a private repository on GitHub, maybe CRM/SFA like PipelineDeals, etc.).
  • Short path to gauge interest and engage customers. Typical customers, partners, and advisors/boosters should all be present within my personal network and, if not, be easily identified, enumerated, and contacted.
  • Strong collaborators. The reinforcement and feedback that come from working closely with (and occasionally butting heads with) collaborators are important.
  • Head start or unique angle. The business should have a built-in competitive advantage in the form of knowledge, relationships, or intellectual property.

I ended up with a few dozen things and about half that many business concepts. Some obvious things made the list, e.g., open source, simpler and lighter middleware, big data, mathematics (including statistics and probability), functional languages, visualization, consuming less, and being data-driven in everyday life. Some less obvious things made the list, too, e.g., teaching/mentoring, generative music, ultra-local agriculture (in your yard or even home), and scholarly communications. I intend to revisit the list of things and businesses as the year unfolds and my perspective evolves.

The first two businesses that percolated to the top of the list were FasterXML and Fremont Analytics. Both are now their own LLCs and spinning up. I am interested in the combination of open source, middleware, and big data in the mold of Cassandra, CouchDB, and Voldemort, but I'm not ready to place a bet there just yet.


A Little Processing

Paul R. Brown @ 2009-02-24T09:40:30Z

I wanted a visual representation of the usual continued fraction expansion of the irrational number e, one that used concentric rings to represent the successive anapests in the tail of the expansion. With grouping added for emphasis, the expansion is:

2, 1,2, 1,1,4, 1,1,6, ...

The first ring would have four sectors; the second would have six, the third eight, etc. Something like this:

The question was how to get it drawn, and after a little thought, I settled on writing a Processing program to generate the image. The language doesn't include a primitive for drawing a sector of a ring, but it's possible to represent one as a larger filled arc covered by a smaller filled arc that spans the same angle but is filled with the background color:

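// arc() takes the width and height of the bounding ellipse (diameters),
// so the outer radius r1 and inner radius r0 are doubled; angles are in radians.
fill(foreground);
arc(x0, y0, 2*r1, 2*r1, theta0, theta1);  // wedge out to radius r1
fill(background);
arc(x0, y0, 2*r0, 2*r0, theta0, theta1);  // cover it back to radius r0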

Or, in my case, just stacking up pie charts with the smallest on top is sufficient. The source is here.
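To double-check the numbers the rings encode, the terms themselves are easy to generate. The following bash sketch is not the Processing source mentioned above, and its function names are hypothetical. It prints the first n partial quotients of e, the sector count for ring k, and, as a sanity check, the convergent p/q built from the first n terms with the standard recurrence.

```shell
#!/bin/bash
# Print the first n partial quotients of the simple continued fraction of e:
# a_0 = 2; for i >= 1, a_i = 2(i+1)/3 when i mod 3 == 2, and 1 otherwise.
e_cf_terms() {
  local n="$1" i
  for ((i = 0; i < n; i++)); do
    if ((i == 0)); then
      printf '2'
    elif ((i % 3 == 2)); then
      printf ' %d' $((2 * (i + 1) / 3))
    else
      printf ' 1'
    fi
  done
  printf '\n'
}

# Ring k (k = 1, 2, ...) draws the anapest 1,1,2(k+1): 4, 6, 8, ... sectors.
ring_sectors() {
  echo $((2 * ($1 + 1)))
}

# Fold the first n terms into a convergent p/q using
# p_k = a_k*p_{k-1} + p_{k-2} and q_k = a_k*q_{k-1} + q_{k-2},
# starting from p_{-1} = 1, p_{-2} = 0, q_{-1} = 0, q_{-2} = 1.
e_convergent() {
  local p=1 pp=0 q=0 qq=1 a t
  for a in $(e_cf_terms "$1"); do
    t=$((a * p + pp)); pp=$p; p=$t
    t=$((a * q + qq)); qq=$q; q=$t
  done
  echo "$p/$q"
}
```

For example, e_cf_terms 9 prints 2 1 2 1 1 4 1 1 6, ring_sectors 1 prints 4, and e_convergent 9 prints 1264/465, which differs from e by only about 0.0000023.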


What Chess Club Should Teach You About Pitching

Paul R. Brown @ 2009-02-02T20:27:09Z

Cross-posted on nPost.

The fundamental goal of selling is to clearly communicate to your audience, and this applies equally to pitching a company to investors or pitching a product to potential customers. It's obvious that confusing the audience conflicts with that fundamental goal, but it can be surprisingly difficult to avoid.

Your goal should be to provide your audience with exactly the information they need to make a decision, no more and no less. It's easy enough to fix providing too little information — just provide more. Providing too much information, probably confusing your audience in the process, is much more difficult to cure, and this is where a lesson from chess club comes in.

Keeping the feedback loop tight is the best way to throttle the flow of information, and using a chess clock (imagined or actual) is one way to do that. When you start talking, think about tapping the clock to start your turn, or if you're on the phone or otherwise not in front of the customer, feel free to use a stopwatch or countdown timer. Twenty to thirty seconds is a reasonable benchmark: enough time to communicate an idea but not so much that your audience starts reaching for their BlackBerrys while you ramble. If you've got other people in on the pitch with you, it's up to you to set the ground rules so that everyone is bound by the clock.

