Herein is part six of my hobby project to rewrite my personal
publishing software in Haskell. In part
five (and its addendum),
I roughed out a persistence and concurrency model for the back-end.
The next two pieces are rendering content (which will be done
programmatically using the Text.XHtml.Strict
module; that's a separate post) and integrating with a web server via
FastCGI. This post covers
FastCGI integration for Lighttpd and Apache2 in the form of
smoke-testing a simple FastCGI handler.
Units of Concurrency
For the concurrency model that I plan to use in the actual application, a single OS process is critically important, because separate processes would have no way of knowing what the others were doing. Multiple active threads within that one process are fine. Most web-based systems use a single process as a concurrency pinch point, but that process is usually the database rather than the web application.
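To make the single-process point concrete, here is a minimal sketch (an illustration only, not the part-five model itself) in which many forkIO threads update one shared MVar. The name sharedCount and the thread and iteration counts are hypothetical. A second OS process would get its own copy of the counter and never see these updates.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (modifyMVar_, newEmptyMVar, newMVar, putMVar, readMVar, takeMVar)
import Control.Monad (forM, replicateM_)

-- | Fork @n@ lightweight threads in this one process; each bumps a shared
-- counter @k@ times. Wait for all of them and return the final count.
sharedCount :: Int -> Int -> IO Int
sharedCount n k = do
  counter <- newMVar (0 :: Int)
  dones <- forM [1 .. n] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      -- modifyMVar_ holds the MVar while updating, so increments from
      -- different threads never overwrite each other.
      replicateM_ k (modifyMVar_ counter (\c -> return $! c + 1))
      putMVar done ()
    return done
  mapM_ takeMVar dones -- block until every worker has signalled
  readMVar counter

main :: IO ()
main = sharedCount 20 1000 >>= print -- prints 20000
```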
Haskell in the form of GHC provides two flavors of concurrency,
which I'll refer to as forkIO and forkOS
(after the functions forkIO
and forkOS,
respectively). The forkIO flavor uses Haskell's
internally managed, lightweight threads, and the forkOS
flavor uses threads from the underlying operating system. (For some
perspective on what an OS thread really means, take a look at my post
on SMP Erlang on Mac OS.) The FastCGI binding library lets the caller
supply forkIO, forkOS, or some other function to assign a worker
thread to each request, and I want to compare the two fork flavors for
performance and stability.
It's worth reading the fine print in the Control.Concurrent
documentation. For the present application, every thread does make
foreign calls as part of handling a FastCGI request and every request
is likely to complete in less than the Haskell scheduler's default
granularity of 20ms. I'm less interested in performance and more
interested in looking for leaks, deadlocks, or other bad behavior.
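The key distinction in that fine print is that forkOS creates a bound thread (tied to one OS thread, which matters for foreign calls that rely on thread-local state) while forkIO does not. A small sketch using isCurrentThreadBound and rtsSupportsBoundThreads from Control.Concurrent shows the difference; boundIn is a hypothetical helper name. It assumes the program is linked with ghc -threaded, since forkOS fails at run time otherwise, and the code checks for that.

```haskell
import Control.Concurrent

-- | Run a probe in a thread created by the given fork function and report
-- whether that thread is bound to its own OS thread.
boundIn :: (IO () -> IO ThreadId) -> IO Bool
boundIn fork = do
  result <- newEmptyMVar
  _ <- fork (isCurrentThreadBound >>= putMVar result)
  takeMVar result

main :: IO ()
main = do
  putStrLn ("threaded RTS: " ++ show rtsSupportsBoundThreads)
  ioBound <- boundIn forkIO
  putStrLn ("forkIO thread bound: " ++ show ioBound)
  -- forkOS raises an error unless the program was linked with -threaded.
  if rtsSupportsBoundThreads
    then boundIn forkOS >>= \b -> putStrLn ("forkOS thread bound: " ++ show b)
    else putStrLn "forkOS unavailable: link with ghc -threaded"
```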
Building the Right Network.FastCGI
Both the 1.0
and 3000.0.0
versions of the Network.FastCGI bindings
appear in the Hackage
directory, but the darcs head version (3001.0.0 as
of this posting) is the one with the multi-threaded bindings exposed.
(For the uninitiated, darcs is a
distributed source code management system implemented in Haskell. The
darcs codebase is in literate Haskell, so it's an interesting read for
that if nothing else.) Darcs is available from the usual package
managers; I use MacPorts on the
Mac.
Get the latest Network.FastCGI from the darcs repository:
darcs get http://darcs.haskell.org/fastcgi/
The fastcgi.cabal file documents the dependencies, but a GHC 6.6.1 install is sufficient. Then build and install it the usual Cabal way:
cd fastcgi
runghc Setup.hs configure
runghc Setup.hs build
sudo runghc Setup.hs install
One more step is needed to register the package with the compiler:
sudo runghc Setup.hs register
And then to make sure that it worked:
$ ghc-pkg list
/opt/local/lib/ghc-6.6.1/package.conf:
Cabal-1.1.6.2, GLUT-2.1.1, HGL-3.1.1, HUnit-1.1.1, OpenGL-2.2.1,
QuickCheck-1.0.1, X11-1.2.1, base-2.1.1, cgi-3001.1.1,
fastcgi-3000.0.0, fastcgi-3001.0.0, fgl-5.4.1, filepath-1.0,
(ghc-6.6.1), haskell-src-1.0.1, haskell98-1.0, html-1.0.1,
mtl-1.0.1, network-2.0.1, parsec-2.0, readline-1.0,
regex-base-0.72, regex-compat-0.71, regex-posix-0.71, rts-1.0,
stm-2.0, template-haskell-2.1, time-1.1.1, unix-2.1, xhtml-3000.0.2
It takes a bit more doing to get it going with the GHC tip (to-be
6.8) because the Data.ByteString modules have been
promoted into core GHC packages and rearranged a bit, but no
meaningful code changes beyond some of the import statements and the
fastcgi.cabal file are required. (I've sent a patch to the
maintainer.)
A Simple FastCGI Handler for Process/Thread Information
The following short Haskell program (test_IO.hs) sends back a plain text response with process and thread information:
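import Control.Concurrent
import System.Posix.Process (getProcessID)
import Network.FastCGI

-- Reply with the OS process ID and the Haskell thread ID serving the request.
test :: CGI CGIResult
test = do
  setHeader "Content-type" "text/plain"
  pid <- liftIO getProcessID
  threadId <- liftIO myThreadId
  -- show threadId yields "ThreadId 4"; keep only the number.
  let tid = concat $ drop 1 $ words $ show threadId
  output $ unlines [ "Process ID: " ++ show pid
                   , "Thread ID: " ++ tid ]

-- Fork each request with forkIO, allowing at most 10 concurrent threads.
main :: IO ()
main = runFastCGIConcurrent' forkIO 10 test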
(This is an adaptation of the printinput.hs
example, modified to use the multi-threaded API.) To build it:
$ ghc -threaded -package fastcgi --make -o test_IO.fcgi test_IO.hs
[1 of 1] Compiling Main             ( test_IO.hs, test_IO.o )
Linking test_IO.fcgi ...
The equivalent program (test_OS.hs) with
forkOS in place of forkIO does the job for
OS threads.
I can use these two FastCGI handlers with different combinations of
compiler version, web server, and FastCGI module and see
how things do under some simulated loads. The only gotcha with this
approach is that some HTTP benchmarking tools use response byte counts
as an assertion of a correct response, and they will complain as the
thread ID goes from one digit to two to three, etc. My current
favorite is Jef Poskanzer's simple http_load
with a tiny tweak to show
the response code if a byte count comes out off. Using a different
tool, e.g., ab
or httperf,
produces similar results.
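One way around the byte-count gotcha would be to make every response the same length. The sketch below, with hypothetical names responseBody and padTid, rebuilds the body that the test handler above produces (using a plain Int in place of the handler's ProcessID), shows its length growing by one byte each time the thread number gains a digit, and then zero-pads the number to a fixed width. The six-digit width is an assumption; longer IDs are simply left unpadded.

```haskell
-- | Same body shape as the test handler: process ID line, thread ID line.
responseBody :: Int -> String -> String
responseBody pid tid = unlines ["Process ID: " ++ show pid, "Thread ID: " ++ tid]

-- | Hypothetical tweak: left-pad the thread number with zeros to @width@
-- digits. replicate with a negative count yields "", so IDs already longer
-- than @width@ pass through unchanged.
padTid :: Int -> String -> String
padTid width tid = replicate (width - length tid) '0' ++ tid

main :: IO ()
main = do
  let tids = [9, 10, 100, 1000] :: [Int]
  print [length (responseBody 21139 (show t)) | t <- tids]            -- [31,32,33,34]
  print [length (responseBody 21139 (padTid 6 (show t))) | t <- tids] -- [36,36,36,36]
```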
The Web Servers: Apache2 and Lighttpd
There are probably other alternatives that I'm overlooking, but I'm going to try the two web servers that I'm familiar with, Lighttpd 1.4.15 and Apache 2.2.4, both on Mac OS X.
Configuring Lighttpd
A Lighttpd configuration file fragment for a FastCGI handler with a single process would be:
fastcgi.server = ( ".fcgi" =>
                   ( "localhost" =>
                     (
                       "socket" => "/tmp/test.sock",
                       "bin-path" => "/path/to/test_OS.fcgi",
                       "min-procs" => 1,
                       "max-procs" => 1
                     )
                   )
                 )
See the Lighttpd FastCGI documentation for the full rundown on parameters.
Also, at least as of Lighttpd 1.4.15, which is the version that MacPorts installed for me, the following configuration change is necessary to avoid a bug:
server.event-handler = "poll"
(The default value is freebsd-kqueue; see the Lighttpd
performance documentation.)
After copying the file into place, we can spin up Lighttpd and hit the URL:
$ lighttpd -f lighttpd.conf
$ curl http://localhost:8181/test.fcgi
Process ID: 21139
Thread ID: 4
$ curl http://localhost:8181/test.fcgi
Process ID: 21139
Thread ID: 5
The thread ID changes and the process ID doesn't, so things are good. For a bigger kick:
$ echo 'http://127.0.0.1:8181/test_OS.fcgi' > /tmp/lighttpd_OS
$ http_load -parallel 20 -fetches 1000 /tmp/lighttpd_OS 2>&1 | grep -v 8181
1000 fetches, 20 max parallel, 33908 bytes, in 0.375528 seconds
33.908 mean bytes/connection
2662.92 fetches/sec, 90294.2 bytes/sec
msecs/connect: 0.19518 mean, 1.036 max, 0.09 min
msecs/first-response: 7.26042 mean, 25.558 max, 4.31 min
996 bad byte counts
HTTP response codes:
  code 200 -- 1000
The 996 bad byte count errors are expected, since the responses for thread IDs 10 through 1005 have a different number of bytes than those for thread IDs 6, 7, 8, and 9. In any case, so far, so good:
$ curl http://localhost:8181/test_OS.fcgi
Process ID: 21139
Thread ID: 1006
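The 996 follows directly from the digit counts. Assuming, as the numbers suggest, that http_load compares each response's size against the first one it received (thread 6 here), a quick check over thread IDs 6 through 1005 reproduces the figure. bodyLength and badByteCounts are illustrative names, not part of http_load.

```haskell
-- | Length of a test-handler response body for a given process and thread ID.
bodyLength :: Int -> Int -> Int
bodyLength pid tid =
  length (unlines ["Process ID: " ++ show pid, "Thread ID: " ++ show tid])

-- | Count responses whose length differs from the first one (the assumed
-- reference size for "bad byte counts").
badByteCounts :: [Int] -> Int
badByteCounts [] = 0
badByteCounts (first : rest) = length (filter (/= first) rest)

main :: IO ()
main = print (badByteCounts (map (bodyLength 21139) [6 .. 1005])) -- prints 996
```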
Configuring Apache2 with mod_fastcgi
The single-process configuration file fragment for Apache2 with mod_fastcgi is:
LoadModule fastcgi_module modules/mod_fastcgi.so
FastCgiConfig -maxClassProcesses 1 -processSlack 1
<Location /fastcgi>
SetHandler fastcgi-script
Options ExecCGI
allow from all
</Location>
This configuration passes the basic smoke test with no issues.
Under load, the forkIO version burns about half the CPU
and the same amount of memory as the forkOS version.
Both versions use three OS threads most of the time, and, as expected
from the comments above about how the Haskell scheduler works,
the forkOS version never uses more than four
OS threads no matter how hard the server is hit.
Configuring Apache2 with mod_fcgid
The configuration fragment for Apache2 with mod_fcgid is:
LoadModule fcgid_module modules/mod_fcgid.so
MaxProcessCount 1
<Location /fcgid>
SetHandler fcgid-script
Options ExecCGI
allow from all
</Location>
With the same smoke testing approach as above (redirecting stderr and filtering with grep to silence the byte count complaints):
$ echo 'http://127.0.0.1:7007/fcgid/test_OS.fcgi' > /tmp/fcgid_OS
$ curl http://127.0.0.1:7007/fcgid/test_OS.fcgi
Process ID: 16854
Thread ID: 4
$ http_load -parallel 20 -fetches 1000 /tmp/fcgid_OS 2>&1 | grep -v fcgid
1000 fetches, 20 max parallel, 34704 bytes, in 1.2484 seconds
34.704 mean bytes/connection
801.028 fetches/sec, 27798.9 bytes/sec
msecs/connect: 0.294162 mean, 2.339 max, 0.051 min
msecs/first-response: 12.8977 mean, 1009.92 max, 2.758 min
986 bad byte counts
HTTP response codes:
  code 200 -- 998
  code 500 -- 2
$ curl http://127.0.0.1:7007/fcgid/test_OS.fcgi
Process ID: 16869
Thread ID: 7
Fail, since 16854 /= 16869 (and two requests got 500 responses). Given
mod_fcgid's stated goal of keeping FastCGI handlers
"fresh" by killing them at the first sign of an issue, that's not too
surprising.
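The pass/fail check above is just a comparison of process IDs across responses. A hypothetical helper like the one below (processIdOf and restarted are illustrative names, not part of any library) could automate it when a smoke test collects many response bodies; bodies without a parsable "Process ID:" line are skipped.

```haskell
import Data.Char (isDigit)
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

-- | Pull the process ID out of a body like "Process ID: 16854\nThread ID: 4\n".
processIdOf :: String -> Maybe Int
processIdOf body =
  case mapMaybe (stripPrefix "Process ID: ") (lines body) of
    (digits : _) | not (null digits) && all isDigit digits -> Just (read digits)
    _ -> Nothing

-- | True when two consecutive responses came from different processes,
-- i.e. the FastCGI handler was restarted or replaced between them.
restarted :: [String] -> Bool
restarted bodies = or (zipWith (/=) pids (drop 1 pids))
  where
    pids = mapMaybe processIdOf bodies

main :: IO ()
main = do
  let before = "Process ID: 16854\nThread ID: 4\n"
      after = "Process ID: 16869\nThread ID: 7\n"
  print (processIdOf before)        -- Just 16854
  print (restarted [before, after]) -- True
```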
Aggregated Results and Additional Observations
The results in these tables were generated using
http_load. For the "6000/min" test:
$ http_load -rate 100 -seconds 60 url_file 2>&1 | grep -v port
For the "60000/min" test:
$ http_load -rate 1000 -seconds 60 url_file 2>&1 | grep -v port
For the fixed rate tests, the number of nines is determined by the proportion of 200 responses out of the total number of responses (the others being 500 and 503). For the requests per second mark:
$ http_load -parallel 20 -seconds 10 url_file 2>&1 | grep -v port
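Assuming the usual convention that k nines means at least 1 - 10^-k of the responses succeeded (so 99.8% is two nines and 99.9% is three), the count can be computed with integer arithmetic to avoid floating-point rounding at the boundaries. ninesOf is an illustrative name; Nothing stands for the "all good" entries in the tables.

```haskell
-- | Number of nines in the success ratio ok/total, or Nothing when every
-- request succeeded.
ninesOf :: Integer -> Integer -> Maybe Int
ninesOf ok total
  | total <= 0 || ok < 0 || ok > total = error "ninesOf: bad counts"
  | failures == 0 = Nothing
  | otherwise = Just (length (takeWhile fits [1 :: Int ..]))
  where
    failures = total - ok
    -- k nines holds when failures / total <= 10^-k, i.e.
    -- failures * 10^k <= total; this stops once 10^k outgrows total.
    fits k = failures * 10 ^ k <= total

main :: IO ()
main =
  -- Prints Nothing, Just 3, Just 2, Just 2 on separate lines.
  mapM_ (print . uncurry ninesOf) [(6000, 6000), (5994, 6000), (5993, 6000), (998, 1000)]
```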
First, with the current GHC version:
GHC 6.6.1, 4-core G5:

| Server | FastCGI Support | forkIO | forkOS |
|---|---|---|---|
| Lighttpd 1.4.15 | built-in | JUST OK: 6000/min all good; 60000/min incomplete; max ~3000 req/sec | JUST OK: 6000/min all good; 60000/min incomplete; max ~2200 req/sec |
| Apache 2.2.4 | mod_fastcgi | GOOD: 6000/min all good; 60000/min three 9's; max ~2700 req/sec | BEST: 6000/min all good; 60000/min four 9's; max ~2100 req/sec |
| Apache 2.2.4 | mod_fcgid | FAIL: process not stable | FAIL: process not stable |
None of these really cause the machine to break a sweat: the web server does most of the work, and the FastCGI handler never consumes more than 60% of a core and a couple of megabytes of resident memory. An overnight run showed the mod_fastcgi and forkOS combination performing flawlessly under moderate load for over 10^8 requests.
With the latest GHC release candidate used to compile both the FastCGI package and the handlers:
GHC 6.9.20070918 (darcs tip), 4-core G5:

| Server | FastCGI Support | forkIO | forkOS |
|---|---|---|---|
| Lighttpd 1.4.15 | built-in | GOOD: 6000/min all good; 60000/min three 9's; ~3300 req/sec | GOOD: 6000/min all good; 60000/min three 9's; ~2200 req/sec |
| Apache 2.2.4 | mod_fastcgi | JUST OK: 6000/min three 9's; 60000/min three 9's; ~2500 req/sec | JUST OK: 6000/min three 9's; 60000/min four 9's; ~1900 req/sec |
| Apache 2.2.4 | mod_fcgid | FAIL: process not stable | FAIL: process not stable |
Looks like GHC 6.6.1 and Apache2/mod_fastcgi is the winning combination.
Addendum
I got GHC 6.6.1 installed and configured the forkIO
and forkOS handlers on the User Mode Linux
server where I have this blog hosted, and it looks like
forkIO is a winner there, with process stability and
around 100 requests per second sustained throughput. With the
forkOS variant, the process IDs do tick up with each hit,
but that's a property of fork() on that kernel, where one
process corresponds to one thread, rather than the result of a
restarted FastCGI handler.