I’ve been hacking on WAL-E, a nice little Postgres backup system from Heroku which uses gevent for concurrency. Most of my changes relate to UNIX pipelines, and I’ve run into a subtle issue which affects not only gevent but also Eventlet (which is our coroutine library of choice at Smarkets).
Here’s a trivial example — an Eventletized version of an example in the Python manual:
You’ll get the following error:
cat: -: Resource temporarily unavailable
sort: write failed: standard output: Broken pipe
sort: write error
The problem is that you’re not expected to actually pipe data between separate processes.
Eventlet assumes that you’ll be using the p1.stdout file descriptor from within your Python process, and it helpfully marks it as non-blocking for you so that methods like communicate won’t block. When you hand that file descriptor to cat, the flags are preserved, and cat isn’t happy when it tries to read from what it thinks is a blocking pipe and gets EAGAIN.
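The flag inheritance can be reproduced without Eventlet at all. The following is a minimal sketch using only the standard library, assuming a UNIX-like system: it sets O_NONBLOCK on the read end of a pipe by hand (standing in for what the green subprocess module does) and passes it to a child as stdin. The helper set_nonblocking and the small child script are hypothetical stand-ins, with the child playing the role of cat.

```python
import fcntl
import os
import subprocess
import sys


def set_nonblocking(fd):
    """Set O_NONBLOCK on fd, as a green I/O library typically does for pipes."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)


# Hypothetical stand-in for cat: attempt one read from stdin and report the result.
CHILD_CODE = """
import os
try:
    os.read(0, 1)
    print("read ok")
except BlockingIOError:
    print("EAGAIN")
"""

read_fd, write_fd = os.pipe()
set_nonblocking(read_fd)

# The child receives read_fd as its stdin. The parent keeps write_fd open and
# writes nothing, so the pipe is empty but not at end-of-file.
result = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    stdin=read_fd,
    stdout=subprocess.PIPE,
)
child_report = result.stdout.decode().strip()

os.close(read_fd)
os.close(write_fd)
print(child_report)
```

O_NONBLOCK is a file status flag, so it lives on the open file description that parent and child share, not on the individual descriptor. That is why the child sees it even though it never asked for non-blocking I/O.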
Gevent doesn’t have a patched version of the subprocess library, but the pattern of patching stdin and stdout of Popen is repeated in a lot of gevent-using code, including within WAL-E itself.
I’m not sure if there’s an easy fix for this; the code can’t know whether you’ll be using the pipe yourself or passing it on to another process. In any case I’d rather be explicit about changing the options.
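One way of being explicit might look like the sketch below. It is not how Eventlet, gevent or WAL-E handle this; popen_reading_from is a hypothetical helper that clears O_NONBLOCK on the upstream pipe just before handing it to the next process. The example uses the standard subprocess module and flips the flag by hand to simulate what a green Popen would have done, and the producer and sorter commands are small Python one-liners standing in for real tools such as sort.

```python
import os
import subprocess
import sys


def popen_reading_from(args, upstream, **kwargs):
    """Hypothetical helper: restore blocking mode on upstream, then use it as stdin."""
    os.set_blocking(upstream.fileno(), True)
    return subprocess.Popen(args, stdin=upstream, **kwargs)


# Stand-ins for real pipeline stages: one prints two lines, the other sorts stdin.
producer = [sys.executable, "-c", "print('b'); print('a')"]
sorter = [sys.executable, "-c", "import sys; sys.stdout.writelines(sorted(sys.stdin))"]

p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
# Simulate a green Popen, which would have marked the pipe as non-blocking.
os.set_blocking(p1.stdout.fileno(), False)

p2 = popen_reading_from(sorter, p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # let p1 receive SIGPIPE if p2 exits early
output = p2.communicate()[0]
p1.wait()
```

Because the flag is shared with the parent's copy of the descriptor, this only makes sense when the Python process has no intention of reading from that pipe itself, which is exactly the case where the pipe is being passed along.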