I’ve been hacking on WAL-E, a nice little Postgres backup system from Heroku which uses gevent for concurrency. Many of my changes relate to UNIX pipelines, and I’ve run into a subtle issue that affects not only gevent but also Eventlet (our coroutine library of choice at Smarkets).
Here’s a trivial example — an Eventletized version of an example in the Python manual:
    from eventlet.green.subprocess import Popen, PIPE

    fp = open('./input.file', 'r')   # Should be reasonably large
    tf = open('./output.file', 'w')

    # Equivalent of: sort < input.file | cat - > output.file
    p1 = Popen(['sort'], stdin=fp, stdout=PIPE)
    p2 = Popen(['cat', '-'], stdin=p1.stdout, stdout=tf)
    p1.stdout.close()  # Let sort receive SIGPIPE if cat exits first
    p1.wait()
    p2.wait()
You’ll get the following error:
    cat: -: Resource temporarily unavailable
    sort: write failed: standard output: Broken pipe
    sort: write error
The problem is that Eventlet doesn’t expect you to actually pipe data directly between separate processes.
Eventlet assumes that you’ll be using the p1.stdout file descriptor from within your Python process, and it helpfully marks it as non-blocking for you so methods like communicate won’t block. When you hand that file descriptor to cat, the flags are preserved, and cat isn’t happy when it tries to read from what it thinks is a blocking pipe and gets “Resource temporarily unavailable” (EAGAIN) instead.
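To see why the flag follows the descriptor into another process, here is a minimal sketch using only the standard library (Python 3 on a POSIX system, no Eventlet involved). The helper name is_nonblocking is my own, and os.dup stands in for the copy of the descriptor that a child process would inherit. O_NONBLOCK lives on the open file description rather than on the descriptor number, so every copy sees it.

```python
import errno
import fcntl
import os


def is_nonblocking(fd):
    """Return True if O_NONBLOCK is set on the open file description behind fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


read_end, write_end = os.pipe()
duplicate = os.dup(read_end)  # Stands in for the descriptor a child inherits

# Mark the original descriptor non-blocking, much as a green subprocess
# wrapper would do for p1.stdout.
flags = fcntl.fcntl(read_end, fcntl.F_GETFL)
fcntl.fcntl(read_end, fcntl.F_SETFL, flags | os.O_NONBLOCK)

# The duplicate reports the flag too: both share one open file description.
shared = is_nonblocking(duplicate)

# Reading from the empty pipe now fails with EAGAIN instead of blocking.
try:
    os.read(duplicate, 1)
    read_errno = None
except BlockingIOError as exc:
    read_errno = exc.errno

for fd in (read_end, write_end, duplicate):
    os.close(fd)
```

EAGAIN is the errno whose message is “Resource temporarily unavailable”, the same text cat prints above.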
Gevent doesn’t have a patched version of the subprocess library, but the pattern of patching stdin and stdout of Popen is repeated in a lot of gevent-using code, including within WAL-E itself.
I’m not sure if there’s an easy fix for this; the code can’t know whether you’ll be using the pipe yourself or passing it on to another process. In any case I’d rather be explicit about changing the options.
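Being explicit could look something like the sketch below. It is not a patch to Eventlet or gevent: set_blocking and sort_through_cat are hypothetical helpers of mine, and the code uses the plain standard-library subprocess module (Python 3, POSIX), setting O_NONBLOCK by hand to imitate what a green wrapper would do. The point is only the step that clears the flag before the pipe is handed to the next process.

```python
import fcntl
import os
import subprocess


def set_blocking(fd):
    """Clear O_NONBLOCK on fd so a child process sees ordinary blocking reads."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)


def sort_through_cat(data):
    """Run `sort | cat -` over data (bytes) and return what cat writes."""
    p1 = subprocess.Popen(['sort'], stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE)

    # Imitate a green subprocess wrapper marking the pipe non-blocking.
    fd = p1.stdout.fileno()
    fcntl.fcntl(fd, fcntl.F_SETFL,
                fcntl.fcntl(fd, fcntl.F_GETFL) | os.O_NONBLOCK)

    # Explicitly restore blocking mode before giving the pipe to cat.
    set_blocking(fd)

    p2 = subprocess.Popen(['cat', '-'], stdin=p1.stdout,
                          stdout=subprocess.PIPE)
    p1.stdout.close()  # Only cat should hold the read end now

    p1.stdin.write(data)
    p1.stdin.close()   # EOF lets sort emit its output
    output = p2.stdout.read()
    p1.wait()
    p2.wait()
    return output
```

On Python 3.5 and later, os.set_blocking(fd, True) does the same job as the set_blocking helper here.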