process_isolation
process_isolation is a simple and elegant python module that lets you run python modules in child processes, yet interact with them as if they were ordinary python modules.
Installation
process_isolation is implemented in pure python, so installation is simple:
$ pip install process_isolation
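As a quick check that the installation worked (this assumes pip placed the package on your python path):
$ python -c "import process_isolation"
If this exits silently, the module is ready to use.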
Quickstart
Let's start with the hello world of process isolation:
from process_isolation import import_isolated
sys = import_isolated('sys')
sys.stdout.write('Hello world\n')
A few things happened here:
1. We imported the process_isolation module.
2. A child process was forked off from the main python process, and the sys module was imported into that process.
3. The main python process requested that the child process run sys.stdout.write('Hello world\n').
4. The child process wrote Hello world to standard output.
One reason to run code in an isolated process is to debug code that might crash at the C level, for example with a segmentation fault, rather than raising an ordinary python exception. Here is some dangerous code:
# buggy.py:
import types

def dragons_here():
    # Build and call a code object whose bytecode loads from an empty
    # constants tuple, crashing the interpreter at the C level
    types.FunctionType(types.CodeType(0, 0, 1, 0, 'd\x00\x00S', (), (), (), '', '', 1, ''), {})()
Running this code causes a hard abort (not a regular python exception), which makes it difficult to debug:
>>> import buggy
>>> buggy.dragons_here()
Segmentation fault: 11
However, inside an isolated process we can safely run this code without our entire python interpreter crashing:
from process_isolation import import_isolated, ProcessTerminationError
buggy = import_isolated('buggy')
try:
    buggy.dragons_here()
except ProcessTerminationError as ex:
    print 'There be dragons!'
Using process isolation
process_isolation tries to be invisible whenever possible. In many cases it is possible to simply replace import X with X = import_isolated('X') and leave all other code unchanged. Internally, process_isolation shuttles data back and forth between the main python interpreter and the forked child process. When you call a function from an isolated module, that function runs in the isolated child process.
os = import_isolated('os')
os.rmdir('/tmp/foo') # this function will run in the isolated child process
The same is true when you instantiate a class -- after all, a constructor is really just a special kind of function.
collections = import_isolated('collections')
my_dict = collections.OrderedDict()
This code creates an OrderedDict object residing in the isolated process. To make sure the isolated process really is isolated, the OrderedDict will stay in the child process forever. my_dict is actually a proxy object that shuttles member calls back and forth to the child process. For all intents and purposes, you can treat my_dict just like a real OrderedDict:
my_dict['abc'] = 123
print my_dict['abc']
print my_dict.viewvalues()
for key, value in my_dict.iteritems():
    print key, value
try:
    x = my_dict['xyz']
except KeyError:
    print 'The dictionary does not contain xyz'
Under the hood, each of these calls involves some shuttling of data back and forth between the child process and the main python interpreter. If anything were to crash along the way, you would get a ProcessTerminationError instead of a hard crash, but other than that, everything should work exactly as if there were no process isolation involved.
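For example, here is a minimal sketch of that behaviour, using os.abort() (which terminates the process with SIGABRT at the C level rather than raising a python exception) to deliberately kill the child:
from process_isolation import import_isolated, ProcessTerminationError
os = import_isolated('os')
try:
    os.abort()  # hard-kills the isolated child process
except ProcessTerminationError:
    print 'The child process died, but this interpreter is still running'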
Copying objects between processes
Sometimes this proxying behaviour can be inconvenient or inefficient. To get a copy of the real object behind the proxy, use byvalue:
from process_isolation import import_isolated, byvalue
collections = import_isolated('collections')
proxy_to_a_dict = collections.OrderedDict({'fred':11, 'tom':12})
the_real_deal = byvalue(collections.OrderedDict({'fred':11, 'tom':12}))
print type(proxy_to_a_dict)
print type(the_real_deal)
This will print:
process_isolation.ObjectProxy
collections.OrderedDict
byvalue copies an object from the child process to the main python interpreter, with the usual semantics of deep copies. Any references to the original object will continue to refer to the original object. If the original object is changed, those changes will not show up in the copy residing in the main python interpreter, and vice versa.
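Here is a minimal sketch of these copy semantics, reusing the collections proxy from the example above:
proxy = collections.OrderedDict({'fred': 11})
snapshot = byvalue(proxy)  # deep copy now lives in the main interpreter
proxy['tom'] = 12          # modifies the original object in the child process
print 'tom' in snapshot    # prints False: the copy does not see the change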
Note that all calls to members of the_real_deal will now execute in the main python interpreter, so if one of those members causes a segfault then the main python interpreter will crash, just as if you had run the whole thing without involving process_isolation at all.
Why process isolation?
Dealing with misbehaving C modules
We originally built process_isolation to help benchmark a computer vision library written in C. We had built a python interface to the underlying C library using boost python, and we wanted to use python to manage the datasets, accumulate success rates, generate reports, and so on. During development it was not uncommon for our C library to crash from time to time, but instead of getting an empty report whenever any one test case caused a crash, we wanted to record exactly which inputs caused the crash and then continue to run the remaining tests. We built process_isolation and used it to run all the computer vision code in an isolated process, which allowed us to give detailed error reports when something went wrong at the C level, and to continue running the remaining tests afterwards.
We were also running our computer vision code interactively from ipython. However, importing the computer vision module directly meant that a crash at the C level would destroy the entire ipython session and all the working variables along with it. Anyone who has done interactive experiments with numerical software will appreciate the frustration of losing an hour of carefully constructed matrices just before the command that would have completed whatever experiment was being run. Using process_isolation from ipython avoided this possibility in a very robust way. In the worst case, a command would raise a ProcessTerminationError, but all the variables and other session state would remain intact.
Running untrusted code
Although there are many ways of running untrusted code in python, the most secure is to use a restricted environment enforced by the operating system. process_isolation is ideal for running such code in a subprocess. Here is how to create a chroot jail.
First we create an "untrusted" module to experiment with.
# untrusted.py: untrusted code lives here
import os
def ls_root():
    return os.listdir('/')
Next we set up the chroot jail. Note that this code must be run with superuser privileges, because the chroot system call requires them.
# run_untrusted_code.py
import os
import process_isolation
# Start a subprocess but do not import the untrusted module until we've installed the chroot jail
context = process_isolation.default_context()
context.ensure_started()
# Create a directory in which to jail the untrusted module
os.mkdir('/tmp/chroot_jail')
# Create a file inside the chroot so that we can recognize the jail when we see it
with open('/tmp/chroot_jail/you_are_in_jail_muahaha','w'):
    pass
try:
    # Install the chroot
    context.client.call(os.chroot, '/tmp/chroot_jail')
except OSError:
    print 'This script must be run with superuser privileges'
# Now we can safely import and run the untrusted module
untrusted = context.load_module('untrusted', path=['.'])
print untrusted.ls_root()
# Clean up
os.remove('/tmp/chroot_jail/you_are_in_jail_muahaha')
os.rmdir('/tmp/chroot_jail')
$ sudo python run_untrusted_code.py
['you_are_in_jail_muahaha']