Pickling multiprocessing Connection objects

For safe message-based communication between threads and processes in Python, I tend to use multiprocessing’s Queue and Pipe. A pattern often seen is using a queue for sending messages from multiple producers to a single consumer.

When a producer wants a response to its message, I create a Pipe and piggy-back one end of the Pipe (a Connection object) to the message. I use Python dicts as messages, and use the string “reply_to” as the dictionary key for the connection objects.

When the queue consumer processes a message, it doesn’t know who the sender is or how to reach him. Though, if the message has an attached Connection object, the consumer can–almost magically–respond to the sender, across thread and process boundaries.

All good? Nope.

Any message sent through the queues and pipes must be serializable, or picklable as we say in Pythonesque. The multiprocessing.Connection objects can be serialized, but not unserialized, which means that you will not see an exception when you create your message, but some time later in the consumer that tries to respond. The exception does not tell you much, unless you’ve seen it before:

Traceback (most recent call last):
  File "pipetest2.py", line 10, in <module>;
    print c1.recv();
TypeError: function takes at least 1 argument (0 given)

This has been a known bug in Python for two years. Googling the exception leads to StackOverflow question asking for a workaround.

I’ve usually added a version of the workaround to some util package in my projects; one function for pickling a connection, and one function for unpickling a connection. In my code I’ve been forced to manually pickle/unpickle Connection objects before putting them on a Queue or Pipe.

from multiprocessing.reduction import reduce_connection
import pickle

def pickle_connection(connection):
    return pickle.dumps(reduce_connection(connection))

def unpickle_connection(pickled_connection):
    (func, args) = pickle.loads(pickled_connection)
    return func(*args)

This works great most of the time, but not this time. In the Python actor model library Pykka I use Connection objects to implement futures for thread-based actors, similar to how I use gevent’s AsyncResult for gevent-based actors. When someone sets a value on the future, it is written to one end of a Pipe. When someone tries to read the future’s value, they block on the other end of the Pipe until there is something to get or a timeout is reached. The problem appeared when I tried to nest futures, which is likely to happen if an actor, in response to your message, returns a future result from another actor. I no longer have the opportunity to babysit every Connection object that goes into or comes out of another Connection. They need to be able to watch over themselves. As the Connection class is implemented in C and is rather closed to changes, my solution was to wrap the Connection objects:

import multiprocessing.reduction

class ConnectionWrapper(object):
    """
    Wrapper for :class:`multiprocessing.Connection` objects to make them
    picklable.
    """

    def __init__(self, connection):
        self._connection = connection

    def __reduce__(self):
        (conn_func, conn_args) = multiprocessing.reduction.reduce_connection(
            self._connection)
        wrapper_func = _ConnectionWrapperRebuilder(conn_func)
        return (wrapper_func, conn_args)

    def __getattr__(self, name):
        return getattr(self._connection, name)


class _ConnectionWrapperRebuilder(object):
    """
    Internal class used by :class:`ConnectionWrapper` to rewrap
    :class:`multiprocessing.Connection` objects when they are depickled.

    A function defined inside :meth:`ConnectionWrapper.__reduce__` which takes
    :attr:`conn_func` from its scope cannot be used, as functions must be
    defined at the module's top level to be picklable.
    """

    def __init__(self, inner_func):
        self._inner_func = inner_func

    def __call__(self, *args):
        connection = self._inner_func(*args)
        return _ConnectionWrapper(connection)

The ConnectionWrapper class simply implements __reduce__ on the wrapped Connection object using multiprocessing’s own reduce_connection function. To work like a real Connection object, it dispatches any attribute access to the wrapped connection by implementing __getattr__.

To make sure the connection remains wrapped even after a trip through pickle.dumps() and pickle.loads(), _ConnectionWrapperBuilder is used for rebuilding the connection and rewrapping it on deserialization.

Given this wrapper, you can make your own Pipe function which creates a new pipe and wraps the connection objects for you.

Hopefully this trick will be of help until the bug is fixed in Python.