Performance – Why is `pickle.dump`/`pickle.load` IPC so slow, and is there a fast substitute?

I am using Python subprocess for IPC. Now, let us assume that I have to use subprocess.Popen to spawn the other processes, so I cannot use multiprocessing.Pipe for communication. My first thought was to use their STDIO streams with pickle.load/pickle.dump (don't worry about security for now).

However, I noticed that the transfer rate is very bad: on the order of 750 KB/s on my machine! That is slower than communicating through multiprocessing.Pipe by a factor of 95, and as far as I know, Pipe also uses pickle. Using cPickle does not help either.

(Update: note, I realized this is only the case on Python 2! It works fine on Python 3.)

Why is it so slow? I suspect the reason is the way .dump/.load perform IO through Python file objects instead of C file descriptors. Maybe it has something to do with the GIL?

Is there any cross-platform way to get the same speed as multiprocessing.Pipe?

I have found that on Linux you can use _multiprocessing.Connection (or multiprocessing.connection.Connection on Python 3) to wrap the STDIO file descriptors of the child process and get exactly what I want. However, this is not possible on win32, and I don't know about Mac either.
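For reference, on POSIX systems the fd-wrapping trick looks roughly like this (a sketch on Python 3, using a plain os.pipe pair in one process instead of a child's STDIO; Connection accepts a raw file descriptor there):

```python
import os
from multiprocessing.connection import Connection

# Create a raw pipe and wrap each end in a Connection object, which
# length-frames and pickles messages the same way multiprocessing.Pipe does.
r_fd, w_fd = os.pipe()
reader = Connection(r_fd, writable=False)
writer = Connection(w_fd, readable=False)

writer.send({"amount": 10000})  # pickled + framed by Connection
msg = reader.recv()
print(msg)                      # {'amount': 10000}
writer.close()
reader.close()
```

On Windows the same class is not backed by plain file descriptors, which is exactly the portability problem described above.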

Benchmark:

from __future__ import print_function
from timeit import default_timer
from subprocess import Popen, PIPE
import pickle
import sys
import os
import numpy

try:
    from _multiprocessing import Connection as _Connection
except ImportError:
    from multiprocessing.connection import Connection as _Connection

def main(args):
    if args:
        worker(connect(args[0], sys.stdin, sys.stdout))
    else:
        benchmark()

def worker(conn):
    while True:
        try:
            amount = conn.recv()
        except EOFError:
            break
        else:
            conn.send(numpy.random.random(amount))
    conn.close()

def benchmark():
    for amount in numpy.arange(11)*10000:
        pickle = parent('pickle', amount, 1)
        pipe = parent('pipe', amount, 1)
        print(pickle[0]/1000, pickle[1], pipe[1])

def parent(channel, amount, repeat):
    start = default_timer()
    proc = Popen([sys.executable, '-u', __file__, channel],
                 stdin=PIPE, stdout=PIPE)
    conn = connect(channel, proc.stdout, proc.stdin)
    for i in range(repeat):
        conn.send(amount)
        data = conn.recv()
    conn.close()
    end = default_timer()
    return data.nbytes, end-start

class PickleConnection(object):
    def __init__(self, recv, send):
        self._recv = recv
        self._send = send
    def recv(self):
        return pickle.load(self._recv)
    def send(self, data):
        pickle.dump(data, self._send)
    def close(self):
        self._recv.close()
        self._send.close()

class PipeConnection(object):
    def __init__(self, recv_fd, send_fd):
        self._recv = _Connection(recv_fd)
        self._send = _Connection(send_fd)
    def recv(self):
        return self._recv.recv()
    def send(self, data):
        self._send.send(data)
    def close(self):
        self._recv.close()
        self._send.close()

def connect(channel, recv, send):
    recv_fd = os.dup(recv.fileno())
    send_fd = os.dup(send.fileno())
    recv.close()
    send.close()
    if channel == 'pipe':
        return PipeConnection(recv_fd, send_fd)
    elif channel == 'pickle':
        return PickleConnection(os.fdopen(recv_fd, 'rb', 0),
                                os.fdopen(send_fd, 'wb', 0))
    else:
        raise ValueError("Invalid channel: %s" % channel)

if __name__ == '__main__':
    main(sys.argv[1:])

Result:

Thanks for reading,

Thomas

Update:

Okay, so I profiled it as suggested by @martineau. For a single independent run with a fixed value of amount = 500000, I get the following results.

In the parent process, the top calls sorted by tottime are:

11916 function calls (11825 primitive calls) in 5.382 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    35    4.471    0.128    4.471    0.128 {method 'readline' of 'file' objects}
    52    0.693    0.013    0.693    0.013 {method 'read' of 'file' objects}
     4    0.062    0.016    0.063    0.016 {method 'decode' of 'str' objects}

In the child process:

11978 function calls (11855 primitive calls) in 5.298 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    52    4.476    0.086    4.476    0.086 {method 'write' of 'file' objects}
    73    0.552    0.008    0.552    0.008 {repr}
     3    0.112    0.037    0.112    0.037 {method 'read' of 'file' objects}

This worries me a lot: the use of readline seems to be the cause of the poor performance.
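A plausible explanation for why readline shows up at all: Python 2's default pickle protocol 0 is a text format whose opcodes are newline-terminated, so the unpickler consumes the stream line by line, and on an unbuffered file object that presumably degenerates into many tiny reads. Protocol 2 is binary and has no line structure. A quick comparison (also runnable on Python 3):

```python
import pickle

data = list(range(5))

p0 = pickle.dumps(data, protocol=0)  # text protocol: newline-terminated opcodes
p2 = pickle.dumps(data, protocol=2)  # binary protocol: compact, no line structure

print(p0)                        # readable ASCII opcode lines, e.g. b'(lp0\nI0\na...'
print(b"\n" in p0, b"\n" in p2)  # True False
print(len(p0) > len(p2))         # True: protocol 0 is also larger
```

Passing protocol=pickle.HIGHEST_PROTOCOL to dump/dumps is the usual fix on Python 2, where the default is 0.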

The following connection uses only pickle.dumps/pickle.loads and write/read:

import struct

class DumpsConnection(object):
    def __init__(self, recv, send):
        self._recv = recv
        self._send = send
    def recv(self):
        raw_len = self._recvl(4)
        content_len = struct.unpack('>I', raw_len)[0]
        content = self._recvl(content_len)
        return pickle.loads(content)
    def send(self, data):
        content = pickle.dumps(data)
        self._send.write(struct.pack('>I', len(content)))
        self._send.write(content)
    def _recvl(self, size):
        data = b''
        while len(data) < size:
            packet = self._recv.read(size - len(data))
            if not packet:
                raise EOFError
            data += packet
        return data
    def close(self):
        self._recv.close()
        self._send.close()
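As a standalone sanity check, the same length-prefix framing can be exercised in a single process over an os.pipe pair (a sketch; the helper names send_msg/recv_msg are hypothetical and not part of the benchmark):

```python
import os
import pickle
import struct

def send_msg(fd, obj):
    # Length-prefix the pickled payload so the reader knows how many bytes follow.
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    os.write(fd, struct.pack('>I', len(payload)) + payload)

def recv_msg(fd):
    def read_exact(n):
        # Loop because os.read may return fewer bytes than requested.
        buf = b''
        while len(buf) < n:
            chunk = os.read(fd, n - len(buf))
            if not chunk:
                raise EOFError
            buf += chunk
        return buf
    (length,) = struct.unpack('>I', read_exact(4))
    return pickle.loads(read_exact(length))

r, w = os.pipe()
send_msg(w, [1, 2, 3])
result = recv_msg(r)
print(result)  # [1, 2, 3]
```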

In fact, it is now only 14 times slower than multiprocessing.Pipe (which is still bad).

Profiling again, in the parent:

11935 function calls (11844 primitive calls) in 1.749 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2    1.385    0.692    1.385    0.692 {method 'read' of 'file' objects}
     4    0.125    0.031    0.125    0.031 {method 'decode' of 'str' objects}
     4    0.056    0.014    0.228    0.057 pickle.py:961(load_string)

In the child:

11996 function calls (11873 primitive calls) in 1.627 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    73    1.099    0.015    1.099    0.015 {repr}
     3    0.231    0.077    0.231    0.077 {method 'read' of 'file' objects}
     2    0.055    0.028    0.055    0.028 {method 'write' of 'file' objects}

So, I still have no real clue what to use instead.

pickle / cPickle has some problems with serializing numpy arrays:

In [14]: timeit cPickle.dumps(numpy.random.random(1000))
1000 loops, best of 3: 727 us per loop

In [15]: timeit numpy.random.random(1000).dumps()
10000 loops, best of 3: 31.6 us per loop

The problem only occurs during serialization; deserialization is fast:

In [16]: timeit cPickle.loads(numpy.random.random(1000).dumps())
10000 loops, best of 3: 40 us per loop

You can also use the marshal module, which is even faster (but not safe):

In [19]: timeit marshal.loads(marshal.dumps(numpy.random.random(1000)))
10000 loops, best of 3: 29.8 us per loop

I would recommend msgpack, but it does not support numpy, and the lib that adds numpy support is very slow. Besides, python-msgpack does not support buffers or zero-copy, so it cannot support numpy efficiently anyway.
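If both ends are numpy-aware, another common workaround (a sketch; it assumes the dtype and shape are transmitted alongside the buffer, here just reused locally) is to skip the serializer entirely and ship the raw array bytes:

```python
import numpy as np

a = np.random.random(1000)

raw = a.tobytes()              # raw float64 buffer, no pickle framing at all
meta = (a.dtype.str, a.shape)  # the receiver needs dtype/shape to reconstruct

b = np.frombuffer(raw, dtype=meta[0]).reshape(meta[1])
print(np.array_equal(a, b))  # True
```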

