Performance – Why does my Perl script decompress files more slowly when I use threads?

I am running Perl 5.10 on a Core 2 Duo MacBook Pro, compiled with thread support (usethreads=define, useithreads=define). I have a simple script that reads 4 gzip files, each containing 750,000 lines. I use Compress::Zlib to decompress and read the files. I have 2 implementations, and the only difference between them is the use of threads; otherwise both scripts run the same subroutine to do the reading. In pseudocode, the non-threaded program does this:

read_gzipped(file1);
read_gzipped(file2);
read_gzipped(file3);
read_gzipped(file4);

The threaded version looks like this:

my $thr0 = threads->new(\&read_gzipped, 'file1');
my $thr1 = threads->new(\&read_gzipped, 'file2');
my $thr2 = threads->new(\&read_gzipped, 'file3');
my $thr3 = threads->new(\&read_gzipped, 'file4');

$thr0->join();
$thr1->join();
$thr2->join();
$thr3->join();

Now, the threaded version runs almost 2 times slower than the non-threaded script. This is obviously not the result I wanted. Can anyone explain what I am doing wrong here?

My guess is that the bottleneck of the gzip operation is disk access. If you have four threads competing for disk access on a platter hard disk, that will slow things down considerably: the disk head has to move between different files in rapid succession. If you process only one file at a time, the head can stay near that file and the disk cache will be more effective.
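One way to check this guess is to time both variants on the same files. The following is only a minimal sketch, not the asker's actual script: read_gzipped here is a stand-in that just counts lines, run_sequential and run_threaded are hypothetical helper names, and the file names are taken from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Compress::Zlib;
use Time::HiRes qw(time);

# Stand-in for the question's read_gzipped: decompress one file
# line by line and return the number of lines read.
sub read_gzipped {
    my ($file) = @_;
    my $gz = gzopen($file, "rb")
        or die "Cannot open $file: $gzerrno\n";
    my $lines = 0;
    my $line;
    $lines++ while $gz->gzreadline($line) > 0;
    $gz->gzclose();
    return $lines;
}

# Hypothetical helper: read the files one after another.
# Returns (total lines, elapsed wall-clock seconds).
sub run_sequential {
    my @files = @_;
    my $start = time;
    my $total = 0;
    $total += read_gzipped($_) for @files;
    return ($total, time - $start);
}

# Hypothetical helper: read each file in its own thread.
# Returns (total lines, elapsed wall-clock seconds).
sub run_threaded {
    my @files = @_;
    my $start = time;
    # Threads are created in list context, so join() is also
    # called in list context below.
    my @thr = map { threads->create(\&read_gzipped, $_) } @files;
    my $total = 0;
    for my $t (@thr) {
        my ($n) = $t->join();
        $total += $n;
    }
    return ($total, time - $start);
}

# Only run the comparison when file names are given.
if (@ARGV) {
    my ($n1, $t1) = run_sequential(@ARGV);
    my ($n2, $t2) = run_threaded(@ARGV);
    printf "sequential: %d lines in %.2fs\n", $n1, $t1;
    printf "threaded:   %d lines in %.2fs\n", $n2, $t2;
}
```

Run it with the four files as arguments. Keep in mind that whichever variant runs second may find the files already in the operating system's cache, so it is worth repeating the run and swapping the order before drawing conclusions about the disk.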

