Calgary Corpus
Table 1: Amount of memory, total encoding time (in seconds), and average compression ratio (in bpb), for several lengths of (|dictionary|, |LAB|) on the Calgary Corpus, using “best” compression. GZip “fast” obtains Time=0.5 and bpb=3.20 while GZip “best” does Time=1.2 and bpb=2.79. The best encoding time is underlined.
Figure 1: Time-memory trade-off between our SA Algorithm (A#) and Binary Tree (T#) for the Calgary Corpus, on the 8 encoding tests of Table 1. We include GZip for comparison: Gf is GZip “fast”, Gb is GZip “best”. Our encoders use less memory than the BT-encoder. For some compression settings, they are close to GZip encoding time.
Silesia Corpus Table 2: Amount of memory, total encoding time (in seconds) and average compression ratio (in bpb), for several lengths of (|dictionary|, |LAB|) on the Silesia Corpus, using “best” compression. GZip “fast” obtains Time=19.5 and bpb=3.32 while GZip “best” does Time=74.4 and bpb=2.98. The best encoding time is underlined.
Figure 2: Time-memory trade-off between our SA Algorithm (A#) and Binary Tree (T#) for the Silesia Corpus, on the 8 encoding tests of Table 1. We include GZip for comparison: Gf is GZip “fast”, Gb is GZip “best”.
Some Detailed Experiments of our SA encoder
Calgary Corpus
LZSS SA Algorithm on Calgary Corpus (Dict.,LAB)=(2048,512) --------------------------------------------------------------------------------------------------------------------- File Original Packed Time Bpb Check --------------------------------------------------------------------------------------------------------------------- bib 111274 77349 0.094 5.561 OK book1 768784 608507 0.687 6.332 OK book2 610866 430924 0.531 5.643 OK geo 102410 104504 0.093 8.164 OK news 3 77119 293007 0.328 6.216 OK obj1 21514 16373 0.031 6.088 OK obj2 246824 148807 0.172 4.823 OK paper1 53171 36477 0.047 5.488 OK paper2 82209 58301 0.078 5.673 OK paper3 46536 34554 0.047 5.940 OK paper4 13296 9770 0.015 5.878 OK paper5 11964 8953 0.015 5.987 OK paper6 38115 26353 0.031 5.531 OK pic 513226 82911 0.531 1.292 OK progc 39621 27418 0.032 5.536 OK progl 71656 35899 0.063 4.008 OK progp 49389 24578 0.031 3.981 OK trans 93705 53479 0.062 4.566 OK --------------------------------------------------------------------------------------------------------------------- Total Packed = 2078164 Total Time = 2.888 Average Time = 0.160 Average Bpb = 5.373 Total Mem. = 11270 bytes
LZSS SA Algorithm on Calgary Corpus (Dict.,LAB)=(8192,512) ------------------------------------------------------------------------------------------------------------ File Original Packed Time Bpb Check ------------------------------------------------------------------------------------------------------------ bib 111274 63735 0.078 4.582 OK book1 768784 528337 0.797 5.498 OK book2 610866 364087 0.578 4.768 OK geo 102410 102838 0.110 8.033 OK news 377119 259336 0.328 5.501 OK obj1 21514 16741 0.047 6.225 OK obj2 246824 140748 0.172 4.562 OK paper1 53171 31365 0.047 4.719 OK paper2 82209 50346 0.078 4.899 OK paper3 46536 30141 0.063 5.182 OK paper4 13296 9183 0.016 5.525 OK paper5 11964 8577 0.016 5.735 OK paper6 38115 22805 0.047 4.787 OK pic 513226 85474 1.407 1.332 OK progc 39621 23763 0.031 4.798 OK progl 71656 29159 0.063 3.255 OK progp 49389 19719 0.047 3.194 OK trans 93705 37327 0.062 3.187 OK ------------------------------------------------------------------------------------------------------------ Total Packed = 1823681 Total Time = 3.987 Average Time = 0.221 Average Bpb = 4.766 Total Mem. = 74762 bytes
Silesia Corpus LZSS SA Algorithm on Silesia Corpus (Dict.,LAB)=(2048,512) ----------------------------------------------------------------------------------------------------------- File Original Packed Time Bpb Check ----------------------------------------------------------------------------------------------------------- dickens 10192448 7627228 9.609 5.987 OK mozilla 51220482 30862871 40.703 4.820 OK mr 9970566 6443750 8.406 5.170 OK nci 33553447 9111824 21.718 2.172 OK ooffice 6152194 5036342 5.000 6.549 OK osdb 10085686 9455648 9.187 7.500 OK reymont 6627204 3780591 5.062 4.564 OK samba 21606402 11099679 15.219 4.110 OK sao 7251946 7104230 6.672 7.837 OK webster 41458705 24381498 33.672 4.705 OK xml 5345282 1665844 3.375 2.493 OK x-ray 8474242 9439387 7.859 8.911 OK ----------------------------------------------------------------------------------------------------------- Total Packed = 126008892 Total Time = 166.482 Average Time = 13.873 Average Bpb = 5.402 Total Mem. = 11270 bytes
LZSS SA Algorithm on Silesia Corpus (Dict.,LAB)=(16384,1024) ----------------------------------------------------------------------------------------------------------- File Original Packed Time Bpb Check ------------------------------------------------------------------------------------------------------------ dickens 10192448 6345041 12.734 4.980 OK mozilla 51220482 31213367 55.371 4.875 OK mr 9970566 6340350 12.439 5.087 OK nci 33553447 5347453 22.147 1.275 OK ooffice 6152194 5171200 5.735 6.724 OK osdb 10085686 6253754 6.689 4.960 OK reymont 6627204 3041540 6.017 3.672 OK samba 21606402 9649370 16.007 3.573 OK sao 7251946 7195505 5.878 7.938 OK webster 41458705 20431101 37.436 3.942 OK xml 5345282 1166151 2.714 1.745 OK x-ray 8474242 11548922 6.769 10.903 OK ------------------------------------------------------------------------------------------------------------ Total Packed = 113703754 Total Time = 189.936 Average Time = 15.828 Average Bpb = 4.973 Total Mem. = 153102 bytes
Canterbury Corpus
LZSS SA Algorithm on Canterbury Corpus (Dict.,LAB)=(8192,1024) ------------------------------------------------------------------------------------------------------------ File Original Packed Time Bpb Check ------------------------------------------------------------------------------------------------------------ alice29.txt 152091 96229 0.141 5.062 OK arj241a.exe 223858 269397 0.156 9.627 OK asyoulik.txt 125181 88615 0.109 5.663 OK bible.txt 4047394 2171012 3.297 4.291 OK cp.htm 24605 14223 0.016 4.624 OK E.coli 4638692 2112222 10.625 3.643 OK fields.c 11152 6393 0.000 4.586 OK grammar.lsp 3723 3046 0.000 6.545 OK kennedy.xls 1029746 492467 0.500 3.826 OK lcet10.txt 426756 258989 0.375 4.855 OK plrabn12.txt 481863 343675 0.469 5.706 OK ptt5 513218 91364 1.297 1.424 OK sum 38242 26082 0.047 5.456 OK world192.txt 2473402 1682806 1.922 5.443 OK xargs.1 4229 3345 0.000 6.328 OK ----------------------------------------------------------------------------------------------------------- Total Packed = 7659865 Total Time = 18.954 Average Time = 1.264 Average Bpb = 5.139 Total Mem. = 79374 bytes
Main Conclusions
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||