Remove unneeded directory.
This commit is contained in:
parent
76ad4d0a6e
commit
8d712c8c3e
|
@ -1,167 +0,0 @@
|
|||
|
||||
|
||||
0.9.0
|
||||
~~~~~
|
||||
First version.
|
||||
|
||||
|
||||
0.9.0a
|
||||
~~~~~~
|
||||
Removed 'ranlib' from Makefile, since most modern Unix-es
|
||||
don't need it, or even know about it.
|
||||
|
||||
|
||||
0.9.0b
|
||||
~~~~~~
|
||||
Fixed a problem with error reporting in bzip2.c. This does not effect
|
||||
the library in any way. Problem is: versions 0.9.0 and 0.9.0a (of the
|
||||
program proper) compress and decompress correctly, but give misleading
|
||||
error messages (internal panics) when an I/O error occurs, instead of
|
||||
reporting the problem correctly. This shouldn't give any data loss
|
||||
(as far as I can see), but is confusing.
|
||||
|
||||
Made the inline declarations disappear for non-GCC compilers.
|
||||
|
||||
|
||||
0.9.0c
|
||||
~~~~~~
|
||||
Fixed some problems in the library pertaining to some boundary cases.
|
||||
This makes the library behave more correctly in those situations. The
|
||||
fixes apply only to features (calls and parameters) not used by
|
||||
bzip2.c, so the non-fixedness of them in previous versions has no
|
||||
effect on reliability of bzip2.c.
|
||||
|
||||
In bzlib.c:
|
||||
* made zero-length BZ_FLUSH work correctly in bzCompress().
|
||||
* fixed bzWrite/bzRead to ignore zero-length requests.
|
||||
* fixed bzread to correctly handle read requests after EOF.
|
||||
* wrong parameter order in call to bzDecompressInit in
|
||||
bzBuffToBuffDecompress. Fixed.
|
||||
|
||||
In compress.c:
|
||||
* changed setting of nGroups in sendMTFValues() so as to
|
||||
do a bit better on small files. This _does_ effect
|
||||
bzip2.c.
|
||||
|
||||
|
||||
0.9.5a
|
||||
~~~~~~
|
||||
Major change: add a fallback sorting algorithm (blocksort.c)
|
||||
to give reasonable behaviour even for very repetitive inputs.
|
||||
Nuked --repetitive-best and --repetitive-fast since they are
|
||||
no longer useful.
|
||||
|
||||
Minor changes: mostly a whole bunch of small changes/
|
||||
bugfixes in the driver (bzip2.c). Changes pertaining to the
|
||||
user interface are:
|
||||
|
||||
allow decompression of symlink'd files to stdout
|
||||
decompress/test files even without .bz2 extension
|
||||
give more accurate error messages for I/O errors
|
||||
when compressing/decompressing to stdout, don't catch control-C
|
||||
read flags from BZIP2 and BZIP environment variables
|
||||
decline to break hard links to a file unless forced with -f
|
||||
allow -c flag even with no filenames
|
||||
preserve file ownerships as far as possible
|
||||
make -s -1 give the expected block size (100k)
|
||||
add a flag -q --quiet to suppress nonessential warnings
|
||||
stop decoding flags after --, so files beginning in - can be handled
|
||||
resolved inconsistent naming: bzcat or bz2cat ?
|
||||
bzip2 --help now returns 0
|
||||
|
||||
Programming-level changes are:
|
||||
|
||||
fixed syntax error in GET_LL4 for Borland C++ 5.02
|
||||
let bzBuffToBuffDecompress return BZ_DATA_ERROR{_MAGIC}
|
||||
fix overshoot of mode-string end in bzopen_or_bzdopen
|
||||
wrapped bzlib.h in #ifdef __cplusplus ... extern "C" { ... }
|
||||
close file handles under all error conditions
|
||||
added minor mods so it compiles with DJGPP out of the box
|
||||
fixed Makefile so it doesn't give problems with BSD make
|
||||
fix uninitialised memory reads in dlltest.c
|
||||
|
||||
0.9.5b
|
||||
~~~~~~
|
||||
Open stdin/stdout in binary mode for DJGPP.
|
||||
|
||||
0.9.5c
|
||||
~~~~~~
|
||||
Changed BZ_N_OVERSHOOT to be ... + 2 instead of ... + 1. The + 1
|
||||
version could cause the sorted order to be wrong in some extremely
|
||||
obscure cases. Also changed setting of quadrant in blocksort.c.
|
||||
|
||||
0.9.5d
|
||||
~~~~~~
|
||||
The only functional change is to make bzlibVersion() in the library
|
||||
return the correct string. This has no effect whatsoever on the
|
||||
functioning of the bzip2 program or library. Added a couple of casts
|
||||
so the library compiles without warnings at level 3 in MS Visual
|
||||
Studio 6.0. Included a Y2K statement in the file Y2K_INFO. All other
|
||||
changes are minor documentation changes.
|
||||
|
||||
1.0
|
||||
~~~
|
||||
Several minor bugfixes and enhancements:
|
||||
|
||||
* Large file support. The library uses 64-bit counters to
|
||||
count the volume of data passing through it. bzip2.c
|
||||
is now compiled with -D_FILE_OFFSET_BITS=64 to get large
|
||||
file support from the C library. -v correctly prints out
|
||||
file sizes greater than 4 gigabytes. All these changes have
|
||||
been made without assuming a 64-bit platform or a C compiler
|
||||
which supports 64-bit ints, so, except for the C library
|
||||
aspect, they are fully portable.
|
||||
|
||||
* Decompression robustness. The library/program should be
|
||||
robust to any corruption of compressed data, detecting and
|
||||
handling _all_ corruption, instead of merely relying on
|
||||
the CRCs. What this means is that the program should
|
||||
never crash, given corrupted data, and the library should
|
||||
always return BZ_DATA_ERROR.
|
||||
|
||||
* Fixed an obscure race-condition bug only ever observed on
|
||||
Solaris, in which, if you were very unlucky and issued
|
||||
control-C at exactly the wrong time, both input and output
|
||||
files would be deleted.
|
||||
|
||||
* Don't run out of file handles on test/decompression when
|
||||
large numbers of files have invalid magic numbers.
|
||||
|
||||
* Avoid library namespace pollution. Prefix all exported
|
||||
symbols with BZ2_.
|
||||
|
||||
* Minor sorting enhancements from my DCC2000 paper.
|
||||
|
||||
* Advance the version number to 1.0, so as to counteract the
|
||||
(false-in-this-case) impression some people have that programs
|
||||
with version numbers less than 1.0 are in someway, experimental,
|
||||
pre-release versions.
|
||||
|
||||
* Create an initial Makefile-libbz2_so to build a shared library.
|
||||
Yes, I know I should really use libtool et al ...
|
||||
|
||||
* Make the program exit with 2 instead of 0 when decompression
|
||||
fails due to a bad magic number (ie, an invalid bzip2 header).
|
||||
Also exit with 1 (as the manual claims :-) whenever a diagnostic
|
||||
message would have been printed AND the corresponding operation
|
||||
is aborted, for example
|
||||
bzip2: Output file xx already exists.
|
||||
When a diagnostic message is printed but the operation is not
|
||||
aborted, for example
|
||||
bzip2: Can't guess original name for wurble -- using wurble.out
|
||||
then the exit value 0 is returned, unless some other problem is
|
||||
also detected.
|
||||
|
||||
I think it corresponds more closely to what the manual claims now.
|
||||
|
||||
|
||||
1.0.1
|
||||
~~~~~
|
||||
* Modified dlltest.c so it uses the new BZ2_ naming scheme.
|
||||
* Modified makefile-msc to fix minor build probs on Win2k.
|
||||
* Updated README.COMPILATION.PROBLEMS.
|
||||
|
||||
There are no functionality changes or bug fixes relative to version
|
||||
1.0.0. This is just a documentation update + a fix for minor Win32
|
||||
build problems. For almost everyone, upgrading from 1.0.0 to 1.0.1 is
|
||||
utterly pointless. Don't bother.
|
|
@ -1,15 +0,0 @@
|
|||
2002-07-06 Christopher Faylor <cgf@redhat.com>
|
||||
|
||||
* Makefile.in: Use MINGW stuff from Makefile.common.
|
||||
|
||||
Wed Apr 18 23:54:53 2001 Christopher Faylor <cgf@cygnus.com>
|
||||
|
||||
* Makefile.in: Add -U_WIN32 to CFLAGS compile line to avoid
|
||||
inappropriate Windows-isms.
|
||||
|
||||
Wed Apr 18 15:58:28 2001 Christopher Faylor <cgf@cygnus.com>
|
||||
|
||||
Initial checkin from net sources.
|
||||
Makefile.in: New file.
|
||||
configure.in: New file.
|
||||
configure: New file.
|
|
@ -1,39 +0,0 @@
|
|||
|
||||
This program, "bzip2" and associated library "libbzip2", are
|
||||
copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
are met:
|
||||
|
||||
1. Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
|
||||
2. The origin of this software must not be misrepresented; you must
|
||||
not claim that you wrote the original software. If you use this
|
||||
software in a product, an acknowledgment in the product
|
||||
documentation would be appreciated but is not required.
|
||||
|
||||
3. Altered source versions must be plainly marked as such, and must
|
||||
not be misrepresented as being the original software.
|
||||
|
||||
4. The name of the author may not be used to endorse or promote
|
||||
products derived from this software without specific prior written
|
||||
permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
|
||||
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
||||
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
|
||||
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
|
||||
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
||||
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
|
||||
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
Julian Seward, Cambridge, UK.
|
||||
jseward@acm.org
|
||||
bzip2/libbzip2 version 1.0 of 21 March 2000
|
||||
|
|
@ -1,141 +0,0 @@
|
|||
|
||||
SHELL=/bin/sh
|
||||
CC=gcc
|
||||
BIGFILES=-D_FILE_OFFSET_BITS=64
|
||||
CFLAGS=-Wall -Winline -O2 -fomit-frame-pointer -fno-strength-reduce $(BIGFILES)
|
||||
|
||||
OBJS= blocksort.o \
|
||||
huffman.o \
|
||||
crctable.o \
|
||||
randtable.o \
|
||||
compress.o \
|
||||
decompress.o \
|
||||
bzlib.o
|
||||
|
||||
all: libbz2.a bzip2 bzip2recover test
|
||||
|
||||
bzip2: libbz2.a bzip2.o
|
||||
$(CC) $(CFLAGS) -o bzip2 bzip2.o -L. -lbz2
|
||||
|
||||
bzip2recover: bzip2recover.o
|
||||
$(CC) $(CFLAGS) -o bzip2recover bzip2recover.o
|
||||
|
||||
libbz2.a: $(OBJS)
|
||||
rm -f libbz2.a
|
||||
ar cq libbz2.a $(OBJS)
|
||||
@if ( test -f /usr/bin/ranlib -o -f /bin/ranlib -o \
|
||||
-f /usr/ccs/bin/ranlib ) ; then \
|
||||
echo ranlib libbz2.a ; \
|
||||
ranlib libbz2.a ; \
|
||||
fi
|
||||
|
||||
test: bzip2
|
||||
@cat words1
|
||||
./bzip2 -1 < sample1.ref > sample1.rb2
|
||||
./bzip2 -2 < sample2.ref > sample2.rb2
|
||||
./bzip2 -3 < sample3.ref > sample3.rb2
|
||||
./bzip2 -d < sample1.bz2 > sample1.tst
|
||||
./bzip2 -d < sample2.bz2 > sample2.tst
|
||||
./bzip2 -ds < sample3.bz2 > sample3.tst
|
||||
cmp sample1.bz2 sample1.rb2
|
||||
cmp sample2.bz2 sample2.rb2
|
||||
cmp sample3.bz2 sample3.rb2
|
||||
cmp sample1.tst sample1.ref
|
||||
cmp sample2.tst sample2.ref
|
||||
cmp sample3.tst sample3.ref
|
||||
@cat words3
|
||||
|
||||
PREFIX=/usr
|
||||
|
||||
install: bzip2 bzip2recover
|
||||
if ( test ! -d $(PREFIX)/bin ) ; then mkdir $(PREFIX)/bin ; fi
|
||||
if ( test ! -d $(PREFIX)/lib ) ; then mkdir $(PREFIX)/lib ; fi
|
||||
if ( test ! -d $(PREFIX)/man ) ; then mkdir $(PREFIX)/man ; fi
|
||||
if ( test ! -d $(PREFIX)/man/man1 ) ; then mkdir $(PREFIX)/man/man1 ; fi
|
||||
if ( test ! -d $(PREFIX)/include ) ; then mkdir $(PREFIX)/include ; fi
|
||||
cp -f bzip2 $(PREFIX)/bin/bzip2
|
||||
cp -f bzip2 $(PREFIX)/bin/bunzip2
|
||||
cp -f bzip2 $(PREFIX)/bin/bzcat
|
||||
cp -f bzip2recover $(PREFIX)/bin/bzip2recover
|
||||
chmod a+x $(PREFIX)/bin/bzip2
|
||||
chmod a+x $(PREFIX)/bin/bunzip2
|
||||
chmod a+x $(PREFIX)/bin/bzcat
|
||||
chmod a+x $(PREFIX)/bin/bzip2recover
|
||||
cp -f bzip2.1 $(PREFIX)/man/man1
|
||||
chmod a+r $(PREFIX)/man/man1/bzip2.1
|
||||
cp -f bzlib.h $(PREFIX)/include
|
||||
chmod a+r $(PREFIX)/include/bzlib.h
|
||||
cp -f libbz2.a $(PREFIX)/lib
|
||||
chmod a+r $(PREFIX)/lib/libbz2.a
|
||||
|
||||
clean:
|
||||
rm -f *.o libbz2.a bzip2 bzip2recover \
|
||||
sample1.rb2 sample2.rb2 sample3.rb2 \
|
||||
sample1.tst sample2.tst sample3.tst
|
||||
|
||||
blocksort.o: blocksort.c
|
||||
@cat words0
|
||||
$(CC) $(CFLAGS) -c blocksort.c
|
||||
huffman.o: huffman.c
|
||||
$(CC) $(CFLAGS) -c huffman.c
|
||||
crctable.o: crctable.c
|
||||
$(CC) $(CFLAGS) -c crctable.c
|
||||
randtable.o: randtable.c
|
||||
$(CC) $(CFLAGS) -c randtable.c
|
||||
compress.o: compress.c
|
||||
$(CC) $(CFLAGS) -c compress.c
|
||||
decompress.o: decompress.c
|
||||
$(CC) $(CFLAGS) -c decompress.c
|
||||
bzlib.o: bzlib.c
|
||||
$(CC) $(CFLAGS) -c bzlib.c
|
||||
bzip2.o: bzip2.c
|
||||
$(CC) $(CFLAGS) -c bzip2.c
|
||||
bzip2recover.o: bzip2recover.c
|
||||
$(CC) $(CFLAGS) -c bzip2recover.c
|
||||
|
||||
DISTNAME=bzip2-1.0.1
|
||||
tarfile:
|
||||
rm -f $(DISTNAME)
|
||||
ln -sf . $(DISTNAME)
|
||||
tar cvf $(DISTNAME).tar \
|
||||
$(DISTNAME)/blocksort.c \
|
||||
$(DISTNAME)/huffman.c \
|
||||
$(DISTNAME)/crctable.c \
|
||||
$(DISTNAME)/randtable.c \
|
||||
$(DISTNAME)/compress.c \
|
||||
$(DISTNAME)/decompress.c \
|
||||
$(DISTNAME)/bzlib.c \
|
||||
$(DISTNAME)/bzip2.c \
|
||||
$(DISTNAME)/bzip2recover.c \
|
||||
$(DISTNAME)/bzlib.h \
|
||||
$(DISTNAME)/bzlib_private.h \
|
||||
$(DISTNAME)/Makefile \
|
||||
$(DISTNAME)/manual.texi \
|
||||
$(DISTNAME)/manual.ps \
|
||||
$(DISTNAME)/LICENSE \
|
||||
$(DISTNAME)/bzip2.1 \
|
||||
$(DISTNAME)/bzip2.1.preformatted \
|
||||
$(DISTNAME)/bzip2.txt \
|
||||
$(DISTNAME)/words0 \
|
||||
$(DISTNAME)/words1 \
|
||||
$(DISTNAME)/words2 \
|
||||
$(DISTNAME)/words3 \
|
||||
$(DISTNAME)/sample1.ref \
|
||||
$(DISTNAME)/sample2.ref \
|
||||
$(DISTNAME)/sample3.ref \
|
||||
$(DISTNAME)/sample1.bz2 \
|
||||
$(DISTNAME)/sample2.bz2 \
|
||||
$(DISTNAME)/sample3.bz2 \
|
||||
$(DISTNAME)/dlltest.c \
|
||||
$(DISTNAME)/*.html \
|
||||
$(DISTNAME)/README \
|
||||
$(DISTNAME)/README.COMPILATION.PROBLEMS \
|
||||
$(DISTNAME)/CHANGES \
|
||||
$(DISTNAME)/libbz2.def \
|
||||
$(DISTNAME)/libbz2.dsp \
|
||||
$(DISTNAME)/dlltest.dsp \
|
||||
$(DISTNAME)/makefile.msc \
|
||||
$(DISTNAME)/Y2K_INFO \
|
||||
$(DISTNAME)/unzcrash.c \
|
||||
$(DISTNAME)/spewG.c \
|
||||
$(DISTNAME)/Makefile-libbz2_so
|
|
@ -1,43 +0,0 @@
|
|||
|
||||
# This Makefile builds a shared version of the library,
|
||||
# libbz2.so.1.0.1, with soname libbz2.so.1.0,
|
||||
# at least on x86-Linux (RedHat 5.2),
|
||||
# with gcc-2.7.2.3. Please see the README file for some
|
||||
# important info about building the library like this.
|
||||
|
||||
SHELL=/bin/sh
|
||||
CC=gcc
|
||||
BIGFILES=-D_FILE_OFFSET_BITS=64
|
||||
CFLAGS=-fpic -fPIC -Wall -Winline -O2 -fomit-frame-pointer -fno-strength-reduce $(BIGFILES)
|
||||
|
||||
OBJS= blocksort.o \
|
||||
huffman.o \
|
||||
crctable.o \
|
||||
randtable.o \
|
||||
compress.o \
|
||||
decompress.o \
|
||||
bzlib.o
|
||||
|
||||
all: $(OBJS)
|
||||
$(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.1 $(OBJS)
|
||||
$(CC) $(CFLAGS) -o bzip2-shared bzip2.c libbz2.so.1.0.1
|
||||
rm -f libbz2.so.1.0
|
||||
ln -s libbz2.so.1.0.1 libbz2.so.1.0
|
||||
|
||||
clean:
|
||||
rm -f $(OBJS) bzip2.o libbz2.so.1.0.1 libbz2.so.1.0 bzip2-shared
|
||||
|
||||
blocksort.o: blocksort.c
|
||||
$(CC) $(CFLAGS) -c blocksort.c
|
||||
huffman.o: huffman.c
|
||||
$(CC) $(CFLAGS) -c huffman.c
|
||||
crctable.o: crctable.c
|
||||
$(CC) $(CFLAGS) -c crctable.c
|
||||
randtable.o: randtable.c
|
||||
$(CC) $(CFLAGS) -c randtable.c
|
||||
compress.o: compress.c
|
||||
$(CC) $(CFLAGS) -c compress.c
|
||||
decompress.o: decompress.c
|
||||
$(CC) $(CFLAGS) -c decompress.c
|
||||
bzlib.o: bzlib.c
|
||||
$(CC) $(CFLAGS) -c bzlib.c
|
|
@ -1,107 +0,0 @@
|
|||
# Copyright (c) 2000, Red Hat, Inc.
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License as published by
|
||||
# the Free Software Foundation; either version 2 of the License, or
|
||||
# (at your option) any later version.
|
||||
#
|
||||
# A copy of the GNU General Public License can be found at
|
||||
# http://www.gnu.org/
|
||||
#
|
||||
# Written by Christopher Faylor <cgf@redhat.com>
|
||||
#
|
||||
# Makefile for Cygwin installer
|
||||
|
||||
SHELL :=@SHELL@
|
||||
|
||||
srcdir:=@srcdir@
|
||||
VPATH:=@srcdir@
|
||||
prefix:=@prefix@
|
||||
exec_prefix:=@exec_prefix@
|
||||
|
||||
bindir:=@bindir@
|
||||
etcdir:=$(exec_prefix)/etc
|
||||
|
||||
program_transform_name :=@program_transform_name@
|
||||
|
||||
INSTALL:=@INSTALL@
|
||||
INSTALL_PROGRAM:=@INSTALL_PROGRAM@
|
||||
INSTALL_DATA:=@INSTALL_DATA@
|
||||
|
||||
EXEEXT:=@EXEEXT@
|
||||
EXEEXT_FOR_BUILD :=@EXEEXT_FOR_BUILD@
|
||||
|
||||
CC:=@CC@
|
||||
CC_FOR_TARGET:=$(CC)
|
||||
CXX:=@CXX@
|
||||
|
||||
CFLAGS:=@CFLAGS@ -nostdinc
|
||||
CXXFLAGS:=@CXXFLAGS@ -fno-exceptions -fno-rtti
|
||||
CXX:=@CXX@
|
||||
|
||||
WINDRES:=@WINDRES@
|
||||
OBJCOPY:=@OBJCOPY@
|
||||
AR:=@AR@
|
||||
RANLIB:=@RANLIB@
|
||||
|
||||
include $(srcdir)/../Makefile.common
|
||||
|
||||
|
||||
ZLIB:=zlib/libzcygw.a
|
||||
BZ2LIB:=$(bupdir)/bz2lib/libbz2.a
|
||||
libmingw32.a:=$(mingw_build)/libmingw32.a
|
||||
libuser32:=$(w32api_lib)/libuser32.a
|
||||
libkernel32:=$(w32api_lib)/libkernel32.a
|
||||
|
||||
ALL_DEP_LDLIBS:=$(ZLIB) $(BZ2LIB) $(w32api_lib)/libole32.a $(w32api_lib)/libwsock32.a \
|
||||
$(w32api_lib)/libnetapi32.a $(w32api_lib)/libadvapi32.a \
|
||||
$(w32api_lib)/libuuid.a $(libkernel32) $(w32api_lib)/libuser32.a \
|
||||
$(libmingw32)
|
||||
|
||||
ALL_LDLIBS:=${patsubst $(mingw_build)/lib%.a,-l%,\
|
||||
${patsubst $(w32api_lib)/lib%.a,-l%,\
|
||||
${filter-out $(libmingw32),\
|
||||
${filter-out $(libuser32),\
|
||||
${filter-out $(libkernel32), $(ALL_DEP_LDLIBS)}}}}}
|
||||
|
||||
ALL_LDFLAGS:=${filter-out -I%, \
|
||||
${filter-out -W%, \
|
||||
-B$(w32api_lib)/ -B${mingw_build}/ $(MINGW_CFLAGS) $(LDFLAGS)}}
|
||||
|
||||
LIBS:=libbz2.a
|
||||
|
||||
OBJS:=blocksort.o bzlib.o compress.o decompress.o huffman.o crctable.o randtable.o
|
||||
|
||||
.SUFFIXES:
|
||||
.NOEXPORT:
|
||||
|
||||
.PHONY: all install clean realclean
|
||||
|
||||
all: Makefile $(LIBS)
|
||||
|
||||
$(LIBS): $(OBJS)
|
||||
$(AR) cru $@ $?
|
||||
$(RANLIB) $@
|
||||
|
||||
clean:
|
||||
rm -f *.o *.rc *.a
|
||||
|
||||
realclean: clean
|
||||
rm -f Makefile config.cache
|
||||
|
||||
install: all
|
||||
: Nothing to install
|
||||
|
||||
%.o: %.c
|
||||
ifdef VERBOSE
|
||||
$(CC) $c -o $(@D)/$(basename $@)$o -MD $(MINGW_CFLAGS) $<
|
||||
else
|
||||
@echo $(CC) $c -o $(@D)/$(basename $@)$o $(MINGW_CFLAGS) ... $^;\
|
||||
$(CC) $c -o $(@D)/$(basename $@)$o -MD $(MINGW_CFLAGS) $<
|
||||
endif
|
||||
|
||||
|
||||
D=$(wildcard *.d)
|
||||
ifneq ($D,)
|
||||
include $D
|
||||
endif
|
|
@ -1,166 +0,0 @@
|
|||
|
||||
This is the README for bzip2, a block-sorting file compressor, version
|
||||
1.0. This version is fully compatible with the previous public
|
||||
releases, bzip2-0.1pl2, bzip2-0.9.0 and bzip2-0.9.5.
|
||||
|
||||
bzip2-1.0 is distributed under a BSD-style license. For details,
|
||||
see the file LICENSE.
|
||||
|
||||
Complete documentation is available in Postscript form (manual.ps) or
|
||||
html (manual_toc.html). A plain-text version of the manual page is
|
||||
available as bzip2.txt. A statement about Y2K issues is now included
|
||||
in the file Y2K_INFO.
|
||||
|
||||
|
||||
HOW TO BUILD -- UNIX
|
||||
|
||||
Type `make'. This builds the library libbz2.a and then the
|
||||
programs bzip2 and bzip2recover. Six self-tests are run.
|
||||
If the self-tests complete ok, carry on to installation:
|
||||
|
||||
To install in /usr/bin, /usr/lib, /usr/man and /usr/include, type
|
||||
make install
|
||||
To install somewhere else, eg, /xxx/yyy/{bin,lib,man,include}, type
|
||||
make install PREFIX=/xxx/yyy
|
||||
If you are (justifiably) paranoid and want to see what 'make install'
|
||||
is going to do, you can first do
|
||||
make -n install or
|
||||
make -n install PREFIX=/xxx/yyy respectively.
|
||||
The -n instructs make to show the commands it would execute, but
|
||||
not actually execute them.
|
||||
|
||||
|
||||
HOW TO BUILD -- UNIX, shared library libbz2.so.
|
||||
|
||||
Do 'make -f Makefile-libbz2_so'. This Makefile seems to work for
|
||||
Linux-ELF (RedHat 5.2 on an x86 box), with gcc. I make no claims
|
||||
that it works for any other platform, though I suspect it probably
|
||||
will work for most platforms employing both ELF and gcc.
|
||||
|
||||
bzip2-shared, a client of the shared library, is also build, but
|
||||
not self-tested. So I suggest you also build using the normal
|
||||
Makefile, since that conducts a self-test.
|
||||
|
||||
Important note for people upgrading .so's from 0.9.0/0.9.5 to
|
||||
version 1.0. All the functions in the library have been renamed,
|
||||
from (eg) bzCompress to BZ2_bzCompress, to avoid namespace pollution.
|
||||
Unfortunately this means that the libbz2.so created by
|
||||
Makefile-libbz2_so will not work with any program which used an
|
||||
older version of the library. Sorry. I do encourage library
|
||||
clients to make the effort to upgrade to use version 1.0, since
|
||||
it is both faster and more robust than previous versions.
|
||||
|
||||
|
||||
HOW TO BUILD -- Windows 95, NT, DOS, Mac, etc.
|
||||
|
||||
It's difficult for me to support compilation on all these platforms.
|
||||
My approach is to collect binaries for these platforms, and put them
|
||||
on the master web page (http://sourceware.cygnus.com/bzip2). Look
|
||||
there. However (FWIW), bzip2-1.0 is very standard ANSI C and should
|
||||
compile unmodified with MS Visual C. For Win32, there is one
|
||||
important caveat: in bzip2.c, you must set BZ_UNIX to 0 and
|
||||
BZ_LCCWIN32 to 1 before building. If you have difficulties building,
|
||||
you might want to read README.COMPILATION.PROBLEMS.
|
||||
|
||||
|
||||
VALIDATION
|
||||
|
||||
Correct operation, in the sense that a compressed file can always be
|
||||
decompressed to reproduce the original, is obviously of paramount
|
||||
importance. To validate bzip2, I used a modified version of Mark
|
||||
Nelson's churn program. Churn is an automated test driver which
|
||||
recursively traverses a directory structure, using bzip2 to compress
|
||||
and then decompress each file it encounters, and checking that the
|
||||
decompressed data is the same as the original. There are more details
|
||||
in Section 4 of the user guide.
|
||||
|
||||
|
||||
|
||||
Please read and be aware of the following:
|
||||
|
||||
WARNING:
|
||||
|
||||
This program (attempts to) compress data by performing several
|
||||
non-trivial transformations on it. Unless you are 100% familiar
|
||||
with *all* the algorithms contained herein, and with the
|
||||
consequences of modifying them, you should NOT meddle with the
|
||||
compression or decompression machinery. Incorrect changes can and
|
||||
very likely *will* lead to disastrous loss of data.
|
||||
|
||||
|
||||
DISCLAIMER:
|
||||
|
||||
I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA ARISING FROM THE
|
||||
USE OF THIS PROGRAM, HOWSOEVER CAUSED.
|
||||
|
||||
Every compression of a file implies an assumption that the
|
||||
compressed file can be decompressed to reproduce the original.
|
||||
Great efforts in design, coding and testing have been made to
|
||||
ensure that this program works correctly. However, the complexity
|
||||
of the algorithms, and, in particular, the presence of various
|
||||
special cases in the code which occur with very low but non-zero
|
||||
probability make it impossible to rule out the possibility of bugs
|
||||
remaining in the program. DO NOT COMPRESS ANY DATA WITH THIS
|
||||
PROGRAM UNLESS YOU ARE PREPARED TO ACCEPT THE POSSIBILITY, HOWEVER
|
||||
SMALL, THAT THE DATA WILL NOT BE RECOVERABLE.
|
||||
|
||||
That is not to say this program is inherently unreliable. Indeed,
|
||||
I very much hope the opposite is true. bzip2 has been carefully
|
||||
constructed and extensively tested.
|
||||
|
||||
|
||||
PATENTS:
|
||||
|
||||
To the best of my knowledge, bzip2 does not use any patented
|
||||
algorithms. However, I do not have the resources available to
|
||||
carry out a full patent search. Therefore I cannot give any
|
||||
guarantee of the above statement.
|
||||
|
||||
End of legalities.
|
||||
|
||||
|
||||
WHAT'S NEW IN 0.9.0 (as compared to 0.1pl2) ?
|
||||
|
||||
* Approx 10% faster compression, 30% faster decompression
|
||||
* -t (test mode) is a lot quicker
|
||||
* Can decompress concatenated compressed files
|
||||
* Programming interface, so programs can directly read/write .bz2 files
|
||||
* Less restrictive (BSD-style) licensing
|
||||
* Flag handling more compatible with GNU gzip
|
||||
* Much more documentation, i.e., a proper user manual
|
||||
* Hopefully, improved portability (at least of the library)
|
||||
|
||||
WHAT'S NEW IN 0.9.5 ?
|
||||
|
||||
* Compression speed is much less sensitive to the input
|
||||
data than in previous versions. Specifically, the very
|
||||
slow performance caused by repetitive data is fixed.
|
||||
* Many small improvements in file and flag handling.
|
||||
* A Y2K statement.
|
||||
|
||||
WHAT'S NEW IN 1.0
|
||||
|
||||
See the CHANGES file.
|
||||
|
||||
I hope you find bzip2 useful. Feel free to contact me at
|
||||
jseward@acm.org
|
||||
if you have any suggestions or queries. Many people mailed me with
|
||||
comments, suggestions and patches after the releases of bzip-0.15,
|
||||
bzip-0.21, bzip2-0.1pl2 and bzip2-0.9.0, and the changes in bzip2 are
|
||||
largely a result of this feedback. I thank you for your comments.
|
||||
|
||||
At least for the time being, bzip2's "home" is (or can be reached via)
|
||||
http://www.muraroa.demon.co.uk.
|
||||
|
||||
Julian Seward
|
||||
jseward@acm.org
|
||||
|
||||
Cambridge, UK
|
||||
18 July 1996 (version 0.15)
|
||||
25 August 1996 (version 0.21)
|
||||
7 August 1997 (bzip2, version 0.1)
|
||||
29 August 1997 (bzip2, version 0.1pl2)
|
||||
23 August 1998 (bzip2, version 0.9.0)
|
||||
8 June 1999 (bzip2, version 0.9.5)
|
||||
4 Sept 1999 (bzip2, version 0.9.5d)
|
||||
5 May 2000 (bzip2, version 1.0pre8)
|
|
@ -1,130 +0,0 @@
|
|||
|
||||
bzip2-1.0 should compile without problems on the vast majority of
|
||||
platforms. Using the supplied Makefile, I've built and tested it
|
||||
myself for x86-linux, sparc-solaris, alpha-linux, x86-cygwin32 and
|
||||
alpha-tru64unix. With makefile.msc, Visual C++ 6.0 and nmake, you can
|
||||
build a native Win32 version too. Large file support seems to work
|
||||
correctly on at least alpha-tru64unix and x86-cygwin32 (on Windows
|
||||
2000).
|
||||
|
||||
When I say "large file" I mean a file of size 2,147,483,648 (2^31)
|
||||
bytes or above. Many older OSs can't handle files above this size,
|
||||
but many newer ones can. Large files are pretty huge -- most files
|
||||
you'll encounter are not Large Files.
|
||||
|
||||
Earlier versions of bzip2 (0.1, 0.9.0, 0.9.5) compiled on a wide
|
||||
variety of platforms without difficulty, and I hope this version will
|
||||
continue in that tradition. However, in order to support large files,
|
||||
I've had to include the define -D_FILE_OFFSET_BITS=64 in the Makefile.
|
||||
This can cause problems.
|
||||
|
||||
The technique of adding -D_FILE_OFFSET_BITS=64 to get large file
|
||||
support is, as far as I know, the Recommended Way to get correct large
|
||||
file support. For more details, see the Large File Support
|
||||
Specification, published by the Large File Summit, at
|
||||
http://www.sas.com/standard/large.file/
|
||||
|
||||
As a general comment, if you get compilation errors which you think
|
||||
are related to large file support, try removing the above define from
|
||||
the Makefile, ie, delete the line
|
||||
BIGFILES=-D_FILE_OFFSET_BITS=64
|
||||
from the Makefile, and do 'make clean ; make'. This will give you a
|
||||
version of bzip2 without large file support, which, for most
|
||||
applications, is probably not a problem.
|
||||
|
||||
Alternatively, try some of the platform-specific hints listed below.
|
||||
|
||||
You can use the spewG.c program to generate huge files to test bzip2's
|
||||
large file support, if you are feeling paranoid. Be aware though that
|
||||
any compilation problems which affect bzip2 will also affect spewG.c,
|
||||
alas.
|
||||
|
||||
|
||||
Known problems as of 1.0pre8:
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
* HP/UX 10.20 and 11.00, using gcc (2.7.2.3 and 2.95.2): A large
|
||||
number of warnings appear, including the following:
|
||||
|
||||
/usr/include/sys/resource.h: In function `getrlimit':
|
||||
/usr/include/sys/resource.h:168:
|
||||
warning: implicit declaration of function `__getrlimit64'
|
||||
/usr/include/sys/resource.h: In function `setrlimit':
|
||||
/usr/include/sys/resource.h:170:
|
||||
warning: implicit declaration of function `__setrlimit64'
|
||||
|
||||
This would appear to be a problem with large file support, header
|
||||
files and gcc. gcc may or may not give up at this point. If it
|
||||
fails, you might be able to improve matters by adding
|
||||
-D__STDC_EXT__=1
|
||||
to the BIGFILES variable in the Makefile (ie, change its definition
|
||||
to
|
||||
BIGFILES=-D_FILE_OFFSET_BITS=64 -D__STDC_EXT__=1
|
||||
|
||||
Even if gcc does produce a binary which appears to work (ie passes
|
||||
its self-tests), you might want to test it to see if it works properly
|
||||
on large files.
|
||||
|
||||
|
||||
* HP/UX 10.20 and 11.00, using HP's cc compiler.
|
||||
|
||||
No specific problems for this combination, except that you'll need to
|
||||
specify the -Ae flag, and zap the gcc-specific stuff
|
||||
-Wall -Winline -O2 -fomit-frame-pointer -fno-strength-reduce.
|
||||
You should retain -D_FILE_OFFSET_BITS=64 in order to get large
|
||||
file support -- which is reported to work ok for this HP/UX + cc
|
||||
combination.
|
||||
|
||||
|
||||
* SunOS 4.1.X.
|
||||
|
||||
Amazingly, there are still people out there using this venerable old
|
||||
banger. I shouldn't be too rude -- I started life on SunOS, and
|
||||
it was a pretty darn good OS, way back then. Anyway:
|
||||
|
||||
SunOS doesn't seem to have strerror(), so you'll have to use
|
||||
perror(), perhaps by doing adding this (warning: UNTESTED CODE):
|
||||
|
||||
char* strerror ( int errnum )
|
||||
{
|
||||
if (errnum < 0 || errnum >= sys_nerr)
|
||||
return "Unknown error";
|
||||
else
|
||||
return sys_errlist[errnum];
|
||||
}
|
||||
|
||||
Or you could comment out the relevant calls to strerror; they're
|
||||
not mission-critical. Or you could upgrade to Solaris. Ha ha ha!
|
||||
(what?? you think I've got Bad Attitude?)
|
||||
|
||||
|
||||
* Making a shared library on Solaris. (Not really a compilation
|
||||
problem, but many people ask ...)
|
||||
|
||||
Firstly, if you have Solaris 8, either you have libbz2.so already
|
||||
on your system, or you can install it from the Solaris CD.
|
||||
|
||||
Secondly, be aware that there are potential naming conflicts
|
||||
between the .so file supplied with Solaris 8, and the .so file
|
||||
which Makefile-libbz2_so will make. Makefile-libbz2_so creates
|
||||
a .so which has the names which I intend to be "official" as
|
||||
of version 1.0.0 and onwards. Unfortunately, the .so in
|
||||
Solaris 8 appeared before I decided on the final names, so
|
||||
the two libraries are incompatible. We have since communicated
|
||||
and I hope that the problems will have been solved in the next
|
||||
version of Solaris, whenever that might appear.
|
||||
|
||||
All that said: you might be able to get somewhere
|
||||
by finding the line in Makefile-libbz2_so which says
|
||||
|
||||
$(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.1 $(OBJS)
|
||||
|
||||
and replacing with
|
||||
|
||||
($CC) -G -shared -o libbz2.so.1.0.1 -h libbz2.so.1.0 $(OBJS)
|
||||
|
||||
If gcc objects to the combination -fpic -fPIC, get rid of
|
||||
the second one, leaving just "-fpic".
|
||||
|
||||
|
||||
That's the end of the currently known compilation problems.
|
|
@ -1,34 +0,0 @@
|
|||
|
||||
Y2K status of bzip2 and libbzip2, versions 0.1, 0.9.0 and 0.9.5
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Informally speaking:
|
||||
bzip2 is a compression program built on top of libbzip2,
|
||||
a library which does the real work of compression and
|
||||
decompression. As far as I am aware, libbzip2 does not have
|
||||
any date-related code at all.
|
||||
|
||||
bzip2 itself copies dates from source to destination files
|
||||
when compressing or decompressing, using the 'stat' and 'utime'
|
||||
UNIX system calls. It doesn't examine, manipulate or store the
|
||||
dates in any way. So as far as I can see, there shouldn't be any
|
||||
problem with bzip2 providing 'stat' and 'utime' work correctly
|
||||
on your system.
|
||||
|
||||
On non-unix platforms (those for which BZ_UNIX in bzip2.c is
|
||||
not set to 1), bzip2 doesn't even do the date copying.
|
||||
|
||||
Overall, informally speaking, I don't think bzip2 or libbzip2
|
||||
have a Y2K problem.
|
||||
|
||||
Formally speaking:
|
||||
I am not prepared to offer you any assurance whatsoever
|
||||
regarding Y2K issues in my software. You alone assume the
|
||||
entire risk of using the software. The disclaimer of liability
|
||||
in the LICENSE file in the bzip2 source distribution continues
|
||||
to apply on this issue as with every other issue pertaining
|
||||
to the software.
|
||||
|
||||
Julian Seward
|
||||
Cambridge, UK
|
||||
25 August 1999
|
|
@ -1,137 +0,0 @@
|
|||
dnl aclocal.m4 generated automatically by aclocal 1.4
|
||||
|
||||
dnl Copyright (C) 1994, 1995-8, 1999 Free Software Foundation, Inc.
|
||||
dnl This file is free software; the Free Software Foundation
|
||||
dnl gives unlimited permission to copy and/or distribute it,
|
||||
dnl with or without modifications, as long as this notice is preserved.
|
||||
|
||||
dnl This program is distributed in the hope that it will be useful,
|
||||
dnl but WITHOUT ANY WARRANTY, to the extent permitted by law; without
|
||||
dnl even the implied warranty of MERCHANTABILITY or FITNESS FOR A
|
||||
dnl PARTICULAR PURPOSE.
|
||||
|
||||
# Do all the work for Automake. This macro actually does too much --
|
||||
# some checks are only needed if your package does certain things.
|
||||
# But this isn't really a big deal.
|
||||
|
||||
# serial 1
|
||||
|
||||
dnl Usage:
|
||||
dnl AM_INIT_AUTOMAKE(package,version, [no-define])
|
||||
|
||||
AC_DEFUN(AM_INIT_AUTOMAKE,
|
||||
[AC_REQUIRE([AC_PROG_INSTALL])
|
||||
PACKAGE=[$1]
|
||||
AC_SUBST(PACKAGE)
|
||||
VERSION=[$2]
|
||||
AC_SUBST(VERSION)
|
||||
dnl test to see if srcdir already configured
|
||||
if test "`cd $srcdir && pwd`" != "`pwd`" && test -f $srcdir/config.status; then
|
||||
AC_MSG_ERROR([source directory already configured; run "make distclean" there first])
|
||||
fi
|
||||
ifelse([$3],,
|
||||
AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE", [Name of package])
|
||||
AC_DEFINE_UNQUOTED(VERSION, "$VERSION", [Version number of package]))
|
||||
AC_REQUIRE([AM_SANITY_CHECK])
|
||||
AC_REQUIRE([AC_ARG_PROGRAM])
|
||||
dnl FIXME This is truly gross.
|
||||
missing_dir=`cd $ac_aux_dir && pwd`
|
||||
AM_MISSING_PROG(ACLOCAL, aclocal, $missing_dir)
|
||||
AM_MISSING_PROG(AUTOCONF, autoconf, $missing_dir)
|
||||
AM_MISSING_PROG(AUTOMAKE, automake, $missing_dir)
|
||||
AM_MISSING_PROG(AUTOHEADER, autoheader, $missing_dir)
|
||||
AM_MISSING_PROG(MAKEINFO, makeinfo, $missing_dir)
|
||||
AC_REQUIRE([AC_PROG_MAKE_SET])])
|
||||
|
||||
#
|
||||
# Check to make sure that the build environment is sane.
|
||||
#
|
||||
|
||||
AC_DEFUN(AM_SANITY_CHECK,
|
||||
[AC_MSG_CHECKING([whether build environment is sane])
|
||||
# Just in case
|
||||
sleep 1
|
||||
echo timestamp > conftestfile
|
||||
# Do `set' in a subshell so we don't clobber the current shell's
|
||||
# arguments. Must try -L first in case configure is actually a
|
||||
# symlink; some systems play weird games with the mod time of symlinks
|
||||
# (eg FreeBSD returns the mod time of the symlink's containing
|
||||
# directory).
|
||||
if (
|
||||
set X `ls -Lt $srcdir/configure conftestfile 2> /dev/null`
|
||||
if test "[$]*" = "X"; then
|
||||
# -L didn't work.
|
||||
set X `ls -t $srcdir/configure conftestfile`
|
||||
fi
|
||||
if test "[$]*" != "X $srcdir/configure conftestfile" \
|
||||
&& test "[$]*" != "X conftestfile $srcdir/configure"; then
|
||||
|
||||
# If neither matched, then we have a broken ls. This can happen
|
||||
# if, for instance, CONFIG_SHELL is bash and it inherits a
|
||||
# broken ls alias from the environment. This has actually
|
||||
# happened. Such a system could not be considered "sane".
|
||||
AC_MSG_ERROR([ls -t appears to fail. Make sure there is not a broken
|
||||
alias in your environment])
|
||||
fi
|
||||
|
||||
test "[$]2" = conftestfile
|
||||
)
|
||||
then
|
||||
# Ok.
|
||||
:
|
||||
else
|
||||
AC_MSG_ERROR([newly created file is older than distributed files!
|
||||
Check your system clock])
|
||||
fi
|
||||
rm -f conftest*
|
||||
AC_MSG_RESULT(yes)])
|
||||
|
||||
dnl AM_MISSING_PROG(NAME, PROGRAM, DIRECTORY)
|
||||
dnl The program must properly implement --version.
|
||||
AC_DEFUN(AM_MISSING_PROG,
|
||||
[AC_MSG_CHECKING(for working $2)
|
||||
# Run test in a subshell; some versions of sh will print an error if
|
||||
# an executable is not found, even if stderr is redirected.
|
||||
# Redirect stdin to placate older versions of autoconf. Sigh.
|
||||
if ($2 --version) < /dev/null > /dev/null 2>&1; then
|
||||
$1=$2
|
||||
AC_MSG_RESULT(found)
|
||||
else
|
||||
$1="$3/missing $2"
|
||||
AC_MSG_RESULT(missing)
|
||||
fi
|
||||
AC_SUBST($1)])
|
||||
|
||||
# Add --enable-maintainer-mode option to configure.
|
||||
# From Jim Meyering
|
||||
|
||||
# serial 1
|
||||
|
||||
AC_DEFUN(AM_MAINTAINER_MODE,
|
||||
[AC_MSG_CHECKING([whether to enable maintainer-specific portions of Makefiles])
|
||||
dnl maintainer-mode is disabled by default
|
||||
AC_ARG_ENABLE(maintainer-mode,
|
||||
[ --enable-maintainer-mode enable make rules and dependencies not useful
|
||||
(and sometimes confusing) to the casual installer],
|
||||
USE_MAINTAINER_MODE=$enableval,
|
||||
USE_MAINTAINER_MODE=no)
|
||||
AC_MSG_RESULT($USE_MAINTAINER_MODE)
|
||||
AM_CONDITIONAL(MAINTAINER_MODE, test $USE_MAINTAINER_MODE = yes)
|
||||
MAINT=$MAINTAINER_MODE_TRUE
|
||||
AC_SUBST(MAINT)dnl
|
||||
]
|
||||
)
|
||||
|
||||
# Define a conditional.
|
||||
|
||||
AC_DEFUN(AM_CONDITIONAL,
|
||||
[AC_SUBST($1_TRUE)
|
||||
AC_SUBST($1_FALSE)
|
||||
if $2; then
|
||||
$1_TRUE=
|
||||
$1_FALSE='#'
|
||||
else
|
||||
$1_TRUE='#'
|
||||
$1_FALSE=
|
||||
fi])
|
||||
|
File diff suppressed because it is too large
Load Diff
|
@ -1,439 +0,0 @@
|
|||
.PU
|
||||
.TH bzip2 1
|
||||
.SH NAME
|
||||
bzip2, bunzip2 \- a block-sorting file compressor, v1.0
|
||||
.br
|
||||
bzcat \- decompresses files to stdout
|
||||
.br
|
||||
bzip2recover \- recovers data from damaged bzip2 files
|
||||
|
||||
.SH SYNOPSIS
|
||||
.ll +8
|
||||
.B bzip2
|
||||
.RB [ " \-cdfkqstvzVL123456789 " ]
|
||||
[
|
||||
.I "filenames \&..."
|
||||
]
|
||||
.ll -8
|
||||
.br
|
||||
.B bunzip2
|
||||
.RB [ " \-fkvsVL " ]
|
||||
[
|
||||
.I "filenames \&..."
|
||||
]
|
||||
.br
|
||||
.B bzcat
|
||||
.RB [ " \-s " ]
|
||||
[
|
||||
.I "filenames \&..."
|
||||
]
|
||||
.br
|
||||
.B bzip2recover
|
||||
.I "filename"
|
||||
|
||||
.SH DESCRIPTION
|
||||
.I bzip2
|
||||
compresses files using the Burrows-Wheeler block sorting
|
||||
text compression algorithm, and Huffman coding. Compression is
|
||||
generally considerably better than that achieved by more conventional
|
||||
LZ77/LZ78-based compressors, and approaches the performance of the PPM
|
||||
family of statistical compressors.
|
||||
|
||||
The command-line options are deliberately very similar to
|
||||
those of
|
||||
.I GNU gzip,
|
||||
but they are not identical.
|
||||
|
||||
.I bzip2
|
||||
expects a list of file names to accompany the
|
||||
command-line flags. Each file is replaced by a compressed version of
|
||||
itself, with the name "original_name.bz2".
|
||||
Each compressed file
|
||||
has the same modification date, permissions, and, when possible,
|
||||
ownership as the corresponding original, so that these properties can
|
||||
be correctly restored at decompression time. File name handling is
|
||||
naive in the sense that there is no mechanism for preserving original
|
||||
file names, permissions, ownerships or dates in filesystems which lack
|
||||
these concepts, or have serious file name length restrictions, such as
|
||||
MS-DOS.
|
||||
|
||||
.I bzip2
|
||||
and
|
||||
.I bunzip2
|
||||
will by default not overwrite existing
|
||||
files. If you want this to happen, specify the \-f flag.
|
||||
|
||||
If no file names are specified,
|
||||
.I bzip2
|
||||
compresses from standard
|
||||
input to standard output. In this case,
|
||||
.I bzip2
|
||||
will decline to
|
||||
write compressed output to a terminal, as this would be entirely
|
||||
incomprehensible and therefore pointless.
|
||||
|
||||
.I bunzip2
|
||||
(or
|
||||
.I bzip2 \-d)
|
||||
decompresses all
|
||||
specified files. Files which were not created by
|
||||
.I bzip2
|
||||
will be detected and ignored, and a warning issued.
|
||||
.I bzip2
|
||||
attempts to guess the filename for the decompressed file
|
||||
from that of the compressed file as follows:
|
||||
|
||||
filename.bz2 becomes filename
|
||||
filename.bz becomes filename
|
||||
filename.tbz2 becomes filename.tar
|
||||
filename.tbz becomes filename.tar
|
||||
anyothername becomes anyothername.out
|
||||
|
||||
If the file does not end in one of the recognised endings,
|
||||
.I .bz2,
|
||||
.I .bz,
|
||||
.I .tbz2
|
||||
or
|
||||
.I .tbz,
|
||||
.I bzip2
|
||||
complains that it cannot
|
||||
guess the name of the original file, and uses the original name
|
||||
with
|
||||
.I .out
|
||||
appended.
|
||||
|
||||
As with compression, supplying no
|
||||
filenames causes decompression from
|
||||
standard input to standard output.
|
||||
|
||||
.I bunzip2
|
||||
will correctly decompress a file which is the
|
||||
concatenation of two or more compressed files. The result is the
|
||||
concatenation of the corresponding uncompressed files. Integrity
|
||||
testing (\-t)
|
||||
of concatenated
|
||||
compressed files is also supported.
|
||||
|
||||
You can also compress or decompress files to the standard output by
|
||||
giving the \-c flag. Multiple files may be compressed and
|
||||
decompressed like this. The resulting outputs are fed sequentially to
|
||||
stdout. Compression of multiple files
|
||||
in this manner generates a stream
|
||||
containing multiple compressed file representations. Such a stream
|
||||
can be decompressed correctly only by
|
||||
.I bzip2
|
||||
version 0.9.0 or
|
||||
later. Earlier versions of
|
||||
.I bzip2
|
||||
will stop after decompressing
|
||||
the first file in the stream.
|
||||
|
||||
.I bzcat
|
||||
(or
|
||||
.I bzip2 -dc)
|
||||
decompresses all specified files to
|
||||
the standard output.
|
||||
|
||||
.I bzip2
|
||||
will read arguments from the environment variables
|
||||
.I BZIP2
|
||||
and
|
||||
.I BZIP,
|
||||
in that order, and will process them
|
||||
before any arguments read from the command line. This gives a
|
||||
convenient way to supply default arguments.
|
||||
|
||||
Compression is always performed, even if the compressed
|
||||
file is slightly
|
||||
larger than the original. Files of less than about one hundred bytes
|
||||
tend to get larger, since the compression mechanism has a constant
|
||||
overhead in the region of 50 bytes. Random data (including the output
|
||||
of most file compressors) is coded at about 8.05 bits per byte, giving
|
||||
an expansion of around 0.5%.
|
||||
|
||||
As a self-check for your protection,
|
||||
.I
|
||||
bzip2
|
||||
uses 32-bit CRCs to
|
||||
make sure that the decompressed version of a file is identical to the
|
||||
original. This guards against corruption of the compressed data, and
|
||||
against undetected bugs in
|
||||
.I bzip2
|
||||
(hopefully very unlikely). The
|
||||
chances of data corruption going undetected is microscopic, about one
|
||||
chance in four billion for each file processed. Be aware, though, that
|
||||
the check occurs upon decompression, so it can only tell you that
|
||||
something is wrong. It can't help you
|
||||
recover the original uncompressed
|
||||
data. You can use
|
||||
.I bzip2recover
|
||||
to try to recover data from
|
||||
damaged files.
|
||||
|
||||
Return values: 0 for a normal exit, 1 for environmental problems (file
|
||||
not found, invalid flags, I/O errors, &c), 2 to indicate a corrupt
|
||||
compressed file, 3 for an internal consistency error (eg, bug) which
|
||||
caused
|
||||
.I bzip2
|
||||
to panic.
|
||||
|
||||
.SH OPTIONS
|
||||
.TP
|
||||
.B \-c --stdout
|
||||
Compress or decompress to standard output.
|
||||
.TP
|
||||
.B \-d --decompress
|
||||
Force decompression.
|
||||
.I bzip2,
|
||||
.I bunzip2
|
||||
and
|
||||
.I bzcat
|
||||
are
|
||||
really the same program, and the decision about what actions to take is
|
||||
done on the basis of which name is used. This flag overrides that
|
||||
mechanism, and forces
|
||||
.I bzip2
|
||||
to decompress.
|
||||
.TP
|
||||
.B \-z --compress
|
||||
The complement to \-d: forces compression, regardless of the
|
||||
invokation name.
|
||||
.TP
|
||||
.B \-t --test
|
||||
Check integrity of the specified file(s), but don't decompress them.
|
||||
This really performs a trial decompression and throws away the result.
|
||||
.TP
|
||||
.B \-f --force
|
||||
Force overwrite of output files. Normally,
|
||||
.I bzip2
|
||||
will not overwrite
|
||||
existing output files. Also forces
|
||||
.I bzip2
|
||||
to break hard links
|
||||
to files, which it otherwise wouldn't do.
|
||||
.TP
|
||||
.B \-k --keep
|
||||
Keep (don't delete) input files during compression
|
||||
or decompression.
|
||||
.TP
|
||||
.B \-s --small
|
||||
Reduce memory usage, for compression, decompression and testing. Files
|
||||
are decompressed and tested using a modified algorithm which only
|
||||
requires 2.5 bytes per block byte. This means any file can be
|
||||
decompressed in 2300k of memory, albeit at about half the normal speed.
|
||||
|
||||
During compression, \-s selects a block size of 200k, which limits
|
||||
memory use to around the same figure, at the expense of your compression
|
||||
ratio. In short, if your machine is low on memory (8 megabytes or
|
||||
less), use \-s for everything. See MEMORY MANAGEMENT below.
|
||||
.TP
|
||||
.B \-q --quiet
|
||||
Suppress non-essential warning messages. Messages pertaining to
|
||||
I/O errors and other critical events will not be suppressed.
|
||||
.TP
|
||||
.B \-v --verbose
|
||||
Verbose mode -- show the compression ratio for each file processed.
|
||||
Further \-v's increase the verbosity level, spewing out lots of
|
||||
information which is primarily of interest for diagnostic purposes.
|
||||
.TP
|
||||
.B \-L --license -V --version
|
||||
Display the software version, license terms and conditions.
|
||||
.TP
|
||||
.B \-1 to \-9
|
||||
Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
|
||||
effect when decompressing. See MEMORY MANAGEMENT below.
|
||||
.TP
|
||||
.B \--
|
||||
Treats all subsequent arguments as file names, even if they start
|
||||
with a dash. This is so you can handle files with names beginning
|
||||
with a dash, for example: bzip2 \-- \-myfilename.
|
||||
.TP
|
||||
.B \--repetitive-fast --repetitive-best
|
||||
These flags are redundant in versions 0.9.5 and above. They provided
|
||||
some coarse control over the behaviour of the sorting algorithm in
|
||||
earlier versions, which was sometimes useful. 0.9.5 and above have an
|
||||
improved algorithm which renders these flags irrelevant.
|
||||
|
||||
.SH MEMORY MANAGEMENT
|
||||
.I bzip2
|
||||
compresses large files in blocks. The block size affects
|
||||
both the compression ratio achieved, and the amount of memory needed for
|
||||
compression and decompression. The flags \-1 through \-9
|
||||
specify the block size to be 100,000 bytes through 900,000 bytes (the
|
||||
default) respectively. At decompression time, the block size used for
|
||||
compression is read from the header of the compressed file, and
|
||||
.I bunzip2
|
||||
then allocates itself just enough memory to decompress
|
||||
the file. Since block sizes are stored in compressed files, it follows
|
||||
that the flags \-1 to \-9 are irrelevant to and so ignored
|
||||
during decompression.
|
||||
|
||||
Compression and decompression requirements,
|
||||
in bytes, can be estimated as:
|
||||
|
||||
Compression: 400k + ( 8 x block size )
|
||||
|
||||
Decompression: 100k + ( 4 x block size ), or
|
||||
100k + ( 2.5 x block size )
|
||||
|
||||
Larger block sizes give rapidly diminishing marginal returns. Most of
|
||||
the compression comes from the first two or three hundred k of block
|
||||
size, a fact worth bearing in mind when using
|
||||
.I bzip2
|
||||
on small machines.
|
||||
It is also important to appreciate that the decompression memory
|
||||
requirement is set at compression time by the choice of block size.
|
||||
|
||||
For files compressed with the default 900k block size,
|
||||
.I bunzip2
|
||||
will require about 3700 kbytes to decompress. To support decompression
|
||||
of any file on a 4 megabyte machine,
|
||||
.I bunzip2
|
||||
has an option to
|
||||
decompress using approximately half this amount of memory, about 2300
|
||||
kbytes. Decompression speed is also halved, so you should use this
|
||||
option only where necessary. The relevant flag is -s.
|
||||
|
||||
In general, try and use the largest block size memory constraints allow,
|
||||
since that maximises the compression achieved. Compression and
|
||||
decompression speed are virtually unaffected by block size.
|
||||
|
||||
Another significant point applies to files which fit in a single block
|
||||
-- that means most files you'd encounter using a large block size. The
|
||||
amount of real memory touched is proportional to the size of the file,
|
||||
since the file is smaller than a block. For example, compressing a file
|
||||
20,000 bytes long with the flag -9 will cause the compressor to
|
||||
allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560
|
||||
kbytes of it. Similarly, the decompressor will allocate 3700k but only
|
||||
touch 100k + 20000 * 4 = 180 kbytes.
|
||||
|
||||
Here is a table which summarises the maximum memory usage for different
|
||||
block sizes. Also recorded is the total compressed size for 14 files of
|
||||
the Calgary Text Compression Corpus totalling 3,141,622 bytes. This
|
||||
column gives some feel for how compression varies with block size.
|
||||
These figures tend to understate the advantage of larger block sizes for
|
||||
larger files, since the Corpus is dominated by smaller files.
|
||||
|
||||
Compress Decompress Decompress Corpus
|
||||
Flag usage usage -s usage Size
|
||||
|
||||
-1 1200k 500k 350k 914704
|
||||
-2 2000k 900k 600k 877703
|
||||
-3 2800k 1300k 850k 860338
|
||||
-4 3600k 1700k 1100k 846899
|
||||
-5 4400k 2100k 1350k 845160
|
||||
-6 5200k 2500k 1600k 838626
|
||||
-7 6100k 2900k 1850k 834096
|
||||
-8 6800k 3300k 2100k 828642
|
||||
-9 7600k 3700k 2350k 828642
|
||||
|
||||
.SH RECOVERING DATA FROM DAMAGED FILES
|
||||
.I bzip2
|
||||
compresses files in blocks, usually 900kbytes long. Each
|
||||
block is handled independently. If a media or transmission error causes
|
||||
a multi-block .bz2
|
||||
file to become damaged, it may be possible to
|
||||
recover data from the undamaged blocks in the file.
|
||||
|
||||
The compressed representation of each block is delimited by a 48-bit
|
||||
pattern, which makes it possible to find the block boundaries with
|
||||
reasonable certainty. Each block also carries its own 32-bit CRC, so
|
||||
damaged blocks can be distinguished from undamaged ones.
|
||||
|
||||
.I bzip2recover
|
||||
is a simple program whose purpose is to search for
|
||||
blocks in .bz2 files, and write each block out into its own .bz2
|
||||
file. You can then use
|
||||
.I bzip2
|
||||
\-t
|
||||
to test the
|
||||
integrity of the resulting files, and decompress those which are
|
||||
undamaged.
|
||||
|
||||
.I bzip2recover
|
||||
takes a single argument, the name of the damaged file,
|
||||
and writes a number of files "rec0001file.bz2",
|
||||
"rec0002file.bz2", etc, containing the extracted blocks.
|
||||
The output filenames are designed so that the use of
|
||||
wildcards in subsequent processing -- for example,
|
||||
"bzip2 -dc rec*file.bz2 > recovered_data" -- lists the files in
|
||||
the correct order.
|
||||
|
||||
.I bzip2recover
|
||||
should be of most use dealing with large .bz2
|
||||
files, as these will contain many blocks. It is clearly
|
||||
futile to use it on damaged single-block files, since a
|
||||
damaged block cannot be recovered. If you wish to minimise
|
||||
any potential data loss through media or transmission errors,
|
||||
you might consider compressing with a smaller
|
||||
block size.
|
||||
|
||||
.SH PERFORMANCE NOTES
|
||||
The sorting phase of compression gathers together similar strings in the
|
||||
file. Because of this, files containing very long runs of repeated
|
||||
symbols, like "aabaabaabaab ..." (repeated several hundred times) may
|
||||
compress more slowly than normal. Versions 0.9.5 and above fare much
|
||||
better than previous versions in this respect. The ratio between
|
||||
worst-case and average-case compression time is in the region of 10:1.
|
||||
For previous versions, this figure was more like 100:1. You can use the
|
||||
\-vvvv option to monitor progress in great detail, if you want.
|
||||
|
||||
Decompression speed is unaffected by these phenomena.
|
||||
|
||||
.I bzip2
|
||||
usually allocates several megabytes of memory to operate
|
||||
in, and then charges all over it in a fairly random fashion. This means
|
||||
that performance, both for compressing and decompressing, is largely
|
||||
determined by the speed at which your machine can service cache misses.
|
||||
Because of this, small changes to the code to reduce the miss rate have
|
||||
been observed to give disproportionately large performance improvements.
|
||||
I imagine
|
||||
.I bzip2
|
||||
will perform best on machines with very large caches.
|
||||
|
||||
.SH CAVEATS
|
||||
I/O error messages are not as helpful as they could be.
|
||||
.I bzip2
|
||||
tries hard to detect I/O errors and exit cleanly, but the details of
|
||||
what the problem is sometimes seem rather misleading.
|
||||
|
||||
This manual page pertains to version 1.0 of
|
||||
.I bzip2.
|
||||
Compressed
|
||||
data created by this version is entirely forwards and backwards
|
||||
compatible with the previous public releases, versions 0.1pl2, 0.9.0
|
||||
and 0.9.5,
|
||||
but with the following exception: 0.9.0 and above can correctly
|
||||
decompress multiple concatenated compressed files. 0.1pl2 cannot do
|
||||
this; it will stop after decompressing just the first file in the
|
||||
stream.
|
||||
|
||||
.I bzip2recover
|
||||
uses 32-bit integers to represent bit positions in
|
||||
compressed files, so it cannot handle compressed files more than 512
|
||||
megabytes long. This could easily be fixed.
|
||||
|
||||
.SH AUTHOR
|
||||
Julian Seward, jseward@acm.org.
|
||||
|
||||
http://sourceware.cygnus.com/bzip2
|
||||
http://www.muraroa.demon.co.uk
|
||||
|
||||
The ideas embodied in
|
||||
.I bzip2
|
||||
are due to (at least) the following
|
||||
people: Michael Burrows and David Wheeler (for the block sorting
|
||||
transformation), David Wheeler (again, for the Huffman coder), Peter
|
||||
Fenwick (for the structured coding model in the original
|
||||
.I bzip,
|
||||
and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
|
||||
(for the arithmetic coder in the original
|
||||
.I bzip).
|
||||
I am much
|
||||
indebted for their help, support and advice. See the manual in the
|
||||
source distribution for pointers to sources of documentation. Christian
|
||||
von Roques encouraged me to look for faster sorting algorithms, so as to
|
||||
speed up compression. Bela Lubkin encouraged me to improve the
|
||||
worst-case compression performance. Many people sent patches, helped
|
||||
with portability problems, lent machines, gave advice and were generally
|
||||
helpful.
|
|
@ -1,462 +0,0 @@
|
|||
|
||||
|
||||
|
||||
bzip2(1) bzip2(1)
|
||||
|
||||
|
||||
NNAAMMEE
|
||||
bzip2, bunzip2 - a block-sorting file compressor, v1.0
|
||||
bzcat - decompresses files to stdout
|
||||
bzip2recover - recovers data from damaged bzip2 files
|
||||
|
||||
|
||||
SSYYNNOOPPSSIISS
|
||||
bbzziipp22 [ --ccddffkkqqssttvvzzVVLL112233445566778899 ] [ _f_i_l_e_n_a_m_e_s _._._. ]
|
||||
bbuunnzziipp22 [ --ffkkvvssVVLL ] [ _f_i_l_e_n_a_m_e_s _._._. ]
|
||||
bbzzccaatt [ --ss ] [ _f_i_l_e_n_a_m_e_s _._._. ]
|
||||
bbzziipp22rreeccoovveerr _f_i_l_e_n_a_m_e
|
||||
|
||||
|
||||
DDEESSCCRRIIPPTTIIOONN
|
||||
_b_z_i_p_2 compresses files using the Burrows-Wheeler block
|
||||
sorting text compression algorithm, and Huffman coding.
|
||||
Compression is generally considerably better than that
|
||||
achieved by more conventional LZ77/LZ78-based compressors,
|
||||
and approaches the performance of the PPM family of sta-
|
||||
tistical compressors.
|
||||
|
||||
The command-line options are deliberately very similar to
|
||||
those of _G_N_U _g_z_i_p_, but they are not identical.
|
||||
|
||||
_b_z_i_p_2 expects a list of file names to accompany the com-
|
||||
mand-line flags. Each file is replaced by a compressed
|
||||
version of itself, with the name "original_name.bz2".
|
||||
Each compressed file has the same modification date, per-
|
||||
missions, and, when possible, ownership as the correspond-
|
||||
ing original, so that these properties can be correctly
|
||||
restored at decompression time. File name handling is
|
||||
naive in the sense that there is no mechanism for preserv-
|
||||
ing original file names, permissions, ownerships or dates
|
||||
in filesystems which lack these concepts, or have serious
|
||||
file name length restrictions, such as MS-DOS.
|
||||
|
||||
_b_z_i_p_2 and _b_u_n_z_i_p_2 will by default not overwrite existing
|
||||
files. If you want this to happen, specify the -f flag.
|
||||
|
||||
If no file names are specified, _b_z_i_p_2 compresses from
|
||||
standard input to standard output. In this case, _b_z_i_p_2
|
||||
will decline to write compressed output to a terminal, as
|
||||
this would be entirely incomprehensible and therefore
|
||||
pointless.
|
||||
|
||||
_b_u_n_z_i_p_2 (or _b_z_i_p_2 _-_d_) decompresses all specified files.
|
||||
Files which were not created by _b_z_i_p_2 will be detected and
|
||||
ignored, and a warning issued. _b_z_i_p_2 attempts to guess
|
||||
the filename for the decompressed file from that of the
|
||||
compressed file as follows:
|
||||
|
||||
filename.bz2 becomes filename
|
||||
filename.bz becomes filename
|
||||
filename.tbz2 becomes filename.tar
|
||||
|
||||
|
||||
|
||||
1
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
bzip2(1) bzip2(1)
|
||||
|
||||
|
||||
filename.tbz becomes filename.tar
|
||||
anyothername becomes anyothername.out
|
||||
|
||||
If the file does not end in one of the recognised endings,
|
||||
_._b_z_2_, _._b_z_, _._t_b_z_2 or _._t_b_z_, _b_z_i_p_2 complains that it cannot
|
||||
guess the name of the original file, and uses the original
|
||||
name with _._o_u_t appended.
|
||||
|
||||
As with compression, supplying no filenames causes decom-
|
||||
pression from standard input to standard output.
|
||||
|
||||
_b_u_n_z_i_p_2 will correctly decompress a file which is the con-
|
||||
catenation of two or more compressed files. The result is
|
||||
the concatenation of the corresponding uncompressed files.
|
||||
Integrity testing (-t) of concatenated compressed files is
|
||||
also supported.
|
||||
|
||||
You can also compress or decompress files to the standard
|
||||
output by giving the -c flag. Multiple files may be com-
|
||||
pressed and decompressed like this. The resulting outputs
|
||||
are fed sequentially to stdout. Compression of multiple
|
||||
files in this manner generates a stream containing multi-
|
||||
ple compressed file representations. Such a stream can be
|
||||
decompressed correctly only by _b_z_i_p_2 version 0.9.0 or
|
||||
later. Earlier versions of _b_z_i_p_2 will stop after decom-
|
||||
pressing the first file in the stream.
|
||||
|
||||
_b_z_c_a_t (or _b_z_i_p_2 _-_d_c_) decompresses all specified files to
|
||||
the standard output.
|
||||
|
||||
_b_z_i_p_2 will read arguments from the environment variables
|
||||
_B_Z_I_P_2 and _B_Z_I_P_, in that order, and will process them
|
||||
before any arguments read from the command line. This
|
||||
gives a convenient way to supply default arguments.
|
||||
|
||||
Compression is always performed, even if the compressed
|
||||
file is slightly larger than the original. Files of less
|
||||
than about one hundred bytes tend to get larger, since the
|
||||
compression mechanism has a constant overhead in the
|
||||
region of 50 bytes. Random data (including the output of
|
||||
most file compressors) is coded at about 8.05 bits per
|
||||
byte, giving an expansion of around 0.5%.
|
||||
|
||||
As a self-check for your protection, _b_z_i_p_2 uses 32-bit
|
||||
CRCs to make sure that the decompressed version of a file
|
||||
is identical to the original. This guards against corrup-
|
||||
tion of the compressed data, and against undetected bugs
|
||||
in _b_z_i_p_2 (hopefully very unlikely). The chances of data
|
||||
corruption going undetected is microscopic, about one
|
||||
chance in four billion for each file processed. Be aware,
|
||||
though, that the check occurs upon decompression, so it
|
||||
can only tell you that something is wrong. It can't help
|
||||
you recover the original uncompressed data. You can use
|
||||
_b_z_i_p_2_r_e_c_o_v_e_r to try to recover data from damaged files.
|
||||
|
||||
|
||||
|
||||
2
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
bzip2(1) bzip2(1)
|
||||
|
||||
|
||||
Return values: 0 for a normal exit, 1 for environmental
|
||||
problems (file not found, invalid flags, I/O errors, &c),
|
||||
2 to indicate a corrupt compressed file, 3 for an internal
|
||||
consistency error (eg, bug) which caused _b_z_i_p_2 to panic.
|
||||
|
||||
|
||||
OOPPTTIIOONNSS
|
||||
--cc ----ssttddoouutt
|
||||
Compress or decompress to standard output.
|
||||
|
||||
--dd ----ddeeccoommpprreessss
|
||||
Force decompression. _b_z_i_p_2_, _b_u_n_z_i_p_2 and _b_z_c_a_t are
|
||||
really the same program, and the decision about
|
||||
what actions to take is done on the basis of which
|
||||
name is used. This flag overrides that mechanism,
|
||||
and forces _b_z_i_p_2 to decompress.
|
||||
|
||||
--zz ----ccoommpprreessss
|
||||
The complement to -d: forces compression, regard-
|
||||
less of the invokation name.
|
||||
|
||||
--tt ----tteesstt
|
||||
Check integrity of the specified file(s), but don't
|
||||
decompress them. This really performs a trial
|
||||
decompression and throws away the result.
|
||||
|
||||
--ff ----ffoorrccee
|
||||
Force overwrite of output files. Normally, _b_z_i_p_2
|
||||
will not overwrite existing output files. Also
|
||||
forces _b_z_i_p_2 to break hard links to files, which it
|
||||
otherwise wouldn't do.
|
||||
|
||||
--kk ----kkeeeepp
|
||||
Keep (don't delete) input files during compression
|
||||
or decompression.
|
||||
|
||||
--ss ----ssmmaallll
|
||||
Reduce memory usage, for compression, decompression
|
||||
and testing. Files are decompressed and tested
|
||||
using a modified algorithm which only requires 2.5
|
||||
bytes per block byte. This means any file can be
|
||||
decompressed in 2300k of memory, albeit at about
|
||||
half the normal speed.
|
||||
|
||||
During compression, -s selects a block size of
|
||||
200k, which limits memory use to around the same
|
||||
figure, at the expense of your compression ratio.
|
||||
In short, if your machine is low on memory (8
|
||||
megabytes or less), use -s for everything. See
|
||||
MEMORY MANAGEMENT below.
|
||||
|
||||
--qq ----qquuiieett
|
||||
Suppress non-essential warning messages. Messages
|
||||
pertaining to I/O errors and other critical events
|
||||
|
||||
|
||||
|
||||
3
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
bzip2(1) bzip2(1)
|
||||
|
||||
|
||||
will not be suppressed.
|
||||
|
||||
--vv ----vveerrbboossee
|
||||
Verbose mode -- show the compression ratio for each
|
||||
file processed. Further -v's increase the ver-
|
||||
bosity level, spewing out lots of information which
|
||||
is primarily of interest for diagnostic purposes.
|
||||
|
||||
--LL ----lliicceennssee --VV ----vveerrssiioonn
|
||||
Display the software version, license terms and
|
||||
conditions.
|
||||
|
||||
--11 ttoo --99
|
||||
Set the block size to 100 k, 200 k .. 900 k when
|
||||
compressing. Has no effect when decompressing.
|
||||
See MEMORY MANAGEMENT below.
|
||||
|
||||
---- Treats all subsequent arguments as file names, even
|
||||
if they start with a dash. This is so you can han-
|
||||
dle files with names beginning with a dash, for
|
||||
example: bzip2 -- -myfilename.
|
||||
|
||||
----rreeppeettiittiivvee--ffaasstt ----rreeppeettiittiivvee--bbeesstt
|
||||
These flags are redundant in versions 0.9.5 and
|
||||
above. They provided some coarse control over the
|
||||
behaviour of the sorting algorithm in earlier ver-
|
||||
sions, which was sometimes useful. 0.9.5 and above
|
||||
have an improved algorithm which renders these
|
||||
flags irrelevant.
|
||||
|
||||
|
||||
MMEEMMOORRYY MMAANNAAGGEEMMEENNTT
|
||||
_b_z_i_p_2 compresses large files in blocks. The block size
|
||||
affects both the compression ratio achieved, and the
|
||||
amount of memory needed for compression and decompression.
|
||||
The flags -1 through -9 specify the block size to be
|
||||
100,000 bytes through 900,000 bytes (the default) respec-
|
||||
tively. At decompression time, the block size used for
|
||||
compression is read from the header of the compressed
|
||||
file, and _b_u_n_z_i_p_2 then allocates itself just enough memory
|
||||
to decompress the file. Since block sizes are stored in
|
||||
compressed files, it follows that the flags -1 to -9 are
|
||||
irrelevant to and so ignored during decompression.
|
||||
|
||||
Compression and decompression requirements, in bytes, can
|
||||
be estimated as:
|
||||
|
||||
Compression: 400k + ( 8 x block size )
|
||||
|
||||
Decompression: 100k + ( 4 x block size ), or
|
||||
100k + ( 2.5 x block size )
|
||||
|
||||
Larger block sizes give rapidly diminishing marginal
|
||||
returns. Most of the compression comes from the first two
|
||||
|
||||
|
||||
|
||||
4
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
bzip2(1) bzip2(1)
|
||||
|
||||
|
||||
or three hundred k of block size, a fact worth bearing in
|
||||
mind when using _b_z_i_p_2 on small machines. It is also
|
||||
important to appreciate that the decompression memory
|
||||
requirement is set at compression time by the choice of
|
||||
block size.
|
||||
|
||||
For files compressed with the default 900k block size,
|
||||
_b_u_n_z_i_p_2 will require about 3700 kbytes to decompress. To
|
||||
support decompression of any file on a 4 megabyte machine,
|
||||
_b_u_n_z_i_p_2 has an option to decompress using approximately
|
||||
half this amount of memory, about 2300 kbytes. Decompres-
|
||||
sion speed is also halved, so you should use this option
|
||||
only where necessary. The relevant flag is -s.
|
||||
|
||||
In general, try and use the largest block size memory con-
|
||||
straints allow, since that maximises the compression
|
||||
achieved. Compression and decompression speed are virtu-
|
||||
ally unaffected by block size.
|
||||
|
||||
Another significant point applies to files which fit in a
|
||||
single block -- that means most files you'd encounter
|
||||
using a large block size. The amount of real memory
|
||||
touched is proportional to the size of the file, since the
|
||||
file is smaller than a block. For example, compressing a
|
||||
file 20,000 bytes long with the flag -9 will cause the
|
||||
compressor to allocate around 7600k of memory, but only
|
||||
touch 400k + 20000 * 8 = 560 kbytes of it. Similarly, the
|
||||
decompressor will allocate 3700k but only touch 100k +
|
||||
20000 * 4 = 180 kbytes.
|
||||
|
||||
Here is a table which summarises the maximum memory usage
|
||||
for different block sizes. Also recorded is the total
|
||||
compressed size for 14 files of the Calgary Text Compres-
|
||||
sion Corpus totalling 3,141,622 bytes. This column gives
|
||||
some feel for how compression varies with block size.
|
||||
These figures tend to understate the advantage of larger
|
||||
block sizes for larger files, since the Corpus is domi-
|
||||
nated by smaller files.
|
||||
|
||||
Compress Decompress Decompress Corpus
|
||||
Flag usage usage -s usage Size
|
||||
|
||||
-1 1200k 500k 350k 914704
|
||||
-2 2000k 900k 600k 877703
|
||||
-3 2800k 1300k 850k 860338
|
||||
-4 3600k 1700k 1100k 846899
|
||||
-5 4400k 2100k 1350k 845160
|
||||
-6 5200k 2500k 1600k 838626
|
||||
-7 6100k 2900k 1850k 834096
|
||||
-8 6800k 3300k 2100k 828642
|
||||
-9 7600k 3700k 2350k 828642
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
5
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
bzip2(1) bzip2(1)
|
||||
|
||||
|
||||
RREECCOOVVEERRIINNGG DDAATTAA FFRROOMM DDAAMMAAGGEEDD FFIILLEESS
|
||||
_b_z_i_p_2 compresses files in blocks, usually 900kbytes long.
|
||||
Each block is handled independently. If a media or trans-
|
||||
mission error causes a multi-block .bz2 file to become
|
||||
damaged, it may be possible to recover data from the
|
||||
undamaged blocks in the file.
|
||||
|
||||
The compressed representation of each block is delimited
|
||||
by a 48-bit pattern, which makes it possible to find the
|
||||
block boundaries with reasonable certainty. Each block
|
||||
also carries its own 32-bit CRC, so damaged blocks can be
|
||||
distinguished from undamaged ones.
|
||||
|
||||
_b_z_i_p_2_r_e_c_o_v_e_r is a simple program whose purpose is to
|
||||
search for blocks in .bz2 files, and write each block out
|
||||
into its own .bz2 file. You can then use _b_z_i_p_2 -t to test
|
||||
the integrity of the resulting files, and decompress those
|
||||
which are undamaged.
|
||||
|
||||
_b_z_i_p_2_r_e_c_o_v_e_r takes a single argument, the name of the dam-
|
||||
aged file, and writes a number of files "rec0001file.bz2",
|
||||
"rec0002file.bz2", etc, containing the extracted blocks.
|
||||
The output filenames are designed so that the use of
|
||||
wildcards in subsequent processing -- for example, "bzip2
|
||||
-dc rec*file.bz2 > recovered_data" -- lists the files in
|
||||
the correct order.
|
||||
|
||||
_b_z_i_p_2_r_e_c_o_v_e_r should be of most use dealing with large .bz2
|
||||
files, as these will contain many blocks. It is clearly
|
||||
futile to use it on damaged single-block files, since a
|
||||
damaged block cannot be recovered. If you wish to min-
|
||||
imise any potential data loss through media or transmis-
|
||||
sion errors, you might consider compressing with a smaller
|
||||
block size.
|
||||
|
||||
|
||||
PPEERRFFOORRMMAANNCCEE NNOOTTEESS
|
||||
The sorting phase of compression gathers together similar
|
||||
strings in the file. Because of this, files containing
|
||||
very long runs of repeated symbols, like "aabaabaabaab
|
||||
..." (repeated several hundred times) may compress more
|
||||
slowly than normal. Versions 0.9.5 and above fare much
|
||||
better than previous versions in this respect. The ratio
|
||||
between worst-case and average-case compression time is in
|
||||
the region of 10:1. For previous versions, this figure
|
||||
was more like 100:1. You can use the -vvvv option to mon-
|
||||
itor progress in great detail, if you want.
|
||||
|
||||
Decompression speed is unaffected by these phenomena.
|
||||
|
||||
_b_z_i_p_2 usually allocates several megabytes of memory to
|
||||
operate in, and then charges all over it in a fairly ran-
|
||||
dom fashion. This means that performance, both for com-
|
||||
pressing and decompressing, is largely determined by the
|
||||
|
||||
|
||||
|
||||
6
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
bzip2(1) bzip2(1)
|
||||
|
||||
|
||||
speed at which your machine can service cache misses.
|
||||
Because of this, small changes to the code to reduce the
|
||||
miss rate have been observed to give disproportionately
|
||||
large performance improvements. I imagine _b_z_i_p_2 will per-
|
||||
form best on machines with very large caches.
|
||||
|
||||
|
||||
CCAAVVEEAATTSS
|
||||
I/O error messages are not as helpful as they could be.
|
||||
_b_z_i_p_2 tries hard to detect I/O errors and exit cleanly,
|
||||
but the details of what the problem is sometimes seem
|
||||
rather misleading.
|
||||
|
||||
This manual page pertains to version 1.0 of _b_z_i_p_2_. Com-
|
||||
pressed data created by this version is entirely forwards
|
||||
and backwards compatible with the previous public
|
||||
releases, versions 0.1pl2, 0.9.0 and 0.9.5, but with the
|
||||
following exception: 0.9.0 and above can correctly decom-
|
||||
press multiple concatenated compressed files. 0.1pl2 can-
|
||||
not do this; it will stop after decompressing just the
|
||||
first file in the stream.
|
||||
|
||||
_b_z_i_p_2_r_e_c_o_v_e_r uses 32-bit integers to represent bit posi-
|
||||
tions in compressed files, so it cannot handle compressed
|
||||
files more than 512 megabytes long. This could easily be
|
||||
fixed.
|
||||
|
||||
|
||||
AAUUTTHHOORR
|
||||
Julian Seward, jseward@acm.org.
|
||||
|
||||
http://sourceware.cygnus.com/bzip2
|
||||
http://www.muraroa.demon.co.uk
|
||||
|
||||
The ideas embodied in _b_z_i_p_2 are due to (at least) the fol-
|
||||
lowing people: Michael Burrows and David Wheeler (for the
|
||||
block sorting transformation), David Wheeler (again, for
|
||||
the Huffman coder), Peter Fenwick (for the structured cod-
|
||||
ing model in the original _b_z_i_p_, and many refinements), and
|
||||
Alistair Moffat, Radford Neal and Ian Witten (for the
|
||||
arithmetic coder in the original _b_z_i_p_)_. I am much
|
||||
indebted for their help, support and advice. See the man-
|
||||
ual in the source distribution for pointers to sources of
|
||||
documentation. Christian von Roques encouraged me to look
|
||||
for faster sorting algorithms, so as to speed up compres-
|
||||
sion. Bela Lubkin encouraged me to improve the worst-case
|
||||
compression performance. Many people sent patches, helped
|
||||
with portability problems, lent machines, gave advice and
|
||||
were generally helpful.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
7
|
||||
|
||||
|
File diff suppressed because it is too large
Load Diff
|
@ -1,376 +0,0 @@
|
|||
|
||||
|
||||
NAME
|
||||
bzip2, bunzip2 - a block-sorting file compressor, v1.0
|
||||
bzcat - decompresses files to stdout
|
||||
bzip2recover - recovers data from damaged bzip2 files
|
||||
|
||||
|
||||
SYNOPSIS
|
||||
bzip2 [ -cdfkqstvzVL123456789 ] [ filenames ... ]
|
||||
bunzip2 [ -fkvsVL ] [ filenames ... ]
|
||||
bzcat [ -s ] [ filenames ... ]
|
||||
bzip2recover filename
|
||||
|
||||
|
||||
DESCRIPTION
|
||||
bzip2 compresses files using the Burrows-Wheeler block
|
||||
sorting text compression algorithm, and Huffman coding.
|
||||
Compression is generally considerably better than that
|
||||
achieved by more conventional LZ77/LZ78-based compressors,
|
||||
and approaches the performance of the PPM family of sta-
|
||||
tistical compressors.
|
||||
|
||||
The command-line options are deliberately very similar to
|
||||
those of GNU gzip, but they are not identical.
|
||||
|
||||
bzip2 expects a list of file names to accompany the com-
|
||||
mand-line flags. Each file is replaced by a compressed
|
||||
version of itself, with the name "original_name.bz2".
|
||||
Each compressed file has the same modification date, per-
|
||||
missions, and, when possible, ownership as the correspond-
|
||||
ing original, so that these properties can be correctly
|
||||
restored at decompression time. File name handling is
|
||||
naive in the sense that there is no mechanism for preserv-
|
||||
ing original file names, permissions, ownerships or dates
|
||||
in filesystems which lack these concepts, or have serious
|
||||
file name length restrictions, such as MS-DOS.
|
||||
|
||||
bzip2 and bunzip2 will by default not overwrite existing
|
||||
files. If you want this to happen, specify the -f flag.
|
||||
|
||||
If no file names are specified, bzip2 compresses from
|
||||
standard input to standard output. In this case, bzip2
|
||||
will decline to write compressed output to a terminal, as
|
||||
this would be entirely incomprehensible and therefore
|
||||
pointless.
|
||||
|
||||
bunzip2 (or bzip2 -d) decompresses all specified files.
|
||||
Files which were not created by bzip2 will be detected and
|
||||
ignored, and a warning issued. bzip2 attempts to guess
|
||||
the filename for the decompressed file from that of the
|
||||
compressed file as follows:
|
||||
|
||||
filename.bz2 becomes filename
|
||||
filename.bz becomes filename
|
||||
filename.tbz2 becomes filename.tar
|
||||
filename.tbz becomes filename.tar
|
||||
anyothername becomes anyothername.out
|
||||
|
||||
If the file does not end in one of the recognised endings,
|
||||
.bz2, .bz, .tbz2 or .tbz, bzip2 complains that it cannot
|
||||
guess the name of the original file, and uses the original
|
||||
name with .out appended.
|
||||
|
||||
As with compression, supplying no filenames causes decom-
|
||||
pression from standard input to standard output.
|
||||
|
||||
bunzip2 will correctly decompress a file which is the con-
|
||||
catenation of two or more compressed files. The result is
|
||||
the concatenation of the corresponding uncompressed files.
|
||||
Integrity testing (-t) of concatenated compressed files is
|
||||
also supported.
|
||||
|
||||
You can also compress or decompress files to the standard
|
||||
output by giving the -c flag. Multiple files may be com-
|
||||
pressed and decompressed like this. The resulting outputs
|
||||
are fed sequentially to stdout. Compression of multiple
|
||||
files in this manner generates a stream containing multi-
|
||||
ple compressed file representations. Such a stream can be
|
||||
decompressed correctly only by bzip2 version 0.9.0 or
|
||||
later. Earlier versions of bzip2 will stop after decom-
|
||||
pressing the first file in the stream.
|
||||
|
||||
bzcat (or bzip2 -dc) decompresses all specified files to
|
||||
the standard output.
|
||||
|
||||
bzip2 will read arguments from the environment variables
|
||||
BZIP2 and BZIP, in that order, and will process them
|
||||
before any arguments read from the command line. This
|
||||
gives a convenient way to supply default arguments.
|
||||
|
||||
Compression is always performed, even if the compressed
|
||||
file is slightly larger than the original. Files of less
|
||||
than about one hundred bytes tend to get larger, since the
|
||||
compression mechanism has a constant overhead in the
|
||||
region of 50 bytes. Random data (including the output of
|
||||
most file compressors) is coded at about 8.05 bits per
|
||||
byte, giving an expansion of around 0.5%.
|
||||
|
||||
As a self-check for your protection, bzip2 uses 32-bit
|
||||
CRCs to make sure that the decompressed version of a file
|
||||
is identical to the original. This guards against corrup-
|
||||
tion of the compressed data, and against undetected bugs
|
||||
in bzip2 (hopefully very unlikely). The chances of data
|
||||
corruption going undetected is microscopic, about one
|
||||
chance in four billion for each file processed. Be aware,
|
||||
though, that the check occurs upon decompression, so it
|
||||
can only tell you that something is wrong. It can't help
|
||||
you recover the original uncompressed data. You can use
|
||||
bzip2recover to try to recover data from damaged files.
|
||||
|
||||
Return values: 0 for a normal exit, 1 for environmental
|
||||
problems (file not found, invalid flags, I/O errors, &c),
|
||||
2 to indicate a corrupt compressed file, 3 for an internal
|
||||
consistency error (eg, bug) which caused bzip2 to panic.
|
||||
|
||||
|
||||
OPTIONS
|
||||
-c --stdout
|
||||
Compress or decompress to standard output.
|
||||
|
||||
-d --decompress
|
||||
Force decompression. bzip2, bunzip2 and bzcat are
|
||||
really the same program, and the decision about
|
||||
what actions to take is done on the basis of which
|
||||
name is used. This flag overrides that mechanism,
|
||||
and forces bzip2 to decompress.
|
||||
|
||||
-z --compress
|
||||
The complement to -d: forces compression, regard-
|
||||
less of the invokation name.
|
||||
|
||||
-t --test
|
||||
Check integrity of the specified file(s), but don't
|
||||
decompress them. This really performs a trial
|
||||
decompression and throws away the result.
|
||||
|
||||
-f --force
|
||||
Force overwrite of output files. Normally, bzip2
|
||||
will not overwrite existing output files. Also
|
||||
forces bzip2 to break hard links to files, which it
|
||||
otherwise wouldn't do.
|
||||
|
||||
-k --keep
|
||||
Keep (don't delete) input files during compression
|
||||
or decompression.
|
||||
|
||||
-s --small
|
||||
Reduce memory usage, for compression, decompression
|
||||
and testing. Files are decompressed and tested
|
||||
using a modified algorithm which only requires 2.5
|
||||
bytes per block byte. This means any file can be
|
||||
decompressed in 2300k of memory, albeit at about
|
||||
half the normal speed.
|
||||
|
||||
During compression, -s selects a block size of
|
||||
200k, which limits memory use to around the same
|
||||
figure, at the expense of your compression ratio.
|
||||
In short, if your machine is low on memory (8
|
||||
megabytes or less), use -s for everything. See
|
||||
MEMORY MANAGEMENT below.
|
||||
|
||||
-q --quiet
|
||||
Suppress non-essential warning messages. Messages
|
||||
pertaining to I/O errors and other critical events
|
||||
will not be suppressed.
|
||||
|
||||
-v --verbose
|
||||
Verbose mode -- show the compression ratio for each
|
||||
file processed. Further -v's increase the ver-
|
||||
bosity level, spewing out lots of information which
|
||||
is primarily of interest for diagnostic purposes.
|
||||
|
||||
-L --license -V --version
|
||||
Display the software version, license terms and
|
||||
conditions.
|
||||
|
||||
-1 to -9
|
||||
Set the block size to 100 k, 200 k .. 900 k when
|
||||
compressing. Has no effect when decompressing.
|
||||
See MEMORY MANAGEMENT below.
|
||||
|
||||
-- Treats all subsequent arguments as file names, even
|
||||
if they start with a dash. This is so you can han-
|
||||
dle files with names beginning with a dash, for
|
||||
example: bzip2 -- -myfilename.
|
||||
|
||||
--repetitive-fast --repetitive-best
|
||||
These flags are redundant in versions 0.9.5 and
|
||||
above. They provided some coarse control over the
|
||||
behaviour of the sorting algorithm in earlier ver-
|
||||
sions, which was sometimes useful. 0.9.5 and above
|
||||
have an improved algorithm which renders these
|
||||
flags irrelevant.
|
||||
|
||||
|
||||
MEMORY MANAGEMENT
|
||||
bzip2 compresses large files in blocks. The block size
|
||||
affects both the compression ratio achieved, and the
|
||||
amount of memory needed for compression and decompression.
|
||||
The flags -1 through -9 specify the block size to be
|
||||
100,000 bytes through 900,000 bytes (the default) respec-
|
||||
tively. At decompression time, the block size used for
|
||||
compression is read from the header of the compressed
|
||||
file, and bunzip2 then allocates itself just enough memory
|
||||
to decompress the file. Since block sizes are stored in
|
||||
compressed files, it follows that the flags -1 to -9 are
|
||||
irrelevant to and so ignored during decompression.
|
||||
|
||||
Compression and decompression requirements, in bytes, can
|
||||
be estimated as:
|
||||
|
||||
Compression: 400k + ( 8 x block size )
|
||||
|
||||
Decompression: 100k + ( 4 x block size ), or
|
||||
100k + ( 2.5 x block size )
|
||||
|
||||
Larger block sizes give rapidly diminishing marginal
|
||||
returns. Most of the compression comes from the first two
|
||||
or three hundred k of block size, a fact worth bearing in
|
||||
mind when using bzip2 on small machines. It is also
|
||||
important to appreciate that the decompression memory
|
||||
requirement is set at compression time by the choice of
|
||||
block size.
|
||||
|
||||
For files compressed with the default 900k block size,
|
||||
bunzip2 will require about 3700 kbytes to decompress. To
|
||||
support decompression of any file on a 4 megabyte machine,
|
||||
bunzip2 has an option to decompress using approximately
|
||||
half this amount of memory, about 2300 kbytes. Decompres-
|
||||
sion speed is also halved, so you should use this option
|
||||
only where necessary. The relevant flag is -s.
|
||||
|
||||
In general, try and use the largest block size memory con-
|
||||
straints allow, since that maximises the compression
|
||||
achieved. Compression and decompression speed are virtu-
|
||||
ally unaffected by block size.
|
||||
|
||||
Another significant point applies to files which fit in a
|
||||
single block -- that means most files you'd encounter
|
||||
using a large block size. The amount of real memory
|
||||
touched is proportional to the size of the file, since the
|
||||
file is smaller than a block. For example, compressing a
|
||||
file 20,000 bytes long with the flag -9 will cause the
|
||||
compressor to allocate around 7600k of memory, but only
|
||||
touch 400k + 20000 * 8 = 560 kbytes of it. Similarly, the
|
||||
decompressor will allocate 3700k but only touch 100k +
|
||||
20000 * 4 = 180 kbytes.
|
||||
|
||||
Here is a table which summarises the maximum memory usage
|
||||
for different block sizes. Also recorded is the total
|
||||
compressed size for 14 files of the Calgary Text Compres-
|
||||
sion Corpus totalling 3,141,622 bytes. This column gives
|
||||
some feel for how compression varies with block size.
|
||||
These figures tend to understate the advantage of larger
|
||||
block sizes for larger files, since the Corpus is domi-
|
||||
nated by smaller files.
|
||||
|
||||
Compress Decompress Decompress Corpus
|
||||
Flag usage usage -s usage Size
|
||||
|
||||
-1 1200k 500k 350k 914704
|
||||
-2 2000k 900k 600k 877703
|
||||
-3 2800k 1300k 850k 860338
|
||||
-4 3600k 1700k 1100k 846899
|
||||
-5 4400k 2100k 1350k 845160
|
||||
-6 5200k 2500k 1600k 838626
|
||||
-7 6100k 2900k 1850k 834096
|
||||
-8 6800k 3300k 2100k 828642
|
||||
-9 7600k 3700k 2350k 828642
|
||||
|
||||
|
||||
RECOVERING DATA FROM DAMAGED FILES
|
||||
bzip2 compresses files in blocks, usually 900kbytes long.
|
||||
Each block is handled independently. If a media or trans-
|
||||
mission error causes a multi-block .bz2 file to become
|
||||
damaged, it may be possible to recover data from the
|
||||
undamaged blocks in the file.
|
||||
|
||||
The compressed representation of each block is delimited
|
||||
by a 48-bit pattern, which makes it possible to find the
|
||||
block boundaries with reasonable certainty. Each block
|
||||
also carries its own 32-bit CRC, so damaged blocks can be
|
||||
distinguished from undamaged ones.
|
||||
|
||||
bzip2recover is a simple program whose purpose is to
|
||||
search for blocks in .bz2 files, and write each block out
|
||||
into its own .bz2 file. You can then use bzip2 -t to test
|
||||
the integrity of the resulting files, and decompress those
|
||||
which are undamaged.
|
||||
|
||||
bzip2recover takes a single argument, the name of the dam-
|
||||
aged file, and writes a number of files "rec0001file.bz2",
|
||||
"rec0002file.bz2", etc, containing the extracted blocks.
|
||||
The output filenames are designed so that the use of
|
||||
wildcards in subsequent processing -- for example, "bzip2
|
||||
-dc rec*file.bz2 > recovered_data" -- lists the files in
|
||||
the correct order.
|
||||
|
||||
bzip2recover should be of most use dealing with large .bz2
|
||||
files, as these will contain many blocks. It is clearly
|
||||
futile to use it on damaged single-block files, since a
|
||||
damaged block cannot be recovered. If you wish to min-
|
||||
imise any potential data loss through media or transmis-
|
||||
sion errors, you might consider compressing with a smaller
|
||||
block size.
|
||||
|
||||
|
||||
PERFORMANCE NOTES
|
||||
The sorting phase of compression gathers together similar
|
||||
strings in the file. Because of this, files containing
|
||||
very long runs of repeated symbols, like "aabaabaabaab
|
||||
..." (repeated several hundred times) may compress more
|
||||
slowly than normal. Versions 0.9.5 and above fare much
|
||||
better than previous versions in this respect. The ratio
|
||||
between worst-case and average-case compression time is in
|
||||
the region of 10:1. For previous versions, this figure
|
||||
was more like 100:1. You can use the -vvvv option to mon-
|
||||
itor progress in great detail, if you want.
|
||||
|
||||
Decompression speed is unaffected by these phenomena.
|
||||
|
||||
bzip2 usually allocates several megabytes of memory to
|
||||
operate in, and then charges all over it in a fairly ran-
|
||||
dom fashion. This means that performance, both for com-
|
||||
pressing and decompressing, is largely determined by the
|
||||
speed at which your machine can service cache misses.
|
||||
Because of this, small changes to the code to reduce the
|
||||
miss rate have been observed to give disproportionately
|
||||
large performance improvements. I imagine bzip2 will per-
|
||||
form best on machines with very large caches.
|
||||
|
||||
|
||||
CAVEATS
|
||||
I/O error messages are not as helpful as they could be.
|
||||
bzip2 tries hard to detect I/O errors and exit cleanly,
|
||||
but the details of what the problem is sometimes seem
|
||||
rather misleading.
|
||||
|
||||
This manual page pertains to version 1.0 of bzip2. Com-
|
||||
pressed data created by this version is entirely forwards
|
||||
and backwards compatible with the previous public
|
||||
releases, versions 0.1pl2, 0.9.0 and 0.9.5, but with the
|
||||
following exception: 0.9.0 and above can correctly decom-
|
||||
press multiple concatenated compressed files. 0.1pl2 can-
|
||||
not do this; it will stop after decompressing just the
|
||||
first file in the stream.
|
||||
|
||||
bzip2recover uses 32-bit integers to represent bit posi-
|
||||
tions in compressed files, so it cannot handle compressed
|
||||
files more than 512 megabytes long. This could easily be
|
||||
fixed.
|
||||
|
||||
|
||||
AUTHOR
|
||||
Julian Seward, jseward@acm.org.
|
||||
|
||||
http://sourceware.cygnus.com/bzip2
|
||||
http://www.muraroa.demon.co.uk
|
||||
|
||||
The ideas embodied in bzip2 are due to (at least) the fol-
|
||||
lowing people: Michael Burrows and David Wheeler (for the
|
||||
block sorting transformation), David Wheeler (again, for
|
||||
the Huffman coder), Peter Fenwick (for the structured cod-
|
||||
ing model in the original bzip, and many refinements), and
|
||||
Alistair Moffat, Radford Neal and Ian Witten (for the
|
||||
arithmetic coder in the original bzip). I am much
|
||||
indebted for their help, support and advice. See the man-
|
||||
ual in the source distribution for pointers to sources of
|
||||
documentation. Christian von Roques encouraged me to look
|
||||
for faster sorting algorithms, so as to speed up compres-
|
||||
sion. Bela Lubkin encouraged me to improve the worst-case
|
||||
compression performance. Many people sent patches, helped
|
||||
with portability problems, lent machines, gave advice and
|
||||
were generally helpful.
|
||||
|
|
@ -1,435 +0,0 @@
|
|||
|
||||
/*-----------------------------------------------------------*/
|
||||
/*--- Block recoverer program for bzip2 ---*/
|
||||
/*--- bzip2recover.c ---*/
|
||||
/*-----------------------------------------------------------*/
|
||||
|
||||
/*--
|
||||
This program is bzip2recover, a program to attempt data
|
||||
salvage from damaged files created by the accompanying
|
||||
bzip2-1.0 program.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
are met:
|
||||
|
||||
1. Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
|
||||
2. The origin of this software must not be misrepresented; you must
|
||||
not claim that you wrote the original software. If you use this
|
||||
software in a product, an acknowledgment in the product
|
||||
documentation would be appreciated but is not required.
|
||||
|
||||
3. Altered source versions must be plainly marked as such, and must
|
||||
not be misrepresented as being the original software.
|
||||
|
||||
4. The name of the author may not be used to endorse or promote
|
||||
products derived from this software without specific prior written
|
||||
permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
|
||||
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
||||
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
|
||||
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
|
||||
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
||||
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
|
||||
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
Julian Seward, Cambridge, UK.
|
||||
jseward@acm.org
|
||||
bzip2/libbzip2 version 1.0 of 21 March 2000
|
||||
--*/
|
||||
|
||||
/*--
|
||||
This program is a complete hack and should be rewritten
|
||||
properly. It isn't very complicated.
|
||||
--*/
|
||||
|
||||
#include <stdio.h>
|
||||
#include <errno.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
|
||||
typedef unsigned int UInt32;
|
||||
typedef int Int32;
|
||||
typedef unsigned char UChar;
|
||||
typedef char Char;
|
||||
typedef unsigned char Bool;
|
||||
#define True ((Bool)1)
|
||||
#define False ((Bool)0)
|
||||
|
||||
|
||||
Char inFileName[2000];
|
||||
Char outFileName[2000];
|
||||
Char progName[2000];
|
||||
|
||||
UInt32 bytesOut = 0;
|
||||
UInt32 bytesIn = 0;
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
/*--- I/O errors ---*/
|
||||
/*---------------------------------------------------*/
|
||||
|
||||
/*---------------------------------------------*/
|
||||
void readError ( void )
|
||||
{
|
||||
fprintf ( stderr,
|
||||
"%s: I/O error reading `%s', possible reason follows.\n",
|
||||
progName, inFileName );
|
||||
perror ( progName );
|
||||
fprintf ( stderr, "%s: warning: output file(s) may be incomplete.\n",
|
||||
progName );
|
||||
exit ( 1 );
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------*/
|
||||
void writeError ( void )
|
||||
{
|
||||
fprintf ( stderr,
|
||||
"%s: I/O error reading `%s', possible reason follows.\n",
|
||||
progName, inFileName );
|
||||
perror ( progName );
|
||||
fprintf ( stderr, "%s: warning: output file(s) may be incomplete.\n",
|
||||
progName );
|
||||
exit ( 1 );
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------*/
|
||||
void mallocFail ( Int32 n )
|
||||
{
|
||||
fprintf ( stderr,
|
||||
"%s: malloc failed on request for %d bytes.\n",
|
||||
progName, n );
|
||||
fprintf ( stderr, "%s: warning: output file(s) may be incomplete.\n",
|
||||
progName );
|
||||
exit ( 1 );
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
/*--- Bit stream I/O ---*/
|
||||
/*---------------------------------------------------*/
|
||||
|
||||
typedef
|
||||
struct {
|
||||
FILE* handle;
|
||||
Int32 buffer;
|
||||
Int32 buffLive;
|
||||
Char mode;
|
||||
}
|
||||
BitStream;
|
||||
|
||||
|
||||
/*---------------------------------------------*/
|
||||
BitStream* bsOpenReadStream ( FILE* stream )
|
||||
{
|
||||
BitStream *bs = malloc ( sizeof(BitStream) );
|
||||
if (bs == NULL) mallocFail ( sizeof(BitStream) );
|
||||
bs->handle = stream;
|
||||
bs->buffer = 0;
|
||||
bs->buffLive = 0;
|
||||
bs->mode = 'r';
|
||||
return bs;
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------*/
|
||||
BitStream* bsOpenWriteStream ( FILE* stream )
|
||||
{
|
||||
BitStream *bs = malloc ( sizeof(BitStream) );
|
||||
if (bs == NULL) mallocFail ( sizeof(BitStream) );
|
||||
bs->handle = stream;
|
||||
bs->buffer = 0;
|
||||
bs->buffLive = 0;
|
||||
bs->mode = 'w';
|
||||
return bs;
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------*/
|
||||
void bsPutBit ( BitStream* bs, Int32 bit )
|
||||
{
|
||||
if (bs->buffLive == 8) {
|
||||
Int32 retVal = putc ( (UChar) bs->buffer, bs->handle );
|
||||
if (retVal == EOF) writeError();
|
||||
bytesOut++;
|
||||
bs->buffLive = 1;
|
||||
bs->buffer = bit & 0x1;
|
||||
} else {
|
||||
bs->buffer = ( (bs->buffer << 1) | (bit & 0x1) );
|
||||
bs->buffLive++;
|
||||
};
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------*/
|
||||
/*--
|
||||
Returns 0 or 1, or 2 to indicate EOF.
|
||||
--*/
|
||||
Int32 bsGetBit ( BitStream* bs )
|
||||
{
|
||||
if (bs->buffLive > 0) {
|
||||
bs->buffLive --;
|
||||
return ( ((bs->buffer) >> (bs->buffLive)) & 0x1 );
|
||||
} else {
|
||||
Int32 retVal = getc ( bs->handle );
|
||||
if ( retVal == EOF ) {
|
||||
if (errno != 0) readError();
|
||||
return 2;
|
||||
}
|
||||
bs->buffLive = 7;
|
||||
bs->buffer = retVal;
|
||||
return ( ((bs->buffer) >> 7) & 0x1 );
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------*/
|
||||
void bsClose ( BitStream* bs )
|
||||
{
|
||||
Int32 retVal;
|
||||
|
||||
if ( bs->mode == 'w' ) {
|
||||
while ( bs->buffLive < 8 ) {
|
||||
bs->buffLive++;
|
||||
bs->buffer <<= 1;
|
||||
};
|
||||
retVal = putc ( (UChar) (bs->buffer), bs->handle );
|
||||
if (retVal == EOF) writeError();
|
||||
bytesOut++;
|
||||
retVal = fflush ( bs->handle );
|
||||
if (retVal == EOF) writeError();
|
||||
}
|
||||
retVal = fclose ( bs->handle );
|
||||
if (retVal == EOF) {
|
||||
if (bs->mode == 'w') writeError(); else readError();
|
||||
}
|
||||
free ( bs );
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------*/
|
||||
void bsPutUChar ( BitStream* bs, UChar c )
|
||||
{
|
||||
Int32 i;
|
||||
for (i = 7; i >= 0; i--)
|
||||
bsPutBit ( bs, (((UInt32) c) >> i) & 0x1 );
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------*/
|
||||
void bsPutUInt32 ( BitStream* bs, UInt32 c )
|
||||
{
|
||||
Int32 i;
|
||||
|
||||
for (i = 31; i >= 0; i--)
|
||||
bsPutBit ( bs, (c >> i) & 0x1 );
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------*/
|
||||
Bool endsInBz2 ( Char* name )
|
||||
{
|
||||
Int32 n = strlen ( name );
|
||||
if (n <= 4) return False;
|
||||
return
|
||||
(name[n-4] == '.' &&
|
||||
name[n-3] == 'b' &&
|
||||
name[n-2] == 'z' &&
|
||||
name[n-1] == '2');
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
/*--- ---*/
|
||||
/*---------------------------------------------------*/
|
||||
|
||||
#define BLOCK_HEADER_HI 0x00003141UL
|
||||
#define BLOCK_HEADER_LO 0x59265359UL
|
||||
|
||||
#define BLOCK_ENDMARK_HI 0x00001772UL
|
||||
#define BLOCK_ENDMARK_LO 0x45385090UL
|
||||
|
||||
|
||||
UInt32 bStart[20000];
|
||||
UInt32 bEnd[20000];
|
||||
UInt32 rbStart[20000];
|
||||
UInt32 rbEnd[20000];
|
||||
|
||||
Int32 main ( Int32 argc, Char** argv )
|
||||
{
|
||||
FILE* inFile;
|
||||
FILE* outFile;
|
||||
BitStream* bsIn, *bsWr;
|
||||
Int32 currBlock, b, wrBlock;
|
||||
UInt32 bitsRead;
|
||||
Int32 rbCtr;
|
||||
|
||||
|
||||
UInt32 buffHi, buffLo, blockCRC;
|
||||
Char* p;
|
||||
|
||||
strcpy ( progName, argv[0] );
|
||||
inFileName[0] = outFileName[0] = 0;
|
||||
|
||||
fprintf ( stderr, "bzip2recover 1.0: extracts blocks from damaged .bz2 files.\n" );
|
||||
|
||||
if (argc != 2) {
|
||||
fprintf ( stderr, "%s: usage is `%s damaged_file_name'.\n",
|
||||
progName, progName );
|
||||
exit(1);
|
||||
}
|
||||
|
||||
strcpy ( inFileName, argv[1] );
|
||||
|
||||
inFile = fopen ( inFileName, "rb" );
|
||||
if (inFile == NULL) {
|
||||
fprintf ( stderr, "%s: can't read `%s'\n", progName, inFileName );
|
||||
exit(1);
|
||||
}
|
||||
|
||||
bsIn = bsOpenReadStream ( inFile );
|
||||
fprintf ( stderr, "%s: searching for block boundaries ...\n", progName );
|
||||
|
||||
bitsRead = 0;
|
||||
buffHi = buffLo = 0;
|
||||
currBlock = 0;
|
||||
bStart[currBlock] = 0;
|
||||
|
||||
rbCtr = 0;
|
||||
|
||||
while (True) {
|
||||
b = bsGetBit ( bsIn );
|
||||
bitsRead++;
|
||||
if (b == 2) {
|
||||
if (bitsRead >= bStart[currBlock] &&
|
||||
(bitsRead - bStart[currBlock]) >= 40) {
|
||||
bEnd[currBlock] = bitsRead-1;
|
||||
if (currBlock > 0)
|
||||
fprintf ( stderr, " block %d runs from %d to %d (incomplete)\n",
|
||||
currBlock, bStart[currBlock], bEnd[currBlock] );
|
||||
} else
|
||||
currBlock--;
|
||||
break;
|
||||
}
|
||||
buffHi = (buffHi << 1) | (buffLo >> 31);
|
||||
buffLo = (buffLo << 1) | (b & 1);
|
||||
if ( ( (buffHi & 0x0000ffff) == BLOCK_HEADER_HI
|
||||
&& buffLo == BLOCK_HEADER_LO)
|
||||
||
|
||||
( (buffHi & 0x0000ffff) == BLOCK_ENDMARK_HI
|
||||
&& buffLo == BLOCK_ENDMARK_LO)
|
||||
) {
|
||||
if (bitsRead > 49)
|
||||
bEnd[currBlock] = bitsRead-49; else
|
||||
bEnd[currBlock] = 0;
|
||||
if (currBlock > 0 &&
|
||||
(bEnd[currBlock] - bStart[currBlock]) >= 130) {
|
||||
fprintf ( stderr, " block %d runs from %d to %d\n",
|
||||
rbCtr+1, bStart[currBlock], bEnd[currBlock] );
|
||||
rbStart[rbCtr] = bStart[currBlock];
|
||||
rbEnd[rbCtr] = bEnd[currBlock];
|
||||
rbCtr++;
|
||||
}
|
||||
currBlock++;
|
||||
|
||||
bStart[currBlock] = bitsRead;
|
||||
}
|
||||
}
|
||||
|
||||
bsClose ( bsIn );
|
||||
|
||||
/*-- identified blocks run from 1 to rbCtr inclusive. --*/
|
||||
|
||||
if (rbCtr < 1) {
|
||||
fprintf ( stderr,
|
||||
"%s: sorry, I couldn't find any block boundaries.\n",
|
||||
progName );
|
||||
exit(1);
|
||||
};
|
||||
|
||||
fprintf ( stderr, "%s: splitting into blocks\n", progName );
|
||||
|
||||
inFile = fopen ( inFileName, "rb" );
|
||||
if (inFile == NULL) {
|
||||
fprintf ( stderr, "%s: can't open `%s'\n", progName, inFileName );
|
||||
exit(1);
|
||||
}
|
||||
bsIn = bsOpenReadStream ( inFile );
|
||||
|
||||
/*-- placate gcc's dataflow analyser --*/
|
||||
blockCRC = 0; bsWr = 0;
|
||||
|
||||
bitsRead = 0;
|
||||
outFile = NULL;
|
||||
wrBlock = 0;
|
||||
while (True) {
|
||||
b = bsGetBit(bsIn);
|
||||
if (b == 2) break;
|
||||
buffHi = (buffHi << 1) | (buffLo >> 31);
|
||||
buffLo = (buffLo << 1) | (b & 1);
|
||||
if (bitsRead == 47+rbStart[wrBlock])
|
||||
blockCRC = (buffHi << 16) | (buffLo >> 16);
|
||||
|
||||
if (outFile != NULL && bitsRead >= rbStart[wrBlock]
|
||||
&& bitsRead <= rbEnd[wrBlock]) {
|
||||
bsPutBit ( bsWr, b );
|
||||
}
|
||||
|
||||
bitsRead++;
|
||||
|
||||
if (bitsRead == rbEnd[wrBlock]+1) {
|
||||
if (outFile != NULL) {
|
||||
bsPutUChar ( bsWr, 0x17 ); bsPutUChar ( bsWr, 0x72 );
|
||||
bsPutUChar ( bsWr, 0x45 ); bsPutUChar ( bsWr, 0x38 );
|
||||
bsPutUChar ( bsWr, 0x50 ); bsPutUChar ( bsWr, 0x90 );
|
||||
bsPutUInt32 ( bsWr, blockCRC );
|
||||
bsClose ( bsWr );
|
||||
}
|
||||
if (wrBlock >= rbCtr) break;
|
||||
wrBlock++;
|
||||
} else
|
||||
if (bitsRead == rbStart[wrBlock]) {
|
||||
outFileName[0] = 0;
|
||||
sprintf ( outFileName, "rec%4d", wrBlock+1 );
|
||||
for (p = outFileName; *p != 0; p++) if (*p == ' ') *p = '0';
|
||||
strcat ( outFileName, inFileName );
|
||||
if ( !endsInBz2(outFileName)) strcat ( outFileName, ".bz2" );
|
||||
|
||||
fprintf ( stderr, " writing block %d to `%s' ...\n",
|
||||
wrBlock+1, outFileName );
|
||||
|
||||
outFile = fopen ( outFileName, "wb" );
|
||||
if (outFile == NULL) {
|
||||
fprintf ( stderr, "%s: can't write `%s'\n",
|
||||
progName, outFileName );
|
||||
exit(1);
|
||||
}
|
||||
bsWr = bsOpenWriteStream ( outFile );
|
||||
bsPutUChar ( bsWr, 'B' ); bsPutUChar ( bsWr, 'Z' );
|
||||
bsPutUChar ( bsWr, 'h' ); bsPutUChar ( bsWr, '9' );
|
||||
bsPutUChar ( bsWr, 0x31 ); bsPutUChar ( bsWr, 0x41 );
|
||||
bsPutUChar ( bsWr, 0x59 ); bsPutUChar ( bsWr, 0x26 );
|
||||
bsPutUChar ( bsWr, 0x53 ); bsPutUChar ( bsWr, 0x59 );
|
||||
}
|
||||
}
|
||||
|
||||
fprintf ( stderr, "%s: finished\n", progName );
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
||||
|
||||
/*-----------------------------------------------------------*/
|
||||
/*--- end bzip2recover.c ---*/
|
||||
/*-----------------------------------------------------------*/
|
File diff suppressed because it is too large
Load Diff
|
@ -1,319 +0,0 @@
|
|||
|
||||
/*-------------------------------------------------------------*/
|
||||
/*--- Public header file for the library. ---*/
|
||||
/*--- bzlib.h ---*/
|
||||
/*-------------------------------------------------------------*/
|
||||
|
||||
/*--
|
||||
This file is a part of bzip2 and/or libbzip2, a program and
|
||||
library for lossless, block-sorting data compression.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
are met:
|
||||
|
||||
1. Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
|
||||
2. The origin of this software must not be misrepresented; you must
|
||||
not claim that you wrote the original software. If you use this
|
||||
software in a product, an acknowledgment in the product
|
||||
documentation would be appreciated but is not required.
|
||||
|
||||
3. Altered source versions must be plainly marked as such, and must
|
||||
not be misrepresented as being the original software.
|
||||
|
||||
4. The name of the author may not be used to endorse or promote
|
||||
products derived from this software without specific prior written
|
||||
permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
|
||||
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
||||
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
|
||||
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
|
||||
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
||||
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
|
||||
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
Julian Seward, Cambridge, UK.
|
||||
jseward@acm.org
|
||||
bzip2/libbzip2 version 1.0 of 21 March 2000
|
||||
|
||||
This program is based on (at least) the work of:
|
||||
Mike Burrows
|
||||
David Wheeler
|
||||
Peter Fenwick
|
||||
Alistair Moffat
|
||||
Radford Neal
|
||||
Ian H. Witten
|
||||
Robert Sedgewick
|
||||
Jon L. Bentley
|
||||
|
||||
For more information on these sources, see the manual.
|
||||
--*/
|
||||
|
||||
|
||||
#ifndef _BZLIB_H
|
||||
#define _BZLIB_H
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
#define BZ_RUN 0
|
||||
#define BZ_FLUSH 1
|
||||
#define BZ_FINISH 2
|
||||
|
||||
#define BZ_OK 0
|
||||
#define BZ_RUN_OK 1
|
||||
#define BZ_FLUSH_OK 2
|
||||
#define BZ_FINISH_OK 3
|
||||
#define BZ_STREAM_END 4
|
||||
#define BZ_SEQUENCE_ERROR (-1)
|
||||
#define BZ_PARAM_ERROR (-2)
|
||||
#define BZ_MEM_ERROR (-3)
|
||||
#define BZ_DATA_ERROR (-4)
|
||||
#define BZ_DATA_ERROR_MAGIC (-5)
|
||||
#define BZ_IO_ERROR (-6)
|
||||
#define BZ_UNEXPECTED_EOF (-7)
|
||||
#define BZ_OUTBUFF_FULL (-8)
|
||||
#define BZ_CONFIG_ERROR (-9)
|
||||
|
||||
typedef
|
||||
struct {
|
||||
char *next_in;
|
||||
unsigned int avail_in;
|
||||
unsigned int total_in_lo32;
|
||||
unsigned int total_in_hi32;
|
||||
|
||||
char *next_out;
|
||||
unsigned int avail_out;
|
||||
unsigned int total_out_lo32;
|
||||
unsigned int total_out_hi32;
|
||||
|
||||
void *state;
|
||||
|
||||
void *(*bzalloc)(void *,int,int);
|
||||
void (*bzfree)(void *,void *);
|
||||
void *opaque;
|
||||
}
|
||||
bz_stream;
|
||||
|
||||
|
||||
#ifndef BZ_IMPORT
|
||||
#define BZ_EXPORT
|
||||
#endif
|
||||
|
||||
#ifdef _WIN32
|
||||
# include <stdio.h>
|
||||
# include <windows.h>
|
||||
# ifdef small
|
||||
/* windows.h define small to char */
|
||||
# undef small
|
||||
# endif
|
||||
# ifdef BZ_EXPORT
|
||||
# define BZ_API(func) WINAPI func
|
||||
# define BZ_EXTERN extern
|
||||
# else
|
||||
/* import windows dll dynamically */
|
||||
# define BZ_API(func) (WINAPI * func)
|
||||
# define BZ_EXTERN
|
||||
# endif
|
||||
#else
|
||||
# define BZ_API(func) func
|
||||
# define BZ_EXTERN extern
|
||||
#endif
|
||||
|
||||
|
||||
/*-- Core (low-level) library functions --*/
|
||||
|
||||
BZ_EXTERN int BZ_API(BZ2_bzCompressInit) (
|
||||
bz_stream* strm,
|
||||
int blockSize100k,
|
||||
int verbosity,
|
||||
int workFactor
|
||||
);
|
||||
|
||||
BZ_EXTERN int BZ_API(BZ2_bzCompress) (
|
||||
bz_stream* strm,
|
||||
int action
|
||||
);
|
||||
|
||||
BZ_EXTERN int BZ_API(BZ2_bzCompressEnd) (
|
||||
bz_stream* strm
|
||||
);
|
||||
|
||||
BZ_EXTERN int BZ_API(BZ2_bzDecompressInit) (
|
||||
bz_stream *strm,
|
||||
int verbosity,
|
||||
int small
|
||||
);
|
||||
|
||||
BZ_EXTERN int BZ_API(BZ2_bzDecompress) (
|
||||
bz_stream* strm
|
||||
);
|
||||
|
||||
BZ_EXTERN int BZ_API(BZ2_bzDecompressEnd) (
|
||||
bz_stream *strm
|
||||
);
|
||||
|
||||
|
||||
|
||||
/*-- High(er) level library functions --*/
|
||||
|
||||
#ifndef BZ_NO_STDIO
|
||||
#define BZ_MAX_UNUSED 5000
|
||||
|
||||
typedef void BZFILE;
|
||||
|
||||
BZ_EXTERN BZFILE* BZ_API(BZ2_bzReadOpen) (
|
||||
int* bzerror,
|
||||
FILE* f,
|
||||
int verbosity,
|
||||
int small,
|
||||
void* unused,
|
||||
int nUnused
|
||||
);
|
||||
|
||||
BZ_EXTERN void BZ_API(BZ2_bzReadClose) (
|
||||
int* bzerror,
|
||||
BZFILE* b
|
||||
);
|
||||
|
||||
BZ_EXTERN void BZ_API(BZ2_bzReadGetUnused) (
|
||||
int* bzerror,
|
||||
BZFILE* b,
|
||||
void** unused,
|
||||
int* nUnused
|
||||
);
|
||||
|
||||
BZ_EXTERN int BZ_API(BZ2_bzRead) (
|
||||
int* bzerror,
|
||||
BZFILE* b,
|
||||
void* buf,
|
||||
int len
|
||||
);
|
||||
|
||||
BZ_EXTERN BZFILE* BZ_API(BZ2_bzWriteOpen) (
|
||||
int* bzerror,
|
||||
FILE* f,
|
||||
int blockSize100k,
|
||||
int verbosity,
|
||||
int workFactor
|
||||
);
|
||||
|
||||
BZ_EXTERN void BZ_API(BZ2_bzWrite) (
|
||||
int* bzerror,
|
||||
BZFILE* b,
|
||||
void* buf,
|
||||
int len
|
||||
);
|
||||
|
||||
BZ_EXTERN void BZ_API(BZ2_bzWriteClose) (
|
||||
int* bzerror,
|
||||
BZFILE* b,
|
||||
int abandon,
|
||||
unsigned int* nbytes_in,
|
||||
unsigned int* nbytes_out
|
||||
);
|
||||
|
||||
BZ_EXTERN void BZ_API(BZ2_bzWriteClose64) (
|
||||
int* bzerror,
|
||||
BZFILE* b,
|
||||
int abandon,
|
||||
unsigned int* nbytes_in_lo32,
|
||||
unsigned int* nbytes_in_hi32,
|
||||
unsigned int* nbytes_out_lo32,
|
||||
unsigned int* nbytes_out_hi32
|
||||
);
|
||||
#endif
|
||||
|
||||
|
||||
/*-- Utility functions --*/
|
||||
|
||||
BZ_EXTERN int BZ_API(BZ2_bzBuffToBuffCompress) (
|
||||
char* dest,
|
||||
unsigned int* destLen,
|
||||
char* source,
|
||||
unsigned int sourceLen,
|
||||
int blockSize100k,
|
||||
int verbosity,
|
||||
int workFactor
|
||||
);
|
||||
|
||||
BZ_EXTERN int BZ_API(BZ2_bzBuffToBuffDecompress) (
|
||||
char* dest,
|
||||
unsigned int* destLen,
|
||||
char* source,
|
||||
unsigned int sourceLen,
|
||||
int small,
|
||||
int verbosity
|
||||
);
|
||||
|
||||
|
||||
/*--
|
||||
Code contributed by Yoshioka Tsuneo
|
||||
(QWF00133@niftyserve.or.jp/tsuneo-y@is.aist-nara.ac.jp),
|
||||
to support better zlib compatibility.
|
||||
This code is not _officially_ part of libbzip2 (yet);
|
||||
I haven't tested it, documented it, or considered the
|
||||
threading-safeness of it.
|
||||
If this code breaks, please contact both Yoshioka and me.
|
||||
--*/
|
||||
|
||||
BZ_EXTERN const char * BZ_API(BZ2_bzlibVersion) (
|
||||
void
|
||||
);
|
||||
|
||||
#ifndef BZ_NO_STDIO
|
||||
BZ_EXTERN BZFILE * BZ_API(BZ2_bzopen) (
|
||||
const char *path,
|
||||
const char *mode
|
||||
);
|
||||
|
||||
BZ_EXTERN BZFILE * BZ_API(BZ2_bzdopen) (
|
||||
int fd,
|
||||
const char *mode
|
||||
);
|
||||
|
||||
BZ_EXTERN int BZ_API(BZ2_bzread) (
|
||||
BZFILE* b,
|
||||
void* buf,
|
||||
int len
|
||||
);
|
||||
|
||||
BZ_EXTERN int BZ_API(BZ2_bzwrite) (
|
||||
BZFILE* b,
|
||||
void* buf,
|
||||
int len
|
||||
);
|
||||
|
||||
BZ_EXTERN int BZ_API(BZ2_bzflush) (
|
||||
BZFILE* b
|
||||
);
|
||||
|
||||
BZ_EXTERN void BZ_API(BZ2_bzclose) (
|
||||
BZFILE* b
|
||||
);
|
||||
|
||||
BZ_EXTERN const char * BZ_API(BZ2_bzerror) (
|
||||
BZFILE *b,
|
||||
int *errnum
|
||||
);
|
||||
#endif
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif
|
||||
|
||||
/*-------------------------------------------------------------*/
|
||||
/*--- end bzlib.h ---*/
|
||||
/*-------------------------------------------------------------*/
|
|
@ -1,530 +0,0 @@
|
|||
|
||||
/*-------------------------------------------------------------*/
|
||||
/*--- Private header file for the library. ---*/
|
||||
/*--- bzlib_private.h ---*/
|
||||
/*-------------------------------------------------------------*/
|
||||
|
||||
/*--
|
||||
This file is a part of bzip2 and/or libbzip2, a program and
|
||||
library for lossless, block-sorting data compression.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
are met:
|
||||
|
||||
1. Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
|
||||
2. The origin of this software must not be misrepresented; you must
|
||||
not claim that you wrote the original software. If you use this
|
||||
software in a product, an acknowledgment in the product
|
||||
documentation would be appreciated but is not required.
|
||||
|
||||
3. Altered source versions must be plainly marked as such, and must
|
||||
not be misrepresented as being the original software.
|
||||
|
||||
4. The name of the author may not be used to endorse or promote
|
||||
products derived from this software without specific prior written
|
||||
permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
|
||||
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
||||
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
|
||||
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
|
||||
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
||||
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
|
||||
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
Julian Seward, Cambridge, UK.
|
||||
jseward@acm.org
|
||||
bzip2/libbzip2 version 1.0 of 21 March 2000
|
||||
|
||||
This program is based on (at least) the work of:
|
||||
Mike Burrows
|
||||
David Wheeler
|
||||
Peter Fenwick
|
||||
Alistair Moffat
|
||||
Radford Neal
|
||||
Ian H. Witten
|
||||
Robert Sedgewick
|
||||
Jon L. Bentley
|
||||
|
||||
For more information on these sources, see the manual.
|
||||
--*/
|
||||
|
||||
|
||||
#ifndef _BZLIB_PRIVATE_H
|
||||
#define _BZLIB_PRIVATE_H
|
||||
|
||||
#include <stdlib.h>
|
||||
|
||||
#ifndef BZ_NO_STDIO
|
||||
#include <stdio.h>
|
||||
#include <ctype.h>
|
||||
#include <string.h>
|
||||
#endif
|
||||
|
||||
#include "bzlib.h"
|
||||
|
||||
|
||||
|
||||
/*-- General stuff. --*/
|
||||
|
||||
#define BZ_VERSION "1.0.1, 23-June-2000"
|
||||
|
||||
typedef char Char;
|
||||
typedef unsigned char Bool;
|
||||
typedef unsigned char UChar;
|
||||
typedef int Int32;
|
||||
typedef unsigned int UInt32;
|
||||
typedef short Int16;
|
||||
typedef unsigned short UInt16;
|
||||
|
||||
#define True ((Bool)1)
|
||||
#define False ((Bool)0)
|
||||
|
||||
#ifndef __GNUC__
|
||||
#define __inline__ /* */
|
||||
#endif
|
||||
|
||||
#ifndef BZ_NO_STDIO
|
||||
extern void BZ2_bz__AssertH__fail ( int errcode );
|
||||
#define AssertH(cond,errcode) \
|
||||
{ if (!(cond)) BZ2_bz__AssertH__fail ( errcode ); }
|
||||
#if BZ_DEBUG
|
||||
#define AssertD(cond,msg) \
|
||||
{ if (!(cond)) { \
|
||||
fprintf ( stderr, \
|
||||
"\n\nlibbzip2(debug build): internal error\n\t%s\n", msg );\
|
||||
exit(1); \
|
||||
}}
|
||||
#else
|
||||
#define AssertD(cond,msg) /* */
|
||||
#endif
|
||||
#define VPrintf0(zf) \
|
||||
fprintf(stderr,zf)
|
||||
#define VPrintf1(zf,za1) \
|
||||
fprintf(stderr,zf,za1)
|
||||
#define VPrintf2(zf,za1,za2) \
|
||||
fprintf(stderr,zf,za1,za2)
|
||||
#define VPrintf3(zf,za1,za2,za3) \
|
||||
fprintf(stderr,zf,za1,za2,za3)
|
||||
#define VPrintf4(zf,za1,za2,za3,za4) \
|
||||
fprintf(stderr,zf,za1,za2,za3,za4)
|
||||
#define VPrintf5(zf,za1,za2,za3,za4,za5) \
|
||||
fprintf(stderr,zf,za1,za2,za3,za4,za5)
|
||||
#else
|
||||
extern void bz_internal_error ( int errcode );
|
||||
#define AssertH(cond,errcode) \
|
||||
{ if (!(cond)) bz_internal_error ( errcode ); }
|
||||
#define AssertD(cond,msg) /* */
|
||||
#define VPrintf0(zf) /* */
|
||||
#define VPrintf1(zf,za1) /* */
|
||||
#define VPrintf2(zf,za1,za2) /* */
|
||||
#define VPrintf3(zf,za1,za2,za3) /* */
|
||||
#define VPrintf4(zf,za1,za2,za3,za4) /* */
|
||||
#define VPrintf5(zf,za1,za2,za3,za4,za5) /* */
|
||||
#endif
|
||||
|
||||
|
||||
#define BZALLOC(nnn) (strm->bzalloc)(strm->opaque,(nnn),1)
|
||||
#define BZFREE(ppp) (strm->bzfree)(strm->opaque,(ppp))
|
||||
|
||||
|
||||
/*-- Constants for the back end. --*/
|
||||
|
||||
#define BZ_MAX_ALPHA_SIZE 258
|
||||
#define BZ_MAX_CODE_LEN 23
|
||||
|
||||
#define BZ_RUNA 0
|
||||
#define BZ_RUNB 1
|
||||
|
||||
#define BZ_N_GROUPS 6
|
||||
#define BZ_G_SIZE 50
|
||||
#define BZ_N_ITERS 4
|
||||
|
||||
#define BZ_MAX_SELECTORS (2 + (900000 / BZ_G_SIZE))
|
||||
|
||||
|
||||
|
||||
/*-- Stuff for randomising repetitive blocks. --*/
|
||||
|
||||
extern Int32 BZ2_rNums[512];
|
||||
|
||||
#define BZ_RAND_DECLS \
|
||||
Int32 rNToGo; \
|
||||
Int32 rTPos \
|
||||
|
||||
#define BZ_RAND_INIT_MASK \
|
||||
s->rNToGo = 0; \
|
||||
s->rTPos = 0 \
|
||||
|
||||
#define BZ_RAND_MASK ((s->rNToGo == 1) ? 1 : 0)
|
||||
|
||||
#define BZ_RAND_UPD_MASK \
|
||||
if (s->rNToGo == 0) { \
|
||||
s->rNToGo = BZ2_rNums[s->rTPos]; \
|
||||
s->rTPos++; \
|
||||
if (s->rTPos == 512) s->rTPos = 0; \
|
||||
} \
|
||||
s->rNToGo--;
|
||||
|
||||
|
||||
|
||||
/*-- Stuff for doing CRCs. --*/
|
||||
|
||||
extern UInt32 BZ2_crc32Table[256];
|
||||
|
||||
#define BZ_INITIALISE_CRC(crcVar) \
|
||||
{ \
|
||||
crcVar = 0xffffffffL; \
|
||||
}
|
||||
|
||||
#define BZ_FINALISE_CRC(crcVar) \
|
||||
{ \
|
||||
crcVar = ~(crcVar); \
|
||||
}
|
||||
|
||||
#define BZ_UPDATE_CRC(crcVar,cha) \
|
||||
{ \
|
||||
crcVar = (crcVar << 8) ^ \
|
||||
BZ2_crc32Table[(crcVar >> 24) ^ \
|
||||
((UChar)cha)]; \
|
||||
}
|
||||
|
||||
|
||||
|
||||
/*-- States and modes for compression. --*/
|
||||
|
||||
#define BZ_M_IDLE 1
|
||||
#define BZ_M_RUNNING 2
|
||||
#define BZ_M_FLUSHING 3
|
||||
#define BZ_M_FINISHING 4
|
||||
|
||||
#define BZ_S_OUTPUT 1
|
||||
#define BZ_S_INPUT 2
|
||||
|
||||
#define BZ_N_RADIX 2
|
||||
#define BZ_N_QSORT 12
|
||||
#define BZ_N_SHELL 18
|
||||
#define BZ_N_OVERSHOOT (BZ_N_RADIX + BZ_N_QSORT + BZ_N_SHELL + 2)
|
||||
|
||||
|
||||
|
||||
|
||||
/*-- Structure holding all the compression-side stuff. --*/
|
||||
|
||||
typedef
|
||||
struct {
|
||||
/* pointer back to the struct bz_stream */
|
||||
bz_stream* strm;
|
||||
|
||||
/* mode this stream is in, and whether inputting */
|
||||
/* or outputting data */
|
||||
Int32 mode;
|
||||
Int32 state;
|
||||
|
||||
/* remembers avail_in when flush/finish requested */
|
||||
UInt32 avail_in_expect;
|
||||
|
||||
/* for doing the block sorting */
|
||||
UInt32* arr1;
|
||||
UInt32* arr2;
|
||||
UInt32* ftab;
|
||||
Int32 origPtr;
|
||||
|
||||
/* aliases for arr1 and arr2 */
|
||||
UInt32* ptr;
|
||||
UChar* block;
|
||||
UInt16* mtfv;
|
||||
UChar* zbits;
|
||||
|
||||
/* for deciding when to use the fallback sorting algorithm */
|
||||
Int32 workFactor;
|
||||
|
||||
/* run-length-encoding of the input */
|
||||
UInt32 state_in_ch;
|
||||
Int32 state_in_len;
|
||||
BZ_RAND_DECLS;
|
||||
|
||||
/* input and output limits and current posns */
|
||||
Int32 nblock;
|
||||
Int32 nblockMAX;
|
||||
Int32 numZ;
|
||||
Int32 state_out_pos;
|
||||
|
||||
/* map of bytes used in block */
|
||||
Int32 nInUse;
|
||||
Bool inUse[256];
|
||||
UChar unseqToSeq[256];
|
||||
|
||||
/* the buffer for bit stream creation */
|
||||
UInt32 bsBuff;
|
||||
Int32 bsLive;
|
||||
|
||||
/* block and combined CRCs */
|
||||
UInt32 blockCRC;
|
||||
UInt32 combinedCRC;
|
||||
|
||||
/* misc administratium */
|
||||
Int32 verbosity;
|
||||
Int32 blockNo;
|
||||
Int32 blockSize100k;
|
||||
|
||||
/* stuff for coding the MTF values */
|
||||
Int32 nMTF;
|
||||
Int32 mtfFreq [BZ_MAX_ALPHA_SIZE];
|
||||
UChar selector [BZ_MAX_SELECTORS];
|
||||
UChar selectorMtf[BZ_MAX_SELECTORS];
|
||||
|
||||
UChar len [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
|
||||
Int32 code [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
|
||||
Int32 rfreq [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
|
||||
/* second dimension: only 3 needed; 4 makes index calculations faster */
|
||||
UInt32 len_pack[BZ_MAX_ALPHA_SIZE][4];
|
||||
|
||||
}
|
||||
EState;
|
||||
|
||||
|
||||
|
||||
/*-- externs for compression. --*/
|
||||
|
||||
extern void
|
||||
BZ2_blockSort ( EState* );
|
||||
|
||||
extern void
|
||||
BZ2_compressBlock ( EState*, Bool );
|
||||
|
||||
extern void
|
||||
BZ2_bsInitWrite ( EState* );
|
||||
|
||||
extern void
|
||||
BZ2_hbAssignCodes ( Int32*, UChar*, Int32, Int32, Int32 );
|
||||
|
||||
extern void
|
||||
BZ2_hbMakeCodeLengths ( UChar*, Int32*, Int32, Int32 );
|
||||
|
||||
|
||||
|
||||
/*-- states for decompression. --*/
|
||||
|
||||
#define BZ_X_IDLE 1
|
||||
#define BZ_X_OUTPUT 2
|
||||
|
||||
#define BZ_X_MAGIC_1 10
|
||||
#define BZ_X_MAGIC_2 11
|
||||
#define BZ_X_MAGIC_3 12
|
||||
#define BZ_X_MAGIC_4 13
|
||||
#define BZ_X_BLKHDR_1 14
|
||||
#define BZ_X_BLKHDR_2 15
|
||||
#define BZ_X_BLKHDR_3 16
|
||||
#define BZ_X_BLKHDR_4 17
|
||||
#define BZ_X_BLKHDR_5 18
|
||||
#define BZ_X_BLKHDR_6 19
|
||||
#define BZ_X_BCRC_1 20
|
||||
#define BZ_X_BCRC_2 21
|
||||
#define BZ_X_BCRC_3 22
|
||||
#define BZ_X_BCRC_4 23
|
||||
#define BZ_X_RANDBIT 24
|
||||
#define BZ_X_ORIGPTR_1 25
|
||||
#define BZ_X_ORIGPTR_2 26
|
||||
#define BZ_X_ORIGPTR_3 27
|
||||
#define BZ_X_MAPPING_1 28
|
||||
#define BZ_X_MAPPING_2 29
|
||||
#define BZ_X_SELECTOR_1 30
|
||||
#define BZ_X_SELECTOR_2 31
|
||||
#define BZ_X_SELECTOR_3 32
|
||||
#define BZ_X_CODING_1 33
|
||||
#define BZ_X_CODING_2 34
|
||||
#define BZ_X_CODING_3 35
|
||||
#define BZ_X_MTF_1 36
|
||||
#define BZ_X_MTF_2 37
|
||||
#define BZ_X_MTF_3 38
|
||||
#define BZ_X_MTF_4 39
|
||||
#define BZ_X_MTF_5 40
|
||||
#define BZ_X_MTF_6 41
|
||||
#define BZ_X_ENDHDR_2 42
|
||||
#define BZ_X_ENDHDR_3 43
|
||||
#define BZ_X_ENDHDR_4 44
|
||||
#define BZ_X_ENDHDR_5 45
|
||||
#define BZ_X_ENDHDR_6 46
|
||||
#define BZ_X_CCRC_1 47
|
||||
#define BZ_X_CCRC_2 48
|
||||
#define BZ_X_CCRC_3 49
|
||||
#define BZ_X_CCRC_4 50
|
||||
|
||||
|
||||
|
||||
/*-- Constants for the fast MTF decoder. --*/
|
||||
|
||||
#define MTFA_SIZE 4096
|
||||
#define MTFL_SIZE 16
|
||||
|
||||
|
||||
|
||||
/*-- Structure holding all the decompression-side stuff. --*/
|
||||
|
||||
typedef
|
||||
struct {
|
||||
/* pointer back to the struct bz_stream */
|
||||
bz_stream* strm;
|
||||
|
||||
/* state indicator for this stream */
|
||||
Int32 state;
|
||||
|
||||
/* for doing the final run-length decoding */
|
||||
UChar state_out_ch;
|
||||
Int32 state_out_len;
|
||||
Bool blockRandomised;
|
||||
BZ_RAND_DECLS;
|
||||
|
||||
/* the buffer for bit stream reading */
|
||||
UInt32 bsBuff;
|
||||
Int32 bsLive;
|
||||
|
||||
/* misc administratium */
|
||||
Int32 blockSize100k;
|
||||
Bool smallDecompress;
|
||||
Int32 currBlockNo;
|
||||
Int32 verbosity;
|
||||
|
||||
/* for undoing the Burrows-Wheeler transform */
|
||||
Int32 origPtr;
|
||||
UInt32 tPos;
|
||||
Int32 k0;
|
||||
Int32 unzftab[256];
|
||||
Int32 nblock_used;
|
||||
Int32 cftab[257];
|
||||
Int32 cftabCopy[257];
|
||||
|
||||
/* for undoing the Burrows-Wheeler transform (FAST) */
|
||||
UInt32 *tt;
|
||||
|
||||
/* for undoing the Burrows-Wheeler transform (SMALL) */
|
||||
UInt16 *ll16;
|
||||
UChar *ll4;
|
||||
|
||||
/* stored and calculated CRCs */
|
||||
UInt32 storedBlockCRC;
|
||||
UInt32 storedCombinedCRC;
|
||||
UInt32 calculatedBlockCRC;
|
||||
UInt32 calculatedCombinedCRC;
|
||||
|
||||
/* map of bytes used in block */
|
||||
Int32 nInUse;
|
||||
Bool inUse[256];
|
||||
Bool inUse16[16];
|
||||
UChar seqToUnseq[256];
|
||||
|
||||
/* for decoding the MTF values */
|
||||
UChar mtfa [MTFA_SIZE];
|
||||
Int32 mtfbase[256 / MTFL_SIZE];
|
||||
UChar selector [BZ_MAX_SELECTORS];
|
||||
UChar selectorMtf[BZ_MAX_SELECTORS];
|
||||
UChar len [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
|
||||
|
||||
Int32 limit [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
|
||||
Int32 base [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
|
||||
Int32 perm [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
|
||||
Int32 minLens[BZ_N_GROUPS];
|
||||
|
||||
/* save area for scalars in the main decompress code */
|
||||
Int32 save_i;
|
||||
Int32 save_j;
|
||||
Int32 save_t;
|
||||
Int32 save_alphaSize;
|
||||
Int32 save_nGroups;
|
||||
Int32 save_nSelectors;
|
||||
Int32 save_EOB;
|
||||
Int32 save_groupNo;
|
||||
Int32 save_groupPos;
|
||||
Int32 save_nextSym;
|
||||
Int32 save_nblockMAX;
|
||||
Int32 save_nblock;
|
||||
Int32 save_es;
|
||||
Int32 save_N;
|
||||
Int32 save_curr;
|
||||
Int32 save_zt;
|
||||
Int32 save_zn;
|
||||
Int32 save_zvec;
|
||||
Int32 save_zj;
|
||||
Int32 save_gSel;
|
||||
Int32 save_gMinlen;
|
||||
Int32* save_gLimit;
|
||||
Int32* save_gBase;
|
||||
Int32* save_gPerm;
|
||||
|
||||
}
|
||||
DState;
|
||||
|
||||
|
||||
|
||||
/*-- Macros for decompression. --*/
|
||||
|
||||
#define BZ_GET_FAST(cccc) \
|
||||
s->tPos = s->tt[s->tPos]; \
|
||||
cccc = (UChar)(s->tPos & 0xff); \
|
||||
s->tPos >>= 8;
|
||||
|
||||
#define BZ_GET_FAST_C(cccc) \
|
||||
c_tPos = c_tt[c_tPos]; \
|
||||
cccc = (UChar)(c_tPos & 0xff); \
|
||||
c_tPos >>= 8;
|
||||
|
||||
#define SET_LL4(i,n) \
|
||||
{ if (((i) & 0x1) == 0) \
|
||||
s->ll4[(i) >> 1] = (s->ll4[(i) >> 1] & 0xf0) | (n); else \
|
||||
s->ll4[(i) >> 1] = (s->ll4[(i) >> 1] & 0x0f) | ((n) << 4); \
|
||||
}
|
||||
|
||||
#define GET_LL4(i) \
|
||||
((((UInt32)(s->ll4[(i) >> 1])) >> (((i) << 2) & 0x4)) & 0xF)
|
||||
|
||||
#define SET_LL(i,n) \
|
||||
{ s->ll16[i] = (UInt16)(n & 0x0000ffff); \
|
||||
SET_LL4(i, n >> 16); \
|
||||
}
|
||||
|
||||
#define GET_LL(i) \
|
||||
(((UInt32)s->ll16[i]) | (GET_LL4(i) << 16))
|
||||
|
||||
#define BZ_GET_SMALL(cccc) \
|
||||
cccc = BZ2_indexIntoF ( s->tPos, s->cftab ); \
|
||||
s->tPos = GET_LL(s->tPos);
|
||||
|
||||
|
||||
/*-- externs for decompression. --*/
|
||||
|
||||
extern Int32
|
||||
BZ2_indexIntoF ( Int32, Int32* );
|
||||
|
||||
extern Int32
|
||||
BZ2_decompress ( DState* );
|
||||
|
||||
extern void
|
||||
BZ2_hbCreateDecodeTables ( Int32*, Int32*, Int32*, UChar*,
|
||||
Int32, Int32, Int32 );
|
||||
|
||||
|
||||
#endif
|
||||
|
||||
|
||||
/*-- BZ_NO_STDIO seems to make NULL disappear on some platforms. --*/
|
||||
|
||||
#ifdef BZ_NO_STDIO
|
||||
#ifndef NULL
|
||||
#define NULL 0
|
||||
#endif
|
||||
#endif
|
||||
|
||||
|
||||
/*-------------------------------------------------------------*/
|
||||
/*--- end bzlib_private.h ---*/
|
||||
/*-------------------------------------------------------------*/
|
|
@ -1,714 +0,0 @@
|
|||
|
||||
/*-------------------------------------------------------------*/
|
||||
/*--- Compression machinery (not incl block sorting) ---*/
|
||||
/*--- compress.c ---*/
|
||||
/*-------------------------------------------------------------*/
|
||||
|
||||
/*--
|
||||
This file is a part of bzip2 and/or libbzip2, a program and
|
||||
library for lossless, block-sorting data compression.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
are met:
|
||||
|
||||
1. Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
|
||||
2. The origin of this software must not be misrepresented; you must
|
||||
not claim that you wrote the original software. If you use this
|
||||
software in a product, an acknowledgment in the product
|
||||
documentation would be appreciated but is not required.
|
||||
|
||||
3. Altered source versions must be plainly marked as such, and must
|
||||
not be misrepresented as being the original software.
|
||||
|
||||
4. The name of the author may not be used to endorse or promote
|
||||
products derived from this software without specific prior written
|
||||
permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
|
||||
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
||||
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
|
||||
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
|
||||
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
||||
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
|
||||
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
Julian Seward, Cambridge, UK.
|
||||
jseward@acm.org
|
||||
bzip2/libbzip2 version 1.0 of 21 March 2000
|
||||
|
||||
This program is based on (at least) the work of:
|
||||
Mike Burrows
|
||||
David Wheeler
|
||||
Peter Fenwick
|
||||
Alistair Moffat
|
||||
Radford Neal
|
||||
Ian H. Witten
|
||||
Robert Sedgewick
|
||||
Jon L. Bentley
|
||||
|
||||
For more information on these sources, see the manual.
|
||||
--*/
|
||||
|
||||
/*--
|
||||
CHANGES
|
||||
~~~~~~~
|
||||
0.9.0 -- original version.
|
||||
|
||||
0.9.0a/b -- no changes in this file.
|
||||
|
||||
0.9.0c
|
||||
* changed setting of nGroups in sendMTFValues() so as to
|
||||
do a bit better on small files
|
||||
--*/
|
||||
|
||||
#include "bzlib_private.h"
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
/*--- Bit stream I/O ---*/
|
||||
/*---------------------------------------------------*/
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
void BZ2_bsInitWrite ( EState* s )
|
||||
{
|
||||
s->bsLive = 0;
|
||||
s->bsBuff = 0;
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
static
|
||||
void bsFinishWrite ( EState* s )
|
||||
{
|
||||
while (s->bsLive > 0) {
|
||||
s->zbits[s->numZ] = (UChar)(s->bsBuff >> 24);
|
||||
s->numZ++;
|
||||
s->bsBuff <<= 8;
|
||||
s->bsLive -= 8;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
#define bsNEEDW(nz) \
|
||||
{ \
|
||||
while (s->bsLive >= 8) { \
|
||||
s->zbits[s->numZ] \
|
||||
= (UChar)(s->bsBuff >> 24); \
|
||||
s->numZ++; \
|
||||
s->bsBuff <<= 8; \
|
||||
s->bsLive -= 8; \
|
||||
} \
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
static
|
||||
__inline__
|
||||
void bsW ( EState* s, Int32 n, UInt32 v )
|
||||
{
|
||||
bsNEEDW ( n );
|
||||
s->bsBuff |= (v << (32 - s->bsLive - n));
|
||||
s->bsLive += n;
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
static
|
||||
void bsPutUInt32 ( EState* s, UInt32 u )
|
||||
{
|
||||
bsW ( s, 8, (u >> 24) & 0xffL );
|
||||
bsW ( s, 8, (u >> 16) & 0xffL );
|
||||
bsW ( s, 8, (u >> 8) & 0xffL );
|
||||
bsW ( s, 8, u & 0xffL );
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
static
|
||||
void bsPutUChar ( EState* s, UChar c )
|
||||
{
|
||||
bsW( s, 8, (UInt32)c );
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
/*--- The back end proper ---*/
|
||||
/*---------------------------------------------------*/
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
static
|
||||
void makeMaps_e ( EState* s )
|
||||
{
|
||||
Int32 i;
|
||||
s->nInUse = 0;
|
||||
for (i = 0; i < 256; i++)
|
||||
if (s->inUse[i]) {
|
||||
s->unseqToSeq[i] = s->nInUse;
|
||||
s->nInUse++;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
static
|
||||
void generateMTFValues ( EState* s )
|
||||
{
|
||||
UChar yy[256];
|
||||
Int32 i, j;
|
||||
Int32 zPend;
|
||||
Int32 wr;
|
||||
Int32 EOB;
|
||||
|
||||
/*
|
||||
After sorting (eg, here),
|
||||
s->arr1 [ 0 .. s->nblock-1 ] holds sorted order,
|
||||
and
|
||||
((UChar*)s->arr2) [ 0 .. s->nblock-1 ]
|
||||
holds the original block data.
|
||||
|
||||
The first thing to do is generate the MTF values,
|
||||
and put them in
|
||||
((UInt16*)s->arr1) [ 0 .. s->nblock-1 ].
|
||||
Because there are strictly fewer or equal MTF values
|
||||
than block values, ptr values in this area are overwritten
|
||||
with MTF values only when they are no longer needed.
|
||||
|
||||
The final compressed bitstream is generated into the
|
||||
area starting at
|
||||
(UChar*) (&((UChar*)s->arr2)[s->nblock])
|
||||
|
||||
These storage aliases are set up in bzCompressInit(),
|
||||
except for the last one, which is arranged in
|
||||
compressBlock().
|
||||
*/
|
||||
UInt32* ptr = s->ptr;
|
||||
UChar* block = s->block;
|
||||
UInt16* mtfv = s->mtfv;
|
||||
|
||||
makeMaps_e ( s );
|
||||
EOB = s->nInUse+1;
|
||||
|
||||
for (i = 0; i <= EOB; i++) s->mtfFreq[i] = 0;
|
||||
|
||||
wr = 0;
|
||||
zPend = 0;
|
||||
for (i = 0; i < s->nInUse; i++) yy[i] = (UChar) i;
|
||||
|
||||
for (i = 0; i < s->nblock; i++) {
|
||||
UChar ll_i;
|
||||
AssertD ( wr <= i, "generateMTFValues(1)" );
|
||||
j = ptr[i]-1; if (j < 0) j += s->nblock;
|
||||
ll_i = s->unseqToSeq[block[j]];
|
||||
AssertD ( ll_i < s->nInUse, "generateMTFValues(2a)" );
|
||||
|
||||
if (yy[0] == ll_i) {
|
||||
zPend++;
|
||||
} else {
|
||||
|
||||
if (zPend > 0) {
|
||||
zPend--;
|
||||
while (True) {
|
||||
if (zPend & 1) {
|
||||
mtfv[wr] = BZ_RUNB; wr++;
|
||||
s->mtfFreq[BZ_RUNB]++;
|
||||
} else {
|
||||
mtfv[wr] = BZ_RUNA; wr++;
|
||||
s->mtfFreq[BZ_RUNA]++;
|
||||
}
|
||||
if (zPend < 2) break;
|
||||
zPend = (zPend - 2) / 2;
|
||||
};
|
||||
zPend = 0;
|
||||
}
|
||||
{
|
||||
register UChar rtmp;
|
||||
register UChar* ryy_j;
|
||||
register UChar rll_i;
|
||||
rtmp = yy[1];
|
||||
yy[1] = yy[0];
|
||||
ryy_j = &(yy[1]);
|
||||
rll_i = ll_i;
|
||||
while ( rll_i != rtmp ) {
|
||||
register UChar rtmp2;
|
||||
ryy_j++;
|
||||
rtmp2 = rtmp;
|
||||
rtmp = *ryy_j;
|
||||
*ryy_j = rtmp2;
|
||||
};
|
||||
yy[0] = rtmp;
|
||||
j = ryy_j - &(yy[0]);
|
||||
mtfv[wr] = j+1; wr++; s->mtfFreq[j+1]++;
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
|
||||
if (zPend > 0) {
|
||||
zPend--;
|
||||
while (True) {
|
||||
if (zPend & 1) {
|
||||
mtfv[wr] = BZ_RUNB; wr++;
|
||||
s->mtfFreq[BZ_RUNB]++;
|
||||
} else {
|
||||
mtfv[wr] = BZ_RUNA; wr++;
|
||||
s->mtfFreq[BZ_RUNA]++;
|
||||
}
|
||||
if (zPend < 2) break;
|
||||
zPend = (zPend - 2) / 2;
|
||||
};
|
||||
zPend = 0;
|
||||
}
|
||||
|
||||
mtfv[wr] = EOB; wr++; s->mtfFreq[EOB]++;
|
||||
|
||||
s->nMTF = wr;
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
#define BZ_LESSER_ICOST 0
|
||||
#define BZ_GREATER_ICOST 15
|
||||
|
||||
static
|
||||
void sendMTFValues ( EState* s )
|
||||
{
|
||||
Int32 v, t, i, j, gs, ge, totc, bt, bc, iter;
|
||||
Int32 nSelectors, alphaSize, minLen, maxLen, selCtr;
|
||||
Int32 nGroups, nBytes;
|
||||
|
||||
/*--
|
||||
UChar len [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
|
||||
is a global since the decoder also needs it.
|
||||
|
||||
Int32 code[BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
|
||||
Int32 rfreq[BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
|
||||
are also globals only used in this proc.
|
||||
Made global to keep stack frame size small.
|
||||
--*/
|
||||
|
||||
|
||||
UInt16 cost[BZ_N_GROUPS];
|
||||
Int32 fave[BZ_N_GROUPS];
|
||||
|
||||
UInt16* mtfv = s->mtfv;
|
||||
|
||||
if (s->verbosity >= 3)
|
||||
VPrintf3( " %d in block, %d after MTF & 1-2 coding, "
|
||||
"%d+2 syms in use\n",
|
||||
s->nblock, s->nMTF, s->nInUse );
|
||||
|
||||
alphaSize = s->nInUse+2;
|
||||
for (t = 0; t < BZ_N_GROUPS; t++)
|
||||
for (v = 0; v < alphaSize; v++)
|
||||
s->len[t][v] = BZ_GREATER_ICOST;
|
||||
|
||||
/*--- Decide how many coding tables to use ---*/
|
||||
AssertH ( s->nMTF > 0, 3001 );
|
||||
if (s->nMTF < 200) nGroups = 2; else
|
||||
if (s->nMTF < 600) nGroups = 3; else
|
||||
if (s->nMTF < 1200) nGroups = 4; else
|
||||
if (s->nMTF < 2400) nGroups = 5; else
|
||||
nGroups = 6;
|
||||
|
||||
/*--- Generate an initial set of coding tables ---*/
|
||||
{
|
||||
Int32 nPart, remF, tFreq, aFreq;
|
||||
|
||||
nPart = nGroups;
|
||||
remF = s->nMTF;
|
||||
gs = 0;
|
||||
while (nPart > 0) {
|
||||
tFreq = remF / nPart;
|
||||
ge = gs-1;
|
||||
aFreq = 0;
|
||||
while (aFreq < tFreq && ge < alphaSize-1) {
|
||||
ge++;
|
||||
aFreq += s->mtfFreq[ge];
|
||||
}
|
||||
|
||||
if (ge > gs
|
||||
&& nPart != nGroups && nPart != 1
|
||||
&& ((nGroups-nPart) % 2 == 1)) {
|
||||
aFreq -= s->mtfFreq[ge];
|
||||
ge--;
|
||||
}
|
||||
|
||||
if (s->verbosity >= 3)
|
||||
VPrintf5( " initial group %d, [%d .. %d], "
|
||||
"has %d syms (%4.1f%%)\n",
|
||||
nPart, gs, ge, aFreq,
|
||||
(100.0 * (float)aFreq) / (float)(s->nMTF) );
|
||||
|
||||
for (v = 0; v < alphaSize; v++)
|
||||
if (v >= gs && v <= ge)
|
||||
s->len[nPart-1][v] = BZ_LESSER_ICOST; else
|
||||
s->len[nPart-1][v] = BZ_GREATER_ICOST;
|
||||
|
||||
nPart--;
|
||||
gs = ge+1;
|
||||
remF -= aFreq;
|
||||
}
|
||||
}
|
||||
|
||||
/*---
|
||||
Iterate up to BZ_N_ITERS times to improve the tables.
|
||||
---*/
|
||||
for (iter = 0; iter < BZ_N_ITERS; iter++) {
|
||||
|
||||
for (t = 0; t < nGroups; t++) fave[t] = 0;
|
||||
|
||||
for (t = 0; t < nGroups; t++)
|
||||
for (v = 0; v < alphaSize; v++)
|
||||
s->rfreq[t][v] = 0;
|
||||
|
||||
/*---
|
||||
Set up an auxiliary length table which is used to fast-track
|
||||
the common case (nGroups == 6).
|
||||
---*/
|
||||
if (nGroups == 6) {
|
||||
for (v = 0; v < alphaSize; v++) {
|
||||
s->len_pack[v][0] = (s->len[1][v] << 16) | s->len[0][v];
|
||||
s->len_pack[v][1] = (s->len[3][v] << 16) | s->len[2][v];
|
||||
s->len_pack[v][2] = (s->len[5][v] << 16) | s->len[4][v];
|
||||
}
|
||||
}
|
||||
|
||||
nSelectors = 0;
|
||||
totc = 0;
|
||||
gs = 0;
|
||||
while (True) {
|
||||
|
||||
/*--- Set group start & end marks. --*/
|
||||
if (gs >= s->nMTF) break;
|
||||
ge = gs + BZ_G_SIZE - 1;
|
||||
if (ge >= s->nMTF) ge = s->nMTF-1;
|
||||
|
||||
/*--
|
||||
Calculate the cost of this group as coded
|
||||
by each of the coding tables.
|
||||
--*/
|
||||
for (t = 0; t < nGroups; t++) cost[t] = 0;
|
||||
|
||||
if (nGroups == 6 && 50 == ge-gs+1) {
|
||||
/*--- fast track the common case ---*/
|
||||
register UInt32 cost01, cost23, cost45;
|
||||
register UInt16 icv;
|
||||
cost01 = cost23 = cost45 = 0;
|
||||
|
||||
# define BZ_ITER(nn) \
|
||||
icv = mtfv[gs+(nn)]; \
|
||||
cost01 += s->len_pack[icv][0]; \
|
||||
cost23 += s->len_pack[icv][1]; \
|
||||
cost45 += s->len_pack[icv][2]; \
|
||||
|
||||
BZ_ITER(0); BZ_ITER(1); BZ_ITER(2); BZ_ITER(3); BZ_ITER(4);
|
||||
BZ_ITER(5); BZ_ITER(6); BZ_ITER(7); BZ_ITER(8); BZ_ITER(9);
|
||||
BZ_ITER(10); BZ_ITER(11); BZ_ITER(12); BZ_ITER(13); BZ_ITER(14);
|
||||
BZ_ITER(15); BZ_ITER(16); BZ_ITER(17); BZ_ITER(18); BZ_ITER(19);
|
||||
BZ_ITER(20); BZ_ITER(21); BZ_ITER(22); BZ_ITER(23); BZ_ITER(24);
|
||||
BZ_ITER(25); BZ_ITER(26); BZ_ITER(27); BZ_ITER(28); BZ_ITER(29);
|
||||
BZ_ITER(30); BZ_ITER(31); BZ_ITER(32); BZ_ITER(33); BZ_ITER(34);
|
||||
BZ_ITER(35); BZ_ITER(36); BZ_ITER(37); BZ_ITER(38); BZ_ITER(39);
|
||||
BZ_ITER(40); BZ_ITER(41); BZ_ITER(42); BZ_ITER(43); BZ_ITER(44);
|
||||
BZ_ITER(45); BZ_ITER(46); BZ_ITER(47); BZ_ITER(48); BZ_ITER(49);
|
||||
|
||||
# undef BZ_ITER
|
||||
|
||||
cost[0] = cost01 & 0xffff; cost[1] = cost01 >> 16;
|
||||
cost[2] = cost23 & 0xffff; cost[3] = cost23 >> 16;
|
||||
cost[4] = cost45 & 0xffff; cost[5] = cost45 >> 16;
|
||||
|
||||
} else {
|
||||
/*--- slow version which correctly handles all situations ---*/
|
||||
for (i = gs; i <= ge; i++) {
|
||||
UInt16 icv = mtfv[i];
|
||||
for (t = 0; t < nGroups; t++) cost[t] += s->len[t][icv];
|
||||
}
|
||||
}
|
||||
|
||||
/*--
|
||||
Find the coding table which is best for this group,
|
||||
and record its identity in the selector table.
|
||||
--*/
|
||||
bc = 999999999; bt = -1;
|
||||
for (t = 0; t < nGroups; t++)
|
||||
if (cost[t] < bc) { bc = cost[t]; bt = t; };
|
||||
totc += bc;
|
||||
fave[bt]++;
|
||||
s->selector[nSelectors] = bt;
|
||||
nSelectors++;
|
||||
|
||||
/*--
|
||||
Increment the symbol frequencies for the selected table.
|
||||
--*/
|
||||
if (nGroups == 6 && 50 == ge-gs+1) {
|
||||
/*--- fast track the common case ---*/
|
||||
|
||||
# define BZ_ITUR(nn) s->rfreq[bt][ mtfv[gs+(nn)] ]++
|
||||
|
||||
BZ_ITUR(0); BZ_ITUR(1); BZ_ITUR(2); BZ_ITUR(3); BZ_ITUR(4);
|
||||
BZ_ITUR(5); BZ_ITUR(6); BZ_ITUR(7); BZ_ITUR(8); BZ_ITUR(9);
|
||||
BZ_ITUR(10); BZ_ITUR(11); BZ_ITUR(12); BZ_ITUR(13); BZ_ITUR(14);
|
||||
BZ_ITUR(15); BZ_ITUR(16); BZ_ITUR(17); BZ_ITUR(18); BZ_ITUR(19);
|
||||
BZ_ITUR(20); BZ_ITUR(21); BZ_ITUR(22); BZ_ITUR(23); BZ_ITUR(24);
|
||||
BZ_ITUR(25); BZ_ITUR(26); BZ_ITUR(27); BZ_ITUR(28); BZ_ITUR(29);
|
||||
BZ_ITUR(30); BZ_ITUR(31); BZ_ITUR(32); BZ_ITUR(33); BZ_ITUR(34);
|
||||
BZ_ITUR(35); BZ_ITUR(36); BZ_ITUR(37); BZ_ITUR(38); BZ_ITUR(39);
|
||||
BZ_ITUR(40); BZ_ITUR(41); BZ_ITUR(42); BZ_ITUR(43); BZ_ITUR(44);
|
||||
BZ_ITUR(45); BZ_ITUR(46); BZ_ITUR(47); BZ_ITUR(48); BZ_ITUR(49);
|
||||
|
||||
# undef BZ_ITUR
|
||||
|
||||
} else {
|
||||
/*--- slow version which correctly handles all situations ---*/
|
||||
for (i = gs; i <= ge; i++)
|
||||
s->rfreq[bt][ mtfv[i] ]++;
|
||||
}
|
||||
|
||||
gs = ge+1;
|
||||
}
|
||||
if (s->verbosity >= 3) {
|
||||
VPrintf2 ( " pass %d: size is %d, grp uses are ",
|
||||
iter+1, totc/8 );
|
||||
for (t = 0; t < nGroups; t++)
|
||||
VPrintf1 ( "%d ", fave[t] );
|
||||
VPrintf0 ( "\n" );
|
||||
}
|
||||
|
||||
/*--
|
||||
Recompute the tables based on the accumulated frequencies.
|
||||
--*/
|
||||
for (t = 0; t < nGroups; t++)
|
||||
BZ2_hbMakeCodeLengths ( &(s->len[t][0]), &(s->rfreq[t][0]),
|
||||
alphaSize, 20 );
|
||||
}
|
||||
|
||||
|
||||
AssertH( nGroups < 8, 3002 );
|
||||
AssertH( nSelectors < 32768 &&
|
||||
nSelectors <= (2 + (900000 / BZ_G_SIZE)),
|
||||
3003 );
|
||||
|
||||
|
||||
/*--- Compute MTF values for the selectors. ---*/
|
||||
{
|
||||
UChar pos[BZ_N_GROUPS], ll_i, tmp2, tmp;
|
||||
for (i = 0; i < nGroups; i++) pos[i] = i;
|
||||
for (i = 0; i < nSelectors; i++) {
|
||||
ll_i = s->selector[i];
|
||||
j = 0;
|
||||
tmp = pos[j];
|
||||
while ( ll_i != tmp ) {
|
||||
j++;
|
||||
tmp2 = tmp;
|
||||
tmp = pos[j];
|
||||
pos[j] = tmp2;
|
||||
};
|
||||
pos[0] = tmp;
|
||||
s->selectorMtf[i] = j;
|
||||
}
|
||||
};
|
||||
|
||||
/*--- Assign actual codes for the tables. --*/
|
||||
for (t = 0; t < nGroups; t++) {
|
||||
minLen = 32;
|
||||
maxLen = 0;
|
||||
for (i = 0; i < alphaSize; i++) {
|
||||
if (s->len[t][i] > maxLen) maxLen = s->len[t][i];
|
||||
if (s->len[t][i] < minLen) minLen = s->len[t][i];
|
||||
}
|
||||
AssertH ( !(maxLen > 20), 3004 );
|
||||
AssertH ( !(minLen < 1), 3005 );
|
||||
BZ2_hbAssignCodes ( &(s->code[t][0]), &(s->len[t][0]),
|
||||
minLen, maxLen, alphaSize );
|
||||
}
|
||||
|
||||
/*--- Transmit the mapping table. ---*/
|
||||
{
|
||||
Bool inUse16[16];
|
||||
for (i = 0; i < 16; i++) {
|
||||
inUse16[i] = False;
|
||||
for (j = 0; j < 16; j++)
|
||||
if (s->inUse[i * 16 + j]) inUse16[i] = True;
|
||||
}
|
||||
|
||||
nBytes = s->numZ;
|
||||
for (i = 0; i < 16; i++)
|
||||
if (inUse16[i]) bsW(s,1,1); else bsW(s,1,0);
|
||||
|
||||
for (i = 0; i < 16; i++)
|
||||
if (inUse16[i])
|
||||
for (j = 0; j < 16; j++) {
|
||||
if (s->inUse[i * 16 + j]) bsW(s,1,1); else bsW(s,1,0);
|
||||
}
|
||||
|
||||
if (s->verbosity >= 3)
|
||||
VPrintf1( " bytes: mapping %d, ", s->numZ-nBytes );
|
||||
}
|
||||
|
||||
/*--- Now the selectors. ---*/
|
||||
nBytes = s->numZ;
|
||||
bsW ( s, 3, nGroups );
|
||||
bsW ( s, 15, nSelectors );
|
||||
for (i = 0; i < nSelectors; i++) {
|
||||
for (j = 0; j < s->selectorMtf[i]; j++) bsW(s,1,1);
|
||||
bsW(s,1,0);
|
||||
}
|
||||
if (s->verbosity >= 3)
|
||||
VPrintf1( "selectors %d, ", s->numZ-nBytes );
|
||||
|
||||
/*--- Now the coding tables. ---*/
|
||||
nBytes = s->numZ;
|
||||
|
||||
for (t = 0; t < nGroups; t++) {
|
||||
Int32 curr = s->len[t][0];
|
||||
bsW ( s, 5, curr );
|
||||
for (i = 0; i < alphaSize; i++) {
|
||||
while (curr < s->len[t][i]) { bsW(s,2,2); curr++; /* 10 */ };
|
||||
while (curr > s->len[t][i]) { bsW(s,2,3); curr--; /* 11 */ };
|
||||
bsW ( s, 1, 0 );
|
||||
}
|
||||
}
|
||||
|
||||
if (s->verbosity >= 3)
|
||||
VPrintf1 ( "code lengths %d, ", s->numZ-nBytes );
|
||||
|
||||
/*--- And finally, the block data proper ---*/
|
||||
nBytes = s->numZ;
|
||||
selCtr = 0;
|
||||
gs = 0;
|
||||
while (True) {
|
||||
if (gs >= s->nMTF) break;
|
||||
ge = gs + BZ_G_SIZE - 1;
|
||||
if (ge >= s->nMTF) ge = s->nMTF-1;
|
||||
AssertH ( s->selector[selCtr] < nGroups, 3006 );
|
||||
|
||||
if (nGroups == 6 && 50 == ge-gs+1) {
|
||||
/*--- fast track the common case ---*/
|
||||
UInt16 mtfv_i;
|
||||
UChar* s_len_sel_selCtr
|
||||
= &(s->len[s->selector[selCtr]][0]);
|
||||
Int32* s_code_sel_selCtr
|
||||
= &(s->code[s->selector[selCtr]][0]);
|
||||
|
||||
# define BZ_ITAH(nn) \
|
||||
mtfv_i = mtfv[gs+(nn)]; \
|
||||
bsW ( s, \
|
||||
s_len_sel_selCtr[mtfv_i], \
|
||||
s_code_sel_selCtr[mtfv_i] )
|
||||
|
||||
BZ_ITAH(0); BZ_ITAH(1); BZ_ITAH(2); BZ_ITAH(3); BZ_ITAH(4);
|
||||
BZ_ITAH(5); BZ_ITAH(6); BZ_ITAH(7); BZ_ITAH(8); BZ_ITAH(9);
|
||||
BZ_ITAH(10); BZ_ITAH(11); BZ_ITAH(12); BZ_ITAH(13); BZ_ITAH(14);
|
||||
BZ_ITAH(15); BZ_ITAH(16); BZ_ITAH(17); BZ_ITAH(18); BZ_ITAH(19);
|
||||
BZ_ITAH(20); BZ_ITAH(21); BZ_ITAH(22); BZ_ITAH(23); BZ_ITAH(24);
|
||||
BZ_ITAH(25); BZ_ITAH(26); BZ_ITAH(27); BZ_ITAH(28); BZ_ITAH(29);
|
||||
BZ_ITAH(30); BZ_ITAH(31); BZ_ITAH(32); BZ_ITAH(33); BZ_ITAH(34);
|
||||
BZ_ITAH(35); BZ_ITAH(36); BZ_ITAH(37); BZ_ITAH(38); BZ_ITAH(39);
|
||||
BZ_ITAH(40); BZ_ITAH(41); BZ_ITAH(42); BZ_ITAH(43); BZ_ITAH(44);
|
||||
BZ_ITAH(45); BZ_ITAH(46); BZ_ITAH(47); BZ_ITAH(48); BZ_ITAH(49);
|
||||
|
||||
# undef BZ_ITAH
|
||||
|
||||
} else {
|
||||
/*--- slow version which correctly handles all situations ---*/
|
||||
for (i = gs; i <= ge; i++) {
|
||||
bsW ( s,
|
||||
s->len [s->selector[selCtr]] [mtfv[i]],
|
||||
s->code [s->selector[selCtr]] [mtfv[i]] );
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
gs = ge+1;
|
||||
selCtr++;
|
||||
}
|
||||
AssertH( selCtr == nSelectors, 3007 );
|
||||
|
||||
if (s->verbosity >= 3)
|
||||
VPrintf1( "codes %d\n", s->numZ-nBytes );
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
void BZ2_compressBlock ( EState* s, Bool is_last_block )
|
||||
{
|
||||
if (s->nblock > 0) {
|
||||
|
||||
BZ_FINALISE_CRC ( s->blockCRC );
|
||||
s->combinedCRC = (s->combinedCRC << 1) | (s->combinedCRC >> 31);
|
||||
s->combinedCRC ^= s->blockCRC;
|
||||
if (s->blockNo > 1) s->numZ = 0;
|
||||
|
||||
if (s->verbosity >= 2)
|
||||
VPrintf4( " block %d: crc = 0x%8x, "
|
||||
"combined CRC = 0x%8x, size = %d\n",
|
||||
s->blockNo, s->blockCRC, s->combinedCRC, s->nblock );
|
||||
|
||||
BZ2_blockSort ( s );
|
||||
}
|
||||
|
||||
s->zbits = (UChar*) (&((UChar*)s->arr2)[s->nblock]);
|
||||
|
||||
/*-- If this is the first block, create the stream header. --*/
|
||||
if (s->blockNo == 1) {
|
||||
BZ2_bsInitWrite ( s );
|
||||
bsPutUChar ( s, 'B' );
|
||||
bsPutUChar ( s, 'Z' );
|
||||
bsPutUChar ( s, 'h' );
|
||||
bsPutUChar ( s, (UChar)('0' + s->blockSize100k) );
|
||||
}
|
||||
|
||||
if (s->nblock > 0) {
|
||||
|
||||
bsPutUChar ( s, 0x31 ); bsPutUChar ( s, 0x41 );
|
||||
bsPutUChar ( s, 0x59 ); bsPutUChar ( s, 0x26 );
|
||||
bsPutUChar ( s, 0x53 ); bsPutUChar ( s, 0x59 );
|
||||
|
||||
/*-- Now the block's CRC, so it is in a known place. --*/
|
||||
bsPutUInt32 ( s, s->blockCRC );
|
||||
|
||||
/*--
|
||||
Now a single bit indicating (non-)randomisation.
|
||||
As of version 0.9.5, we use a better sorting algorithm
|
||||
which makes randomisation unnecessary. So always set
|
||||
the randomised bit to 'no'. Of course, the decoder
|
||||
still needs to be able to handle randomised blocks
|
||||
so as to maintain backwards compatibility with
|
||||
older versions of bzip2.
|
||||
--*/
|
||||
bsW(s,1,0);
|
||||
|
||||
bsW ( s, 24, s->origPtr );
|
||||
generateMTFValues ( s );
|
||||
sendMTFValues ( s );
|
||||
}
|
||||
|
||||
|
||||
/*-- If this is the last block, add the stream trailer. --*/
|
||||
if (is_last_block) {
|
||||
|
||||
bsPutUChar ( s, 0x17 ); bsPutUChar ( s, 0x72 );
|
||||
bsPutUChar ( s, 0x45 ); bsPutUChar ( s, 0x38 );
|
||||
bsPutUChar ( s, 0x50 ); bsPutUChar ( s, 0x90 );
|
||||
bsPutUInt32 ( s, s->combinedCRC );
|
||||
if (s->verbosity >= 2)
|
||||
VPrintf1( " final combined CRC = 0x%x\n ", s->combinedCRC );
|
||||
bsFinishWrite ( s );
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/*-------------------------------------------------------------*/
|
||||
/*--- end compress.c ---*/
|
||||
/*-------------------------------------------------------------*/
|
File diff suppressed because it is too large
Load Diff
|
@ -1,65 +0,0 @@
|
|||
dnl Process this file with autoconf to produce a configure script.
|
||||
|
||||
AC_PREREQ(2.13)
|
||||
|
||||
AC_INIT(bzlib.h)
|
||||
AC_CONFIG_AUX_DIR(`cd $srcdir/../..;pwd`)
|
||||
AM_INIT_AUTOMAKE(libbz2, 0.0)
|
||||
AM_MAINTAINER_MODE
|
||||
|
||||
# FIXME: We temporarily define our own version of AC_PROG_CC. This is
|
||||
# copied from autoconf 2.12, but does not call AC_PROG_CC_WORKS. We
|
||||
# are probably using a cross compiler, which will not be able to fully
|
||||
# link an executable. This should really be fixed in autoconf
|
||||
# itself.
|
||||
|
||||
AC_DEFUN(LIB_AC_PROG_CC,
|
||||
[AC_BEFORE([$0], [AC_PROG_CPP])dnl
|
||||
AC_CHECK_TOOL(CC, gcc, gcc)
|
||||
if test -z "$CC"; then
|
||||
AC_CHECK_PROG(CC, cc, cc, , , /usr/ucb/cc)
|
||||
test -z "$CC" && AC_MSG_ERROR([no acceptable cc found in \$PATH])
|
||||
fi
|
||||
|
||||
AC_PROG_CC_GNU
|
||||
|
||||
if test $ac_cv_prog_gcc = yes; then
|
||||
GCC=yes
|
||||
dnl Check whether -g works, even if CFLAGS is set, in case the package
|
||||
dnl plays around with CFLAGS (such as to build both debugging and
|
||||
dnl normal versions of a library), tasteless as that idea is.
|
||||
ac_test_CFLAGS="${CFLAGS+set}"
|
||||
ac_save_CFLAGS="$CFLAGS"
|
||||
CFLAGS=
|
||||
AC_PROG_CC_G
|
||||
if test "$ac_test_CFLAGS" = set; then
|
||||
CFLAGS="$ac_save_CFLAGS"
|
||||
elif test $ac_cv_prog_cc_g = yes; then
|
||||
CFLAGS="-gstabs+ -O2"
|
||||
else
|
||||
CFLAGS="-O2"
|
||||
fi
|
||||
if test "$ac_test_CXXFLAGS" != set; then
|
||||
CXXFLAGS='$(CFLAGS)'
|
||||
fi
|
||||
else
|
||||
GCC=
|
||||
test "${CFLAGS+set}" = set || CFLAGS="-g"
|
||||
fi
|
||||
])
|
||||
|
||||
CPPFLAGS=-U_WIN32
|
||||
AC_CANONICAL_HOST
|
||||
AC_CHECK_TOOL(AR, ar, ar)
|
||||
AC_CHECK_TOOL(NM, nm, nm)
|
||||
AC_CHECK_TOOL(RANLIB, ranlib, ranlib)
|
||||
LIB_AC_PROG_CC
|
||||
AC_PROG_INSTALL
|
||||
AC_STDC_HEADERS
|
||||
AC_EXEEXT
|
||||
AC_OBJEXT
|
||||
|
||||
AC_SUBST(installdata)dnl
|
||||
AC_SUBST(uninstalldata)dnl
|
||||
|
||||
AC_OUTPUT([Makefile])
|
|
@ -1,144 +0,0 @@
|
|||
|
||||
/*-------------------------------------------------------------*/
|
||||
/*--- Table for doing CRCs ---*/
|
||||
/*--- crctable.c ---*/
|
||||
/*-------------------------------------------------------------*/
|
||||
|
||||
/*--
|
||||
This file is a part of bzip2 and/or libbzip2, a program and
|
||||
library for lossless, block-sorting data compression.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
are met:
|
||||
|
||||
1. Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
|
||||
2. The origin of this software must not be misrepresented; you must
|
||||
not claim that you wrote the original software. If you use this
|
||||
software in a product, an acknowledgment in the product
|
||||
documentation would be appreciated but is not required.
|
||||
|
||||
3. Altered source versions must be plainly marked as such, and must
|
||||
not be misrepresented as being the original software.
|
||||
|
||||
4. The name of the author may not be used to endorse or promote
|
||||
products derived from this software without specific prior written
|
||||
permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
|
||||
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
||||
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
|
||||
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
|
||||
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
||||
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
|
||||
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
Julian Seward, Cambridge, UK.
|
||||
jseward@acm.org
|
||||
bzip2/libbzip2 version 1.0 of 21 March 2000
|
||||
|
||||
This program is based on (at least) the work of:
|
||||
Mike Burrows
|
||||
David Wheeler
|
||||
Peter Fenwick
|
||||
Alistair Moffat
|
||||
Radford Neal
|
||||
Ian H. Witten
|
||||
Robert Sedgewick
|
||||
Jon L. Bentley
|
||||
|
||||
For more information on these sources, see the manual.
|
||||
--*/
|
||||
|
||||
|
||||
#include "bzlib_private.h"
|
||||
|
||||
/*--
|
||||
I think this is an implementation of the AUTODIN-II,
|
||||
Ethernet & FDDI 32-bit CRC standard. Vaguely derived
|
||||
from code by Rob Warnock, in Section 51 of the
|
||||
comp.compression FAQ.
|
||||
--*/
|
||||
|
||||
UInt32 BZ2_crc32Table[256] = {
|
||||
|
||||
/*-- Ugly, innit? --*/
|
||||
|
||||
0x00000000L, 0x04c11db7L, 0x09823b6eL, 0x0d4326d9L,
|
||||
0x130476dcL, 0x17c56b6bL, 0x1a864db2L, 0x1e475005L,
|
||||
0x2608edb8L, 0x22c9f00fL, 0x2f8ad6d6L, 0x2b4bcb61L,
|
||||
0x350c9b64L, 0x31cd86d3L, 0x3c8ea00aL, 0x384fbdbdL,
|
||||
0x4c11db70L, 0x48d0c6c7L, 0x4593e01eL, 0x4152fda9L,
|
||||
0x5f15adacL, 0x5bd4b01bL, 0x569796c2L, 0x52568b75L,
|
||||
0x6a1936c8L, 0x6ed82b7fL, 0x639b0da6L, 0x675a1011L,
|
||||
0x791d4014L, 0x7ddc5da3L, 0x709f7b7aL, 0x745e66cdL,
|
||||
0x9823b6e0L, 0x9ce2ab57L, 0x91a18d8eL, 0x95609039L,
|
||||
0x8b27c03cL, 0x8fe6dd8bL, 0x82a5fb52L, 0x8664e6e5L,
|
||||
0xbe2b5b58L, 0xbaea46efL, 0xb7a96036L, 0xb3687d81L,
|
||||
0xad2f2d84L, 0xa9ee3033L, 0xa4ad16eaL, 0xa06c0b5dL,
|
||||
0xd4326d90L, 0xd0f37027L, 0xddb056feL, 0xd9714b49L,
|
||||
0xc7361b4cL, 0xc3f706fbL, 0xceb42022L, 0xca753d95L,
|
||||
0xf23a8028L, 0xf6fb9d9fL, 0xfbb8bb46L, 0xff79a6f1L,
|
||||
0xe13ef6f4L, 0xe5ffeb43L, 0xe8bccd9aL, 0xec7dd02dL,
|
||||
0x34867077L, 0x30476dc0L, 0x3d044b19L, 0x39c556aeL,
|
||||
0x278206abL, 0x23431b1cL, 0x2e003dc5L, 0x2ac12072L,
|
||||
0x128e9dcfL, 0x164f8078L, 0x1b0ca6a1L, 0x1fcdbb16L,
|
||||
0x018aeb13L, 0x054bf6a4L, 0x0808d07dL, 0x0cc9cdcaL,
|
||||
0x7897ab07L, 0x7c56b6b0L, 0x71159069L, 0x75d48ddeL,
|
||||
0x6b93dddbL, 0x6f52c06cL, 0x6211e6b5L, 0x66d0fb02L,
|
||||
0x5e9f46bfL, 0x5a5e5b08L, 0x571d7dd1L, 0x53dc6066L,
|
||||
0x4d9b3063L, 0x495a2dd4L, 0x44190b0dL, 0x40d816baL,
|
||||
0xaca5c697L, 0xa864db20L, 0xa527fdf9L, 0xa1e6e04eL,
|
||||
0xbfa1b04bL, 0xbb60adfcL, 0xb6238b25L, 0xb2e29692L,
|
||||
0x8aad2b2fL, 0x8e6c3698L, 0x832f1041L, 0x87ee0df6L,
|
||||
0x99a95df3L, 0x9d684044L, 0x902b669dL, 0x94ea7b2aL,
|
||||
0xe0b41de7L, 0xe4750050L, 0xe9362689L, 0xedf73b3eL,
|
||||
0xf3b06b3bL, 0xf771768cL, 0xfa325055L, 0xfef34de2L,
|
||||
0xc6bcf05fL, 0xc27dede8L, 0xcf3ecb31L, 0xcbffd686L,
|
||||
0xd5b88683L, 0xd1799b34L, 0xdc3abdedL, 0xd8fba05aL,
|
||||
0x690ce0eeL, 0x6dcdfd59L, 0x608edb80L, 0x644fc637L,
|
||||
0x7a089632L, 0x7ec98b85L, 0x738aad5cL, 0x774bb0ebL,
|
||||
0x4f040d56L, 0x4bc510e1L, 0x46863638L, 0x42472b8fL,
|
||||
0x5c007b8aL, 0x58c1663dL, 0x558240e4L, 0x51435d53L,
|
||||
0x251d3b9eL, 0x21dc2629L, 0x2c9f00f0L, 0x285e1d47L,
|
||||
0x36194d42L, 0x32d850f5L, 0x3f9b762cL, 0x3b5a6b9bL,
|
||||
0x0315d626L, 0x07d4cb91L, 0x0a97ed48L, 0x0e56f0ffL,
|
||||
0x1011a0faL, 0x14d0bd4dL, 0x19939b94L, 0x1d528623L,
|
||||
0xf12f560eL, 0xf5ee4bb9L, 0xf8ad6d60L, 0xfc6c70d7L,
|
||||
0xe22b20d2L, 0xe6ea3d65L, 0xeba91bbcL, 0xef68060bL,
|
||||
0xd727bbb6L, 0xd3e6a601L, 0xdea580d8L, 0xda649d6fL,
|
||||
0xc423cd6aL, 0xc0e2d0ddL, 0xcda1f604L, 0xc960ebb3L,
|
||||
0xbd3e8d7eL, 0xb9ff90c9L, 0xb4bcb610L, 0xb07daba7L,
|
||||
0xae3afba2L, 0xaafbe615L, 0xa7b8c0ccL, 0xa379dd7bL,
|
||||
0x9b3660c6L, 0x9ff77d71L, 0x92b45ba8L, 0x9675461fL,
|
||||
0x8832161aL, 0x8cf30badL, 0x81b02d74L, 0x857130c3L,
|
||||
0x5d8a9099L, 0x594b8d2eL, 0x5408abf7L, 0x50c9b640L,
|
||||
0x4e8ee645L, 0x4a4ffbf2L, 0x470cdd2bL, 0x43cdc09cL,
|
||||
0x7b827d21L, 0x7f436096L, 0x7200464fL, 0x76c15bf8L,
|
||||
0x68860bfdL, 0x6c47164aL, 0x61043093L, 0x65c52d24L,
|
||||
0x119b4be9L, 0x155a565eL, 0x18197087L, 0x1cd86d30L,
|
||||
0x029f3d35L, 0x065e2082L, 0x0b1d065bL, 0x0fdc1becL,
|
||||
0x3793a651L, 0x3352bbe6L, 0x3e119d3fL, 0x3ad08088L,
|
||||
0x2497d08dL, 0x2056cd3aL, 0x2d15ebe3L, 0x29d4f654L,
|
||||
0xc5a92679L, 0xc1683bceL, 0xcc2b1d17L, 0xc8ea00a0L,
|
||||
0xd6ad50a5L, 0xd26c4d12L, 0xdf2f6bcbL, 0xdbee767cL,
|
||||
0xe3a1cbc1L, 0xe760d676L, 0xea23f0afL, 0xeee2ed18L,
|
||||
0xf0a5bd1dL, 0xf464a0aaL, 0xf9278673L, 0xfde69bc4L,
|
||||
0x89b8fd09L, 0x8d79e0beL, 0x803ac667L, 0x84fbdbd0L,
|
||||
0x9abc8bd5L, 0x9e7d9662L, 0x933eb0bbL, 0x97ffad0cL,
|
||||
0xafb010b1L, 0xab710d06L, 0xa6322bdfL, 0xa2f33668L,
|
||||
0xbcb4666dL, 0xb8757bdaL, 0xb5365d03L, 0xb1f740b4L
|
||||
};
|
||||
|
||||
|
||||
/*-------------------------------------------------------------*/
|
||||
/*--- end crctable.c ---*/
|
||||
/*-------------------------------------------------------------*/
|
|
@ -1,660 +0,0 @@
|
|||
|
||||
/*-------------------------------------------------------------*/
|
||||
/*--- Decompression machinery ---*/
|
||||
/*--- decompress.c ---*/
|
||||
/*-------------------------------------------------------------*/
|
||||
|
||||
/*--
|
||||
This file is a part of bzip2 and/or libbzip2, a program and
|
||||
library for lossless, block-sorting data compression.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
are met:
|
||||
|
||||
1. Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
|
||||
2. The origin of this software must not be misrepresented; you must
|
||||
not claim that you wrote the original software. If you use this
|
||||
software in a product, an acknowledgment in the product
|
||||
documentation would be appreciated but is not required.
|
||||
|
||||
3. Altered source versions must be plainly marked as such, and must
|
||||
not be misrepresented as being the original software.
|
||||
|
||||
4. The name of the author may not be used to endorse or promote
|
||||
products derived from this software without specific prior written
|
||||
permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
|
||||
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
||||
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
|
||||
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
|
||||
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
||||
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
|
||||
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
Julian Seward, Cambridge, UK.
|
||||
jseward@acm.org
|
||||
bzip2/libbzip2 version 1.0 of 21 March 2000
|
||||
|
||||
This program is based on (at least) the work of:
|
||||
Mike Burrows
|
||||
David Wheeler
|
||||
Peter Fenwick
|
||||
Alistair Moffat
|
||||
Radford Neal
|
||||
Ian H. Witten
|
||||
Robert Sedgewick
|
||||
Jon L. Bentley
|
||||
|
||||
For more information on these sources, see the manual.
|
||||
--*/
|
||||
|
||||
|
||||
#include "bzlib_private.h"
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
static
|
||||
void makeMaps_d ( DState* s )
|
||||
{
|
||||
Int32 i;
|
||||
s->nInUse = 0;
|
||||
for (i = 0; i < 256; i++)
|
||||
if (s->inUse[i]) {
|
||||
s->seqToUnseq[s->nInUse] = i;
|
||||
s->nInUse++;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
#define RETURN(rrr) \
|
||||
{ retVal = rrr; goto save_state_and_return; };
|
||||
|
||||
#define GET_BITS(lll,vvv,nnn) \
|
||||
case lll: s->state = lll; \
|
||||
while (True) { \
|
||||
if (s->bsLive >= nnn) { \
|
||||
UInt32 v; \
|
||||
v = (s->bsBuff >> \
|
||||
(s->bsLive-nnn)) & ((1 << nnn)-1); \
|
||||
s->bsLive -= nnn; \
|
||||
vvv = v; \
|
||||
break; \
|
||||
} \
|
||||
if (s->strm->avail_in == 0) RETURN(BZ_OK); \
|
||||
s->bsBuff \
|
||||
= (s->bsBuff << 8) | \
|
||||
((UInt32) \
|
||||
(*((UChar*)(s->strm->next_in)))); \
|
||||
s->bsLive += 8; \
|
||||
s->strm->next_in++; \
|
||||
s->strm->avail_in--; \
|
||||
s->strm->total_in_lo32++; \
|
||||
if (s->strm->total_in_lo32 == 0) \
|
||||
s->strm->total_in_hi32++; \
|
||||
}
|
||||
|
||||
#define GET_UCHAR(lll,uuu) \
|
||||
GET_BITS(lll,uuu,8)
|
||||
|
||||
#define GET_BIT(lll,uuu) \
|
||||
GET_BITS(lll,uuu,1)
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
#define GET_MTF_VAL(label1,label2,lval) \
|
||||
{ \
|
||||
if (groupPos == 0) { \
|
||||
groupNo++; \
|
||||
if (groupNo >= nSelectors) \
|
||||
RETURN(BZ_DATA_ERROR); \
|
||||
groupPos = BZ_G_SIZE; \
|
||||
gSel = s->selector[groupNo]; \
|
||||
gMinlen = s->minLens[gSel]; \
|
||||
gLimit = &(s->limit[gSel][0]); \
|
||||
gPerm = &(s->perm[gSel][0]); \
|
||||
gBase = &(s->base[gSel][0]); \
|
||||
} \
|
||||
groupPos--; \
|
||||
zn = gMinlen; \
|
||||
GET_BITS(label1, zvec, zn); \
|
||||
while (1) { \
|
||||
if (zn > 20 /* the longest code */) \
|
||||
RETURN(BZ_DATA_ERROR); \
|
||||
if (zvec <= gLimit[zn]) break; \
|
||||
zn++; \
|
||||
GET_BIT(label2, zj); \
|
||||
zvec = (zvec << 1) | zj; \
|
||||
}; \
|
||||
if (zvec - gBase[zn] < 0 \
|
||||
|| zvec - gBase[zn] >= BZ_MAX_ALPHA_SIZE) \
|
||||
RETURN(BZ_DATA_ERROR); \
|
||||
lval = gPerm[zvec - gBase[zn]]; \
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
Int32 BZ2_decompress ( DState* s )
|
||||
{
|
||||
UChar uc;
|
||||
Int32 retVal;
|
||||
Int32 minLen, maxLen;
|
||||
bz_stream* strm = s->strm;
|
||||
|
||||
/* stuff that needs to be saved/restored */
|
||||
Int32 i;
|
||||
Int32 j;
|
||||
Int32 t;
|
||||
Int32 alphaSize;
|
||||
Int32 nGroups;
|
||||
Int32 nSelectors;
|
||||
Int32 EOB;
|
||||
Int32 groupNo;
|
||||
Int32 groupPos;
|
||||
Int32 nextSym;
|
||||
Int32 nblockMAX;
|
||||
Int32 nblock;
|
||||
Int32 es;
|
||||
Int32 N;
|
||||
Int32 curr;
|
||||
Int32 zt;
|
||||
Int32 zn;
|
||||
Int32 zvec;
|
||||
Int32 zj;
|
||||
Int32 gSel;
|
||||
Int32 gMinlen;
|
||||
Int32* gLimit;
|
||||
Int32* gBase;
|
||||
Int32* gPerm;
|
||||
|
||||
if (s->state == BZ_X_MAGIC_1) {
|
||||
/*initialise the save area*/
|
||||
s->save_i = 0;
|
||||
s->save_j = 0;
|
||||
s->save_t = 0;
|
||||
s->save_alphaSize = 0;
|
||||
s->save_nGroups = 0;
|
||||
s->save_nSelectors = 0;
|
||||
s->save_EOB = 0;
|
||||
s->save_groupNo = 0;
|
||||
s->save_groupPos = 0;
|
||||
s->save_nextSym = 0;
|
||||
s->save_nblockMAX = 0;
|
||||
s->save_nblock = 0;
|
||||
s->save_es = 0;
|
||||
s->save_N = 0;
|
||||
s->save_curr = 0;
|
||||
s->save_zt = 0;
|
||||
s->save_zn = 0;
|
||||
s->save_zvec = 0;
|
||||
s->save_zj = 0;
|
||||
s->save_gSel = 0;
|
||||
s->save_gMinlen = 0;
|
||||
s->save_gLimit = NULL;
|
||||
s->save_gBase = NULL;
|
||||
s->save_gPerm = NULL;
|
||||
}
|
||||
|
||||
/*restore from the save area*/
|
||||
i = s->save_i;
|
||||
j = s->save_j;
|
||||
t = s->save_t;
|
||||
alphaSize = s->save_alphaSize;
|
||||
nGroups = s->save_nGroups;
|
||||
nSelectors = s->save_nSelectors;
|
||||
EOB = s->save_EOB;
|
||||
groupNo = s->save_groupNo;
|
||||
groupPos = s->save_groupPos;
|
||||
nextSym = s->save_nextSym;
|
||||
nblockMAX = s->save_nblockMAX;
|
||||
nblock = s->save_nblock;
|
||||
es = s->save_es;
|
||||
N = s->save_N;
|
||||
curr = s->save_curr;
|
||||
zt = s->save_zt;
|
||||
zn = s->save_zn;
|
||||
zvec = s->save_zvec;
|
||||
zj = s->save_zj;
|
||||
gSel = s->save_gSel;
|
||||
gMinlen = s->save_gMinlen;
|
||||
gLimit = s->save_gLimit;
|
||||
gBase = s->save_gBase;
|
||||
gPerm = s->save_gPerm;
|
||||
|
||||
retVal = BZ_OK;
|
||||
|
||||
switch (s->state) {
|
||||
|
||||
GET_UCHAR(BZ_X_MAGIC_1, uc);
|
||||
if (uc != 'B') RETURN(BZ_DATA_ERROR_MAGIC);
|
||||
|
||||
GET_UCHAR(BZ_X_MAGIC_2, uc);
|
||||
if (uc != 'Z') RETURN(BZ_DATA_ERROR_MAGIC);
|
||||
|
||||
GET_UCHAR(BZ_X_MAGIC_3, uc)
|
||||
if (uc != 'h') RETURN(BZ_DATA_ERROR_MAGIC);
|
||||
|
||||
GET_BITS(BZ_X_MAGIC_4, s->blockSize100k, 8)
|
||||
if (s->blockSize100k < '1' ||
|
||||
s->blockSize100k > '9') RETURN(BZ_DATA_ERROR_MAGIC);
|
||||
s->blockSize100k -= '0';
|
||||
|
||||
if (s->smallDecompress) {
|
||||
s->ll16 = BZALLOC( s->blockSize100k * 100000 * sizeof(UInt16) );
|
||||
s->ll4 = BZALLOC(
|
||||
((1 + s->blockSize100k * 100000) >> 1) * sizeof(UChar)
|
||||
);
|
||||
if (s->ll16 == NULL || s->ll4 == NULL) RETURN(BZ_MEM_ERROR);
|
||||
} else {
|
||||
s->tt = BZALLOC( s->blockSize100k * 100000 * sizeof(Int32) );
|
||||
if (s->tt == NULL) RETURN(BZ_MEM_ERROR);
|
||||
}
|
||||
|
||||
GET_UCHAR(BZ_X_BLKHDR_1, uc);
|
||||
|
||||
if (uc == 0x17) goto endhdr_2;
|
||||
if (uc != 0x31) RETURN(BZ_DATA_ERROR);
|
||||
GET_UCHAR(BZ_X_BLKHDR_2, uc);
|
||||
if (uc != 0x41) RETURN(BZ_DATA_ERROR);
|
||||
GET_UCHAR(BZ_X_BLKHDR_3, uc);
|
||||
if (uc != 0x59) RETURN(BZ_DATA_ERROR);
|
||||
GET_UCHAR(BZ_X_BLKHDR_4, uc);
|
||||
if (uc != 0x26) RETURN(BZ_DATA_ERROR);
|
||||
GET_UCHAR(BZ_X_BLKHDR_5, uc);
|
||||
if (uc != 0x53) RETURN(BZ_DATA_ERROR);
|
||||
GET_UCHAR(BZ_X_BLKHDR_6, uc);
|
||||
if (uc != 0x59) RETURN(BZ_DATA_ERROR);
|
||||
|
||||
s->currBlockNo++;
|
||||
if (s->verbosity >= 2)
|
||||
VPrintf1 ( "\n [%d: huff+mtf ", s->currBlockNo );
|
||||
|
||||
s->storedBlockCRC = 0;
|
||||
GET_UCHAR(BZ_X_BCRC_1, uc);
|
||||
s->storedBlockCRC = (s->storedBlockCRC << 8) | ((UInt32)uc);
|
||||
GET_UCHAR(BZ_X_BCRC_2, uc);
|
||||
s->storedBlockCRC = (s->storedBlockCRC << 8) | ((UInt32)uc);
|
||||
GET_UCHAR(BZ_X_BCRC_3, uc);
|
||||
s->storedBlockCRC = (s->storedBlockCRC << 8) | ((UInt32)uc);
|
||||
GET_UCHAR(BZ_X_BCRC_4, uc);
|
||||
s->storedBlockCRC = (s->storedBlockCRC << 8) | ((UInt32)uc);
|
||||
|
||||
GET_BITS(BZ_X_RANDBIT, s->blockRandomised, 1);
|
||||
|
||||
s->origPtr = 0;
|
||||
GET_UCHAR(BZ_X_ORIGPTR_1, uc);
|
||||
s->origPtr = (s->origPtr << 8) | ((Int32)uc);
|
||||
GET_UCHAR(BZ_X_ORIGPTR_2, uc);
|
||||
s->origPtr = (s->origPtr << 8) | ((Int32)uc);
|
||||
GET_UCHAR(BZ_X_ORIGPTR_3, uc);
|
||||
s->origPtr = (s->origPtr << 8) | ((Int32)uc);
|
||||
|
||||
if (s->origPtr < 0)
|
||||
RETURN(BZ_DATA_ERROR);
|
||||
if (s->origPtr > 10 + 100000*s->blockSize100k)
|
||||
RETURN(BZ_DATA_ERROR);
|
||||
|
||||
/*--- Receive the mapping table ---*/
|
||||
for (i = 0; i < 16; i++) {
|
||||
GET_BIT(BZ_X_MAPPING_1, uc);
|
||||
if (uc == 1)
|
||||
s->inUse16[i] = True; else
|
||||
s->inUse16[i] = False;
|
||||
}
|
||||
|
||||
for (i = 0; i < 256; i++) s->inUse[i] = False;
|
||||
|
||||
for (i = 0; i < 16; i++)
|
||||
if (s->inUse16[i])
|
||||
for (j = 0; j < 16; j++) {
|
||||
GET_BIT(BZ_X_MAPPING_2, uc);
|
||||
if (uc == 1) s->inUse[i * 16 + j] = True;
|
||||
}
|
||||
makeMaps_d ( s );
|
||||
if (s->nInUse == 0) RETURN(BZ_DATA_ERROR);
|
||||
alphaSize = s->nInUse+2;
|
||||
|
||||
/*--- Now the selectors ---*/
|
||||
GET_BITS(BZ_X_SELECTOR_1, nGroups, 3);
|
||||
if (nGroups < 2 || nGroups > 6) RETURN(BZ_DATA_ERROR);
|
||||
GET_BITS(BZ_X_SELECTOR_2, nSelectors, 15);
|
||||
if (nSelectors < 1) RETURN(BZ_DATA_ERROR);
|
||||
for (i = 0; i < nSelectors; i++) {
|
||||
j = 0;
|
||||
while (True) {
|
||||
GET_BIT(BZ_X_SELECTOR_3, uc);
|
||||
if (uc == 0) break;
|
||||
j++;
|
||||
if (j >= nGroups) RETURN(BZ_DATA_ERROR);
|
||||
}
|
||||
s->selectorMtf[i] = j;
|
||||
}
|
||||
|
||||
/*--- Undo the MTF values for the selectors. ---*/
|
||||
{
|
||||
UChar pos[BZ_N_GROUPS], tmp, v;
|
||||
for (v = 0; v < nGroups; v++) pos[v] = v;
|
||||
|
||||
for (i = 0; i < nSelectors; i++) {
|
||||
v = s->selectorMtf[i];
|
||||
tmp = pos[v];
|
||||
while (v > 0) { pos[v] = pos[v-1]; v--; }
|
||||
pos[0] = tmp;
|
||||
s->selector[i] = tmp;
|
||||
}
|
||||
}
|
||||
|
||||
/*--- Now the coding tables ---*/
|
||||
for (t = 0; t < nGroups; t++) {
|
||||
GET_BITS(BZ_X_CODING_1, curr, 5);
|
||||
for (i = 0; i < alphaSize; i++) {
|
||||
while (True) {
|
||||
if (curr < 1 || curr > 20) RETURN(BZ_DATA_ERROR);
|
||||
GET_BIT(BZ_X_CODING_2, uc);
|
||||
if (uc == 0) break;
|
||||
GET_BIT(BZ_X_CODING_3, uc);
|
||||
if (uc == 0) curr++; else curr--;
|
||||
}
|
||||
s->len[t][i] = curr;
|
||||
}
|
||||
}
|
||||
|
||||
/*--- Create the Huffman decoding tables ---*/
|
||||
for (t = 0; t < nGroups; t++) {
|
||||
minLen = 32;
|
||||
maxLen = 0;
|
||||
for (i = 0; i < alphaSize; i++) {
|
||||
if (s->len[t][i] > maxLen) maxLen = s->len[t][i];
|
||||
if (s->len[t][i] < minLen) minLen = s->len[t][i];
|
||||
}
|
||||
BZ2_hbCreateDecodeTables (
|
||||
&(s->limit[t][0]),
|
||||
&(s->base[t][0]),
|
||||
&(s->perm[t][0]),
|
||||
&(s->len[t][0]),
|
||||
minLen, maxLen, alphaSize
|
||||
);
|
||||
s->minLens[t] = minLen;
|
||||
}
|
||||
|
||||
/*--- Now the MTF values ---*/
|
||||
|
||||
EOB = s->nInUse+1;
|
||||
nblockMAX = 100000 * s->blockSize100k;
|
||||
groupNo = -1;
|
||||
groupPos = 0;
|
||||
|
||||
for (i = 0; i <= 255; i++) s->unzftab[i] = 0;
|
||||
|
||||
/*-- MTF init --*/
|
||||
{
|
||||
Int32 ii, jj, kk;
|
||||
kk = MTFA_SIZE-1;
|
||||
for (ii = 256 / MTFL_SIZE - 1; ii >= 0; ii--) {
|
||||
for (jj = MTFL_SIZE-1; jj >= 0; jj--) {
|
||||
s->mtfa[kk] = (UChar)(ii * MTFL_SIZE + jj);
|
||||
kk--;
|
||||
}
|
||||
s->mtfbase[ii] = kk + 1;
|
||||
}
|
||||
}
|
||||
/*-- end MTF init --*/
|
||||
|
||||
nblock = 0;
|
||||
GET_MTF_VAL(BZ_X_MTF_1, BZ_X_MTF_2, nextSym);
|
||||
|
||||
while (True) {
|
||||
|
||||
if (nextSym == EOB) break;
|
||||
|
||||
if (nextSym == BZ_RUNA || nextSym == BZ_RUNB) {
|
||||
|
||||
es = -1;
|
||||
N = 1;
|
||||
do {
|
||||
if (nextSym == BZ_RUNA) es = es + (0+1) * N; else
|
||||
if (nextSym == BZ_RUNB) es = es + (1+1) * N;
|
||||
N = N * 2;
|
||||
GET_MTF_VAL(BZ_X_MTF_3, BZ_X_MTF_4, nextSym);
|
||||
}
|
||||
while (nextSym == BZ_RUNA || nextSym == BZ_RUNB);
|
||||
|
||||
es++;
|
||||
uc = s->seqToUnseq[ s->mtfa[s->mtfbase[0]] ];
|
||||
s->unzftab[uc] += es;
|
||||
|
||||
if (s->smallDecompress)
|
||||
while (es > 0) {
|
||||
if (nblock >= nblockMAX) RETURN(BZ_DATA_ERROR);
|
||||
s->ll16[nblock] = (UInt16)uc;
|
||||
nblock++;
|
||||
es--;
|
||||
}
|
||||
else
|
||||
while (es > 0) {
|
||||
if (nblock >= nblockMAX) RETURN(BZ_DATA_ERROR);
|
||||
s->tt[nblock] = (UInt32)uc;
|
||||
nblock++;
|
||||
es--;
|
||||
};
|
||||
|
||||
continue;
|
||||
|
||||
} else {
|
||||
|
||||
if (nblock >= nblockMAX) RETURN(BZ_DATA_ERROR);
|
||||
|
||||
/*-- uc = MTF ( nextSym-1 ) --*/
|
||||
{
|
||||
Int32 ii, jj, kk, pp, lno, off;
|
||||
UInt32 nn;
|
||||
nn = (UInt32)(nextSym - 1);
|
||||
|
||||
if (nn < MTFL_SIZE) {
|
||||
/* avoid general-case expense */
|
||||
pp = s->mtfbase[0];
|
||||
uc = s->mtfa[pp+nn];
|
||||
while (nn > 3) {
|
||||
Int32 z = pp+nn;
|
||||
s->mtfa[(z) ] = s->mtfa[(z)-1];
|
||||
s->mtfa[(z)-1] = s->mtfa[(z)-2];
|
||||
s->mtfa[(z)-2] = s->mtfa[(z)-3];
|
||||
s->mtfa[(z)-3] = s->mtfa[(z)-4];
|
||||
nn -= 4;
|
||||
}
|
||||
while (nn > 0) {
|
||||
s->mtfa[(pp+nn)] = s->mtfa[(pp+nn)-1]; nn--;
|
||||
};
|
||||
s->mtfa[pp] = uc;
|
||||
} else {
|
||||
/* general case */
|
||||
lno = nn / MTFL_SIZE;
|
||||
off = nn % MTFL_SIZE;
|
||||
pp = s->mtfbase[lno] + off;
|
||||
uc = s->mtfa[pp];
|
||||
while (pp > s->mtfbase[lno]) {
|
||||
s->mtfa[pp] = s->mtfa[pp-1]; pp--;
|
||||
};
|
||||
s->mtfbase[lno]++;
|
||||
while (lno > 0) {
|
||||
s->mtfbase[lno]--;
|
||||
s->mtfa[s->mtfbase[lno]]
|
||||
= s->mtfa[s->mtfbase[lno-1] + MTFL_SIZE - 1];
|
||||
lno--;
|
||||
}
|
||||
s->mtfbase[0]--;
|
||||
s->mtfa[s->mtfbase[0]] = uc;
|
||||
if (s->mtfbase[0] == 0) {
|
||||
kk = MTFA_SIZE-1;
|
||||
for (ii = 256 / MTFL_SIZE-1; ii >= 0; ii--) {
|
||||
for (jj = MTFL_SIZE-1; jj >= 0; jj--) {
|
||||
s->mtfa[kk] = s->mtfa[s->mtfbase[ii] + jj];
|
||||
kk--;
|
||||
}
|
||||
s->mtfbase[ii] = kk + 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
/*-- end uc = MTF ( nextSym-1 ) --*/
|
||||
|
||||
s->unzftab[s->seqToUnseq[uc]]++;
|
||||
if (s->smallDecompress)
|
||||
s->ll16[nblock] = (UInt16)(s->seqToUnseq[uc]); else
|
||||
s->tt[nblock] = (UInt32)(s->seqToUnseq[uc]);
|
||||
nblock++;
|
||||
|
||||
GET_MTF_VAL(BZ_X_MTF_5, BZ_X_MTF_6, nextSym);
|
||||
continue;
|
||||
}
|
||||
}
|
||||
|
||||
/* Now we know what nblock is, we can do a better sanity
|
||||
check on s->origPtr.
|
||||
*/
|
||||
if (s->origPtr < 0 || s->origPtr >= nblock)
|
||||
RETURN(BZ_DATA_ERROR);
|
||||
|
||||
s->state_out_len = 0;
|
||||
s->state_out_ch = 0;
|
||||
BZ_INITIALISE_CRC ( s->calculatedBlockCRC );
|
||||
s->state = BZ_X_OUTPUT;
|
||||
if (s->verbosity >= 2) VPrintf0 ( "rt+rld" );
|
||||
|
||||
/*-- Set up cftab to facilitate generation of T^(-1) --*/
|
||||
s->cftab[0] = 0;
|
||||
for (i = 1; i <= 256; i++) s->cftab[i] = s->unzftab[i-1];
|
||||
for (i = 1; i <= 256; i++) s->cftab[i] += s->cftab[i-1];
|
||||
|
||||
if (s->smallDecompress) {
|
||||
|
||||
/*-- Make a copy of cftab, used in generation of T --*/
|
||||
for (i = 0; i <= 256; i++) s->cftabCopy[i] = s->cftab[i];
|
||||
|
||||
/*-- compute the T vector --*/
|
||||
for (i = 0; i < nblock; i++) {
|
||||
uc = (UChar)(s->ll16[i]);
|
||||
SET_LL(i, s->cftabCopy[uc]);
|
||||
s->cftabCopy[uc]++;
|
||||
}
|
||||
|
||||
/*-- Compute T^(-1) by pointer reversal on T --*/
|
||||
i = s->origPtr;
|
||||
j = GET_LL(i);
|
||||
do {
|
||||
Int32 tmp = GET_LL(j);
|
||||
SET_LL(j, i);
|
||||
i = j;
|
||||
j = tmp;
|
||||
}
|
||||
while (i != s->origPtr);
|
||||
|
||||
s->tPos = s->origPtr;
|
||||
s->nblock_used = 0;
|
||||
if (s->blockRandomised) {
|
||||
BZ_RAND_INIT_MASK;
|
||||
BZ_GET_SMALL(s->k0); s->nblock_used++;
|
||||
BZ_RAND_UPD_MASK; s->k0 ^= BZ_RAND_MASK;
|
||||
} else {
|
||||
BZ_GET_SMALL(s->k0); s->nblock_used++;
|
||||
}
|
||||
|
||||
} else {
|
||||
|
||||
/*-- compute the T^(-1) vector --*/
|
||||
for (i = 0; i < nblock; i++) {
|
||||
uc = (UChar)(s->tt[i] & 0xff);
|
||||
s->tt[s->cftab[uc]] |= (i << 8);
|
||||
s->cftab[uc]++;
|
||||
}
|
||||
|
||||
s->tPos = s->tt[s->origPtr] >> 8;
|
||||
s->nblock_used = 0;
|
||||
if (s->blockRandomised) {
|
||||
BZ_RAND_INIT_MASK;
|
||||
BZ_GET_FAST(s->k0); s->nblock_used++;
|
||||
BZ_RAND_UPD_MASK; s->k0 ^= BZ_RAND_MASK;
|
||||
} else {
|
||||
BZ_GET_FAST(s->k0); s->nblock_used++;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
RETURN(BZ_OK);
|
||||
|
||||
|
||||
|
||||
endhdr_2:
|
||||
|
||||
GET_UCHAR(BZ_X_ENDHDR_2, uc);
|
||||
if (uc != 0x72) RETURN(BZ_DATA_ERROR);
|
||||
GET_UCHAR(BZ_X_ENDHDR_3, uc);
|
||||
if (uc != 0x45) RETURN(BZ_DATA_ERROR);
|
||||
GET_UCHAR(BZ_X_ENDHDR_4, uc);
|
||||
if (uc != 0x38) RETURN(BZ_DATA_ERROR);
|
||||
GET_UCHAR(BZ_X_ENDHDR_5, uc);
|
||||
if (uc != 0x50) RETURN(BZ_DATA_ERROR);
|
||||
GET_UCHAR(BZ_X_ENDHDR_6, uc);
|
||||
if (uc != 0x90) RETURN(BZ_DATA_ERROR);
|
||||
|
||||
s->storedCombinedCRC = 0;
|
||||
GET_UCHAR(BZ_X_CCRC_1, uc);
|
||||
s->storedCombinedCRC = (s->storedCombinedCRC << 8) | ((UInt32)uc);
|
||||
GET_UCHAR(BZ_X_CCRC_2, uc);
|
||||
s->storedCombinedCRC = (s->storedCombinedCRC << 8) | ((UInt32)uc);
|
||||
GET_UCHAR(BZ_X_CCRC_3, uc);
|
||||
s->storedCombinedCRC = (s->storedCombinedCRC << 8) | ((UInt32)uc);
|
||||
GET_UCHAR(BZ_X_CCRC_4, uc);
|
||||
s->storedCombinedCRC = (s->storedCombinedCRC << 8) | ((UInt32)uc);
|
||||
|
||||
s->state = BZ_X_IDLE;
|
||||
RETURN(BZ_STREAM_END);
|
||||
|
||||
default: AssertH ( False, 4001 );
|
||||
}
|
||||
|
||||
AssertH ( False, 4002 );
|
||||
|
||||
save_state_and_return:
|
||||
|
||||
s->save_i = i;
|
||||
s->save_j = j;
|
||||
s->save_t = t;
|
||||
s->save_alphaSize = alphaSize;
|
||||
s->save_nGroups = nGroups;
|
||||
s->save_nSelectors = nSelectors;
|
||||
s->save_EOB = EOB;
|
||||
s->save_groupNo = groupNo;
|
||||
s->save_groupPos = groupPos;
|
||||
s->save_nextSym = nextSym;
|
||||
s->save_nblockMAX = nblockMAX;
|
||||
s->save_nblock = nblock;
|
||||
s->save_es = es;
|
||||
s->save_N = N;
|
||||
s->save_curr = curr;
|
||||
s->save_zt = zt;
|
||||
s->save_zn = zn;
|
||||
s->save_zvec = zvec;
|
||||
s->save_zj = zj;
|
||||
s->save_gSel = gSel;
|
||||
s->save_gMinlen = gMinlen;
|
||||
s->save_gLimit = gLimit;
|
||||
s->save_gBase = gBase;
|
||||
s->save_gPerm = gPerm;
|
||||
|
||||
return retVal;
|
||||
}
|
||||
|
||||
|
||||
/*-------------------------------------------------------------*/
|
||||
/*--- end decompress.c ---*/
|
||||
/*-------------------------------------------------------------*/
|
|
@ -1,176 +0,0 @@
|
|||
/*
|
||||
minibz2
|
||||
libbz2.dll test program.
|
||||
by Yoshioka Tsuneo(QWF00133@nifty.ne.jp/tsuneo-y@is.aist-nara.ac.jp)
|
||||
This file is Public Domain.
|
||||
welcome any email to me.
|
||||
|
||||
usage: minibz2 [-d] [-{1,2,..9}] [[srcfilename] destfilename]
|
||||
*/
|
||||
|
||||
#define BZ_IMPORT
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include "bzlib.h"
|
||||
#ifdef _WIN32
|
||||
#include <io.h>
|
||||
#endif
|
||||
|
||||
|
||||
#ifdef _WIN32
|
||||
|
||||
#define BZ2_LIBNAME "libbz2-1.0.0.DLL"
|
||||
|
||||
#include <windows.h>
|
||||
static int BZ2DLLLoaded = 0;
|
||||
static HINSTANCE BZ2DLLhLib;
|
||||
int BZ2DLLLoadLibrary(void)
|
||||
{
|
||||
HINSTANCE hLib;
|
||||
|
||||
if(BZ2DLLLoaded==1){return 0;}
|
||||
hLib=LoadLibrary(BZ2_LIBNAME);
|
||||
if(hLib == NULL){
|
||||
fprintf(stderr,"Can't load %s\n",BZ2_LIBNAME);
|
||||
return -1;
|
||||
}
|
||||
BZ2_bzlibVersion=GetProcAddress(hLib,"BZ2_bzlibVersion");
|
||||
BZ2_bzopen=GetProcAddress(hLib,"BZ2_bzopen");
|
||||
BZ2_bzdopen=GetProcAddress(hLib,"BZ2_bzdopen");
|
||||
BZ2_bzread=GetProcAddress(hLib,"BZ2_bzread");
|
||||
BZ2_bzwrite=GetProcAddress(hLib,"BZ2_bzwrite");
|
||||
BZ2_bzflush=GetProcAddress(hLib,"BZ2_bzflush");
|
||||
BZ2_bzclose=GetProcAddress(hLib,"BZ2_bzclose");
|
||||
BZ2_bzerror=GetProcAddress(hLib,"BZ2_bzerror");
|
||||
|
||||
if (!BZ2_bzlibVersion || !BZ2_bzopen || !BZ2_bzdopen
|
||||
|| !BZ2_bzread || !BZ2_bzwrite || !BZ2_bzflush
|
||||
|| !BZ2_bzclose || !BZ2_bzerror) {
|
||||
fprintf(stderr,"GetProcAddress failed.\n");
|
||||
return -1;
|
||||
}
|
||||
BZ2DLLLoaded=1;
|
||||
BZ2DLLhLib=hLib;
|
||||
return 0;
|
||||
|
||||
}
|
||||
int BZ2DLLFreeLibrary(void)
|
||||
{
|
||||
if(BZ2DLLLoaded==0){return 0;}
|
||||
FreeLibrary(BZ2DLLhLib);
|
||||
BZ2DLLLoaded=0;
|
||||
}
|
||||
#endif /* WIN32 */
|
||||
|
||||
void usage(void)
|
||||
{
|
||||
puts("usage: minibz2 [-d] [-{1,2,..9}] [[srcfilename] destfilename]");
|
||||
}
|
||||
|
||||
int main(int argc,char *argv[])
|
||||
{
|
||||
int decompress = 0;
|
||||
int level = 9;
|
||||
char *fn_r = NULL;
|
||||
char *fn_w = NULL;
|
||||
|
||||
#ifdef _WIN32
|
||||
if(BZ2DLLLoadLibrary()<0){
|
||||
fprintf(stderr,"Loading of %s failed. Giving up.\n", BZ2_LIBNAME);
|
||||
exit(1);
|
||||
}
|
||||
printf("Loading of %s succeeded. Library version is %s.\n",
|
||||
BZ2_LIBNAME, BZ2_bzlibVersion() );
|
||||
#endif
|
||||
while(++argv,--argc){
|
||||
if(**argv =='-' || **argv=='/'){
|
||||
char *p;
|
||||
|
||||
for(p=*argv+1;*p;p++){
|
||||
if(*p=='d'){
|
||||
decompress = 1;
|
||||
}else if('1'<=*p && *p<='9'){
|
||||
level = *p - '0';
|
||||
}else{
|
||||
usage();
|
||||
exit(1);
|
||||
}
|
||||
}
|
||||
}else{
|
||||
break;
|
||||
}
|
||||
}
|
||||
if(argc>=1){
|
||||
fn_r = *argv;
|
||||
argc--;argv++;
|
||||
}else{
|
||||
fn_r = NULL;
|
||||
}
|
||||
if(argc>=1){
|
||||
fn_w = *argv;
|
||||
argc--;argv++;
|
||||
}else{
|
||||
fn_w = NULL;
|
||||
}
|
||||
{
|
||||
int len;
|
||||
char buff[0x1000];
|
||||
char mode[10];
|
||||
|
||||
if(decompress){
|
||||
BZFILE *BZ2fp_r = NULL;
|
||||
FILE *fp_w = NULL;
|
||||
|
||||
if(fn_w){
|
||||
if((fp_w = fopen(fn_w,"wb"))==NULL){
|
||||
printf("can't open [%s]\n",fn_w);
|
||||
perror("reason:");
|
||||
exit(1);
|
||||
}
|
||||
}else{
|
||||
fp_w = stdout;
|
||||
}
|
||||
if((BZ2fp_r == NULL && (BZ2fp_r = BZ2_bzdopen(fileno(stdin),"rb"))==NULL)
|
||||
|| (BZ2fp_r != NULL && (BZ2fp_r = BZ2_bzopen(fn_r,"rb"))==NULL)){
|
||||
printf("can't bz2openstream\n");
|
||||
exit(1);
|
||||
}
|
||||
while((len=BZ2_bzread(BZ2fp_r,buff,0x1000))>0){
|
||||
fwrite(buff,1,len,fp_w);
|
||||
}
|
||||
BZ2_bzclose(BZ2fp_r);
|
||||
if(fp_w != stdout) fclose(fp_w);
|
||||
}else{
|
||||
BZFILE *BZ2fp_w = NULL;
|
||||
FILE *fp_r = NULL;
|
||||
|
||||
if(fn_r){
|
||||
if((fp_r = fopen(fn_r,"rb"))==NULL){
|
||||
printf("can't open [%s]\n",fn_r);
|
||||
perror("reason:");
|
||||
exit(1);
|
||||
}
|
||||
}else{
|
||||
fp_r = stdin;
|
||||
}
|
||||
mode[0]='w';
|
||||
mode[1] = '0' + level;
|
||||
mode[2] = '\0';
|
||||
|
||||
if((fn_w == NULL && (BZ2fp_w = BZ2_bzdopen(fileno(stdout),mode))==NULL)
|
||||
|| (fn_w !=NULL && (BZ2fp_w = BZ2_bzopen(fn_w,mode))==NULL)){
|
||||
printf("can't bz2openstream\n");
|
||||
exit(1);
|
||||
}
|
||||
while((len=fread(buff,1,0x1000,fp_r))>0){
|
||||
BZ2_bzwrite(BZ2fp_w,buff,len);
|
||||
}
|
||||
BZ2_bzclose(BZ2fp_w);
|
||||
if(fp_r!=stdin)fclose(fp_r);
|
||||
}
|
||||
}
|
||||
#ifdef _WIN32
|
||||
BZ2DLLFreeLibrary();
|
||||
#endif
|
||||
return 0;
|
||||
}
|
|
@ -1,93 +0,0 @@
|
|||
# Microsoft Developer Studio Project File - Name="dlltest" - Package Owner=<4>
|
||||
# Microsoft Developer Studio Generated Build File, Format Version 5.00
|
||||
# ** 編集しないでください **
|
||||
|
||||
# TARGTYPE "Win32 (x86) Console Application" 0x0103
|
||||
|
||||
CFG=dlltest - Win32 Debug
|
||||
!MESSAGE これは有効なメイクファイルではありません。 このプロジェクトをビルドするためには NMAKE を使用してください。
|
||||
!MESSAGE [メイクファイルのエクスポート] コマンドを使用して実行してください
|
||||
!MESSAGE
|
||||
!MESSAGE NMAKE /f "dlltest.mak".
|
||||
!MESSAGE
|
||||
!MESSAGE NMAKE の実行時に構成を指定できます
|
||||
!MESSAGE コマンド ライン上でマクロの設定を定義します。例:
|
||||
!MESSAGE
|
||||
!MESSAGE NMAKE /f "dlltest.mak" CFG="dlltest - Win32 Debug"
|
||||
!MESSAGE
|
||||
!MESSAGE 選択可能なビルド モード:
|
||||
!MESSAGE
|
||||
!MESSAGE "dlltest - Win32 Release" ("Win32 (x86) Console Application" 用)
|
||||
!MESSAGE "dlltest - Win32 Debug" ("Win32 (x86) Console Application" 用)
|
||||
!MESSAGE
|
||||
|
||||
# Begin Project
|
||||
# PROP Scc_ProjName ""
|
||||
# PROP Scc_LocalPath ""
|
||||
CPP=cl.exe
|
||||
RSC=rc.exe
|
||||
|
||||
!IF "$(CFG)" == "dlltest - Win32 Release"
|
||||
|
||||
# PROP BASE Use_MFC 0
|
||||
# PROP BASE Use_Debug_Libraries 0
|
||||
# PROP BASE Output_Dir "Release"
|
||||
# PROP BASE Intermediate_Dir "Release"
|
||||
# PROP BASE Target_Dir ""
|
||||
# PROP Use_MFC 0
|
||||
# PROP Use_Debug_Libraries 0
|
||||
# PROP Output_Dir "Release"
|
||||
# PROP Intermediate_Dir "Release"
|
||||
# PROP Ignore_Export_Lib 0
|
||||
# PROP Target_Dir ""
|
||||
# ADD BASE CPP /nologo /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c
|
||||
# ADD CPP /nologo /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c
|
||||
# ADD BASE RSC /l 0x411 /d "NDEBUG"
|
||||
# ADD RSC /l 0x411 /d "NDEBUG"
|
||||
BSC32=bscmake.exe
|
||||
# ADD BASE BSC32 /nologo
|
||||
# ADD BSC32 /nologo
|
||||
LINK32=link.exe
|
||||
# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /machine:I386
|
||||
# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /machine:I386 /out:"minibz2.exe"
|
||||
|
||||
!ELSEIF "$(CFG)" == "dlltest - Win32 Debug"
|
||||
|
||||
# PROP BASE Use_MFC 0
|
||||
# PROP BASE Use_Debug_Libraries 1
|
||||
# PROP BASE Output_Dir "dlltest_"
|
||||
# PROP BASE Intermediate_Dir "dlltest_"
|
||||
# PROP BASE Target_Dir ""
|
||||
# PROP Use_MFC 0
|
||||
# PROP Use_Debug_Libraries 1
|
||||
# PROP Output_Dir "dlltest_"
|
||||
# PROP Intermediate_Dir "dlltest_"
|
||||
# PROP Ignore_Export_Lib 0
|
||||
# PROP Target_Dir ""
|
||||
# ADD BASE CPP /nologo /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c
|
||||
# ADD CPP /nologo /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c
|
||||
# ADD BASE RSC /l 0x411 /d "_DEBUG"
|
||||
# ADD RSC /l 0x411 /d "_DEBUG"
|
||||
BSC32=bscmake.exe
|
||||
# ADD BASE BSC32 /nologo
|
||||
# ADD BSC32 /nologo
|
||||
LINK32=link.exe
|
||||
# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /debug /machine:I386 /pdbtype:sept
|
||||
# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /debug /machine:I386 /out:"minibz2.exe" /pdbtype:sept
|
||||
|
||||
!ENDIF
|
||||
|
||||
# Begin Target
|
||||
|
||||
# Name "dlltest - Win32 Release"
|
||||
# Name "dlltest - Win32 Debug"
|
||||
# Begin Source File
|
||||
|
||||
SOURCE=.\bzlib.h
|
||||
# End Source File
|
||||
# Begin Source File
|
||||
|
||||
SOURCE=.\dlltest.c
|
||||
# End Source File
|
||||
# End Target
|
||||
# End Project
|
|
@ -1,228 +0,0 @@
|
|||
|
||||
/*-------------------------------------------------------------*/
|
||||
/*--- Huffman coding low-level stuff ---*/
|
||||
/*--- huffman.c ---*/
|
||||
/*-------------------------------------------------------------*/
|
||||
|
||||
/*--
|
||||
This file is a part of bzip2 and/or libbzip2, a program and
|
||||
library for lossless, block-sorting data compression.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
are met:
|
||||
|
||||
1. Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
|
||||
2. The origin of this software must not be misrepresented; you must
|
||||
not claim that you wrote the original software. If you use this
|
||||
software in a product, an acknowledgment in the product
|
||||
documentation would be appreciated but is not required.
|
||||
|
||||
3. Altered source versions must be plainly marked as such, and must
|
||||
not be misrepresented as being the original software.
|
||||
|
||||
4. The name of the author may not be used to endorse or promote
|
||||
products derived from this software without specific prior written
|
||||
permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
|
||||
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
||||
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
|
||||
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
|
||||
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
||||
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
|
||||
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
Julian Seward, Cambridge, UK.
|
||||
jseward@acm.org
|
||||
bzip2/libbzip2 version 1.0 of 21 March 2000
|
||||
|
||||
This program is based on (at least) the work of:
|
||||
Mike Burrows
|
||||
David Wheeler
|
||||
Peter Fenwick
|
||||
Alistair Moffat
|
||||
Radford Neal
|
||||
Ian H. Witten
|
||||
Robert Sedgewick
|
||||
Jon L. Bentley
|
||||
|
||||
For more information on these sources, see the manual.
|
||||
--*/
|
||||
|
||||
|
||||
#include "bzlib_private.h"
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
#define WEIGHTOF(zz0) ((zz0) & 0xffffff00)
|
||||
#define DEPTHOF(zz1) ((zz1) & 0x000000ff)
|
||||
#define MYMAX(zz2,zz3) ((zz2) > (zz3) ? (zz2) : (zz3))
|
||||
|
||||
#define ADDWEIGHTS(zw1,zw2) \
|
||||
(WEIGHTOF(zw1)+WEIGHTOF(zw2)) | \
|
||||
(1 + MYMAX(DEPTHOF(zw1),DEPTHOF(zw2)))
|
||||
|
||||
#define UPHEAP(z) \
|
||||
{ \
|
||||
Int32 zz, tmp; \
|
||||
zz = z; tmp = heap[zz]; \
|
||||
while (weight[tmp] < weight[heap[zz >> 1]]) { \
|
||||
heap[zz] = heap[zz >> 1]; \
|
||||
zz >>= 1; \
|
||||
} \
|
||||
heap[zz] = tmp; \
|
||||
}
|
||||
|
||||
#define DOWNHEAP(z) \
|
||||
{ \
|
||||
Int32 zz, yy, tmp; \
|
||||
zz = z; tmp = heap[zz]; \
|
||||
while (True) { \
|
||||
yy = zz << 1; \
|
||||
if (yy > nHeap) break; \
|
||||
if (yy < nHeap && \
|
||||
weight[heap[yy+1]] < weight[heap[yy]]) \
|
||||
yy++; \
|
||||
if (weight[tmp] < weight[heap[yy]]) break; \
|
||||
heap[zz] = heap[yy]; \
|
||||
zz = yy; \
|
||||
} \
|
||||
heap[zz] = tmp; \
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
void BZ2_hbMakeCodeLengths ( UChar *len,
|
||||
Int32 *freq,
|
||||
Int32 alphaSize,
|
||||
Int32 maxLen )
|
||||
{
|
||||
/*--
|
||||
Nodes and heap entries run from 1. Entry 0
|
||||
for both the heap and nodes is a sentinel.
|
||||
--*/
|
||||
Int32 nNodes, nHeap, n1, n2, i, j, k;
|
||||
Bool tooLong;
|
||||
|
||||
Int32 heap [ BZ_MAX_ALPHA_SIZE + 2 ];
|
||||
Int32 weight [ BZ_MAX_ALPHA_SIZE * 2 ];
|
||||
Int32 parent [ BZ_MAX_ALPHA_SIZE * 2 ];
|
||||
|
||||
for (i = 0; i < alphaSize; i++)
|
||||
weight[i+1] = (freq[i] == 0 ? 1 : freq[i]) << 8;
|
||||
|
||||
while (True) {
|
||||
|
||||
nNodes = alphaSize;
|
||||
nHeap = 0;
|
||||
|
||||
heap[0] = 0;
|
||||
weight[0] = 0;
|
||||
parent[0] = -2;
|
||||
|
||||
for (i = 1; i <= alphaSize; i++) {
|
||||
parent[i] = -1;
|
||||
nHeap++;
|
||||
heap[nHeap] = i;
|
||||
UPHEAP(nHeap);
|
||||
}
|
||||
|
||||
AssertH( nHeap < (BZ_MAX_ALPHA_SIZE+2), 2001 );
|
||||
|
||||
while (nHeap > 1) {
|
||||
n1 = heap[1]; heap[1] = heap[nHeap]; nHeap--; DOWNHEAP(1);
|
||||
n2 = heap[1]; heap[1] = heap[nHeap]; nHeap--; DOWNHEAP(1);
|
||||
nNodes++;
|
||||
parent[n1] = parent[n2] = nNodes;
|
||||
weight[nNodes] = ADDWEIGHTS(weight[n1], weight[n2]);
|
||||
parent[nNodes] = -1;
|
||||
nHeap++;
|
||||
heap[nHeap] = nNodes;
|
||||
UPHEAP(nHeap);
|
||||
}
|
||||
|
||||
AssertH( nNodes < (BZ_MAX_ALPHA_SIZE * 2), 2002 );
|
||||
|
||||
tooLong = False;
|
||||
for (i = 1; i <= alphaSize; i++) {
|
||||
j = 0;
|
||||
k = i;
|
||||
while (parent[k] >= 0) { k = parent[k]; j++; }
|
||||
len[i-1] = j;
|
||||
if (j > maxLen) tooLong = True;
|
||||
}
|
||||
|
||||
if (! tooLong) break;
|
||||
|
||||
for (i = 1; i < alphaSize; i++) {
|
||||
j = weight[i] >> 8;
|
||||
j = 1 + (j / 2);
|
||||
weight[i] = j << 8;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
void BZ2_hbAssignCodes ( Int32 *code,
|
||||
UChar *length,
|
||||
Int32 minLen,
|
||||
Int32 maxLen,
|
||||
Int32 alphaSize )
|
||||
{
|
||||
Int32 n, vec, i;
|
||||
|
||||
vec = 0;
|
||||
for (n = minLen; n <= maxLen; n++) {
|
||||
for (i = 0; i < alphaSize; i++)
|
||||
if (length[i] == n) { code[i] = vec; vec++; };
|
||||
vec <<= 1;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
void BZ2_hbCreateDecodeTables ( Int32 *limit,
|
||||
Int32 *base,
|
||||
Int32 *perm,
|
||||
UChar *length,
|
||||
Int32 minLen,
|
||||
Int32 maxLen,
|
||||
Int32 alphaSize )
|
||||
{
|
||||
Int32 pp, i, j, vec;
|
||||
|
||||
pp = 0;
|
||||
for (i = minLen; i <= maxLen; i++)
|
||||
for (j = 0; j < alphaSize; j++)
|
||||
if (length[j] == i) { perm[pp] = j; pp++; };
|
||||
|
||||
for (i = 0; i < BZ_MAX_CODE_LEN; i++) base[i] = 0;
|
||||
for (i = 0; i < alphaSize; i++) base[length[i]+1]++;
|
||||
|
||||
for (i = 1; i < BZ_MAX_CODE_LEN; i++) base[i] += base[i-1];
|
||||
|
||||
for (i = 0; i < BZ_MAX_CODE_LEN; i++) limit[i] = 0;
|
||||
vec = 0;
|
||||
|
||||
for (i = minLen; i <= maxLen; i++) {
|
||||
vec += (base[i+1] - base[i]);
|
||||
limit[i] = vec-1;
|
||||
vec <<= 1;
|
||||
}
|
||||
for (i = minLen + 1; i <= maxLen; i++)
|
||||
base[i] = ((limit[i-1] + 1) << 1) - base[i];
|
||||
}
|
||||
|
||||
|
||||
/*-------------------------------------------------------------*/
|
||||
/*--- end huffman.c ---*/
|
||||
/*-------------------------------------------------------------*/
|
|
@ -1,27 +0,0 @@
|
|||
LIBRARY LIBBZ2
|
||||
DESCRIPTION "libbzip2: library for data compression"
|
||||
EXPORTS
|
||||
BZ2_bzCompressInit
|
||||
BZ2_bzCompress
|
||||
BZ2_bzCompressEnd
|
||||
BZ2_bzDecompressInit
|
||||
BZ2_bzDecompress
|
||||
BZ2_bzDecompressEnd
|
||||
BZ2_bzReadOpen
|
||||
BZ2_bzReadClose
|
||||
BZ2_bzReadGetUnused
|
||||
BZ2_bzRead
|
||||
BZ2_bzWriteOpen
|
||||
BZ2_bzWrite
|
||||
BZ2_bzWriteClose
|
||||
BZ2_bzWriteClose64
|
||||
BZ2_bzBuffToBuffCompress
|
||||
BZ2_bzBuffToBuffDecompress
|
||||
BZ2_bzlibVersion
|
||||
BZ2_bzopen
|
||||
BZ2_bzdopen
|
||||
BZ2_bzread
|
||||
BZ2_bzwrite
|
||||
BZ2_bzflush
|
||||
BZ2_bzclose
|
||||
BZ2_bzerror
|
|
@ -1,130 +0,0 @@
|
|||
# Microsoft Developer Studio Project File - Name="libbz2" - Package Owner=<4>
|
||||
# Microsoft Developer Studio Generated Build File, Format Version 5.00
|
||||
# ** 編集しないでください **
|
||||
|
||||
# TARGTYPE "Win32 (x86) Dynamic-Link Library" 0x0102
|
||||
|
||||
CFG=libbz2 - Win32 Debug
|
||||
!MESSAGE これは有効なメイクファイルではありません。 このプロジェクトをビルドするためには NMAKE を使用してください。
|
||||
!MESSAGE [メイクファイルのエクスポート] コマンドを使用して実行してください
|
||||
!MESSAGE
|
||||
!MESSAGE NMAKE /f "libbz2.mak".
|
||||
!MESSAGE
|
||||
!MESSAGE NMAKE の実行時に構成を指定できます
|
||||
!MESSAGE コマンド ライン上でマクロの設定を定義します。例:
|
||||
!MESSAGE
|
||||
!MESSAGE NMAKE /f "libbz2.mak" CFG="libbz2 - Win32 Debug"
|
||||
!MESSAGE
|
||||
!MESSAGE 選択可能なビルド モード:
|
||||
!MESSAGE
|
||||
!MESSAGE "libbz2 - Win32 Release" ("Win32 (x86) Dynamic-Link Library" 用)
|
||||
!MESSAGE "libbz2 - Win32 Debug" ("Win32 (x86) Dynamic-Link Library" 用)
|
||||
!MESSAGE
|
||||
|
||||
# Begin Project
|
||||
# PROP Scc_ProjName ""
|
||||
# PROP Scc_LocalPath ""
|
||||
CPP=cl.exe
|
||||
MTL=midl.exe
|
||||
RSC=rc.exe
|
||||
|
||||
!IF "$(CFG)" == "libbz2 - Win32 Release"
|
||||
|
||||
# PROP BASE Use_MFC 0
|
||||
# PROP BASE Use_Debug_Libraries 0
|
||||
# PROP BASE Output_Dir "Release"
|
||||
# PROP BASE Intermediate_Dir "Release"
|
||||
# PROP BASE Target_Dir ""
|
||||
# PROP Use_MFC 0
|
||||
# PROP Use_Debug_Libraries 0
|
||||
# PROP Output_Dir "Release"
|
||||
# PROP Intermediate_Dir "Release"
|
||||
# PROP Ignore_Export_Lib 0
|
||||
# PROP Target_Dir ""
|
||||
# ADD BASE CPP /nologo /MT /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /YX /FD /c
|
||||
# ADD CPP /nologo /MT /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /YX /FD /c
|
||||
# ADD BASE MTL /nologo /D "NDEBUG" /mktyplib203 /o NUL /win32
|
||||
# ADD MTL /nologo /D "NDEBUG" /mktyplib203 /o NUL /win32
|
||||
# ADD BASE RSC /l 0x411 /d "NDEBUG"
|
||||
# ADD RSC /l 0x411 /d "NDEBUG"
|
||||
BSC32=bscmake.exe
|
||||
# ADD BASE BSC32 /nologo
|
||||
# ADD BSC32 /nologo
|
||||
LINK32=link.exe
|
||||
# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /machine:I386
|
||||
# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /machine:I386 /out:"libbz2.dll"
|
||||
|
||||
!ELSEIF "$(CFG)" == "libbz2 - Win32 Debug"
|
||||
|
||||
# PROP BASE Use_MFC 0
|
||||
# PROP BASE Use_Debug_Libraries 1
|
||||
# PROP BASE Output_Dir "Debug"
|
||||
# PROP BASE Intermediate_Dir "Debug"
|
||||
# PROP BASE Target_Dir ""
|
||||
# PROP Use_MFC 0
|
||||
# PROP Use_Debug_Libraries 1
|
||||
# PROP Output_Dir "Debug"
|
||||
# PROP Intermediate_Dir "Debug"
|
||||
# PROP Ignore_Export_Lib 0
|
||||
# PROP Target_Dir ""
|
||||
# ADD BASE CPP /nologo /MTd /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_WINDOWS" /YX /FD /c
|
||||
# ADD CPP /nologo /MTd /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_WINDOWS" /YX /FD /c
|
||||
# ADD BASE MTL /nologo /D "_DEBUG" /mktyplib203 /o NUL /win32
|
||||
# ADD MTL /nologo /D "_DEBUG" /mktyplib203 /o NUL /win32
|
||||
# ADD BASE RSC /l 0x411 /d "_DEBUG"
|
||||
# ADD RSC /l 0x411 /d "_DEBUG"
|
||||
BSC32=bscmake.exe
|
||||
# ADD BASE BSC32 /nologo
|
||||
# ADD BSC32 /nologo
|
||||
LINK32=link.exe
|
||||
# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /debug /machine:I386 /pdbtype:sept
|
||||
# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /debug /machine:I386 /out:"libbz2.dll" /pdbtype:sept
|
||||
|
||||
!ENDIF
|
||||
|
||||
# Begin Target
|
||||
|
||||
# Name "libbz2 - Win32 Release"
|
||||
# Name "libbz2 - Win32 Debug"
|
||||
# Begin Source File
|
||||
|
||||
SOURCE=.\blocksort.c
|
||||
# End Source File
|
||||
# Begin Source File
|
||||
|
||||
SOURCE=.\bzlib.c
|
||||
# End Source File
|
||||
# Begin Source File
|
||||
|
||||
SOURCE=.\bzlib.h
|
||||
# End Source File
|
||||
# Begin Source File
|
||||
|
||||
SOURCE=.\bzlib_private.h
|
||||
# End Source File
|
||||
# Begin Source File
|
||||
|
||||
SOURCE=.\compress.c
|
||||
# End Source File
|
||||
# Begin Source File
|
||||
|
||||
SOURCE=.\crctable.c
|
||||
# End Source File
|
||||
# Begin Source File
|
||||
|
||||
SOURCE=.\decompress.c
|
||||
# End Source File
|
||||
# Begin Source File
|
||||
|
||||
SOURCE=.\huffman.c
|
||||
# End Source File
|
||||
# Begin Source File
|
||||
|
||||
SOURCE=.\libbz2.def
|
||||
# End Source File
|
||||
# Begin Source File
|
||||
|
||||
SOURCE=.\randtable.c
|
||||
# End Source File
|
||||
# End Target
|
||||
# End Project
|
|
@ -1,63 +0,0 @@
|
|||
# Makefile for Microsoft Visual C++ 6.0
|
||||
# usage: nmake -f makefile.msc
|
||||
# K.M. Syring (syring@gsf.de)
|
||||
# Fixed up by JRS for bzip2-0.9.5d release.
|
||||
|
||||
CC=cl
|
||||
CFLAGS= -DWIN32 -MD -Ox -D_FILE_OFFSET_BITS=64
|
||||
|
||||
OBJS= blocksort.obj \
|
||||
huffman.obj \
|
||||
crctable.obj \
|
||||
randtable.obj \
|
||||
compress.obj \
|
||||
decompress.obj \
|
||||
bzlib.obj
|
||||
|
||||
all: lib bzip2 test
|
||||
|
||||
bzip2: lib
|
||||
$(CC) $(CFLAGS) -o bzip2 bzip2.c libbz2.lib setargv.obj
|
||||
$(CC) $(CFLAGS) -o bzip2recover bzip2recover.c
|
||||
|
||||
lib: $(OBJS)
|
||||
lib /out:libbz2.lib $(OBJS)
|
||||
|
||||
test: bzip2
|
||||
type words1
|
||||
.\\bzip2 -1 < sample1.ref > sample1.rb2
|
||||
.\\bzip2 -2 < sample2.ref > sample2.rb2
|
||||
.\\bzip2 -3 < sample3.ref > sample3.rb2
|
||||
.\\bzip2 -d < sample1.bz2 > sample1.tst
|
||||
.\\bzip2 -d < sample2.bz2 > sample2.tst
|
||||
.\\bzip2 -ds < sample3.bz2 > sample3.tst
|
||||
@echo All six of the fc's should find no differences.
|
||||
@echo If fc finds an error on sample3.bz2, this could be
|
||||
@echo because WinZip's 'TAR file smart CR/LF conversion'
|
||||
@echo is too clever for its own good. Disable this option.
|
||||
@echo The correct size for sample3.ref is 120,244. If it
|
||||
@echo is 150,251, WinZip has messed it up.
|
||||
fc sample1.bz2 sample1.rb2
|
||||
fc sample2.bz2 sample2.rb2
|
||||
fc sample3.bz2 sample3.rb2
|
||||
fc sample1.tst sample1.ref
|
||||
fc sample2.tst sample2.ref
|
||||
fc sample3.tst sample3.ref
|
||||
|
||||
|
||||
|
||||
clean:
|
||||
del *.obj
|
||||
del libbz2.lib
|
||||
del bzip2.exe
|
||||
del bzip2recover.exe
|
||||
del sample1.rb2
|
||||
del sample2.rb2
|
||||
del sample3.rb2
|
||||
del sample1.tst
|
||||
del sample2.tst
|
||||
del sample3.tst
|
||||
|
||||
.c.obj:
|
||||
$(CC) $(CFLAGS) -c $*.c -o $*.obj
|
||||
|
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
|
@ -1,47 +0,0 @@
|
|||
<HTML>
|
||||
<HEAD>
|
||||
<!-- This HTML file has been created by texi2html 1.54
|
||||
from manual.texi on 23 March 2000 -->
|
||||
|
||||
<TITLE>bzip2 and libbzip2 - Introduction</TITLE>
|
||||
<link href="manual_2.html" rel=Next>
|
||||
<link href="manual_toc.html" rel=ToC>
|
||||
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<p>Go to the first, previous, <A HREF="manual_2.html">next</A>, <A HREF="manual_4.html">last</A> section, <A HREF="manual_toc.html">table of contents</A>.
|
||||
<P><HR><P>
|
||||
|
||||
|
||||
<H1><A NAME="SEC1" HREF="manual_toc.html#TOC1">Introduction</A></H1>
|
||||
|
||||
<P>
|
||||
<CODE>bzip2</CODE> compresses files using the Burrows-Wheeler
|
||||
block-sorting text compression algorithm, and Huffman coding.
|
||||
Compression is generally considerably better than that
|
||||
achieved by more conventional LZ77/LZ78-based compressors,
|
||||
and approaches the performance of the PPM family of statistical compressors.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>bzip2</CODE> is built on top of <CODE>libbzip2</CODE>, a flexible library
|
||||
for handling compressed data in the <CODE>bzip2</CODE> format. This manual
|
||||
describes both how to use the program and
|
||||
how to work with the library interface. Most of the
|
||||
manual is devoted to this library, not the program,
|
||||
which is good news if your interest is only in the program.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Chapter 2 describes how to use <CODE>bzip2</CODE>; this is the only part
|
||||
you need to read if you just want to know how to operate the program.
|
||||
Chapter 3 describes the programming interfaces in detail, and
|
||||
Chapter 4 records some miscellaneous notes which I thought
|
||||
ought to be recorded somewhere.
|
||||
|
||||
</P>
|
||||
|
||||
<P><HR><P>
|
||||
<p>Go to the first, previous, <A HREF="manual_2.html">next</A>, <A HREF="manual_4.html">last</A> section, <A HREF="manual_toc.html">table of contents</A>.
|
||||
</BODY>
|
||||
</HTML>
|
|
@ -1,484 +0,0 @@
|
|||
<HTML>
|
||||
<HEAD>
|
||||
<!-- This HTML file has been created by texi2html 1.54
|
||||
from manual.texi on 23 March 2000 -->
|
||||
|
||||
<TITLE>bzip2 and libbzip2 - How to use bzip2</TITLE>
|
||||
<link href="manual_3.html" rel=Next>
|
||||
<link href="manual_1.html" rel=Previous>
|
||||
<link href="manual_toc.html" rel=ToC>
|
||||
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<p>Go to the <A HREF="manual_1.html">first</A>, <A HREF="manual_1.html">previous</A>, <A HREF="manual_3.html">next</A>, <A HREF="manual_4.html">last</A> section, <A HREF="manual_toc.html">table of contents</A>.
|
||||
<P><HR><P>
|
||||
|
||||
|
||||
<H1><A NAME="SEC2" HREF="manual_toc.html#TOC2">How to use <CODE>bzip2</CODE></A></H1>
|
||||
|
||||
<P>
|
||||
This chapter contains a copy of the <CODE>bzip2</CODE> man page,
|
||||
and nothing else.
|
||||
|
||||
</P>
|
||||
|
||||
<BLOCKQUOTE>
|
||||
|
||||
|
||||
|
||||
<H4><A NAME="SEC3" HREF="manual_toc.html#TOC3">NAME</A></H4>
|
||||
|
||||
<UL>
|
||||
<LI><CODE>bzip2</CODE>, <CODE>bunzip2</CODE>
|
||||
|
||||
- a block-sorting file compressor, v1.0
|
||||
<LI><CODE>bzcat</CODE>
|
||||
|
||||
- decompresses files to stdout
|
||||
<LI><CODE>bzip2recover</CODE>
|
||||
|
||||
- recovers data from damaged bzip2 files
|
||||
</UL>
|
||||
|
||||
|
||||
|
||||
<H4><A NAME="SEC4" HREF="manual_toc.html#TOC4">SYNOPSIS</A></H4>
|
||||
|
||||
<UL>
|
||||
<LI><CODE>bzip2</CODE> [ -cdfkqstvzVL123456789 ] [ filenames ... ]
|
||||
|
||||
<LI><CODE>bunzip2</CODE> [ -fkvsVL ] [ filenames ... ]
|
||||
|
||||
<LI><CODE>bzcat</CODE> [ -s ] [ filenames ... ]
|
||||
|
||||
<LI><CODE>bzip2recover</CODE> filename
|
||||
|
||||
</UL>
|
||||
|
||||
|
||||
|
||||
<H4><A NAME="SEC5" HREF="manual_toc.html#TOC5">DESCRIPTION</A></H4>
|
||||
|
||||
<P>
|
||||
<CODE>bzip2</CODE> compresses files using the Burrows-Wheeler block sorting
|
||||
text compression algorithm, and Huffman coding. Compression is
|
||||
generally considerably better than that achieved by more conventional
|
||||
LZ77/LZ78-based compressors, and approaches the performance of the PPM
|
||||
family of statistical compressors.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
The command-line options are deliberately very similar to those of GNU
|
||||
<CODE>gzip</CODE>, but they are not identical.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>bzip2</CODE> expects a list of file names to accompany the command-line
|
||||
flags. Each file is replaced by a compressed version of itself, with
|
||||
the name <CODE>original_name.bz2</CODE>. Each compressed file has the same
|
||||
modification date, permissions, and, when possible, ownership as the
|
||||
corresponding original, so that these properties can be correctly
|
||||
restored at decompression time. File name handling is naive in the
|
||||
sense that there is no mechanism for preserving original file names,
|
||||
permissions, ownerships or dates in filesystems which lack these
|
||||
concepts, or have serious file name length restrictions, such as MS-DOS.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>bzip2</CODE> and <CODE>bunzip2</CODE> will by default not overwrite existing
|
||||
files. If you want this to happen, specify the <CODE>-f</CODE> flag.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
If no file names are specified, <CODE>bzip2</CODE> compresses from standard
|
||||
input to standard output. In this case, <CODE>bzip2</CODE> will decline to
|
||||
write compressed output to a terminal, as this would be entirely
|
||||
incomprehensible and therefore pointless.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>bunzip2</CODE> (or <CODE>bzip2 -d</CODE>) decompresses all
|
||||
specified files. Files which were not created by <CODE>bzip2</CODE>
|
||||
will be detected and ignored, and a warning issued.
|
||||
<CODE>bzip2</CODE> attempts to guess the filename for the decompressed file
|
||||
from that of the compressed file as follows:
|
||||
|
||||
<UL>
|
||||
<LI><CODE>filename.bz2 </CODE> becomes <CODE>filename</CODE>
|
||||
|
||||
<LI><CODE>filename.bz </CODE> becomes <CODE>filename</CODE>
|
||||
|
||||
<LI><CODE>filename.tbz2</CODE> becomes <CODE>filename.tar</CODE>
|
||||
|
||||
<LI><CODE>filename.tbz </CODE> becomes <CODE>filename.tar</CODE>
|
||||
|
||||
<LI><CODE>anyothername </CODE> becomes <CODE>anyothername.out</CODE>
|
||||
|
||||
</UL>
|
||||
|
||||
<P>
|
||||
If the file does not end in one of the recognised endings,
|
||||
<CODE>.bz2</CODE>, <CODE>.bz</CODE>,
|
||||
<CODE>.tbz2</CODE> or <CODE>.tbz</CODE>, <CODE>bzip2</CODE> complains that it cannot
|
||||
guess the name of the original file, and uses the original name
|
||||
with <CODE>.out</CODE> appended.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
As with compression, supplying no
|
||||
filenames causes decompression from standard input to standard output.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>bunzip2</CODE> will correctly decompress a file which is the
|
||||
concatenation of two or more compressed files. The result is the
|
||||
concatenation of the corresponding uncompressed files. Integrity
|
||||
testing (<CODE>-t</CODE>) of concatenated compressed files is also supported.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
You can also compress or decompress files to the standard output by
|
||||
giving the <CODE>-c</CODE> flag. Multiple files may be compressed and
|
||||
decompressed like this. The resulting outputs are fed sequentially to
|
||||
stdout. Compression of multiple files in this manner generates a stream
|
||||
containing multiple compressed file representations. Such a stream
|
||||
can be decompressed correctly only by <CODE>bzip2</CODE> version 0.9.0 or
|
||||
later. Earlier versions of <CODE>bzip2</CODE> will stop after decompressing
|
||||
the first file in the stream.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>bzcat</CODE> (or <CODE>bzip2 -dc</CODE>) decompresses all specified files to
|
||||
the standard output.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>bzip2</CODE> will read arguments from the environment variables
|
||||
<CODE>BZIP2</CODE> and <CODE>BZIP</CODE>, in that order, and will process them
|
||||
before any arguments read from the command line. This gives a
|
||||
convenient way to supply default arguments.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Compression is always performed, even if the compressed file is slightly
|
||||
larger than the original. Files of less than about one hundred bytes
|
||||
tend to get larger, since the compression mechanism has a constant
|
||||
overhead in the region of 50 bytes. Random data (including the output
|
||||
of most file compressors) is coded at about 8.05 bits per byte, giving
|
||||
an expansion of around 0.5%.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
As a self-check for your protection, <CODE>bzip2</CODE> uses 32-bit CRCs to
|
||||
make sure that the decompressed version of a file is identical to the
|
||||
original. This guards against corruption of the compressed data, and
|
||||
against undetected bugs in <CODE>bzip2</CODE> (hopefully very unlikely). The
|
||||
chances of data corruption going undetected is microscopic, about one
|
||||
chance in four billion for each file processed. Be aware, though, that
|
||||
the check occurs upon decompression, so it can only tell you that
|
||||
something is wrong. It can't help you recover the original uncompressed
|
||||
data. You can use <CODE>bzip2recover</CODE> to try to recover data from
|
||||
damaged files.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Return values: 0 for a normal exit, 1 for environmental problems (file
|
||||
not found, invalid flags, I/O errors, &c), 2 to indicate a corrupt
|
||||
compressed file, 3 for an internal consistency error (eg, bug) which
|
||||
caused <CODE>bzip2</CODE> to panic.
|
||||
|
||||
</P>
|
||||
|
||||
|
||||
|
||||
<H4><A NAME="SEC6" HREF="manual_toc.html#TOC6">OPTIONS</A></H4>
|
||||
<DL COMPACT>
|
||||
|
||||
<DT><CODE>-c --stdout</CODE>
|
||||
<DD>
|
||||
Compress or decompress to standard output.
|
||||
<DT><CODE>-d --decompress</CODE>
|
||||
<DD>
|
||||
Force decompression. <CODE>bzip2</CODE>, <CODE>bunzip2</CODE> and <CODE>bzcat</CODE> are
|
||||
really the same program, and the decision about what actions to take is
|
||||
done on the basis of which name is used. This flag overrides that
|
||||
mechanism, and forces bzip2 to decompress.
|
||||
<DT><CODE>-z --compress</CODE>
|
||||
<DD>
|
||||
The complement to <CODE>-d</CODE>: forces compression, regardless of the
|
||||
invokation name.
|
||||
<DT><CODE>-t --test</CODE>
|
||||
<DD>
|
||||
Check integrity of the specified file(s), but don't decompress them.
|
||||
This really performs a trial decompression and throws away the result.
|
||||
<DT><CODE>-f --force</CODE>
|
||||
<DD>
|
||||
Force overwrite of output files. Normally, <CODE>bzip2</CODE> will not overwrite
|
||||
existing output files. Also forces <CODE>bzip2</CODE> to break hard links
|
||||
to files, which it otherwise wouldn't do.
|
||||
<DT><CODE>-k --keep</CODE>
|
||||
<DD>
|
||||
Keep (don't delete) input files during compression
|
||||
or decompression.
|
||||
<DT><CODE>-s --small</CODE>
|
||||
<DD>
|
||||
Reduce memory usage, for compression, decompression and testing. Files
|
||||
are decompressed and tested using a modified algorithm which only
|
||||
requires 2.5 bytes per block byte. This means any file can be
|
||||
decompressed in 2300k of memory, albeit at about half the normal speed.
|
||||
|
||||
During compression, <CODE>-s</CODE> selects a block size of 200k, which limits
|
||||
memory use to around the same figure, at the expense of your compression
|
||||
ratio. In short, if your machine is low on memory (8 megabytes or
|
||||
less), use -s for everything. See MEMORY MANAGEMENT below.
|
||||
<DT><CODE>-q --quiet</CODE>
|
||||
<DD>
|
||||
Suppress non-essential warning messages. Messages pertaining to
|
||||
I/O errors and other critical events will not be suppressed.
|
||||
<DT><CODE>-v --verbose</CODE>
|
||||
<DD>
|
||||
Verbose mode -- show the compression ratio for each file processed.
|
||||
Further <CODE>-v</CODE>'s increase the verbosity level, spewing out lots of
|
||||
information which is primarily of interest for diagnostic purposes.
|
||||
<DT><CODE>-L --license -V --version</CODE>
|
||||
<DD>
|
||||
Display the software version, license terms and conditions.
|
||||
<DT><CODE>-1 to -9</CODE>
|
||||
<DD>
|
||||
Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
|
||||
effect when decompressing. See MEMORY MANAGEMENT below.
|
||||
<DT><CODE>--</CODE>
|
||||
<DD>
|
||||
Treats all subsequent arguments as file names, even if they start
|
||||
with a dash. This is so you can handle files with names beginning
|
||||
with a dash, for example: <CODE>bzip2 -- -myfilename</CODE>.
|
||||
<DT><CODE>--repetitive-fast</CODE>
|
||||
<DD>
|
||||
<DT><CODE>--repetitive-best</CODE>
|
||||
<DD>
|
||||
These flags are redundant in versions 0.9.5 and above. They provided
|
||||
some coarse control over the behaviour of the sorting algorithm in
|
||||
earlier versions, which was sometimes useful. 0.9.5 and above have an
|
||||
improved algorithm which renders these flags irrelevant.
|
||||
</DL>
|
||||
|
||||
|
||||
|
||||
<H4><A NAME="SEC7" HREF="manual_toc.html#TOC7">MEMORY MANAGEMENT</A></H4>
|
||||
|
||||
<P>
|
||||
<CODE>bzip2</CODE> compresses large files in blocks. The block size affects
|
||||
both the compression ratio achieved, and the amount of memory needed for
|
||||
compression and decompression. The flags <CODE>-1</CODE> through <CODE>-9</CODE>
|
||||
specify the block size to be 100,000 bytes through 900,000 bytes (the
|
||||
default) respectively. At decompression time, the block size used for
|
||||
compression is read from the header of the compressed file, and
|
||||
<CODE>bunzip2</CODE> then allocates itself just enough memory to decompress
|
||||
the file. Since block sizes are stored in compressed files, it follows
|
||||
that the flags <CODE>-1</CODE> to <CODE>-9</CODE> are irrelevant to and so ignored
|
||||
during decompression.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Compression and decompression requirements, in bytes, can be estimated
|
||||
as:
|
||||
|
||||
<PRE>
|
||||
Compression: 400k + ( 8 x block size )
|
||||
|
||||
Decompression: 100k + ( 4 x block size ), or
|
||||
100k + ( 2.5 x block size )
|
||||
</PRE>
|
||||
|
||||
<P>
|
||||
Larger block sizes give rapidly diminishing marginal returns. Most of
|
||||
the compression comes from the first two or three hundred k of block
|
||||
size, a fact worth bearing in mind when using <CODE>bzip2</CODE> on small machines.
|
||||
It is also important to appreciate that the decompression memory
|
||||
requirement is set at compression time by the choice of block size.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
For files compressed with the default 900k block size, <CODE>bunzip2</CODE>
|
||||
will require about 3700 kbytes to decompress. To support decompression
|
||||
of any file on a 4 megabyte machine, <CODE>bunzip2</CODE> has an option to
|
||||
decompress using approximately half this amount of memory, about 2300
|
||||
kbytes. Decompression speed is also halved, so you should use this
|
||||
option only where necessary. The relevant flag is <CODE>-s</CODE>.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
In general, try and use the largest block size memory constraints allow,
|
||||
since that maximises the compression achieved. Compression and
|
||||
decompression speed are virtually unaffected by block size.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Another significant point applies to files which fit in a single block
|
||||
-- that means most files you'd encounter using a large block size. The
|
||||
amount of real memory touched is proportional to the size of the file,
|
||||
since the file is smaller than a block. For example, compressing a file
|
||||
20,000 bytes long with the flag <CODE>-9</CODE> will cause the compressor to
|
||||
allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560
|
||||
kbytes of it. Similarly, the decompressor will allocate 3700k but only
|
||||
touch 100k + 20000 * 4 = 180 kbytes.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Here is a table which summarises the maximum memory usage for different
|
||||
block sizes. Also recorded is the total compressed size for 14 files of
|
||||
the Calgary Text Compression Corpus totalling 3,141,622 bytes. This
|
||||
column gives some feel for how compression varies with block size.
|
||||
These figures tend to understate the advantage of larger block sizes for
|
||||
larger files, since the Corpus is dominated by smaller files.
|
||||
|
||||
<PRE>
|
||||
Compress Decompress Decompress Corpus
|
||||
Flag usage usage -s usage Size
|
||||
|
||||
-1 1200k 500k 350k 914704
|
||||
-2 2000k 900k 600k 877703
|
||||
-3 2800k 1300k 850k 860338
|
||||
-4 3600k 1700k 1100k 846899
|
||||
-5 4400k 2100k 1350k 845160
|
||||
-6 5200k 2500k 1600k 838626
|
||||
-7 6100k 2900k 1850k 834096
|
||||
-8 6800k 3300k 2100k 828642
|
||||
-9 7600k 3700k 2350k 828642
|
||||
</PRE>
|
||||
|
||||
|
||||
|
||||
<H4><A NAME="SEC8" HREF="manual_toc.html#TOC8">RECOVERING DATA FROM DAMAGED FILES</A></H4>
|
||||
|
||||
<P>
|
||||
<CODE>bzip2</CODE> compresses files in blocks, usually 900kbytes long. Each
|
||||
block is handled independently. If a media or transmission error causes
|
||||
a multi-block <CODE>.bz2</CODE> file to become damaged, it may be possible to
|
||||
recover data from the undamaged blocks in the file.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
The compressed representation of each block is delimited by a 48-bit
|
||||
pattern, which makes it possible to find the block boundaries with
|
||||
reasonable certainty. Each block also carries its own 32-bit CRC, so
|
||||
damaged blocks can be distinguished from undamaged ones.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>bzip2recover</CODE> is a simple program whose purpose is to search for
|
||||
blocks in <CODE>.bz2</CODE> files, and write each block out into its own
|
||||
<CODE>.bz2</CODE> file. You can then use <CODE>bzip2 -t</CODE> to test the
|
||||
integrity of the resulting files, and decompress those which are
|
||||
undamaged.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>bzip2recover</CODE>
|
||||
takes a single argument, the name of the damaged file,
|
||||
and writes a number of files <CODE>rec0001file.bz2</CODE>,
|
||||
<CODE>rec0002file.bz2</CODE>, etc, containing the extracted blocks.
|
||||
The output filenames are designed so that the use of
|
||||
wildcards in subsequent processing -- for example,
|
||||
<CODE>bzip2 -dc rec*file.bz2 > recovered_data</CODE> -- lists the files in
|
||||
the correct order.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>bzip2recover</CODE> should be of most use dealing with large <CODE>.bz2</CODE>
|
||||
files, as these will contain many blocks. It is clearly
|
||||
futile to use it on damaged single-block files, since a
|
||||
damaged block cannot be recovered. If you wish to minimise
|
||||
any potential data loss through media or transmission errors,
|
||||
you might consider compressing with a smaller
|
||||
block size.
|
||||
|
||||
</P>
|
||||
|
||||
|
||||
|
||||
<H4><A NAME="SEC9" HREF="manual_toc.html#TOC9">PERFORMANCE NOTES</A></H4>
|
||||
|
||||
<P>
|
||||
The sorting phase of compression gathers together similar strings in the
|
||||
file. Because of this, files containing very long runs of repeated
|
||||
symbols, like "aabaabaabaab ..." (repeated several hundred times) may
|
||||
compress more slowly than normal. Versions 0.9.5 and above fare much
|
||||
better than previous versions in this respect. The ratio between
|
||||
worst-case and average-case compression time is in the region of 10:1.
|
||||
For previous versions, this figure was more like 100:1. You can use the
|
||||
<CODE>-vvvv</CODE> option to monitor progress in great detail, if you want.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Decompression speed is unaffected by these phenomena.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>bzip2</CODE> usually allocates several megabytes of memory to operate
|
||||
in, and then charges all over it in a fairly random fashion. This means
|
||||
that performance, both for compressing and decompressing, is largely
|
||||
determined by the speed at which your machine can service cache misses.
|
||||
Because of this, small changes to the code to reduce the miss rate have
|
||||
been observed to give disproportionately large performance improvements.
|
||||
I imagine <CODE>bzip2</CODE> will perform best on machines with very large
|
||||
caches.
|
||||
|
||||
</P>
|
||||
|
||||
|
||||
|
||||
<H4><A NAME="SEC10" HREF="manual_toc.html#TOC10">CAVEATS</A></H4>
|
||||
|
||||
<P>
|
||||
I/O error messages are not as helpful as they could be. <CODE>bzip2</CODE>
|
||||
tries hard to detect I/O errors and exit cleanly, but the details of
|
||||
what the problem is sometimes seem rather misleading.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
This manual page pertains to version 1.0 of <CODE>bzip2</CODE>. Compressed
|
||||
data created by this version is entirely forwards and backwards
|
||||
compatible with the previous public releases, versions 0.1pl2, 0.9.0 and
|
||||
0.9.5, but with the following exception: 0.9.0 and above can correctly
|
||||
decompress multiple concatenated compressed files. 0.1pl2 cannot do
|
||||
this; it will stop after decompressing just the first file in the
|
||||
stream.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>bzip2recover</CODE> uses 32-bit integers to represent bit positions in
|
||||
compressed files, so it cannot handle compressed files more than 512
|
||||
megabytes long. This could easily be fixed.
|
||||
|
||||
</P>
|
||||
|
||||
|
||||
|
||||
<H4><A NAME="SEC11" HREF="manual_toc.html#TOC11">AUTHOR</A></H4>
|
||||
<P>
|
||||
Julian Seward, <CODE>jseward@acm.org</CODE>.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
The ideas embodied in <CODE>bzip2</CODE> are due to (at least) the following
|
||||
people: Michael Burrows and David Wheeler (for the block sorting
|
||||
transformation), David Wheeler (again, for the Huffman coder), Peter
|
||||
Fenwick (for the structured coding model in the original <CODE>bzip</CODE>,
|
||||
and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
|
||||
(for the arithmetic coder in the original <CODE>bzip</CODE>). I am much
|
||||
indebted for their help, support and advice. See the manual in the
|
||||
source distribution for pointers to sources of documentation. Christian
|
||||
von Roques encouraged me to look for faster sorting algorithms, so as to
|
||||
speed up compression. Bela Lubkin encouraged me to improve the
|
||||
worst-case compression performance. Many people sent patches, helped
|
||||
with portability problems, lent machines, gave advice and were generally
|
||||
helpful.
|
||||
|
||||
</P>
|
||||
</BLOCKQUOTE>
|
||||
|
||||
<P><HR><P>
|
||||
<p>Go to the <A HREF="manual_1.html">first</A>, <A HREF="manual_1.html">previous</A>, <A HREF="manual_3.html">next</A>, <A HREF="manual_4.html">last</A> section, <A HREF="manual_toc.html">table of contents</A>.
|
||||
</BODY>
|
||||
</HTML>
|
File diff suppressed because it is too large
Load Diff
|
@ -1,528 +0,0 @@
|
|||
<HTML>
|
||||
<HEAD>
|
||||
<!-- This HTML file has been created by texi2html 1.54
|
||||
from manual.texi on 23 March 2000 -->
|
||||
|
||||
<TITLE>bzip2 and libbzip2 - Miscellanea</TITLE>
|
||||
<link href="manual_3.html" rel=Previous>
|
||||
<link href="manual_toc.html" rel=ToC>
|
||||
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<p>Go to the <A HREF="manual_1.html">first</A>, <A HREF="manual_3.html">previous</A>, next, last section, <A HREF="manual_toc.html">table of contents</A>.
|
||||
<P><HR><P>
|
||||
|
||||
|
||||
<H1><A NAME="SEC43" HREF="manual_toc.html#TOC43">Miscellanea</A></H1>
|
||||
|
||||
<P>
|
||||
These are just some random thoughts of mine. Your mileage may
|
||||
vary.
|
||||
|
||||
</P>
|
||||
|
||||
|
||||
<H2><A NAME="SEC44" HREF="manual_toc.html#TOC44">Limitations of the compressed file format</A></H2>
|
||||
<P>
|
||||
<CODE>bzip2-1.0</CODE>, <CODE>0.9.5</CODE> and <CODE>0.9.0</CODE>
|
||||
use exactly the same file format as the previous
|
||||
version, <CODE>bzip2-0.1</CODE>. This decision was made in the interests of
|
||||
stability. Creating yet another incompatible compressed file format
|
||||
would create further confusion and disruption for users.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Nevertheless, this is not a painless decision. Development
|
||||
work since the release of <CODE>bzip2-0.1</CODE> in August 1997
|
||||
has shown complexities in the file format which slow down
|
||||
decompression and, in retrospect, are unnecessary. These are:
|
||||
|
||||
<UL>
|
||||
<LI>The run-length encoder, which is the first of the
|
||||
|
||||
compression transformations, is entirely irrelevant.
|
||||
The original purpose was to protect the sorting algorithm
|
||||
from the very worst case input: a string of repeated
|
||||
symbols. But algorithm steps Q6a and Q6b in the original
|
||||
Burrows-Wheeler technical report (SRC-124) show how
|
||||
repeats can be handled without difficulty in block
|
||||
sorting.
|
||||
<LI>The randomisation mechanism doesn't really need to be
|
||||
|
||||
there. Udi Manber and Gene Myers published a suffix
|
||||
array construction algorithm a few years back, which
|
||||
can be employed to sort any block, no matter how
|
||||
repetitive, in O(N log N) time. Subsequent work by
|
||||
Kunihiko Sadakane has produced a derivative O(N (log N)^2)
|
||||
algorithm which usually outperforms the Manber-Myers
|
||||
algorithm.
|
||||
|
||||
I could have changed to Sadakane's algorithm, but I find
|
||||
it to be slower than <CODE>bzip2</CODE>'s existing algorithm for
|
||||
most inputs, and the randomisation mechanism protects
|
||||
adequately against bad cases. I didn't think it was
|
||||
a good tradeoff to make. Partly this is due to the fact
|
||||
that I was not flooded with email complaints about
|
||||
<CODE>bzip2-0.1</CODE>'s performance on repetitive data, so
|
||||
perhaps it isn't a problem for real inputs.
|
||||
|
||||
Probably the best long-term solution,
|
||||
and the one I have incorporated into 0.9.5 and above,
|
||||
is to use the existing sorting
|
||||
algorithm initially, and fall back to a O(N (log N)^2)
|
||||
algorithm if the standard algorithm gets into difficulties.
|
||||
<LI>The compressed file format was never designed to be
|
||||
|
||||
handled by a library, and I have had to jump though
|
||||
some hoops to produce an efficient implementation of
|
||||
decompression. It's a bit hairy. Try passing
|
||||
<CODE>decompress.c</CODE> through the C preprocessor
|
||||
and you'll see what I mean. Much of this complexity
|
||||
could have been avoided if the compressed size of
|
||||
each block of data was recorded in the data stream.
|
||||
<LI>An Adler-32 checksum, rather than a CRC32 checksum,
|
||||
|
||||
would be faster to compute.
|
||||
</UL>
|
||||
|
||||
<P>
|
||||
It would be fair to say that the <CODE>bzip2</CODE> format was frozen
|
||||
before I properly and fully understood the performance
|
||||
consequences of doing so.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Improvements which I was able to incorporate into
|
||||
0.9.0, despite using the same file format, are:
|
||||
|
||||
<UL>
|
||||
<LI>Single array implementation of the inverse BWT. This
|
||||
|
||||
significantly speeds up decompression, presumably
|
||||
because it reduces the number of cache misses.
|
||||
<LI>Faster inverse MTF transform for large MTF values. The
|
||||
|
||||
new implementation is based on the notion of sliding blocks
|
||||
of values.
|
||||
<LI><CODE>bzip2-0.9.0</CODE> now reads and writes files with <CODE>fread</CODE>
|
||||
|
||||
and <CODE>fwrite</CODE>; version 0.1 used <CODE>putc</CODE> and <CODE>getc</CODE>.
|
||||
Duh! Well, you live and learn.
|
||||
|
||||
</UL>
|
||||
|
||||
<P>
|
||||
Further ahead, it would be nice
|
||||
to be able to do random access into files. This will
|
||||
require some careful design of compressed file formats.
|
||||
|
||||
</P>
|
||||
|
||||
|
||||
|
||||
<H2><A NAME="SEC45" HREF="manual_toc.html#TOC45">Portability issues</A></H2>
|
||||
<P>
|
||||
After some consideration, I have decided not to use
|
||||
GNU <CODE>autoconf</CODE> to configure 0.9.5 or 1.0.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>autoconf</CODE>, admirable and wonderful though it is,
|
||||
mainly assists with portability problems between Unix-like
|
||||
platforms. But <CODE>bzip2</CODE> doesn't have much in the way
|
||||
of portability problems on Unix; most of the difficulties appear
|
||||
when porting to the Mac, or to Microsoft's operating systems.
|
||||
<CODE>autoconf</CODE> doesn't help in those cases, and brings in a
|
||||
whole load of new complexity.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Most people should be able to compile the library and program
|
||||
under Unix straight out-of-the-box, so to speak, especially
|
||||
if you have a version of GNU C available.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
There are a couple of <CODE>__inline__</CODE> directives in the code. GNU C
|
||||
(<CODE>gcc</CODE>) should be able to handle them. If you're not using
|
||||
GNU C, your C compiler shouldn't see them at all.
|
||||
If your compiler does, for some reason, see them and doesn't
|
||||
like them, just <CODE>#define</CODE> <CODE>__inline__</CODE> to be <CODE>/* */</CODE>. One
|
||||
easy way to do this is to compile with the flag <CODE>-D__inline__=</CODE>,
|
||||
which should be understood by most Unix compilers.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
If you still have difficulties, try compiling with the macro
|
||||
<CODE>BZ_STRICT_ANSI</CODE> defined. This should enable you to build the
|
||||
library in a strictly ANSI compliant environment. Building the program
|
||||
itself like this is dangerous and not supported, since you remove
|
||||
<CODE>bzip2</CODE>'s checks against compressing directories, symbolic links,
|
||||
devices, and other not-really-a-file entities. This could cause
|
||||
filesystem corruption!
|
||||
|
||||
</P>
|
||||
<P>
|
||||
One other thing: if you create a <CODE>bzip2</CODE> binary for public
|
||||
distribution, please try and link it statically (<CODE>gcc -s</CODE>). This
|
||||
avoids all sorts of library-version issues that others may encounter
|
||||
later on.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
If you build <CODE>bzip2</CODE> on Win32, you must set <CODE>BZ_UNIX</CODE> to 0 and
|
||||
<CODE>BZ_LCCWIN32</CODE> to 1, in the file <CODE>bzip2.c</CODE>, before compiling.
|
||||
Otherwise the resulting binary won't work correctly.
|
||||
|
||||
</P>
|
||||
|
||||
|
||||
|
||||
<H2><A NAME="SEC46" HREF="manual_toc.html#TOC46">Reporting bugs</A></H2>
|
||||
<P>
|
||||
I tried pretty hard to make sure <CODE>bzip2</CODE> is
|
||||
bug free, both by design and by testing. Hopefully
|
||||
you'll never need to read this section for real.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Nevertheless, if <CODE>bzip2</CODE> dies with a segmentation
|
||||
fault, a bus error or an internal assertion failure, it
|
||||
will ask you to email me a bug report. Experience with
|
||||
version 0.1 shows that almost all these problems can
|
||||
be traced to either compiler bugs or hardware problems.
|
||||
|
||||
<UL>
|
||||
<LI>
|
||||
|
||||
Recompile the program with no optimisation, and see if it
|
||||
works. And/or try a different compiler.
|
||||
I heard all sorts of stories about various flavours
|
||||
of GNU C (and other compilers) generating bad code for
|
||||
<CODE>bzip2</CODE>, and I've run across two such examples myself.
|
||||
|
||||
2.7.X versions of GNU C are known to generate bad code from
|
||||
time to time, at high optimisation levels.
|
||||
If you get problems, try using the flags
|
||||
<CODE>-O2</CODE> <CODE>-fomit-frame-pointer</CODE> <CODE>-fno-strength-reduce</CODE>.
|
||||
You should specifically <EM>not</EM> use <CODE>-funroll-loops</CODE>.
|
||||
|
||||
You may notice that the Makefile runs six tests as part of
|
||||
the build process. If the program passes all of these, it's
|
||||
a pretty good (but not 100%) indication that the compiler has
|
||||
done its job correctly.
|
||||
<LI>
|
||||
|
||||
If <CODE>bzip2</CODE> crashes randomly, and the crashes are not
|
||||
repeatable, you may have a flaky memory subsystem. <CODE>bzip2</CODE>
|
||||
really hammers your memory hierarchy, and if it's a bit marginal,
|
||||
you may get these problems. Ditto if your disk or I/O subsystem
|
||||
is slowly failing. Yup, this really does happen.
|
||||
|
||||
Try using a different machine of the same type, and see if
|
||||
you can repeat the problem.
|
||||
<LI>This isn't really a bug, but ... If <CODE>bzip2</CODE> tells
|
||||
|
||||
you your file is corrupted on decompression, and you
|
||||
obtained the file via FTP, there is a possibility that you
|
||||
forgot to tell FTP to do a binary mode transfer. That absolutely
|
||||
will cause the file to be non-decompressible. You'll have to transfer
|
||||
it again.
|
||||
</UL>
|
||||
|
||||
<P>
|
||||
If you've incorporated <CODE>libbzip2</CODE> into your own program
|
||||
and are getting problems, please, please, please, check that the
|
||||
parameters you are passing in calls to the library, are
|
||||
correct, and in accordance with what the documentation says
|
||||
is allowable. I have tried to make the library robust against
|
||||
such problems, but I'm sure I haven't succeeded.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Finally, if the above comments don't help, you'll have to send
|
||||
me a bug report. Now, it's just amazing how many people will
|
||||
send me a bug report saying something like
|
||||
|
||||
<PRE>
|
||||
bzip2 crashed with segmentation fault on my machine
|
||||
</PRE>
|
||||
|
||||
<P>
|
||||
and absolutely nothing else. Needless to say, a such a report
|
||||
is <EM>totally, utterly, completely and comprehensively 100% useless;
|
||||
a waste of your time, my time, and net bandwidth</EM>.
|
||||
With no details at all, there's no way I can possibly begin
|
||||
to figure out what the problem is.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
The rules of the game are: facts, facts, facts. Don't omit
|
||||
them because "oh, they won't be relevant". At the bare
|
||||
minimum:
|
||||
|
||||
<PRE>
|
||||
Machine type. Operating system version.
|
||||
Exact version of <CODE>bzip2</CODE> (do <CODE>bzip2 -V</CODE>).
|
||||
Exact version of the compiler used.
|
||||
Flags passed to the compiler.
|
||||
</PRE>
|
||||
|
||||
<P>
|
||||
However, the most important single thing that will help me is
|
||||
the file that you were trying to compress or decompress at the
|
||||
time the problem happened. Without that, my ability to do anything
|
||||
more than speculate about the cause, is limited.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Please remember that I connect to the Internet with a modem, so
|
||||
you should contact me before mailing me huge files.
|
||||
|
||||
</P>
|
||||
|
||||
|
||||
|
||||
<H2><A NAME="SEC47" HREF="manual_toc.html#TOC47">Did you get the right package?</A></H2>
|
||||
|
||||
<P>
|
||||
<CODE>bzip2</CODE> is a resource hog. It soaks up large amounts of CPU cycles
|
||||
and memory. Also, it gives very large latencies. In the worst case, you
|
||||
can feed many megabytes of uncompressed data into the library before
|
||||
getting any compressed output, so this probably rules out applications
|
||||
requiring interactive behaviour.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
These aren't faults of my implementation, I hope, but more
|
||||
an intrinsic property of the Burrows-Wheeler transform (unfortunately).
|
||||
Maybe this isn't what you want.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
If you want a compressor and/or library which is faster, uses less
|
||||
memory but gets pretty good compression, and has minimal latency,
|
||||
consider Jean-loup
|
||||
Gailly's and Mark Adler's work, <CODE>zlib-1.1.2</CODE> and
|
||||
<CODE>gzip-1.2.4</CODE>. Look for them at
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>http://www.cdrom.com/pub/infozip/zlib</CODE> and
|
||||
<CODE>http://www.gzip.org</CODE> respectively.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
For something faster and lighter still, you might try Markus F X J
|
||||
Oberhumer's <CODE>LZO</CODE> real-time compression/decompression library, at
|
||||
<BR> <CODE>http://wildsau.idv.uni-linz.ac.at/mfx/lzo.html</CODE>.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
If you want to use the <CODE>bzip2</CODE> algorithms to compress small blocks
|
||||
of data, 64k bytes or smaller, for example on an on-the-fly disk
|
||||
compressor, you'd be well advised not to use this library. Instead,
|
||||
I've made a special library tuned for that kind of use. It's part of
|
||||
<CODE>e2compr-0.40</CODE>, an on-the-fly disk compressor for the Linux
|
||||
<CODE>ext2</CODE> filesystem. Look at
|
||||
<CODE>http://www.netspace.net.au/~reiter/e2compr</CODE>.
|
||||
|
||||
</P>
|
||||
|
||||
|
||||
|
||||
<H2><A NAME="SEC48" HREF="manual_toc.html#TOC48">Testing</A></H2>
|
||||
|
||||
<P>
|
||||
A record of the tests I've done.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
First, some data sets:
|
||||
|
||||
<UL>
|
||||
<LI>B: a directory containing 6001 files, one for every length in the
|
||||
|
||||
range 0 to 6000 bytes. The files contain random lowercase
|
||||
letters. 18.7 megabytes.
|
||||
<LI>H: my home directory tree. Documents, source code, mail files,
|
||||
|
||||
compressed data. H contains B, and also a directory of
|
||||
files designed as boundary cases for the sorting; mostly very
|
||||
repetitive, nasty files. 565 megabytes.
|
||||
<LI>A: directory tree holding various applications built from source:
|
||||
|
||||
<CODE>egcs</CODE>, <CODE>gcc-2.8.1</CODE>, KDE, GTK, Octave, etc.
|
||||
2200 megabytes.
|
||||
</UL>
|
||||
|
||||
<P>
|
||||
The tests conducted are as follows. Each test means compressing
|
||||
(a copy of) each file in the data set, decompressing it and
|
||||
comparing it against the original.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
First, a bunch of tests with block sizes and internal buffer
|
||||
sizes set very small,
|
||||
to detect any problems with the
|
||||
blocking and buffering mechanisms.
|
||||
This required modifying the source code so as to try to
|
||||
break it.
|
||||
|
||||
<OL>
|
||||
<LI>Data set H, with
|
||||
|
||||
buffer size of 1 byte, and block size of 23 bytes.
|
||||
<LI>Data set B, buffer sizes 1 byte, block size 1 byte.
|
||||
|
||||
<LI>As (2) but small-mode decompression.
|
||||
|
||||
<LI>As (2) with block size 2 bytes.
|
||||
|
||||
<LI>As (2) with block size 3 bytes.
|
||||
|
||||
<LI>As (2) with block size 4 bytes.
|
||||
|
||||
<LI>As (2) with block size 5 bytes.
|
||||
|
||||
<LI>As (2) with block size 6 bytes and small-mode decompression.
|
||||
|
||||
<LI>H with buffer size of 1 byte, but normal block
|
||||
|
||||
size (up to 900000 bytes).
|
||||
</OL>
|
||||
|
||||
<P>
|
||||
Then some tests with unmodified source code.
|
||||
|
||||
<OL>
|
||||
<LI>H, all settings normal.
|
||||
|
||||
<LI>As (1), with small-mode decompress.
|
||||
|
||||
<LI>H, compress with flag <CODE>-1</CODE>.
|
||||
|
||||
<LI>H, compress with flag <CODE>-s</CODE>, decompress with flag <CODE>-s</CODE>.
|
||||
|
||||
<LI>Forwards compatibility: H, <CODE>bzip2-0.1pl2</CODE> compressing,
|
||||
|
||||
<CODE>bzip2-0.9.5</CODE> decompressing, all settings normal.
|
||||
<LI>Backwards compatibility: H, <CODE>bzip2-0.9.5</CODE> compressing,
|
||||
|
||||
<CODE>bzip2-0.1pl2</CODE> decompressing, all settings normal.
|
||||
<LI>Bigger tests: A, all settings normal.
|
||||
|
||||
<LI>As (7), using the fallback (Sadakane-like) sorting algorithm.
|
||||
|
||||
<LI>As (8), compress with flag <CODE>-1</CODE>, decompress with flag
|
||||
|
||||
<CODE>-s</CODE>.
|
||||
<LI>H, using the fallback sorting algorithm.
|
||||
|
||||
<LI>Forwards compatibility: A, <CODE>bzip2-0.1pl2</CODE> compressing,
|
||||
|
||||
<CODE>bzip2-0.9.5</CODE> decompressing, all settings normal.
|
||||
<LI>Backwards compatibility: A, <CODE>bzip2-0.9.5</CODE> compressing,
|
||||
|
||||
<CODE>bzip2-0.1pl2</CODE> decompressing, all settings normal.
|
||||
<LI>Misc test: about 400 megabytes of <CODE>.tar</CODE> files with
|
||||
|
||||
<CODE>bzip2</CODE> compiled with Checker (a memory access error
|
||||
detector, like Purify).
|
||||
<LI>Misc tests to make sure it builds and runs ok on non-Linux/x86
|
||||
|
||||
platforms.
|
||||
</OL>
|
||||
|
||||
<P>
|
||||
These tests were conducted on a 225 MHz IDT WinChip machine, running
|
||||
Linux 2.0.36. They represent nearly a week of continuous computation.
|
||||
All tests completed successfully.
|
||||
|
||||
</P>
|
||||
|
||||
|
||||
|
||||
<H2><A NAME="SEC49" HREF="manual_toc.html#TOC49">Further reading</A></H2>
|
||||
<P>
|
||||
<CODE>bzip2</CODE> is not research work, in the sense that it doesn't present
|
||||
any new ideas. Rather, it's an engineering exercise based on existing
|
||||
ideas.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Four documents describe essentially all the ideas behind <CODE>bzip2</CODE>:
|
||||
|
||||
<PRE>
|
||||
Michael Burrows and D. J. Wheeler:
|
||||
"A block-sorting lossless data compression algorithm"
|
||||
10th May 1994.
|
||||
Digital SRC Research Report 124.
|
||||
ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz
|
||||
If you have trouble finding it, try searching at the
|
||||
New Zealand Digital Library, http://www.nzdl.org.
|
||||
|
||||
Daniel S. Hirschberg and Debra A. LeLewer
|
||||
"Efficient Decoding of Prefix Codes"
|
||||
Communications of the ACM, April 1990, Vol 33, Number 4.
|
||||
You might be able to get an electronic copy of this
|
||||
from the ACM Digital Library.
|
||||
|
||||
David J. Wheeler
|
||||
Program bred3.c and accompanying document bred3.ps.
|
||||
This contains the idea behind the multi-table Huffman
|
||||
coding scheme.
|
||||
ftp://ftp.cl.cam.ac.uk/users/djw3/
|
||||
|
||||
Jon L. Bentley and Robert Sedgewick
|
||||
"Fast Algorithms for Sorting and Searching Strings"
|
||||
Available from Sedgewick's web page,
|
||||
www.cs.princeton.edu/~rs
|
||||
</PRE>
|
||||
|
||||
<P>
|
||||
The following paper gives valuable additional insights into the
|
||||
algorithm, but is not immediately the basis of any code
|
||||
used in bzip2.
|
||||
|
||||
<PRE>
|
||||
Peter Fenwick:
|
||||
Block Sorting Text Compression
|
||||
Proceedings of the 19th Australasian Computer Science Conference,
|
||||
Melbourne, Australia. Jan 31 - Feb 2, 1996.
|
||||
ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps
|
||||
</PRE>
|
||||
|
||||
<P>
|
||||
Kunihiko Sadakane's sorting algorithm, mentioned above,
|
||||
is available from:
|
||||
|
||||
<PRE>
|
||||
http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz
|
||||
</PRE>
|
||||
|
||||
<P>
|
||||
The Manber-Myers suffix array construction
|
||||
algorithm is described in a paper
|
||||
available from:
|
||||
|
||||
<PRE>
|
||||
http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps
|
||||
</PRE>
|
||||
|
||||
<P>
|
||||
Finally, the following paper documents some recent investigations
|
||||
I made into the performance of sorting algorithms:
|
||||
|
||||
<PRE>
|
||||
Julian Seward:
|
||||
On the Performance of BWT Sorting Algorithms
|
||||
Proceedings of the IEEE Data Compression Conference 2000
|
||||
Snowbird, Utah. 28-30 March 2000.
|
||||
</PRE>
|
||||
|
||||
<P><HR><P>
|
||||
<p>Go to the <A HREF="manual_1.html">first</A>, <A HREF="manual_3.html">previous</A>, next, last section, <A HREF="manual_toc.html">table of contents</A>.
|
||||
</BODY>
|
||||
</HTML>
|
|
@ -1,173 +0,0 @@
|
|||
<HTML>
|
||||
<HEAD>
|
||||
<!-- This HTML file has been created by texi2html 1.54
|
||||
from manual.texi on 23 March 2000 -->
|
||||
|
||||
<TITLE>bzip2 and libbzip2 - Table of Contents</TITLE>
|
||||
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<H1>bzip2 and libbzip2</H1>
|
||||
<H2>a program and library for data compression</H2>
|
||||
<H2>copyright (C) 1996-2000 Julian Seward</H2>
|
||||
<H2>version 1.0 of 21 March 2000</H2>
|
||||
<ADDRESS>Julian Seward</ADDRESS>
|
||||
<P>
|
||||
<P><HR><P>
|
||||
|
||||
<P>
|
||||
This program, <CODE>bzip2</CODE>,
|
||||
and associated library <CODE>libbzip2</CODE>, are
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
are met:
|
||||
|
||||
<UL>
|
||||
<LI>
|
||||
|
||||
Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
<LI>
|
||||
|
||||
The origin of this software must not be misrepresented; you must
|
||||
not claim that you wrote the original software. If you use this
|
||||
software in a product, an acknowledgment in the product
|
||||
documentation would be appreciated but is not required.
|
||||
<LI>
|
||||
|
||||
Altered source versions must be plainly marked as such, and must
|
||||
not be misrepresented as being the original software.
|
||||
<LI>
|
||||
|
||||
The name of the author may not be used to endorse or promote
|
||||
products derived from this software without specific prior written
|
||||
permission.
|
||||
</UL>
|
||||
|
||||
<P>
|
||||
THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS
|
||||
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
||||
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
|
||||
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
|
||||
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
||||
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
|
||||
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Julian Seward, Cambridge, UK.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>jseward@acm.org</CODE>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>http://sourceware.cygnus.com/bzip2</CODE>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>http://www.cacheprof.org</CODE>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>http://www.muraroa.demon.co.uk</CODE>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>bzip2</CODE>/<CODE>libbzip2</CODE> version 1.0 of 21 March 2000.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
PATENTS: To the best of my knowledge, <CODE>bzip2</CODE> does not use any patented
|
||||
algorithms. However, I do not have the resources available to carry out
|
||||
a full patent search. Therefore I cannot give any guarantee of the
|
||||
above statement.
|
||||
|
||||
</P>
|
||||
|
||||
<UL>
|
||||
<LI><A NAME="TOC1" HREF="manual_1.html#SEC1">Introduction</A>
|
||||
<LI><A NAME="TOC2" HREF="manual_2.html#SEC2">How to use <CODE>bzip2</CODE></A>
|
||||
<UL>
|
||||
<UL>
|
||||
<UL>
|
||||
<LI><A NAME="TOC3" HREF="manual_2.html#SEC3">NAME</A>
|
||||
<LI><A NAME="TOC4" HREF="manual_2.html#SEC4">SYNOPSIS</A>
|
||||
<LI><A NAME="TOC5" HREF="manual_2.html#SEC5">DESCRIPTION</A>
|
||||
<LI><A NAME="TOC6" HREF="manual_2.html#SEC6">OPTIONS</A>
|
||||
<LI><A NAME="TOC7" HREF="manual_2.html#SEC7">MEMORY MANAGEMENT</A>
|
||||
<LI><A NAME="TOC8" HREF="manual_2.html#SEC8">RECOVERING DATA FROM DAMAGED FILES</A>
|
||||
<LI><A NAME="TOC9" HREF="manual_2.html#SEC9">PERFORMANCE NOTES</A>
|
||||
<LI><A NAME="TOC10" HREF="manual_2.html#SEC10">CAVEATS</A>
|
||||
<LI><A NAME="TOC11" HREF="manual_2.html#SEC11">AUTHOR</A>
|
||||
</UL>
|
||||
</UL>
|
||||
</UL>
|
||||
<LI><A NAME="TOC12" HREF="manual_3.html#SEC12">Programming with <CODE>libbzip2</CODE></A>
|
||||
<UL>
|
||||
<LI><A NAME="TOC13" HREF="manual_3.html#SEC13">Top-level structure</A>
|
||||
<UL>
|
||||
<LI><A NAME="TOC14" HREF="manual_3.html#SEC14">Low-level summary</A>
|
||||
<LI><A NAME="TOC15" HREF="manual_3.html#SEC15">High-level summary</A>
|
||||
<LI><A NAME="TOC16" HREF="manual_3.html#SEC16">Utility functions summary</A>
|
||||
</UL>
|
||||
<LI><A NAME="TOC17" HREF="manual_3.html#SEC17">Error handling</A>
|
||||
<LI><A NAME="TOC18" HREF="manual_3.html#SEC18">Low-level interface</A>
|
||||
<UL>
|
||||
<LI><A NAME="TOC19" HREF="manual_3.html#SEC19"><CODE>BZ2_bzCompressInit</CODE></A>
|
||||
<LI><A NAME="TOC20" HREF="manual_3.html#SEC20"><CODE>BZ2_bzCompress</CODE></A>
|
||||
<LI><A NAME="TOC21" HREF="manual_3.html#SEC21"><CODE>BZ2_bzCompressEnd</CODE></A>
|
||||
<LI><A NAME="TOC22" HREF="manual_3.html#SEC22"><CODE>BZ2_bzDecompressInit</CODE></A>
|
||||
<LI><A NAME="TOC23" HREF="manual_3.html#SEC23"><CODE>BZ2_bzDecompress</CODE></A>
|
||||
<LI><A NAME="TOC24" HREF="manual_3.html#SEC24"><CODE>BZ2_bzDecompressEnd</CODE></A>
|
||||
</UL>
|
||||
<LI><A NAME="TOC25" HREF="manual_3.html#SEC25">High-level interface</A>
|
||||
<UL>
|
||||
<LI><A NAME="TOC26" HREF="manual_3.html#SEC26"><CODE>BZ2_bzReadOpen</CODE></A>
|
||||
<LI><A NAME="TOC27" HREF="manual_3.html#SEC27"><CODE>BZ2_bzRead</CODE></A>
|
||||
<LI><A NAME="TOC28" HREF="manual_3.html#SEC28"><CODE>BZ2_bzReadGetUnused</CODE></A>
|
||||
<LI><A NAME="TOC29" HREF="manual_3.html#SEC29"><CODE>BZ2_bzReadClose</CODE></A>
|
||||
<LI><A NAME="TOC30" HREF="manual_3.html#SEC30"><CODE>BZ2_bzWriteOpen</CODE></A>
|
||||
<LI><A NAME="TOC31" HREF="manual_3.html#SEC31"><CODE>BZ2_bzWrite</CODE></A>
|
||||
<LI><A NAME="TOC32" HREF="manual_3.html#SEC32"><CODE>BZ2_bzWriteClose</CODE></A>
|
||||
<LI><A NAME="TOC33" HREF="manual_3.html#SEC33">Handling embedded compressed data streams</A>
|
||||
<LI><A NAME="TOC34" HREF="manual_3.html#SEC34">Standard file-reading/writing code</A>
|
||||
</UL>
|
||||
<LI><A NAME="TOC35" HREF="manual_3.html#SEC35">Utility functions</A>
|
||||
<UL>
|
||||
<LI><A NAME="TOC36" HREF="manual_3.html#SEC36"><CODE>BZ2_bzBuffToBuffCompress</CODE></A>
|
||||
<LI><A NAME="TOC37" HREF="manual_3.html#SEC37"><CODE>BZ2_bzBuffToBuffDecompress</CODE></A>
|
||||
</UL>
|
||||
<LI><A NAME="TOC38" HREF="manual_3.html#SEC38"><CODE>zlib</CODE> compatibility functions</A>
|
||||
<LI><A NAME="TOC39" HREF="manual_3.html#SEC39">Using the library in a <CODE>stdio</CODE>-free environment</A>
|
||||
<UL>
|
||||
<LI><A NAME="TOC40" HREF="manual_3.html#SEC40">Getting rid of <CODE>stdio</CODE></A>
|
||||
<LI><A NAME="TOC41" HREF="manual_3.html#SEC41">Critical error handling</A>
|
||||
</UL>
|
||||
<LI><A NAME="TOC42" HREF="manual_3.html#SEC42">Making a Windows DLL</A>
|
||||
</UL>
|
||||
<LI><A NAME="TOC43" HREF="manual_4.html#SEC43">Miscellanea</A>
|
||||
<UL>
|
||||
<LI><A NAME="TOC44" HREF="manual_4.html#SEC44">Limitations of the compressed file format</A>
|
||||
<LI><A NAME="TOC45" HREF="manual_4.html#SEC45">Portability issues</A>
|
||||
<LI><A NAME="TOC46" HREF="manual_4.html#SEC46">Reporting bugs</A>
|
||||
<LI><A NAME="TOC47" HREF="manual_4.html#SEC47">Did you get the right package?</A>
|
||||
<LI><A NAME="TOC48" HREF="manual_4.html#SEC48">Testing</A>
|
||||
<LI><A NAME="TOC49" HREF="manual_4.html#SEC49">Further reading</A>
|
||||
</UL>
|
||||
</UL>
|
||||
<P><HR><P>
|
||||
This document was generated on 23 March 2000 using the
|
||||
<A HREF="http://wwwcn.cern.ch/dci/texi2html/">texi2html</A>
|
||||
translator version 1.51a.</P>
|
||||
</BODY>
|
||||
</HTML>
|
|
@ -1,124 +0,0 @@
|
|||
|
||||
/*-------------------------------------------------------------*/
|
||||
/*--- Table for randomising repetitive blocks ---*/
|
||||
/*--- randtable.c ---*/
|
||||
/*-------------------------------------------------------------*/
|
||||
|
||||
/*--
|
||||
This file is a part of bzip2 and/or libbzip2, a program and
|
||||
library for lossless, block-sorting data compression.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
are met:
|
||||
|
||||
1. Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
|
||||
2. The origin of this software must not be misrepresented; you must
|
||||
not claim that you wrote the original software. If you use this
|
||||
software in a product, an acknowledgment in the product
|
||||
documentation would be appreciated but is not required.
|
||||
|
||||
3. Altered source versions must be plainly marked as such, and must
|
||||
not be misrepresented as being the original software.
|
||||
|
||||
4. The name of the author may not be used to endorse or promote
|
||||
products derived from this software without specific prior written
|
||||
permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
|
||||
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
||||
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
|
||||
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
|
||||
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
||||
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
|
||||
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
Julian Seward, Cambridge, UK.
|
||||
jseward@acm.org
|
||||
bzip2/libbzip2 version 1.0 of 21 March 2000
|
||||
|
||||
This program is based on (at least) the work of:
|
||||
Mike Burrows
|
||||
David Wheeler
|
||||
Peter Fenwick
|
||||
Alistair Moffat
|
||||
Radford Neal
|
||||
Ian H. Witten
|
||||
Robert Sedgewick
|
||||
Jon L. Bentley
|
||||
|
||||
For more information on these sources, see the manual.
|
||||
--*/
|
||||
|
||||
|
||||
#include "bzlib_private.h"
|
||||
|
||||
|
||||
/*---------------------------------------------*/
|
||||
Int32 BZ2_rNums[512] = {
|
||||
619, 720, 127, 481, 931, 816, 813, 233, 566, 247,
|
||||
985, 724, 205, 454, 863, 491, 741, 242, 949, 214,
|
||||
733, 859, 335, 708, 621, 574, 73, 654, 730, 472,
|
||||
419, 436, 278, 496, 867, 210, 399, 680, 480, 51,
|
||||
878, 465, 811, 169, 869, 675, 611, 697, 867, 561,
|
||||
862, 687, 507, 283, 482, 129, 807, 591, 733, 623,
|
||||
150, 238, 59, 379, 684, 877, 625, 169, 643, 105,
|
||||
170, 607, 520, 932, 727, 476, 693, 425, 174, 647,
|
||||
73, 122, 335, 530, 442, 853, 695, 249, 445, 515,
|
||||
909, 545, 703, 919, 874, 474, 882, 500, 594, 612,
|
||||
641, 801, 220, 162, 819, 984, 589, 513, 495, 799,
|
||||
161, 604, 958, 533, 221, 400, 386, 867, 600, 782,
|
||||
382, 596, 414, 171, 516, 375, 682, 485, 911, 276,
|
||||
98, 553, 163, 354, 666, 933, 424, 341, 533, 870,
|
||||
227, 730, 475, 186, 263, 647, 537, 686, 600, 224,
|
||||
469, 68, 770, 919, 190, 373, 294, 822, 808, 206,
|
||||
184, 943, 795, 384, 383, 461, 404, 758, 839, 887,
|
||||
715, 67, 618, 276, 204, 918, 873, 777, 604, 560,
|
||||
951, 160, 578, 722, 79, 804, 96, 409, 713, 940,
|
||||
652, 934, 970, 447, 318, 353, 859, 672, 112, 785,
|
||||
645, 863, 803, 350, 139, 93, 354, 99, 820, 908,
|
||||
609, 772, 154, 274, 580, 184, 79, 626, 630, 742,
|
||||
653, 282, 762, 623, 680, 81, 927, 626, 789, 125,
|
||||
411, 521, 938, 300, 821, 78, 343, 175, 128, 250,
|
||||
170, 774, 972, 275, 999, 639, 495, 78, 352, 126,
|
||||
857, 956, 358, 619, 580, 124, 737, 594, 701, 612,
|
||||
669, 112, 134, 694, 363, 992, 809, 743, 168, 974,
|
||||
944, 375, 748, 52, 600, 747, 642, 182, 862, 81,
|
||||
344, 805, 988, 739, 511, 655, 814, 334, 249, 515,
|
||||
897, 955, 664, 981, 649, 113, 974, 459, 893, 228,
|
||||
433, 837, 553, 268, 926, 240, 102, 654, 459, 51,
|
||||
686, 754, 806, 760, 493, 403, 415, 394, 687, 700,
|
||||
946, 670, 656, 610, 738, 392, 760, 799, 887, 653,
|
||||
978, 321, 576, 617, 626, 502, 894, 679, 243, 440,
|
||||
680, 879, 194, 572, 640, 724, 926, 56, 204, 700,
|
||||
707, 151, 457, 449, 797, 195, 791, 558, 945, 679,
|
||||
297, 59, 87, 824, 713, 663, 412, 693, 342, 606,
|
||||
134, 108, 571, 364, 631, 212, 174, 643, 304, 329,
|
||||
343, 97, 430, 751, 497, 314, 983, 374, 822, 928,
|
||||
140, 206, 73, 263, 980, 736, 876, 478, 430, 305,
|
||||
170, 514, 364, 692, 829, 82, 855, 953, 676, 246,
|
||||
369, 970, 294, 750, 807, 827, 150, 790, 288, 923,
|
||||
804, 378, 215, 828, 592, 281, 565, 555, 710, 82,
|
||||
896, 831, 547, 261, 524, 462, 293, 465, 502, 56,
|
||||
661, 821, 976, 991, 658, 869, 905, 758, 745, 193,
|
||||
768, 550, 608, 933, 378, 286, 215, 979, 792, 961,
|
||||
61, 688, 793, 644, 986, 403, 106, 366, 905, 644,
|
||||
372, 567, 466, 434, 645, 210, 389, 550, 919, 135,
|
||||
780, 773, 635, 389, 707, 100, 626, 958, 165, 504,
|
||||
920, 176, 193, 713, 857, 265, 203, 50, 668, 108,
|
||||
645, 990, 626, 197, 510, 357, 358, 850, 858, 364,
|
||||
936, 638
|
||||
};
|
||||
|
||||
|
||||
/*-------------------------------------------------------------*/
|
||||
/*--- end randtable.c ---*/
|
||||
/*-------------------------------------------------------------*/
|
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
File diff suppressed because it is too large
Load Diff
|
@ -1,39 +0,0 @@
|
|||
|
||||
/* spew out a thoroughly gigantic file designed so that bzip2
|
||||
can compress it reasonably rapidly. This is to help test
|
||||
support for large files (> 2GB) in a reasonable amount of time.
|
||||
I suggest you use the undocumented --exponential option to
|
||||
bzip2 when compressing the resulting file; this saves a bit of
|
||||
time. Note: *don't* bother with --exponential when compressing
|
||||
Real Files; it'll just waste a lot of CPU time :-)
|
||||
(but is otherwise harmless).
|
||||
*/
|
||||
|
||||
#define _FILE_OFFSET_BITS 64
|
||||
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
|
||||
/* The number of megabytes of junk to spew out (roughly) */
|
||||
#define MEGABYTES 5000
|
||||
|
||||
#define N_BUF 1000000
|
||||
char buf[N_BUF];
|
||||
|
||||
int main ( int argc, char** argv )
|
||||
{
|
||||
int ii, kk, p;
|
||||
srandom(1);
|
||||
setbuffer ( stdout, buf, N_BUF );
|
||||
for (kk = 0; kk < MEGABYTES * 515; kk+=3) {
|
||||
p = 25+random()%50;
|
||||
for (ii = 0; ii < p; ii++)
|
||||
printf ( "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" );
|
||||
for (ii = 0; ii < p-1; ii++)
|
||||
printf ( "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb" );
|
||||
for (ii = 0; ii < p+1; ii++)
|
||||
printf ( "ccccccccccccccccccccccccccccccccccccc" );
|
||||
}
|
||||
fflush(stdout);
|
||||
return 0;
|
||||
}
|
|
@ -1,126 +0,0 @@
|
|||
|
||||
/* A test program written to test robustness to decompression of
|
||||
corrupted data. Usage is
|
||||
unzcrash filename
|
||||
and the program will read the specified file, compress it (in memory),
|
||||
and then repeatedly decompress it, each time with a different bit of
|
||||
the compressed data inverted, so as to test all possible one-bit errors.
|
||||
This should not cause any invalid memory accesses. If it does,
|
||||
I want to know about it!
|
||||
|
||||
p.s. As you can see from the above description, the process is
|
||||
incredibly slow. A file of size eg 5KB will cause it to run for
|
||||
many hours.
|
||||
*/
|
||||
|
||||
#include <stdio.h>
|
||||
#include <assert.h>
|
||||
#include "bzlib.h"
|
||||
|
||||
#define M_BLOCK 1000000
|
||||
|
||||
typedef unsigned char uchar;
|
||||
|
||||
#define M_BLOCK_OUT (M_BLOCK + 1000000)
|
||||
uchar inbuf[M_BLOCK];
|
||||
uchar outbuf[M_BLOCK_OUT];
|
||||
uchar zbuf[M_BLOCK + 600 + (M_BLOCK / 100)];
|
||||
|
||||
int nIn, nOut, nZ;
|
||||
|
||||
static char *bzerrorstrings[] = {
|
||||
"OK"
|
||||
,"SEQUENCE_ERROR"
|
||||
,"PARAM_ERROR"
|
||||
,"MEM_ERROR"
|
||||
,"DATA_ERROR"
|
||||
,"DATA_ERROR_MAGIC"
|
||||
,"IO_ERROR"
|
||||
,"UNEXPECTED_EOF"
|
||||
,"OUTBUFF_FULL"
|
||||
,"???" /* for future */
|
||||
,"???" /* for future */
|
||||
,"???" /* for future */
|
||||
,"???" /* for future */
|
||||
,"???" /* for future */
|
||||
,"???" /* for future */
|
||||
};
|
||||
|
||||
void flip_bit ( int bit )
|
||||
{
|
||||
int byteno = bit / 8;
|
||||
int bitno = bit % 8;
|
||||
uchar mask = 1 << bitno;
|
||||
//fprintf ( stderr, "(byte %d bit %d mask %d)",
|
||||
// byteno, bitno, (int)mask );
|
||||
zbuf[byteno] ^= mask;
|
||||
}
|
||||
|
||||
int main ( int argc, char** argv )
|
||||
{
|
||||
FILE* f;
|
||||
int r;
|
||||
int bit;
|
||||
int i;
|
||||
|
||||
if (argc != 2) {
|
||||
fprintf ( stderr, "usage: unzcrash filename\n" );
|
||||
return 1;
|
||||
}
|
||||
|
||||
f = fopen ( argv[1], "r" );
|
||||
if (!f) {
|
||||
fprintf ( stderr, "unzcrash: can't open %s\n", argv[1] );
|
||||
return 1;
|
||||
}
|
||||
|
||||
nIn = fread ( inbuf, 1, M_BLOCK, f );
|
||||
fprintf ( stderr, "%d bytes read\n", nIn );
|
||||
|
||||
nZ = M_BLOCK;
|
||||
r = BZ2_bzBuffToBuffCompress (
|
||||
zbuf, &nZ, inbuf, nIn, 9, 0, 30 );
|
||||
|
||||
assert (r == BZ_OK);
|
||||
fprintf ( stderr, "%d after compression\n", nZ );
|
||||
|
||||
for (bit = 0; bit < nZ*8; bit++) {
|
||||
fprintf ( stderr, "bit %d ", bit );
|
||||
flip_bit ( bit );
|
||||
nOut = M_BLOCK_OUT;
|
||||
r = BZ2_bzBuffToBuffDecompress (
|
||||
outbuf, &nOut, zbuf, nZ, 0, 0 );
|
||||
fprintf ( stderr, " %d %s ", r, bzerrorstrings[-r] );
|
||||
|
||||
if (r != BZ_OK) {
|
||||
fprintf ( stderr, "\n" );
|
||||
} else {
|
||||
if (nOut != nIn) {
|
||||
fprintf(stderr, "nIn/nOut mismatch %d %d\n", nIn, nOut );
|
||||
return 1;
|
||||
} else {
|
||||
for (i = 0; i < nOut; i++)
|
||||
if (inbuf[i] != outbuf[i]) {
|
||||
fprintf(stderr, "mismatch at %d\n", i );
|
||||
return 1;
|
||||
}
|
||||
if (i == nOut) fprintf(stderr, "really ok!\n" );
|
||||
}
|
||||
}
|
||||
|
||||
flip_bit ( bit );
|
||||
}
|
||||
|
||||
#if 0
|
||||
assert (nOut == nIn);
|
||||
for (i = 0; i < nOut; i++) {
|
||||
if (inbuf[i] != outbuf[i]) {
|
||||
fprintf ( stderr, "difference at %d !\n", i );
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
fprintf ( stderr, "all ok\n" );
|
||||
return 0;
|
||||
}
|
|
@ -1,5 +0,0 @@
|
|||
|
||||
If compilation produces errors, or a large number of warnings,
|
||||
please read README.COMPILATION.PROBLEMS -- you might be able to
|
||||
adjust the flags in this Makefile to improve matters.
|
||||
|
|
@ -1,4 +0,0 @@
|
|||
|
||||
Doing 6 tests (3 compress, 3 uncompress) ...
|
||||
If there's a problem, things might stop at this point.
|
||||
|
|
@ -1,5 +0,0 @@
|
|||
|
||||
Checking test results. If any of the four "cmp"s which follow
|
||||
report any differences, something is wrong. If you can't easily
|
||||
figure out what, please let me know (jseward@acm.org).
|
||||
|
|
@ -1,23 +0,0 @@
|
|||
|
||||
If you got this far and the "cmp"s didn't complain, it looks
|
||||
like you're in business.
|
||||
|
||||
To install in /usr/bin, /usr/lib, /usr/man and /usr/include, type
|
||||
make install
|
||||
To install somewhere else, eg, /xxx/yyy/{bin,lib,man,include}, type
|
||||
make install PREFIX=/xxx/yyy
|
||||
If you are (justifiably) paranoid and want to see what 'make install'
|
||||
is going to do, you can first do
|
||||
make -n install or
|
||||
make -n install PREFIX=/xxx/yyy respectively.
|
||||
The -n instructs make to show the commands it would execute, but
|
||||
not actually execute them.
|
||||
|
||||
Instructions for use are in the preformatted manual page, in the file
|
||||
bzip2.txt. For more detailed documentation, read the full manual.
|
||||
It is available in Postscript form (manual.ps) and HTML form
|
||||
(manual_toc.html).
|
||||
|
||||
You can also do "bzip2 --help" to see some helpful information.
|
||||
"bzip2 -L" displays the software license.
|
||||
|
Loading…
Reference in New Issue