This chapter explains the details of the compressed file format used by ebzip.
B.1 Overview about Compression File Format
B.2 Data Part
B.3 Index Part
B.4 Header Part

B.1 Overview about Compression File Format

The compressed file format has the following features.
A compressed file consists of a header part, an index part, and a data part, placed in that order.
+--------+-------------+-----------------------------+
| header |    index    |            data             |
+--------+-------------+-----------------------------+
                                                   EOF

B.2 Data Part

An original file is compressed in the following steps.
First, ebzip splits the original file into slices.
Every slice except the last one (slice N in the following
picture) has the same size.
+---------------+---------------+--     --+----------+
|    slice 1    |    slice 2    |   ...   | slice N  |
+---------------+---------------+--     --+----------+
                                                   EOF
The slice size is determined by the compression level (see section 4.3 Compression Level):
compression level | slice size |
0 | 2048 bytes |
1 | 4096 bytes |
2 | 8192 bytes |
3 | 16384 bytes |
4 | 32768 bytes |
5 | 65536 bytes |
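The table above doubles the slice size with each level, starting from 2048 bytes at level 0. A minimal Python sketch of that relationship follows; the function name slice_size is chosen here for illustration and is not part of ebzip.

```python
def slice_size(level: int) -> int:
    """Return the slice size in bytes for a compression level from 0 to 5.

    Hypothetical helper: it only reproduces the table, 2048 bytes at
    level 0 and twice as much for each further level.
    """
    if not 0 <= level <= 5:
        raise ValueError("compression level must be between 0 and 5")
    return 2048 << level


if __name__ == "__main__":
    for level in range(6):
        print(level, slice_size(level))
```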
Second, if the last slice is shorter than the slice size, ebzip
extends it to the slice size by padding it with 0x00 bytes.
                                                      pad
+---------------+---------------+--     --+---------+-----+
|    slice 1    |    slice 2    |   ...   |    slice N    |
+---------------+---------------+--     --+---------+-----+
                                                        EOF
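The slicing and padding steps can be sketched in Python as follows. This is an illustration of the two steps described above, not code taken from ebzip; the function name split_into_slices is an assumption.

```python
def split_into_slices(data: bytes, size: int) -> list[bytes]:
    """Split data into slices of `size` bytes, padding the last with 0x00.

    Hypothetical helper illustrating the first two steps. An empty input
    yields no slices at all.
    """
    slices = [data[i:i + size] for i in range(0, len(data), size)]
    if slices and len(slices[-1]) < size:
        # Extend a short last slice to the full slice size with zero bytes.
        slices[-1] = slices[-1].ljust(size, b"\x00")
    return slices


if __name__ == "__main__":
    print(split_into_slices(b"abcde", 4))
```

To recover the original data, a reader has to know the original file size, because the padding cannot be told apart from trailing 0x00 bytes in the data itself.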
Finally, ebzip compresses each slice into the DEFLATE compressed
data format described in RFC 1951.
Each slice is compressed independently of the other slices.
The compressed slices usually differ in size.
If the number of bits in a compressed slice is not a multiple of 8,
1 to 7 padding bits are appended to the tail of the compressed slice
so that its length becomes a multiple of 8 bits.
Thus, each compressed slice starts at a byte boundary.
The contents of the padding bits are undefined, but they are never used.
+------------+----------+--     --+--------------+
| compressed |compressed|   ...   |  compressed  |
|  slice 1   | slice 2  |   ...   |   slice N    |
+------------+----------+--     --+--------------+
The sequence of compressed slices forms the data part of the compressed file.
The padding in the last slice is compressed as part of that slice.
When ebunzip recovers the last slice, it uncompresses the slice
and then removes the padding.
When a compressed slice is larger than or equal to the slice size,
ebzip discards the compressed data of that slice and records the
original, uncompressed data of the slice in its place.
If the original file is empty, the data part does not appear in the compressed file.
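The per-slice rule above can be sketched with Python's zlib module. A negative wbits value asks zlib for a raw DEFLATE stream without the zlib wrapper; that this matches ebzip's output byte for byte is an assumption, and the function names are chosen here for illustration only.

```python
import zlib


def compress_slice(padded_slice: bytes) -> bytes:
    """Compress one slice as raw DEFLATE, or keep it as is if that is no smaller."""
    compressor = zlib.compressobj(
        zlib.Z_BEST_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS
    )
    compressed = compressor.compress(padded_slice) + compressor.flush()
    if len(compressed) >= len(padded_slice):
        # Compression did not help: store the original slice data instead.
        return padded_slice
    return compressed


def decompress_slice(stored: bytes, slice_size: int) -> bytes:
    """Recover a slice; data of exactly slice_size bytes was stored uncompressed."""
    if len(stored) == slice_size:
        return stored
    return zlib.decompress(stored, -zlib.MAX_WBITS)
```

zlib returns whole bytes, so the padding of 1 to 7 bits at the tail of the stream is already included in its output.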

B.3 Index Part

During compression, ebzip records an index for each compressed slice.
An index is the distance, in bytes, from the beginning of the compressed
file to the beginning of a compressed slice.
    +---------+---------+--     --+---------+---------+
    | index 1 | index 2 | ....... | index N |index END|
    +---------+---------+--     --+---------+---------+
         |         |                   |         |
+--------+         |                   +------+  +-----------+
V                  V                          V              V
+------------------+------------------+--   --+--------------+
|   compressed     |   compressed     |  ...  |  compressed  |
|     slice 1      |     slice 2      |  ...  |   slice N    |
+------------------+------------------+--   --+--------------+
Each index occupies 2 to 4 bytes, depending on the size of the original file:
original file size | index size |
0 ... 65535 bytes | 2 bytes |
65536 ... 16777215 bytes | 3 bytes |
16777216 ... 4294967295 bytes | 4 bytes |
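The table can be expressed as a small lookup. The sketch below is illustrative and the name index_width is an assumption; a file of exactly 65535 bytes is taken to use 2-byte indexes, and sizes beyond the last row are rejected because the table does not cover them.

```python
def index_width(original_size: int) -> int:
    """Return the number of bytes per index for a given original file size."""
    if not 0 <= original_size <= 0xFFFFFFFF:
        raise ValueError("original file size is outside the range of the table")
    if original_size <= 0xFFFF:
        return 2
    if original_size <= 0xFFFFFF:
        return 3
    return 4


if __name__ == "__main__":
    for size in (0, 65535, 65536, 16777215, 16777216, 4294967295):
        print(size, index_width(size))
```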
All multi-byte numbers in the indexes are stored with the most significant byte first. For example, 0x1234 is stored as follows: the first byte holds 0x12 and the second byte holds 0x34.
+---------+---------+
|0001 0010|0011 0100|
+---------+---------+
  (0x12)    (0x34)
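In Python, this byte order corresponds to the "big" argument of int.to_bytes and int.from_bytes. The helper names below are chosen for illustration.

```python
def encode_index(offset: int, width: int) -> bytes:
    """Encode an index as `width` bytes, most significant byte first."""
    return offset.to_bytes(width, "big")


def decode_index(raw: bytes) -> int:
    """Decode an index stored most significant byte first."""
    return int.from_bytes(raw, "big")


if __name__ == "__main__":
    print(encode_index(0x1234, 2).hex())
    print(hex(decode_index(b"\x12\x34")))
```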
The index part begins with the index for compressed slice 1, followed by the index for compressed slice 2, and so on. The index for compressed slice N is followed by the END index, which points to the byte immediately after the end of compressed slice N. The END index therefore also represents the size of the compressed file.
+---------+---------+--     --+---------+---------+
| index 1 | index 2 | ....... | index N |index END|
+---------+---------+--     --+---------+---------+
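The size of a compressed slice is the difference between its index and the index that follows it. If that size is equal to the slice size, the data of the slice is actually stored uncompressed.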
If an original file is empty, the index part has only one index. The index represents the size of the compressed file.
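Putting the pieces of this section together, the index part can be built from the stored slices and read back to find where each slice starts and how long it is. The sketch below assumes the 22-byte header described in the Header Part section and uses hypothetical function names; it is not ebzip's own code.

```python
HEADER_SIZE = 22  # size of the header part in bytes


def build_index_part(stored_slices: list[bytes], width: int) -> bytes:
    """Build the index part: one index per stored slice plus the END index."""
    # The first slice begins right after the header and the whole index part.
    offset = HEADER_SIZE + (len(stored_slices) + 1) * width
    out = bytearray()
    for stored in stored_slices:
        # to_bytes raises OverflowError if the offset does not fit in `width` bytes.
        out += offset.to_bytes(width, "big")
        offset += len(stored)
    out += offset.to_bytes(width, "big")  # END index: size of the compressed file
    return bytes(out)


def slice_extents(index_part: bytes, width: int) -> list[tuple[int, int]]:
    """Return (start offset, stored size) for each slice listed in the index part."""
    offsets = [
        int.from_bytes(index_part[i:i + width], "big")
        for i in range(0, len(index_part), width)
    ]
    return [(start, end - start) for start, end in zip(offsets, offsets[1:])]
```

For an empty original file the list of slices is empty, so the result is a single index whose value is the header size plus one index width.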

B.4 Header Part

A header part occupies 22 bytes. It consists of the following fields.
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|   magic ID   |*1| *2  |    file size    | Adler-32  |   mtime   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21

*1: zip mode and compression level
*2: reserved area
magic ID (5 bytes)
zip mode (4 bits of the most significant bit side)
compression level (4 bits of the least-significant-bit side)
reserved area (2 bytes)
file size (6 bytes)
Adler-32 (4 bytes)
mtime (4 bytes)
Both zip mode and compression level are packed into the byte at
offset 5 of the header.
Zip mode occupies the upper 4 bits, which include the most significant bit,
and compression level occupies the lower 4 bits, which include the
least significant bit.
For example, if zip mode is 1 and compression level is 2, that byte
of the header is 0x12.
 MSB                         LSB
+---+---+---+---+---+---+---+---+
| 0   0   0   1   0   0   1   0 | = 0x12
+---+---+---+---+---+---+---+---+
   (zip mode)   |  (compression level)
All multi-byte numbers in the header are stored with the most significant byte first.
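The header layout can be read with plain byte slicing, as in the following sketch. The class and function names are assumptions made for illustration, and the fields are returned without interpretation, since their meaning beyond the layout is not described here.

```python
from typing import NamedTuple


class Header(NamedTuple):
    magic: bytes      # bytes 0-4
    zip_mode: int     # upper 4 bits of byte 5
    level: int        # lower 4 bits of byte 5
    file_size: int    # bytes 8-13
    adler32: int      # bytes 14-17
    mtime: int        # bytes 18-21


def parse_header(raw: bytes) -> Header:
    """Split a 22-byte header into its fields (bytes 6-7 are reserved)."""
    if len(raw) < 22:
        raise ValueError("header part must be 22 bytes")
    mode_and_level = raw[5]
    return Header(
        magic=raw[0:5],
        zip_mode=mode_and_level >> 4,
        level=mode_and_level & 0x0F,
        file_size=int.from_bytes(raw[8:14], "big"),
        adler32=int.from_bytes(raw[14:18], "big"),
        mtime=int.from_bytes(raw[18:22], "big"),
    )
```

With a byte of 0x12 at offset 5, parse_header reports zip mode 1 and compression level 2, matching the example above.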