Bzip2
== File format ==
No formal specification for bzip2 exists, although an informal specification has been reverse engineered from the reference implementation.<ref>{{cite web | url=https://github.com/dsnet/compress/blob/master/doc/bzip2-format.pdf | title=BZIP2 Format Specification | website=[[GitHub]] | date=17 March 2022 }}</ref>

As an overview, a <code>.bz2</code> stream consists of a 4-byte header, followed by zero or more compressed blocks, immediately followed by an end-of-stream marker containing a 32-bit CRC for the whole plaintext stream. The compressed blocks are bit-aligned, and no padding is inserted between them.
<!-- /* Paul Sladen, 2007-01-11 */ --><pre>
.magic:16                       = 'BZ' signature/magic number
.version:8                      = 'h' for Bzip2 ('H'uffman coding), '0' for Bzip1 (deprecated)
.hundred_k_blocksize:8          = '1'..'9' block-size 100 kB–900 kB (uncompressed)

.compressed_magic:48            = 0x314159265359 (BCD (pi))
.crc:32                         = checksum for this block
.randomised:1                   = 0=>normal, 1=>randomised (deprecated)
.origPtr:24                     = starting pointer into BWT for after untransform
.huffman_used_map:16            = bitmap, of ranges of 16 bytes, present/not present
.huffman_used_bitmaps:0..256    = bitmap, of symbols used, present/not present (multiples of 16)
.huffman_groups:3               = 2..6 number of different Huffman tables in use
.selectors_used:15              = number of times that the Huffman tables are swapped (each 50 symbols)
*.selector_list:1..6            = zero-terminated bit runs (0..62) of MTF'ed Huffman table (*selectors_used)
.start_huffman_length:5         = 0..20 starting bit length for Huffman deltas
*.delta_bit_length:1..40        = 0=>next symbol; 1=>alter length
                                                { 1=>decrement length;  0=>increment length } (*(symbols+2)*groups)
.contents:2..∞                  = Huffman encoded data stream until end of block (max. 7372800 bit)

.eos_magic:48                   = 0x177245385090 (BCD sqrt(pi))
.crc:32                         = checksum for whole stream
.padding:0..7                   = align to whole byte
</pre>

Because of the first-stage RLE compression (see above), the maximum length of plaintext that a single 900 kB bzip2 block can contain is around 46 MB (45,899,236 bytes). This can occur if the whole plaintext consists entirely of repeated values (the resulting <code>.bz2</code> file in this case is 46 bytes long). An even smaller file of 40 bytes can be achieved by using an input consisting entirely of bytes with the value 251, giving an apparent compression ratio of 1147480.9:1.

A compressed block in bzip2 can be decompressed without having to process earlier blocks. This means that bzip2 files can be decompressed in parallel, making it a good format for use in [[big data]] applications with cluster computing frameworks like [[Hadoop]] and [[Apache Spark]].<ref>{{cite web | url=https://issues.apache.org/jira/browse/HADOOP-4012 | title=[HADOOP-4012] Providing splitting support for bzip2 compressed files | work=[[Apache Software Foundation]] | date=2009 | access-date=2015-10-14 }}</ref>
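
The stream header and the two 48-bit magic numbers in the layout above can be inspected with a short script. The following is a minimal sketch in Python, not part of any reference implementation: the function names <code>parse_stream_start</code> and <code>find_stream_crc</code> are hypothetical, and the sample streams are assumed to come from Python's standard <code>bz2</code> module. It relies only on the field widths listed in the table, and it searches for the end-of-stream marker at every bit offset from 0 to 7 because the trailer is bit-aligned and followed by 0..7 padding bits.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # .compressed_magic:48, BCD digits of pi
EOS_MAGIC = 0x177245385090    # .eos_magic:48, BCD digits of sqrt(pi)


def parse_stream_start(stream: bytes) -> dict:
    """Decode the 4-byte header and the first byte-aligned fields after it.

    Hypothetical helper for illustration only.
    """
    if stream[:2] != b"BZ":
        raise ValueError("missing 'BZ' signature")
    if stream[2:3] != b"h":
        raise ValueError("version byte is not 'h'")
    level = stream[3] - ord("0")  # .hundred_k_blocksize:8, '1'..'9'
    if not 1 <= level <= 9:
        raise ValueError("block-size digit out of range")
    info = {"level": level}

    # Immediately after the header the stream is still byte-aligned,
    # so the first 48-bit magic and 32-bit CRC can be read as whole bytes.
    magic = int.from_bytes(stream[4:10], "big")
    crc = int.from_bytes(stream[10:14], "big")
    if magic == BLOCK_MAGIC:
        info["first"] = "block"
        info["block_crc"] = crc
        # .randomised:1 and .origPtr:24 occupy the next 25 bits.
        bits = int.from_bytes(stream[14:18], "big")
        info["randomised"] = bits >> 31
        info["orig_ptr"] = (bits >> 7) & 0xFFFFFF
    elif magic == EOS_MAGIC:
        info["first"] = "end-of-stream"
        info["stream_crc"] = crc
    else:
        raise ValueError("unexpected magic after header")
    return info


def find_stream_crc(stream: bytes) -> int:
    """Find the trailing .eos_magic:48 + .crc:32, allowing 0..7 padding bits."""
    value = int.from_bytes(stream, "big")
    for pad in range(8):
        tail = value >> pad  # drop the candidate padding bits
        if (tail >> 32) & 0xFFFFFFFFFFFF == EOS_MAGIC:
            return tail & 0xFFFFFFFF
    raise ValueError("no end-of-stream marker found")
```

As a usage example, <code>parse_stream_start(bz2.compress(b"hello world", 1))</code> should report a first block at level 1, while an empty input should produce a stream with zero blocks, so the header is followed directly by the end-of-stream marker. Fields after <code>origPtr</code> are not byte-aligned and would need a proper bit reader, which this sketch deliberately leaves out.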