Jump to content
Main menu
Main menu
move to sidebar
hide
Navigation
Main page
Recent changes
Random page
Help about MediaWiki
Special pages
Niidae Wiki
Search
Search
Appearance
Create account
Log in
Personal tools
Create account
Log in
Pages for logged out editors
learn more
Contributions
Talk
Editing
Burrows–Wheeler transform
(section)
Page
Discussion
English
Read
Edit
View history
Tools
Tools
move to sidebar
hide
Actions
Read
Edit
View history
General
What links here
Related changes
Page information
Appearance
move to sidebar
hide
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==Explanation== If the original string had several substrings that occurred often, then the BWT-transformed string will have several places where a single character is repeated many times in a row,<ref>{{cite web|url=https://github.com/adrien-mogenet/scala-bwt/blob/master/src/main/scala/me/algos/bwt/BurrowsWheelerCodec.scala|title=adrien-mogenet/scala-bwt|website=GitHub|access-date=19 April 2018}}</ref> creating more-easily-compressible data. For instance, consider transforming an English text frequently containing the word "the": For example: {| class="wikitable" ! Input | <code>THE.MAN.AND.THE.DOG.WAITED.AT.THE.STATION.FOR.THE.TRAIN.TO.THE.CITY</code> |- ! Output | <code>NDEENEEODTRNEGRWM..T.EN.HHHHHT.OTTTTTATAC.AOIATDIFOT.ASI..Y..A..I.T</code> |} Sorting the rotations of this text groups rotations starting with "he " together, and the last character of such a rotation (which is also the character before the "he ") will usually be "t" (though perhaps occasionally not, such as if the text contained "ache "), so the result of the transform will contain a run, or runs, of many consecutive "t" characters. Similarly, rotations beginning with "e " are grouped together, but "e " is often preceded by "h", so we see the output above contains a run of five consecutive "h" characters. Thus it can be seen that the success of this transform depends upon one value having a high probability of occurring before a sequence, so that in general it needs fairly long samples (a few kilobytes at least) of appropriate data (such as text). The remarkable thing about the BWT is not that it generates a more easily encoded output—an ordinary sort would do that—but that it does this ''reversibly'', allowing the original document to be re-generated from the last column data. The inverse can be understood this way. Take the final table in the BWT algorithm, and erase all but the last column. Given only this information, you can easily reconstruct the first column. The last column tells you all the characters in the text, so just sort these characters alphabetically to get the first column. Then, the last and first columns (of each row) together give you all ''pairs'' of successive characters in the document, where pairs are taken cyclically so that the last and first character form a pair. Sorting the list of pairs gives the first ''and second'' columns. To obtain the third column, the last column is again prepended to the table, and the rows are sorted lexicographically. Continuing in this manner, you can reconstruct the entire list. Then, the row with the "end of file" character at the end is the original text. Reversing the example above is done like this: {| class="wikitable" ! colspan=4 | Inverse transformation |- ! colspan=4 | Input |- | align=center colspan=4 | BNN<span style="color:red;">^</span>AA<span style="color:red;">$</span>A |- ! Add 1 ! Sort 1 ! Add 2 ! Sort 2 |- | align=right | B N N <span style="color:red;">^</span> A A <span style="color:red;">$</span> A | align=right | A A A B N N <span style="color:red;">^</span> <span style="color:red;">$</span> | align=right | BA NA NA <span style="color:red;">^</span>B AN AN <span style="color:red;">$</span><span style="color:red;">^</span> A<span style="color:red;">$</span> | align=right | AN AN A<span style="color:red;">$</span> BA NA NA <span style="color:red;">^</span>B <span style="color:red;">$</span><span style="color:red;">^</span> |- ! Add 3 ! Sort 3 ! Add 4 ! Sort 4 |- | align=right | BAN NAN NA<span style="color:red;">$</span> <span style="color:red;">^</span>BA ANA ANA <span style="color:red;">$</span><span style="color:red;">^</span>B A<span style="color:red;">$</span><span style="color:red;">^</span> | align=right | ANA ANA A<span style="color:red;">$</span><span style="color:red;">^</span> BAN NAN NA<span style="color:red;">$</span> <span style="color:red;">^</span>BA <span style="color:red;">$</span><span style="color:red;">^</span>B | align=right | BANA NANA NA<span style="color:red;">$</span><span style="color:red;">^</span> <span style="color:red;">^</span>BAN ANAN ANA<span style="color:red;">$</span> <span style="color:red;">$</span><span style="color:red;">^</span>BA A<span style="color:red;">$</span><span style="color:red;">^</span>B | align=right | ANAN ANA<span style="color:red;">$</span> A<span style="color:red;">$</span><span style="color:red;">^</span>B BANA NANA NA<span style="color:red;">$</span><span style="color:red;">^</span> <span style="color:red;">^</span>BAN <span style="color:red;">$</span><span style="color:red;">^</span>BA |- ! Add 5 ! Sort 5 ! Add 6 ! Sort 6 |- | align=right | BANAN NANA<span style="color:red;">$</span> NA<span style="color:red;">$</span><span style="color:red;">^</span>B <span style="color:red;">^</span>BANA ANANA ANA<span style="color:red;">$</span><span style="color:red;">^</span> <span style="color:red;">$</span><span style="color:red;">^</span>BAN A<span style="color:red;">$</span><span style="color:red;">^</span>BA | align=right | ANANA ANA<span style="color:red;">$</span><span style="color:red;">^</span> A<span style="color:red;">$</span><span style="color:red;">^</span>BA BANAN NANA<span style="color:red;">$</span> NA<span style="color:red;">$</span><span style="color:red;">^</span>B <span style="color:red;">^</span>BANA <span style="color:red;">$</span><span style="color:red;">^</span>BAN | align=right | BANANA NANA<span style="color:red;">$</span><span style="color:red;">^</span> NA<span style="color:red;">$</span><span style="color:red;">^</span>BA <span style="color:red;">^</span>BANAN ANANA<span style="color:red;">$</span> ANA<span style="color:red;">$</span><span style="color:red;">^</span>B <span style="color:red;">$</span><span style="color:red;">^</span>BANA A<span style="color:red;">$</span><span style="color:red;">^</span>BAN | align=right | ANANA<span style="color:red;">$</span> ANA<span style="color:red;">$</span><span style="color:red;">^</span>B A<span style="color:red;">$</span><span style="color:red;">^</span>BAN BANANA NANA<span style="color:red;">$</span><span style="color:red;">^</span> NA<span style="color:red;">$</span><span style="color:red;">^</span>BA <span style="color:red;">^</span>BANAN <span style="color:red;">$</span><span style="color:red;">^</span>BANA |- ! Add 7 ! Sort 7 ! Add 8 ! Sort 8 |- | align=right | BANANA<span style="color:red;">$</span> NANA<span style="color:red;">$</span><span style="color:red;">^</span>B NA<span style="color:red;">$</span><span style="color:red;">^</span>BAN <span style="color:red;">^</span>BANANA ANANA<span style="color:red;">$</span><span style="color:red;">^</span> ANA<span style="color:red;">$</span><span style="color:red;">^</span>BA <span style="color:red;">$</span><span style="color:red;">^</span>BANAN A<span style="color:red;">$</span><span style="color:red;">^</span>BANA | align=right | ANANA<span style="color:red;">$</span><span style="color:red;">^</span> ANA<span style="color:red;">$</span><span style="color:red;">^</span>BA A<span style="color:red;">$</span><span style="color:red;">^</span>BANA BANANA<span style="color:red;">$</span> NANA<span style="color:red;">$</span><span style="color:red;">^</span>B NA<span style="color:red;">$</span><span style="color:red;">^</span>BAN <span style="color:red;">^</span>BANANA <span style="color:red;">$</span><span style="color:red;">^</span>BANAN | align=right | BANANA<span style="color:red;">$</span><span style="color:red;">^</span> NANA<span style="color:red;">$</span><span style="color:red;">^</span>BA NA<span style="color:red;">$</span><span style="color:red;">^</span>BANA <span style="color:red;">^</span>BANANA<span style="color:red;">$</span> ANANA<span style="color:red;">$</span><span style="color:red;">^</span>B ANA<span style="color:red;">$</span><span style="color:red;">^</span>BAN <span style="color:red;">$</span><span style="color:red;">^</span>BANANA A<span style="color:red;">$</span><span style="color:red;">^</span>BANAN | align=right | ANANA<span style="color:red;">$</span><span style="color:red;">^</span>B ANA<span style="color:red;">$</span><span style="color:red;">^</span>BAN A<span style="color:red;">$</span><span style="color:red;">^</span>BANAN BANANA<span style="color:red;">$</span><span style="color:red;">^</span> NANA<span style="color:red;">$</span><span style="color:red;">^</span>BA NA<span style="color:red;">$</span><span style="color:red;">^</span>BANA <span style="color:red;">^</span>BANANA<span style="color:red;">$</span> <span style="color:red;">$</span><span style="color:red;">^</span>BANANA |- ! colspan=4 | Output |- | align=center colspan=4 | <span style="color:red;">^</span>BANANA<span style="color:red;">$</span> |}
Summary:
Please note that all contributions to Niidae Wiki may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see
Encyclopedia:Copyrights
for details).
Do not submit copyrighted work without permission!
Cancel
Editing help
(opens in new window)
Search
Search
Editing
Burrows–Wheeler transform
(section)
Add topic