Jump to content
Main menu
Main menu
move to sidebar
hide
Navigation
Main page
Recent changes
Random page
Help about MediaWiki
Special pages
Niidae Wiki
Search
Search
Appearance
Create account
Log in
Personal tools
Create account
Log in
Pages for logged out editors
learn more
Contributions
Talk
Editing
Burrows–Wheeler transform
(section)
Page
Discussion
English
Read
Edit
View history
Tools
Tools
move to sidebar
hide
Actions
Read
Edit
View history
General
What links here
Related changes
Page information
Appearance
move to sidebar
hide
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==Bijective variant== Since any rotation of the input string will lead to the same transformed string, the BWT cannot be inverted without adding an EOF marker to the end of the input or doing something equivalent, making it possible to distinguish the input string from all its rotations. Increasing the size of the alphabet (by appending the EOF character) makes later compression steps awkward. There is a [[bijective]] version of the transform, by which the transformed string uniquely identifies the original, and the two have the same length and contain exactly the same characters, just in a different order.<ref>{{citation | last1 = Gil | first1 = J. | last2 = Scott | first2 = D. A. | title = A bijective string sorting transform | url = http://bijective.dogma.net/00yyy.pdf | year = 2009 | access-date = 2009-07-09 | archive-date = 2011-10-08 | archive-url = https://web.archive.org/web/20111008001603/http://bijective.dogma.net/00yyy.pdf | url-status = dead }}</ref><ref>{{citation | last = Kufleitner | first = Manfred | editor1-last = Holub | editor1-first = Jan | editor2-last = Žďárek | editor2-first = Jan | arxiv = 0908.0239 | contribution = On bijective variants of the Burrows–Wheeler transform | pages = 65–69 | title = Prague Stringology Conference | url = http://www.stringology.org/event/2009/p07.html | year = 2009| bibcode = 2009arXiv0908.0239K}}.</ref> The bijective transform is computed by factoring the input into a non-increasing sequence of [[Lyndon word]]s; such a factorization exists and is unique by the [[Chen–Fox–Lyndon theorem]],<ref>*{{Citation | last=Lothaire | first=M. | author-link=M. Lothaire | others=Perrin, D.; Reutenauer, C.; Berstel, J.; Pin, J. E.; Pirillo, G.; Foata, D.; Sakarovitch, J.; Simon, I.; Schützenberger, M. P.; Choffrut, C.; Cori, R.; Lyndon, Roger; Rota, Gian-Carlo. Foreword by Roger Lyndon | title=Combinatorics on words | edition=2nd | series=Encyclopedia of Mathematics and Its Applications | volume=17 | publisher=[[Cambridge University Press]] | year=1997 | isbn=978-0-521-59924-5 | zbl=0874.20040 | page=67 }}</ref> and may be found in linear time and constant space.<ref>{{citation | last = Duval | first = Jean-Pierre | doi = 10.1016/0196-6774(83)90017-2 | issue = 4 | journal = Journal of Algorithms | pages = 363–381 | title = Factorizing words over an ordered alphabet | volume = 4 | year = 1983 | zbl=0532.68061| issn=0196-6774}}.</ref> The algorithm sorts the rotations of all the words; as in the Burrows–Wheeler transform, this produces a sorted sequence of ''n'' strings. The transformed string is then obtained by picking the final character of each string in this sorted list. The one important caveat here is that strings of different lengths are not ordered in the usual way; the two strings are repeated forever, and the infinite repeats are sorted. For example, "ORO" precedes "OR" because "OROORO..." precedes "OROROR...". For example, the text "<span style="color:red;">^</span>BANANA<span style="color:red;">$</span>" is transformed into "ANNBAA<span style="color:red;">^</span><span style="color:red;">$</span>" through these steps (the red <span style="color:red;">$</span> character indicates the [[End-of-file|EOF]] pointer) in the original string. The EOF character is unneeded in the bijective transform, so it is dropped during the transform and re-added to its proper place in the file. The string is broken into Lyndon words so the words in the sequence are decreasing using the comparison method above. (Note that we're sorting '<span style="color:red;">^</span>' as succeeding other characters.) "<span style="color:red;">^</span>BANANA" becomes (<span style="color:red;">^</span>) (B) (AN) (AN) (A). {| class="wikitable" |- ! colspan="5" | Bijective transformation |- ! Input ! All<br />rotations ! Sorted alphabetically ! Last column<br />of rotated Lyndon word ! Output |- | align=center | <span style="color:red;">^</span>BANANA<span style="color:red;">$</span> | '''<span style="color:red;">^</span>'''<span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span>... (<span style="color:red;">^</span>) '''B'''BBBBBBB... (B) '''ANAN'''ANAN... (AN) '''NANA'''NANA... (NA) '''ANAN'''ANAN... (AN) '''NANA'''NANA... (NA) '''A'''AAAAAAA... (A) | '''A'''AAAAAAA... (A) '''A'''NANANAN... (AN) '''A'''NANANAN... (AN) '''B'''BBBBBBB... (B) '''N'''ANANANA... (NA) '''N'''ANANANA... (NA) '''<span style="color:red;">^</span>'''<span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span>... (<span style="color:red;">^</span>) | '''A'''AAAAAAA... ('''A''') A'''N'''ANANAN... (A'''N''') A'''N'''ANANAN... (A'''N''') '''B'''BBBBBBB... ('''B''') N'''A'''NANANA... (N'''A''') N'''A'''NANANA... (N'''A''') '''<span style="color:red;">^</span>'''<span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span>... ('''<span style="color:red;">^</span>''') | ANNBAA<span style="color:red;">^</span><span style="color:red;">$</span> |} {| class="wikitable" ! colspan=4 | Inverse bijective transform |- ! colspan=4 | Input |- | align=center colspan=4 | ANNBAA<span style="color:red;">^</span> |- ! Add 1 ! Sort 1 ! Add 2 ! Sort 2 |- | align=right | A N N B A A <span style="color:red;">^</span> | align=right | A A A B N N <span style="color:red;">^</span> | align=right | AA NA NA BB AN AN <span style="color:red;">^</span><span style="color:red;">^</span> | align=right | AA AN AN BB NA NA <span style="color:red;">^</span><span style="color:red;">^</span> |- ! Add 3 ! Sort 3 ! Add 4 ! Sort 4 |- | align=right | AAA NAN NAN BBB ANA ANA <span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span> | align=right | AAA ANA ANA BBB NAN NAN <span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span> | align=right | AAAA NANA NANA BBBB ANAN ANAN <span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span> | align=right | AAAA ANAN ANAN BBBB NANA NANA <span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span> |- ! colspan=4 | Output |- | align=center colspan=4 | <span style="color:red;">^</span>BANANA |} Up until the last step, the process is identical to the inverse Burrows–Wheeler process, but here it will not necessarily give rotations of a single sequence; it instead gives rotations of Lyndon words (which will start to repeat as the process is continued). Here, we can see (repetitions of) four distinct Lyndon words: (A), (AN) (twice), (B), and (<span style="color:red;">^</span>). (NANA... doesn't represent a distinct word, as it is a cycle of ANAN....) At this point, these words are sorted into reverse order: (<span style="color:red;">^</span>), (B), (AN), (AN), (A). These are then concatenated to get :<span style="color:red;">^</span>BANANA The Burrows–Wheeler transform can indeed be viewed as a special case of this bijective transform; instead of the traditional introduction of a new letter from outside our alphabet to denote the end of the string, we can introduce a new letter that compares as preceding all existing letters that is put at the beginning of the string. The whole string is now a Lyndon word, and running it through the bijective process will therefore result in a transformed result that, when inverted, gives back the Lyndon word, with no need for reassembling at the end. For example, applying the bijective transform gives: {| class="wikitable" ! Input | <code>SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES</code> |- ! Lyndon words | <code><span style="color:#990000;">S</span><span style="color:#FF9900;">IX</span><span style="color:#006600;">.MIXED.PIXIES.SIFT.SIXTY.PIXIE</span><span style="color:#0000DD;">.DUST</span><span style="color:#660066;">.BOXES</span></code> |- ! Output | <code>STEYDST.E.IXXIIXXSMPPXS.B..EE..SUSFXDIOIIIIT</code> |} The bijective transform includes eight runs of identical characters. These runs are, in order: <code>XX</code>, <code>II</code>, <code>XX</code>, <code>PP</code>, <code>..</code>, <code>EE</code>, <code>..</code>, and <code>IIII</code>. In total, 18 characters are used in these runs.
Summary:
Please note that all contributions to Niidae Wiki may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see
Encyclopedia:Copyrights
for details).
Do not submit copyrighted work without permission!
Cancel
Editing help
(opens in new window)
Search
Search
Editing
Burrows–Wheeler transform
(section)
Add topic