Jump to content
Main menu
Main menu
move to sidebar
hide
Navigation
Main page
Recent changes
Random page
Help about MediaWiki
Special pages
Niidae Wiki
Search
Search
Appearance
Create account
Log in
Personal tools
Create account
Log in
Pages for logged out editors
learn more
Contributions
Talk
Editing
TeX
(section)
Page
Discussion
English
Read
Edit
View history
Tools
Tools
move to sidebar
hide
Actions
Read
Edit
View history
General
What links here
Related changes
Page information
Appearance
move to sidebar
hide
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
===Hyphenation and justification=== In comparison with manual typesetting, the problem of [[Justification (typesetting)|justification]] is easy to solve with a digital system such as TeX, which, provided that good points for line breaking have been defined, can automatically spread the spaces between words to fill in the line. The problem is thus to find the set of breakpoints that will give the most visually pleasing result. Many line-breaking algorithms use a [[greedy algorithm|''first-fit'' approach]], where the breakpoints for each line are determined one after the other, and no breakpoint is changed after it has been chosen.<ref>{{Citation | first = Michael P | last = Barnett | title = Computer Typesetting: Experiments and Prospects | place = [[Cambridge, Massachusetts|Cambridge]], [[Massachusetts|MA]] | publisher = [[MIT Press]] | date = 1965}}</ref> Such a system is not able to define a breakpoint depending on the effect that it will have on the following lines. In comparison, the ''total-fit'' line-breaking algorithm used by TeX and developed by Donald Knuth and Michael Plass <ref>url=http://svn.tug.org/interviews/plass.html</ref> considers ''all'' the possible breakpoints in a paragraph, and finds the combination of line breaks that will produce the most globally pleasing arrangement. Formally, the algorithm defines a value called ''badness'' associated with each possible line break; the badness is increased if the spaces on the line must stretch or shrink too much to make the line the correct width. Penalties are added if a breakpoint is particularly undesirable: for example, if a word must be [[hyphen]]ated, if two lines in a row are hyphenated, or if a very loose line is immediately followed by a very tight line. The algorithm will then find the breakpoints that will minimize the sum of squares of the badness (including penalties) of the resulting lines. If the paragraph contains <math>n</math> possible breakpoints, the number of situations that must be evaluated naively is <math>2^n</math>. However, by using the method of [[dynamic programming]], the complexity of the algorithm can be brought down to <math>O(n^2)</math> (see [[Big O notation]]). Further simplifications (for example, not testing extremely unlikely breakpoints such as a hyphenation in the first word of a paragraph, or very overfull lines) lead to an efficient algorithm whose running time is <math>O(n w)</math>, where <math>w</math> is the width of a line. A similar algorithm is used to determine the best way to break paragraphs across two pages, in order to avoid [[Widow (typesetting)|widows]] or [[Orphan (typesetting)|orphans]] (lines that appear alone on a page while the rest of the paragraph is on the following or preceding page). However, in general, a thesis by Michael Plass shows how the page-breaking problem can be [[NP-complete]] because of the added complication of placing figures.{{Sfn | Knuth | Plass | 1981}} TeX's line-breaking algorithm has been adopted by several other programs, such as [[Adobe InDesign]] (a [[desktop publishing]] [[Computer application|application]])<ref>{{Citation | publisher = [[Advogato]] | url = http://www.advogato.org/article/28.html | type = interview | title = Donald E. Knuth| journal = TUGboat | volume = 21 | date = 2000 | pages = 103β10 | access-date = 26 December 2005 | archive-url = https://web.archive.org/web/20090122043044/http://www.advogato.org/article/28.html | archive-date = 22 January 2009 | url-status = dead }}</ref> and the [[GNU]] [[fmt (Unix)|fmt]] [[Unix]] [[command line]] utility.<ref>{{Citation|publisher=GNU Project |chapter-url=https://www.gnu.org/software/coreutils/manual/html_node/fmt-invocation.html#fmt-invocation |title=Core GNU utilities (GNU coreutils) manual |chapter=4.1 fmt: Reformat paragraph text |date=2016 }}</ref> If no suitable line break can be found for a line, the system will try to hyphenate a word. The original version of TeX used a hyphenation algorithm based on a set of rules for the removal of prefixes and suffixes of words, and for deciding if it should insert a break between the two consonants in a pattern of the form [[vowel]]β[[consonant]]β[[consonant]]β[[vowel]] (which is possible most of the time).{{Sfn | Liang | 1983 | p = 3}} TeX82 introduced a new hyphenation algorithm, designed by [[Frank Liang]] in 1983, to assign priorities to breakpoints in letter groups. A list of hyphenation patterns is first generated automatically from a corpus of hyphenated words (a list of 50,000 words). If TeX must find the acceptable hyphenation positions in the word ''encyclopedia'', for example, it will consider all the subwords of the extended word ''.encyclopedia.'', where ''.'' is a special marker to indicate the beginning or end of the word. The list of subwords includes all the subwords of length 1 (''.'', ''e'', ''n'', ''c'', ''y'', etc.), of length 2 (''.e'', ''en'', ''nc'', etc.), etc., up to the subword of length 14, which is the word itself, including the markers. TeX will then look into its list of hyphenation patterns, and find subwords for which it has calculated the desirability of hyphenation at each position. In the case of our word, 11 such patterns can be matched, namely <sub>1</sub>c<sub>4</sub>l<sub>4</sub>, <sub>1</sub>cy, <sub>1</sub>d<sub>4</sub>i<sub>3</sub>a, <sub>4</sub>edi, e<sub>3</sub>dia, <sub>2</sub>i<sub>1</sub>a, ope<sub>5</sub>d, <sub>2</sub>p<sub>2</sub>ed, <sub>3</sub>pedi, pedia<sub>4</sub>, y<sub>1</sub>c. For each position in the word, TeX will calculate the ''maximum value'' obtained among all matching patterns, yielding en<sub>1</sub>cy<sub>1</sub>c<sub>4</sub>l<sub>4</sub>o<sub>3</sub>p<sub>4</sub>e<sub>5</sub>d<sub>4</sub>i<sub>3</sub>a<sub>4</sub>. Finally, the acceptable positions are those indicated by an [[even and odd numbers|odd]] number, yielding the acceptable hyphenations ''en-cy-clo-pe-di-a''. This system based on subwords allows the definition of very general patterns (such as <sub>2</sub>i<sub>1</sub>a), with low indicative numbers (either odd or even), which can then be superseded by more specific patterns (such as <sub>1</sub>d<sub>4</sub>i<sub>3</sub>a) if necessary. These patterns find about 90% of the hyphens in the original dictionary; more importantly, they do not insert any spurious hyphen. In addition, a list of exceptions (words for which the patterns do not predict the correct hyphenation) are included with the Plain TeX format; additional ones can be specified by the user.{{Sfn | Liang | 1983}}{{Rp | needed = yes | date = March 2013}}<ref>{{Citation | title = The TeXbook | chapter = Appendix H: Hyphenation | pages = 449β55}}</ref>
Summary:
Please note that all contributions to Niidae Wiki may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see
Encyclopedia:Copyrights
for details).
Do not submit copyrighted work without permission!
Cancel
Editing help
(opens in new window)
Search
Search
Editing
TeX
(section)
Add topic