Collation
==Automation==
When information is stored in digital systems, collation may become an automated process. It is then necessary to implement an appropriate collation [[algorithm]] that sorts the information in a manner satisfactory for the application in question. Often the aim is to achieve an alphabetical or numerical ordering that follows the standard criteria described in the preceding sections. However, not all of these criteria are easy to automate.<ref name="Walters">[https://books.google.com/books?id=5Pd_iFM4eLsC&dq=%22collation+algorithms%22&pg=PA278 ''M Programming: A Comprehensive Guide''], Richard F. Walters, Digital Press, 1997</ref>

The simplest kind of automated collation is based on the numerical codes of the symbols in a [[character set]], such as [[ASCII]] (or any of its [[superset]]s, such as [[Unicode]]). The symbols are ordered by increasing numerical code, and this ordering is extended to strings in accordance with the basic principles of alphabetical ordering (mathematically speaking, [[lexicographical order]]ing). So a computer program might treat the characters ''a'', ''b'', ''C'', ''d'', and ''$'' as being ordered ''$'', ''C'', ''a'', ''b'', ''d'' (the corresponding ASCII codes are ''$'' = 36, ''C'' = 67, ''a'' = 97, ''b'' = 98, and ''d'' = 100). Strings beginning with ''C'', ''M'', or ''Z'' would therefore be sorted before strings beginning with lower-case ''a'', ''b'', etc. This is sometimes called ''[[ASCIIbetical order]]''. It deviates from standard alphabetical order, particularly because all capital letters are ordered before all lower-case ones (and possibly also in its treatment of spaces and other non-letter characters).
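
The code-based ordering described above can be reproduced with a short Python sketch. It relies on Python comparing strings by code point, which coincides with ASCII order for the characters used here. The word list is illustrative and does not come from the cited source.

```python
# Characters from the example above, in arbitrary input order.
chars = ["a", "b", "C", "d", "$"]

# Python compares str values by code point, so sorted() with no key
# yields the ASCIIbetical ordering.
by_code = sorted(chars)

# Numeric code of each character, for comparison with the values in the text.
codes = {ch: ord(ch) for ch in by_code}

# The same rule extends lexicographically to whole strings.
# These words are illustrative only.
words = ["apple", "Zebra", "banana", "Mango", "Cherry"]
ascii_sorted = sorted(words)

print(by_code)       # ['$', 'C', 'a', 'b', 'd']
print(codes)         # {'$': 36, 'C': 67, 'a': 97, 'b': 98, 'd': 100}
print(ascii_sorted)  # ['Cherry', 'Mango', 'Zebra', 'apple', 'banana']
```

Every capitalised word sorts ahead of every lower-case word, which is the deviation from standard alphabetical order noted above.
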
ASCIIbetical order is therefore often applied with certain alterations, the most obvious being case conversion (often to uppercase, for historical reasons<ref group="note">Many early computer systems handled text only in uppercase, a convention that dates back to [[telegraph]] practice.</ref>) before comparison of ASCII values.

In many collation algorithms, the comparison is based not on the numerical codes of the characters but on a '''collating sequence''' (the sequence in which the characters are assumed to come for the purpose of collation), together with other ordering rules appropriate to the given application. This can serve to apply the correct conventions used for alphabetical ordering in the language in question, dealing properly with differently cased letters, [[modified letter]]s, [[digraph (orthography)|digraphs]], particular abbreviations, and so on, as mentioned above under [[#Alphabetical order|Alphabetical order]], and in detail in the [[Alphabetical order]] article. Such algorithms are potentially quite complex, possibly requiring several passes through the text.<ref name="Walters"/>

Problems are nonetheless still common when the algorithm has to encompass more than one language. For example, in [[German (language)|German]] dictionaries the word ''ökonomisch'' comes between ''offenbar'' and ''olfaktorisch'', while [[Turkish language|Turkish]] dictionaries treat ''o'' and ''ö'' as different letters, placing ''oyun'' before ''öbür''.

A standard algorithm for collating any collection of strings composed of any standard [[Unicode]] symbols is the [[Unicode Collation Algorithm]]. It can be adapted to use the appropriate collation sequence for a given language by tailoring its default collation table. Several such tailorings are collected in the [[Common Locale Data Repository]].

===Sort keys===
In some applications, the strings by which items are collated may differ from the identifiers that are displayed.
For example, ''The Shining'' might be [[sorting|sorted]] as ''Shining, The'' (see [[#Alphabetical order|Alphabetical order]] above), but it may still be desired to display it as ''The Shining''. In this case two sets of strings can be stored, one for display and another for collation. Strings used for collation in this way are called ''sort keys''.

===Issues with numbers===
Sometimes it is desired to order text with embedded numbers in proper numerical order. For example, "Figure 7b" goes before "Figure 11a", even though '7' comes after '1' in [[Unicode]]. This can be extended to [[Roman numeral]]s. This behavior is not particularly difficult to produce as long as only integers are to be sorted, although it can slow down sorting significantly. For example, [[Microsoft Windows]] does this when sorting [[file name]]s.

Sorting decimals properly is a bit more difficult, because different locales use different symbols for the [[decimal separator|decimal point]], and sometimes the character used as a [[Decimal mark|decimal point]] is also used as a separator, as in "Section 3.2.5". There is no universal answer for how to sort such strings; any rules are application-dependent.
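
One way to obtain numerical ordering for embedded integers is to derive a sort key that splits each string into digit and non-digit runs and compares the digit runs as integers. The following Python sketch is a minimal illustration under that assumption. The function name <code>natural_key</code> and the sample labels other than "Figure 7b" and "Figure 11a" are hypothetical, and the sketch does not attempt decimals or Roman numerals.

```python
import re

def natural_key(text):
    """Build a sort key in which runs of ASCII digits compare as integers."""
    # Splitting on a capturing group keeps the digit runs in the result;
    # they always land at odd indices, text runs at even indices.
    parts = re.split(r"([0-9]+)", text)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]

# "Figure 7b" and "Figure 11a" come from the text; the rest are illustrative.
labels = ["Figure 11a", "Figure 7b", "Figure 2", "Figure 10"]

plain = sorted(labels)                     # code-point order
natural = sorted(labels, key=natural_key)  # embedded integers in numeric order

print(plain)    # ['Figure 10', 'Figure 11a', 'Figure 2', 'Figure 7b']
print(natural)  # ['Figure 2', 'Figure 7b', 'Figure 10', 'Figure 11a']
```

Because text runs and integer runs alternate at fixed positions, two keys never compare a string with an integer. Under this key, "Section 3.2.5" is read as three separate integers, which is only one of the possible application-dependent interpretations mentioned above.
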