=== General Category property ===
Each code point is assigned a classification, listed as the code point's [[Character property (Unicode)#General Category|General Category]] property. At the uppermost level, code points are categorized as one of Letter, Mark, Number, Punctuation, Symbol, Separator, or Other. Within each category, code points are further subcategorized. In most cases, other properties must be used to adequately describe all the characteristics of any given code point.

{{General Category (Unicode)}}

The {{val|1024}} code points in the range {{tt|U+D800}}–{{tt|U+DBFF}} are known as ''high-surrogate'' code points, and the {{val|1024}} code points in the range {{tt|U+DC00}}–{{tt|U+DFFF}} are known as ''low-surrogate'' code points. A high-surrogate code point followed by a low-surrogate code point forms a ''surrogate pair'' in UTF-16, which represents a code point greater than {{tt|U+FFFF}}. In principle, these code points cannot otherwise be used, though in practice this rule is often ignored, especially when not using UTF-16.

A small set of code points are guaranteed never to be assigned to characters, although third parties may make independent use of them at their discretion. There are 66 of these ''noncharacters'': {{tt|U+FDD0}}–{{tt|U+FDEF}} and the last two code points in each of the 17 planes (e.g. {{tt|U+FFFE}}, {{tt|U+FFFF}}, {{tt|U+1FFFE}}, {{tt|U+1FFFF}}, ..., {{tt|U+10FFFE}}, {{tt|U+10FFFF}}). The set of noncharacters is stable, and no new noncharacters will ever be defined.<ref name="stability-policy">{{Cite web |title=Unicode Character Encoding Stability Policy |url=https://unicode.org/policies/stability_policy.html |access-date=16 March 2010}}</ref> As with surrogates, the rule that these cannot be used is often ignored, although the operation of the [[byte order mark]] assumes that {{tt|U+FFFE}} will never be the first code point in a text.
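
The surrogate and noncharacter ranges described above can be sketched in Python. This is only an illustration under stated assumptions: the function names (<code>to_surrogate_pair</code>, <code>from_surrogate_pair</code>, <code>is_surrogate</code>, <code>is_noncharacter</code>) are hypothetical helpers invented for this example and are not part of ''The Unicode Standard'' or of any library.

```python
# Hypothetical helper names; only the numeric ranges come from the text above.

def is_surrogate(cp: int) -> bool:
    """True for the 2048 surrogate code points U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    # Low 16 bits equal to FFFE or FFFF means "last two code points of a plane".
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are encoded as surrogate pairs")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # upper 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # lower 10 bits select the low surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high and a low surrogate into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("expected a high surrogate followed by a low surrogate")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x1F600)
    print(f"U+1F600 -> U+{high:04X} U+{low:04X}")
    print(sum(is_noncharacter(cp) for cp in range(0x110000)), "noncharacters")
```

As a sanity check, the pair computed for a supplementary code point can be compared with the bytes produced by Python's built-in <code>utf-16-be</code> encoder for the same character.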

The exclusion of surrogates and noncharacters leaves {{val|1111998}} code points available for use.

''Private use'' code points are considered to be assigned, but they intentionally have no interpretation specified by ''The Unicode Standard'',<ref>{{Cite web |title=Properties |url=https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G43463 |access-date=13 September 2024}}</ref> so any interchange of such code points requires an independent agreement between the sender and receiver as to their interpretation. There are three private use areas in the Unicode codespace:
* Private Use Area: {{tt|U+E000}}–{{tt|U+F8FF}} ({{val|6400}} characters),
* Supplementary Private Use Area-A: {{tt|U+F0000}}–{{tt|U+FFFFD}} ({{val|65534}} characters),
* Supplementary Private Use Area-B: {{tt|U+100000}}–{{tt|U+10FFFD}} ({{val|65534}} characters).

''Graphic'' characters are those defined by ''The Unicode Standard'' to have particular semantics, either having a visible [[glyph]] shape or representing a visible space. As of Unicode 16.0, there are {{val|154826}} graphic characters.

''Format'' characters are characters that do not have a visible appearance but may have an effect on the appearance or behavior of neighboring characters. For example, {{unichar|200C|Zero width non-joiner|nlink=}} and {{unichar|200D|Zero width joiner|nlink=}} may be used to change the default shaping behavior of adjacent characters (e.g. to inhibit ligatures or request ligature formation). There are 172 format characters in Unicode 16.0.

65 code points, in the ranges {{tt|U+0000}}–{{tt|U+001F}} and {{tt|U+007F}}–{{tt|U+009F}}, are reserved as ''control codes'', corresponding to the [[C0 and C1 control codes]] as defined in [[ISO/IEC 6429]]. {{tt|U+0009}} {{smallcaps|CHARACTER TABULATION}}, {{tt|U+000A}} {{smallcaps|LINE FEED}}, and {{tt|U+000D}} {{smallcaps|CARRIAGE RETURN}} are widely used in texts using Unicode.
In a phenomenon known as [[mojibake]], the C1 code points are improperly decoded according to the [[Windows-1252]] codepage, previously widely used in Western European contexts.

Graphic, format, control code, and private use characters are collectively referred to as ''assigned characters''. ''Reserved'' code points are those code points that are valid and available for use, but have not yet been assigned. As of Unicode 16.0, there are {{val|819467}} reserved code points.
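
The classification and the code point counts in this section can be explored with Python's standard <code>unicodedata</code> module. The following is a minimal sketch, not a definitive tool: the names <code>MAJOR_CLASSES</code>, <code>general_category</code> and <code>major_class</code> are hypothetical helpers made up for this example, and the Unicode version consulted is whichever one ships with the running Python interpreter, so version-dependent counts (such as format or reserved code points) may differ from the Unicode 16.0 figures quoted in the text.

```python
import unicodedata
from collections import Counter

CODESPACE = 0x110000  # code points U+0000..U+10FFFF

# First letter of the two-letter General Category value -> top-level class.
MAJOR_CLASSES = {
    "L": "Letter", "M": "Mark", "N": "Number", "P": "Punctuation",
    "S": "Symbol", "Z": "Separator", "C": "Other",
}


def general_category(cp: int) -> str:
    """Two-letter General Category value, e.g. 'Lu', 'Cf', 'Cs', 'Co', 'Cn'."""
    return unicodedata.category(chr(cp))


def major_class(cp: int) -> str:
    """Top-level class (Letter, Mark, ...) of a code point."""
    return MAJOR_CLASSES[general_category(cp)[0]]


# Tally every code point in the codespace by General Category.
counts = Counter(general_category(cp) for cp in range(CODESPACE))

surrogates = counts["Cs"]    # U+D800..U+DFFF
private_use = counts["Co"]   # the three private use areas
controls = counts["Cc"]      # C0 and C1 control codes
noncharacters = 66           # fixed by the stability policy; reported as 'Cn'

# Code points left after excluding surrogates and noncharacters.
available = CODESPACE - surrogates - noncharacters

if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    print("surrogates:", surrogates)
    print("private use:", private_use)
    print("control codes:", controls)
    print("available for use:", available)
    # 'Cn' covers both noncharacters and reserved code points.
    print("reserved (this version):", counts["Cn"] - noncharacters)
    print("format (this version):", counts["Cf"])
```

The surrogate, private use and control code totals do not depend on the Unicode version, which is why they make convenient cross-checks against the ranges listed above.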