UTF-16
== History ==
In the late 1980s, work began on developing a uniform encoding for a "Universal Character Set" ([[Universal Coded Character Set|UCS]]) that would replace earlier language-specific encodings with one coordinated system. The goal was to include all required characters from most of the world's languages, as well as symbols from technical domains such as science, mathematics, and music. The original idea was to replace the typical 256-character encodings, which required 1 byte per character, with an encoding using 65,536 (2<sup>16</sup>) values, which would require 2 bytes (16 bits) per character. Two groups worked on this in parallel: [[ISO/IEC JTC 1/SC 2]] and the [[Unicode Consortium]], the latter representing mostly manufacturers of computing equipment. The two groups attempted to synchronize their character assignments so that the developing encodings would be mutually compatible. The early 2-byte encoding was originally called "Unicode" but is now called "UCS-2".<ref name="unicode-6_0" /><ref name="ucs-2-utf-16-differences" /><ref name="mysql_UCS-2">{{cite web|url=https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-ucs2.html|title=MySQL :: MySQL 5.7 Reference Manual :: 10.1.9.4 The ucs2 Character Set (UCS-2 Unicode Encoding)|website=dev.mysql.com}}</ref> When it became increasingly clear that 2<sup>16</sup> characters would not suffice,<ref name="unicode.org/faq">{{cite web|title=What is UTF-16?|url=https://www.unicode.org/faq/utf_bom.html#utf16-1|website=The Unicode Consortium|publisher=Unicode, Inc.|access-date=29 March 2018}}</ref> ISO/IEC JTC 1/SC 2 introduced a larger 31-bit space and an encoding ([[UCS-4]]) that would require 4 bytes per character. The Unicode Consortium resisted this, both because 4 bytes per character wasted considerable memory and disk space and because some manufacturers were already heavily invested in 2-byte-per-character technology.
The UTF-16 encoding scheme was developed as a compromise and introduced with version 2.0 of the Unicode Standard in July 1996.<ref>{{cite web |url=https://www.unicode.org/faq//utf_bom.html|title=Questions about encoding forms |access-date=2010-11-12}}</ref> It is fully specified in RFC 2781, published in 2000 by the [[IETF]].<ref>ISO/IEC 10646:2014 "Information technology – Universal Coded Character Set (UCS)" sections 9 and 10.</ref><ref>{{cite book |title=The Unicode Standard version 7.0 |date=2014 |chapter-url=https://www.unicode.org/versions/Unicode7.0.0/ch02.pdf#G11153 |chapter=Chapter 2 General Structure |at=2.5 Encoding Forms}}</ref> UTF-16 is specified in the latest versions of both the international standard [[ISO/IEC 10646]] and the Unicode Standard. "UCS-2 should now be considered obsolete. It no longer refers to an encoding form in either 10646 or the Unicode Standard."<ref name="unicode-6_0" /><ref name="ucs-2-utf-16-differences" /> UTF-16 will never be extended to support more code points, or to support the code points that were reassigned as surrogates, because this would violate the Unicode Stability Policy with respect to general category or surrogate code points.<ref>{{cite web|url=https://unicode.org/policies/stability_policy.html|title=Unicode Character Encoding Stability Policies|website=unicode.org}}</ref> (Any extension that kept UTF-16 a [[self-synchronizing code]] would need to allocate at least one [[Plane (Unicode)#Basic Multilingual Plane|Basic Multilingual Plane]] (BMP) code point to start a sequence, and changing the purpose of an assigned code point is disallowed.)