== History ==
In the late 1980s, work began on developing a uniform encoding for a "Universal Character Set" ([[Universal Coded Character Set|UCS]]) that would replace earlier language-specific encodings with one coordinated system. The goal was to include all required characters from most of the world's languages, as well as symbols from technical domains such as science, mathematics, and music. The original idea was to replace the typical 256-character encodings, which required 1 byte per character, with an encoding using 65,536 (2<sup>16</sup>) values, which would require 2 bytes (16 bits) per character.

Two groups worked on this in parallel: [[ISO/IEC JTC 1/SC 2]] and the [[Unicode Consortium]], the latter representing mostly manufacturers of computing equipment. The two groups attempted to synchronize their character assignments so that the developing encodings would be mutually compatible. The early 2-byte encoding was originally called "Unicode", but is now called "UCS-2".<ref name="unicode-6_0" /><ref name="ucs-2-utf-16-differences" /><ref name="mysql_UCS-2">{{cite web|url=https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-ucs2.html|title=MySQL :: MySQL 5.7 Reference Manual :: 10.1.9.4 The ucs2 Character Set (UCS-2 Unicode Encoding)|website=dev.mysql.com}}</ref>

As it became increasingly clear that 2<sup>16</sup> characters would not suffice,<ref name="unicode.org/faq">{{cite web|title=What is UTF-16?|url=https://www.unicode.org/faq/utf_bom.html#utf16-1|website=The Unicode Consortium|publisher=Unicode, Inc.|access-date=29 March 2018}}</ref> [[IEEE]] introduced a larger 31-bit space and an encoding ([[UCS-4]]) that would require 4 bytes per character. This was resisted by the [[Unicode Consortium]], both because 4 bytes per character wasted considerable memory and disk space, and because some manufacturers were already heavily invested in 2-byte-per-character technology. The UTF-16 encoding scheme was developed as a compromise and introduced with version 2.0 of the Unicode standard in July 1996.<ref>{{cite web |url=https://www.unicode.org/faq//utf_bom.html|title=Questions about encoding forms |access-date=2010-11-12}}</ref> It is fully specified in RFC 2781, published in 2000 by the [[IETF]].<ref>ISO/IEC 10646:2014 "Information technology – Universal Coded Character Set (UCS)" sections 9 and 10.</ref><ref>{{cite book |title=The Unicode Standard version 7.0 |date=2014 |chapter-url=https://www.unicode.org/versions/Unicode7.0.0/ch02.pdf#G11153 |chapter=Chapter 2 General Structure |at=2.5 Encoding Forms}}</ref>

UTF-16 is specified in the latest versions of both the international standard [[ISO/IEC 10646]] and the Unicode Standard. "UCS-2 should now be considered obsolete. It no longer refers to an encoding form in either 10646 or the Unicode Standard."<ref name="unicode-6_0" /><ref name="ucs-2-utf-16-differences" /> UTF-16 will never be extended to support a larger number of code points, or to support the code points that were replaced by surrogates, because doing so would violate the Unicode Stability Policy with respect to general category or surrogate code points.<ref>{{cite web|url=https://unicode.org/policies/stability_policy.html|title=Unicode Character Encoding Stability Policies|website=unicode.org}}</ref> (Any scheme that remains a [[self-synchronizing code]] would require allocating at least one [[Plane (Unicode)#Basic Multilingual Plane|Basic Multilingual Plane]] (BMP) code point to start a sequence, and changing the purpose of a code point is disallowed.)