Jump to content
Main menu
Main menu
move to sidebar
hide
Navigation
Main page
Recent changes
Random page
Help about MediaWiki
Special pages
Niidae Wiki
Search
Search
Appearance
Create account
Log in
Personal tools
Create account
Log in
Pages for logged out editors
learn more
Contributions
Talk
Editing
UTF-16
(section)
Page
Discussion
English
Read
Edit
View history
Tools
Tools
move to sidebar
hide
Actions
Read
Edit
View history
General
What links here
Related changes
Page information
Appearance
move to sidebar
hide
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
== Byte-order encoding schemes == UTF-16 and UCS-2 produce a sequence of 16-bit code units. Since most communication and storage protocols are defined for bytes, and each unit thus takes two 8-bit bytes, the order of the bytes may depend on the [[endianness]] (byte order) of the computer architecture. To assist in recognizing the byte order of code units, '''UTF-16''' allows a [[byte order mark]] (BOM), a code point with the value U+FEFF, to precede the first actual coded value.{{efn|UTF-8 encoding produces byte values strictly less than 0xFE, so either byte in the BOM sequence also identifies the encoding as UTF-16 (assuming that UTF-32 is not expected).}} (U+FEFF is the invisible [[zero-width non-breaking space]]/ZWNBSP character).{{efn|Use of U+FEFF as the character ZWNBSP instead of as a BOM has been deprecated in favor of U+2060 (WORD JOINER); see [https://www.unicode.org/faq/utf_bom.html#BOM Byte Order Mark (BOM) FAQ] at Unicode.org. But if an application interprets an initial BOM as a character, the ZWNBSP character is invisible, so the impact is minimal.}} If the endian architecture of the decoder matches that of the encoder, the decoder detects the 0xFEFF value, but an opposite-endian decoder interprets the BOM as the [[{{Proper name|noncharacter}}]] value U+FFFE reserved for this purpose. This incorrect result provides a hint to perform byte-swapping for the remaining values. If the BOM is missing, RFC 2781 recommends{{efn|{{IETF RFC|2781}} section 4.3 says that if there is no BOM, "the text SHOULD be interpreted as being big-endian." According to section 1.2, the meaning of the term "SHOULD" is governed by {{IETF RFC|2119}}. In that document, section 3 says "... there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course".}} that big-endian (BE) encoding be assumed. In practice, due to Windows using little-endian (LE) order by default, many applications assume little-endian encoding. It is also reliable to detect endianness by looking for null bytes, on the assumption that characters less than U+0100 are very common. If more even bytes (starting at 0) are null, then it is big-endian. The standard also allows the byte order to be stated explicitly by specifying '''UTF-16BE''' or '''UTF-16LE''' as the encoding type. When the byte order is specified explicitly this way, a BOM is specifically ''not'' supposed to be prepended to the text, and a U+FEFF at the beginning should be handled as a ZWNBSP character. Most applications ignore a BOM in all cases despite this rule. For [[Internet]] protocols, [[Internet Assigned Numbers Authority|IANA]] has approved "UTF-16", "UTF-16BE", and "UTF-16LE" as the names for these encodings (the names are case insensitive). The aliases '''UTF_16''' or '''UTF16''' may be meaningful in some programming languages or software applications, but they are not standard names in Internet protocols. Similar designations, '''UCS-2BE''' and '''UCS-2LE''', are used to show versions of '''UCS-2'''.
Summary:
Please note that all contributions to Niidae Wiki may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see
Encyclopedia:Copyrights
for details).
Do not submit copyrighted work without permission!
Cancel
Editing help
(opens in new window)
Search
Search
Editing
UTF-16
(section)
Add topic