UTF-8
== Standards ==
The official name for the encoding is {{code|UTF-8}}, the spelling used in all Unicode Consortium documents. The [[hyphen-minus]] is required and no spaces are allowed. Some other names used are:
* Most standards are also case-insensitive and <code>utf-8</code> is often used.{{citation needed|date=March 2023}}
* Web standards (which include [[Cascading Style Sheets|CSS]], [[HTML]], [[XML]], and [[HTTP headers]]) also allow {{code|utf8}} and many other aliases<!-- e.g. "unicode20utf8" for UTF-8, likely not useful to list any or all, just stating "many"-->.<ref>{{cite web|url=https://encoding.spec.whatwg.org/#names-and-labels|title=Encoding Standard § 4.2. Names and labels|publisher=[[WHATWG]]|access-date=2018-04-29}}</ref>
* The official [[Internet Assigned Numbers Authority]] lists {{code|csUTF8}} as the only alias,<ref name="IANA_2013_CS">{{cite web |publisher=[[Internet Assigned Numbers Authority]] |url=https://www.iana.org/assignments/character-sets |title=Character Sets |date=2013-01-23 |access-date=2013-02-08}}</ref> which is rarely used.
* In some locales {{code|UTF-8N}} means UTF-8 ''without'' a [[byte order mark|byte-order mark]] (BOM), and in this case {{code|UTF-8}} ''may'' imply there ''is'' a BOM.<ref>{{cite web |url=https://suika.fam.cx/~wakaba/wiki/sw/n/BOM |title=BOM | work = suikawiki |archive-url=https://web.archive.org/web/20090117052232/https://suika.fam.cx/~wakaba/wiki/sw/n/BOM |archive-date=2009-01-17 |language=ja}}</ref><ref>{{cite web |author-last=Davis |author-first=Mark |author-link=Mark Davis (Unicode) |title=Forms of Unicode |publisher=[[IBM]] |url=https://www-128.ibm.com/developerworks/library/utfencodingforms/index.html |access-date=2013-09-18 |archive-url=https://web.archive.org/web/20050506211548/https://www-128.ibm.com/developerworks/library/utfencodingforms/index.html |archive-date=2005-05-06}}</ref>
* In [[Windows]], UTF-8 is [[Windows code page|codepage]] <code>65001</code><ref>{{Cite web |url=https://www.dostips.com/forum/viewtopic.php?t=5357 |title=UTF-8 codepage 65001 in Windows 7 - part I |author=Liviu |quote=Previously under XP (and, unverified, but probably Vista, too) for loops simply did not work while codepage 65001 was active |language=en-gb |date=2014-02-07 |access-date=2018-01-30}}</ref> with the symbolic name <code>CP_UTF8</code> in source code.
* In [[MySQL]], UTF-8 is called <code>utf8mb4</code>,<ref>{{Cite web |title=MySQL :: MySQL 8.0 Reference Manual :: 10.9.1 The utf8mb4 Character Set (4-Byte UTF-8 Unicode Encoding) |url=https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb4.html |work=MySQL 8.0 Reference Manual |publisher=[[Oracle Corporation]] |access-date=2023-03-14}}</ref> while {{code|utf8}} and {{code|utf8mb3}} refer to the obsolete [[CESU-8]] variant.<ref name="mysql3-utf8mb3">{{Cite web |title=MySQL :: MySQL 8.0 Reference Manual :: 10.9.2 The utf8mb3 Character Set (3-Byte UTF-8 Unicode Encoding) |url=https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb3.html |work=MySQL 8.0 Reference Manual |publisher=[[Oracle Corporation]] |access-date=2023-02-24}}</ref>
* In [[Oracle Database]] (since version 9.0), <code>AL32UTF8</code><ref>{{Cite web |title=Database Globalization Support Guide |url=https://docs.oracle.com/cd/E11882_01/server.112/e10729/ch6unicode.htm |access-date=2023-03-16 |website=docs.oracle.com |language=en}}</ref> means UTF-8, while {{code|UTF-8}} means CESU-8.
* In HP [[Printer Command Language|PCL]], the Symbol-ID for UTF-8 is <code>18N</code>.<ref>{{Cite web|url=https://pclhelp.com/pcl-symbol-sets/ |archive-url=https://web.archive.org/web/20150219212843/http://pclhelp.com/pcl-symbol-sets/|url-status=dead|archive-date=2015-02-19|title=HP PCL Symbol Sets {{!}} Printer Control Language (PCL & PXL) Support Blog|date=2015-02-19|access-date=2018-01-30}}</ref>

There are several current definitions of UTF-8 in various standards documents:
* {{IETF RFC|3629|link=no}} / STD 63 (2003), which establishes UTF-8 as a standard internet protocol element
* {{IETF RFC|5198|link=no}} defines UTF-8 [[Unicode equivalence|NFC]] for Network Interchange (2008)
* ISO/IEC 10646:2020/Amd 1:2023<!-- §9.1 (2023? or 2020)--><ref>[https://www.iso.org/standard/83362.html ISO/IEC 10646].</ref>
* ''The Unicode Standard, Version 16.0.0'' (2024)<ref>''[https://www.unicode.org/versions/Unicode16.0.0/ The Unicode Standard, Version 16.0]'' [https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf#G31703 §3.9 D92, §3.10 D95], 2024.</ref>

They supersede the definitions given in the following obsolete works:
* ''The Unicode Standard, Version 2.0'', Appendix A (1996)
* ISO/IEC 10646-1:1993 Amendment 2 / Annex R (1996)
* {{IETF RFC|2044|link=no}} (1996)
* {{IETF RFC|2279|link=no}} (1998)
* ''The Unicode Standard, Version 3.0'', §2.3 (2000) plus Corrigendum #1: UTF-8 Shortest Form (2000)
* ''Unicode Standard Annex #27: Unicode 3.1'' (2001)<ref>[https://www.unicode.org/reports/tr27/tr27-3.html ''Unicode Standard Annex #27: Unicode 3.1''], 2001.</ref>
* <!-- Is there a reason to single out 5.0 and 6.0, but not e.g. 15? Skip all after 3.0, since only then encoding of UTF-8 changed? -->''The Unicode Standard, Version 5.0'' (2006)<ref>[https://www.unicode.org/versions/Unicode5.0.0/ ''The Unicode Standard, Version 5.0''] [https://www.unicode.org/versions/Unicode5.0.0/ch03.pdf §3.9–§3.10 ch. 3], 2006.</ref>
* ''The Unicode Standard, Version 6.0'' (2010)<ref>[https://www.unicode.org/versions/Unicode6.0.0/ ''The Unicode Standard, Version 6.0''] [https://www.unicode.org/versions/Unicode6.0.0/ch03.pdf §3.9 D92, §3.10 D95], 2010.</ref>

They all share the same general mechanics; the main differences concern the allowed range of code point values and the safe handling of invalid input.
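
As a rough illustration of the naming and validity points in this section, the following Python sketch checks a few of the labels mentioned above against Python's own codec registry and then feeds some byte sequences to Python's strict UTF-8 decoder. It is not taken from any of the cited standards: the helper <code>is_valid_utf8</code> and the sample labels are hypothetical names chosen for this example, the set of accepted aliases is specific to Python, and the decoder is assumed to follow the current definitions (shortest form only, no surrogates, nothing above U+10FFFF).

```python
import codecs

# Labels from this section that Python's codec registry also accepts.
# The alias set is a property of Python, not of the standards themselves.
canonical = {label: codecs.lookup(label).name for label in ("UTF-8", "utf-8", "utf8")}

# "utf-8-sig" is Python's name for UTF-8 written with a leading byte-order mark.
with_bom = "A".encode("utf-8-sig")
without_bom = "A".encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is accepted by Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample inputs, one per class of difference between definitions.
samples = {
    "euro sign U+20AC": b"\xe2\x82\xac",
    "overlong '/' (non-shortest form)": b"\xc0\xaf",
    "surrogate U+D800": b"\xed\xa0\x80",
    "beyond U+10FFFF": b"\xf4\x90\x80\x80",
    "5-byte sequence (permitted by RFC 2279)": b"\xf8\x88\x80\x80\x80",
}

results = {name: is_valid_utf8(raw) for name, raw in samples.items()}

for label, name in canonical.items():
    print(f"{label!r} resolves to {name!r}")
print("with BOM:", with_bom, "without BOM:", without_bom)
for name, ok in results.items():
    print(f"{name}: {'accepted' if ok else 'rejected'}")
```

Only the first sample is well-formed under the current definitions; the other four show the kinds of input on which older or laxer definitions and decoders may disagree.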