==Extensions==

===Floating-point unit===
{{Main|x87}}
{{Further|Floating-point unit}}
[[File:Intel chips 386 387.jpg|thumb|An Intel 386 with the 387 co-processor]]
Early x86 processors could be extended with [[floating-point]] hardware in the form of a series of floating-point [[numerical analysis|numerical]] [[co-processor]]s with names like [[Intel 8087|8087]], 80287 and 80387, abbreviated x87. This was also known as the NPX (''Numeric Processor eXtension''), an apt name since the coprocessors, while used mainly for floating-point calculations, also performed integer operations on both binary and decimal formats. With very few exceptions, the 80486 and subsequent x86 processors then integrated this x87 functionality on chip, which made the x87 instructions a [[de facto]] integral part of the x86 instruction set.

Each x87 register, known as ST(0) through ST(7), is 80&nbsp;bits wide and stores numbers in the [[IEEE floating-point standard]] double extended precision format. These registers are organized as a stack with ST(0) as the top. This was done in order to conserve opcode space, and the registers are therefore randomly accessible for only one of the two operands in a register-to-register instruction; ST(0) must always be one of the two operands, either the source or the destination, regardless of whether the other operand is ST(x) or a memory operand. However, random access to the stack registers can be obtained through an instruction which exchanges any specified ST(x) with ST(0).
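
The stack-relative addressing described above can be illustrated with a small model. This is a minimal sketch rather than a description of the hardware: the class and method names are hypothetical, values are ordinary Python floats instead of 80-bit extended numbers, and tag bits and stack-overflow checks are left out.

```python
class X87Stack:
    """Toy model of eight data registers addressed relative to a top pointer."""

    def __init__(self):
        self.phys = [None] * 8  # eight physical registers; None marks an empty slot
        self.top = 0            # physical index that currently plays the role of ST(0)

    def _index(self, i):
        # ST(i) is the physical register i places past the current top, modulo 8.
        return (self.top + i) % 8

    def st(self, i):
        return self.phys[self._index(i)]

    def load(self, value):
        # A load moves the top pointer down by one, then writes the new ST(0).
        self.top = (self.top - 1) % 8
        self.phys[self.top] = value

    def exchange(self, i):
        # Swap ST(0) with ST(i); this is how any register can be brought to the top.
        a, b = self._index(0), self._index(i)
        self.phys[a], self.phys[b] = self.phys[b], self.phys[a]

    def add_into_top(self, i):
        # Register-to-register add with ST(0) as the destination operand.
        self.phys[self._index(0)] = self.st(0) + self.st(i)


stack = X87Stack()
for value in (1.0, 2.0, 3.0):
    stack.load(value)       # ST(0)=3.0, ST(1)=2.0, ST(2)=1.0
stack.exchange(2)           # ST(0)=1.0, ST(2)=3.0
stack.add_into_top(1)       # ST(0)=1.0+2.0
```

In this model every arithmetic step names ST(0) as one operand, so reaching a value deeper in the stack takes an exchange first.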

The operations include arithmetic and transcendental functions, including trigonometric and exponential functions, and instructions that load common constants (such as 0; 1; π; log2(10); log2(e); log10(2); and ln(2)) into one of the stack registers. While the integer ability is often overlooked, the x87 can operate on larger integers with a single instruction than the 8086, 80286, 80386, or any x86 CPU without 64-bit extensions can, and repeated integer calculations even on small values (e.g., 16-bit) can be accelerated by executing integer instructions on the x86 CPU and the x87 in parallel. (The x86 CPU keeps running while the x87 coprocessor calculates, and the x87 sets a signal to the x86 when it is finished or interrupts the x86 if it needs attention because of an error.)
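
The integer capability follows from the width of the significand: the 80-bit extended format carries a 64-bit significand, whereas double precision carries 53 bits. The check below is a rough sketch of that arithmetic only; the function name is hypothetical and exponent range is ignored.

```python
def fits_exactly(n: int, significand_bits: int) -> bool:
    """Return True if the integer n is exactly representable with the given
    significand width (exponent range is ignored in this sketch)."""
    n = abs(n)
    if n == 0:
        return True
    # Trailing zero bits are absorbed by the exponent, so strip them first.
    while n % 2 == 0:
        n //= 2
    return n.bit_length() <= significand_bits


X87_EXTENDED_BITS = 64  # significand width of the 80-bit extended format
DOUBLE_BITS = 53        # significand width of double precision, implicit bit included

largest_int64 = 2**63 - 1
print(fits_exactly(largest_int64, X87_EXTENDED_BITS))  # extended precision holds it exactly
print(fits_exactly(largest_int64, DOUBLE_BITS))        # double precision does not
```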

===MMX===
{{Main|MMX (instruction set)}}

MMX is a [[Single instruction, multiple data|SIMD]] instruction set designed by Intel and introduced in 1997 for the [[Pentium MMX]] microprocessor.<ref name="intel">{{cite web |title=Programming With the Intel MMX™ Technology |url=http://www.intel.com/design/intarch/techinfo/pentium/mmxprog.htm |website=Embedded Pentium® Processor Family Technical Information Center |publisher=Intel |access-date=5 June 2022 |archive-url=https://web.archive.org/web/20030725092803/http://www.intel.com/design/intarch/techinfo/pentium/mmxprog.htm |archive-date=25 July 2003 |url-status=dead}}</ref> The MMX instruction set was developed from a similar concept first used on the [[Intel i860]]. It is supported on most subsequent IA-32 processors by Intel and other vendors. MMX is typically used for video processing (in multimedia applications, for instance).<ref>{{cite journal |last1=Krishnaprasad |first1=S. |title=SIMD programming illustrated using Intel's MMX instruction set |journal=Journal of Computing Sciences in Colleges |date=1 January 2004 |volume=19 |issue=3 |pages=268–277 |url=https://dl.acm.org/doi/10.5555/948835.948862 |issn=1937-4771}}</ref>

MMX added 8 new registers to the architecture, known as MM0 through MM7 (henceforth referred to as ''MMn''). In reality, these new registers were just aliases for the existing x87 FPU stack registers. Hence, anything that was done to the floating-point stack would also affect the MMX registers. Unlike the FP stack, these MMn registers were fixed, not relative, and therefore they were randomly accessible. Because the MMX state lives in the x87 registers, existing operating systems could still correctly save and restore the register state when multitasking, without modification.<ref name="intel" />

Each of the MMn registers is 64&nbsp;bits wide and holds integer data. However, one of the main concepts of the MMX instruction set is the concept of ''packed data types'', which means that instead of using the whole register for a single 64-bit integer ([[quadword]]), one may use it to contain two 32-bit integers ([[Integer (computer science)|doubleword]]), four 16-bit integers ([[Integer (computer science)|word]]) or eight 8-bit integers ([[Integer (computer science)|byte]]). Given that the MMX's 64-bit MMn registers are aliased to the FPU stack and each of the floating-point registers is 80&nbsp;bits wide, the upper 16&nbsp;bits of the floating-point registers are unused in MMX. These bits are set to all ones by any MMX instruction, which corresponds to the floating-point representation of [[NaN]]s or infinities.<ref name="intel" />
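
Packed data types can be sketched in a few lines. The example below is illustrative only and uses hypothetical helper names: it treats a 64-bit Python integer as an MMn register, performs a lane-wise wraparound addition (the behaviour of a packed add such as PADDW for 16-bit lanes), and builds the 80-bit register image with the upper 16&nbsp;bits set to ones.

```python
MASK64 = (1 << 64) - 1


def pack_lanes(values, lane_bits):
    """Pack unsigned lane values into one 64-bit integer, lane 0 in the lowest bits."""
    if len(values) != 64 // lane_bits:
        raise ValueError("wrong number of lanes for this lane width")
    lane_mask = (1 << lane_bits) - 1
    reg = 0
    for i, value in enumerate(values):
        reg |= (value & lane_mask) << (i * lane_bits)
    return reg


def unpack_lanes(reg, lane_bits):
    """Split a 64-bit integer back into its unsigned lanes."""
    lane_mask = (1 << lane_bits) - 1
    return [(reg >> (i * lane_bits)) & lane_mask for i in range(64 // lane_bits)]


def packed_add_wrap(a, b, lane_bits):
    """Lane-wise addition with wraparound; a carry never crosses a lane boundary."""
    sums = [x + y for x, y in zip(unpack_lanes(a, lane_bits), unpack_lanes(b, lane_bits))]
    return pack_lanes(sums, lane_bits)  # pack_lanes masks each sum to the lane width


def as_x87_image(mm_value):
    """80-bit image after an MMX write: data in the low 64 bits, upper 16 bits all ones."""
    return (0xFFFF << 64) | (mm_value & MASK64)


a = pack_lanes([1, 2, 3, 0xFFFF], 16)
b = pack_lanes([10, 20, 30, 1], 16)
print(unpack_lanes(packed_add_wrap(a, b, 16), 16))  # the last lane wraps around to 0
```

A plain 64-bit addition of the same two values would let the carry out of the top lane be lost in a different way and, for lower lanes, spill into the neighbouring lane; keeping carries inside each lane is what distinguishes a packed operation.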

===3DNow!===
{{Main|3DNow!}}
In 1997, AMD introduced 3DNow!.<ref>{{cite news |last1=Sexton |first1=Michael Justin Allen |title=The History Of AMD CPUs |url=https://www.tomshardware.com/picturestory/713-amd-cpu-history.html |access-date=5 June 2022 |work=Tom's Hardware |date=21 April 2017 |language=en}}</ref> The introduction of this technology coincided with the rise of [[3D computer graphics|3D]] entertainment applications and was designed to improve the CPU's [[vector processing]] performance in graphics-intensive applications. 3D video game developers and 3D graphics hardware vendors used 3DNow! to enhance the performance of their products on AMD's [[AMD K6|K6]] and [[Athlon]] series of processors.<ref>{{cite news |last1=Shimpi |first1=Anand Lal |title=AMD's K6-2 350: Something to do... |url=https://www.anandtech.com/show/161/2 |access-date=5 June 2022 |work=AnandTech |date=29 October 1998}}</ref>

3DNow! was designed to be the natural evolution of MMX from integers to floating point. As such, it uses exactly the same register naming convention as MMX, that is MM0 through MM7.<ref>{{cite web |title=Intel's MMX and AMD's 3DNow! SIMD Operations |url=https://web.mit.edu/rhel-doc/3/rhel-as-en-3/i386-simd.html |website=web.mit.edu |access-date=5 June 2022}}</ref> The only difference is that instead of packing integers into these registers, two [[single-precision floating-point format|single-precision floating-point]] numbers are packed into each register. The advantage of aliasing the FPU registers is that the same instruction and data structures used to save the state of the FPU registers can also be used to save 3DNow! register states. Thus operating systems need no special modification to support 3DNow!, even if they are otherwise unaware of it.<ref>{{cite web |title=3DNow!™ Technology Manual |url=https://www.amd.com/system/files/TechDocs/21928.pdf |publisher=Advanced Micro Devices |access-date=5 June 2022}}</ref>
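
The packing of two single-precision numbers into one 64-bit register can be sketched as follows. The helper names are hypothetical, and the addition is performed by Python and then rounded to single precision, so the sketch shows the data layout and lane-wise behaviour (in the spirit of a packed add such as PFADD) rather than the exact rounding of any particular processor.

```python
import struct


def pack_two_floats(lo: float, hi: float) -> int:
    """Pack two single-precision floats into one 64-bit value (lo in bits 0-31)."""
    return int.from_bytes(struct.pack("<ff", lo, hi), "little")


def unpack_two_floats(reg: int):
    """Recover the two single-precision floats from a 64-bit value."""
    return struct.unpack("<ff", reg.to_bytes(8, "little"))


def packed_float_add(a: int, b: int) -> int:
    """Add the low lanes together and the high lanes together."""
    a_lo, a_hi = unpack_two_floats(a)
    b_lo, b_hi = unpack_two_floats(b)
    return pack_two_floats(a_lo + b_lo, a_hi + b_hi)


x = pack_two_floats(1.5, 0.5)
y = pack_two_floats(2.25, -4.0)
print(unpack_two_floats(packed_float_add(x, y)))
```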

==={{vanchor|SSE}} and AVX===
{{Main|Streaming SIMD Extensions|SSE2|SSE3|SSSE3|SSE4|SSE5}}

In 1999, Intel introduced the Streaming SIMD Extensions (SSE) [[instruction set]], following in 2000 with SSE2. The first addition allowed offloading of basic floating-point operations from the x87 stack and the second made MMX almost obsolete and allowed the instructions to be realistically targeted by conventional compilers. Introduced in 2004 along with the [[Intel Prescott|''Prescott'']] revision of the [[Pentium 4]] processor, SSE3 added specific memory and [[Thread (computing)|thread]]-handling instructions to boost the performance of Intel's [[HyperThreading]] technology. AMD licensed the SSE3 instruction set and implemented most of the SSE3 instructions for its revision E and later Athlon 64 processors. The Athlon 64 does not support HyperThreading and lacks those SSE3 instructions used only for HyperThreading.<ref name="tomshardware">{{cite news |title=Upgrading And Repairing PCs 21st Edition: Processor Features |url=https://www.tomshardware.com/reviews/processors-cpu-apu-features-upgrade,3569-3.html |access-date=5 June 2022 |work=Tom's Hardware |date=31 October 2013 |language=en}}</ref>

SSE discarded all legacy connections to the FPU stack. This also meant that this instruction set discarded all legacy connections to previous generations of SIMD instruction sets like MMX. But it freed the designers up, allowing them to use larger registers, not limited by the size of the FPU registers. The designers created eight 128-bit registers, named XMM0 through XMM7. (In [[x86-64|AMD64]], the number of SSE XMM registers has been increased from 8 to 16.) However, the downside was that operating systems had to be aware of this new set of registers in order to save and restore their state. So Intel added a control flag, CR4.OSFXSR, which an operating system sets to declare that it saves and restores the SSE state (using the FXSAVE and FXRSTOR instructions); SSE instructions stay disabled until the flag is set. An OS that is aware of SSE sets the flag, whereas an unaware OS leaves it clear, in which case SSE instructions raise an invalid-opcode exception.

SSE is a SIMD instruction set that works only on floating-point values, like 3DNow!. However, unlike 3DNow! it severs all legacy connection to the FPU stack. Because it has larger registers than 3DNow!, SSE can pack twice the number of [[single precision]] floats into its registers. The original SSE was limited to only single-precision numbers, like 3DNow!. SSE2 introduced the capability to pack [[double precision]] numbers too, which 3DNow! had no possibility of doing, since a double-precision number is 64&nbsp;bits in size, which would be the full size of a single 3DNow! MMn register. At 128&nbsp;bits, the SSE XMMn registers could pack two double-precision floats into one register. Thus SSE2 is much more suitable for scientific calculations than either SSE1 or 3DNow!, which were limited to only single precision. SSE3 does not introduce any additional registers.<ref name="tomshardware" />

{{main|Advanced Vector Extensions|AVX-512}}
The Advanced Vector Extensions (AVX) doubled the width of the SSE registers, creating the 256-bit YMM registers. It also introduced the VEX coding scheme to accommodate the larger registers, plus a few instructions to permute elements. AVX2 did not introduce extra registers, but was notable for the addition of masking, [[Gather-scatter (vector addressing)|gather]], and shuffle instructions.

AVX-512 features yet another expansion, to 32 512-bit ZMM registers, and a new EVEX scheme. Unlike its predecessors, each of which was a monolithic extension, it is divided into many subsets that specific models of CPUs can choose to implement.
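
The number of floating-point values that fit in each register family follows directly from the register widths given above. The short calculation below is only a worked illustration; the table and function names are hypothetical.

```python
REGISTER_BITS = {"XMM": 128, "YMM": 256, "ZMM": 512}  # SSE, AVX, AVX-512
ELEMENT_BITS = {"single": 32, "double": 64}


def lanes(register: str, element: str) -> int:
    """Number of elements of the given precision that fit in one register."""
    return REGISTER_BITS[register] // ELEMENT_BITS[element]


for register in REGISTER_BITS:
    print(register, {element: lanes(register, element) for element in ELEMENT_BITS})
```

For comparison, a 64-bit MMn register as used by 3DNow! holds two single-precision values, half of what an XMM register holds.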

===Physical Address Extension (PAE)===
{{Main|Physical Address Extension}}
[[Physical Address Extension]] or PAE was first added in the Intel [[Pentium Pro]], and later by [[AMD]] in the Athlon processors,<ref name="Athlon PAE">{{cite book|chapter-url=http://pdf.datasheetcatalog.com/datasheet/AdvancedMicroDevices/mXvyvs.pdf|access-date=2017-04-13|author=AMD, Inc.|title=AMD Athlon™ Processor x86 Code Optimization Guide|chapter=Appendix E|page=250|date=February 2002|edition=Revision K|quote=A 2-bit index consisting of PCD and PWT bits of the page table entry is used to select one of four PAT register fields when PAE (page address extensions) is enabled, or when the PDE doesn’t describe a large page.|archive-date=April 13, 2017|archive-url=https://web.archive.org/web/20170413235648/http://pdf.datasheetcatalog.com/datasheet/AdvancedMicroDevices/mXvyvs.pdf|url-status=live}}</ref> to allow up to 64&nbsp;GB of RAM to be addressed. Without PAE, physical RAM in 32-bit protected mode is usually limited to 4&nbsp;[[gigabyte|GB]]. PAE defines a different page table structure with wider page table entries and a third level of page table, allowing additional bits of physical address. Although the initial implementations on 32-bit processors theoretically supported up to 64&nbsp;GB of RAM, chipset and other platform limitations often restricted what could actually be used. [[x86-64]] processors define page table structures that theoretically allow up to 52 bits of physical address, although again, chipset and other platform concerns (like the number of DIMM slots available, and the maximum RAM possible per DIMM) prevent such a large physical address space from being realized. On x86-64 processors PAE mode must be active before the switch to [[long mode]], and must remain active while long mode is active, so while in long mode there is no "non-PAE" mode. PAE mode does not affect the width of linear or virtual addresses.
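
The three-level structure can be made concrete by splitting a 32-bit linear address into its table indices. This is a minimal sketch assuming 4&nbsp;KiB pages and the 2/9/9/12-bit split used by 32-bit PAE paging; the type and function names are hypothetical. The 36-bit physical width is the figure implied by the 64&nbsp;GB limit.

```python
from typing import NamedTuple


class PaeIndices(NamedTuple):
    pdpt: int    # bits 31-30: one of 4 page-directory-pointer entries (the third level)
    pd: int      # bits 29-21: one of 512 page-directory entries
    pt: int      # bits 20-12: one of 512 page-table entries
    offset: int  # bits 11-0: byte offset within the 4 KiB page


def split_pae_linear_address(addr: int) -> PaeIndices:
    """Split a 32-bit linear address into PAE table indices (4 KiB pages assumed)."""
    if not 0 <= addr < 2**32:
        raise ValueError("linear addresses remain 32 bits wide under PAE")
    return PaeIndices(
        pdpt=(addr >> 30) & 0x3,
        pd=(addr >> 21) & 0x1FF,
        pt=(addr >> 12) & 0x1FF,
        offset=addr & 0xFFF,
    )


PHYSICAL_ADDRESS_BITS = 36                              # 2**36 bytes
max_ram_gib = 2**PHYSICAL_ADDRESS_BITS // 2**30         # 64

print(split_pae_linear_address(0xC0201234), max_ram_gib)
```

Each table holds 512 entries rather than 1024 because the entries are twice as wide as in non-PAE paging, which is why an extra level is needed to cover the same 32-bit linear address space.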

===x86-64===
{{More citations needed section|date=March 2016}}
{{Main|x86-64}}

[[File:Processor families in TOP500 supercomputers.svg|thumb|upright=1.25|In [[supercomputer]] [[computer cluster|cluster]]s (as tracked by [[TOP 500]] data and visualized on the diagram above, last updated 2013), the appearance of 64-bit extensions for the x86 architecture enabled 64-bit x86 processors by AMD and Intel (teal hatched and blue hatched, in the diagram, respectively) to replace most RISC processor architectures previously used in such systems (including [[PA-RISC]], [[SPARC]], [[DEC Alpha|Alpha]], and others), and 32-bit x86 (green on the diagram), even though Intel initially tried unsuccessfully to replace x86 with a new incompatible 64-bit architecture in the [[Itanium]] processor. The main non-x86 architecture which is still used, as of 2014, in supercomputing clusters is the [[Power ISA]] used by [[IBM Power microprocessors]] (blue with diamond tiling in the diagram), with SPARC as a distant second.]]

By the 2000s, 32-bit x86 processors' limits in memory addressing were an obstacle to their use in high-performance computing clusters and powerful desktop workstations. The aged 32-bit x86 was competing with much more advanced 64-bit RISC architectures which could address much more memory. Intel and the whole x86 ecosystem needed 64-bit memory addressing if x86 was to survive the 64-bit computing era, as workstation and desktop software applications were soon to start hitting the limits of 32-bit memory addressing. However, Intel felt that it was the right time to make a bold step and use the transition to 64-bit desktop computers for a transition away from the x86 architecture in general, an experiment which ultimately failed.

In 2001, Intel attempted to introduce a non-x86 64-bit architecture named [[IA-64]] in its [[Itanium]] processor, initially aiming for the [[high-performance computing]] market, hoping that it would eventually replace the 32-bit x86.<ref>{{cite web
 |url          = http://features.techworld.com/operating-systems/2690/will-intel-abandon-the-itanium/
 |title        = Will Intel abandon the Itanium?
 |date         = July 20, 2006
 |author       = Manek Dubash
 |quote        = Once touted by Intel as a replacement for the x86 product line, expectations for Itanium have been throttled well back.
 |publisher    = [[Techworld]]
 |access-date  = December 19, 2010
 |archive-date = February 19, 2011
 |archive-url  = https://web.archive.org/web/20110219212053/http://features.techworld.com/operating-systems/2690/will-intel-abandon-the-itanium/
 |url-status   = dead
}}</ref> While IA-64 was incompatible with x86, the Itanium processor did provide [[Emulator|emulation]] abilities for translating x86 instructions into IA-64, but this affected the performance of x86 programs so badly that it was rarely, if ever, actually useful to users: programmers had to rewrite x86 programs for the IA-64 architecture, or their performance on Itanium would be orders of magnitude worse than on a true x86 processor. The market rejected the Itanium processor since it broke [[backward compatibility]] and preferred to continue using x86 chips, and very few programs were rewritten for IA-64.

AMD decided to take another path toward 64-bit memory addressing, making sure backward compatibility would not suffer. In April 2003, AMD released the first x86 processor with 64-bit general-purpose registers, the [[Opteron]], capable of addressing much more than 4&nbsp;[[Gigabyte|GB]] of virtual memory using the new [[x86-64]] extension (also known as AMD64 or x64). The 64-bit extensions to the x86 architecture were enabled only in the newly introduced [[long mode]], therefore 32-bit and 16-bit applications and operating systems could simply continue using an AMD64 processor in protected or other modes, without even the slightest sacrifice of performance<ref name="x86-compat-perf">{{cite web
  |url=https://public.dhe.ibm.com/software/webserver/appserv/was/64bitPerf.pdf
  |title=IBM WebSphere Application Server 64-bit Performance Demystified
  |page=14
  |quote=Figures 5, 6 and 7 also show the 32-bit version of WAS runs applications at full native hardware performance on the POWER and x86-64 platforms. Unlike some 64-bit processor architectures, the POWER and x86-64 hardware does not emulate 32-bit mode. Therefore applications that do not benefit from 64-bit features can run with full performance on the 32-bit version of WebSphere running on the above mentioned 64-bit platforms.
  |publisher=IBM Corporation
  |date=September 6, 2007
  |access-date=April 9, 2010
  |archive-date=January 25, 2022
  |archive-url=https://web.archive.org/web/20220125121650/ftp://ftp.software.ibm.com/software/webserver/appserv/was/64bitPerf.pdf
  |url-status=live
  }}</ref> and with full compatibility back to the original instructions of the 16-bit Intel 8086.<ref name="amd-24593">{{cite web
  |url=https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
  |title=Volume 2: System Programming
  |date=March 2024
  |work=AMD64 Architecture Programmer's Manual
  |publisher=AMD Corporation
  |access-date=April 24, 2024
  |archive-date=April 4, 2024
  |archive-url=https://web.archive.org/web/20240404110900/https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
  |url-status=live
  }}</ref>{{rp|page=13–14|date=November 2012}} The market responded positively, adopting the 64-bit AMD processors for both high-performance applications and business or home computers.

Seeing the market rejecting the incompatible Itanium processor and Microsoft supporting AMD64, Intel had to respond and introduced its own x86-64 processor, the ''[[Pentium 4#Prescott|Prescott]]'' Pentium&nbsp;4, in July 2004.<ref>{{cite news
  |author= Charlie Demerjian
  |title=Why Intel's Prescott will use AMD64 extensions
  |url=http://www.theinquirer.net/inquirer/news/1029651/why-intels-prescott-will-use-amd64--extensions
  |archive-url=https://web.archive.org/web/20091010181925/http://www.theinquirer.net/inquirer/news/1029651/why-intels-prescott-will-use-amd64--extensions
  |url-status=dead
  |archive-date=October 10, 2009
  |work=[[The Inquirer]]
  |date=September 26, 2003
  |access-date=October 7, 2009
}}</ref> As a result, the Itanium processor with its IA-64 instruction set is rarely used and x86, through its x86-64 incarnation, is still the dominant CPU architecture in non-embedded computers.

x86-64 also introduced the [[NX bit]], which offers some protection against security bugs caused by [[buffer overrun]]s.

As a result of AMD's 64-bit contribution to the x86 lineage and its subsequent acceptance by Intel, the 64-bit RISC architectures ceased to be a threat to the x86 ecosystem and almost disappeared from the workstation market. x86-64 began to be utilized in powerful [[supercomputer]]s (in its [[AMD Opteron]] and [[Intel Xeon]] incarnations), a market which was previously the natural habitat for 64-bit RISC designs (such as the [[IBM Power microprocessors]] or [[SPARC]] processors). The great leap toward 64-bit computing and the maintenance of backward compatibility with 32-bit and 16-bit software enabled the x86 architecture to become an extremely flexible platform today, with x86 chips being utilized from small low-power systems (for example, [[Intel Quark]] and [[Intel Atom]]) to fast gaming desktop computers (for example, [[Intel Core i7]] and [[AMD FX]]/[[Ryzen]]), and even dominate large supercomputing [[computer cluster|cluster]]s, effectively leaving only the [[ARM architecture|ARM]] 32-bit and 64-bit RISC architecture as a competitor in the [[smartphone]] and [[tablet computer|tablet]] market.

===Virtualization===
{{Main|x86 virtualization}}
Prior to 2005, x86 architecture processors were unable to meet the [[Popek and Goldberg virtualization requirements|Popek and Goldberg requirements]] – a specification for virtualization created in 1974 by [[Gerald J. Popek]] and [[Robert P. Goldberg]]. However, both proprietary and open-source [[x86 virtualization]] hypervisor products were developed using [[Shadow page tables|software-based virtualization]]. Proprietary systems include [[Hyper-V]], [[Parallels Workstation]], [[VMware ESX]], [[VMware Workstation]], [[VMware Workstation Player]] and [[Windows Virtual PC]], while [[free and open-source]] systems include [[QEMU]], [[Kernel-based Virtual Machine]], [[VirtualBox]], and [[Xen]].

The introduction of the AMD-V and Intel VT-x instruction sets in 2005 allowed x86 processors to meet the Popek and Goldberg virtualization requirements.<ref>{{cite conference
  |last1=Adams
  |first1=Keith
  |last2=Agesen
  |first2=Ole
  |title=A Comparison of Software and Hardware Techniques for x86 Virtualization
  |conference=Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, USA, 2006
  |date=October 21–25, 2006
  |url=http://www.vmware.com/pdf/asplos235_adams.pdf
  |id=ACM 1-59593-451-0/06/0010
  |access-date=December 22, 2006
  |archive-date=August 20, 2010
  |archive-url=https://web.archive.org/web/20100820201944/http://www.vmware.com/pdf/asplos235_adams.pdf
  |url-status=live
  }}</ref>
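
Software can check whether a processor advertises these extensions. The sketch below rests on an assumption that is not part of the text above: that the input follows the layout of Linux's <code>/proc/cpuinfo</code>, where the <code>flags</code> line lists <code>vmx</code> for Intel VT-x and <code>svm</code> for AMD-V. The function name and the sample text are illustrative.

```python
def virtualization_flags(cpuinfo_text: str) -> set:
    """Return the subset of {'vmx', 'svm'} found on cpuinfo-style 'flags' lines."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            found |= {"vmx", "svm"} & set(value.split())
    return found


# Made-up sample in the assumed format; on a Linux system the text could be
# read from /proc/cpuinfo instead.
sample = "processor\t: 0\nflags\t\t: fpu vme sse sse2 vmx\n"
print(virtualization_flags(sample))
```

An empty result means only that the flag is not advertised; firmware settings can also hide the feature from the operating system.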

===AES===
{{Main|AES instruction set}}

=== APX (Advanced Performance Extensions) ===
APX (Advanced Performance Extensions) are extensions to double the number of general-purpose registers from 16 to 32 and add new features to improve general-purpose performance.<ref>{{Cite web |last1=Winkel |first1=Sebastian |last2=Agron |first2=Jason |title=Advanced Performance Extensions (APX) |url=https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html |access-date=2023-10-22 |website=[[Intel]] |language=en}}</ref><ref>{{cite web |last1=Robinson |first1=Dan |title=Intel adds fresh x86 and vector instructions for future chips |url=https://www.theregister.com/2023/07/26/intel_x86_vector_instructions/ |website=The Register |access-date=22 October 2023}}</ref><ref>{{cite web |last1=Bonshor |first1=Gavin |title=Intel Unveils AVX10 and APX Instruction Sets: Unifying AVX-512 For Hybrid Architectures |url=https://www.anandtech.com/show/18975/intel-unveils-avx10-and-apx-isas-unifying-avx512-for-hybrid-architectures- |website=AnandTech |access-date=22 October 2023}}</ref><ref>{{cite web |last1=Alcorn |first1=Paul |title=Intel's New AVX10 Brings AVX-512 Capabilities to E-Cores |url=https://www.tomshardware.com/news/intels-new-avx10-brings-avx-512-capabilities-to-e-cores |website=Tom's Hardware |date=July 24, 2023 |access-date=22 October 2023}}</ref> These extensions have been called "generational"<ref>{{cite web |last1=Shah |first1=Agam |title=Intel's Generational On-Chip Change APX Will Make All the Apps Faster |url=https://thenewstack.io/intels-generational-on-chip-change-apx-will-make-all-the-apps-faster/ |website=The New Stack |date=August 9, 2023 |access-date=22 October 2023}}</ref> and "the biggest x86 addition since 64 bits".<ref>{{cite web |last1=Byrne |first1=Joseph |title=APX is Biggest x86 Addition Since 64 Bits |url=https://www.techinsights.com/blog/apx-biggest-x86-addition-64-bits |website=Tech Insights}}</ref> Intel contributed APX support to [[GNU Compiler Collection]] (GCC) 14.<ref>{{cite web 
|last1=Larabel |first1=Michael |title=Intel APX Code Begins Landing Within The GCC Compiler |url=https://www.phoronix.com/news/GCC-Intel-APX-Starts-Landing |website=Phoronix |access-date=22 October 2023}}</ref>

According to the architecture specification,<ref>{{Cite web |date=2023-07-21 |title=Intel® Advanced Performance Extensions (Intel® APX) Architecture Specification |url=https://www.intel.com/content/www/us/en/content-details/784266/intel-advanced-performance-extensions-intel-apx-architecture-specification.html |access-date=2023-10-22 |website=Intel}}</ref> the main features of APX are:

* 16 additional general-purpose registers, called the Extended GPRs (EGPRs)
* Three-operand instruction formats for many integer instructions
* New conditional load, store, and compare instructions, together with an option for common instructions to leave the flags unmodified
* Optimized register save/restore operations
* A 64-bit absolute direct jump instruction

Extended GPRs for general-purpose instructions are encoded using a 2-byte [[REX prefix|REX2]] prefix, while new instructions and extended operands for existing [[Advanced Vector Extensions|AVX]]/[[AVX2]]/[[AVX-512]] instructions are encoded with an [[EVEX prefix#Extended EVEX prefix|extended EVEX]] prefix, which has four variants used for different groups of instructions.