Jump to content
Main menu
Main menu
move to sidebar
hide
Navigation
Main page
Recent changes
Random page
Help about MediaWiki
Special pages
Niidae Wiki
Search
Search
Appearance
Create account
Log in
Personal tools
Create account
Log in
Pages for logged out editors
learn more
Contributions
Talk
Editing
IA-64
(section)
Page
Discussion
English
Read
Edit
View history
Tools
Tools
move to sidebar
hide
Actions
Read
Edit
View history
General
What links here
Related changes
Page information
Appearance
move to sidebar
hide
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
=== Instruction execution === The fetch mechanism can read up to two bundles per clock from the L1 [[CPU cache|cache]] into the pipeline. When the compiler can take maximum advantage of this, the processor can execute six instructions per clock cycle. The processor has thirty functional execution units in eleven groups. Each unit can execute a particular subset of the [[instruction set]], and each unit executes at a rate of one instruction per cycle unless execution stalls waiting for data. While not all units in a group execute identical subsets of the instruction set, common instructions can be executed in multiple units. The execution unit groups include: * Six general-purpose ALUs, two integer units, one shift unit * Four data cache units * Six multimedia units, two parallel shift units, one parallel multiply, one [[Hamming weight|population count]] * Two 82-bit floating-point [[multiply–accumulate]] units, two [[SIMD]] floating-point multiply–accumulate units (two 32-bit operations each)<ref>Sharangpani, Harsh; Arora, Ken (2000). "Itanium Processor Microarchitecture". ''[[IEEE Micro]]''. pp. 38–39.</ref> * Three branch units Ideally, the compiler can often group instructions into sets of six that can execute at the same time. Since the floating-point units implement a [[multiply–accumulate operation]], a single floating-point instruction can perform the work of two instructions when the application requires a multiply followed by an add: this is very common in scientific processing. When it occurs, the processor can execute four [[FLOP]]s per cycle. For example, the 800 MHz Itanium had a theoretical rating of 3.2 [[GFLOPS]] and the fastest Itanium 2, at 1.67 GHz, was rated at 6.67 GFLOPS. In practice, the processor may often be underutilized, with not all slots filled with useful instructions due to e.g. data dependencies or limitations in the available bundle templates. The densest possible code requires 42.6 bits per instruction, compared to 32 bits per instruction on traditional RISC processors of the time, and no-ops due to wasted slots further decrease the density of code. Additional instructions for speculative loads and hints for branches and cache are impractical to generate optimally, because a compiler cannot predict the contents of the different cache levels on a system running multiple processes and taking interrupts.
Summary:
Please note that all contributions to Niidae Wiki may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see
Encyclopedia:Copyrights
for details).
Do not submit copyrighted work without permission!
Cancel
Editing help
(opens in new window)
Search
Search
Editing
IA-64
(section)
Add topic