Intel 8086
== Example code ==
The following 8086 [[assembly language|assembly]] source code is for a subroutine named <code>_strtolower</code> that copies a null-terminated [[ASCIIZ]] character string from one location to another, converting all uppercase alphabetic characters to lower case. The string is copied one byte (8-bit character) at a time.

<!--NOTE: The hex codes were assembled by hand, so there may be errors-->
{| style="font-size:70%"
| <!--NOTE: DO NOT REMOVE BLANK LINES, 0000 line sets block width--><pre>









0000
0000  55
0001  89 E5
0003  56
0004  57
0005  8B 76 06
0008  8B 7E 04
000B  FC
000C  AC
000D  3C 41
000F  7C 06
0011  3C 5A
0013  7F 02
0015  04 20
0017  AA
0018  08 C0
001A  75 F0
001C  5F
001D  5E
001E  5D
001F  C3
0020
</pre>
| <syntaxhighlight lang="nasm">
; _strtolower:
; Copy a null-terminated ASCII string, converting
; all alphabetic characters to lower case.
; ES=DS
; Entry stack parameters
;      [SP+4] = src, Address of source string
;      [SP+2] = dst, Address of target string
;      [SP+0] = Return address
;
_strtolower proc
            push    bp             ;Set up the call frame
            mov     bp,sp
            push    si
            push    di
            mov     si,[bp+6]      ;Set si = src (+2 due to push bp)
            mov     di,[bp+4]      ;Set di = dst
            cld                    ;string direction ascending
loop:       lodsb                  ;Load al from [si], inc si
            cmp     al,'A'         ;If al < 'A',
            jl      copy           ; skip conversion
            cmp     al,'Z'         ;If al > 'Z',
            jg      copy           ; skip conversion
            add     al,'a'-'A'     ;Convert al to lowercase
copy:       stosb                  ;Store al to es:[di], inc di
            or      al,al          ;If al <> 0,
            jne     loop           ; repeat the loop
done:       pop     di             ;restore di and si
            pop     si
            pop     bp             ;Restore the prev call frame
            ret                    ;Return to caller
            end     proc
</syntaxhighlight>
|}

The example code uses the BP (base pointer) register to establish a [[call frame]], an area on the stack that contains all of the parameters and local variables for the execution of the subroutine. This kind of [[calling convention]] supports [[reentrancy (computing)|reentrant]] and [[recursion (computer science)|recursive]] code and has been used by Algol-like languages since the late 1950s.
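
For readers who want to check the loop logic without an assembler, the copy loop of the listing can be modelled in a few lines of Python. This is only an illustrative sketch, not part of the original routine: the function name <code>strtolower</code> and the use of a <code>bytes</code> value in place of the src and dst pointers are assumptions made for the example, and unlike the assembly it simply stops at the end of its input if no terminator is present.

```python
def strtolower(src: bytes) -> bytes:
    """Illustrative model of the _strtolower copy loop (hypothetical helper).

    Copies bytes up to and including the first NUL terminator,
    converting 'A'..'Z' to lower case and leaving other bytes unchanged.
    """
    dst = bytearray()
    for al in src:                         # lodsb: load next byte into al
        if ord("A") <= al <= ord("Z"):     # cmp/jl and cmp/jg range check
            al += ord("a") - ord("A")      # add al,'a'-'A'
        dst.append(al)                     # stosb: store al to destination
        if al == 0:                        # or al,al / jne loop
            break                          # terminator has been copied
    return bytes(dst)


if __name__ == "__main__":
    print(strtolower(b"Hello, WORLD!\0"))
```

Bytes with the high bit set compare as negative in the signed <code>jl</code> test, so the assembly copies them unchanged; the model behaves the same way because such values fall outside the 'A'..'Z' range.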
A flat memory model is assumed; specifically, the DS and ES segments address the same region of memory.

===Performance===
[[File:Intel 8086 block scheme.svg|thumb|405px|''Simplified block diagram of the Intel 8088 (a variant of the 8086); 1=main & index registers; 2=segment registers and IP; 3=address adder; 4=internal address bus; 5=instruction queue; 6=control unit (very simplified!); 7=bus interface; 8=internal databus; 9=ALU; 10/11/12=external address/data/control bus.'']]
Although partly shadowed by other design choices in this particular chip, the [[multiplexed]] address and [[Bus (computing)|data buses]] limit performance slightly; transfers of 16-bit or 8-bit quantities are done in a four-clock memory access cycle, which is faster for 16-bit quantities but slower for 8-bit quantities than on many contemporary 8-bit CPUs. As instructions vary from one to six bytes, fetch and execution are made [[Concurrency (computer science)|concurrent]] and decoupled into separate units (as they remain in today's x86 processors): the ''bus interface unit'' feeds the instruction stream to the ''execution unit'' through a 6-byte prefetch queue (a form of loosely coupled [[Pipeline (computing)|pipelining]]), speeding up operations on [[Processor register|register]]s and [[Operand|immediate]]s, while memory operations became slower (four years later, this performance problem was fixed with the [[80186]] and [[80286]]). However, the full (instead of partial) 16-bit architecture with a full-width [[Arithmetic logic unit|ALU]] meant that 16-bit arithmetic instructions could now be performed in a single ALU cycle (instead of two, via internal carry, as in the 8080 and 8085), speeding up such instructions considerably.
Combined with [[orthogonalization]]s of operations versus [[operand]] types and [[addressing mode]]s, as well as other enhancements, this made the performance gain over the 8080 or 8085 fairly significant, despite cases where the older chips may be faster (see below).

{| class="wikitable" style="text-align: center; width: 100px; height: 50px;"
|+ Execution times for typical instructions (in clock cycles)<ref>{{cite book|title=Microsoft Macro Assembler 5.0 Reference Manual|year=1987|publisher=Microsoft Corporation| quote=Timings and encodings in this manual are used with permission of Intel and come from the following publications: Intel Corporation. iAPX 86, 88, 186 and 188 User's Manual, Programmer's Reference, Santa Clara, Calif. 1986.|title-link=MASM}} (Similarly for iAPX 286, 80386, 80387.)</ref>
|- style="vertical-align:bottom; border-bottom:3px double #999;"
!align=left | instruction
!align=left | register-register
!align=left | register-immediate
!align=left | register-memory
!align=left | memory-register
!align=left | memory-immediate
|- style="vertical-align:top; border-bottom:1px solid #999;"
|mov || 2 || 4 || 8+EA || 9+EA || 10+EA
|- style="vertical-align:top; border-bottom:1px solid #999;"
|ALU || 3 || 4 || 9+EA || 16+EA || 17+EA
|- style="vertical-align:top; border-bottom:1px solid #999;"
|jump || colspan="5" | ''register'' ≥ 11 ; ''label'' ≥ 15 ; ''condition,label'' ≥ 16
|- style="vertical-align:top; border-bottom:1px solid #999;"
|integer multiply || colspan="5" | 70~160 (depending on operand ''data'' as well as size) ''including'' any EA
|- style="vertical-align:top; border-bottom:1px solid #999;"
|integer divide || colspan="5" | 80~190 (depending on operand ''data'' as well as size) ''including'' any EA
|}
* EA = time to compute effective address, ranging from 5 to 12 cycles.
* Timings are best case, depending on prefetch status, instruction alignment, and other factors.
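
To make the "+EA" entries concrete, the following Python sketch turns the mov and ALU rows of the table into best-case cycle ranges, using only the 5 to 12 cycle EA range quoted in the notes. The names <code>TIMINGS</code> and <code>cycle_range</code> are hypothetical helpers introduced for illustration; the figures are copied from the table and, as noted above, remain best-case values that ignore prefetch status and alignment.

```python
EA_MIN, EA_MAX = 5, 12  # effective-address cost range quoted in the table notes

# (base cycles, whether EA is added), copied from the mov and ALU rows.
# The dictionary layout and key spellings are assumptions of this sketch.
TIMINGS = {
    ("mov", "register-register"): (2, False),
    ("mov", "register-immediate"): (4, False),
    ("mov", "register-memory"): (8, True),
    ("mov", "memory-register"): (9, True),
    ("mov", "memory-immediate"): (10, True),
    ("alu", "register-register"): (3, False),
    ("alu", "register-immediate"): (4, False),
    ("alu", "register-memory"): (9, True),
    ("alu", "memory-register"): (16, True),
    ("alu", "memory-immediate"): (17, True),
}


def cycle_range(op: str, form: str) -> tuple[int, int]:
    """Return (fewest, most) best-case clock cycles for one table entry."""
    base, uses_ea = TIMINGS[(op, form)]
    if uses_ea:
        return base + EA_MIN, base + EA_MAX
    return base, base


if __name__ == "__main__":
    for (op, form) in TIMINGS:
        low, high = cycle_range(op, form)
        print(f"{op:3} {form:18} {low}-{high} cycles")
```

Under these assumptions a register-memory mov costs between 13 and 20 cycles, against 2 cycles for the register-register form, which is the gap the surrounding text describes.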

As can be seen from this table, operations on registers and immediates were fast (between 2 and 4 cycles), while memory-operand instructions and jumps were quite slow; jumps took more cycles than on the simple [[Intel 8080|8080]] and [[Intel 8085|8085]], and the 8088 (used in the IBM PC) was additionally hampered by its narrower bus. There were three reasons why most memory-related instructions were slow:
* Loosely coupled fetch and execution units are efficient for instruction prefetch, but not for jumps and random data access (without special measures).
* No dedicated address calculation adder was afforded; the microcode routines had to use the main ALU for this (although there was a dedicated ''segment'' + ''offset'' adder).
* The address and data buses were [[multiplexing|multiplex]]ed, forcing a slightly longer (33~50%) bus cycle than in typical contemporary 8-bit processors.{{Dubious|1=Multiplexed bus|reason=The multiplexed bus is unlikely to slow things by "33~50%." The address was only delayed by the 18 nanosecond max propagation delay of the 74LS373 transparent latch.|date=May 2023}}

However, memory access performance was drastically enhanced with Intel's next generation of 8086 family CPUs. The [[Intel 80186|80186]] and [[Intel 80286|80286]] both had dedicated address calculation hardware, saving many cycles, and the 80286 also had separate (non-multiplexed) address and data buses.

===Floating point===
The 8086/8088 could be connected to a mathematical coprocessor to add hardware/microcode-based [[floating-point]] performance. The [[Intel 8087]] was the standard math coprocessor for the 8086 and 8088, operating on 80-bit numbers. Manufacturers like [[Cyrix]] (8087-compatible) and [[Weitek]] (''not'' 8087-compatible) eventually came up with high-performance floating-point coprocessors that competed with the 8087.