Jump to content
Main menu
Main menu
move to sidebar
hide
Navigation
Main page
Recent changes
Random page
Help about MediaWiki
Special pages
Niidae Wiki
Search
Search
Appearance
Create account
Log in
Personal tools
Create account
Log in
Pages for logged out editors
learn more
Contributions
Talk
Editing
Floating-point arithmetic
(section)
Page
Discussion
English
Read
Edit
View history
Tools
Tools
move to sidebar
hide
Actions
Read
Edit
View history
General
What links here
Related changes
Page information
Appearance
move to sidebar
hide
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
=== Addition and subtraction === A simple method to add floating-point numbers is to first represent them with the same exponent. In the example below, the second number (with the smaller exponent) is shifted right by three digits, and one then proceeds with the usual addition method: 123456.7 = 1.234567 Γ 10^5 101.7654 = 1.017654 Γ 10^2 = 0.001017654 Γ 10^5 Hence: 123456.7 + 101.7654 = (1.234567 Γ 10^5) + (1.017654 Γ 10^2) = (1.234567 Γ 10^5) + (0.001017654 Γ 10^5) = (1.234567 + 0.001017654) Γ 10^5 = 1.235584654 Γ 10^5 In detail: e=5; s=1.234567 (123456.7) + e=2; s=1.017654 (101.7654) e=5; s=1.234567 + e=5; s=0.001017654 (after shifting) -------------------- e=5; s=1.235584654 (true sum: 123558.4654) This is the true result, the exact sum of the operands. It will be rounded to seven digits and then normalized if necessary. The final result is e=5; s=1.235585 (final sum: 123558.5) The lowest three digits of the second operand (654) are essentially lost. This is [[round-off error]]. In extreme cases, the sum of two non-zero numbers may be equal to one of them: e=5; s=1.234567 + e=β3; s=9.876543 e=5; s=1.234567 + e=5; s=0.00000009876543 (after shifting) ---------------------- e=5; s=1.23456709876543 (true sum) e=5; s=1.234567 (after rounding and normalization) In the above conceptual examples it would appear that a large number of extra digits would need to be provided by the adder to ensure correct rounding; however, for binary addition or subtraction using careful implementation techniques only a ''guard'' bit, a ''rounding'' bit and one extra ''sticky'' bit need to be carried beyond the precision of the operands.<ref name="Goldberg_1991"/><ref name="Patterson-Hennessy_2014"/>{{rp|218β220}} Another problem of loss of significance occurs when ''approximations'' to two nearly equal numbers are subtracted. In the following example ''e'' = 5; ''s'' = 1.234571 and ''e'' = 5; ''s'' = 1.234567 are approximations to the rationals 123457.1467 and 123456.659. e=5; s=1.234571 β e=5; s=1.234567 ---------------- e=5; s=0.000004 e=β1; s=4.000000 (after rounding and normalization) The floating-point difference is computed exactly because the numbers are closeβthe [[Sterbenz lemma]] guarantees this, even in case of underflow when [[gradual underflow]] is supported. Despite this, the difference of the original numbers is ''e'' = β1; ''s'' = 4.877000, which differs more than 20% from the difference ''e'' = β1; ''s'' = 4.000000 of the approximations. In extreme cases, all significant digits of precision can be lost.<ref name="Goldberg_1991"/><ref name="Sierra_1962"/> This ''[[Catastrophic cancellation|cancellation]]'' illustrates the danger in assuming that all of the digits of a computed result are meaningful. Dealing with the consequences of these errors is a topic in [[numerical analysis]]; see also [[#Accuracy problems|Accuracy problems]].
Summary:
Please note that all contributions to Niidae Wiki may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see
Encyclopedia:Copyrights
for details).
Do not submit copyrighted work without permission!
Cancel
Editing help
(opens in new window)
Search
Search
Editing
Floating-point arithmetic
(section)
Add topic