Editing Rounding (section)

==Rounding in other contexts==

===Dithering and error diffusion===
When digitizing [[continuous signal]]s, such as sound waves, the overall effect of a number of measurements is more important than the accuracy of each individual measurement. In these circumstances, [[dithering]], and a related technique, [[error diffusion]], are normally used. A related technique called [[pulse-width modulation]] is used to achieve analog type output from an inertial device by rapidly pulsing the power with a variable duty cycle.

Error diffusion tries to ensure the error, on average, is minimized. When dealing with a gentle slope from one to zero, the output would be zero for the first few terms until the sum of the error and the current value becomes greater than 0.5, in which case a 1 is output and the difference subtracted from the error so far. [[Floyd–Steinberg dithering]] is a popular error diffusion procedure when digitizing images.

As a one-dimensional example, suppose the numbers {{math|0.9677}}, {{math|0.9204}}, {{math|0.7451}}, and {{math|0.3091}} occur in order and each is to be rounded to a multiple of {{math|0.01}}. In this case the cumulative sums, {{math|0.9677}}, {{math|1=1.8881 = 0.9677 + 0.9204}}, {{math|1=2.6332 = 0.9677 + 0.9204 + 0.7451}}, and {{math|1=2.9423 = 0.9677 + 0.9204 + 0.7451 + 0.3091}}, are each rounded to a multiple of {{math|0.01}}: {{math|0.97}}, {{math|1.89}}, {{math|2.63}}, and {{math|2.94}}. The first of these and the differences of adjacent values give the desired rounded values: {{math|0.97}}, {{math|1=0.92 = 1.89 − 0.97}}, {{math|1=0.74 = 2.63 − 1.89}}, and {{math|1=0.31 = 2.94 − 2.63}}.

===Monte Carlo arithmetic===
Monte Carlo arithmetic is a technique in [[Monte Carlo method]]s where the rounding is randomly up or down. Stochastic rounding can be used for Monte Carlo arithmetic, but in general, just rounding up or down with equal probability is more often used. Repeated runs will give a random distribution of results which can indicate the stability of the computation.<ref>{{cite web |url=http://web.cs.ucla.edu/~stott/mca/CSD-970014.ps.gz |title=Monte Carlo Arithmetic: a framework for the statistical analysis of roundoff errors |last1=Parker |first1=D. Stott |last2=Eggert |first2=Paul R. |date=28 March 2000 |publisher=IEEE Computation in Science and Engineering |last3=Pierce |first3=Brad}}</ref>

===Exact computation with rounded arithmetic===
It is possible to use rounded arithmetic to evaluate the exact value of a function with integer domain and range. For example, if an integer {{mvar|n}} is known to be a perfect square, its square root can be computed by converting {{mvar|n}} to a floating-point value {{mvar|z}}, computing the approximate square root {{mvar|x}} of {{mvar|z}} with floating point, and then rounding {{mvar|x}} to the nearest integer {{mvar|y}}. If {{mvar|n}} is not too big, the floating-point round-off error in {{mvar|x}} will be less than 0.5, so the rounded value {{mvar|y}} will be the exact square root of {{mvar|n}}. This is essentially why [[slide rule]]s could be used for exact arithmetic.

===Double rounding===
{{Anchor|Sticky rounding|Rounding to odd}}
Rounding a number twice in succession to different levels of precision, with the latter precision being coarser, is not guaranteed to give the same result as rounding once to the final precision except in the case of directed rounding.<ref group="nb" name="NB_Double"/> For instance rounding 9.46 to one decimal gives 9.5, and then 10 when rounding to integer using rounding half to even, but would give 9 when rounded to integer directly. Borman and Chatfield<ref>{{cite journal |last1=Borman |first1=Phil |last2=Chatfield |first2=Marion |title=Avoid the perils of using rounded data |journal=Journal of Pharmaceutical and Biomedical Analysis |date=10 November 2015 |volume=115 |pages=506–507 |doi=10.1016/j.jpba.2015.07.021 |pmid=26299526}}</ref> discuss the implications of double rounding when comparing data rounded to one decimal place to specification limits expressed using integers.

In ''Martinez v. Allstate'' and ''Sendejo v. Farmers'', litigated between 1995 and 1997, the insurance companies argued that double rounding premiums was permissible and in fact required. The US courts ruled against the insurance companies and ordered them to adopt rules to ensure single rounding.<ref>{{cite book |title=Class Action Dilemmas: Pursuing Public Goals for Private Gain |url=https://archive.org/details/classactiondilem00debo |url-access=registration |author=Deborah R. Hensler |pages=[https://archive.org/details/classactiondilem00debo/page/255 255–293] |isbn=0-8330-2601-1 |publisher=RAND |date=2000}}</ref>

Some computer languages and the [[IEEE 754-2008 revision|IEEE 754-2008]] standard dictate that in straightforward calculations the result should not be rounded twice. This has been a particular problem with Java as it is designed to be run identically on different machines, special programming tricks have had to be used to achieve this with [[x87]] floating point.<ref>{{cite journal |title=When is double rounding innocuous? |author=Samuel A. Figueroa |publisher=ACM |journal=ACM SIGNUM Newsletter |volume=30 |issue=3 |date=July 1995 |pages=21–25 |doi=10.1145/221332.221334 |s2cid=14829295|doi-access=free }}</ref><ref>{{cite web |title=Efficiently producing default orthogonal IEEE double results using extended IEEE hardware |author=Roger Golliver |publisher=Intel |date=October 1998 |url=https://www.open-std.org/JTC1/SC22/JSG/docs/m3/docs/jsgn326.pdf}}</ref> The Java language was changed to allow different results where the difference does not matter and require a [[strictfp]] qualifier to be used when the results have to conform accurately; strict floating point has been restored in Java&nbsp;17.<ref>{{cite web |first=Joseph D. |last=Darcy |title=JEP 306: Restore Always-Strict Floating-Point Semantics |url=http://openjdk.java.net/jeps/306 |access-date=2021-09-12}}</ref>

In some algorithms, an intermediate result is computed in a larger precision, then must be rounded to the final precision. Double rounding can be avoided by choosing an adequate rounding for the intermediate computation. This consists in avoiding to round to midpoints for the final rounding (except when the midpoint is exact). In binary arithmetic, the idea is to round the result toward zero, and set the least significant bit to 1 if the rounded result is inexact; this rounding is called ''sticky rounding''.<ref>{{cite journal |last1=Moore |first1=J. Strother |last2=Lynch |first2=Tom |last3=Kaufmann |first3=Matt |date=1996 |title=A mechanically checked proof of the correctness of the kernel of the AMD5K86 floating-point division algorithm |url=https://www.cs.utexas.edu/users/moore/publications/divide_paper.pdf |journal=IEEE Transactions on Computers |volume=47 |citeseerx=10.1.1.43.3309 |access-date=2016-08-02 |doi=10.1109/12.713311}}</ref> Equivalently, it consists in returning the intermediate result when it is exactly representable, and the nearest floating-point number with an odd significand otherwise; this is why it is also known as ''rounding to odd''.<ref>{{cite journal |url=https://inria.hal.science/inria-00080427v2/document |title=Emulation of a FMA and correctly-rounded sums: proved algorithms using rounding to odd |last1=Boldo |first1=Sylvie |author1-link=Sylvie Boldo |last2=Melquiond |first2=Guillaume |date=2008 |journal=IEEE Transactions on Computers |volume=57 |issue=4 |pages=462–471 |format=PDF |doi=10.1109/TC.2007.70819 |s2cid=1850330 |access-date=2016-08-02}}</ref><ref>{{cite web |url=https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21718#c25 |title=21718 – real.c rounding not perfect |website=gcc.gnu.org}}</ref> A concrete implementation of this approach, for binary and decimal arithmetic, is implemented as [[#Rounding to prepare for shorter precision|Rounding to prepare for shorter precision]].

===Table-maker's dilemma===
[[William M. Kahan]] coined the term "The Table-Maker's Dilemma" for the unknown cost of rounding [[transcendental function]]s:
{{Blockquote|Nobody knows how much it would cost to compute {{math|''y''{{sup|''w''}}}} correctly rounded for {{em|every}} two floating-point arguments at which it does not over/underflow. Instead, reputable math libraries compute elementary [[transcendental function]]s mostly within slightly more than half an [[unit in the last place|ulp]] and almost always well within one ulp. Why can't {{math|''y''{{sup|''w''}}}} be rounded within half an ulp like SQRT? Because nobody knows how much computation it would cost... No general way exists to predict how many extra digits will have to be carried to compute a transcendental expression and round it {{em|correctly}} to some preassigned number of digits. Even the fact (if true) that a finite number of extra digits will ultimately suffice may be a deep theorem.<ref>{{cite web |last=Kahan |first=William Morton |author-link=William Morton Kahan |title=A Logarithm Too Clever by Half |url=https://people.eecs.berkeley.edu/~wkahan/LOG10HAF.TXT |access-date=2008-11-14}}</ref>}}

The [[IEEE 754]] floating-point standard guarantees that add, subtract, multiply, divide, [[fused multiply–add]], square root, and floating-point remainder will give the correctly rounded result of the infinite-precision operation. No such guarantee was given in the 1985 standard for more complex functions and they are typically only accurate to within the last bit at best. However, the 2008 standard guarantees that conforming implementations will give correctly rounded results which respect the active rounding mode; implementation of the functions, however, is optional.

Using the [[Gelfond–Schneider theorem]] and [[Lindemann–Weierstrass theorem]], many of the standard elementary functions can be proved to return [[transcendental number|transcendental]] results, except on some well-known arguments; therefore, from a theoretical point of view, it is always possible to correctly round such functions. However, for an implementation of such a function, determining a limit for a given precision on how accurate results need to be computed, before a correctly rounded result can be guaranteed, may demand a lot of computation time or may be out of reach.<ref name="Muller_2010">{{cite book |author-last1=Muller |author-first1=Jean-Michel |author-last2=Brisebarre |author-first2=Nicolas |author-last3=de Dinechin |author-first3=Florent |author-last4=Jeannerod |author-first4=Claude-Pierre |author-last5=Lefèvre |author-first5=Vincent |author-last6=Melquiond |author-first6=Guillaume |author-last7=Revol |author-first7=Nathalie |author7-link=Nathalie Revol |author-last8=Stehlé |author-first8=Damien |author-last9=Torres |author-first9=Serge |title=Handbook of Floating-Point Arithmetic |date=2010 |publisher=[[Birkhäuser]] |edition=1 |isbn=978-0-8176-4704-9<!-- print --> |doi=10.1007/978-0-8176-4705-6 |lccn=2009939668<!-- |id={{isbn |978-0-8176-4705-6}} (online), {{isbn|0-8176-4704-X}} (print) -->|chapter=Chapter 12: Solving the Table Maker's Dilemma|chapter-url=https://cds.cern.ch/record/1315760}}</ref> In practice, when this limit is not known (or only a very large bound is known), some decision has to be made in the implementation (see below); but according to a probabilistic model, correct rounding can be satisfied with a very high probability when using an intermediate accuracy of up to twice the number of digits of the target format plus some small constant (after taking special cases into account).

Some programming packages offer correct rounding. The [[GNU MPFR]] package gives correctly rounded arbitrary precision results. Some other libraries implement elementary functions with correct rounding in [[Double-precision floating-point format#IEEE 754 double-precision binary floating-point format: binary64|IEEE 754 double precision]] (binary64):
* [[IBM]]'s ''ml4j'', which stands for ''Mathematical Library for Java'', written by [[Abraham Ziv]] and Moshe Olshansky in 1999, correctly rounded to nearest only.<ref>{{cite web |url=https://netlib.org/na-digest-html/99/v99n16.html |title=NA Digest Sunday, April 18, 1999 Volume 99 : Issue 16 |date=18 April 1999| accessdate=29 August 2022}}</ref><ref>{{cite web |url=http://www.alphaworks.ibm.com/tech/mathlibrary4java/ |title=Math Library for Java |url-status=dead |archive-url=https://web.archive.org/web/19990508150807/http://www.alphaworks.ibm.com/tech/mathlibrary4java/ |archive-date=8 May 1999}}</ref> This library was claimed to be portable, but only binaries for [[PowerPC]]/[[IBM AIX|AIX]], [[SPARC]]/[[Oracle Solaris|Solaris]] and [[x86]]/[[Windows NT]] were provided. According to its documentation, this library uses a first step with an accuracy a bit larger than double precision, a second step based on [[double-double arithmetic]], and a third step with a 768-bit precision based on arrays of IEEE 754 double-precision floating-point numbers.
* IBM's ''Accurate portable mathematical library'' (abbreviated as APMathLib or just MathLib),<ref>{{cite web |url=http://oss.software.ibm.com/mathlib/ |title=Accurate Portable Mathematical Library |archive-url=https://web.archive.org/web/20050207110205/http://oss.software.ibm.com/mathlib/ |archive-date=7 February 2005}}</ref><ref>{{GitHub|dreal-deps/mathlib}}.</ref> also called libultim,<ref>{{cite web |url=http://www.math.utah.edu/cgi-bin/man2html.cgi?/usr/local/man/man3/libultim.3 |title=libultim – ultimate correctly-rounded elementary-function library |url-status=dead |archive-url=https://web.archive.org/web/20210301025744/http://www.math.utah.edu/cgi-bin/man2html.cgi?/usr/local/man/man3/libultim.3 |archive-date=1 March 2021}}</ref> in rounding to nearest only. This library uses up to 768 bits of working precision. It was included in the [[GNU C Library]] in 2001,<ref>{{cite web|url=https://sourceware.org/git/?p=glibc.git;a=commit;h=e4d8276142b9c07b23043ef44b0fe8fa7bcc3121 |title=Git - glibc.git/commit |publisher=Sourceware.org |date= |accessdate=2022-07-18}}</ref> but the "slow paths" (providing correct rounding) were removed from 2018 to 2021.<!-- Search for "slow path" in the git log. -->
* CRlibm, written in the old Arénaire team (LIP, [[ENS Lyon]]), first distributed in 2003.<ref>{{cite journal |last1=de Dinechin |first1=Florent |last2=Lauter |first2=Christoph |last3=Muller |first3=Jean-Michel |title=Fast and correctly rounded logarithms in double-precision |journal=RAIRO-Theor. Inf. Appl. |date=January–March 2007 |volume=41 |number=1 |pages=85–102 |doi=10.1051/ita:2007003 |id={{HAL|ensl-00000007v2}} |citeseerx=10.1.1.106.6652 }}</ref> It supports the 4 rounding modes and is proved, using the knowledge of the hardest-to-round cases.<ref>{{cite web |url=http://lipforge.ens-lyon.fr/www/crlibm/ |archive-url=https://web.archive.org/web/20161027224938/http://lipforge.ens-lyon.fr/www/crlibm |archive-date=2016-10-27 |title=CRlibm – Correctly Rounded mathematical library}}</ref><ref>{{GitHub|taschini/crlibm}}</ref> More efficient than IBM MathLib.<ref name=coremath-ieee29/> Succeeded by Metalibm (2014), which automates the formal proofs.<ref>{{cite conference|last1=Kupriianova |first1=Olga |last2=Lauter |first2=Christoph |title=Metalibm: A Mathematical Functions Code Generator |conference=Mathematical Software – ICMS 2014|url=https://hal.science/hal-01513490/ |date=2014 |volume=8592 |pages=713–717 |doi=10.1007/978-3-662-44199-2_106}}</ref>
* [[Sun Microsystems]]'s libmcr of 2004, in the 4 rounding modes.<ref>{{cite web |url=http://www.math.utah.edu/cgi-bin/man2html.cgi?/usr/local/man/man3/libmcr.3 |title=libmcr – correctly-rounded elementary-function library |url-status=dead |archive-url=https://web.archive.org/web/20210225085748/http://www.math.utah.edu/cgi-bin/man2html.cgi?/usr/local/man/man3/libmcr.3 |archive-date=25 February 2021}}</ref><ref>{{GitHub|simonbyrne/libmcr}}.</ref> For the difficult cases, this library also uses multiple precision, and the number of words is increased by 2 each time the Table-maker's dilemma occurs (with undefined behavior in the very unlikely event that some limit of the machine is reached).<!-- in the sources: "nw += 2;" in a loop -->
* The CORE-MATH project (2022) provides some correctly rounded functions in the 4 rounding modes for [[x86-64]] processors. Proved using the knowledge of the hardest-to-round cases.<ref>{{cite web |url=https://core-math.gitlabpages.inria.fr/ |title=The CORE-MATH project |accessdate=30 August 2022}}</ref><ref name=coremath-ieee29>{{cite conference |url=https://inria.hal.science/hal-03721525v3 |last1=Sibidanov |first1=Alexei |last2=Zimmermann |first2=Paul |last3=Glondu |first3=Stéphane |title=The CORE-MATH Project |conference=29th IEEE Symposium on Computer Arithmetic (ARITH 2022) |conference-url=https://arith2022.arithsymposium.org/ |date=2022 |accessdate=30 August 2022}}</ref>
* [[LLVM]] libc provides some correctly rounded functions in the 4 rounding modes.<ref>{{cite web |title=Math Functions — The LLVM C Library |url=https://libc.llvm.org/math |website=libc.llvm.org}}</ref>

There exist [[computable number]]s for which a rounded value can never be determined no matter how many digits are calculated. Specific instances cannot be given but this follows from the undecidability of the [[halting problem]]. For instance, if [[Goldbach's conjecture]] is true but [[unprovable]], then the result of rounding the following value, {{mvar|n}}, [[#Rounding up|up to the next integer]] cannot be determined: either {{mvar|n}}=1+10<sup>−{{mvar|k}}</sup> where {{mvar|k}} is the first even number greater than 4 which is not the sum of two primes, or {{mvar|n}}=1 if there is no such number. The rounded result is 2 if such a number {{mvar|k}} exists and 1 otherwise. The value before rounding can however be approximated to any given precision even if the conjecture is unprovable.

===Interaction with string searches===
Rounding can adversely affect a string search for a number. For example, {{pi}} rounded to four digits is "3.1416" but a simple search for this string will not discover "3.14159" or any other value of {{pi}} rounded to more than four digits. In contrast, truncation does not suffer from this problem; for example, a simple string search for "3.1415", which is {{pi}} truncated to four digits, will discover values of {{pi}} truncated to more than four digits.