Why Not to Represent Money as Floating Point Numbers

It can be tempting to reach for floating point numbers to represent monetary values like $10.35, since floats are a basic data type that most programming environments support. The problem with floats is that they cannot represent certain numbers exactly and are implicitly rounded, which can lead to surprising behavior. Therefore, in the context of financial applications it is generally recommended to avoid floats, and our goal here is to explain why that is the case.

Integers vs. Floats

To get a better feeling for what numbers floats can and cannot represent, it helps to contrast them with integers. More specifically, let us compare 32-bit signed integers and 32-bit floats. Although they occupy exactly the same space in memory, namely 32 bits, they span vastly different ranges. Integers cover all whole numbers from $-2^{31}$ to $2^{31}-1$, whereas floats span roughly the range from $-2^{128}$ to $2^{128}$ (plus some special values like infinity). This means that floats not only reach much larger absolute values than integers, they can also represent fractional values in between.

Floats achieve this feat by sacrificing precision and resorting to rounding when a number cannot be represented exactly. Importantly, floats try to minimize this rounding error by confining the precision loss to the least significant digits. This means that, relative to the magnitude of the represented value, the error is small, yet in absolute terms the error can be big. Take for example the integer $281488458451187 \approx 2^{48}$, which is too big for a 32-bit integer. If we attempt to store this value as a 32-bit float, it cannot be represented exactly and hence is rounded to the nearest float that can be, namely $281488465592320$. The absolute error of $7141133$ looks big, but is comparatively small given the scale of the numbers.
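
To see this in action, here is a minimal sketch in Python using only the standard `struct` module. Since Python's own float type is a 64-bit double, round-tripping the value through a 4-byte IEEE 754 encoding is what forces the 32-bit rounding:

```python
import struct

def to_float32(x):
    # Pack x into 4 bytes as an IEEE 754 single and unpack it again,
    # which rounds x to the nearest representable 32-bit float.
    return struct.unpack('f', struct.pack('f', float(x)))[0]

n = 281488458451187
f = to_float32(n)

print(f)      # 281488465592320.0
print(f - n)  # 7141133.0
```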

The Effect of Rounding

The following table shows example numbers, the nearest 32-bit floating point value for each, and the incurred rounding error (we computed these values with this excellent floating point converter).

| Number | Float | Rounding Error |
| --- | --- | --- |
| $281488458451187$ | $281488465592320$ | $7141133$ |
| $1.25$ | $1.25$ | $0$ |
| $1.20$ | $1.2000000476837158203125$ | $0.0000000476837158203125$ |
| $0.10$ | $0.100000001490116119384765625$ | $0.000000001490116119384765625$ |
| $0.20$ | $0.20000000298023223876953125$ | $0.00000000298023223876953125$ |
| $0.25$ | $0.25$ | $0$ |
| $0.30$ | $0.300000011920928955078125$ | $0.000000011920928955078125$ |
| $0.50$ | $0.50$ | $0$ |
| $0.70$ | $0.699999988079071044921875$ | $-0.000000011920928955078125$ |

Notice that some numbers are exact, while others are not. If you look more closely, you can see that the numbers that can be represented exactly are multiples of powers of two (e.g., $0.25 = 2^{-2}$). This is because floats use a binary representation under the hood, as we will see later. On the other hand, negative powers of ten (e.g., $0.1 = 10^{-1}$) cannot be represented exactly, which is a problem for monetary values because we often need to represent cents.
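
If you want to reproduce the values in the table yourself, the following Python sketch prints, for each number, the exact value of the nearest 32-bit float and the rounding error. It reuses the `struct` round trip from above and the standard `decimal` module for exact output:

```python
import struct
from decimal import Decimal

def to_float32(x):
    # Round x to the nearest 32-bit IEEE 754 float via a 4-byte round trip.
    return struct.unpack('f', struct.pack('f', float(x)))[0]

for text in ["1.25", "1.20", "0.10", "0.20", "0.25", "0.30", "0.50", "0.70"]:
    f = to_float32(float(text))
    # Decimal(f) shows the exact binary value of the float, digit for digit.
    print(text, Decimal(f), Decimal(f) - Decimal(text))
```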

Rounding at times leads to counter-intuitive results when we perform arithmetic operations. Try out the following boolean expressions in the programming language of your choice. If it adheres to the IEEE 754 standard for floating point numbers, you will see the following results:

| Expression | Result |
| --- | --- |
| `0.5 + 0.25 == 0.75` | true |
| `0.1 + 0.2 == 0.3` | false |
| `0.1 + 0.2 > 0.3` | true |
| `(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)` | false |

While the first example gives the mathematically correct result, the remaining three rows do not. Clearly, $\frac{1}{10} + \frac{2}{10}$ should be equal to $\frac{3}{10}$, but this is not the case in IEEE 754 due to rounding. In fact, as the third example shows, $\frac{1}{10} + \frac{2}{10} > \frac{3}{10}$. The fourth example shows that addition is not always associative for floats.
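
The reason becomes apparent once you print the intermediate sums. For instance, in Python (whose float is an IEEE 754 double):

```python
print(0.1 + 0.2)          # 0.30000000000000004
print(0.3)                # 0.3
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6
```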

Say you have two bank accounts $A_1 = \$12.1$ and $A_2 = \$7.2$ with a total of $\$19.3$, and you transfer one dollar from $A_1$ to $A_2$. You would correctly expect the total amount of money in the two accounts to remain the same, but this is not necessarily the case, since `12.1 + 7.2 != 11.1 + 8.2` in floating point arithmetic.
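
Here is a minimal sketch of that scenario in Python, assuming the balances from the example above are stored as IEEE 754 doubles (Python floats):

```python
# Balances from the example above, stored as IEEE 754 doubles (Python floats).
a1, a2 = 12.1, 7.2
total_before = a1 + a2

# Transfer one dollar from A1 to A2.
a1 -= 1.0
a2 += 1.0
total_after = a1 + a2

print(total_before)                 # 19.3
print(total_after)                  # 19.299999999999997
print(total_before == total_after)  # False
```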

IEEE 754 Floating Point Numbers

In its most general form, a floating point number is a number of the form

$$ (-1)^s \cdot c \cdot b^q $$

where $s \in \{0, 1\}$ determines the sign of the number, $c$ is called the significand, $b$ the base (or radix), and $q$ the exponent. The base is typically an agreed-upon constant, like 2 or 10, which means the three remaining variables $(s, c, q)$ describe a float. For example, in base 10 the triple $(0, 14225, -2)$ defines the number $(-1)^0 \cdot 14225 \cdot 10^{-2} = 142.25$.

The IEEE 754 standard for floating point numbers builds on top of this basic representation of floats with a few tweaks to optimize them for computers. The standard represents 32-bit floats as a triple $(s, c, q)$ where 1 bit is allocated for sign $s$, 23 bits for significand $c$, and 8 bits for exponent $q$. The corresponding float is given by:

$$ (-1)^s \cdot (1+c) \cdot 2^{(q-127)} $$

Notice the following properties:

  • Floats use binary numbers (base 2) to optimize them for computers (actually, IEEE 754 also defines base-10 floats, but they are not commonly used).
  • In 32-bit floats, the exponent $q$ is stored as an 8-bit unsigned integer in the range $[0,255]$. Values $q \in [0, 127)$ yield negative exponents, $q=127$ means the exponent is zero, and $q \in [128, 255]$ yields positive exponents.
  • The significand is given by $(1+c)$, where $0 \leq c < 1$. As a result, the leading bit of the significand is always 1 and does not have to be stored (the so-called hidden bit), which gives one extra bit of precision.

Here is how the number $-0.40625$ is encoded as an IEEE 754 float:

| | Sign $s$ | Exponent $q$ | Significand $c$ |
| --- | --- | --- | --- |
| Binary Representation | 1 | 01111101 | 10100000000000000000000 |

Let us turn this binary representation back into a decimal representation:

  • Sign $s$ is simple: it stays 1.
  • Exponent $q$ is 125 in decimal; subtracting the bias 127 gives $-2$.
  • For significand $c$ we compute the sum of $2^{-i}$ for every $i$-th bit that is set:
    • bit 1: $2^{-1} = 0.5$
    • bit 3: $2^{-3} = 0.125$

Putting this together we get: $(-1)^1 \cdot (1 + 0.5 + 0.125) \cdot 2^{125-127} = -0.40625$. We were able to represent this number precisely as a float without any rounding error.
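
The decoding steps above can also be written down in a few lines of code. Here is a small Python sketch (the function name `decode_float32` is just an illustrative choice, and the `struct` call serves only as a cross-check); it handles normalized floats only and ignores special cases like zero, subnormals, infinity, and NaN:

```python
import struct

def decode_float32(bits):
    # Split the 32-bit pattern into sign (1 bit), exponent (8 bits), significand (23 bits).
    s = (bits >> 31) & 0x1
    q = (bits >> 23) & 0xFF
    c_bits = bits & 0x7FFFFF

    # c is the sum of 2^-i for every i-th significand bit that is set.
    c = c_bits / 2**23

    # Apply (-1)^s * (1 + c) * 2^(q - 127) for normalized floats.
    return (-1)**s * (1 + c) * 2**(q - 127)

bits = 0b10111110110100000000000000000000
print(decode_float32(bits))                             # -0.40625
print(struct.unpack('>f', bits.to_bytes(4, 'big'))[0])  # cross-check: -0.40625
```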

Things get more interesting for a number like 0.2 that cannot be represented exactly as a float. Consider the following table:

| Sign $s$ | Exponent $q$ | Significand $c$ | Decimal |
| --- | --- | --- | --- |
| 0 | 01111100 | 10011001100110011001101 | $0.20000000298023223876953125$ |
| 0 | 01111100 | 10011001100110011001100 | $0.199999988079071044921875$ |

These are the two floats closest to 0.2, where one is slightly bigger and the other slightly smaller than 0.2. The only difference between the binary representations of these two numbers is the least significant bit of the significand. Since we try to minimize the error, we round to the nearest representable number, which in this example is the first one. Had we used base 10, we could have expressed 0.2 exactly as the triple $(0, 2, -1)$, but since computers use base-2 floats, we need rounding.
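
You can reproduce these two neighbors by nudging the bit pattern directly. A minimal sketch, again using a `struct` round trip (the helper names are illustrative):

```python
import struct
from decimal import Decimal

def float32_bits(x):
    # The 32-bit pattern of x after rounding it to the nearest IEEE 754 single.
    return struct.unpack('>I', struct.pack('>f', x))[0]

def bits_to_float32(bits):
    # Reinterpret a 32-bit pattern as an IEEE 754 single.
    return struct.unpack('>f', struct.pack('>I', bits))[0]

nearest = bits_to_float32(float32_bits(0.2))    # 0.2 rounded to the nearest 32-bit float
below = bits_to_float32(float32_bits(0.2) - 1)  # its neighbor, one bit pattern lower

print(Decimal(nearest))  # 0.20000000298023223876953125
print(Decimal(below))    # 0.199999988079071044921875
```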

Besides IEEE 754, there are other floating point formats, for example smaller floats (e.g., 8 bits), which are particularly useful for storing weights in AI models where memory size is a major concern.

Closing Thoughts

Floating point numbers are a very clever piece of engineering and they are ideal for many use cases (e.g., graphics). In fact, learning more about how floats work under the hood gave me a great deal of appreciation for the ingenuity in their design. However, as usual, a piece of technology does not exist in a vacuum. Context matters. And in the context of financial applications, floats are in most cases not suited to represent monetary values due to their imprecision and rounding errors.

Resources