Floating Points:
living in a discretized, computerized world

Benjamin LeRoy

August 21st, 2019


Motivation 1: Testing (raw floating point errors)

print(.1 + .2); print(.3)
## [1] 0.3
## [1] 0.3

and you think “wow, I really like coding, and I’m, like, a good mathematician”, and then this happens…

.1 + .2 == .3
## [1] FALSE

And that’s when you go into pure mathematics cause no one likes computers anyway.

Here’s a correction you can use when you encounter this (we’ll talk about why it happens soon):

testthat::expect_equal(.1 + .2, .3, tolerance = 0) # says something useful
## Error: 0.1 + 0.2 not equal to 0.3.
## 1/1 mismatches
## [1] 0.3 - 0.3 == 5.55e-17
testthat::expect_equal(.1 + .2, .3, tolerance = .Machine$double.eps)
# or 
all.equal(.1 + .2, .3, tolerance = .Machine$double.eps)
## [1] TRUE
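
Side note (not from the original slides, but a fact about all.equal()): its default tolerance is already loose, the square root of machine epsilon (about 1.5e-8), so for this comparison you don’t even need to pass tolerance yourself:

all.equal(.1 + .2, .3) # default tolerance = sqrt(.Machine$double.eps)
## [1] TRUE
sqrt(.Machine$double.eps)
## [1] 1.490116e-08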

Motivation 2: Matrix Problems and Stability

Suppose you come back to your computer and, since you’re into Stats, you’ve coded up some iterative algorithm to solve a linear algebra problem (matrix multiplication, inversion, etc.).

Say you have some code that can solve \(Ax = b\) where

\[ A = \begin{bmatrix} .78 &.563 \\ .913 & .659 \end{bmatrix} \quad \quad b = \begin{bmatrix} .217 \\ .254 \end{bmatrix} \]

Now, as mathematicians we know that \(x = A^{-1}b\) (i.e. we have a closed-form solution), but for this example we will just use the fact that we know the answer mathematically, and we’d like to assess the performance of our algorithm through tests.

So, the set up:

A <- matrix(c(.78, .563,
              .913, .659),
            byrow = T, nrow = 2)
b <- c(.217, .254)
x_true <- solve(a = A,b = b); x_true
## [1]  1 -1

That is, we know the true solution is \(\begin{bmatrix} 1 \\ -1 \end{bmatrix}\).

So you decide that you want your answer \(\hat{x}\) (from your function) to come close to solving the equation, that is, \(A\hat{x} - b\) should be really close to zero (say \(l_2\) norm < 1e-5). Since we also know the true solution \(x\), we will additionally check the relative error of \(\hat{x}\), i.e. \[\|x-\hat{x}\|_2/\|x\|_2\]

The first function you write returns the following value for \(\hat{x}_1\):

x_hat1 <- c(.999,-.999)
res_1 <- b - A %*% x_hat1
res_1
##          [,1]
## [1,] 0.000217
## [2,] 0.000254
sprintf("l_2 norm of residual is %.6f", norm(res_1))
## [1] "l_2 norm of residual is 0.000471"
norm(x_true - x_hat1, type = "2")/norm(x_true, type = "2") 
## [1] 0.0009999999

A friend says they think they have a function that solved the problem, and sends over \(\hat{x}_2\).

x_hat2 <- c(.341, -.087)
res_2 <- b - A %*% x_hat2
res_2
##       [,1]
## [1,] 1e-06
## [2,] 0e+00
sprintf("l_2 norm of residual is %.6f", norm(res_2))
## [1] "l_2 norm of residual is 0.000001"
norm(x_true - x_hat2, type = "2")/norm(x_true, type = "2")
## [1] 0.7961941

Dang it…

Outline

  1. R and floating points
    • IEEE representation & underlying structure
  2. Understanding & Avoiding Errors (algebraic)
  3. Understanding & Avoiding Errors (linear algebra)
  4. Summary

Part I

Introduction to floating points

Introduction

print(.1 + .2); print(.3)
## [1] 0.3
## [1] 0.3
.1 + .2 == .3
## [1] FALSE

What’s really happening?

\(0.1 + 0.2\):

##   0.1000000000000000055511151231257827021181583404541015625
## + 0.2000000000000000111022302462515654042363166809082031250
## ------------------------------------------------------------
##   0.3000000000000000444089209850062616169452667236328125000

vs. \(0.3\):

##   0.2999999999999999888977697537484345957636833190917968750

Why is “\(.3\)” printed?

#?print.default

getOption("digits") #7
signif(.1 + .2, digits = 7)
## [1] 0.3

And above we were showing 55 digits after the decimal point in base 10… Technically we just used sprintf("%1.55f", .1) (see the Rmd file).
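
For reference, here is that same sprintf call reproducing the two expansions shown above:

sprintf("%1.55f", .1 + .2)
## [1] "0.3000000000000000444089209850062616169452667236328125000"
sprintf("%1.55f", .3)
## [1] "0.2999999999999999888977697537484345957636833190917968750"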

IEEE 754 usage in R (finite storage)

  1. R only has double precision floats

  2. Which means the raw digit information is contained in “53 bits, and represents to that precision a range of absolute values from about 2e-308 to 2e+308” ~ help(double)

Example from YouTube (shown there in only 32 bits; we use 64):

Express how to store 263.3 in R (not going over this in detail).

# helper functions for displaying binary representations
int2bin <- function(x) {
  # binary digits of the integer part
  R.utils::intToBin(x)
}
frac2bin <- function(x) {
  # first 31 bits of the binary expansion of the fractional part
  # (left-padding preserves leading zeros)
  stringr::str_pad(R.utils::intToBin(x * 2^31), 31, pad = "0")
}
  1. 263 = 100000111

    0.3 = 0100110011001100110011001100110

  2. \(\Rightarrow\) 263.3 = 100000111.0100110011001100110011001100110

  3. (get exp bits): 1.000001110100110011001100110011001100110 \(\times \; 2^8\)

  4. transform the exponent relative to the ‘bias’: 1023 + 8 = 1031 (bits = 10000000111); note: the ‘bias’ for an IEEE 754 double is 1023

  5. grab fraction bits from part 3.

final answer:

## 1 12345678901 12345678901234567890123456789012345678901234567890123
## 0 10000000111 000001110100110011001100110011001100110...
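
If you want to peek at the 64 bits R actually stores, one way is the sketch below (it assumes a little-endian machine, which is why the bits get reversed; double_bits is just a throwaway helper name):

double_bits <- function(x) {
  # 8 bytes, least-significant byte/bit first on little-endian platforms;
  # reversing gives the usual sign | exponent | fraction reading order
  bits <- rev(as.integer(rawToBits(writeBin(x, raw()))))
  paste(bits, collapse = "")
}
double_bits(263.3) # 1 sign bit, 11 exponent bits (10000000111), 52 fraction bits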

How can we think about this type of storage (with bits)?

(2^53 + 0:12) - 2^53 
##  [1]  0  0  2  4  4  4  6  8  8  8 10 12 12
  9007199254740992 (2^53)
+                5
------------------
  9007199254740997 (exact result)
  9007199254740996 (rounded to nearest representable float)
  9007199254740992 
+                0.0001
-----------------------
  9007199254740992.0001 (exact result)
  9007199254740992      (rounded to nearest representable float)

(Example from Floating Point Demystified, Part 1)
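
Another way to see the storage/precision trade-off directly (a couple of quick checks):

2^53 + 1 == 2^53              # TRUE: above 2^53 the spacing between doubles is 2
.Machine$double.eps == 2^-52  # TRUE: relative spacing of doubles just above 1
(1 + 2^-53) > 1               # FALSE: rounds back down to 1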

Associating the storage with precision

Aside: What about the integers?

class(100L); class(100)
## [1] "integer"
## [1] "numeric"
is.integer(1); is.integer(1:2)
## [1] FALSE
## [1] TRUE
identical(1,1L); 1 == 1L
## [1] FALSE
## [1] TRUE
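
Just to spell out that last pair of results: identical() compares type as well as value, while == coerces the integer to a double before comparing.

typeof(1); typeof(1L)
## [1] "double"
## [1] "integer"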

Part II

Understanding and Avoiding Errors (algebraic)

Overview

In this section I’ll give you some examples that showcase certain principles and should give you useful starting points.

When calculating sample variance…

Be careful when you’re taking the difference of two numbers that are large in absolute value but whose difference is relatively small.

For example: calculation of sample variance1

They observe that if we use the trick we learned in stats class (i.e. calculating the \(\sum x_i^2\) and \(\sum x_i\) terms separately), we can run into problems.

\[\text{recall: } \quad S = \frac{1}{n-1} \left( \sum_i x_i^2 - \frac{1}{n} \left(\sum_i x_i\right)^2\right) = \frac{1}{n-1} \left( \sum_i \left(x_i - \frac{1}{n}\sum_j x_j \right)^2 \right)\]

x_sample <- c(50000000, 50000001, 50000002)
x_sum <- sum(x_sample); x2_sum <- sum(x_sample^2)
n <- length(x_sample)

1/(n-1) * (x2_sum - 1/n* (x_sum)^2) # "shortcut"
## [1] 1.5
var(x_sample) # calculated the better way (iterative updates actually)
## [1] 1

Relative error:

abs(1 - 1/(n-1) * (x2_sum - 1/n* (x_sum)^2))
## [1] 0.5
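
For reference, here is a minimal sketch of the one-pass, “iterative updates” idea (Welford-style); it’s an illustration, not necessarily the exact routine var() uses internally:

welford_var <- function(x) {
  mean_k <- 0; m2 <- 0
  for (k in seq_along(x)) {
    delta  <- x[k] - mean_k                 # small correction relative to the running mean
    mean_k <- mean_k + delta / k            # update the running mean
    m2     <- m2 + delta * (x[k] - mean_k)  # update the sum of squared deviations
  }
  m2 / (length(x) - 1)
}
welford_var(x_sample) # 1, matching var(x_sample)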

An extra slide for the math

Thinking about cancellation (that is \(a-b\)):

We can express the computed value \(\hat{x}\) of the output \(x = a - b\) in the following way: \(\hat{x} = \hat{a} - \hat{b}\), where \(\hat{a} = a(1+\Delta a)\) and \(\hat{b} = b(1+\Delta b)\) (the \(\Delta\)’s are the relative errors already present in \(a\) and \(b\)).

Then the relative error can be bounded above by

\[\bigg|\frac{x-\hat{x}}{x}\bigg| = \bigg|\frac{-a\Delta a + b \Delta b}{a - b} \bigg| \leq \max(|\Delta a|, |\Delta b|) \frac{|a|+|b|}{|a-b|}\]

Higham, Nicholas J. Accuracy and stability of numerical algorithms. Vol. 80. Siam, 2002, pg 9

Notice that this upper bound on the error includes the term \(\frac{|a|+|b|}{|a-b|}\). This implies that “the relative error bound for \(\hat{x}\) is large when \(|a - b| << |a| + |b|\)” (subtractive cancellation magnifies errors/approximations made earlier in the computation).

See the 10-digit rounding example in Higham, Nicholas J. Accuracy and stability of numerical algorithms. Vol. 80. Siam, 2002, pg 9 for an extreme example.
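
A tiny concrete example of that bound in action (numbers chosen here just for illustration): the rounding error in storing \(a = 10^{16} + 1\) is minuscule relative to \(a\), but after subtracting \(b = 10^{16}\) it is the entire answer:

a <- 1e16 + 1 # not representable; rounds to exactly 1e16 (relative error ~1e-16)
b <- 1e16
a - b         # 0 instead of the mathematically exact 1 (relative error = 1)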

Takeaway 1:

Myth busting: “instability comes from the slow accumulation of many rounding errors”

“Most often, instability is caused by just a few rounding errors” ~ Higham, Nicholas J. Accuracy and stability of numerical algorithms. Vol. 80. Siam, 2002, pg 9

Example:

\[ e := \underset{n \to \infty}{\text{lim}} (1+1/n)^n \]

f <- function(n) { (1 + 1/n)^n }
n <- 10^seq(2,20, by = 2)
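
The chunk above only defines f and the grid of n values; evaluating it (a quick sketch, since no output was kept here) shows that the eventual failure comes from a single rounding error, in the computation of 1 + 1/n:

f(n)                  # approaches exp(1) = 2.718282..., then drifts and collapses
f(1e16)               # exactly 1: 1/1e16 is below eps/2, so 1 + 1/1e16 rounds to 1
abs(f(1e16) - exp(1)) # one rounding error leaves an error of about 1.72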

Early takeaways:

Part III

Understanding and Avoiding Errors (linear algebra)

Matrix multiplications (designing stable algorithms)

To understand what’s going on in our matrix problems we need a few more concepts. They are related to the above and emphasize that the stability of an algorithm depends on more than just the truncation of true values (due to the floating point representation).

Message: even if you know a mathematical representation, look for stable algorithms instead of writing your own.

General Stability Analysis

  1. backward and forward errors (perturbation theory)
  2. Condition number (tools: kappa)

Stability in terms of backwards and forward errors

Higham, Nicholas J. Accuracy and stability of numerical algorithms. Vol. 80. Siam, 2002

Stability

Stability of a Matrix

Recalling the \(Ax = b\) problem:

\[ A = \begin{bmatrix} .78 &.563 \\ .913 & .659 \end{bmatrix} \quad \quad b = \begin{bmatrix} .217 \\ .254 \end{bmatrix} \]

Suppose we were going to use the built-in solve function to calculate \(x\). How stable would this approach be (in terms of the condition number)?

# adding Error
E <- .001 * matrix(c(1,1,
                     1,-1),
                   byrow = T, nrow = 2)
A2 <- A + E
A2; A
##       [,1]  [,2]
## [1,] 0.781 0.564
## [2,] 0.914 0.658
##       [,1]  [,2]
## [1,] 0.780 0.563
## [2,] 0.913 0.659
x_hat3 <- solve(a = A2,b = b)
x_hat3
## [1]  0.29411765 -0.02252816
# delta_y: relative change in the output (the computed solution)
rc_x <- norm(x_true - x_hat3, type = "2")/norm(x_true, type = "2"); rc_x
## [1] 0.8525612
# delta_x: relative size of the perturbation to the input A
rc_A <- norm(E, type = "2")/norm(A, type = "2"); rc_A
## [1] 0.0009549354

This seems pretty unstable, and in R we have ways to tell ahead of time that this matrix would lead to this kind of behavior.

Condition Numbers for Matrices

kappa(A) # remember large is bad
## [1] 2482775
kappa(diag(2))
## [1] 1

Why should you care? This is useful if you’re doing things like linear regression (whether you code it up yourself or not), as a high kappa value means your matrix is close to having zero singular values, which suggests that your matrix is almost “non-invertible”.
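
For intuition (a quick check, not part of the original code): the 2-norm condition number is the ratio of the largest to the smallest singular value. kappa() returns a fast estimate by default; exact = TRUE computes it from the singular values:

sv <- svd(A)$d         # singular values of A
sv[1] / sv[2]          # largest / smallest = 2-norm condition number
kappa(A, exact = TRUE) # should match the ratio above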

For lm, Chambers and Hastie2 say we should only worry when the following quantity is large:

.Machine$double.eps * kappa(A) # small here, so nothing to worry about
## [1] 5.512868e-10

Random last slides: log probabilities

Naive Bayes

Imagine that you’re doing a naive Bayes model for a bag-of-words analysis (say the spam/not-spam example).

\[\begin{align*} H^{MAP} &= \underset{i \in \{spam, not-spam\}}{\text{argmax}} \left[\prod_{w=1}^W \mathbb{P}(\text{Freq}_w|H_i) \right] \mathbb{P}(H_i) \\ &= \underset{i \in \{spam, not-spam\}}{\text{argmax}} \left[\sum_{w=1}^W \text{log } \mathbb{P}(\text{Freq}_w|H_i) \right] + \text{log } \mathbb{P}(H_i) \end{align*}\]
(1/2)^1000
## [1] 9.332636e-302
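
To see why the log form matters, a small illustration (the numbers here are arbitrary): the raw product of many word probabilities underflows to exactly zero, while the sum of log-probabilities stays comfortably in range:

prod(rep(1/2, 1100))     # 0: the exact value (~7e-332) is too small for a double
sum(log(rep(1/2, 1100))) # about -762.5, no underflow issue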

Part IV

Summary

Designing Stable Algorithms

  1. Try to avoid subtracting quantities contaminated by error (though such subtractions may be unavoidable).

  2. Minimize the size of intermediate quantities relative to the final solution.

  3. Look for different formulations of a computation that are mathematically but not numerically equivalent. Don’t assume that just because you know a mathematical expression it will be numerically stable.

  4. It is advantageous to express update formulae as \[\text{new-value} = \text{old-value} + \text{small-correction}\] if the small correction can be computed with many correct significant figures. Natural examples: Newton’s method, some ODE solvers.

  5. Use only well-conditioned transformations of the problem. (associated with number 3)

  6. Take precautions to avoid unnecessary overflow and underflow (e.g., use log probabilities).

From: Higham, Nicholas J. Accuracy and stability of numerical algorithms. Vol. 80. Siam, 2002.


  1. Chan, Tony F., Gene H. Golub, and Randall J. LeVeque. “Algorithms for Computing the Sample Variance: Analysis and Recommendations.” The American Statistician, 1983.

  2. Chambers, John M., and Trevor J. Hastie, eds. Statistical models in S. Vol. 251. Pacific Grove, CA: Wadsworth & Brooks/Cole Advanced Books & Software, 1992.