Comparing Floating Point Numbers in C/C++

Nytro · September 3, 2017

Comparing Floating Point Numbers in C/C++

Published September 2nd, 2017by Elliot Chance

Comparing floating point numbers for equality can be problematic. It’s difficult because often we are comparing small or large numbers that are not represented exactly. There is also issues with rounding errors caused by not being able to represent and exact value.

Rather than doing a strict value comparison (==), we treat two values as equal if there values are very close to each other. So what does “very close" mean? Well to answer that we have to take a look at how the numbers are represented in memory.

Here is an example 32bit float:

And here is the layout of a double precision (64bit) float:

The number of bits in the fraction can be thought of as the number of significant bits (or, accuracy) of the number. We do not want to use all of the fraction bits otherwise we would be doing a strict comparison, but we will use most. For both sized float types I will use 4 less significant bits:

#define INT64 52 - 4
#define INT32 23 - 4

We will use this to calculate the epsilon (the small difference that is still considered equal). However, we have to be careful that the number of bits that we use to calculate the epsilon is based on the smallest precision value in the comparison. For example is we compare a 32bit float with a 64bit float we must use only the precision of the 32bit float.

For this was can use a macro:

#define approx(actual, expected) approxf(actual, expected, \
    sizeof(actual) != sizeof(expected) ? INT32 : INT64)

There are two more gotchas:

Zero is a special case because the epsilon would also be zero causing an exact comparison. So we need to treat this as a separate case.
Non-rational numbers, such as NaNs and infinities are never equal to each other in any combination. If either side is non-rational the comparison is never equal.

static int approxf(double actual, double expected, int bits) {
    // We do not handle infinities or NaN.
    if (isinf(actual) || isinf(expected) || isnan(actual) || isnan(expected)) {
        return 0;
    }

    // If we expect zero (a common case) we have a fixed epsilon from actual. If
    // allowed to continue the epsilon calculated would be zero and we would be
    // doing an exact match which is what we want to avoid.
    if (expected == 0.0) {
        return fabs(actual) < (1 / pow(2, bits));
    }

    // The epsilon is calculated based on significant bits of the actual value.
    // The amount of bits used depends on the original size of the float (in
    // terms of bits) subtracting a few to allow for very slight rounding
    // errors.
    double epsilon = fabs(expected / pow(2, bits));

    // The numbers are considered equal if the absolute difference between them
    // is less than the relative epsilon.
    return fabs(actual - expected) <= epsilon;
}

You can find the full commented solution as part of the test suite in the c2go project.

Thank you for reading. I'd really appreciate any and all feedback, please leave your comments below or consider subscribing. Happy coding.

Sursa: https://elliot.land/post/comparing-floating-point-numbers-in-c-c

Sign In

Comparing Floating Point Numbers in C/C++

Recommended Posts

Nytro

Comparing Floating Point Numbers in C/C++

Join the conversation

Browse

Activity

Pages