Checking Against Infinity and NaN (not a number)
Validating a Sequence has no Infinity or Not a Number Values#
Given a sequence of floats, it may be necessary to check that there are only finite inputs, since many functions behave poorly with inputs of infinity or not a number (NaN). To do this, I wrote a quick function to check against three values: math.inf, -math.inf and math.nan.
from collections.abc import Sequence
import math

def sequence_values_are_finite(input_seq: Sequence[float]) -> bool:
    """
    Given an input sequence of floats checks that none of the values
    are infinity, negative infinity, or not a number (therefore finite).
    Returns True if all values are finite, otherwise False.
    """
    for val in input_seq:
        if val in [math.inf, -math.inf, math.nan]:
            return False
    return True
I also wrote a series of tests to show that everything works as intended:
import math
import unittest

class TestSequenceValuesAreFinite(unittest.TestCase):
    def test_empty_sequence(self) -> None:
        self.assertTrue(sequence_values_are_finite([]))

    def test_valid_sequence(self) -> None:
        self.assertTrue(sequence_values_are_finite([1.0, 0.0, -42.0, 47102378931.0]))

    def test_has_math_infinity(self) -> None:
        self.assertFalse(sequence_values_are_finite([1.0, math.inf, -2.0]))

    def test_has_negative_math_infinity(self) -> None:
        self.assertFalse(sequence_values_are_finite([1.0, -2.0, -math.inf]))

    def test_has_math_nan(self) -> None:
        self.assertFalse(sequence_values_are_finite([math.nan, 1.0, -2.0]))
Not long after, I started getting reports that NaNs were not being caught by the above function, so I added another test:
    def test_has_float_nan(self) -> None:
        self.assertFalse(sequence_values_are_finite([float("nan"), 1.0, -2.0]))
This test fails: float("nan") is not matched by the check against math.nan, yet for some reason math.nan itself does match.
To convince myself that this isn’t a difference between creating the values manually with float and taking them from the math library, I created tests that use float for the infinities, and these pass. Thus, it looks like only NaNs exhibit this behavior:
    def test_has_float_infinity(self) -> None:
        self.assertFalse(sequence_values_are_finite([float("inf"), 1.0, -2.0]))

    def test_has_float_negative_infinity(self) -> None:
        self.assertFalse(sequence_values_are_finite([float("-inf"), 1.0, -2.0]))
The function and all tests are in original_validation_code.py on GitHub.
How does in behave?#
In my post about iterable membership checks, I discussed the different ways in can go through objects to determine membership, but I didn’t discuss how it does comparisons.
The membership test operations documentation details how this works. Personally, I always thought it used the value comparison operator (==), but that is only half of it. Specifically, it performs both identity and value equality checks. The check passes if either val1 is val2 or val1 == val2 is True.
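The rule can be sketched in plain Python. The helper name contains below is hypothetical and only approximates what in does for a list or tuple; it is an illustration of the documented rule, not CPython's actual implementation.

```python
import math
from collections.abc import Iterable


def contains(container: Iterable[object], target: object) -> bool:
    """Hypothetical stand-in for `target in container` on a list or tuple."""
    # Identity is tried first, then value equality, mirroring the documented rule.
    return any(item is target or item == target for item in container)


same_nan = math.nan
other_nan = float("nan")  # a distinct object that also holds NaN

print(contains([same_nan], same_nan))   # identity matches
print(contains([same_nan], other_nan))  # neither identity nor equality matches
print(same_nan in [same_nan], other_nan in [same_nan])  # the built-in agrees
```

The second call is the interesting one: the two NaN objects are different objects and also compare unequal, so neither half of the rule succeeds.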
Going back to the original code, I then modified it in value_equality_validation_code.py to not use in but just check against value equality:
def value_equality_based_sequence_values_are_finite(input_seq: Sequence[float]) -> bool:
    for val in input_seq:
        if val == math.inf or val == -math.inf or val == math.nan:
            return False
    return True
With the same tests as above, now neither math.nan nor float("nan") is detected, so what is the difference?
IEEE 754#
IEEE 754 is the “IEEE Standard for Floating-Point Arithmetic”, which is the source of our interesting conundrum. The IEEE 754 Wikipedia article goes into more detail, but from the separate article for NaN or the Python math.nan documentation we learn that NaNs are not equal to anything in IEEE 754, not even themselves.
Looking at value comparisons between different nan objects, we see the result is always False:
>>> math.nan == float("nan")
False
>>> float("nan") == float("nan")
False
>>> math.nan == math.nan
False
Therefore, we cannot compare one NaN to another NaN for equality. Instead, the math.isnan function should be used to test whether a value is nan.
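As a small illustration, math.isnan gives the same answer no matter how the NaN was produced. The list of candidates below is made up for this sketch.

```python
import math

# Three NaNs with different origins, followed by two values that are not NaN.
candidates = [math.nan, float("nan"), float("inf") * 0, 1.0, math.inf]

# math.isnan inspects the value itself, so object identity is irrelevant.
flags = [math.isnan(value) for value in candidates]
print(flags)  # [True, True, True, False, False]

# The self-inequality that breaks == can also serve as a NaN test,
# although math.isnan states the intent more clearly.
self_unequal = [value != value for value in candidates]
print(self_unequal == flags)  # True
```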
How does is behave with nan?#
IEEE 754 explains why a check with equality will always fail to find a nan, but in the original code, in also compared by identity, which IEEE 754 does not govern! In this case, if two names both reference the same nan object, the is evaluation will return True.
float("nan") creates a new object each time. This is also true for math computations that produce a nan, like float("inf") * 0. math.nan is an attribute of the math module that holds a reference to a single object whose value is nan. Thus, is comparisons between two references that both came from math.nan evaluate to True, while other comparisons evaluate to False. Running tests similar to the value equality ones above, we see that only math.nan is math.nan evaluates to True, while the others evaluate to False.
>>> math.nan is float("nan")
False
>>> float("nan") is float("nan")
False
>>> math.nan is math.nan
True
Therefore, the difference between how is behaves for math.nan and how the == comparison behaves is why our original function worked with in for comparisons to math.nan but not float("nan").
What is math.nan?#
I’d like to make a minor detour to show how Python can cause problems if you are not careful or there is a malicious actor. The source code for CPython shows that nan is added with PyModule_Add. According to the documentation, PyModule_Add is new in 3.13 and acts similarly to PyModule_AddObjectRef, which will “Add an object to module as name” per its documentation.
This means that unlike some other values, like False, math.nan is just a normal module attribute. When the PyModule_Add call runs, a new nan float is created and assigned to math.nan. As a result, you can overwrite math.nan like most other objects in a module!
Putting a simple line like math.nan = 42 anywhere would break every later lookup of math.nan in the same process once that line has run, because all importers share the same module object. A quick manual run shows this behavior is not protected:
>>> math.nan = 42
>>> math.isnan(math.nan)
False
>>> math.nan == 42
True
The lesson of the day is that Python will sometimes let you do things that you shouldn’t. It would be very easy to overwrite math.nan (or any of the other “constants” in math like e, pi, tau, etc.) and cause very strange and unexpected behavior for later callers.
How to fix the code?#
As discussed earlier, there is the math.isnan function. Similarly, there is math.isinf, which checks whether a number is positive or negative infinity. Instead of using two checks, it is possible to use math.isfinite, which does both at the same time. This lets our old code map to a one-line solution using all, map, and math.isfinite to check all values easily and properly:
def fixed_sequence_values_are_finite(input_seq: Sequence[float]) -> bool:
    """
    Given an input sequence of floats checks that none of the values
    are infinity, negative infinity, or not a number (therefore finite).
    Returns True if all values are finite, otherwise False.
    """
    return all(map(math.isfinite, input_seq))
The tests are the same as for the original code and now all pass. The code and tests are all in fixed_validation_code.py on GitHub.
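As a quick check outside the unit tests, the sketch below restates the fixed function so it runs on its own and feeds it NaNs and infinities from different origins. The sample labels and values are made up for illustration.

```python
import math
from collections.abc import Sequence


def fixed_sequence_values_are_finite(input_seq: Sequence[float]) -> bool:
    # Same one-liner as the fixed function, repeated so this snippet is standalone.
    return all(map(math.isfinite, input_seq))


samples = {
    "finite only": [1.0, 0.0, -42.0],
    "math.nan": [1.0, math.nan],
    "float('nan')": [1.0, float("nan")],
    "computed nan": [1.0, float("inf") - float("inf")],
    "negative infinity": [float("-inf"), 1.0],
    "empty": [],
}

results = {
    label: fixed_sequence_values_are_finite(values)
    for label, values in samples.items()
}
for label, is_finite in results.items():
    print(f"{label}: {is_finite}")
```

Only the "finite only" and "empty" samples should report True; every NaN is rejected regardless of where it came from.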
Conclusion#
nan is a special case that needs to be treated with care. Since nan does not equal itself due to IEEE 754, comparisons must be done with math.isnan or, if checking for finite values, math.isfinite.
It is also important to understand that in checks not just value equality (==) but also identity (is). While nan is a special case that does not equal itself, other Python objects can show a similar mismatch between identity and equality. For example, dataclasses declared with eq set to False do not get a generated __eq__ method and fall back to identity comparison, so in finds the very same instance but not a separately constructed one with identical fields.
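A minimal sketch of the dataclass case follows. The Reading class is hypothetical and exists only for this example.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Reading:
    """Hypothetical record type; eq=False means no __eq__ is generated."""

    value: float


first = Reading(1.0)
twin = Reading(1.0)  # same field values, different object

stored = [first]
print(first in stored)  # True: the identity check succeeds
print(twin in stored)   # False: the inherited __eq__ compares by identity
print(first == twin)    # False for the same reason
```

Unlike nan, such an instance is still equal to itself; the surprise is that a membership test only ever finds the same object, never an equivalent one.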