Borislav Hadzhiev

Last updated: Apr 11, 2024

# Input contains infinity or value too large for dtype(float64)

The article addresses the following 2 related errors:

  • ValueError: Input X contains infinity or a value too large for dtype('float64').
  • ValueError: Input X contains NaN.

The Python "ValueError: Input X contains infinity or a value too large for dtype('float64')" occurs when the data you pass to a scikit-learn estimator contains infinite or NaN values.

To solve the error, remove the infinite and NaN values from your matrix before the computation.

Here is an example of how the error occurs.

main.py

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Contains inf and nan
    data = {'A': [1, np.inf, 3], 'B': [4, np.nan, 6], 'C': [7, 8, 9]}

    df = pd.DataFrame(data)

    X = df[['A', 'B']]
    y = df['C']

    model = LinearRegression()

    # ⛔️ ValueError: Input X contains infinity or a value too large for dtype('float64').
    model.fit(X, y)

The code for this article is available on GitHub

Notice that the DataFrame contains inf (infinite) and NaN (not a number) values.

main.py

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Contains inf and nan
    data = {'A': [1, np.inf, 3], 'B': [4, np.nan, 6], 'C': [7, 8, 9]}

    df = pd.DataFrame(data)

    #      A    B  C
    # 0  1.0  4.0  7
    # 1  inf  NaN  8
    # 2  3.0  6.0  9
    print(df)
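
In real data, inf and NaN values are rarely typed in by hand; they tend to come out of arithmetic. The following is a minimal sketch of two common sources, division by zero and floating-point overflow. The column names `total`, `count` and `ratio` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical columns, used only for illustration
df = pd.DataFrame({'total': [10.0, 5.0, 0.0], 'count': [2.0, 0.0, 0.0]})

# Dividing a non-zero number by zero yields inf, and 0 / 0 yields NaN
df['ratio'] = df['total'] / df['count']
print(df)

# Exceeding the largest representable float64 overflows to inf
with np.errstate(over='ignore'):
    overflowed = np.float64(1e308) * 10

print(np.isinf(overflowed))
```

If your own pipeline computes ratios, logarithms or very large products, those steps are a reasonable first place to look.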

We have to remove the infinite and NaN values from the DataFrame before calling the fit() method.

You can check if your DataFrame contains NaN or inf values by using 2 methods:

main.py

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Contains inf and nan
    data = {'A': [1, np.inf, 3], 'B': [4, np.nan, 6], 'C': [7, 8, 9]}

    df = pd.DataFrame(data)

    print(np.any(np.isnan(df)))  # 👉️ True

    print(np.all(np.isfinite(df)))  # 👉️ False

The np.any(np.isnan(df)) expression returns True if the DataFrame contains at least one NaN value.

The np.all(np.isfinite(df)) expression returns False if the DataFrame contains at least one inf or NaN value, because np.isfinite() treats both as non-finite.
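
The two checks above only tell you that a problem exists. As a rough sketch, assuming all columns are numeric, you can also find out where the problematic values are. The variable names `non_finite`, `bad_per_column` and `bad_rows` are illustrative.

```python
import numpy as np
import pandas as pd

data = {'A': [1, np.inf, 3], 'B': [4, np.nan, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

# True where a cell is NaN, inf or -inf
non_finite = ~np.isfinite(df)

# Number of non-finite cells in each column
bad_per_column = non_finite.sum()
print(bad_per_column)

# Rows that contain at least one non-finite cell
bad_rows = df[non_finite.any(axis=1)]
print(bad_rows)
```

Inspecting the offending rows before deleting them can show whether the values are genuine gaps or the result of a bug earlier in your code.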

# Remove the NaN and inf values from the DataFrame to solve the error

You can solve the error by removing the NaN and inf values from the DataFrame.

main.py

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    data = {'A': [1, np.inf, 3], 'B': [4, np.nan, 6], 'C': [7, 8, 9]}

    df = pd.DataFrame(data)

    # Replace the `inf` values with `NaN`
    # (inplace=True is required, otherwise the result is discarded)
    df.replace([np.inf, -np.inf], np.nan, inplace=True)

    # Drop the rows that have `NaN` values
    df.dropna(inplace=True)

    print(np.any(np.isnan(df)))  # 👉️ False

    print(np.all(np.isfinite(df)))  # 👉️ True

    #      A    B  C
    # 0  1.0  4.0  7
    # 2  3.0  6.0  9
    print(df)

I used the following 2 lines to remove the inf and NaN values from the DataFrame.

main.py

    # Replace the `inf` values with `NaN`
    df.replace([np.inf, -np.inf], np.nan, inplace=True)

    # Drop the rows that have `NaN` values
    df.dropna(inplace=True)

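The call to the replace() method replaces inf values with NaN. The method returns a new DataFrame by default, so pass inplace=True (or assign the result back to df) for the change to take effect.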

The call to the dropna() method drops the rows that have NaN values.
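
An alternative to the replace-then-drop approach is to filter rows with a single boolean mask. This is only a sketch, and it assumes every column in the DataFrame is numeric. The names `finite_rows` and `clean_df` are illustrative.

```python
import numpy as np
import pandas as pd

data = {'A': [1, np.inf, 3], 'B': [4, np.nan, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

# np.isfinite() is False for NaN, inf and -inf, so one mask covers all three
finite_rows = np.isfinite(df).all(axis=1)

# Boolean indexing returns a new DataFrame and leaves `df` untouched
clean_df = df[finite_rows]
print(clean_df)
```

Because nothing is modified in place, the original DataFrame stays available if you want to compare it with the cleaned one.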

As the following code sample shows, the DataFrame no longer contains any inf or NaN values.

main.py

    print(np.any(np.isnan(df)))  # 👉️ False

    print(np.all(np.isfinite(df)))  # 👉️ True

    #      A    B  C
    # 0  1.0  4.0  7
    # 2  3.0  6.0  9
    print(df)

Here is a complete example that demonstrates that the error has been resolved.

main.py

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    data = {'A': [1, np.inf, 3], 'B': [4, np.nan, 6], 'C': [7, 8, 9]}

    df = pd.DataFrame(data)

    # Replace the `inf` values with `NaN`
    df.replace([np.inf, -np.inf], np.nan, inplace=True)

    # Drop the rows that have `NaN` values
    df.dropna(inplace=True)

    X = df[['A', 'B']]
    y = df['C']

    model = LinearRegression()

    reg = model.fit(X, y)

    print(reg.score(X, y))  # 👉️ 1.0

You can also define a reusable function that removes the inf and NaN values from a DataFrame.

main.py

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    data = {'A': [1, np.inf, 3], 'B': [4, np.nan, 6], 'C': [7, 8, 9]}

    df = pd.DataFrame(data)


    def remove_inf_nan(df):
        # Replace the `inf` values with `NaN`
        df = df.replace([np.inf, -np.inf], np.nan)

        # Drop the rows that have `NaN` values
        df = df.dropna()

        return df


    df = remove_inf_nan(df)

    print(np.any(np.isnan(df)))  # 👉️ False

    print(np.all(np.isfinite(df)))  # 👉️ True

    #      A    B  C
    # 0  1.0  4.0  7
    # 2  3.0  6.0  9
    print(df)

The remove_inf_nan function takes a DataFrame as a parameter, removes the inf and NaN values and returns the result.
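
If only some columns are passed to the model, you may not want to drop a row because of a bad value in an unrelated column. Here is a hedged sketch of a variant that checks only selected columns. The function name `drop_non_finite_rows` and its `columns` parameter are hypothetical, and the sample data differs slightly from the earlier examples to make the difference visible.

```python
import numpy as np
import pandas as pd


def drop_non_finite_rows(df, columns=None):
    """Return a copy of `df` without rows that hold NaN or inf in `columns`."""
    cols = list(df.columns) if columns is None else list(columns)

    # Treat inf and -inf as missing, but only in the selected columns
    checked = df[cols].replace([np.inf, -np.inf], np.nan)

    # Keep a row only if every selected column holds a usable value
    keep = checked.notna().all(axis=1)
    return df[keep].copy()


data = {'A': [1, np.inf, 3], 'B': [4, 5, np.nan], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

# Only column A is checked, so the row with NaN in B survives
features_only = drop_non_finite_rows(df, columns=['A'])
print(features_only)

# All columns are checked
everything = drop_non_finite_rows(df)
print(everything)
```

The function returns a new DataFrame rather than mutating its argument, so calling it does not change the data held by the caller.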

# Replacing the inf and NaN values

If you instead want to replace the inf and NaN values, use the DataFrame.fillna() method.

main.py

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    data = {'A': [1, np.inf, 3], 'B': [4, np.nan, 6], 'C': [7, 8, 9]}

    df = pd.DataFrame(data)

    # Replace the `inf` values with `NaN`
    df.replace([np.inf, -np.inf], np.nan, inplace=True)

    # 👇️ Replace `NaN` values with 555
    df.fillna(555, inplace=True)

    print(np.any(np.isnan(df)))  # 👉️ False

    print(np.all(np.isfinite(df)))  # 👉️ True

    #        A      B  C
    # 0    1.0    4.0  7
    # 1  555.0  555.0  8
    # 2    3.0    6.0  9
    print(df)

We used the DataFrame.replace() method to replace the inf values with NaN.

We then used the DataFrame.fillna() method to replace the NaN values with the number 555.

The fillna() method fills NA/NaN values.
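
A fixed constant such as 555 is rarely a meaningful value for a model. One possible alternative, shown here only as a sketch, is to fill each column with its own mean. Whether mean imputation suits your data is an assumption you have to check yourself; the names `column_means` and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

data = {'A': [1, np.inf, 3], 'B': [4, np.nan, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

# Turn inf into NaN first so it is excluded from the mean
df = df.replace([np.inf, -np.inf], np.nan)

# mean() skips NaN values by default
column_means = df.mean()

# Passing a Series fills each column with the value under its own label
filled = df.fillna(column_means)
print(filled)
```

Replacing inf before computing the mean matters: a column that still contains inf would have an infinite mean.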

# Try to reset the index of your DataFrame

If the error persists, try to reset the index of your DataFrame before the computation.

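main.py

    # drop=True discards the old index instead of adding it as a column
    df = df.reset_index(drop=True)

The DataFrame.reset_index method resets the index of the given DataFrame or a level of it. By default, the old index is added to the DataFrame as a new column; passing drop=True discards it, which keeps the extra column out of your data.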

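This can resolve the error if it appeared after you removed rows from your DataFrame. The remaining rows keep their original index labels, so a later index-aligned operation (for example, assigning a column from a Series that has a default index) can reintroduce NaN values.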
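
The following sketch illustrates one way leftover index labels can bring NaN values back after rows were dropped. The `new_feature` Series and the column name `B` are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'C': [7, 8, 9]})

# After dropping row 1, the index labels are 0 and 2
cleaned = df.dropna()

# Hypothetical new feature with a default index (labels 0 and 1)
new_feature = pd.Series([10.0, 20.0])

# Assignment aligns on labels: label 2 has no match, so it becomes NaN
misaligned = cleaned.copy()
misaligned['B'] = new_feature
print(misaligned)

# With a fresh index, labels 0 and 1 match and no NaN is introduced
aligned = cleaned.reset_index(drop=True)
aligned['B'] = new_feature
print(aligned)
```

If resetting the index does not help, re-run the NaN and inf checks right before calling fit() to see which step reintroduces the values.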

Copyright © 2024 Borislav Hadzhiev