Pandas: Remove non-numeric rows in a DataFrame column

# Table of Contents

# Pandas: Remove non-numeric rows in a DataFrame column

To remove the non-numeric rows in a column in a Pandas DataFrame:

Use the pandas.to_numeric() method to convert the values in the column to numeric.
Set the errors argument to "coerce", so non-numeric values get set to NaN.
Remove the NaN values using DataFrame.notnull().

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'salary': [175.1, 'b', 190.3, 'd'],
    'experience': [10, 15, 20, 25]
})

print(df)

print('-' * 50)

only_numeric = df[
    pd.to_numeric(df['salary'], errors='coerce').notnull()
]

print(only_numeric)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

Copied!
  first_name salary  experience
0      Alice  175.1          10
1      Bobby      b          15
2       Carl  190.3          20
3        Dan      d          25
--------------------------------------------------
  first_name salary  experience
0      Alice  175.1          10
2       Carl  190.3          20

remove non numeric rows in column in dataframe

The pandas.to_numeric() method converts the given argument to a numeric type.

We set the errors keyword argument to "coerce" so that values that cannot be parsed, get set as NaN.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'salary': [175.1, 'b', 190.3, 'd'],
    'experience': [10, 15, 20, 25]
})

# 0    175.1
# 1      NaN
# 2    190.3
# 3      NaN
# Name: salary, dtype: float64
print(pd.to_numeric(df['salary'], errors='coerce'))

The code for this article is available on GitHub

Notice that the 2 values in the "salary" column that cannot get converted to numeric values are set as NaN after calling to_numeric() are set to NaN.

The last step is to use the DataFrame.notnull() method to get a mask of boolean values that indicates whether an element is not an NA value.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'salary': [175.1, 'b', 190.3, 'd'],
    'experience': [10, 15, 20, 25]
})

# 0     True
# 1    False
# 2     True
# 3    False
# Name: salary, dtype: bool
print(pd.to_numeric(df['salary'], errors='coerce').notnull())

The method returns True if the value is not NA and False otherwise.

You can then use the boolean mask to select the numeric rows.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'salary': [175.1, 'b', 190.3, 'd'],
    'experience': [10, 15, 20, 25]
})

only_numeric = df[
    pd.to_numeric(df['salary'], errors='coerce').notnull()
]

#   first_name salary  experience
# 0      Alice  175.1          10
# 2       Carl  190.3          20
print(only_numeric)

# Pandas: Remove non-numeric rows in a column in DataFrame using `apply()`

You can also use the DataFrame.apply() method to remove the non-numeric rows in a column.

First, make sure that you have the numpy module installed.

shell

Copied!
pip install numpy

# or with pip3
pip3 install numpy

Now, import the module and use the isinstance function with DataFrame.apply().

main.py

Copied!
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'salary': [175.1, 'b', 190.3, 'd'],
    'experience': [10, 15, 20, 25]
})

only_numeric = df[df['salary'].apply(
    lambda x: isinstance(
        x, (int, np.int64, float, np.float64)))
]

#   first_name salary  experience
# 0      Alice  175.1          10
# 2       Carl  190.3          20
print(only_numeric)

remove non numeric rows in column using apply

The code for this article is available on GitHub

The DataFrame.apply() method applies a function along an axis of the DataFrame.

We used the isinstance() function to check if each value is numeric.

The resulting DataFrame only contains the "salary" rows that store numeric values.

The code sample considers numeric values ones that have a type of:

int
np.int64
float
np.float64

You can adjust this in the call to isinstance() depending on your use case.

# Pandas: Remove non-numeric rows in a column in DataFrame using `isnumeric()`

If you only consider integer values numeric, you can also use the str.isnumeric method.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'salary': [175.1, 180.5, 190.3, 203.3],
    'experience': [10, 'b', 20, 'd']
})

only_numeric = df[df['experience'].astype('str').str.isnumeric()]

#   first_name  salary experience
# 0      Alice   175.1         10
# 2       Carl   190.3         20
print(only_numeric)

only considering integer values numeric

The code for this article is available on GitHub

We used the DataFrame.astype method to convert the values in the "experience" column to strings.

The str.isnumeric method returns True if all characters in the string are numeric, and there is at least one character, otherwise False is returned.

Note that the str.isnumeric() method returns False for negative numbers (they contain a minus) and for floats (they contain a period).

main.py

Copied!
print('10'.isnumeric())  # 👉️ True
print('50'.isnumeric())  # 👉️ True

print('-100'.isnumeric())  # 👉️ False
print('3.14'.isnumeric())  # 👉️ False
print('A'.isnumeric())  # 👉️ False

This approach should only be used when you only consider integers to be numeric values.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.

You can use the search field on my Home Page to filter through all of my articles.

Pandas: Remove non-numeric rows in a DataFrame column

# Table of Contents