Pandas: Remove non-numeric rows in a DataFrame column

avatar
Borislav Hadzhiev

Last updated: Apr 12, 2024
4 min

banner

# Table of Contents

  1. Pandas: Remove non-numeric rows in a DataFrame column
  2. Pandas: Remove non-numeric rows in a column in DataFrame using apply()
  3. Pandas: Remove non-numeric rows in a column in DataFrame using isnumeric()

# Pandas: Remove non-numeric rows in a DataFrame column

To remove the non-numeric rows in a column in a Pandas DataFrame:

  1. Use the pandas.to_numeric() method to convert the values in the column to numeric.
  2. Set the errors argument to "coerce", so non-numeric values get set to NaN.
  3. Remove the NaN values using DataFrame.notnull().
main.py
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 'b', 190.3, 'd'], 'experience': [10, 15, 20, 25] }) print(df) print('-' * 50) only_numeric = df[ pd.to_numeric(df['salary'], errors='coerce').notnull() ] print(only_numeric)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
first_name salary experience 0 Alice 175.1 10 1 Bobby b 15 2 Carl 190.3 20 3 Dan d 25 -------------------------------------------------- first_name salary experience 0 Alice 175.1 10 2 Carl 190.3 20

remove non numeric rows in column in dataframe

The pandas.to_numeric() method converts the given argument to a numeric type.

We set the errors keyword argument to "coerce" so that values that cannot be parsed, get set as NaN.

main.py
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 'b', 190.3, 'd'], 'experience': [10, 15, 20, 25] }) # 0 175.1 # 1 NaN # 2 190.3 # 3 NaN # Name: salary, dtype: float64 print(pd.to_numeric(df['salary'], errors='coerce'))
The code for this article is available on GitHub

Notice that the 2 values in the "salary" column that cannot get converted to numeric values are set as NaN after calling to_numeric() are set to NaN.

The last step is to use the DataFrame.notnull() method to get a mask of boolean values that indicates whether an element is not an NA value.

main.py
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 'b', 190.3, 'd'], 'experience': [10, 15, 20, 25] }) # 0 True # 1 False # 2 True # 3 False # Name: salary, dtype: bool print(pd.to_numeric(df['salary'], errors='coerce').notnull())

The method returns True if the value is not NA and False otherwise.

You can then use the boolean mask to select the numeric rows.

main.py
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 'b', 190.3, 'd'], 'experience': [10, 15, 20, 25] }) only_numeric = df[ pd.to_numeric(df['salary'], errors='coerce').notnull() ] # first_name salary experience # 0 Alice 175.1 10 # 2 Carl 190.3 20 print(only_numeric)

# Pandas: Remove non-numeric rows in a column in DataFrame using apply()

You can also use the DataFrame.apply() method to remove the non-numeric rows in a column.

First, make sure that you have the numpy module installed.

shell
pip install numpy # or with pip3 pip3 install numpy

Now, import the module and use the isinstance function with DataFrame.apply().

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 'b', 190.3, 'd'], 'experience': [10, 15, 20, 25] }) only_numeric = df[df['salary'].apply( lambda x: isinstance( x, (int, np.int64, float, np.float64))) ] # first_name salary experience # 0 Alice 175.1 10 # 2 Carl 190.3 20 print(only_numeric)

remove non numeric rows in column using apply

The code for this article is available on GitHub

The DataFrame.apply() method applies a function along an axis of the DataFrame.

We used the isinstance() function to check if each value is numeric.

The resulting DataFrame only contains the "salary" rows that store numeric values.

The code sample considers numeric values ones that have a type of:

You can adjust this in the call to isinstance() depending on your use case.

# Pandas: Remove non-numeric rows in a column in DataFrame using isnumeric()

If you only consider integer values numeric, you can also use the str.isnumeric method.

main.py
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 180.5, 190.3, 203.3], 'experience': [10, 'b', 20, 'd'] }) only_numeric = df[df['experience'].astype('str').str.isnumeric()] # first_name salary experience # 0 Alice 175.1 10 # 2 Carl 190.3 20 print(only_numeric)

only considering integer values numeric

The code for this article is available on GitHub

We used the DataFrame.astype method to convert the values in the "experience" column to strings.

The str.isnumeric method returns True if all characters in the string are numeric, and there is at least one character, otherwise False is returned.

Note that the str.isnumeric() method returns False for negative numbers (they contain a minus) and for floats (they contain a period).

main.py
print('10'.isnumeric()) # ๐Ÿ‘‰๏ธ True print('50'.isnumeric()) # ๐Ÿ‘‰๏ธ True print('-100'.isnumeric()) # ๐Ÿ‘‰๏ธ False print('3.14'.isnumeric()) # ๐Ÿ‘‰๏ธ False print('A'.isnumeric()) # ๐Ÿ‘‰๏ธ False

This approach should only be used when you only consider integers to be numeric values.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.
book cover
You can use the search field on my Home Page to filter through all of my articles.

Copyright ยฉ 2024 Borislav Hadzhiev