Last updated: Apr 12, 2024
Reading timeยท4 min
To remove the non-numeric rows in a column in a Pandas DataFrame
:
pandas.to_numeric()
method to convert the values in the column to
numeric.errors
argument to "coerce"
, so non-numeric values get set to
NaN
.NaN
values using DataFrame.notnull()
.import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 'b', 190.3, 'd'], 'experience': [10, 15, 20, 25] }) print(df) print('-' * 50) only_numeric = df[ pd.to_numeric(df['salary'], errors='coerce').notnull() ] print(only_numeric)
Running the code sample produces the following output.
first_name salary experience 0 Alice 175.1 10 1 Bobby b 15 2 Carl 190.3 20 3 Dan d 25 -------------------------------------------------- first_name salary experience 0 Alice 175.1 10 2 Carl 190.3 20
The pandas.to_numeric() method converts the given argument to a numeric type.
We set the errors
keyword argument to "coerce"
so that values that cannot be
parsed, get set as NaN
.
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 'b', 190.3, 'd'], 'experience': [10, 15, 20, 25] }) # 0 175.1 # 1 NaN # 2 190.3 # 3 NaN # Name: salary, dtype: float64 print(pd.to_numeric(df['salary'], errors='coerce'))
Notice that the 2 values in the "salary"
column that cannot get converted to
numeric values are set as NaN
after calling to_numeric()
are set to NaN
.
The last step is to use the DataFrame.notnull() method to get a mask of boolean values that indicates whether an element is not an NA value.
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 'b', 190.3, 'd'], 'experience': [10, 15, 20, 25] }) # 0 True # 1 False # 2 True # 3 False # Name: salary, dtype: bool print(pd.to_numeric(df['salary'], errors='coerce').notnull())
The method returns True
if the value is not NA and False
otherwise.
You can then use the boolean mask to select the numeric rows.
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 'b', 190.3, 'd'], 'experience': [10, 15, 20, 25] }) only_numeric = df[ pd.to_numeric(df['salary'], errors='coerce').notnull() ] # first_name salary experience # 0 Alice 175.1 10 # 2 Carl 190.3 20 print(only_numeric)
apply()
You can also use the DataFrame.apply() method to remove the non-numeric rows in a column.
First, make sure that you have the numpy module installed.
pip install numpy # or with pip3 pip3 install numpy
Now, import the module and use the
isinstance function with
DataFrame.apply()
.
import pandas as pd import numpy as np df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 'b', 190.3, 'd'], 'experience': [10, 15, 20, 25] }) only_numeric = df[df['salary'].apply( lambda x: isinstance( x, (int, np.int64, float, np.float64))) ] # first_name salary experience # 0 Alice 175.1 10 # 2 Carl 190.3 20 print(only_numeric)
The DataFrame.apply()
method applies a function along an axis of the
DataFrame
.
We used the isinstance()
function to check if each value is numeric.
The resulting DataFrame
only contains the "salary"
rows that store numeric
values.
The code sample considers numeric values ones that have a type of:
int
float
np.float64
You can adjust this in the call to isinstance()
depending on your use case.
isnumeric()
If you only consider integer values numeric, you can also use the str.isnumeric method.
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 180.5, 190.3, 203.3], 'experience': [10, 'b', 20, 'd'] }) only_numeric = df[df['experience'].astype('str').str.isnumeric()] # first_name salary experience # 0 Alice 175.1 10 # 2 Carl 190.3 20 print(only_numeric)
We used the
DataFrame.astype
method to convert the values in the "experience"
column to strings.
The str.isnumeric
method returns True
if all characters in the string are
numeric, and there is at least one character, otherwise False
is returned.
Note that the str.isnumeric()
method returns False
for negative numbers
(they contain a minus) and for floats (they contain a period).
print('10'.isnumeric()) # ๐๏ธ True print('50'.isnumeric()) # ๐๏ธ True print('-100'.isnumeric()) # ๐๏ธ False print('3.14'.isnumeric()) # ๐๏ธ False print('A'.isnumeric()) # ๐๏ธ False
This approach should only be used when you only consider integers to be numeric values.
You can learn more about the related topics by checking out the following tutorials: