Pandas: Strip whitespace from Column Headers in DataFrame

# Table of Contents

# Pandas: Strip whitespace from Column Headers in DataFrame

To strip the whitespace from the column headers in a Pandas DataFrame:

Use the DataFrame.rename() method to rename the columns of the DataFrame.
Set the columns argument to a lambda function that calls the str.strip() method on each column.

main.py

Copied!
import pandas as pd


df = pd.DataFrame({
    '  name   ': ['Alice', 'Bobby', 'Carl', 'Dan'],
    ' experience   ': [1, 3, 5, 7],
    'salary    ': [175.1, 179.4, 190.3, 199.3],
})

print(df)

df = df.rename(columns=lambda x: x.strip())

print('-' * 50)

print(df)

The code for this article is available on GitHub

Running the code sample with python main.py produces the following output.

shell

Copied!
    name      experience     salary
0     Alice               1       175.1
1     Bobby               3       179.4
2      Carl               5       190.3
3       Dan               7       199.3
--------------------------------------------------
    name  experience  salary
0  Alice           1   175.1
1  Bobby           3   179.4
2   Carl           5   190.3
3    Dan           7   199.3

strip whitespace from headers in pandas dataframe

The DataFrame.rename() method is used to rename the columns or index labels of a DataFrame.

The columns parameter of the method can be set to a function.

main.py

Copied!
df = df.rename(columns=lambda x: x.strip())

The supplied lambda gets called with each column name and uses the str.strip() method to remove the leading and trailing whitespace.

main.py

Copied!
# 👇️ 'name'
print(repr('   name  '.strip()))

The str.strip method returns a copy of the string with the leading and trailing whitespace removed.

# Checking if each column name is a string before calling `str.strip()`

If your DataFrame column names might be of different types, you should check if each column name is of type str before calling str.strip().

main.py

Copied!
import pandas as pd


df = pd.DataFrame({
    '  name   ': ['Alice', 'Bobby', 'Carl', 'Dan'],
    ' experience   ': [1, 3, 5, 7],
    'salary    ': [175.1, 179.4, 190.3, 199.3],
})

# ['  name   ', ' experience   ', 'salary    ']
print(list(df.columns))

df = df.rename(columns=lambda x: x.strip()
               if isinstance(x, str) else x)

print('-' * 50)

# ['name', 'experience', 'salary']
print(list(df.columns))

The code for this article is available on GitHub

We used the isinstance() function to check if each column name is of type str.

If the condition is met, we return the result of calling str.strip() on the column name, otherwise, we return the column name as is.

# Pandas: Strip whitespace from Column Headers using `df.columns`

You can also use the DataFrame.columns attribute to strip the whitespace from the column headers in a DataFrame.

main.py

Copied!
import pandas as pd


df = pd.DataFrame({
    '  name   ': ['Alice', 'Bobby', 'Carl', 'Dan'],
    ' experience   ': [1, 3, 5, 7],
    'salary    ': [175.1, 179.4, 190.3, 199.3],
})

print(df)

df.columns = df.columns.str.strip()

print('-' * 50)

print(df)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

Copied!
    name      experience     salary
0     Alice               1       175.1
1     Bobby               3       179.4
2      Carl               5       190.3
3       Dan               7       199.3
--------------------------------------------------
    name  experience  salary
0  Alice           1   175.1
1  Bobby           3   179.4
2   Carl           5   190.3
3    Dan           7   199.3

strip whitespace from headers in dataframe using df columns

The DataFrame.columns attribute returns the column labels of the DataFrame.

main.py

Copied!
import pandas as pd


df = pd.DataFrame({
    '  name   ': ['Alice', 'Bobby', 'Carl', 'Dan'],
    ' experience   ': [1, 3, 5, 7],
    'salary    ': [175.1, 179.4, 190.3, 199.3],
})

# Index(['  name   ', ' experience   ', 'salary    '], dtype='object')
print(df.columns)

We called the str.strip() method on the Index object to strip the leading and trailing whitespace from each column name.

main.py

Copied!
df.columns = df.columns.str.strip()

You can use the DataFrame.columns attribute to print the columns before and after removing the leading and trailing whitespace.

main.py

Copied!
import pandas as pd


df = pd.DataFrame({
    '  name   ': ['Alice', 'Bobby', 'Carl', 'Dan'],
    ' experience   ': [1, 3, 5, 7],
    'salary    ': [175.1, 179.4, 190.3, 199.3],
})

# ['  name   ', ' experience   ', 'salary    ']
print(list(df.columns))

df.columns = df.columns.str.strip()

print('-' * 50)

# ['name', 'experience', 'salary']
print(list(df.columns))

Using this approach should be a bit faster than using the lambda function from the previous subheading, in case you need to optimize for performance.

# Stripping whitespace from specific columns in a DataFrame

You can also call the str.strip() method on a specific column to strip the leading and trailing whitespace from the column's values.

main.py

Copied!
import pandas as pd


df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 3, 5, 7],
    'salary    ': [175.1, 179.4, 190.3, 199.3],
    'tasks': ['    dev    ', '    test     ', 'ship       ', 'preview  ']
})

print(df)

df['name'] = df['name'].str.strip()

df['tasks'] = df['tasks'].str.strip()

print('-' * 50)

print(df)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

Copied!
    name  experience  salary              tasks
0  Alice           1       175.1        dev
1  Bobby           3       179.4      test
2   Carl           5       190.3    ship
3    Dan           7       199.3      preview
--------------------------------------------------
    name  experience  salary        tasks
0  Alice           1       175.1      dev
1  Bobby           3       179.4     test
2   Carl           5       190.3     ship
3    Dan           7       199.3  preview

strip whitespace from specific column values in pandas dataframe

We used square brackets to access specific columns that contain string values and then called the str.strip() method to strip the leading and trailing whitespace.

Make sure the column's values are of type str, otherwise, you'd get the error:

AttributeError: Can only use .str accessor with string values.

# Stripping initial spaces when using `pandas.read_csv`

If you need to strip the initial spaces when using pandas.read_csv, set the skipinitialspace argument to True.

Suppose we have the following employees.csv file.

employees.csv

Copied!
first_name,last_name,date
   Alice,Smith,2023-01-05
      Bobby,  Hadz,2023-03-25
 Carl,Lemon,  2021-01-24

Here is how you can skip the initial spaces by setting skipinitialspace to True.

main.py

Copied!
import pandas as pd


df = pd.read_csv('employees.csv', skipinitialspace=True)

#   first_name last_name        date
# 0      Alice     Smith  2023-01-05
# 1      Bobby      Hadz  2023-03-25
# 2       Carl     Lemon  2021-01-24
print(df)

The code for this article is available on GitHub

The argument makes it so the spaces after the delimiter are skipped.

# Stripping whitespace from specific columns in a DataFrame using `str.replace`

If you need to strip all whitespace from specific columns in a DataFrame, use the str.replace() method.

main.py

Copied!
import pandas as pd


df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 3, 5, 7],
    'salary    ': [175.1, 179.4, 190.3, 199.3],
    'tasks': ['    dev    ', '    test     ', 'ship       ', 'preview  ']
})

print(df)

df['name'] = df['name'].str.replace(' ', '')

df['tasks'] = df['tasks'].str.replace(' ', '')

print('-' * 50)

print(df)

Running the code sample produces the following output.

shell

Copied!
    name  experience  salary              tasks
0  Alice           1       175.1        dev
1  Bobby           3       179.4      test
2   Carl           5       190.3    ship
3    Dan           7       199.3      preview
--------------------------------------------------
    name  experience  salary        tasks
0  Alice           1       175.1      dev
1  Bobby           3       179.4     test
2   Carl           5       190.3     ship
3    Dan           7       199.3  preview

strip all spaces from column values in pandas dataframe

Note that the str.replace method removes all spaces from the string values in the DataFrame, not just the leading and trailing ones (as str.strip() does).

# Stripping whitespace from a Pandas DataFrame with converters

You can also use converters to strip the whitespace from the column values in a Pandas DataFrame.

Suppose we have the following employees.csv file.

employees.csv

Copied!
first_name,last_name,date
   Alice,Smith,2023-01-05
      Bobby,  Hadz,2023-03-25
 Carl,Lemon,  2021-01-24

And here is the related main.py file.

main.py

Copied!
import pandas as pd


df = pd.read_csv('employees.csv', converters={
    'first_name': str.strip,
    'last_name': str.strip,
    'date': str.strip
})

#   first_name last_name        date
# 0      Alice     Smith  2023-01-05
# 1      Bobby      Hadz  2023-03-25
# 2       Carl     Lemon  2021-01-24
print(df)

The code for this article is available on GitHub

The converters argument can be set to a dictionary of column names pointing to functions.

Each column name points to the str.strip() function which will remove the leading and trailing whitespace when parsing the CSV file.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.

You can use the search field on my Home Page to filter through all of my articles.

Pandas: Strip whitespace from Column Headers in DataFrame

# Table of Contents