Select all Columns starting with a given String in Pandas

avatar
Borislav Hadzhiev

Last updated: Apr 12, 2024
5 min

banner

# Table of Contents

  1. Select all Columns starting with a given String in Pandas
  2. Select all Columns starting with a given String by creating a Series
  3. Select all Columns starting with a given String using DataFrame.loc
  4. Select all Columns starting with a given String using DataFrame.filter()

# Select all Columns starting with a given String in Pandas

To select all columns starting with a given string in a Pandas DataFrame:

  1. Use a list comprehension to iterate over the DataFrame.
  2. Use the str.startswith() method to check if each column name starts with the given string.
  3. Optionally use bracket notation to select the matching columns.
main.py
import pandas as pd df = pd.DataFrame({ 'Employee_ID': [1, 2, 3, 4], 'Employee_Name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'Age': [29, 30, 31, 32], 'Employee_Salary': [1500, 1000, 2000, 3000], }) column_names = [column for column in df if column.startswith('Employee')] # ๐Ÿ‘‡๏ธ ['Employee_ID', 'Employee_Name', 'Employee_Salary'] print(column_names) print('-' * 50) # Employee_ID Employee_Name Employee_Salary # 0 1 Alice 1500 # 1 2 Bobby 1000 # 2 3 Carl 2000 # 3 4 Dan 3000 print(df[column_names])

select all columns starting with given string

The code for this article is available on GitHub

We used a list comprehension to iterate over the DataFrame.

List comprehensions are used to perform some operation for every element or select a subset of elements that meet a condition.

On each iteration, we use the str.startswith method to check if the current column name starts with a given string and return the result.

You can then use bracket notation if you only want to select the matching columns from the DataFrame.

Note that the str.startswith() method is case-sensitive.

If you need to select all columns starting with a given string in a case-insensitive manner, convert the column name and the substring to lowercase.

main.py
import pandas as pd df = pd.DataFrame({ 'Employee_ID': [1, 2, 3, 4], 'Employee_Name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'Age': [29, 30, 31, 32], 'Employee_Salary': [1500, 1000, 2000, 3000], }) column_names = [column for column in df if column.lower().startswith('EMPLOYEE'.lower())] # ๐Ÿ‘‡๏ธ ['Employee_ID', 'Employee_Name', 'Employee_Salary'] print(column_names) print('-' * 50) # Employee_ID Employee_Name Employee_Salary # 0 1 Alice 1500 # 1 2 Bobby 1000 # 2 3 Carl 2000 # 3 4 Dan 3000 print(df[column_names])
The code for this article is available on GitHub

By converting each column name and the substring we are checking for to lowercase, we are able to perform a case-insensitive string comparison.

# Select all Columns starting with a given String by creating a Series

You can also create a Series and use str.startswith to select all columns that start with a given string.

main.py
import pandas as pd df = pd.DataFrame({ 'Employee_ID': [1, 2, 3, 4], 'Employee_Name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'Age': [29, 30, 31, 32], 'Employee_Salary': [1500, 1000, 2000, 3000], }) column_names = df.columns[ pd.Series(df.columns).str.startswith('Employee') ] # ๐Ÿ‘‡๏ธ Index(['Employee_ID', 'Employee_Name', 'Employee_Salary'], dtype='object') print(column_names) print('-' * 50) # Employee_ID Employee_Name Employee_Salary # 0 1 Alice 1500 # 1 2 Bobby 1000 # 2 3 Carl 2000 # 3 4 Dan 3000 print(df[column_names])

select all columns starting with string by creating series

The code for this article is available on GitHub

We used the pandas.Series() method to create a Series from the columns of the DataFrame.

main.py
import pandas as pd df = pd.DataFrame({ 'Employee_ID': [1, 2, 3, 4], 'Employee_Name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'Age': [29, 30, 31, 32], 'Employee_Salary': [1500, 1000, 2000, 3000], }) # 0 Employee_ID # 1 Employee_Name # 2 Age # 3 Employee_Salary # dtype: object print(pd.Series(df.columns))

We are then able to use the Series.str.startswith() method to check if the start of each string element matches a pattern.

You can use bracket notation to select the matching columns from the DataFrame.

main.py
# Employee_ID Employee_Name Employee_Salary # 0 1 Alice 1500 # 1 2 Bobby 1000 # 2 3 Carl 2000 # 3 4 Dan 3000 print(df[column_names])

# Select all Columns starting with a given String using DataFrame.loc

You can also use the DataFrame.loc indexer to select all columns starting with a given string.

main.py
import pandas as pd df = pd.DataFrame({ 'Employee_ID': [1, 2, 3, 4], 'Employee_Name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'Age': [29, 30, 31, 32], 'Employee_Salary': [1500, 1000, 2000, 3000], }) df2 = df.loc[:, df.columns.str.startswith('Employee')] # Employee_ID Employee_Name Employee_Salary # 0 1 Alice 1500 # 1 2 Bobby 1000 # 2 3 Carl 2000 # 3 4 Dan 3000 print(df2)
The code for this article is available on GitHub

The DataFrame.loc indexer is primarily label-based but can also be used with a boolean array.

We used the DataFrame.columns.str.startswith() method to get a boolean array with True values for the matching column names.

main.py
import pandas as pd df = pd.DataFrame({ 'Employee_ID': [1, 2, 3, 4], 'Employee_Name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'Age': [29, 30, 31, 32], 'Employee_Salary': [1500, 1000, 2000, 3000], }) # ๐Ÿ‘‡๏ธ [ True True False True] print(df.columns.str.startswith('Employee'))

The df2 variable only stores the matching columns.

main.py
df2 = df.loc[:, df.columns.str.startswith('Employee')] # Employee_ID Employee_Name Employee_Salary # 0 1 Alice 1500 # 1 2 Bobby 1000 # 2 3 Carl 2000 # 3 4 Dan 3000 print(df2)

# Select all Columns starting with a given String using DataFrame.filter()

You can also use the DataFrame.filter() method to select all columns starting with a given string.

main.py
import pandas as pd df = pd.DataFrame({ 'Employee_ID': [1, 2, 3, 4], 'Employee_Name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'Age': [29, 30, 31, 32], 'Employee_Salary': [1500, 1000, 2000, 3000], }) # Employee_ID Employee_Name Employee_Salary # 0 1 Alice 1500 # 1 2 Bobby 1000 # 2 3 Carl 2000 # 3 4 Dan 3000 print(df.filter(regex=r'^Employee'))
The code for this article is available on GitHub

The DataFrame.filter() method returns a subset of the DataFrame rows or columns according to the specified index labels.

The regex argument can be set to a regular expression to only keep the labels from the specified axis for which re.search(regex, label) == True.

The caret ^ matches the beginning of the input, so the regex matches column names starting with "Employee".

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.
book cover
You can use the search field on my Home Page to filter through all of my articles.

Copyright ยฉ 2024 Borislav Hadzhiev