Last updated: Apr 12, 2024
Reading timeยท5 min

To select all columns starting with a given string in a Pandas DataFrame:
DataFrame.str.startswith() method to check if each column name starts with
the given string.import pandas as pd df = pd.DataFrame({ 'Employee_ID': [1, 2, 3, 4], 'Employee_Name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'Age': [29, 30, 31, 32], 'Employee_Salary': [1500, 1000, 2000, 3000], }) column_names = [column for column in df if column.startswith('Employee')] # ๐๏ธ ['Employee_ID', 'Employee_Name', 'Employee_Salary'] print(column_names) print('-' * 50) # Employee_ID Employee_Name Employee_Salary # 0 1 Alice 1500 # 1 2 Bobby 1000 # 2 3 Carl 2000 # 3 4 Dan 3000 print(df[column_names])

We used a
list comprehension to
iterate over the DataFrame.
On each iteration, we use the str.startswith method to check if the current column name starts with a given string and return the result.
You can then use bracket notation if you only want to select the matching
columns from the DataFrame.
Note that the str.startswith() method is case-sensitive.
If you need to select all columns starting with a given string in a case-insensitive manner, convert the column name and the substring to lowercase.
import pandas as pd df = pd.DataFrame({ 'Employee_ID': [1, 2, 3, 4], 'Employee_Name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'Age': [29, 30, 31, 32], 'Employee_Salary': [1500, 1000, 2000, 3000], }) column_names = [column for column in df if column.lower().startswith('EMPLOYEE'.lower())] # ๐๏ธ ['Employee_ID', 'Employee_Name', 'Employee_Salary'] print(column_names) print('-' * 50) # Employee_ID Employee_Name Employee_Salary # 0 1 Alice 1500 # 1 2 Bobby 1000 # 2 3 Carl 2000 # 3 4 Dan 3000 print(df[column_names])
By converting each column name and the substring we are checking for to lowercase, we are able to perform a case-insensitive string comparison.
SeriesYou can also create a Series and use str.startswith to select all columns
that start with a given string.
import pandas as pd df = pd.DataFrame({ 'Employee_ID': [1, 2, 3, 4], 'Employee_Name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'Age': [29, 30, 31, 32], 'Employee_Salary': [1500, 1000, 2000, 3000], }) column_names = df.columns[ pd.Series(df.columns).str.startswith('Employee') ] # ๐๏ธ Index(['Employee_ID', 'Employee_Name', 'Employee_Salary'], dtype='object') print(column_names) print('-' * 50) # Employee_ID Employee_Name Employee_Salary # 0 1 Alice 1500 # 1 2 Bobby 1000 # 2 3 Carl 2000 # 3 4 Dan 3000 print(df[column_names])

We used the
pandas.Series()
method to create a Series from the columns of the DataFrame.
import pandas as pd df = pd.DataFrame({ 'Employee_ID': [1, 2, 3, 4], 'Employee_Name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'Age': [29, 30, 31, 32], 'Employee_Salary': [1500, 1000, 2000, 3000], }) # 0 Employee_ID # 1 Employee_Name # 2 Age # 3 Employee_Salary # dtype: object print(pd.Series(df.columns))
We are then able to use the Series.str.startswith() method to check if the start of each string element matches a pattern.
You can use bracket notation to select the matching columns from the
DataFrame.
# Employee_ID Employee_Name Employee_Salary # 0 1 Alice 1500 # 1 2 Bobby 1000 # 2 3 Carl 2000 # 3 4 Dan 3000 print(df[column_names])
DataFrame.locYou can also use the DataFrame.loc indexer to select all columns starting with a given string.
import pandas as pd df = pd.DataFrame({ 'Employee_ID': [1, 2, 3, 4], 'Employee_Name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'Age': [29, 30, 31, 32], 'Employee_Salary': [1500, 1000, 2000, 3000], }) df2 = df.loc[:, df.columns.str.startswith('Employee')] # Employee_ID Employee_Name Employee_Salary # 0 1 Alice 1500 # 1 2 Bobby 1000 # 2 3 Carl 2000 # 3 4 Dan 3000 print(df2)
The DataFrame.loc indexer is primarily label-based but can also be used with a
boolean array.
We used the DataFrame.columns.str.startswith() method to get a boolean array
with True values for the matching column names.
import pandas as pd df = pd.DataFrame({ 'Employee_ID': [1, 2, 3, 4], 'Employee_Name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'Age': [29, 30, 31, 32], 'Employee_Salary': [1500, 1000, 2000, 3000], }) # ๐๏ธ [ True True False True] print(df.columns.str.startswith('Employee'))
The df2 variable only stores the matching columns.
df2 = df.loc[:, df.columns.str.startswith('Employee')] # Employee_ID Employee_Name Employee_Salary # 0 1 Alice 1500 # 1 2 Bobby 1000 # 2 3 Carl 2000 # 3 4 Dan 3000 print(df2)
DataFrame.filter()You can also use the DataFrame.filter() method to select all columns starting with a given string.
import pandas as pd df = pd.DataFrame({ 'Employee_ID': [1, 2, 3, 4], 'Employee_Name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'Age': [29, 30, 31, 32], 'Employee_Salary': [1500, 1000, 2000, 3000], }) # Employee_ID Employee_Name Employee_Salary # 0 1 Alice 1500 # 1 2 Bobby 1000 # 2 3 Carl 2000 # 3 4 Dan 3000 print(df.filter(regex=r'^Employee'))
The DataFrame.filter() method returns a subset of the DataFrame rows or
columns according to the specified index labels.
The regex argument can be set to a regular expression to only keep the labels
from the specified axis for which re.search(regex, label) == True.
The caret ^ matches the beginning of the input, so the regex matches column
names starting with "Employee".
You can learn more about the related topics by checking out the following tutorials: