Reading specific columns from an Excel File in Pandas

# Table of Contents

# Reading specific columns from an Excel File in Pandas

You can use column indices or letters to read specific columns from an Excel file in Pandas.

The usecols argument can be set to a comma-separated string or a list containing the column identifying letters or the corresponding indices.

Suppose we have the following example.xlsx file.

starter example xlsx file

If you want to select only columns A, D and E, you would use the following code sample.

main.py

Copied!
import pandas as pd

file_path = 'example.xlsx'

df = pd.read_excel(
    file_path,
    usecols="A,D,E"
)

print(df)

The code for this article is available on GitHub

Notice that the columns we want to select are separated by commas.

The code sample produces the following output.

shell

Copied!
  first_name  age  experience
0      Alice   29           2
1      Bobby   30           3
2       Carl   31           4
3        Dan   32           5

read specific columns from excel file in pandas

Make sure you have the openpyxl module to be able to run the code sample.

shell

Copied!
pip install pandas openpyxl

pip3 install pandas openpyxl

We could've achieved the same result by using the column indices (and not identifying letters).

main.py

Copied!
import pandas as pd

file_path = 'example.xlsx'

df = pd.read_excel(
    file_path,
    usecols=[0, 3, 4]
)

print(df)

The code sample produces the following output.

shell

Copied!
  first_name  age  experience
0      Alice   29           2
1      Bobby   30           3
2       Carl   31           4
3        Dan   32           5

# Selecting a range of columns from an Excel file in Pandas

You can select a range of columns by setting the usecols argument to a string where the start and stop columns are separated by a colon :.

main.py

Copied!
import pandas as pd

file_path = 'example.xlsx'

df = pd.read_excel(
    file_path,
    usecols="A:B"
)

#   first_name last_name
# 0      Alice     Smith
# 1      Bobby      Hadz
# 2       Carl     Jones
# 3        Dan     Adams
print(df)

select range of columns

The code for this article is available on GitHub

Notice that the start and stop column names are inclusive.

This is the starting example.xlsx file.

starter example xlsx file

# Selecting a range of columns and individual columns

You can also select other columns by separating each slice/column with a comma.

main.py

Copied!
import pandas as pd

file_path = 'example.xlsx'

df = pd.read_excel(
    file_path,
    usecols="A:B,E"
)

#   first_name last_name  experience
# 0      Alice     Smith           2
# 1      Bobby      Hadz           3
# 2       Carl     Jones           4
# 3        Dan     Adams           5
print(df)

select range of columns and individual columns

The code for this article is available on GitHub

The code sample selects columns A and B (a slice) and the individual column E.

# Selecting multiple ranges of columns from an Excel file in Pandas

The same approach can be used to select multiple ranges of columns from an Excel file - you simply have to separate the column ranges by a comma ,.

main.py

Copied!
import pandas as pd

file_path = 'example.xlsx'

df = pd.read_excel(
    file_path,
    usecols="A:B,D:E"
)

#   first_name last_name  age  experience
# 0      Alice     Smith   29           2
# 1      Bobby      Hadz   30           3
# 2       Carl     Jones   31           4
# 3        Dan     Adams   32           5
print(df)

select multiple column ranges from excel file in pandas

The code for this article is available on GitHub

The code sample selects the column ranges A:B and D:E.

The column ranges are inclusive, so all columns in the range are read from the Excel file.

You can also specify individual columns when reading multiple column ranges from the file.

main.py

Copied!
import pandas as pd

file_path = 'example.xlsx'

df = pd.read_excel(
    file_path,
    usecols="A:B,C,D:E"
)

#   first_name last_name       date  age  experience
# 0      Alice     Smith 2023-01-05   29           2
# 1      Bobby      Hadz 2023-03-25   30           3
# 2       Carl     Jones 2021-01-24   31           4
# 3        Dan     Adams 2023-04-21   32           5
print(df)

select multiple column ranges and individual columns

# Using the `range()` class to read specific columns from an Excel file

You can also use the range class to read specific columns from an Excel file with Pandas.

main.py

Copied!
import pandas as pd

file_path = 'example.xlsx'

df = pd.read_excel(
    file_path,
    usecols=range(0, 2)
)

#   first_name last_name
# 0      Alice     Smith
# 1      Bobby      Hadz
# 2       Carl     Jones
# 3        Dan     Adams
print(df)

The code for this article is available on GitHub

However, when using the range() class, note that the stop index is exclusive (up to, but not including).

The code sample selects the columns at index 0 and 1 (Python indices are zero-based).

# Selecting specific columns from an Excel file by column names

You can also select specific columns by using their names.

The usecols argument can be set to an array of string, column names.

main.py

Copied!
import pandas as pd

file_path = 'example.xlsx'

df = pd.read_excel(
    file_path,
    usecols=['first_name', 'date', 'experience']
)

#   first_name       date  experience
# 0      Alice 2023-01-05           2
# 1      Bobby 2023-03-25           3
# 2       Carl 2021-01-24           4
# 3        Dan 2023-04-21           5
print(df)

select specific columns by name

The code for this article is available on GitHub

When specifying the names of the columns, make sure to use the correct case.

The names of the columns in the usecols list are case-sensitive and should match their names in your .xlsx file.

Some things to note about the usecols parameter:

It is set to None by default, which means that all columns are returned.
If it is set to a string, then it should be a comma-separated string of Excel column letters or ranges (e.g. A,B or A:C or A,B,E:F).
Column ranges are inclusive on both sides.
If the argument is set to a list of integers, then the integers are column indices (zero-based).
If the argument is set to a list of strings, then the strings are column names that you want to parse (e.g. first_name, last_name, etc).

If you get the error Usecols do not match columns, columns expected but not found, click on the link and follow the instructions.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.

You can use the search field on my Home Page to filter through all of my articles.

Reading specific columns from an Excel File in Pandas

# Table of Contents