Get the column names of a NumPy ndarray in Python

Borislav Hadzhiev

Last updated: Apr 11, 2024
4 min


# Table of Contents

  1. Get the column names of a NumPy ndarray in Python
  2. Adding column names to a plain NumPy ndarray
  3. Getting the column names of a Pandas DataFrame

# Get the column names of a NumPy ndarray in Python

Use the dtype.names attribute to get the column names of a NumPy ndarray in Python.

The dtype.names attribute returns a tuple of the field names of the ndarray.

main.py
import numpy as np

data = np.genfromtxt(
    'employees.txt',
    names=True,
    encoding='utf-8',
    delimiter=',',
    dtype=None,
)

# [('Alice', 'Smith', '2023-01-05') ('Bobby', 'Hadz', '2023-03-25')
#  ('Carl', 'Lemon', '2021-01-24')]
print(data)

# ('first_name', 'last_name', 'date')
print(data.dtype.names)


The code sample assumes that you have the following employees.txt file in the same directory as your main.py script.

employees.txt
first_name,last_name,date
Alice,Smith,2023-01-05
Bobby,Hadz,2023-03-25
Carl,Lemon,2021-01-24
The code for this article is available on GitHub

We used the numpy.genfromtxt() function to load the data from the file into a NumPy ndarray.

main.py
data = np.genfromtxt(
    'employees.txt',
    names=True,
    encoding='utf-8',
    delimiter=',',
    dtype=None,
)

Because names=True is set, the first line of the file is read as the field names. Each line after it is split at the specified delimiter character (a comma in the example).
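If you want to try this without creating a file, the following is a minimal sketch that feeds the same rows to genfromtxt() through an in-memory io.StringIO object. The csv_text variable is only an illustration and stands in for employees.txt.

```python
import io

import numpy as np

# In-memory stand-in for employees.txt (same rows as the file above).
csv_text = """first_name,last_name,date
Alice,Smith,2023-01-05
Bobby,Hadz,2023-03-25
Carl,Lemon,2021-01-24
"""

data = np.genfromtxt(
    io.StringIO(csv_text),
    names=True,        # read the first line as field names
    encoding='utf-8',
    delimiter=',',     # split every line at commas
    dtype=None,        # infer a type for each column
)

print(data.dtype.names)    # ('first_name', 'last_name', 'date')
print(data.shape)          # (3,)
print(data['first_name'])  # the values of the first column
```

The result is a one-dimensional structured array with one record per data line, which is why its shape is (3,) rather than (3, 3).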

We then used the dtype.names attribute to get the column names of the ndarray.

main.py
# ('first_name', 'last_name', 'date')
print(data.dtype.names)

The dtype.names attribute returns a tuple of the field names of the array, or None if there are no field names.
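The None case is easy to trip over, so here is a small sketch that compares a plain array with a structured one. The column_names() helper is hypothetical (it is not part of NumPy) and only shows one way to guard against None.

```python
import numpy as np

# A plain (unstructured) array has no field names.
plain = np.array([[1, 2, 3], [4, 5, 6]])

# A structured array declares a name and a type for each field.
structured = np.array(
    [(1, 2.5), (2, 3.5)],
    dtype=[('id', int), ('score', float)],
)

print(plain.dtype.names)       # None
print(structured.dtype.names)  # ('id', 'score')


def column_names(arr):
    """Hypothetical helper: return the field names as a list,
    or an empty list if the array has no fields."""
    names = arr.dtype.names
    return list(names) if names is not None else []


print(column_names(plain))       # []
print(column_names(structured))  # ['id', 'score']
```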

You can use indexing to access specific column names in the tuple.

main.py
import numpy as np

data = np.genfromtxt(
    'employees.txt',
    names=True,
    encoding='utf-8',
    delimiter=',',
    dtype=None,
)

# [('Alice', 'Smith', '2023-01-05') ('Bobby', 'Hadz', '2023-03-25')
#  ('Carl', 'Lemon', '2021-01-24')]
print(data)

# ('first_name', 'last_name', 'date')
print(data.dtype.names)

print(data.dtype.names[0])  # 👉️ first_name
print(data.dtype.names[1])  # 👉️ last_name
print(data.dtype.names[2])  # 👉️ date
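Because dtype.names is a regular Python tuple, the usual tuple operations also work on it. The sketch below builds a small structured array by hand (the 'U10' string types are an assumption made for the example) and uses the names to check membership, look up a position and loop over the columns.

```python
import numpy as np

# Hand-built structured array so the example does not need a file.
data = np.array(
    [
        ('Alice', 'Smith', '2023-01-05'),
        ('Bobby', 'Hadz', '2023-03-25'),
    ],
    dtype=[('first_name', 'U10'), ('last_name', 'U10'), ('date', 'U10')],
)

names = data.dtype.names

print('date' in names)           # True
print(names.index('last_name'))  # 1

# Use each name to pull out the matching column.
for name in names:
    print(name, data[name])
```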


# Adding column names to a plain NumPy ndarray

If you need to add column names to a plain NumPy ndarray, use the unstructured_to_structured() function from the numpy.lib.recfunctions module.

main.py
import numpy as np
import numpy.lib.recfunctions as rfn

arr = np.array([[1, 2, 3], [4, 5, 6]])

new_arr = rfn.unstructured_to_structured(
    arr,
    np.dtype(
        [
            ('Column_1', int),
            ('Column_2', int),
            ('Column_3', int)
        ]
    )
)

# 👇️ [(1, 2, 3) (4, 5, 6)]
print(new_arr)

# 👇️ ('Column_1', 'Column_2', 'Column_3')
print(new_arr.dtype.names)

print(new_arr['Column_1'])  # 👉️ [1 4]


We defined a plain NumPy array and used the unstructured_to_structured() function to convert the unstructured array to a structured array.

We set the array's column names and used dtype.names to get a tuple containing the names.

As shown in the code sample, you can use square brackets to access the elements of the array by column names.
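As a variation, the sketch below passes only the names argument of unstructured_to_structured(), in which case each field keeps the data type of the input array. It also selects several columns at once with a list of names and converts back with structured_to_unstructured(). The variable names are chosen just for this example.

```python
import numpy as np
import numpy.lib.recfunctions as rfn

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Supply only the names; the fields reuse arr's own dtype.
named = rfn.unstructured_to_structured(
    arr,
    names=['Column_1', 'Column_2', 'Column_3'],
)

print(named.dtype.names)  # ('Column_1', 'Column_2', 'Column_3')
print(named['Column_2'])  # [2 5]

# A list of names selects several columns at once.
subset = named[['Column_1', 'Column_3']]
print(subset.dtype.names)  # ('Column_1', 'Column_3')

# Convert back to a plain 2D array when the names are no longer needed.
back = rfn.structured_to_unstructured(named)
print(back.shape)  # (2, 3)
```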

# Getting the column names of a Pandas DataFrame

If you need to read values from a CSV file and get the column names, you can also use the pandas module.

First, make sure you have the pandas module installed by running the following command from your terminal.

shell
pip install pandas

# or with pip3
pip3 install pandas

Here is the employees.csv file for the example.

employees.csv
first_name,last_name,date
Alice,Smith,2023-01-05
Bobby,Hadz,2023-03-25
Carl,Lemon,2021-01-24

And here is the related main.py file.

main.py
import pandas as pd

df = pd.read_csv(
    'employees.csv',
    sep=',',
    encoding='utf-8'
)

#   first_name last_name        date
# 0      Alice     Smith  2023-01-05
# 1      Bobby      Hadz  2023-03-25
# 2       Carl     Lemon  2021-01-24
print(df)

print('-' * 50)

# 👇️ Index(['first_name', 'last_name', 'date'], dtype='object')
print(df.columns)

print('-' * 50)

# 👇️ ['first_name', 'last_name', 'date']
print(df.columns.tolist())


We used the pandas.read_csv() method to read the comma-separated CSV file into a DataFrame object.

You can then use the DataFrame.columns attribute to get the column labels of the DataFrame. Note that columns is an attribute, not a method, so it is accessed without parentheses.

If you need the column names as a plain Python list, call the tolist() method on the Index object that columns returns.
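The same idea can be tried without a file on disk. This is a minimal sketch that reads the rows from an in-memory string (csv_text is an illustrative stand-in for employees.csv) and shows a few equivalent ways to get at the column names, including converting the DataFrame to a NumPy record array.

```python
import io

import pandas as pd

# In-memory stand-in for employees.csv.
csv_text = """first_name,last_name,date
Alice,Smith,2023-01-05
Bobby,Hadz,2023-03-25
Carl,Lemon,2021-01-24
"""

df = pd.read_csv(io.StringIO(csv_text))

names = df.columns.tolist()
print(names)                 # ['first_name', 'last_name', 'date']

# Iterating over a DataFrame yields its column labels.
print(list(df))              # ['first_name', 'last_name', 'date']

# Membership checks work directly on the Index.
print('date' in df.columns)  # True

# to_records() produces a NumPy record array whose field names
# match the column labels.
records = df.to_records(index=False)
print(records.dtype.names)   # ('first_name', 'last_name', 'date')
```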


Copyright © 2024 Borislav Hadzhiev