Pandas: Find length of longest String in DataFrame column

Borislav Hadzhiev

Last updated: Apr 12, 2024


# Table of Contents

  1. Pandas: Find length of longest String in DataFrame column
  2. Finding the longest string in a DataFrame column in Pandas
  3. Find length of longest String in DataFrame column using map()
  4. Getting the length of the longest string in bytes
  5. Getting the index of the longest string in a DataFrame column and its value
  6. Get the maximum length of each column in a Pandas DataFrame

# Pandas: Find length of longest String in DataFrame column

To find the length of the longest string in a DataFrame column:

  1. Use bracket notation to access the column.
  2. Use the str.len() method to get the length of each value.
  3. Call the max() method on the result.
main.py
    import pandas as pd

    df = pd.DataFrame({
        'A': ['A', 'AB', 'ABC'],
        'B': ['BC', 'BCD', 'BCDE'],
    })

    print(df)

    print('-' * 50)

    print(df['A'].str.len().max())  # 👉️ 3

    print('-' * 50)

    print(df['B'].str.len().max())  # 👉️ 4
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
         A     B
    0    A    BC
    1   AB   BCD
    2  ABC  BCDE
    --------------------------------------------------
    3
    --------------------------------------------------
    4


We used bracket notation to get the column for which we want to find the max length.

The str.len() method computes the length of each string in the column and returns the lengths as a Series.

main.py
    import pandas as pd

    df = pd.DataFrame({
        'A': ['A', 'AB', 'ABC'],
        'B': ['BC', 'BCD', 'BCDE'],
    })

    # 0    1
    # 1    2
    # 2    3
    # Name: A, dtype: int64
    print(df['A'].str.len())

The last step is to use the Series.max() method to get the max length.

main.py
    import pandas as pd

    df = pd.DataFrame({
        'A': ['A', 'AB', 'ABC'],
        'B': ['BC', 'BCD', 'BCDE'],
    })

    #      A     B
    # 0    A    BC
    # 1   AB   BCD
    # 2  ABC  BCDE
    print(df)

    print('-' * 50)

    print(df['A'].str.len().max())  # 👉️ 3

    print('-' * 50)

    print(df['B'].str.len().max())  # 👉️ 4
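Real columns often contain missing values. The sketch below uses a hypothetical Series (not the article's data) to show that str.len() leaves a missing entry as missing and that max() skips it by default. Because the result may come back as a float in that case, the example converts it with int().

```python
import pandas as pd

# Hypothetical column with one missing value (not from the article's data).
s = pd.Series(['A', None, 'ABC'])

# The missing entry has no length, so it stays missing in the result.
lengths = s.str.len()
print(lengths)

# max() skips missing values by default; int() normalizes a possible float.
max_len = int(lengths.max())
print(max_len)  # 3
```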

# Finding the longest string in a DataFrame column in Pandas

Use Python's built-in max() function if you need the longest string itself rather than its length.

The function can be passed a key argument that specifies a one-argument ordering function like the one used for list.sort().

main.py
    import pandas as pd

    df = pd.DataFrame({
        'A': ['A', 'AB', 'ABC'],
        'B': ['BC', 'BCD', 'BCDE'],
    })

    #      A     B
    # 0    A    BC
    # 1   AB   BCD
    # 2  ABC  BCDE
    print(df)

    print('-' * 50)

    print(max(df['A'], key=len))  # 👉️ ABC

    print('-' * 50)

    print(max(df['B'], key=len))  # 👉️ BCDE


The max() function returns the largest item in an iterable or the largest of two or more arguments.

We set the key argument to the len() function to compare the length of the strings in the given column.

The len() function returns the length (the number of items) of an object.

The argument the function takes may be a sequence (a string, tuple, list, range or bytes) or a collection (a dictionary, set, or frozen set).
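When several strings share the maximum length, max() returns only the first one it encounters. If you need all of them, one option is to compare each length against the maximum and filter the column. The data below is hypothetical and chosen to contain a tie.

```python
import pandas as pd

# Hypothetical data with two strings tied for the longest.
df = pd.DataFrame({'A': ['A', 'XYZ', 'ABC']})

# max() with key=len returns the first of the tied values.
first_longest = max(df['A'], key=len)
print(first_longest)  # XYZ

# Keep every row whose length equals the maximum length.
lengths = df['A'].str.len()
all_longest = df.loc[lengths == lengths.max(), 'A'].tolist()
print(all_longest)  # ['XYZ', 'ABC']
```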

# Find length of longest String in DataFrame column using map()

You can also use the map() method to find the length of the longest string in a DataFrame column.

main.py
    import pandas as pd

    df = pd.DataFrame({
        'A': ['A', 'AB', 'ABC'],
        'B': ['BC', 'BCD', 'BCDE'],
    })

    print(df)

    print('-' * 50)

    print(df['A'].map(len).max())  # 👉️ 3

    print('-' * 50)

    print(df['B'].map(len).max())  # 👉️ 4

    print('-' * 50)

Running the code sample produces the following output.

shell
         A     B
    0    A    BC
    1   AB   BCD
    2  ABC  BCDE
    --------------------------------------------------
    3
    --------------------------------------------------
    4
    --------------------------------------------------


The Series.map() method calls the supplied function with each value in the column.

We passed the len function to map, so the length of each string in the column is returned.

main.py
    import pandas as pd

    df = pd.DataFrame({
        'A': ['A', 'AB', 'ABC'],
        'B': ['BC', 'BCD', 'BCDE'],
    })

    # 0    1
    # 1    2
    # 2    3
    # Name: A, dtype: int64
    print(df['A'].map(len))

The last step is to call the max() method on the result.

main.py
    print(df['A'].map(len).max())  # 👉️ 3
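Note that len() only works on values that have a length, so map(len) fails on a numeric column. A minimal sketch of one workaround, assuming you want to count the digits of each number, is to convert the column to strings first. The single-column DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical integer column.
df = pd.DataFrame({'C': [1, 12, 12345]})

# df['C'].map(len) would raise a TypeError because integers have no len().
# Converting to str first lets len() count the characters of each value.
max_len = df['C'].astype(str).map(len).max()
print(max_len)  # 5
```

Be aware that astype(str) also turns missing values into text, so this approach suits columns without missing data.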

# Getting the length of the longest string in bytes

In some cases, you might want to get the length of the longest string in bytes.

This is useful if the strings in your DataFrame contain non-ASCII characters that are represented by multiple bytes.

main.py
    import pandas as pd

    df = pd.DataFrame({
        'A': ['A', 'AB', 'ABCä'],
        'B': ['BC', 'BCD', 'BCDEö'],
    })

    # 👇️ 5
    print(int(df['A'].str.encode('utf-8').str.len().max()))

    # 👇️ 6
    print(int(df['B'].str.encode('utf-8').str.len().max()))


The str.encode() method returns an encoded version of the string as a bytes object.

We used the utf-8 encoding in the example because some of the strings in the DataFrame contain umlauts and they cannot be handled using the ASCII encoding.
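To make the difference between characters and bytes concrete, the sketch below computes both lengths for the same column. The umlaut counts as one character but occupies two bytes in UTF-8.

```python
import pandas as pd

s = pd.Series(['A', 'AB', 'ABCä'])

# Length in characters.
char_lengths = s.str.len().tolist()
print(char_lengths)  # [1, 2, 4]

# Length in bytes after encoding to UTF-8.
byte_lengths = s.str.encode('utf-8').str.len().tolist()
print(byte_lengths)  # [1, 2, 5]
```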

# Getting the index of the longest string in a DataFrame column and its value

If you need to get the index of the longest string in a DataFrame column, call the Series.idxmax() method on the Series of lengths.

main.py
    import pandas as pd

    df = pd.DataFrame({
        'A': ['A', 'AB', 'ABC'],
        'B': ['BC', 'BCD', 'BCDE'],
    })

    longest_index = df['A'].str.len().idxmax()
    print(longest_index)  # 👉️ 2

    print(df['A'][longest_index])  # 👉️ ABC


The idxmax() method returns the index label of the first occurrence of the maximum value.

Its axis argument defaults to 0. A Series has only one axis, so you rarely need to set it.

The DataFrame in the example uses the default zero-based integer index, so the label of the first value in the column is 0 and the label of the third is 2.

Once you have the index of the longest string in the column, use bracket notation to get the corresponding value.
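Because idxmax() returns an index label rather than a position, the result is not always an integer. The sketch below uses a hypothetical DataFrame with string labels and looks the value up with DataFrame.loc, which selects by label.

```python
import pandas as pd

# Hypothetical DataFrame with a custom (non-integer) index.
df = pd.DataFrame({'A': ['A', 'AB', 'ABC']}, index=['x', 'y', 'z'])

# idxmax() returns the label of the row that holds the longest string.
longest_label = df['A'].str.len().idxmax()
print(longest_label)  # z

# Select the value by row label and column name.
longest_value = df.loc[longest_label, 'A']
print(longest_value)  # ABC
```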

# Get the maximum length of each column in a Pandas DataFrame

If you need to get the maximum length of each column in a DataFrame, use numpy.vectorize().

main.py
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({
        'A': ['A', 'AB', 'ABC'],
        'B': ['BC', 'BCD', 'BCDE'],
        'C': [1, 12, 12345],
    })

    vfunc = np.vectorize(len)

    result = vfunc(df.values.astype(str)).max(axis=0)
    print(result)  # 👉️ [3 4 5]


numpy.vectorize() wraps a Python function and returns a callable that applies the function to each element of an array.

Make sure you have the numpy module installed to be able to run the code sample.

shell
    pip install numpy

    # or with pip3
    pip3 install numpy

The code sample shows how to find the maximum length of each column.
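To see what each step of the one-line expression produces, the sketch below splits it into intermediate variables (the names as_text, lengths and max_per_column are illustrative only) using the same data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['A', 'AB', 'ABC'],
    'B': ['BC', 'BCD', 'BCDE'],
    'C': [1, 12, 12345],
})

# Step 1: convert every cell to a string in a 2D NumPy array.
as_text = df.values.astype(str)

# Step 2: apply len() to every element, giving a 2D array of lengths.
lengths = np.vectorize(len)(as_text)
print(lengths.tolist())  # [[1, 2, 1], [2, 3, 2], [3, 4, 5]]

# Step 3: take the maximum down each column (axis=0).
max_per_column = lengths.max(axis=0)
print(max_per_column.tolist())  # [3, 4, 5]
```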

If you only need the lengths of the object columns, use the DataFrame.select_dtypes() method.

main.py
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({
        'A': ['A', 'AB', 'ABC'],
        'B': ['BC', 'BCD', 'BCDE'],
        'C': [1, 12, 12345],
    })

    vfunc = np.vectorize(len)

    result = vfunc(df.select_dtypes(
        include=[object]).values.astype(str)).max(axis=0)
    print(result)  # 👉️ [3 4]

Notice that the max length of the int column ("C") is not included in the result.

If you need to construct a mapping that contains the column names and the maximum lengths, use the dict class.

main.py
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({
        'A': ['A', 'AB', 'ABC'],
        'B': ['BC', 'BCD', 'BCDE'],
        'C': [1, 12, 12345],
    })

    vfunc = np.vectorize(len)

    a_dict = dict(zip(df, vfunc(df.values.astype(str)).max(axis=0)))

    # 👇️ {'A': 3, 'B': 4, 'C': 5}
    # (NumPy 2 and later print the values as np.int64(3) and so on)
    print(a_dict)

The zip function iterates over several iterables in parallel and produces tuples with an item from each iterable.

main.py
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({
        'A': ['A', 'AB', 'ABC'],
        'B': ['BC', 'BCD', 'BCDE'],
        'C': [1, 12, 12345],
    })

    vfunc = np.vectorize(len)

    # 👇️ [('A', 3), ('B', 4), ('C', 5)]
    # (NumPy 2 and later print the values as np.int64(3) and so on)
    print(list(zip(df, vfunc(df.values.astype(str)).max(axis=0))))
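If you would rather not depend on NumPy directly, a pandas-only variant is sketched below. It is an alternative to the approach above, not a replacement for it: a dict comprehension converts each column to strings, measures the lengths and stores the maximum. The name max_lengths is illustrative, and int() keeps the values as plain Python integers.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['A', 'AB', 'ABC'],
    'B': ['BC', 'BCD', 'BCDE'],
    'C': [1, 12, 12345],
})

# For each column: cast to str, measure each value, keep the largest length.
max_lengths = {
    column: int(df[column].astype(str).str.len().max())
    for column in df.columns
}
print(max_lengths)  # {'A': 3, 'B': 4, 'C': 5}
```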
