Using pandas.read_csv() with multiple delimiters in Python

Borislav Hadzhiev

Last updated: Apr 11, 2024


# Table of Contents

  1. Using pandas.read_csv() with multiple delimiters in Python
  2. Using pandas.read_csv() with multiple delimiters with a character class
  3. Specifying multiple delimiters when parsing CSV file in Python

# Using pandas.read_csv() with multiple delimiters in Python

Set the sep argument to a regular expression to use the pandas.read_csv() method with multiple delimiters.

The sep argument is used to specify the delimiter(s) that should be used when parsing the CSV file.

Suppose we have the following employees.csv file.

employees.csv
```csv
first_name,last_name,date
Alice;Smith;2023-01-05
Bobby,Hadz,2023-03-25
Carl@Lemon@2021-01-24
```

And here is the related main.py script.

main.py
```python
import pandas as pd

df = pd.read_csv(
    'employees.csv',
    sep=r',|;|@',
    encoding='utf-8',
    engine='python'
)

#   first_name last_name        date
# 0      Alice     Smith  2023-01-05
# 1      Bobby      Hadz  2023-03-25
# 2       Carl     Lemon  2021-01-24
print(df)
```


The code for this article is available on GitHub

The pandas.read_csv() method reads a comma-separated values (CSV) file into a DataFrame.

We set the sep argument to a regular expression to be able to specify multiple delimiters when parsing the CSV file.
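If you want to try the pattern without creating employees.csv on disk, you can pass a file-like object instead of a path. The sketch below holds the same rows in an in-memory string and wraps it in io.StringIO (an illustration choice, not part of the original script), which read_csv() accepts in place of a filename.

```python
import io

import pandas as pd

# Same rows as employees.csv above, held in memory instead of on disk
csv_data = """first_name,last_name,date
Alice;Smith;2023-01-05
Bobby,Hadz,2023-03-25
Carl@Lemon@2021-01-24
"""

# The regex separator splits each line on a comma, a semicolon or an @ symbol
df = pd.read_csv(io.StringIO(csv_data), sep=r',|;|@', engine='python')

print(df)
```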

The CSV file in the example uses 3 delimiters:

  • A comma (,).
  • A semicolon (;).
  • An @ symbol.

main.py
```python
import pandas as pd

df = pd.read_csv(
    'employees.csv',
    sep=r',|;|@',
    encoding='utf-8',
    engine='python'
)
```

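We prefixed the string with an r to mark it as a raw string, so backslashes (e.g. in \s) are passed to the regex engine as-is instead of being treated as Python escape sequences.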

The pipe | special character means OR, e.g. X|Y matches X or Y.

In its entirety, the regular expression means: "The separator is a comma (,), a semicolon (;) or an @ symbol".
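To see what the separator matches on its own, you can apply the same pattern to individual lines with Python's re.split(). This only illustrates what the regex matches; it is not a description of how pandas tokenizes the file internally.

```python
import re

# The same alternation used as the sep argument above
pattern = r',|;|@'

# Each sample line uses a different delimiter
lines = ['first_name,last_name,date', 'Alice;Smith;2023-01-05', 'Carl@Lemon@2021-01-24']

rows = [re.split(pattern, line) for line in lines]

for row in rows:
    print(row)
```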

Some commonly used delimiters include:

  • a comma ,
  • a semicolon ;
  • a space
  • a tab character \t
  • a pipe |
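Some of these characters have a special meaning in regular expressions. The pipe | in particular means OR, so an unescaped | cannot serve as a literal delimiter inside the pattern. A minimal sketch, assuming you keep your delimiters in a list (the list below is hypothetical), builds the pattern with re.escape() so that every delimiter is matched literally:

```python
import re

# Hypothetical delimiter list; note that the pipe is a regex metacharacter
delimiters = [',', ';', '|', '@']

# re.escape() backslash-escapes metacharacters so each delimiter matches literally
pattern = '|'.join(re.escape(d) for d in delimiters)

row = re.split(pattern, 'Alice|Smith;2023-01-05')
print(row)  # ['Alice', 'Smith', '2023-01-05']
```

The resulting pattern string can be passed as the sep argument of pd.read_csv() together with engine="python".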

Notice that we also set the engine keyword argument to "python".

If you don't set the engine explicitly, you get the following warning:

  • ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.

The C engine (used by default) doesn't support regex separators other than \s+, so pandas falls back to the Python engine anyway. Setting engine="python" explicitly avoids the warning.
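To check this behavior yourself, you can capture the warning with the standard warnings module. The sketch below uses a small in-memory CSV string instead of employees.csv and assumes a pandas version that emits pandas.errors.ParserWarning for regex separators, as described above.

```python
import io
import warnings

import pandas as pd

# Tiny in-memory CSV that mixes a comma and a semicolon
csv_data = "a,b;c\n1,2;3\n"


def count_parser_warnings(**kwargs):
    """Read csv_data with a regex separator and count the ParserWarnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        pd.read_csv(io.StringIO(csv_data), sep=r',|;', **kwargs)
    return sum(issubclass(w.category, pd.errors.ParserWarning) for w in caught)


# No engine given: pandas falls back to the python engine and warns
print(count_parser_warnings())

# engine='python' given explicitly: no fallback warning
print(count_parser_warnings(engine='python'))
```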

main.py
```python
import pandas as pd

df = pd.read_csv(
    'employees.csv',
    sep=r',|;|@',
    encoding='utf-8',
    engine='python'
)
```

Let's look at a CSV file that also uses a space as a delimiter.

employees.csv
```csv
first_name,last_name,date
Alice;Smith;2023-01-05
Bobby Hadz 2023-03-25
Carl@Lemon@2021-01-24
```

To read a CSV file whose delimiters are a comma (,), a semicolon (;), a space and an @ symbol, add a space as another alternative in the regular expression.

main.py
```python
import pandas as pd

df = pd.read_csv(
    'employees.csv',
    sep=r',|;|@| ',
    encoding='utf-8',
    engine='python'
)

#   first_name last_name        date
# 0      Alice     Smith  2023-01-05
# 1      Bobby      Hadz  2023-03-25
# 2       Carl     Lemon  2021-01-24
print(df)
```

Notice that we added a space after the last pipe | character.

You can also use the \s special sequence, which matches any whitespace character.

main.py
```python
import pandas as pd

df = pd.read_csv(
    'employees.csv',
    sep=r',|;|@|\s+',
    encoding='utf-8',
    engine='python'
)
```

The \s+ sequence matches one or more whitespace characters (spaces, tabs, etc.), so a run of several spaces is treated as a single delimiter.
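The difference matters when fields are separated by more than one space. In the sketch below (a hypothetical line with two spaces after the first name), a literal single space produces an empty field, while \s+ treats the whole run of whitespace as one delimiter:

```python
import re

line = 'Bobby  Hadz 2023-03-25'  # note the two spaces after "Bobby"

# A literal space splits between the two spaces, leaving an empty string
single_space = re.split(r',|;|@| ', line)

# \s+ consumes the whole run of whitespace as a single delimiter
one_or_more = re.split(r',|;|@|\s+', line)

print(single_space)  # ['Bobby', '', 'Hadz', '2023-03-25']
print(one_or_more)   # ['Bobby', 'Hadz', '2023-03-25']
```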

# Using pandas.read_csv() with multiple delimiters with a character class

You can also set the sep argument to a character class to use the pandas.read_csv() method with multiple delimiters.

Suppose we have the following employees.csv file.

employees.csv
```csv
first_name,last_name,date
Alice;Smith;2023-01-05
Bobby Hadz 2023-03-25
Carl@Lemon@2021-01-24
```

Here is the related main.py file.

main.py
```python
import pandas as pd

df = pd.read_csv(
    'employees.csv',
    sep=r'[ ,;@]',
    encoding='utf-8',
    engine='python'
)

#   first_name last_name        date
# 0      Alice     Smith  2023-01-05
# 1      Bobby      Hadz  2023-03-25
# 2       Carl     Lemon  2021-01-24
print(df)
```

Note that this approach only works when each of your delimiters is a single character.

If any of your delimiters consists of multiple characters, use the alternation approach from the previous section.
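To see why, consider a hypothetical line that uses the two-character delimiter :: together with ;. Inside a character class, each character is a separate alternative, so [:;] splits on every single colon and leaves an empty string between the two colons, while the alternation ::|; treats :: as one delimiter:

```python
import re

line = 'Alice::Smith;2023-01-05'

# Character class: each ':' is a delimiter on its own
with_class = re.split(r'[:;]', line)

# Alternation: '::' is matched as one two-character delimiter
with_alternation = re.split(r'::|;', line)

print(with_class)        # ['Alice', '', 'Smith', '2023-01-05']
print(with_alternation)  # ['Alice', 'Smith', '2023-01-05']
```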

The square brackets [] syntax is called a character class and matches any single character listed between the brackets.

The example uses a space, a comma (,), a semicolon (;) and an @ symbol as delimiters.

# Specifying multiple delimiters when parsing CSV file in Python

If you need to specify multiple delimiters when parsing a CSV file in pure Python, without installing any third-party libraries, use the re.split() function.
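The standard library's csv module is not a drop-in alternative here, because its delimiter parameter must be a single character. A quick check, using an in-memory file for illustration:

```python
import csv
import io

data = io.StringIO('Alice;Smith,2023-01-05\n')

try:
    # csv.reader() expects delimiter to be a single character
    csv.reader(data, delimiter=',;')
    error_message = None
except (TypeError, ValueError) as exc:
    error_message = str(exc)

print(error_message)
```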

Here is the employees.csv file for the example.

employees.csv
```csv
first_name,last_name,date
Alice;Smith;2023-01-05
Bobby Hadz 2023-03-25
Carl@Lemon@2021-01-24
```

And here is the related main.py file.

main.py
```python
import re

with open('employees.csv', 'r', encoding='utf-8') as csv_file:
    for line in csv_file:
        line_values = re.split(r',|;| |@', line)
        print(line_values)
```

Running the code sample produces the following output.

shell
```shell
['first_name', 'last_name', 'date\n']
['Alice', 'Smith', '2023-01-05\n']
['Bobby', 'Hadz', '2023-03-25\n']
['Carl', 'Lemon', '2021-01-24\n']
```


We used the with open() statement to open the CSV file.

The with statement takes care of automatically closing the file even if an error occurs.

The next step is to use a for loop to iterate over the lines in the file.

On each iteration, we use the re.split() function to split the line on multiple delimiters.

The re.split() function takes a pattern and a string and splits the string on each occurrence of the pattern.
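As the output shows, the last value of each list keeps the trailing newline character (\n), because iterating over a file yields each line together with its line ending. A minimal variation, assuming you want clean values, strips the newline with str.rstrip() before splitting. Here io.StringIO stands in for employees.csv so the sketch runs on its own:

```python
import io
import re

# Stand-in for open('employees.csv', ...) so the example is self-contained
csv_file = io.StringIO(
    'first_name,last_name,date\n'
    'Alice;Smith;2023-01-05\n'
    'Bobby Hadz 2023-03-25\n'
    'Carl@Lemon@2021-01-24\n'
)

rows = []
for line in csv_file:
    # Remove the trailing newline so it doesn't end up in the last field
    rows.append(re.split(r',|;| |@', line.rstrip('\n')))

for row in rows:
    print(row)
```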

You can use the pipe | character to separate your delimiters.

Alternatively, you can use a character class and specify your delimiters between the square brackets.

main.py
```python
import re

with open('employees.csv', 'r', encoding='utf-8') as csv_file:
    for line in csv_file:
        line_values = re.split(r'[,; @]', line)

        # ['first_name', 'last_name', 'date\n']
        # ['Alice', 'Smith', '2023-01-05\n']
        # ['Bobby', 'Hadz', '2023-03-25\n']
        # ['Carl', 'Lemon', '2021-01-24\n']
        print(line_values)
```

The example uses a comma (,), a semicolon (;), a space and an @ symbol as delimiters.
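If you'd rather work with named fields than plain lists, one option (a sketch that goes beyond the original example) is to treat the first split line as the header and zip() it with each following row. An in-memory io.StringIO file stands in for employees.csv:

```python
import io
import re

# Stand-in for open('employees.csv', ...) so the example is self-contained
csv_file = io.StringIO(
    'first_name,last_name,date\n'
    'Alice;Smith;2023-01-05\n'
    'Bobby Hadz 2023-03-25\n'
    'Carl@Lemon@2021-01-24\n'
)

# Strip each line and split it on any of the four delimiters, skipping blank lines
rows = [re.split(r'[,; @]', line.strip()) for line in csv_file if line.strip()]

# The first row holds the column names; the rest are records
header, *records = rows
employees = [dict(zip(header, record)) for record in records]

for employee in employees:
    print(employee)
```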
