Last updated: Apr 13, 2024
Reading time·5 min

To compare two CSV files and print the differences in Python:
with open() statement to open the two CSV files.Here are the contents of the first CSV file, named csv-file-1.csv.
Alice,Smith,500 Bob,Smith,600 Carl,Smith,400
And here are the contents of the second CSV file, named csv-file-2.csv.
Alice,Smith,500 Bob,Smith,600 Dan,Smith,2500
Here is the related Python script, named main.py.
with open( 'csv-file-1.csv', encoding='utf-8' ) as file_1, open( 'csv-file-2.csv', encoding='utf-8' ) as file_2: file_1_lines = file_1.readlines() file_2_lines = file_2.readlines() with open( 'differences.csv', 'w', encoding='utf-8' ) as differences_file: for line in file_2_lines: if line not in file_1_lines: print(line) differences_file.write(line)

We used the with open() statement to open the two CSV files.
The with open() statement takes care of closing the files after we're done
(even if an exception has occurred).
The file.readlines() method reads the lines of the file into a Python list.
with open( 'csv-file-1.csv', encoding='utf-8' ) as file_1, open( 'csv-file-2.csv', encoding='utf-8' ) as file_2: file_1_lines = file_1.readlines() # 👇️ ['Alice,Smith,500\n', 'Bob,Smith,600\n', 'Carl,Smith,400'] print(file_1_lines) file_2_lines = file_2.readlines() # 👇️ ['Alice,Smith,500\n', 'Bob,Smith,600\n', 'Dan,Smith,2500\n'] print(file_2_lines)
Once we have the two lists of lines, we use the with open() statement to open
a differences.csv file in w (write) mode.
with open( 'differences.csv', 'w', encoding='utf-8' ) as differences_file: for line in file_2_lines: if line not in file_1_lines: print(line) differences_file.write(line)
The for loop iterates over the lines of
csv-file-2 and.
On each iteration, we use the not in operator to check if the current line is
NOT present in the lines of csv-file-1.
If the condition is met, we print the line and write it to the output CSV file.
You can open the differences.csv file (located in the same directory) to see
lines in csv-file-2 that are not present in csv-file-1.

csv-diff package to compare two CSV files in PythonYou can also use the csv-diff package to compare two CSV files in Python.
First, install the module by running the following command.
pip install csv-diff # or with pip3 pip3 install csv-diff
Here are the contents of the first CSV file, named csv-file-1.csv.
first,last,salary Alice,Smith,500 Bob,Smith,600 Carl,Smith,400
And here are the contents of the second CSV file, named csv-file-2.csv.
first,last,salary Alice,Smith,500 Bob,Smith,600 Dan,Smith,2500
Here is the related Python script, named main.py.
from csv_diff import load_csv, compare diff = compare( load_csv(open('csv-file-1.csv'), 'first'), load_csv(open('csv-file-2.csv'), 'first'), ) # {'added': [{'first': 'Dan', 'last': 'Smith', 'salary': '2500'}], 'removed': [{'first': 'Carl', 'last': 'Smith', 'salary': '400'}], 'changed': [], 'columns_added': [], 'columns_removed': []} print(diff) print('-' * 50) # [{'first': 'Dan', 'last': 'Smith', 'salary': '2500'}] print(diff['added']) print('-' * 50) # [{'first': 'Carl', 'last': 'Smith', 'salary': '400'}] print(diff['removed'])

You can also use the module via the command line.
The key parameter is the name of the column by which you want to compare.
csv-diff csv-file-1.csv csv-file-2.csv --key=first

You can also use the --json option if you want to get the output as JSON.
csv-diff csv-file-1.csv csv-file-2.csv --key=first --json

pandas package to compare two CSV files in PythonYou can also use the pandas package to compare two CSV files in Python.
First, make sure you
have the pandas module installed.
pip install pandas # or with pip3 pip3 install pandas
Here are the contents of the first CSV file, named csv-file-1.csv.
first,last,salary Alice,Smith,500 Bob,Smith,600 Carl,Smith,400
And here are the contents of the second CSV file, named csv-file-2.csv.
first,last,salary Alice,Smith,500 Bob,Smith,600 Dan,Smith,2500
Here is the related Python script, named main.py.
import pandas as pd df1 = pd.read_csv('csv-file-1.csv') print(df1) print('-' * 50) df2 = pd.read_csv('csv-file-2.csv') print(df2) print('-' * 50) df3 = df1.compare(df2) print(df3)
Running the code sample produces the following output.
first last salary 0 Alice Smith 500 1 Bob Smith 600 2 Carl Smith 400 -------------------------------------------------- first last salary 0 Alice Smith 500 1 Bob Smith 600 2 Dan Smith 2500 -------------------------------------------------- first salary self other self other 2 Carl Dan 400.0 2500.0

If your CSV files don't start with the column names, you can supply them when calling pandas.read_csv.
Here are the updated contents of the first CSV file, named csv-file-1.csv.
Alice,Smith,500 Bob,Smith,600 Carl,Smith,400
And here are the updated contents of the second CSV file, named
csv-file-2.csv.
Alice,Smith,500 Bob,Smith,600 Dan,Smith,2500
Here is the related Python code.
import pandas as pd df1 = pd.read_csv('csv-file-1.csv', names=['first', 'last', 'salary']) print(df1) print('-' * 50) df2 = pd.read_csv('csv-file-2.csv', names=['first', 'last', 'salary']) print(df2) print('-' * 50) df3 = df1.compare(df2) print(df3)
Running the code sample produces the following output.
first last salary 0 Alice Smith 500 1 Bob Smith 600 2 Carl Smith 400 -------------------------------------------------- first last salary 0 Alice Smith 500 1 Bob Smith 600 2 Dan Smith 2500 -------------------------------------------------- first salary self other self other 2 Carl Dan 400.0 2500.0

The code sample uses the pandas.read_csv() method to read the CSV files into
DataFrame objects and the
DataFrame.compare()
method to compare the two DataFrames.
The DataFrame.compare() method returns a DataFrame that shows the
differences stacked side by side.
The resulting DataFrame has a MultiIndex with self and other stacked
alternately at the inner level.
pandas package with DataFrame.merge()You can also use the pandas package with
DataFrame.merge()
to compare two CSV files.
Here are the contents of the first CSV file, named csv-file-1.csv.
Alice,Smith,500 Bob,Smith,600 Carl,Smith,400
And here are the contents of the second CSV file, named csv-file-2.csv.
first,last,salary Alice,Smith,500 Bob,Smith,600 Dan,Smith,2500
Here is the related Python script, named main.py.
import pandas as pd df1 = pd.read_csv('csv-file-1.csv') print(df1) print('-' * 50) df2 = pd.read_csv('csv-file-2.csv') print(df2) print('-' * 50) df3 = df1.merge(df2, indicator=True, how='outer') print(df3)
Running the code sample produces the following output.
first last salary 0 Alice Smith 500 1 Bob Smith 600 2 Carl Smith 400 -------------------------------------------------- first last salary 0 Alice Smith 500 1 Bob Smith 600 2 Dan Smith 2500 -------------------------------------------------- first last salary _merge 0 Alice Smith 500 both 1 Bob Smith 600 both 2 Carl Smith 400 left_only 3 Dan Smith 2500 right_only
The DataFrame.merge() method merges the DataFrame objects with a
database-style join.
The second to last row in the resulting DataFrame is only contained in df1
(left_only).
The last row is only contained in df2 (right_only).
You can only get the left_only and right_only rows with a filter.
import pandas as pd df1 = pd.read_csv('csv-file-1.csv') print(df1) print('-' * 50) df2 = pd.read_csv('csv-file-2.csv') print(df2) print('-' * 50) df3 = df1.merge(df2, indicator=True, how='outer') df3 = df3[df3['_merge'] != 'both'] print(df3)
Running the code sample produces the following output.
first last salary 0 Alice Smith 500 1 Bob Smith 600 2 Carl Smith 400 -------------------------------------------------- first last salary 0 Alice Smith 500 1 Bob Smith 600 2 Dan Smith 2500 -------------------------------------------------- first last salary _merge 2 Carl Smith 400 left_only 3 Dan Smith 2500 right_only

You can learn more about the related topics by checking out the following tutorials: