Python provides concise tools for working with files. This article discusses one application of Python's file handling features: comparing files.
Files in use:
Method 1: Comparing complete file at once
Python's standard library includes the filecmp module, whose filecmp.cmp() function compares two files and returns True if they appear equal and False otherwise. It can operate in two modes:
- shallow mode (the default): the os.stat() signatures of the files (file type, size and modification time) are compared first. If the signatures are identical, the files are taken to be equal without reading them; otherwise the files are treated as different if their sizes or contents differ.
- deep mode (shallow=False): the contents of the files are always compared.
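The difference between the two modes can be seen with a small experiment. The following is a minimal sketch, not part of the original program: the helper name shallow_vs_deep, the temporary file names and the timestamp value are all hypothetical. It writes two files of the same size with different contents, forces identical modification times, and then compares them in both modes.

```python
import filecmp
import os
import tempfile


def shallow_vs_deep():
    """Return (shallow_result, deep_result) for two same-size files
    whose contents differ but whose metadata is made identical."""
    with tempfile.TemporaryDirectory() as tmp:
        path_a = os.path.join(tmp, "a.txt")  # hypothetical file names
        path_b = os.path.join(tmp, "b.txt")
        with open(path_a, "w") as fh:
            fh.write("hello")
        with open(path_b, "w") as fh:
            fh.write("world")  # same length, different content

        # Give both files the same timestamps so that their
        # os.stat() signatures (type, size, mtime) are identical.
        stamp_ns = 1_600_000_000 * 10**9  # arbitrary fixed time
        os.utime(path_a, ns=(stamp_ns, stamp_ns))
        os.utime(path_b, ns=(stamp_ns, stamp_ns))

        filecmp.clear_cache()
        shallow = filecmp.cmp(path_a, path_b)               # metadata only
        deep = filecmp.cmp(path_a, path_b, shallow=False)   # reads contents
        return shallow, deep


print(shallow_vs_deep())
```

Under these assumptions the shallow comparison should report the files as equal while the deep comparison reports them as different, which is why deep mode is the safer choice when file contents matter.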
Syntax:
filecmp.cmp(f1, f2, shallow=True)
Parameters:
f1 and f2 are the paths of the two files to compare; shallow selects the comparison mode and defaults to True.
Returns:
- True if the files appear equal
- False otherwise
Program:
Python3
import filecmp

f1 = "C:/Users/user/Documents/intro.txt"
f2 = "C:/Users/user/Desktop/intro1.txt"

# shallow comparison
result = filecmp.cmp(f1, f2)
print(result)

# deep comparison
result = filecmp.cmp(f1, f2, shallow=False)
print(result)
Output:
False
False
Method 2: Comparing files line by line
The drawback of the approach above is that it does not tell us which lines differ. Although this is not always required, we often want to find the differing lines and act on them. The basic approach is to read the lines of each file into its own list and then compare the two lists line by line.
Approach:
- Open the files to be compared
- Loop through the files and compare each line of the two files.
- If the lines are identical, output IDENTICAL on the output screen.
- Else, output the differing lines from both the files on the output screen.
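The steps above can also be sketched as a small reusable function. This is an illustrative variant under stated assumptions, not the article's program: the name diff_lines is hypothetical, the input is two in-memory lists of lines (the shape that readlines() returns) rather than open files, and itertools.zip_longest is used so that extra lines in the longer file are reported instead of being silently skipped.

```python
from itertools import zip_longest


def diff_lines(lines_a, lines_b):
    """Return (line_number, line_a, line_b) for every position where
    the two lists of lines differ.

    If one list is shorter, its missing lines are reported as None.
    """
    differences = []
    pairs = zip_longest(lines_a, lines_b)  # pads the shorter list with None
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            differences.append((number, a, b))
    return differences


# Sample data standing in for f1.readlines() and f2.readlines()
first = ["alpha\n", "beta\n", "gamma\n"]
second = ["alpha\n", "BETA\n"]

for number, a, b in diff_lines(first, second):
    print(f"Line {number}: file 1 = {a!r}, file 2 = {b!r}")
```

Returning the differences as a list, rather than printing them directly, makes it easier to act on the differing lines afterwards.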
Program:
Python3
# reading files
with open("C:/Users/user/Documents/intro.txt", "r") as f1, \
        open("C:/Users/user/Desktop/intro1.txt", "r") as f2:
    f1_data = f1.readlines()
    f2_data = f2.readlines()

# compare the lines at the same position in both files;
# zip() stops at the end of the shorter file
for i, (line1, line2) in enumerate(zip(f1_data, f2_data), start=1):
    if line1 == line2:
        # print IDENTICAL if the lines match
        print("Line ", i, ": IDENTICAL")
    else:
        # else print that line from both files
        print("Line ", i, ":")
        print("\tFile 1:", line1, end='')
        print("\tFile 2:", line2, end='')

# the with statement closes both files automatically
Output:
Method 3: Comparing complete directory
The filecmp module also provides filecmp.cmpfiles(), which returns three lists: the files that match, the files that do not match, and the files that could not be compared (errors). It is similar to the first approach, but it compares the files with the given names in two different directories.
Program:
Python3
import filecmp

d1 = "C:/Users/user/Documents/"
d2 = "C:/Users/user/Desktop/"
files = ['intro.txt']

# shallow comparison
match, mismatch, errors = filecmp.cmpfiles(d1, d2, files)
print('Shallow comparison')
print("Match:", match)
print("Mismatch:", mismatch)
print("Errors:", errors)

# deep comparison
match, mismatch, errors = filecmp.cmpfiles(d1, d2, files, shallow=False)
print('Deep comparison')
print("Match:", match)
print("Mismatch:", mismatch)
print("Errors:", errors)
Output:
Shallow comparison
Match: []
Mismatch: ['intro.txt']
Errors: []
Deep comparison
Match: []
Mismatch: ['intro.txt']
Errors: []
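To see all three lists populated, the following minimal sketch builds two temporary directories instead of using the paths above. The helper names write_file and compare_demo and the file names are hypothetical, chosen only for illustration. One file is identical in both directories, one differs, and one exists in only the first directory, so it cannot be compared.

```python
import filecmp
import os
import tempfile


def write_file(directory, name, text):
    """Create a small text file inside the given directory."""
    with open(os.path.join(directory, name), "w") as fh:
        fh.write(text)


def compare_demo():
    """Return the (match, mismatch, errors) lists from filecmp.cmpfiles()."""
    with tempfile.TemporaryDirectory() as d1, \
            tempfile.TemporaryDirectory() as d2:
        # identical in both directories
        write_file(d1, "same.txt", "identical\n")
        write_file(d2, "same.txt", "identical\n")
        # present in both directories, but with different contents
        write_file(d1, "changed.txt", "version one\n")
        write_file(d2, "changed.txt", "version two, longer\n")
        # present only in the first directory
        write_file(d1, "only_in_d1.txt", "no counterpart\n")

        names = ["same.txt", "changed.txt", "only_in_d1.txt"]
        return filecmp.cmpfiles(d1, d2, names, shallow=False)


match, mismatch, errors = compare_demo()
print("Match:", match)
print("Mismatch:", mismatch)
print("Errors:", errors)
```

A file that is missing from either directory ends up in the errors list rather than raising an exception, so it is worth checking that list before trusting the other two.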