In this article, we will look at the fastest and most memory-efficient ways to read a large text file in Python.
To read a large text file in Python, we can use the file object as an iterator and process one line at a time. Since the iterator yields lines directly from the file, no additional data structure is needed to hold the contents, so memory consumption stays comparatively low. The iterator also avoids expensive operations such as appending each line to a list, which makes it time-efficient as well. Files are iterable in Python, so using an iterator is the recommended approach.
Problem with the readlines() method for reading large text files
In Python, files are often read using the readlines() method. The readlines() method returns a list in which each item is one complete line of the file. This works well when the file is small. However, because readlines() appends every line to a list and then returns the entire list at once, it becomes time-consuming when the file is very large, say several gigabytes. The resulting list also occupies a large chunk of memory, which can lead to a MemoryError if sufficient memory is unavailable.
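To make the problem concrete, here is a minimal sketch of the readlines() approach described above. The filename "sample.txt" is a placeholder, and the script first creates a small file so the example is self-contained; with a real multi-gigabyte file, the list returned by readlines() would occupy a correspondingly large amount of memory.

```python
import time

# Create a small placeholder file so the example is self-contained
# ("sample.txt" is an assumed filename, not an actual large file).
with open("sample.txt", "w") as f:
    f.write("line 1\nline 2\nline 3\n")

start = time.time()
with open("sample.txt") as file:
    lines = file.readlines()  # the entire file is loaded into one list
end = time.time()

print("No. of lines read:", len(lines))
print("Execution time in seconds:", end - start)
```

Note that `lines` holds every line of the file at once, which is exactly the memory cost the iterator-based approaches below avoid.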
Read large text files in Python using fileinput
In this method, we import the fileinput module. The input() function of the fileinput module can be used to read large files: it takes a list of filenames (if no argument is passed, it reads from stdin) and returns an iterator that yields individual lines from the files being scanned.
Note: We will also use the time module to measure how long reading the file takes.
Python3
# import modules
import fileinput
import time

# time at the start of the program is noted
start = time.time()

# keeps track of the number of lines in the file
count = 0
for lines in fileinput.input(['sample.txt']):
    print(lines)
    count = count + 1

# time at the end of program execution is noted
end = time.time()

# total time taken to print the file
print("Execution time in seconds: ", (end - start))
print("No. of lines printed: ", count)
Output:
The fastest way to read a large text file using the iterator of a file object
Here, the only difference is that we use the iterator of a file object directly. The open() function wraps the file in a file object, and iterating over that object yields the lines of the file one at a time. We open the file in a 'with' block so that it is closed automatically as soon as the block finishes executing.
Python3
import time

start = time.time()
count = 0

# iterate over the file object directly, one line at a time
with open("sample.txt") as file:
    for line in file:
        print(line)
        count = count + 1

end = time.time()

print("Execution time in seconds: ", (end - start))
print("No of lines printed: ", count)
Output:
The time required by the second approach is comparatively less than that of the first method.