How to resolve a UnicodeDecodeError for a CSV file in Python?

26 July 2024


Several errors can arise when Python decodes a byte string using a particular encoding scheme. Some encodings, such as ASCII, cannot represent every code point, and a byte sequence that is valid in one encoding may be invalid in another. One of the most common errors during these conversions is UnicodeDecodeError, which occurs when a byte string is decoded with the wrong encoding scheme. This article explains how to resolve a UnicodeDecodeError for a CSV file in Python.

Why does the UnicodeDecodeError arise?

The error occurs when the bytes being decoded are not valid in the chosen encoding, for example when a byte value falls outside the range that the encoding can represent. To avoid it, a byte string should be decoded with the same encoding scheme that was used to encode it.
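The rule that encoding and decoding must use the same scheme can be seen in a minimal round-trip sketch built only on the standard str.encode and bytes.decode methods. The sample string "añ" is purely illustrative.

```python
# Encode a string with UTF-16, then decode the bytes with the same and with a different scheme
text = "añ"
data = text.encode("utf-16")  # Python's UTF-16 codec prepends a byte order mark (BOM)

# Decoding with the same scheme restores the original string
same = data.decode("utf-16")
print(same)

# Decoding with a different scheme fails: the BOM bytes are not a valid UTF-8 start byte
mismatch_reason = None
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    mismatch_reason = exc.reason
print("Decoding UTF-16 bytes as UTF-8 failed:", mismatch_reason)
```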

To demonstrate, the error is first reproduced and then fixed. In the code below, the byte string b"a" is first decoded successfully using ASCII. Then an attempt is made to decode the byte string b"a\xf1", which raises an error. ASCII can only represent values in the range 0 to 127, and the byte 0xf1 (241) lies outside that range, so decoding it produces the "ordinal not in range" error.

Python3

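# Succeeds: the byte 0x61 ("a") is within ASCII's 0-127 range
t = b"a".decode("ascii")

# Produces error: the byte 0xf1 (241) is outside ASCII's range
t1 = b"a\xf1".decode("ascii")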

Output:

Traceback (most recent call last):
 File "C:/Users/Sauleyayan/PycharmProjects/untitled1/venv/mad philes.py", line 5, in <module>
   t1 = b"a\xf1".decode("ascii")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf1 in position 1: ordinal not in range(128)
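The "position 1" in the message can also be read programmatically. A UnicodeDecodeError carries the codec name, the input bytes and the start/end offsets of the offending bytes as attributes, which is a quick way to locate the problem in larger data. A minimal sketch:

```python
data = b"a\xf1"

error_start = None
bad_bytes = None
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    # exc.object holds the input bytes; exc.start and exc.end delimit the offending slice
    error_start = exc.start
    bad_bytes = exc.object[exc.start:exc.end]
    print(f"{exc.encoding} failed at byte {exc.start}: {bad_bytes!r} ({exc.reason})")
```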

To fix the error, the bytes must be decoded with an encoding that can represent the byte 0xf1. Here the unicode_escape codec is used; it succeeds because it decodes bytes that are not part of a backslash escape as Latin-1, in which 0xf1 is the character ñ:

Python3

t1 = b"a\xf1".decode("unicode_escape") 
  
print(t1)

Output:

añ
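Because unicode_escape works here only through its Latin-1 fallback, decoding directly with "latin-1" gives the same result without a side effect: unicode_escape also interprets backslash sequences in the data. The sketch below contrasts the two on a hypothetical byte string containing a Windows-style path.

```python
raw = b"a\xf1"

# Latin-1 maps every byte 0x00-0xFF to the code point with the same value, so 0xF1 -> "ñ"
latin1_text = raw.decode("latin-1")
print(latin1_text)

# A literal backslash followed by "n" in the data (illustrative example)
path_bytes = b"C:\\new"
latin1_path = path_bytes.decode("latin-1")      # backslash kept as-is
escaped_path = path_bytes.decode("unicode_escape")  # "\n" turned into a newline character
print(repr(latin1_path))
print(repr(escaped_path))
```

For data that may contain backslashes, such as CSV fields holding file paths, a plain character encoding is therefore the safer choice.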

How to Resolve a UnicodeDecodeError for a CSV file

This error is common when processing CSV files, because a CSV file may have been saved with a different encoding than the one the Python program uses by default. If the encoding of the file is known, it can be passed explicitly when the file is opened, so that its contents are decoded correctly. This approach works only when the encoding of the CSV file is known.
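The same idea applies outside pandas. A minimal sketch with the standard csv module, assuming a small UTF-16 file created in a temporary directory (the file name sample.csv and its rows are illustrative): the file reads correctly only when open() is given the matching encoding.

```python
import csv
import os
import tempfile

rows = [["name", "city"], ["Ana", "Logroño"]]  # illustrative sample data

# Write a small CSV file encoded as UTF-16 into a temporary directory
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.csv")
with open(path, "w", encoding="utf-16", newline="") as f:
    csv.writer(f).writerows(rows)

# Reading with the matching encoding succeeds
with open(path, encoding="utf-16", newline="") as f:
    read_back = list(csv.reader(f))
print(read_back)

# Reading with a mismatched encoding raises UnicodeDecodeError
utf8_failed = False
try:
    with open(path, encoding="utf-8", newline="") as f:
        list(csv.reader(f))
except UnicodeDecodeError as exc:
    utf8_failed = True
    print("UTF-8 failed:", exc.reason)
```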

To demonstrate the error, a CSV file named test.csv is used. The file is encoded in UTF-16.

Generating UnicodeDecodeError for a CSV file

The following code attempts to read the CSV file with pandas. When executed, it raises a UnicodeDecodeError:

Python3

import pandas as pd 
  
path = "test.csv"
  
# Read the CSV file at the given path;
# pandas decodes its contents as UTF-8 by default
file = pd.read_csv(path) 
  
print(file.head()) 

Output:

Understanding the Problem

The error occurs because read_csv tries to decode the file with its default encoding, UTF-8, while the file is actually encoded in UTF-16. To fix the error and process the file, the correct encoding must be specified when the file is read.
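When the encoding is not known in advance, one common heuristic is to inspect the file's first bytes for a byte order mark (BOM): UTF-16 files written by Python's codec, and typically by editors such as Notepad, start with one. The helper below, guess_encoding_from_bom, is hypothetical and not part of pandas; it only recognizes files that carry a BOM, and a missing BOM does not prove the file is UTF-8.

```python
import codecs
import os
import tempfile

def guess_encoding_from_bom(path, default="utf-8"):
    """Hypothetical helper: guess a file's encoding from its byte order mark.

    Returns `default` when no known BOM is found.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: the UTF-32-LE BOM starts with the same bytes as the UTF-16-LE BOM
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # codec that strips the UTF-8 BOM while decoding
    return default

# Demonstration with two temporary files (illustrative content)
tmp_dir = tempfile.mkdtemp()
utf16_path = os.path.join(tmp_dir, "utf16.csv")
utf8_path = os.path.join(tmp_dir, "utf8.csv")
with open(utf16_path, "wb") as f:
    f.write("name,age\n".encode("utf-16"))  # Python's UTF-16 codec writes a BOM
with open(utf8_path, "wb") as f:
    f.write("name,age\n".encode("utf-8"))   # plain UTF-8, no BOM

guessed_utf16 = guess_encoding_from_bom(utf16_path)
guessed_utf8 = guess_encoding_from_bom(utf8_path)
print(guessed_utf16, guessed_utf8)
```

The returned name could then be passed as the encoding argument of read_csv.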

Solution

First, the pandas library is imported and the path to the CSV file is specified. The program then calls read_csv to read the file, passing the encoding that should be used to decode it (UTF-16 in this case) through the encoding parameter. Because this is the encoding the file was originally saved with, the file is decoded successfully.

Python3

import pandas as pd 
  
path = "test.csv"
  
# Read the CSV file at the given path,
# decoding its contents as UTF-16
file = pd.read_csv(path, encoding="utf-16") 
  
# Displaying the contents 
print(file.head()) 

Output:

Alternate Method to Solve UnicodeDecodeError

Another way to resolve the issue is to change the encoding of the CSV file itself. To do this, first open the CSV file as a text file in an editor such as Notepad or WordPad.

Next, open the File menu and select Save As.

In the dialog that appears, change the Encoding option to UTF-8 (the default encoding used by pandas' read_csv), then select Save.
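The same re-save can be done in Python instead of an editor. A minimal sketch, assuming the source encoding (UTF-16 here) is known; the helper convert_to_utf8 and the file names are hypothetical, and the demonstration writes its own temporary UTF-16 file.

```python
import os
import shutil
import tempfile

def convert_to_utf8(src_path, dst_path, src_encoding="utf-16"):
    """Hypothetical helper: copy a text file, re-encoding it from src_encoding to UTF-8."""
    # newline="" keeps the original line endings untouched in both files
    with open(src_path, encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        shutil.copyfileobj(src, dst)  # streams the decoded text in chunks

# Demonstration with a temporary UTF-16 file (illustrative content)
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "test.csv")
dst_path = os.path.join(tmp_dir, "test_utf8.csv")
original = "name,city\r\nAna,Logroño\r\n"

with open(src_path, "w", encoding="utf-16", newline="") as f:
    f.write(original)

convert_to_utf8(src_path, dst_path)

with open(dst_path, "rb") as f:
    raw_out = f.read()
print(raw_out.decode("utf-8") == original)
```

Writing to a separate destination file leaves the original untouched in case the assumed source encoding turns out to be wrong.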

With the file re-saved as UTF-8, the original code now runs without errors. Because the file's encoding now matches pandas' default encoding, UTF-8, read_csv can decode it without an explicit encoding argument.

Python3

import pandas as pd 
  
path = "test.csv"
  
# Read the CSV file at the given path;
# pandas decodes its contents as UTF-8 by default
file = pd.read_csv(path) 
  
print(file.head()) 

Output:
