Convert Unicode to ASCII in Python

26 July 2024


Unicode is the universal character set and a standard that supports all the world's languages. It contains 140,000+ characters used by 150+ scripts, along with various symbols. ASCII, on the other hand, is a subset of Unicode and the most widely compatible character set. It consists of 128 characters: English letters, digits, and punctuation, with the remainder being control characters. This article covers converting a wide range of Unicode characters to a simpler ASCII representation using the Python library anyascii.
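To make the subset relationship concrete, here is a minimal sketch using only built-in Python. The helper name non_ascii_chars is made up for this illustration and is not part of any library.

```python
def non_ascii_chars(text):
    """Return the characters in text whose code point is outside 0-127."""
    return [ch for ch in text if ord(ch) > 127]

for sample in ["Hello, World!", "Héllo, Wörld!"]:
    # str.isascii() (Python 3.7+) is True only if every code point is below 128
    print(repr(sample), sample.isascii(), non_ascii_chars(sample))

# ASCII text produces identical bytes when encoded as ASCII or as UTF-8
print("Hello".encode("ascii") == "Hello".encode("utf-8"))
```

A check like this is a quick way to decide whether a string needs any conversion at all.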

The text is converted character by character. The mappings for each script are based on conventional schemes. Symbolic characters are converted based on their meaning or appearance. ASCII characters in the input are left untouched; for every other character, a conversion to ASCII is attempted. Unknown characters are removed.
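The three rules above (ASCII passes through, known characters are mapped, unknown characters are dropped) can be sketched with a toy lookup table. This is not anyascii's implementation: TOY_TABLE and toy_to_ascii are hypothetical names, and the table holds only a handful of entries chosen for illustration.

```python
# Hypothetical, tiny mapping table for illustration only;
# a real library needs tables covering many scripts.
TOY_TABLE = {
    "é": "e",
    "ö": "o",
    "ß": "ss",
    "☆": "*",
    "ℳ": "M",
}

def toy_to_ascii(text):
    """Convert text one character at a time using TOY_TABLE."""
    pieces = []
    for ch in text:
        if ch.isascii():
            # Rule 1: ASCII characters are left untouched
            pieces.append(ch)
        else:
            # Rule 2: mapped characters are replaced
            # Rule 3: characters missing from the table are removed
            pieces.append(TOY_TABLE.get(ch, ""))
    return "".join(pieces)

print(toy_to_ascii("Héllo ☆ Straße"))  # Hello * Strasse
print(repr(toy_to_ascii("abc 中")))     # 'abc '
```

Note that a single input character may map to several ASCII characters, as "ß" does here, so the output can be longer than the input.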

Installation:

To install this module, type the command below in the terminal.

pip install anyascii

Example 1: Working with Several languages

In this example, text in several different scripts is passed as Unicode input, and the output is the converted ASCII text.

Python3

from anyascii import anyascii

# Transliterate Hindi text
hindi_uni = anyascii('नमस्ते विद्यार्थी')

print("The translation from hindi Script : " + hindi_uni)

# Transliterate Punjabi text
pun_uni = anyascii('ਸਤਿ ਸ੍ਰੀ ਅਕਾਲ')

print("The translation from Punjabi Script : " + pun_uni)

Output:

The translation from hindi Script : nmste vidyarthi
The translation from Punjabi Script : sti sri akal

Example 2: Working with Unicode Emojis and Symbols

The library also handles emojis and symbols, which are Unicode characters outside the ASCII range.

from anyascii import anyascii

# Convert emojis to their ASCII names
emoji_uni = anyascii('😎 👑 🍎')

print("The ASCII from emojis : " + emoji_uni)

# Convert symbols
sym_uni = anyascii('➕ ☆ ℳ')

print("The ASCII from Symbols : " + sym_uni)

Output:

The ASCII from emojis : :sunglasses: :crown: :apple:
The ASCII from Symbols : :heavy_plus_sign: * M

Using the iconv Utility:

Approach:

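The iconv utility is a system command-line tool that converts text from one character encoding to another. You can call it from Python with the subprocess module. The //TRANSLIT suffix asks iconv to approximate characters that have no ASCII equivalent; the exact result depends on the iconv implementation and locale.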

Python3

import subprocess

unicode_string = "Héllo, Wörld!"

# Pipe the UTF-8 bytes through iconv; //TRANSLIT approximates non-ASCII characters
result = subprocess.run(
    ['iconv', '-f', 'utf-8', '-t', 'ascii//TRANSLIT'],
    input=unicode_string.encode('utf-8'),
    capture_output=True,
    check=True,  # raise CalledProcessError if iconv fails
)

ascii_string = result.stdout.decode('ascii')

print(ascii_string)

Output:

Hello, World!

Time Complexity: O(n)
Auxiliary Space: O(n)
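If iconv is not available on the system, a similar result for accented Latin letters can be sketched with the standard-library unicodedata module. This is a different technique from both anyascii and iconv: it only strips combining marks after decomposition, and the function name strip_accents_to_ascii is made up for this illustration.

```python
import unicodedata

def strip_accents_to_ascii(text):
    """Decompose accented characters, then drop everything that is not ASCII."""
    # NFKD splits "é" into "e" followed by a combining acute accent (U+0301)
    decomposed = unicodedata.normalize("NFKD", text)
    # errors="ignore" drops the combining marks, and also any character
    # that has no ASCII decomposition at all
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(strip_accents_to_ascii("Héllo, Wörld!"))  # Hello, World!
```

The trade-off is coverage: characters without a decomposition, such as "ß" or any non-Latin script, are silently removed rather than transliterated.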
