unicode_literals in Python

27 July 2024

Unicode, also called the Universal Character Set, assigns a unique number (a code point) to every character. ASCII uses 7 bits per character, so it can represent at most 128 (2^7) distinct characters; even the 8-bit extensions that fill a whole byte top out at 256 (2^8). That is enough for English, but not for Hindi, Russian, Chinese, and other languages, let alone emojis. This is where Unicode comes in: it provides a much larger table that contains the ASCII table as its first 128 entries and has room for other languages, symbols, and emojis.
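To make the idea of a code point concrete, here is a minimal Python 3 sketch that uses the built-in ord() to look up the number Unicode assigns to a few characters. The helper name code_point and the sample characters are chosen only for illustration.

```python
def code_point(ch):
    """Return the code point of a single character in U+XXXX notation."""
    return "U+{:04X}".format(ord(ch))

# "A", Latin o with stroke, Devanagari letter A, umbrella
samples = ["A", "\u00f8", "\u0905", "\u2602"]

for ch in samples:
    # ASCII covers only code points 0 to 127
    kind = "ASCII" if ord(ch) < 128 else "outside ASCII"
    print(code_point(ch), kind)
```

Only the first sample falls inside the ASCII range; the others need the larger Unicode table.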

We cannot save text as Unicode directly, because Unicode is only an abstract mapping from characters to numbers (code points). To store or transmit text, we need an encoding that turns each code point into bytes. If a character needs more than 1 byte (8 bits), those bytes have to be packed together as a single unit (think of a box holding more than one item). UTF-8 and UTF-16 are two such packing schemes. In UTF-8 a character occupies between 1 and 4 bytes (a minimum of 8 bits), and in UTF-16 a character occupies 2 or 4 bytes (a minimum of 16 bits). A UTF (Unicode Transformation Format) is simply an algorithm that turns Unicode code points into bytes and reads them back.
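The byte counts can be checked directly. The following Python 3 sketch encodes a few characters and reports how many bytes each one takes; the helper encoded_length is a hypothetical name, and "utf-16-be" is used here only so that no byte-order mark is added to the count.

```python
def encoded_length(ch, encoding):
    """Return the number of bytes needed to store ch in the given encoding."""
    return len(ch.encode(encoding))

# An ASCII letter, the umbrella symbol, and an emoji (U+1F600)
for ch in ["A", "\u2602", "\U0001F600"]:
    print("U+{:04X}".format(ord(ch)),
          "utf-8:", encoded_length(ch, "utf-8"),
          "utf-16:", encoded_length(ch, "utf-16-be"))

# Decoding with the same encoding reads the bytes back into the original text
round_trip = "\u2602".encode("utf-8").decode("utf-8")
print(round_trip == "\u2602")
```

The ASCII letter takes 1 byte in UTF-8 and 2 in UTF-16, the umbrella takes 3 and 2, and the emoji takes 4 in both.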

In Python 2, string literals are byte strings by default, whereas in Python 3 all string literals are Unicode strings by default. To make all string literals in a Python 2 module Unicode, we use the following import:

from __future__ import unicode_literals

If we are using Python 2 (2.6 or later), we need to import unicode_literals from the __future__ module. The import must appear at the top of the file, and it makes the string literals in that file behave as they do in Python 3, which helps keep the code compatible across Python versions.
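One way to see that this import simply brings Python 3 behaviour forward is to inspect the feature object in the standard __future__ module. The sketch below is meant to be run on Python 3; the variable names are illustrative.

```python
import __future__

feature = __future__.unicode_literals

# First release in which the import was accepted, and the release
# in which the behaviour became the default
optional_in = feature.getOptionalRelease()
mandatory_in = feature.getMandatoryRelease()

print("optional since:", optional_in[:2])
print("default since:", mandatory_in[:2])

# On Python 3 a plain literal and a u-prefixed literal have the same type
same_type = type("abc") is type(u"abc")
print(same_type)
```

On Python 3 the import itself is still accepted but has no effect, since Unicode literals are already the default.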

Python 2

import sys
 
# checking the default encoding of strings
print "The default encoding for python2 is:", sys.getdefaultencoding()

Output:

The default encoding for python2 is: ascii

Because the default encoding in Python 2 is ASCII, which cannot represent the characters used below, we need to encode the Unicode string explicitly as UTF-8 before printing it.
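The limitation of the ASCII codec can be reproduced on Python 3 as well. The following sketch, with a hypothetical helper named try_encode, attempts to encode the same six characters with both codecs.

```python
# The six characters used in this article, written as escapes
word = "\u2119\u01b4\u2602\u210c\u00f8\u1f24"

def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

ascii_result = try_encode(word, "ascii")
utf8_result = try_encode(word, "utf-8")

print(ascii_result)       # None: ASCII has no bytes for these characters
print(len(utf8_result))   # 16 bytes for 6 characters
```

UTF-8 succeeds because it can represent every Unicode code point, using 2 or 3 bytes for each of these characters.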

from __future__ import unicode_literals
 
# creating variables to hold
# the letters of the word "Python"
p = "\u2119"
y = "\u01b4"
t = "\u2602"
h = "\u210c"
o = "\u00f8"
n = "\u1f24"
 
# printing Python
# the Unicode string is encoded to UTF-8 bytes before printing
print((p + y + t + h + o + n).encode("utf-8"))

Output:

ℙƴ☂ℌøἤ

Python 3:

# In python3
# By default the encoding is "utf-8"
import sys
 
# printing the default encoding
print("The default encoding for python3 is:", sys.getdefaultencoding())
 
# the u"...." prefix is optional in Python 3, where plain
# literals are already Unicode; it is only needed in
# Python 2 when unicode_literals is not imported
p = u"\u2119"
y = u"\u01b4"
t = u"\u2602"
h = u"\u210c"
o = u"\u00f8"
n = u"\u1f24"
 
# printing Python
print(p+y+t+h+o+n)

Output:

The default encoding for python3 is: utf-8
ℙƴ☂ℌøἤ

Here, the code points used are:

Sr. no.	Code point	Description
1.	U+2119	displays the double-struck capital P.
2.	U+01B4	displays the Latin small letter y with hook.
3.	U+2602	displays an umbrella.
4.	U+210C	displays the black-letter capital H.
5.	U+00F8	displays the Latin small letter o with stroke.
6.	U+1F24	displays the Greek small letter eta with psili and oxia.
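The official character names can also be looked up programmatically with the standard unicodedata module. This is a small Python 3 sketch; the names code_points and names are illustrative.

```python
import unicodedata

# The six code points used in the examples
code_points = [0x2119, 0x01B4, 0x2602, 0x210C, 0x00F8, 0x1F24]

# Map each code point to its official Unicode character name
names = {cp: unicodedata.name(chr(cp)) for cp in code_points}

for cp in code_points:
    print("U+{:04X}  {}".format(cp, names[cp]))
```

The names are printed in upper case, as they appear in the Unicode character database.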
