Detect an Unknown Language using Python

28 July 2024


Language detection works by examining the characters, words, and expressions that appear in a text. One basic principle is to look for words that are very common in a particular language, such as "to" and "of" in English. Python offers several third-party modules for language detection. This article covers the following modules:

langdetect
textblob
langid
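The common-word principle described above can be illustrated with a few lines of standard-library Python. This is only a toy sketch: the word lists and the function name guess_language are assumptions made up for this example, and the libraries covered below rely on much richer statistical models.

```python
import re

# Tiny hand-picked lists of frequent function words. These are illustrative
# assumptions, not the data used by any of the libraries in this article.
COMMON_WORDS = {
    "en": {"the", "of", "to", "and", "is", "a", "in", "for"},
    "es": {"el", "la", "de", "y", "es", "un", "en", "para"},
    "fr": {"le", "la", "de", "et", "est", "un", "en", "pour"},
}


def guess_language(text):
    """Return the language whose common words occur most often, or None."""
    # Lowercase the text and split it into word tokens.
    tokens = re.findall(r"\w+", text.lower())

    # Count how many tokens belong to each language's common-word list.
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in COMMON_WORDS.items()
    }

    # Pick the highest-scoring language; report None if nothing matched.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_language("GeeksforLazyroar is a computer science portal for Lazyroar"))
print(guess_language("GeeksforLazyroar es un portal informático para Lazyroar"))
```

A list-based approach like this breaks down quickly on short texts and on languages that share function words, which is why the dedicated modules are preferable in practice.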

Method 1: Using langdetect library

This module is a port of Google’s language-detection library and supports 55 languages. It is not part of Python’s standard library, so it must be installed separately. To install it, run the following command in the terminal.

pip install langdetect

Python3

# Python program to demonstrate
# langdetect
 
 
from langdetect import detect
 
 
# Detect the language of each
# sample sentence
print(detect("GeeksforLazyroar is a computer science portal for Lazyroar"))
print(detect("GeeksforLazyroar - это компьютерный портал для гиков"))
print(detect("GeeksforLazyroar es un portal informático para Lazyroar"))
print(detect("GeeksforLazyroar是面向极客的计算机科学门户"))
print(detect("GeeksforLazyroar Lazyroar के लिए एक कंप्यूटर विज्ञान पोर्टल है"))
print(detect("GeeksforLazyroarは、ギーク向けのコンピューターサイエンスポータルです。"))

Output:

en
ru
es
no
hi
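ja

Here the Chinese sentence is reported as no (Norwegian), which is a misdetection. Results on short or mixed-script text like this can be unreliable.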
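langdetect's algorithm is non-deterministic, so short or ambiguous input can produce different answers from one run to the next. The following is a minimal sketch of one way to make results repeatable and to reject low-confidence guesses. It uses the library's DetectorFactory.seed setting and its detect_langs function; the helper names pick_language and detect_confident and the 0.90 threshold are illustrative assumptions, not part of langdetect.

```python
def pick_language(candidates, min_prob=0.90):
    """Return the most probable language code, or None if it is below min_prob.

    candidates is a list of (language_code, probability) pairs.
    """
    if not candidates:
        return None
    code, prob = max(candidates, key=lambda pair: pair[1])
    return code if prob >= min_prob else None


def detect_confident(text, min_prob=0.90, seed=0):
    """Detect the language of text, returning None for uncertain results."""
    # Imported here so that pick_language can be used without langdetect.
    from langdetect import DetectorFactory, detect_langs

    # Fixing the seed makes repeated runs on the same text agree.
    DetectorFactory.seed = seed

    # detect_langs returns candidate languages with their probabilities.
    candidates = [(guess.lang, guess.prob) for guess in detect_langs(text)]
    return pick_language(candidates, min_prob)
```

With langdetect installed, detect_confident("GeeksforLazyroar是面向极客的计算机科学门户") returns either a language code or None, depending on how confident the detector is about that sentence.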

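Method 2: Using textblob library

This module is used for natural language processing (NLP) tasks such as noun phrase extraction, sentiment analysis, classification, translation, and more. Its detect_language() method relied on the Google Translate web API; it was deprecated in TextBlob 0.16.0 and removed in later releases, so the example below needs an older version of the library and an internet connection. To install this module, run the following command in the terminal.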

pip install textblob

Example:

Python3

# Python program to demonstrate
# textblob
  
 
from textblob import TextBlob
  
 
L = ["GeeksforLazyroar is a computer science portal for Lazyroar",
    "GeeksforLazyroar - это компьютерный портал для гиков",
    "GeeksforLazyroar es un portal informático para Lazyroar",
    "GeeksforLazyroar是面向极客的计算机科学门户",
    "GeeksforLazyroar Lazyroar के लिए एक कंप्यूटर विज्ञान पोर्टल है",
    "GeeksforLazyroarは、ギーク向けのコンピューターサイエンスポータルです。",
    ]
 
for i in L:
     
    # Language Detection
    lang = TextBlob(i)
    print(lang.detect_language())

Output:

en
ru
es
zh-CN
hi
ja

Method 3: Using langid library

This module is a standalone language identification tool. It comes pre-trained on a large number of languages (currently 97) and is a single .py file with minimal dependencies. To install it, run the following command in the terminal.

pip install langid

[src: https://github.com/saffsd/langid.py]

Example:

Python3

# Python program to demonstrate
# langid
 
 
import langid
 
 
L = ["GeeksforLazyroar is a computer science portal for Lazyroar",
    "GeeksforLazyroar - это компьютерный портал для гиков",
    "GeeksforLazyroar es un portal informático para Lazyroar",
    "GeeksforLazyroar是面向极客的计算机科学门户",
    "GeeksforLazyroar Lazyroar के लिए एक कंप्यूटर विज्ञान पोर्टल है",
    "GeeksforLazyroarは、ギーク向けのコンピューターサイエンスポータルです。",
    ]
 
for i in L:
     
    # Language detection: classify() returns a
    # (language code, score) tuple
    print(langid.classify(i))

Output:

('en', -119.93012762069702)
('ru', -641.3409600257874)
('es', -191.01083326339722)
('zh', -199.18277835845947)
('hi', -286.99300467967987)
('ja', -875.6610476970673)
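The second value in each tuple above is an unnormalised score, which is hard to compare across texts. As a hedged sketch, langid's LanguageIdentifier can be built with norm_probs=True so that it returns probabilities between 0 and 1, and set_languages can narrow the candidates when the possible languages are known in advance. The helper names make_classifier and count_languages are hypothetical, introduced only for this illustration.

```python
from collections import Counter


def make_classifier(languages=None):
    """Build a langid classify function that reports normalised probabilities."""
    # Imported here so that count_languages can be used without langid.
    from langid.langid import LanguageIdentifier, model

    identifier = LanguageIdentifier.from_modelstring(model, norm_probs=True)
    if languages is not None:
        # Restrict the candidate set, e.g. ["en", "ru", "es", "zh", "hi", "ja"].
        identifier.set_languages(languages)
    return identifier.classify


def count_languages(texts, classify):
    """Count how many texts were assigned to each language code.

    classify is any callable that maps a text to a (code, score) tuple.
    """
    return Counter(classify(text)[0] for text in texts)
```

With langid installed, classify = make_classifier(["en", "ru", "es", "zh", "hi", "ja"]) followed by count_languages(L, classify) would tally the six sample sentences by predicted language.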
