Character encoding is an essential aspect of handling text in many programming languages, including PHP. Character encodings such as UTF-8, ISO-8859-1, and ASCII map characters to bytes differently, so the same bytes can decode to different text. Identifying the correct character encoding of a string is important for processing and displaying it properly.
In PHP, the mb_detect_encoding() function allows you to detect the character encoding of a given string. This function is part of the multibyte string extension (mbstring) which must be enabled in your PHP configuration.
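Because the function is only available when mbstring is loaded, it can be worth checking for the extension before calling it. The following is a minimal sketch; the helper name mbstringAvailable() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: reports whether the mbstring extension is loaded
// and mb_detect_encoding() can be called.
function mbstringAvailable(): bool
{
    return extension_loaded('mbstring') && function_exists('mb_detect_encoding');
}

if (mbstringAvailable()) {
    echo "mbstring is enabled\n";
} else {
    echo "mbstring is not enabled; enable it in your PHP configuration\n";
}
```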
Syntax:
mb_detect_encoding(
    string $string,
    array|string|null $encodings = null,
    bool $strict = false
): string|false
Parameters:
$string: The input string whose encoding you want to detect.
$encodings: A list of candidate character encodings to try, in order. It can be an array of encoding names or a comma-separated string. If it is omitted or null, the detection order set with mb_detect_order() is used.
$strict: A boolean flag that controls what happens when the string is not valid in any of the listed encodings. If $strict is false, the closest matching encoding is returned; if $strict is true, the function returns false.
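The effect of the candidate list and of strict mode can be illustrated with a string that is not valid UTF-8. This is a sketch under the assumption that the sample bytes are "café" encoded as ISO-8859-1; the variable names are illustrative only.

```php
<?php
// "caf\xE9" is "café" in ISO-8859-1; the lone 0xE9 byte is not valid UTF-8.
$latin1 = "caf\xE9";

// Strict mode with UTF-8 as the only candidate: the string is invalid
// in every listed encoding, so false is returned.
$onlyUtf8 = mb_detect_encoding($latin1, ['UTF-8'], true);
var_dump($onlyUtf8); // bool(false)

// Strict mode with two candidates passed as an array.
$fromArray = mb_detect_encoding($latin1, ['UTF-8', 'ISO-8859-1'], true);

// The same candidates passed as a comma-separated string.
$fromString = mb_detect_encoding($latin1, 'UTF-8, ISO-8859-1', true);

echo $fromArray, "\n";  // ISO-8859-1
echo $fromString, "\n"; // ISO-8859-1
```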
Return Values: The mb_detect_encoding() function returns the name of the detected character encoding as a string. It returns false if no encoding can be detected, for example when $strict is true and the string is not valid in any of the candidate encodings.
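Since the return value is either a string or false, callers should compare against false with ===. The sketch below wraps this in a helper; detectEncodingOr() and its $fallback parameter are hypothetical names introduced for illustration.

```php
<?php
/**
 * Hypothetical helper: returns the detected encoding name, or $fallback
 * when mb_detect_encoding() signals failure by returning false.
 */
function detectEncodingOr(string $text, array $candidates, string $fallback): string
{
    // Strict mode, so a string that is invalid in every candidate yields false.
    $encoding = mb_detect_encoding($text, $candidates, true);

    // Use === so that failure (false) is clearly distinguished from a name.
    return $encoding === false ? $fallback : $encoding;
}

// Plain ASCII text is valid in both candidates, so a name is returned.
echo detectEncodingOr("Hello", ['ASCII', 'UTF-8'], 'unknown'), "\n";

// These bytes are valid in neither ASCII nor UTF-8, so the fallback is used.
echo detectEncodingOr("\xFF\xFE\xFD", ['ASCII', 'UTF-8'], 'unknown'), "\n"; // unknown
```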
Approach 1: Using Default Detection Order
Call mb_detect_encoding() with only the input string to use PHP's default detection order.
PHP
<?php
$text = "Hi, こんにちは, 你好, привет!";
$encoding = mb_detect_encoding($text);
echo "The Detected Encoding : " . $encoding;
?>
Output:
The Detected Encoding : UTF-8
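The default order used above can be inspected with mb_detect_order(), which returns the current detection order as an array when called without arguments. The following sketch prints that order and compares an implicit call with one that passes the order explicitly; the actual order depends on your PHP configuration.

```php
<?php
// With no arguments, mb_detect_order() returns the current detection order.
$order = mb_detect_order();
echo "Current detection order: ", implode(', ', $order), "\n";

$text = "Hi, こんにちは, 你好, привет!";

// Detection with the implicit default order...
$implicit = mb_detect_encoding($text);

// ...and with the same order passed explicitly as the second argument.
$explicit = mb_detect_encoding($text, $order);

var_dump($implicit === $explicit);
```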
Approach 2: Specifying Custom Encoding List
Pass a custom list of candidate encodings as the second argument to mb_detect_encoding() to restrict detection to those encodings.
PHP
<?php
$text = "Hi, こんにちは, 你好, привет!";
$encoding_list = ["UTF-8", "EUC-JP", "GBK"];
$encoding = mb_detect_encoding($text, $encoding_list);
echo "The Detected Encoding : " . $encoding;
?>
Output:
The Detected Encoding : UTF-8