Adrenalin’s Experience

How to convert a Unicode code point to the character (binary) with PHP

Posted in Uncategorized by Adrenalin on October 18, 2009

Want to display a Unicode code point as the character it actually represents?

For example, for U+00CE, display the Î character it represents. (Here is the list of all Romanian special characters.)

Strangely, I didn’t find ready-to-use code for this right away (as I usually do) 😉

For our task, we will need berlioz’s unicode2utf8 function, which supports 4-byte UTF-8. Initially I used a function that supported only 3 bytes and got errors. If you need 6-byte support, see the Unicode_to_UTF function.
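Since berlioz’s function is not reproduced in this post, here is a minimal sketch of what a 4-byte-capable encoder might look like. The name codePointToUtf8 and its error handling are my own assumptions, not berlioz’s code; it only illustrates how a code point is split across 1 to 4 UTF-8 bytes.

```php
<?php
// Hypothetical helper (NOT berlioz's unicode2utf8): encode one code point as UTF-8.
function codePointToUtf8(int $cp): string
{
    // Reject values outside the Unicode range and the surrogate block.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException("Not a valid code point: $cp");
    }
    if ($cp < 0x80) {            // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo codePointToUtf8(0x00CE), "\n"; // prints Î
```

A function limited to 3 bytes stops at U+FFFF, which is presumably why the first one I tried produced errors on higher code points.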

Here is the trick: unicode2utf8 requires an integer as its argument, but 00CE in our example (as in the Unicode notation) is hexadecimal, so all we need to do is apply the hexdec function.

PHP code:

echo unicode2utf8(hexdec("00CE")); // Result: Î

// Or a function that recognizes a leading U+, skips it, and returns the character
function unicodeCodePointToChar($str) {
	if (substr($str,0,2) != "U+") return $str;
	$str = substr($str,2); // Skip U+
	return unicode2utf8(hexdec($str)); // Same function as above: takes one integer code point
}
echo unicodeCodePointToChar("U+00CE"); // Result: Î
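If you are on PHP 7.2 or later with the mbstring extension loaded (an assumption, this post predates it), the built-in mb_chr function can stand in for the helper function. Here is a sketch of the same idea; codePointNotationToChar is a hypothetical name, and the stricter input check is my own addition.

```php
<?php
// Assumes PHP >= 7.2 with the mbstring extension (for mb_chr).
// Hypothetical variant of unicodeCodePointToChar from this post.
function codePointNotationToChar(string $str): string
{
    // Accept "U+" followed by 1 to 6 hex digits; return anything else unchanged.
    if (!preg_match('/^U\+([0-9A-Fa-f]{1,6})$/', $str, $m)) {
        return $str;
    }
    $char = mb_chr(hexdec($m[1]), 'UTF-8');
    // mb_chr() returns false when the value is not a valid code point.
    return $char === false ? $str : $char;
}

echo codePointNotationToChar("U+00CE"), "\n"; // prints Î
```

Checking the hex digits up front avoids passing stray text to hexdec, which ignores (and in newer PHP versions warns about) non-hex characters.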

Why would I ever need that, you may ask? Well, I need to apply Sphinx’s charset_table conversion logic to the user’s search string. Here’s the map I used, kindly provided by someone on a pastebin.

So if a user searches for “bălan”, they will actually find both “bălan” and “balan”.
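To give an idea of how the folding can be applied to a user’s string, here is a rough sketch. The rules below are a small hand-written subset in charset_table style for lowercase Romanian letters, not the actual map from the pastebin, and the helper names tokenToChar and buildFoldMap are hypothetical. It assumes PHP 7.2 or later with mbstring, and it only handles simple “from->to” pairs, not ranges.

```php
<?php
// Hypothetical subset of charset_table-style rules (NOT the map used in the post).
$rules = 'U+0103->a, U+00E2->a, U+00EE->i, U+0219->s, U+021B->t';

// Turn "U+XXXX" into its UTF-8 character; return literal tokens unchanged.
// Assumes PHP >= 7.2 with the mbstring extension (for mb_chr).
function tokenToChar(string $token): string
{
    if (preg_match('/^U\+([0-9A-Fa-f]{1,6})$/', $token, $m)) {
        return (string) mb_chr(hexdec($m[1]), 'UTF-8');
    }
    return $token;
}

// Build a [character => replacement] array from simple "from->to" pairs.
function buildFoldMap(string $rules): array
{
    $map = [];
    foreach (explode(',', $rules) as $rule) {
        [$from, $to] = explode('->', trim($rule));
        $map[tokenToChar($from)] = tokenToChar($to);
    }
    return $map;
}

// Fold the search string the same way the index folds its text.
echo strtr('bălan', buildFoldMap($rules)), "\n"; // prints balan
```

With the query folded to “balan” and the indexed text folded the same way, both spellings end up matching each other.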

