How to convert a Unicode code point to the character (binary) with PHP
Do you want to display a Unicode code point as the character it actually represents?
For example, for U+00CE, display the Î character it represents. (Here is the list of all Romanian special characters.)
Quite strange, I didn't find ready-to-use code for that instantly (as I usually do) 😉
For our task, we will need berlioz's unicode2utf8 function, which supports 4-byte UTF-8. Initially I had a function that supported only 3 bytes and got errors; if you need 6-byte support, see the Unicode_to_UTF function.
Here is the trick: unicode2utf8 requires an integer as its argument, while 00CE in our example (and in Unicode notation in general) is hex. All we need to do is apply the hexdec function first.
PHP code:
echo unicode2utf8(hexdec("00CE")); // Result: Î

// Or a function that recognizes U+ at the front of the string and skips it to show the character
function unicodeCodePointToChar($str) {
    if (substr($str, 0, 2) != "U+") return $str; // Not in U+XXXX notation, return unchanged
    $str = substr($str, 2); // Skip U+
    return unicode2utf8(hexdec($str)); // unicode2utf8 takes a single integer
}

echo unicodeCodePointToChar("U+00CE"); // Result: Î
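The unicode2utf8 function itself is not reproduced in this post, so here is a minimal sketch of what such an encoder could look like, in case you want something self-contained. The name codePointToUtf8 is my own and hypothetical, this is not berlioz's code; it simply follows the standard UTF-8 bit layout for 1 to 4 bytes and rejects surrogates and values above U+10FFFF. On PHP 7.2+ with the mbstring extension, the built-in mb_chr() should do the same job.

```php
<?php
// Hypothetical stand-in for unicode2utf8 (not berlioz's original code):
// encodes a single Unicode code point as a UTF-8 byte string.
function codePointToUtf8(int $cp): string
{
    // Reject values outside Unicode and the UTF-16 surrogate range.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException("Invalid code point: $cp");
    }
    if ($cp < 0x80) {            // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// hexdec turns the hex notation into the integer the encoder expects.
echo codePointToUtf8(hexdec("00CE")), "\n"; // Î (bytes C3 8E)
```

The 4-byte branch is what makes code points beyond U+FFFF work, which is where a 3-byte-only function breaks.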
Why would I ever need that, you will ask? Well, I need to apply Sphinx's charset_table conversion logic to the user's string. Here's the map I used, kindly provided by someone on a pastebin.
So if a user searches for “bălan”, he will actually find both “bălan” and “balan”.
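To give a rough idea of the folding step, here is a small sketch. The $fold map below is a hypothetical, hand-written subset for lowercase Romanian letters only, not the pastebin map mentioned above, and foldForSearch is a name I made up for illustration. It relies on strtr with an array, which replaces whole multi-byte UTF-8 sequences rather than single bytes.

```php
<?php
// Hypothetical folding map in the spirit of a charset_table:
// lowercase Romanian letters mapped to plain ASCII.
// Keys use \u{...} escapes (PHP 7+) so they don't depend on the file encoding.
$fold = [
    "\u{0103}" => 'a', // ă
    "\u{00E2}" => 'a', // â
    "\u{00EE}" => 'i', // î
    "\u{0219}" => 's', // ș
    "\u{021B}" => 't', // ț
];

// Hypothetical helper: apply the folding map to a user's search string.
function foldForSearch(string $text, array $map): string
{
    // With an array argument, strtr replaces each key by its value.
    return strtr($text, $map);
}

echo foldForSearch("b\u{0103}lan", $fold), "\n"; // balan
```

A real map would also need the uppercase letters and whatever other characters your charset_table folds.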