I ran into a small problem today while processing some text from a non-English site. The text was being loaded properly, but because it was UTF-8 encoded, PHP couldn't use htmlspecialchars() or apply get_html_translation_table() to the string to properly encode the foreign characters. Neither function had any effect on them. This is because PHP (before version 5.2.x) doesn't natively support Unicode character encoding: by default these functions treat the string as single-byte ISO-8859-1, so they cannot recognise the multibyte sequences that UTF-8 uses for accented characters.
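To get around this, run the string through utf8_decode() first. This converts it from UTF-8 to ISO-8859-1, which the translation table can work with. Be aware that any character with no ISO-8859-1 equivalent is replaced with a question mark, so this only suits text that fits in that character set.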
- // convert from UTF-8 to ISO-8859-1
- $string = utf8_decode($string);
- // translate HTML entities
- $trans = get_html_translation_table(HTML_ENTITIES);
- $string = strtr($string, $trans);
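If losing characters outside ISO-8859-1 is a concern, another option worth trying is to skip the decoding step and tell htmlentities() that the input is UTF-8 through its optional third argument. The snippet below is only a rough sketch of that idea, not a tested replacement for the code above: the function name encode_utf8_entities() and the sample string are made up for illustration, and whether the charset argument is available depends on your PHP version.

```php
<?php
// Hypothetical helper (not part of the original post): encode a UTF-8
// string as HTML entities without converting it to ISO-8859-1 first.
function encode_utf8_entities($string)
{
    // The third argument declares the input charset, so multibyte
    // characters are recognised and mapped to named entities where one
    // exists. Characters without a named entity are left unchanged.
    return htmlentities($string, ENT_QUOTES, 'UTF-8');
}

// Sample input written as raw UTF-8 bytes: "café & crème"
$string = "caf\xC3\xA9 & cr\xC3\xA8me";

echo encode_utf8_entities($string), "\n";
```

The trade-off is that the string stays in UTF-8 throughout, so the page it ends up on needs to be served as UTF-8 as well.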
I hope this helps anyone having the same issue. Also, PHP6 is expected to support Unicode character encoding natively, so this will probably need to be revisited when PHP6 is released.