What To Do When get_html_translation_table() And htmlspecialchars() Don't Work

17th September 2008

I found a little problem today when processing a bit of text from a non-English site. The text was being loaded properly, but because it was UTF-8 encoded, neither htmlspecialchars() nor the table returned by get_html_translation_table() would properly encode the foreign characters. These methods just don't have the desired effect. This is because PHP (up to and including version 5.2.x) doesn't natively support Unicode: its string functions work on bytes and, unless told otherwise, assume a single-byte encoding such as ISO-8859-1, so the multi-byte sequences that UTF-8 uses for accented characters are not recognised as single characters.

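To get around this, use the utf8_decode() function to convert the string from UTF-8 to ISO-8859-1, which these functions can handle. Bear in mind that any character with no ISO-8859-1 equivalent is replaced with a question mark.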

  // convert from UTF-8 to ISO-8859-1
  $string = utf8_decode($string);

  // translate HTML entities
  $trans = get_html_translation_table(HTML_ENTITIES);
  $string = strtr($string, $trans);
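
The snippet above goes via ISO-8859-1, so it cannot represent characters outside that set. As an alternative sketch, htmlentities() accepts a charset argument, so the UTF-8 string can be encoded directly without decoding it first. The helper name encode_utf8_entities() and the sample strings are my own, purely for illustration, and this assumes the input really is valid UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP itself): encode a UTF-8 string
// as HTML entities without converting it to ISO-8859-1 first.
function encode_utf8_entities($string)
{
    // ENT_QUOTES encodes both double and single quotes;
    // 'UTF-8' tells htmlentities() how to interpret the input bytes.
    return htmlentities($string, ENT_QUOTES, 'UTF-8');
}

// "\xC3\xA9" and "\xC3\xA8" are the UTF-8 byte sequences for e-acute and e-grave,
// written as escapes so the example does not depend on the file's own encoding.
echo encode_utf8_entities("Caf\xC3\xA9 & cr\xC3\xA8me"), "\n";
// prints: Caf&eacute; &amp; cr&egrave;me
```

Because nothing is converted to ISO-8859-1 along the way, characters such as the euro sign keep their named entity instead of turning into a question mark.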

I hope this helps anyone having the same issue. Also, PHP 6 is expected to support Unicode natively, so this will probably have to be looked at again when PHP 6 is released.
