Convert HTML To ASCII With PHP

26th April 2008

The reverse of turning ASCII text into HTML is converting HTML into ASCII. The function below does this: it rewrites links, line breaks and emphasis tags as plain-text equivalents, decodes entities and strips any tags that remain.

  function html2ascii($s) {
    // convert links to "text (url)"; [^>]*? allows other attributes before href
    $s = preg_replace('/<a\s+[^>]*?href="?([^">]*)"?[^>]*>(.*?)<\/a>/is', '$2 ($1)', $s);

    // convert p, br, hr and div tags to line breaks
    $s = preg_replace('@<[bh]r\b[^>]*>@i', "\n", $s);
    $s = preg_replace('@<p\b[^>]*>@i', "\n\n", $s);
    $s = preg_replace('@<div\b[^>]*>(.*?)</div>@is', "\n".'$1'."\n", $s);

    // convert bold and italic tags; \b stops <b matching <body> and <i matching <img>
    $s = preg_replace('@<b\b[^>]*>(.*?)</b>@is', '*$1*', $s);
    $s = preg_replace('@<strong\b[^>]*>(.*?)</strong>@is', '*$1*', $s);
    $s = preg_replace('@<i\b[^>]*>(.*?)</i>@is', '_$1_', $s);
    $s = preg_replace('@<em\b[^>]*>(.*?)</em>@is', '_$1_', $s);

    // strip any remaining HTML tags before decoding, so that encoded text
    // such as &lt;b&gt; is not mistaken for a real tag
    $s = strip_tags($s);

    // decode named and numbered entities in one call (the /e regex modifier
    // was removed in PHP 7); the result is UTF-8, so entities outside the
    // ASCII range become multi-byte characters
    $s = html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // return the string
    return $s;
  }
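
The entity decoding can be checked in isolation. The following is a small sketch using PHP's built-in html_entity_decode, which handles named and numbered entities in one call. The sample string is made up for illustration, and UTF-8 is an assumption about the target encoding.

```php
<?php
// Hypothetical sample string, not taken from the article.
$encoded = '&lt;b&gt; &amp; &#65;&#66; &quot;x&quot;';

// ENT_QUOTES also decodes quote entities; 'UTF-8' is an assumed target charset.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded, "\n"; // <b> & AB "x"
```

Note that the decoded result contains a literal <b>. Running strip_tags after decoding would remove it, so the order of those two steps matters.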

To use this function just pass it a string. Here is an example of it at work.

  $htmlString = '<p>This is some <strong>XHTML</strong> markup that <em>will</em> be<br />
  turned <a href="http://www.hashbangcode.com/" title="#! code">into</a> an ascii string.</p>';

  echo html2ascii($htmlString);

This produces the following output (the extra blank lines added by the <p> and <br /> replacements are omitted here).

  This is some *XHTML* markup that _will_ be
  turned into (http://www.hashbangcode.com/) an ascii string.
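
The link conversion seen in that output can also be tried on its own. This is a minimal sketch, assuming a pattern that tolerates other attributes before href; the sample markup and the variable names are illustrative only.

```php
<?php
// Sketch of a link pattern: [^>]*? skips any attributes before href, and
// the "s" modifier lets the link text span several lines.
$pattern = '/<a\s+[^>]*?href="?([^">]*)"?[^>]*>(.*?)<\/a>/is';

// Hypothetical input with an attribute placed before href.
$html = 'Visit <a class="ext" href="http://www.hashbangcode.com/" title="#! code">the site</a> today.';

// $2 is the link text and $1 is the URL.
$text = preg_replace($pattern, '$2 ($1)', $html);

echo $text, "\n"; // Visit the site (http://www.hashbangcode.com/) today.
```

Regular expressions only cope with reasonably tidy markup. For arbitrary HTML, a real parser such as PHP's DOMDocument is a safer starting point.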

 

Comments

I got an error on line 19 --> $s = preg_replace('//e','chr(\\1)',$s); Warning: Wrong parameter count for chr() in C:\PHP-test\xxxxx.php (??) : regexp code on line 1

Marsel (Fri, 11/21/2008 - 03:29)

You are quite right, that would never work! I have updated the script with the fix.
