Extend The str_word_count Function In PHP

22nd January 2008

The str_word_count() function in PHP does exactly what it says it does. By default it simply counts the number of words in a string. Take the following string.

$str = "This is a 'string' containing m0re than one word. This is a 'string' containing m0re than one word.";

If we pass this to the str_word_count() function with no other parameters we get the number of words.

echo str_word_count($str); // prints 20

The second parameter sets the type of value returned from the function. The default value is 0, which returns the word count, but 1 and 2 are also available. Using 1 as the second parameter returns an array containing all the words found inside the string. Using 2 returns an associative array, where the key is the numeric position of the word inside the string and the value is the word itself. Here are the results from setting the second parameter to 1.

print_r(str_word_count($str, 1));
/* prints
Array
(
    [0] => This
    [1] => is
    [2] => a
    [3] => 'string'
    [4] => containing
    [5] => m
    [6] => re
    [7] => than
    [8] => one
    [9] => word
    [10] => This
    [11] => is
    [12] => a
    [13] => 'string'
    [14] => containing
    [15] => m
    [16] => re
    [17] => than
    [18] => one
    [19] => word
)
*/

Here are the results from setting the second parameter to 2.

print_r(str_word_count($str, 2));
/* prints
Array
(
    [0] => This
    [5] => is
    [8] => a
    [10] => 'string'
    [19] => containing
    [30] => m
    [32] => re
    [35] => than
    [40] => one
    [44] => word
    [50] => This
    [55] => is
    [58] => a
    [60] => 'string'
    [69] => containing
    [80] => m
    [82] => re
    [85] => than
    [90] => one
    [94] => word
)
*/
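To make the meaning of those keys concrete, here is a minimal sketch that rebuilds each word from its key with substr(). The shorter $sample string and the $rebuilt variable are just names chosen for this illustration. The keys are byte offsets, so this check assumes single-byte text.

```php
<?php
// One sentence from the example string used in this article.
$sample = "This is a 'string' containing m0re than one word.";

// Format 2: keys are byte offsets into $sample, values are the words.
$words = str_word_count($sample, 2);

// Rebuild every word from its offset and length to check the mapping.
$rebuilt = [];
foreach ($words as $offset => $word) {
    $rebuilt[$offset] = substr($sample, $offset, strlen($word));
}

var_dump($rebuilt === $words); // bool(true)
echo $words[10], "\n";         // prints 'string'
```

Because each key points at the start of a word, the same keys can be used to cut the string cleanly at a word boundary.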

The third parameter is a list of additional characters that should be treated as part of a word. Notice that the string contains the word "m0re", with the o replaced by a 0. By default the function splits this into two words, "m" and "re". To make the function treat the zero as part of the word, include it in the string passed as the third parameter.

print_r(str_word_count($str, 1, '0'));
/* prints
Array
(
    [0] => This
    [1] => is
    [2] => a
    [3] => 'string'
    [4] => containing
    [5] => m0re
    [6] => than
    [7] => one
    [8] => word
    [9] => This
    [10] => is
    [11] => a
    [12] => 'string'
    [13] => containing
    [14] => m0re
    [15] => than
    [16] => one
    [17] => word
)
*/
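The third parameter can hold more than one character. As a rough sketch, the following passes all ten digits and compares the result with the default behaviour. The $sample string is made up for this illustration. One side effect worth noticing is that a number standing on its own then counts as a word.

```php
<?php
// Made-up sample string containing a standalone number and a "leet" word.
$sample = "Chapter 2 has m0re words";

// Default behaviour: digits act as word separators.
$plain = str_word_count($sample, 1);

// All ten digits are treated as word characters.
$withDigits = str_word_count($sample, 1, '0123456789');

echo implode('|', $plain), "\n";      // prints Chapter|has|m|re|words
echo implode('|', $withDigits), "\n"; // prints Chapter|2|has|m0re|words
```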

So how about extending this function? Let's say that you only want to print an extract of some text. You could use str_word_count() as part of another function, like this.

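function limit_text($text, $limit)
{
    // Remove any HTML so that tags are not counted as words.
    $text = strip_tags($text);
    // Format 2 keys each word by its position in the string.
    $words = str_word_count($text, 2);
    $pos = array_keys($words);
    // Only truncate when the text has more words than the limit.
    if (count($words) > $limit) {
        // Cut just before the first word past the limit.
        $text = trim(substr($text, 0, $pos[$limit])) . '...';
    }
    return $text;
}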

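You can use this function in the following way. Note that "m0re" still counts as two words here, because no third parameter is passed to str_word_count(), so a limit of 12 shows 11 whole words.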

echo limit_text($str, 12);
// prints - This is a 'string' containing m0re than one word. This is...

This function is very useful if you want to create a script that produces an RSS feed, or that displays the opening text of a web page on another page of a site.
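If the text contains digits, as in a product name like "Exilim EX-H15", the extract can end up cut in the middle of a name. One possible variation, sketched here, adds an optional character list that is handed on to str_word_count(). The name limit_text_with_chars(), the $charlist parameter and the sample product text are assumptions made for this example, not part of the original function.

```php
<?php
/**
 * Hypothetical variant of limit_text() that forwards a list of extra
 * word characters to str_word_count(). Defaults to the ten digits.
 */
function limit_text_with_chars(string $text, int $limit, string $charlist = '0123456789'): string
{
    // Remove any HTML so that tags are not counted as words.
    $text = strip_tags($text);
    // Keys are word positions; digits now count as part of a word.
    $words = str_word_count($text, 2, $charlist);
    $pos = array_keys($words);
    if (count($words) > $limit) {
        // Cut just before the first word past the limit.
        $text = trim(substr($text, 0, $pos[$limit])) . '...';
    }
    return $text;
}

echo limit_text_with_chars('Exilim EX-H15 digital camera', 2), "\n";
// prints Exilim EX-H15...
```

With the digits included, a limit of 12 on the example string from earlier also returns 12 whole words, because "m0re" is no longer split in two.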

If you wanted to count the number of times each word appears, maybe as part of a keyword density calculation, then use the following bit of code.

$wordfreq = array_count_values(str_word_count($str, 1, '0'));
print_r($wordfreq);
/* prints
Array
(
    [This] => 2
    [is] => 2
    [a] => 2
    ['string'] => 2
    [containing] => 2
    [m0re] => 2
    [than] => 2
    [one] => 2
    [word] => 2
)
*/
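To turn those counts into a density figure, one approach is to divide each count by the total number of words. The sketch below assumes that density means "percentage of all words" and that matching should ignore case. The function name word_density() and its parameters are invented for this example.

```php
<?php
/**
 * Hypothetical helper: returns each word's share of the total word
 * count as a percentage, highest first. Words are lower-cased first.
 */
function word_density(string $text, string $charlist = '0'): array
{
    $words = str_word_count(strtolower($text), 1, $charlist);
    $total = count($words);
    if ($total === 0) {
        return []; // nothing to measure
    }

    $density = [];
    foreach (array_count_values($words) as $word => $count) {
        $density[$word] = round($count / $total * 100, 2);
    }
    arsort($density); // sort by percentage, keeping the word keys
    return $density;
}

// "the" appears twice in five words, so it accounts for 40 percent.
print_r(word_density('The cat and the hat'));
```

Note that strtolower() only lower-cases single-byte characters, so text with accented or other multibyte letters would need different handling.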

Comments

Let's say you have a string like this: "Exilim EX-H15". If I want the first 2 words, the function returns "Exilim EX-H". It sees the numbers as spaces.

Anonymous (Wed, 08/11/2010 - 09:34)

You need to add numbers to your str_word_count() call so that it treats these characters as words (or parts of words). Changing the str_word_count() call to the following should solve your issue. $words = str_word_count($text, 2, '0123456789');

philipnorton42 (Wed, 08/11/2010 - 12:14)
