Extend The str_word_count Function In PHP

The str_word_count() function in PHP does exactly what it says it does. By default, it simply counts the number of words in a string. Take the following string.

$str = "This is a 'string' containing m0re than one word. This is a 'string' containing m0re than one word.";

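If we pass this to the str_word_count() function with no other parameters, we get the number of words. The result is 20 rather than the 18 you might expect, because the digit in "m0re" is not treated as part of a word, as the array output below shows.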

echo str_word_count($str); // prints 20

The second parameter sets the type of value the function returns. The default value is 0, but 1 and 2 are also available. Using 1 as the second parameter returns an array containing all the words found inside the string. Using 2 returns an associative array, where each key is the numeric position of a word inside the string and the value is the word itself. Here are the results from setting the second parameter to 1.

print_r(str_word_count($str, 1));
/* prints
Array
(
 [0] => This
 [1] => is
 [2] => a
 [3] => 'string'
 [4] => containing
 [5] => m
 [6] => re
 [7] => than
 [8] => one
 [9] => word
 [10] => This
 [11] => is
 [12] => a
 [13] => 'string'
 [14] => containing
 [15] => m
 [16] => re
 [17] => than
 [18] => one
 [19] => word
)
*/

Here are the results from setting the second parameter to 2.

print_r(str_word_count($str, 2));
/* prints
Array
(
 [0] => This
 [5] => is
 [8] => a
 [10] => 'string'
 [19] => containing
 [30] => m
 [32] => re
 [35] => than
 [40] => one
 [44] => word
 [50] => This
 [55] => is
 [58] => a
 [60] => 'string'
 [69] => containing
 [80] => m
 [82] => re
 [85] => than
 [90] => one
 [94] => word
)
*/
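The positions returned with the second parameter set to 2 can be passed straight to substr(). The following is a minimal sketch that checks this for a shortened, single-sentence version of the string. The variable names are only illustrative, and the positions are byte offsets, so this assumes a single-byte string like the one used here.

```php
<?php
$sentence = "This is a 'string' containing m0re than one word.";

// Format 2: each key is the offset of a word in $sentence, each value is the word.
$words = str_word_count($sentence, 2);

foreach ($words as $offset => $word) {
    // Reading strlen($word) bytes at the reported offset gives the same word back.
    $matches = substr($sentence, $offset, strlen($word)) === $word;
    printf("%2d  %-12s %s\n", $offset, $word, $matches ? 'ok' : 'mismatch');
}
```

This is the property that makes it possible to cut a string at a word boundary.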

The third parameter is a list of additional characters that should be treated as part of a word. Notice that the string contains the word "m0re", with the o replaced by a 0. By default the function splits this into two words, "m" and "re". To make the function treat the zero as part of the word, include it in a string passed as the third parameter.

print_r(str_word_count($str, 1, '0'));
/* prints
Array
(
 [0] => This
 [1] => is
 [2] => a
 [3] => 'string'
 [4] => containing
 [5] => m0re
 [6] => than
 [7] => one
 [8] => word
 [9] => This
 [10] => is
 [11] => a
 [12] => 'string'
 [13] => containing
 [14] => m0re
 [15] => than
 [16] => one
 [17] => word
)
*/
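The third parameter affects the plain word count as well. Here is a short sketch that compares the count with and without the extra characters, again on a single-sentence version of the string. The $digits variable is just an illustrative name; passing all ten digits rather than only '0' is an assumption about what most text will need.

```php
<?php
$sentence = "This is a 'string' containing m0re than one word.";

// All ten digits, so that any number inside a word keeps the word whole.
$digits = '0123456789';

$default    = str_word_count($sentence);             // "m0re" counts as "m" and "re"
$withDigits = str_word_count($sentence, 0, $digits); // "m0re" counts as one word

echo $default, "\n";    // prints 10
echo $withDigits, "\n"; // prints 9
```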

So how about extending this function? Let's say you want to print only an extract of some text. You could use str_word_count() inside another function, like this.

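function limit_text($text, $limit)
{
  // Remove any HTML so that tags are not counted as words.
  $text = strip_tags($text);
  // Format 2 maps each word's position in the string to the word itself.
  $words = str_word_count($text, 2);
  $pos = array_keys($words);
  if (count($words) > $limit) {
    // Cut the text just before word number $limit + 1 and add an ellipsis.
    $text = trim(substr($text, 0, $pos[$limit])) . '...';
  }
  return $text;
}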

You can use this function in the following way.

echo limit_text($str, 12);
// prints  - This is a 'string' containing m0re than one word. This is...
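The extract above contains only 11 words by a reader's count, because limit_text() calls str_word_count() without a third parameter and so "m0re" is counted as two words. One possible variation is to let the caller pass the extra characters through. This is only a sketch: the name limit_text_chars() and the $chars parameter are made up for this example, and defaulting to the ten digits is an assumption.

```php
<?php
// Variant of limit_text() that accepts extra word characters.
// The function name and the $chars parameter are illustrative only.
function limit_text_chars($text, $limit, $chars = '0123456789')
{
    $text  = strip_tags($text);
    // Positions of every word, with $chars treated as part of a word.
    $words = str_word_count($text, 2, $chars);
    $pos   = array_keys($words);
    if (count($words) > $limit) {
        // $pos[$limit] is where word number $limit + 1 starts.
        $text = trim(substr($text, 0, $pos[$limit])) . '...';
    }
    return $text;
}

$str = "This is a 'string' containing m0re than one word. This is a 'string' containing m0re than one word.";
echo limit_text_chars($str, 12);
// prints: This is a 'string' containing m0re than one word. This is a...
```

With the digits included, a limit of 12 returns 12 words as a reader would count them.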

This function is useful if you want to create a script that produces an RSS feed, or to display the opening text of a web page on another page of a site.

If you want to count the number of times each word appears, perhaps as part of a keyword density calculation, use the following bit of code.

$wordfreq = array_count_values(str_word_count($str, 1, '0'));
print_r($wordfreq);
/* prints
Array
(
 [This] => 2
 [is] => 2
 [a] => 2
 ['string'] => 2
 [containing] => 2
 [m0re] => 2
 [than] => 2
 [one] => 2
 [word] => 2
)
*/
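To turn these counts into a density figure you could divide each count by the total number of words. The sketch below assumes that "keyword density" means a word's share of all words in the text, and it lower-cases the text first so that "This" and "this" are counted together. Both choices are assumptions made for this example.

```php
<?php
$str = "This is a 'string' containing m0re than one word. This is a 'string' containing m0re than one word.";

// Lower-case first so that differently cased copies of a word share one entry.
$words = str_word_count(strtolower($str), 1, '0123456789');
$total = count($words);

$wordfreq = array_count_values($words);
arsort($wordfreq); // highest counts first

foreach ($wordfreq as $word => $count) {
    // Density here is the word's share of all words, as a percentage.
    printf("%-12s %d  %.1f%%\n", $word, $count, 100 * $count / $total);
}
```

Every word in this string appears twice out of 18 words in total, so each line reports the same share.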

Comments

Let's say you have a string like this: "Exilim EX-H15". If I want the first 2 words, the function gives me "Exilim EX-H". It sees the numbers as spaces.
You need to add numbers to your str_word_count() call so that it counts these characters as words (or parts of words). Changing the str_word_count() call to the following should solve your issue.
$words = str_word_count($text, 2, '0123456789');
Philip Norton
