Simple Swear Filter In PHP

Tuesday, September 30, 2008 - 09:49

Use the following function to filter unwanted words out of user input. It keeps a preset array of words to exclude, loops through that array, and replaces every instance of each word in the text. The regular expression uses the \b assertion, which matches at any word boundary, so a filtered word is not replaced when it appears in the middle of a longer word.
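To see what the word boundary buys you, here is a minimal sketch that runs the same replacement with and without \b. The sample sentence and variable names are made up for illustration and are not part of the function below.

```php
<?php
// Illustrative sample text; "Pooh" contains the filtered word "poo".
$text = 'Winnie the Pooh stepped in poo.';

// Without \b the pattern also matches inside longer words.
$withoutBoundary = preg_replace('/poo/i', '***', $text);

// With \b on both sides only the standalone word is matched.
$withBoundary = preg_replace('/\bpoo\b/i', '***', $text);

echo $withoutBoundary, "\n"; // Winnie the ***h stepped in ***.
echo $withBoundary, "\n";    // Winnie the Pooh stepped in ***.
```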

By using the e modifier of the preg_replace function it is possible to have the replacement string evaluated as PHP code. In this case we count the number of characters in the match and use this to create a string of stars (*) of equal length.

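function filterwords($text){
 // Words to mask; matching is case-insensitive.
 $filterWords = array('gosh','darn','poo');
 $filterCount = sizeof($filterWords);
 for($i=0; $i<$filterCount; $i++){
  // \b limits matches to whole words; the e modifier evaluates the replacement as PHP.
  $text = preg_replace('/\b'.$filterWords[$i].'\b/ie',"str_repeat('*',strlen('$0'))",$text);
 }
 return $text;
}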

Run the following text through this function:

echo filterwords("Darn, I have a mild form of Tourette's, poo!");

It produces the following result:

****, I have a mild form of Tourette's, ***!
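The e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so the function above will not run on current PHP versions. One possible alternative, sketched here under the assumption that you are on PHP 7 or later, is preg_replace_callback, which receives each match as an array instead of evaluating a string. The function name filterWordsCallback and the optional second parameter are hypothetical, and the preg_quote call is an extra safeguard rather than part of the original function.

```php
<?php
// Sketch only: filterWordsCallback() is a hypothetical name, not the original function.
function filterWordsCallback(string $text, array $filterWords = ['gosh', 'darn', 'poo']): string
{
    foreach ($filterWords as $word) {
        // preg_quote() escapes regex metacharacters, including the '/' delimiter.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';

        $text = preg_replace_callback(
            $pattern,
            // Replace each whole-word match with asterisks of the same length.
            function (array $match): string {
                return str_repeat('*', strlen($match[0]));
            },
            $text
        );
    }
    return $text;
}

echo filterWordsCallback('Gosh, that is a darn shame.');
// ****, that is a **** shame.
```

Passing the word list as a parameter is only a convenience for testing; keeping it hard-coded inside the function, as in the version above, works just as well.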

Philip Norton

Phil is the founder and administrator of #! code and is an IT professional working in the North West of the UK.

Comments

Submitted by philipnorton42 on Tue, 07/06/2010 - 09:55

True, although the solution in this post is completely free, whereas WebPurify seems expensive for a very simple (and not all that important) web service.

Hey, really nice script, I can tell you've taken your time on it.

I was dealing with a word dictionary called badwords.txt that I found on Google Code, which had some words in it like 'sh!t'. They played havoc with your script until I escaped the characters:

$text = preg_replace('/\b'.preg_quote($filterWords[$i], '/').'\b/ie',"str_repeat('*',strlen('$0'))",$text);

I agree, WebPurify is a step too far.

Hey Philip this is the only script that has worked as advertized! Thanks for that.
Now I have to sit back and work out how and why it works :)

How would I just drop the swear word from the array instead of replacing it with asterisks?

Quickest is to replace
$text = preg_replace('/\b'.$filterWords[$i].'\b/ie',"str_repeat('*',strlen('$0'))",$text);

with
$text = preg_replace('/\b'.$filterWords[$i].'\b/ie',"str_repeat('',strlen('$0'))",$text);

(removed the *)
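If the goal is only to delete the words, the e modifier and str_repeat are not strictly needed. A minimal sketch, assuming an empty replacement string is acceptable and using a hypothetical helper name removeWords:

```php
<?php
// Sketch only: removeWords() is a hypothetical helper, not from the original post.
function removeWords(string $text, array $filterWords = ['gosh', 'darn', 'poo']): string
{
    foreach ($filterWords as $word) {
        // An empty replacement deletes each whole-word match.
        $text = preg_replace('/\b' . preg_quote($word, '/') . '\b/i', '', $text);
    }
    return $text;
}

echo removeWords('Well darn, that went badly.');
// Well , that went badly.
```

Note that the surrounding whitespace is left in place, so you may want to tidy up doubled or stray spaces afterwards.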
