PHP preg_replace – some useful regular expressions

April 22, 2009

There are loads of these all over the place, but here are some useful preg_replace examples for text and HTML processing that were hard to find or that I ended up writing. Use/praise/embellish/flame as you see fit.

Remove repeated words (case insensitive)

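// Single quotes matter here: in a double-quoted string PHP reads "\1" as an octal escape, not a backreference.
$text = preg_replace('/\b(\w+)(?:\s+\1\b)+/i', '$1', $text);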

‘Keep your your head’ becomes ‘Keep your head’

Remove repeated punctuation

$text = preg_replace('/\.{2,}/', '.', $text);

‘Keep your head...’ becomes ‘Keep your head.’ To collapse a different character, swap it in, and don’t forget to escape regex metacharacters such as ? or +.
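The same idea extends to several punctuation marks at once. As a rough sketch, the helper below (collapse_repeats is a made-up name, not a PHP built-in) captures one punctuation character and uses a backreference to swallow its repeats, leaving the escaping to preg_quote:

```php
<?php
// Hypothetical helper: collapse runs of the same punctuation character.
// $chars lists the characters to collapse; preg_quote() escapes any that
// are regex metacharacters, plus the "/" delimiter.
function collapse_repeats($text, $chars = '.,!?;:')
{
	$class = preg_quote($chars, '/');
	// ([...]) captures one character, \1+ matches one or more repeats of it.
	return preg_replace('/([' . $class . '])\1+/', '$1', $text);
}

echo collapse_repeats('Wait... what??!!'); // prints: Wait. what?!
```

Only runs of the same character are collapsed, so a mixed ‘?!’ survives intact.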

Clean up a sentence end that has no trailing space

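$text = preg_replace('/\.(?!\s|$)/', '. ', $text);

‘Keep your head.Don’t fall apart’ becomes ‘Keep your head. Don’t fall apart’. This uses a negative lookahead: the period only matches when it is not followed by whitespace or the end of the text. Be aware that it will also split decimals and domain names such as 3.14 or example.com.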
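If numbers and domain names matter in your text, a stricter variant is possible. The sketch below is one option rather than a definitive fix: it assumes sentences end in a lowercase ASCII letter and start with an uppercase one, and space_after_sentence is just an illustrative name.

```php
<?php
// Insert a space only when a period sits between a lowercase letter and
// an uppercase letter, e.g. "head.Don't". Decimals such as 3.14 and
// lowercase domains such as example.com are left alone.
function space_after_sentence($text)
{
	// (?<=[a-z]) is a lookbehind and (?=[A-Z]) a lookahead; neither consumes text.
	return preg_replace('/(?<=[a-z])\.(?=[A-Z])/', '. ', $text);
}

echo space_after_sentence("Keep your head.Don't fall apart. Pi is 3.14.");
// prints: Keep your head. Don't fall apart. Pi is 3.14.
```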

Remove carriage returns, line feeds and tabs

$text = str_replace(array("\r\n", "\r", "\n", "\t"), '', $text);

An oldie but a goodie.
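Deleting line breaks outright glues words together whenever a line ends without a trailing space. If that is a problem, one alternative (a minimal sketch, with flatten_whitespace as a made-up helper name) is to turn every run of whitespace into a single space:

```php
<?php
// Replace every run of whitespace (spaces, tabs, CR, LF) with one space,
// then trim the ends, so words on adjacent lines stay separated.
function flatten_whitespace($text)
{
	return trim(preg_replace('/\s+/', ' ', $text));
}

echo flatten_whitespace("Keep your\r\nhead\tup\n"); // prints: Keep your head up
```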

Get all image URLs from an HTML document

$images = array();
// Capture the value of every src="..." or src='...' attribute.
preg_match_all('/\bsrc\s*=\s*(["\'])(.*?)\1/i', $data, $media);
foreach ($media[2] as $url)
{
	// Look at the path only, so a query string doesn't hide the extension.
	$info = pathinfo((string) parse_url($url, PHP_URL_PATH));
	if (isset($info['extension']) &&
		in_array(strtolower($info['extension']), array('jpg', 'jpeg', 'gif', 'png')))
	{
		$images[] = $url;
	}
}

Puts all the image URLs in an array. The extension check is case insensitive, so .JPG and .PNG files are picked up too.
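Regexes get fragile on messy markup, so here is an alternative sketch that lets PHP's DOM parser find the img tags instead. It assumes the dom extension is installed, and image_urls is a hypothetical helper name, not part of PHP.

```php
<?php
// Sketch: collect <img src="..."> values with DOMDocument instead of a regex.
function image_urls($html)
{
	if (trim($html) === '')
	{
		return array(); // loadHTML() rejects empty input
	}
	$previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
	$doc = new DOMDocument();
	$doc->loadHTML($html);
	libxml_clear_errors();
	libxml_use_internal_errors($previous);

	$urls = array();
	foreach ($doc->getElementsByTagName('img') as $img)
	{
		$src = $img->getAttribute('src');
		if ($src !== '')
		{
			$urls[] = $src;
		}
	}
	return $urls;
}

print_r(image_urls('<p><img src="a.png"><img alt="x" src=\'b.jpg\'></p>'));
```

Unlike the extension filter, this returns every img source regardless of file type, so the two approaches can be combined if needed.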

Strip non printable characters

$text = preg_replace("/[^[:print:]]+/", "", $text);

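Does what it says on the tin. Bear in mind that without the u modifier the text is treated as bytes, so newlines and tabs go too and, in the default locale, so do accented and other multi-byte UTF-8 characters.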
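For UTF-8 text where line breaks should survive, a narrower pattern may suit better. This is a sketch under the assumption that the input is valid UTF-8 (with the u modifier, preg_replace returns null otherwise), and strip_control_chars is an illustrative name.

```php
<?php
// Strip ASCII control characters but keep tab, LF and CR. The u modifier
// makes PCRE read the subject as UTF-8, so multi-byte characters are untouched.
function strip_control_chars($text)
{
	return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/u', '', $text);
}

// "Caf\xC3\xA9" is "Café" written as raw UTF-8 bytes.
echo strip_control_chars("Caf\xC3\xA9\x00 au\x07 lait\n");
```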

Remove HTML tags

$text = preg_replace
	(
	array(
	// Remove invisible content
	'@<head[^>]*?>.*?</head>@siu',
	'@<style[^>]*?>.*?</style>@siu',
	'@<script[^>]*?>.*?</script>@siu',
	'@<object[^>]*?>.*?</object>@siu',
	'@<embed[^>]*?>.*?</embed>@siu',
	'@<applet[^>]*?>.*?</applet>@siu',
	'@<noframes[^>]*?>.*?</noframes>@siu',
	'@<noscript[^>]*?>.*?</noscript>@siu',
	'@<noembed[^>]*?>.*?</noembed>@siu',
	// Add line breaks before & after blocks
	'@<((br)|(hr))@iu',
	'@</?((address)|(blockquote)|(center)|(del))@iu',
	'@</?((div)|(h[1-9])|(ins)|(isindex)|(p)|(pre))@iu',
	'@</?((dir)|(dl)|(dt)|(dd)|(li)|(menu)|(ol)|(ul))@iu',
	'@</?((table)|(th)|(td)|(caption))@iu',
	'@</?((form)|(button)|(fieldset)|(legend)|(input))@iu',
	'@</?((label)|(select)|(optgroup)|(option)|(textarea))@iu',
	'@</?((frameset)|(frame)|(iframe))@iu',),
	array(
	' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
	"\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0",
	"\n\$0", "\n\$0",),$text
	);
// Remove all remaining tags and comments and return.
$text = strip_tags( $text );

OK, so strip_tags sort of does this, but it only removes the tags themselves: the contents of script, style and similar elements are left behind in the text.
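A cut-down illustration of the difference, using a made-up HTML string and only the script and style elements:

```php
<?php
$html = '<p>Hello</p><script>alert(1)</script><style>p{color:red}</style>';

// strip_tags() drops the tags themselves but keeps the text inside them.
$naive = strip_tags($html);

// Removing the script and style elements first gets rid of their contents too.
$clean = strip_tags(preg_replace('@<(script|style)\b[^>]*>.*?</\1>@si', ' ', $html));

var_dump($naive);       // string(25) "Helloalert(1)p{color:red}"
var_dump(trim($clean)); // string(5) "Hello"
```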
