Strip HTML from text strings with Perl

In an age of ever-increasing spam, stripping HTML from text strings can be very useful, especially when processing form input. If links and markup never survive submission, spammers have much less reason to target the form.

This article will show you how to accomplish this task.

Strip HTML

This example will strip all HTML markup, including the text between tags.

$text = strip_html($text);

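sub strip_html {

  my $string = shift;

  # Remove each tag, the text after it, and the next tag that follows.
  # .*? is non-greedy, so untagged text between separate elements is kept;
  # /s lets . match newlines in multi-line input.
  $string =~ s/<[^>]+>.*?<[^>]+>//gs;

  # Remove any single tags that are left over.
  $string =~ s/<[^>]+>//g;

  return $string;

}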
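
The pattern above pairs a tag with whatever tag comes next, so a lone tag such as <br> can swallow the plain text that follows it. One possible refinement, shown here only as a sketch, is to remove an element only when the closing tag has the same name as the opening tag. The name strip_html_matched is hypothetical and not part of the original example, and the sketch assumes elements with the same name are not nested inside each other.

```perl
use strict;
use warnings;

# Hypothetical variant: remove an element only when its closing tag
# carries the same name as its opening tag.
sub strip_html_matched {
  my $string = shift;

  # \1 refers back to the tag name captured by (\w+). The \b stops "br"
  # from being read as tag "b" followed by an attribute "r". /i makes the
  # name comparison case-insensitive, /s lets . match newlines.
  $string =~ s/<(\w+)\b[^>]*>.*?<\/\1\s*>//gis;

  # Remove any tags that are left over, such as an unpaired <br>.
  $string =~ s/<[^>]+>//g;

  return $string;
}

print strip_html_matched("<br>Kept text<b>removed</b>"), "\n";   # prints "Kept text"
```

Here the <br> has no closing tag, so only the tag itself is dropped and "Kept text" survives, while the <b> element is removed together with its contents.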

Strip HTML Tags

This example will strip HTML tags only, leaving the text between tags.

$text = strip_html_tags($text);

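sub strip_html_tags {

  my $string = shift;

  # Remove anything that looks like a tag: "<", one or more characters
  # other than ">", then ">". The text between tags is left in place.
  $string =~ s/<[^>]+>//g;

  return $string;

}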
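
A simple pattern like <[^>]+> ends the tag at the first ">", even when that character sits inside a quoted attribute value. The sketch below is one way to tolerate that case. The name strip_html_tags_quoted is hypothetical, and the pattern is still only an approximation: it does not try to handle comments, scripts, or malformed markup, for which a real parser such as HTML::Parser from CPAN is likely a safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: like strip_html_tags, but a ">" inside a quoted
# attribute value does not end the tag early.
sub strip_html_tags_quoted {
  my $string = shift;

  # Inside a tag, accept either a whole quoted value ("..." or '...')
  # or any single character other than >, " and '.
  $string =~ s/<(?:[^>"']|"[^"]*"|'[^']*')+>//g;

  return $string;
}

print strip_html_tags_quoted('<a title="a > b">link</a>'), "\n";   # prints "link"
```

With the simpler pattern, the same input would leave the fragment ` b">link` behind, because the match would stop at the ">" inside the title attribute.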

posted September 2, 2009 in Perl