problem with regexp

Name: crippa
Date: December 17, 2003 at 03:50:28 Pacific
OS: XP
CPU/Ram: some
Comment:

Hi!

I'm having some problems with a regexp that should match all HTML tags, with some exceptions.

I can match all tags with something like:
(<[^>]*>) or maybe even (<.*?>)

I can also match a list of tags like:
(<(span|div|p)>)

The problem is that I want to match everything except what I get from the regexp above!

I thought that the following would be correct:
(<^(span|div|p)>)

Why doesn't this match everything except <span>, <div> and <p>, and does someone have a solution to my problem?

Thanks
/crippa



Response Number 1
Name: gpp
Date: December 17, 2003 at 06:10:38 Pacific
Reply:

Worked just fine for me..

# Print only the lines that contain none of the listed tags.
if ($line !~ /<(span|div|p)>/) {
    print $line;
}
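For anyone not using Perl, the same line filter could be sketched in Python roughly as follows. This is only an illustration: the function name `lines_without_listed_tags` and the sample input are made up here and are not part of gpp's post.

```python
import re

# Same pattern as in the Perl snippet: a bare <span>, <div> or <p>.
LISTED_TAG = re.compile(r"<(span|div|p)>")


def lines_without_listed_tags(lines):
    """Return only the lines that contain none of the listed tags."""
    return [line for line in lines if not LISTED_TAG.search(line)]


sample_lines = ["<b>x</b>", "<p>y", "plain text", "<div>"]
kept_lines = lines_without_listed_tags(sample_lines)
```

Note that this filters whole lines, so a line holding both a listed tag and other tags is dropped entirely.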



Response Number 2
Name: crippa
Date: December 17, 2003 at 07:10:30 Pacific
Reply:

Well, I want to solve it with a pure regexp and not programmatically.

Thanks anyway!

/crippa



Response Number 3
Name: SN
Date: December 17, 2003 at 10:32:45 Pacific
Reply:

(<[^>]*>) or maybe even (<.*?>)
The second one would be the preferable case.

(<^(span|div|p)>)
You didn't mention a language, but in Perl and the languages that mimic its regexes, the ^ character only negates when it is the first character inside a character set (i.e. [^a]). Anywhere else it means "match at the beginning of a line", so you're telling it to match a <, then the start of a line, and then one of the three tag names you enumerated. Since a < can never sit immediately before the start of a line, that pattern can't match a tag.
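A small sketch of the two meanings of ^, written in Python on the assumption that its `re` module is close enough to Perl for this point. The sample strings are invented for illustration.

```python
import re

# Inside brackets, a leading ^ negates the character class:
# "<", then any run of characters that are not ">", then ">".
negated = re.findall(r"<[^>]*>", "<b>bold</b> <span>")

# Outside brackets, ^ is a zero-width "start of line" anchor. Here it
# would have to hold right after "<", which is never a line start.
anchored = re.compile(r"<^(span|div|p)>", re.MULTILINE)
samples = ["<span>", "<b>", "<\nspan>", "<^span>"]
anchored_hits = [s for s in samples if anchored.search(s)]
```

With these inputs `negated` holds all three tags, while `anchored_hits` stays empty, even for the sample that has a newline after the <.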

I assume you know the problems with any of the regexes you mentioned... Consider some JavaScript, where < and > are comparison operators rather than tag delimiters:
if (i<10 || i>20)
  //do something
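To make the problem concrete, here is a minimal Python check of both "match any tag" patterns against that line of script. The variable names are arbitrary.

```python
import re

script_line = "if (i<10 || i>20)"

# Both patterns treat the text between the two comparison operators
# as if it were a tag.
negated_class_hits = re.findall(r"<[^>]*>", script_line)
lazy_dot_hits = re.findall(r"<.*?>", script_line)
```

Both lists come back as `['<10 || i>']`, so a substitution based on either pattern would eat part of the condition.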

I've been around the block with regexes, and can't think of any way to do what you want without using gpp's approach. I'm not saying there isn't a way, but you may want to consider another solution. Why can't you use programming? The only reason I can think of for wanting to do this is that you're using a text editor of some kind that supports regular expressions, and you want to strip out all the HTML tags. Is this the case?
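For what it's worth, one possible pure-regex route in engines that support Perl-style lookahead is a negative lookahead right after the <. The sketch below is in Python and is only a candidate, not something proposed by anyone in this thread; the pattern name `OTHER_TAGS` and the sample markup are invented, and it inherits the false-positive problem with < and > in scripts.

```python
import re

# Hypothetical pattern: match any tag whose name is NOT span, div or p.
# (?!...) is a negative lookahead; /? covers closing tags, and \b keeps
# "p" from also excluding longer names such as <pre>.
OTHER_TAGS = re.compile(r"<(?!/?(?:span|div|p)\b)[^>]*>")

sample = '<div class="x"><b>Hi</b> <p>there</p><pre>code</pre></div>'

matched = OTHER_TAGS.findall(sample)   # the tags the pattern matches
kept = OTHER_TAGS.sub("", sample)      # the text with those tags removed
```

On this sample the pattern matches only the b and pre tags and leaves the span, div and p tags in place.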

Good luck, and feel free to post some specifics.

-SN



Response Number 4
Name: FishMonger
Date: December 17, 2003 at 21:57:27 Pacific
Reply:

>> Well, I want to solve it with a pure regexp and not programmatically.

Solving it with a regex is doing it programmatically! But if you're asking how to remove most (but not all) of the HTML tags with a single regex and nothing else, then here are two substitution regexes that will get you close to what you're looking for. The second one might do exactly what you need, but it needs some additional testing to be sure.

s/(<[^>]+>)//g
s/(<\/?(span|div|p)[^>]*>)//g
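As a rough check of what these two substitutions do, here they are translated to Python's `re.sub`. The sample markup is made up, and the translation assumes Python's regex syntax behaves like Perl's for these patterns.

```python
import re

html = '<div class="x"><b>Hi</b> <p>there</p></div>'

# First substitution: strips every tag.
strip_all = re.sub(r"<[^>]+>", "", html)

# Second substitution: strips only span, div and p tags
# (opening or closing, with or without attributes).
strip_listed = re.sub(r"</?(span|div|p)[^>]*>", "", html)

# One thing extra testing turns up: "p" followed by [^>]* also
# matches longer tag names that merely start with p, such as <pre>.
strip_pre = re.sub(r"</?(span|div|p)[^>]*>", "", "<pre>code</pre>")
```

Here `strip_all` is `'Hi there'`, `strip_listed` is `'<b>Hi</b> there'`, and `strip_pre` is `'code'`, so the second pattern removes the listed tags rather than keeping them, and it also catches pre.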

Sorry SN, but I think I'm going to disagree with:

>> (<[^>]*>) or maybe even (<.*?>)
>> The second one would be the preferable case.

Due to the amount of backtracking work the regex engine needs to do with the second case, the negated character class will generally match (or fail) faster.
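Speed aside, the two patterns are not fully interchangeable either. A small Python sketch with invented sample strings, assuming the default mode where . does not match a newline:

```python
import re

negated_class = re.compile(r"<[^>]*>")
lazy_dot = re.compile(r"<.*?>")

one_line = "<b>bold</b>"
multi_line = '<a\nhref="x">link</a>'

# On a single line both patterns find the same tags.
same_on_one_line = negated_class.findall(one_line) == lazy_dot.findall(one_line)

# A tag split across lines is found by the negated class only,
# because [^>] accepts a newline and . does not.
negated_multi = negated_class.findall(multi_line)
lazy_multi = lazy_dot.findall(multi_line)
```

With this input the negated class finds both the opening and closing a tags, while the lazy version finds only the closing one.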


