Hi!
I'm having some problems with a regexp that should match all HTML tags, with some exceptions.
I can match all tags with something like:
(<[^>]*>) or maybe even (<.*?>)
I can also match a list of tags like:
(<(span|div|p)>)
The problem is that I want to match everything except what I get from the regexp above!
I thought that the following would be correct:
(<^(span|div|p)>)
Why doesn't this match everything except <span>, <div> and <p>, and does someone have a solution to my problem?
Thanks
/crippa

>> (<[^>]*>) or maybe even (<.*?>)
The second one would be the preferable case.
>> (<^(span|div|p)>)
You didn't mention a language, but in Perl and those that mimic its regexes, the ^ character, when it is not the first character in a character set (i.e. [^a]), means "match at the beginning of a line". So you're telling it to match a <, then the beginning of a line, and then one of the three tags you enumerated, which can never match because the position right after a < is never the start of a line.
I assume you know the problems with any of the regexes you mentioned... Consider this JavaScript:
if (i<10 || i>20)
//do something
Both of the tag patterns would treat <10 || i> as a tag.
I've been around the block with regexes, and can't think of any way to do what you want without using gpp's approach. I'm not saying there isn't a way, but you may want to consider another solution. Why can't you use programming? The only reason I can think of for wanting to do this is that you are using a text editor of some kind that supports regular expressions, and you want to strip out all the HTML tags. Is this the case?
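To make the two points above concrete, here is a small sketch in Python, whose re module follows Perl's syntax for these constructs. The variable names and the sample strings are made up for illustration; only the patterns come from the thread.

```python
import re

# The JavaScript fragment from the post, treated as plain text.
js_source = "if (i<10 || i>20)\n    //do something"

# The "match any tag" pattern treats the two comparison operators as one tag.
any_tag = re.compile(r"<[^>]*>")
false_positives = any_tag.findall(js_source)

# The attempted negation: outside a character class, ^ is an anchor, not "not".
# Even in multiline mode, the position right after "<" is never a line start.
anchored = re.compile(r"<^(span|div|p)>", re.MULTILINE)
samples = "<span><b>\n<div>\n<\nspan>"  # hypothetical test input
anchored_hit = anchored.search(samples)

print(false_positives)  # ['<10 || i>']
print(anchored_hit)     # None
```

So the first pattern finds a "tag" where there is none, and the ^ version finds nothing at all, whatever the input.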
Good luck, and feel free to post some specifics.
-SN

>> Well, I want to solve it with pure regexp and not programmatically.
Solving it with a regex is doing it programmatically! But if you're asking how to remove most (but not all) of the HTML tags with a single regex and nothing else, then here are two (substitution) regexes that will get you close to what you're looking for. The second one might do exactly what you need, but needs some additional testing to be sure.
s/(<[^>]+>)//g
s/(<\/?(span|div|p)[^>]*>)//g
Sorry SN, but I think I'm going to disagree with:
>> (<[^>]*>) or maybe even (<.*?>)
>> The second one would be the preferable case.
Due to the amount of backtracking work the regex engine needs to do with the second case, the negated character class will always match (or fail) faster.
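For anyone who wants to try the two substitutions above, here is a rough Python equivalent (re.sub standing in for Perl's s///g). The sample HTML and the variable names are invented for the example. The third pattern is not from this thread: it is one possible way to invert the list with a negative lookahead, assuming the regex flavor supports (?!...), and it should be treated as a sketch that needs testing on real input.

```python
import re

# Hypothetical sample input, not from the thread.
html = '<div class="a"><b>bold</b> <p>text</p> <span>x</span><br/></div>'

# Equivalents of the two Perl substitutions from the post.
ALL_TAGS = r"<[^>]+>"
LISTED_TAGS = r"</?(span|div|p)[^>]*>"

strip_all = re.sub(ALL_TAGS, "", html)
strip_listed = re.sub(LISTED_TAGS, "", html)

# One thing extra testing turns up: "p" followed by [^>]* also matches <pre>.
pre_stripped = re.sub(LISTED_TAGS, "", "<pre>code</pre>")

# Assumption, not from the thread: a negative lookahead rejects the listed
# tag names (with \b so that <pre> is not mistaken for <p>) and lets every
# other tag match.
UNLISTED_TAGS = r"<(?!/?(?:span|div|p)\b)[^>]+>"
strip_unlisted = re.sub(UNLISTED_TAGS, "", html)

print(strip_all)       # bold text x
print(strip_listed)    # <b>bold</b> text x<br/>
print(pre_stripped)    # code
print(strip_unlisted)  # <div class="a">bold <p>text</p> <span>x</span></div>
```

The lookahead version keeps <span>, <div> and <p> (with or without attributes) and strips the rest, but like the other patterns it will still be fooled by a stray < and > in scripts or text.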
