Expression Matching

Name: arunsista
Date: September 1, 2005 at 02:03:10 Pacific
OS: Unix
CPU/Ram: P4, 1 GB RAM
Comment:


Hi All,

I have a question about how regular expressions are evaluated.

I use the following regular expression in sed:

echo hello world abac hello world abac| sed 's/^\([a-zA-Z0-9 <:_=\."][a-zA-Z0-9 <:_=\."]*\)abac\(.*\)/$\1arun\2/g'

I cannot figure out why the output of this command is:

$hello world abac hello world arun

Why was the first instance of abac not picked for replacement? Why did the second occurrence get matched first? If I remove the second occurrence of abac from the input, I get the following output:

$hello world arun hello world

Can anyone explain the above behaviour?

Also, assuming I need to replace both occurrences, while still requiring a particular sequence of characters to precede abac as in the case above, how would I replace both occurrences of abac?





Response Number 1
Name: Jim Boothe
Date: September 5, 2005 at 20:14:55 Pacific
Reply:

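A regular expression matches the longest qualifying string that starts at the leftmost possible position, and that explains the behaviour. The letters a, b and c and the space are all in your bracket expression, so the first group can run straight through the first abac and stops only at the last one. Because the pattern is anchored with ^ and ends in .*, it can match only once per line, so the /g flag has no effect here.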
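To see the greedy match directly, here is a small sketch that wraps each captured group in square brackets. The lowercase-only class [a-z ] and the variable names are my own simplifications for illustration, not the pattern from the original post.

```shell
# Assumption: a simplified class [a-z ] stands in for the longer one in the question.
line='hello world abac hello world abac'

# Square brackets in the replacement show what \1 and \2 captured.
greedy=$(printf '%s\n' "$line" |
  sed 's/^\([a-z ][a-z ]*\)abac\(.*\)/[\1]arun[\2]/')

printf '%s\n' "$greedy"
# prints: [hello world abac hello world ]arun[]
```

The first group has swallowed the first abac, and nothing is left over for the second group.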

It's a little late, but tomorrow I will provide a solution on how to isolate and change two of those expressions in the same line.



Response Number 2
Name: Jim Boothe
Date: September 7, 2005 at 11:13:41 Pacific
Reply:

When you want to change a pattern and the line may contain two of these patterns, one following the other, there is no problem as long as the two back-to-back patterns are separately distinguishable. You would just include the /g flag (as you did) to make sed process every match on the line instead of just the first.

But in your case, the concatenation of the back-to-back patterns also happens to qualify as one even longer pattern, and sed finds the one longer pattern instead of two separate patterns, based on the regexp rule of always matching the longest qualifying string.
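A quick way to check this is to run the single-pattern command with /g on a line that contains two candidates. This is only a sketch using the simplified lowercase class; the variable names are hypothetical.

```shell
# Assumption: simplified class [a-z ] and made-up test data.
line='a b c abac a b c abac xyz'

# Even with /g there is only one match, because it covers the whole line.
once=$(printf '%s\n' "$line" |
  sed 's/\([a-z ][a-z ]*\)abac\(.*\)/$\1arun\2/g')

printf '%s\n' "$once"
# prints: $a b c abac a b c arun xyz
```

Only one $ appears in the output, so sed made a single substitution, and it was the last abac that got replaced.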

One way to handle that is to force sed to see both patterns by making it look for an expression of the form <pattern><pattern>. A line that does not contain <pattern><pattern> would not be modified by that command, so you also need your single-pattern command. The single-pattern command has to come last, so that the double-pattern command gets the first go at the line.

But since the pattern allows most characters, the first pattern in the double-pattern expression grabs all that it can, qualifying the longest possible string. That leaves the shortest possible qualifying string for the second pattern, which in this case is just a single character preceding abac, since we insist on one or more characters at that point.
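To make that split visible, the sketch below prints each of the four groups of a double-pattern expression in numbered brackets. It assumes the simplified lowercase class, and the test line and variable names are for illustration only.

```shell
# Assumption: simplified class [a-z ] and made-up test data.
line='a b c abac a b c abac xyz'

# Each group is printed as N[...] so the boundaries are easy to see.
split=$(printf '%s\n' "$line" |
  sed 's/\([a-z ][a-z ]*\)abac\(.*\)\([a-z ][a-z ]*\)abac\(.*\)/1[\1] 2[\2] 3[\3] 4[\4]/')

printf '%s\n' "$split"
# prints: 1[a b c ] 2[ a b c] 3[ ] 4[ xyz]
```

Group 3, the prefix of the second pattern, ends up with just the single space before the second abac.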

So the following code works, although you may not care for which string of characters is captured to end the first pattern and which is captured to start the second pattern. If that matters, you can make your patterns a bit more restrictive.

Since I had to code back-to-back patterns, to keep the code shorter, I used a much simpler expression, and changed the test data to all lowercase accordingly.

mysed.sh:
sed \
-e 's/\([a-z ][a-z ]*\)abac\(.*\)\([a-z ][a-z ]*\)abac\(.*\)/A:\1arun\2B:\3arun\4/g' \
-e 's/\([a-z ][a-z ]*\)abac\(.*\)/$\1arun\2/g' \
myfile

myfile:
a b c abac a b c abac xyz
a b c abac xyz

./mysed.sh
A:a b c arun a b cB: arun xyz
$a b c arun xyz
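As an alternative that copes with any number of occurrences, here are two sketches. They are not the method above, just other options, and both assume the simplified lowercase class. The first keeps the ^ anchor from the original question and repeats the substitution with a sed label and the t command until nothing more matches. The second drops the anchor: requiring one or more class characters before abac is then the same as requiring one, so a plain /g is enough. The label name and variable names are hypothetical.

```shell
# Assumption: simplified class [a-z ] and made-up test data.

# Anchored version: each pass replaces the last remaining abac,
# and "t again" loops back for as long as a substitution was made.
anchored=$(printf '%s\n' 'hello world abac hello world abac' |
  sed -e ':again' -e 's/^\([a-z ][a-z ]*\)abac/\1arun/' -e 't again')
printf '%s\n' "$anchored"
# prints: hello world arun hello world arun

# Unanchored version: one preceding class character is all that is needed,
# so the matches no longer overlap and /g finds each of them.
unanchored=$(printf '%s\n' 'a b c abac a b c abac xyz' |
  sed 's/\([a-z ]\)abac/\1arun/g')
printf '%s\n' "$unanchored"
# prints: a b c arun a b c arun xyz
```

Neither sketch adds the leading $ from the original replacement. The loop works here only because the replacement text arun is itself made of characters in the class, so the prefix still qualifies on the next pass.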

