Hi all, I have a question about how regular expressions are evaluated.
I use the following regular expression with sed:
echo hello world abac hello world abac| sed 's/^\([a-zA-Z0-9 <:_=\."][a-zA-Z0-9 <:_=\."]*\)abac\(.*\)/$\1arun\2/g'
I could not quite figure out why the output of this program was:
$hello world abac hello world arun
Why was the first instance of abac not picked for replacement? Why did the second occurrence get matched first? If I remove the second occurrence of abac, I get the following output:
$hello world arun hello world
Can anyone explain this behaviour?
Also, assuming that I need to replace both occurrences while still requiring a particular sequence of characters preceding abac, as in the case above, how would I replace both occurrences of abac?

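A regular expression always matches the longest qualifying string, so that explains the behaviour. Your expression is anchored at the start of the line, and the bracket expression in front of abac also accepts the letters a, b and c, so the first group extends right up to the last abac on the line.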
It's a little late, but tomorrow I will provide a solution on how to isolate and change two of those expressions in the same line.
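To make the longest-match behaviour visible, here is a small sketch that brackets whatever each capture group takes. It assumes a POSIX shell and sed, uses a simplified lowercase-only bracket expression instead of the one in the question, and the function name show_groups is made up for illustration.

```shell
#!/bin/sh
# Bracket the text taken by each capture group so the split is visible.
# [a-z ] is a simplified stand-in for the bracket expression in the question.
show_groups() {
    printf '%s\n' "$1" |
        sed 's/^\([a-z ][a-z ]*\)abac\(.*\)/[\1]abac[\2]/'
}

show_groups 'hello world abac hello world abac'
# prints: [hello world abac hello world ]abac[]

show_groups 'hello world abac hello world'
# prints: [hello world ]abac[ hello world]
```

In the first call the first group swallows the earlier abac because those letters are allowed by the bracket expression, so only the last abac is left to match literally.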

When you want to change a pattern and the line may contain two of these patterns one following the other, this would not be a problem if those two back-to-back patterns were separately distinguishable. You would just include the /g flag (as you did), to make sed process all patterns on the line instead of just the first pattern.
But in your case, the concatenation of two back-to-back patterns also happens to qualify as one even longer pattern, and sed finds the one longer pattern instead of two separate patterns, following the regexp rule of always matching the longest qualifying string.
One way to handle that is to force sed to see both patterns by making it look for an expression such as <pattern><pattern>. But if a line does not contain <pattern><pattern>, the line would not get modified, so you need your single-pattern command as well. The single-pattern command has to come last so that the double-pattern command gets the first go at the line.
But since the pattern allows most characters, the first pattern in the double-pattern expression grabs all that it can, qualifying the longest possible string. That leaves the shortest possible qualifying string for the second pattern, which in this case is just a single character preceding abac, since we insist on one or more characters at that point.
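The split described above can be checked with a quick sketch that prints each of the four capture groups in numbered brackets. It assumes a POSIX shell and sed; the function name show_double and the numbered-bracket output format are invented for this illustration.

```shell
#!/bin/sh
# Print the four capture groups of the double-pattern expression.
# The "N[...]" labels exist only to make the group boundaries visible.
show_double() {
    printf '%s\n' "$1" |
        sed 's/\([a-z ][a-z ]*\)abac\(.*\)\([a-z ][a-z ]*\)abac\(.*\)/1[\1] 2[\2] 3[\3] 4[\4]/'
}

show_double 'a b c abac a b c abac xyz'
# prints: 1[a b c ] 2[ a b c] 3[ ] 4[ xyz]
```

Group 2 takes everything it can between the two abac strings, and group 3 is left with the single space it needs to match at all.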
So, the following code works, although you may not care for which string of characters is captured to end the first pattern, and which string of characters is captured to start the second pattern. But if so, then you can make your patterns a bit more restrictive.
Since I had to code back-to-back patterns, to keep the code shorter, I used a much simpler expression, and changed the test data to all lowercase accordingly.
mysed.sh:
sed \
-e 's/\([a-z ][a-z ]*\)abac\(.*\)\([a-z ][a-z ]*\)abac\(.*\)/A:\1arun\2B:\3arun\4/g' \
-e 's/\([a-z ][a-z ]*\)abac\(.*\)/$\1arun\2/g' \
myfile
myfile:
a b c abac a b c abac xyz
a b c abac xyz
./mysed.sh
A:a b c arun a b cB: arun xyz
$a b c arun xyz
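If the number of abac occurrences on a line is not known in advance, another possibility is to let sed repeat a single substitution until nothing is left to replace, using a label and the t (branch if a substitution was made) command. This is only a sketch under some assumptions: a POSIX shell and sed, the same simplified lowercase pattern, and no $ prefix in the replacement, because a $ at the start of the line would stop the anchored pattern from matching on the next pass. The names replace_all and again are made up for the example.

```shell
#!/bin/sh
# Repeat one anchored substitution until no abac remains on the line.
# Each pass replaces the last remaining abac (longest match), then
# "t again" jumps back to the label if that pass changed anything.
replace_all() {
    sed -e ':again' \
        -e 's/^\([a-z ][a-z ]*\)abac/\1arun/' \
        -e 't again'
}

printf '%s\n' 'a b c abac a b c abac xyz' 'a b c abac xyz' | replace_all
# prints:
# a b c arun a b c arun xyz
# a b c arun xyz
```

The loop ends because every pass removes one abac and the replacement text arun never creates a new one.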
