How do I retrieve text between two strings located on different lines of a file using awk?
myfile.txt contains the following data
***************
print -rmr --
My name is XYZ.
My employer is PQR
My exp is 5 yrs
My skill is unix
****************
How can I retrieve the text between the words "print" & "unix"?
Desired output is as follows
"
-rmr --
My name is XYZ.
My employer is PQR
My exp is 5 yrs
My skill is
"

Here is a place to start using the awk range syntax:
awk '/print/, $NF ~ /unix/ ' myfile.txt
The problem is that it doesn't strip the beginning and ending words themselves, i.e. "print" and "unix".

awk '\
/print/,/unix/ {
  if (match($0,"print"))
    print substr($0,index($0,"print"))
  else
    if (match($0,"unix"))
      print substr($0,1,index($0,"unix")-1)
    else
      print
}' myfile.txt
The solution above prints each group, and prints partial lines for both the starting line of the group and the ending line of the group. But it has one bug in it. Consider this example:
Line1 print
Line2 name
Line3 another print
Line4 address
Line5 unix
awk would consider all 5 lines above as one begin-end group. Line3 does not qualify as a starting line, being in the middle of the group. But the above awk code would use its partial-line print logic on both Line1 and Line3. If your data would never have this situation, or if that is the behavior you want, then the above code should work. But the following solution solves that problem by using a flag to positively identify when we are processing the actual starting line of a group.
awk '\
/print/,/unix/ {
  if (newgroup==0)
    {print substr($0,index($0,"print"))
     newgroup=1}
  else
    {if (match($0,"unix"))
      {print substr($0,1,index($0,"unix")-1)
       newgroup=0}
     else
      print
    }
}' myfile.txt
And finally, here is a solution that does not use the /start/,/end/ structure.
awk '{
if (mode==0) # SCAN MODE (look for start line)
if (match($0,"print"))
# Switch to output mode, print first partial
# line starting with the word "print"
{mode=1
print substr($0,index($0,"print"))
}
else # this was not a starting line, bypass
next
else # OUTPUT MODE (printing lines)
if (!match($0,"unix")) # if NOT "unix"
print # print entire line
else # print partial line up to "unix"
{mode=0
print substr($0,1,index($0,"unix")-1)
}
}' myfile.txt
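The three versions above keep the word "print" at the start of the first output line, while the desired output in the question drops it. As a rough sketch only, the scan/output-mode idea can be adapted to skip both marker words. The function name `between` and the variable `inside` are made up for this illustration, and the whitespace trimming next to the markers is an assumption about what the poster wants.

```shell
# Hypothetical helper: print the text between the first "print" and the
# next "unix", leaving out both marker words.
between() {
  awk '
    !inside {                          # scan mode: look for the start marker
      p = index($0, "print")
      if (p == 0) next                 # not a starting line, skip it
      inside = 1
      $0 = substr($0, p + length("print"))   # keep only what follows "print"
      sub(/^[ \t]+/, "")               # assumption: drop blanks after the marker
    }
    {                                  # output mode
      u = index($0, "unix")
      if (u > 0) {                     # end marker: print up to it, leave the group
        tail = substr($0, 1, u - 1)
        sub(/[ \t]+$/, "", tail)       # assumption: drop blanks before the marker
        print tail
        inside = 0
        next
      }
      print                            # ordinary line inside the group
    }
  ' "$1"
}
```

Called as `between myfile.txt`. Because the start line is rewritten before the second rule runs, a line that holds both "print" and "unix" is also handled.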

Thanks a lot james & nails :) It's working.
But now the requirement has changed a bit.
myfile.txt now contains the following data
***************
abc
print -rmr --
lkn
print -rmr --
efg
print -rmr --
tuv
print -rmr --
My name is XYZ.
My employer is PQR
My exp is 5 yrs
My skill is unix
****************
How can I retrieve the text between the words "print" (at line number 8) & "unix"?
Should the search start from the bottom & look for the first occurrence of "print" from the bottom?
Desired output is as follows
"
-rmr --
My name is XYZ.
My employer is PQR
My exp is 5 yrs
My skill is
"

Of course there are many ways to do this, and processing the lines in reverse order is creative thinking. And it's easy enough to reverse the lines, but then you have to buffer them in order to print each group in forward sequence.
But processing in normal sequence can easily be done with awk - just accumulate the lines until time to either print the group or abandon the group and start accumulating again.
And if you want an awk solution, I can post one, but sed lends itself well to this approach (plus, I need the sed practice).
sed -e '/print/ba' \
-e '/unix/bb' \
-e 'H;d' \
-e :a -e 's/.*print[ ]*//' -e 'h;d' \
-e :b -e 's/^\(.*\)unix.*/\1/' -e 'H;g' \
myfile.txt
Notes:
line contains "print": branch to :a
line contains "unix": branch to :b
Normal line: append line to hold buffer.
:a
edit the line, then start a new hold buffer
:b
edit the line, append to hold buffer, then get the hold buffer, which will then get printed by default.

Many thanks for the solution. I really appreciate the quick reply :)
But the requirement is to resolve this using awk. So it would be great if you could provide the solution using awk.

So this is an awk assignment?
If you post what you have so far, I will be happy to critique it and give suggestions.

awk 'BEGIN{ORS=" ";}
{
l=l" "$0
}
END {
m=split(l, a ,"print -rmr --")
for (i=1;i<=m;i++) {
if ( a[i] ~ /unix$/) {
print a[i]
}
}
}
' file
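Note that the attempt above joins everything onto one line and keeps the word "unix". For comparison, here is a rough sketch of the accumulate-and-abandon approach mentioned earlier in the thread, written in awk. The names `last_between`, `buf`, `n` and `started` are made up for this illustration, and the sketch assumes at most one "print" per line.

```shell
# Hypothetical helper: print the text between the LAST "print" that comes
# before a "unix" and that "unix", leaving out both marker words.
last_between() {
  awk '
    {
      p = index($0, "print")
      if (p > 0) {                     # new start marker: abandon the saved lines
        n = 0
        started = 1
        $0 = substr($0, p + length("print"))
        sub(/^[ \t]+/, "")             # assumption: drop blanks after the marker
      }
      if (!started) next               # nothing to accumulate yet
      u = index($0, "unix")
      if (u > 0) {                     # end marker: flush the saved group
        for (i = 1; i <= n; i++) print buf[i]
        tail = substr($0, 1, u - 1)
        sub(/[ \t]+$/, "", tail)       # assumption: drop blanks before the marker
        print tail
        n = 0
        started = 0
        next
      }
      buf[++n] = $0                    # ordinary line inside a candidate group
    }
  ' "$1"
}
```

Called as `last_between myfile.txt`. Each new "print" line resets the buffer, so only the group that starts at the last "print" before "unix" is ever printed, and the file is still read top to bottom.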
