
retrieve text between two strings


Name: patsun
Date: May 14, 2008 at 01:54:58 Pacific
OS: Unix
CPU/Ram: P4
Product: Solaris
Comment:

How do I retrieve the text between two strings located on different lines of a file using awk?

myfile.txt contains the following data:
***************
print -rmr --
My name is XYZ.
My employer is PQR
My exp is 5 yrs
My skill is unix
****************

How can I retrieve the text between the words "print" and "unix"?

The desired output is as follows:
"
-rmr --
My name is XYZ.
My employer is PQR
My exp is 5 yrs
My skill is
"




Response Number 1
Name: nails
Date: May 14, 2008 at 09:58:36 Pacific
Reply:

Here is a place to start using the awk range syntax:

awk '/print/, $NF ~ /unix/ ' myfile.txt

The problem is that it doesn't strip the words at the beginning and end of the range, i.e. "print" and "unix".
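One way to trim the two boundary words while keeping the range syntax is to call `sub()` inside the range action. This is a minimal sketch, assuming the start word is literally `print`, the end word is literally `unix`, and a POSIX awk is available (on Solaris that may mean `nawk` or `/usr/xpg4/bin/awk` rather than `/usr/bin/awk`). The function name `extract_range` is hypothetical. Because `sub()` runs on every line in the range, a middle line that also contains "print" or "unix" would be trimmed too.

```shell
# extract_range (hypothetical name): print the text between "print" and "unix",
# reading the files given as arguments, or standard input if there are none.
extract_range() {
  awk '/print/, /unix/ {
    sub(/.*print/, "")   # drop everything up to and including "print"
    sub(/unix.*/, "")    # drop "unix" and anything after it
    print
  }' "$@"
}

# usage: extract_range myfile.txt
```

The range pattern is tested against each line before the action edits it, so the `unix` line still closes the range even though `sub()` then removes the word.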



Response Number 2
Name: James Boothe
Date: May 19, 2008 at 11:44:03 Pacific
Reply:

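awk '
/print/,/unix/ {
if (match($0,"print"))                 # start line: text after "print"
   print substr($0,index($0,"print")+length("print"))
else
   if (match($0,"unix"))               # end line: text before "unix"
      print substr($0,1,index($0,"unix")-1)
   else
      print                            # middle line: print it whole
}' myfile.txt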

The solution above prints each group, printing partial lines for both the starting line and the ending line of the group. But it has one bug. Consider this example:

Line1 print
Line2 name
Line3 another print
Line4 address
Line5 unix

awk treats all five lines above as one begin-end group. Line3 does not qualify as a starting line because it falls in the middle of the group, yet the code above applies its partial-line print logic to both Line1 and Line3. If your data never has this situation, or if that is the behavior you want, the code above should work. Otherwise, the following solution solves the problem by using a flag to positively identify when we are processing the actual starting line of a group.

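awk '
/print/,/unix/ {
if (newgroup==0)                       # actual start line of a group
  {print substr($0,index($0,"print")+length("print"))
   newgroup=1}
else
  {if (match($0,"unix"))               # end line: text before "unix"
     {print substr($0,1,index($0,"unix")-1)
      newgroup=0}
   else
      print                            # middle line, even if it contains "print"
  }
}' myfile.txt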
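To see the grouping described above, you can label each line that falls inside the `/print/,/unix/` range and note whether it also matches the start pattern. This is an illustrative sketch only, assuming a POSIX awk; the function name `show_range` is hypothetical. With the Line1 to Line5 sample, lines 1 and 3 both match `/print/`, but awk keeps all five lines in a single range because it only looks for a new start after the current range has ended.

```shell
# show_range (hypothetical name): label each record inside the /print/,/unix/ range.
show_range() {
  awk '/print/, /unix/ {
    # a bare /regex/ used in an expression is tested against $0
    print NR ": " (/print/ ? "start pattern" : "inside")
  }' "$@"
}

# usage:
# printf '%s\n' 'Line1 print' 'Line2 name' 'Line3 another print' \
#   'Line4 address' 'Line5 unix' | show_range
```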

And finally, here is a solution that does not use the /start/,/end/ structure.

awk '{
if (mode==0)  # SCAN MODE (look for start line)
  if (match($0,"print"))
      # Switch to output mode, print the first partial
      # line, starting just after the word "print"
     {mode=1
       print substr($0,index($0,"print")+length("print"))
      }
   else  # this was not a starting line, bypass
     next
else     # OUTPUT MODE (printing lines)
  if (!match($0,"unix"))  # if NOT "unix"
      print   # print entire line
  else        # print partial line up to "unix"
     {mode=0
       print substr($0,1,index($0,"unix")-1)
      }
}' myfile.txt



Response Number 3
Name: patsun
Date: May 20, 2008 at 01:22:04 Pacific
Reply:

Thanks a lot, James and nails :) It's working.
But now the requirement has changed a bit.

myfile.txt now contains the following data:
***************
abc
print -rmr --
lkn
print -rmr --
efg
print -rmr --
tuv
print -rmr --
My name is XYZ.
My employer is PQR
My exp is 5 yrs
My skill is unix
****************

How can I retrieve the text between the word "print" (at line number 8) and "unix"?

Should the search start from the bottom and look for the first occurrence of "print" from the bottom?

The desired output is as follows:
"
-rmr --
My name is XYZ.
My employer is PQR
My exp is 5 yrs
My skill is
"




Response Number 4
Name: James Boothe
Date: May 20, 2008 at 08:42:23 Pacific
Reply:

Of course there are many ways to do this, and processing the lines in reverse order is creative thinking. It's easy enough to reverse the lines, but then you have to buffer them in order to print each group in forward sequence.
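A minimal sketch of the bottom-up idea in awk, for comparison: buffer every line, search upward for the last line containing "unix", then for the nearest "print" at or above it, and print that span in forward order. It assumes a POSIX awk, that the file fits in memory, and that only the last group is wanted; the function name `extract_from_bottom` is hypothetical. To match the desired output, it strips "print" together with any blanks that follow it.

```shell
# extract_from_bottom (hypothetical name): buffer the input, then search upward.
extract_from_bottom() {
  awk '
    { line[NR] = $0 }                 # keep every line in memory
    END {
      e = NR                          # last line containing "unix"
      while (e > 0 && line[e] !~ /unix/) e--
      s = e                           # nearest "print" at or above that line
      while (s > 0 && line[s] !~ /print/) s--
      if (s == 0) exit 1              # no complete print...unix group found
      for (i = s; i <= e; i++) {
        out = line[i]
        if (i == s) sub(/.*print[ ]*/, "", out)   # trim the start line
        if (i == e) sub(/unix.*/, "", out)        # trim the end line
        print out
      }
    }' "$@"
}

# usage: extract_from_bottom myfile.txt
```

Buffering the whole file is what makes the upward search possible; for very large inputs, a forward pass that keeps only the current group uses less memory.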

But processing in normal sequence is easy with awk: just accumulate the lines until it is time to either print the group or abandon it and start accumulating again.

And if you want an awk solution, I can post one, but sed lends itself well to this approach (plus, I need the sed practice).

sed -e '/print/ba'       \
    -e '/unix/bb'        \
    -e 'H;d'             \
    -e :a -e 's/.*print[ ]*//' -e 'h;d'     \
    -e :b -e 's/^\(.*\)unix.*/\1/' -e 'H;g' \
myfile.txt

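Notes:

Line contains "print": branch to :a.
Line contains "unix": branch to :b.
Any other line: append it to the hold buffer and delete it, so nothing is printed yet.

:a
Strip everything up to and including "print" (and any blanks after it), then replace the hold buffer with the result, starting a new group.

:b
Strip "unix" and everything after it, append the result to the hold buffer, then copy the hold buffer back into the pattern space, which sed prints by default.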
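For readers who need awk, here is a minimal sketch that carries the same hold-buffer logic over to awk: each "print" line starts a fresh buffer (discarding any earlier, unfinished group), ordinary lines are appended while a group is open, and a "unix" line closes the group and prints it. It assumes a POSIX awk; the function name `extract_group` is hypothetical. Unlike the sed version, it ignores lines that appear after a group has been printed until the next "print".

```shell
# extract_group (hypothetical name): awk version of the sed hold-buffer approach.
extract_group() {
  awk '
    /print/ {                          # a "print" line starts a fresh group
      buf = $0
      sub(/.*print[ ]*/, "", buf)      # keep only the text after "print" and its blanks
      ingroup = 1
      next
    }
    ingroup && /unix/ {                # a "unix" line ends the group
      last = $0
      sub(/unix.*/, "", last)          # keep only the text before "unix"
      print buf "\n" last
      ingroup = 0
      next
    }
    ingroup { buf = buf "\n" $0 }      # any other line inside a group is buffered
  ' "$@"
}

# usage: extract_group myfile.txt
```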



Response Number 5
Name: patsun
Date: May 20, 2008 at 21:49:53 Pacific
Reply:

Many thanks for the solution. I really appreciate the quick reply :)

But the requirement is to solve this using awk, so it would be great if you could provide an awk solution.






Response Number 6
Name: James Boothe
Date: May 23, 2008 at 07:49:25 Pacific
Reply:

So this is an awk assignment?

If you post what you have so far, I will be happy to critique it and give suggestions.



Response Number 7
Name: ghostdog
Date: May 30, 2008 at 22:41:30 Pacific
Reply:

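awk '
{ buf = buf $0 "\n" }                  # collect the whole file, keeping the newlines
END {
m = split(buf, a, "print")             # a[2], a[3], ... each start just after a "print"
for (i = 2; i <= m; i++) {
  if ((p = index(a[i], "unix")) > 0) { # keep only the piece that reaches "unix"
    printf "%s\n", substr(a[i], 1, p - 1)
  }
}
}
' myfile.txt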


