Is it possibile to set a full word to be a field separator?

January 7, 2012 at 04:26:52
Specs: Linux x86_64
I'm getting pretty fascinated by bash scripting, and I've started using it as much as I can to automate most of my tasks.

What I need to do now is read very long single-line HTML files and extract data from them.
Everything I've learned about manipulating strings with basic sed, awk and grep has shown me how useful it is to change the IFS so that text gets split in different ways.

As far as I know, the IFS can be set to any character you want, but what about setting it to an entire word (for example the word "href"), so that the text is automatically split at every href I have?

I usually do this: IFS=$'whatever' but this way each character becomes a separator on its own: "w", "h", "a", "t", "e", "v" and "r".
I need the whole word "whatever" to be the separator instead.




#1
January 7, 2012 at 04:28:07
Oh, and sorry for the spelling error in the title. I'm not a native English speaker. :)


#2
January 7, 2012 at 09:07:30
One way is to choose a field separator that does not already occur in datafile.txt. In this example I use a pipe symbol. First use sed to change 'whatever' to a pipe symbol, then pipe the result to awk:

sed 's/whatever/|/g' datafile.txt | awk ' BEGIN {FS="|" } { print "do whatever" } '
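
To make the pipeline above concrete, here is a minimal sketch that prints each field on its own line. The sample string and the function name split_on_word are made up for illustration, and it assumes the input contains no literal pipe symbols:

```shell
# Hypothetical sample line; the word "whatever" is the separator we want.
line='alphawhateverbetawhatevergamma'

# Replace each "whatever" with "|", then let awk split on "|".
# Assumes the input contains no literal "|" characters.
split_on_word() {
    printf '%s\n' "$1" |
        sed 's/whatever/|/g' |
        awk 'BEGIN { FS = "|" } { for (i = 1; i <= NF; i++) print i ": " $i }'
}

split_on_word "$line"
# prints:
# 1: alpha
# 2: beta
# 3: gamma
```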

This example changes the pipe symbol back to whatever:

sed 's/whatever/|/g' datafile.txt | awk ' BEGIN {FS="|" } { print "do whatever" } ' | sed 's/|/whatever/g'

Of course, you can always use tmp files:

sed 's/whatever/|/g' datafile.txt > tmp.file

awk ' BEGIN {FS="|" } { print "do whatever" } ' tmp.file
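
If you would rather stay inside the shell, a loop with parameter expansion can split on a whole word with no sed, awk or temporary file. This is a different technique from the pipe-symbol substitution, and only a sketch: the function name split_word and the sample string are made up for illustration.

```shell
# Print each piece of $1 on its own line, splitting at every occurrence of $2.
split_word() {
    rest=$1
    word=$2
    while :; do
        case $rest in
            *"$word"*)
                # Text before the first occurrence of the word.
                printf '%s\n' "${rest%%"$word"*}"
                # Keep only what follows that first occurrence.
                rest=${rest#*"$word"}
                ;;
            *)
                # No separator left: print the remainder and stop.
                printf '%s\n' "$rest"
                break
                ;;
        esac
    done
}

split_word 'alphawhateverbetawhatevergamma' 'whatever'
# prints:
# alpha
# beta
# gamma
```

Because it never swaps the word for a stand-in character, this version does not care which characters appear in the data.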

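BTW, GNU awk (common on Linux) accepts a field separator longer than one character, such as FS="whatever", and treats it as a regular expression. Other POSIX-compliant awks do the same. Very old traditional awk does not.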
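
With an awk that accepts a multi-character FS (GNU awk does, and POSIX specifies that an FS longer than one character is treated as a regular expression), the word can be used as the separator directly, with no sed step. A minimal sketch for the href case from the question follows. The sample HTML and the function name extract_hrefs are made up for illustration, and it assumes every href value is double-quoted:

```shell
# Hypothetical sample: one long HTML line with several links.
html='<a href="one.html">1</a><a href="two.html">2</a>'

# Use the whole string href=" as the field separator, so every field
# after the first begins with a link target.
extract_hrefs() {
    printf '%s\n' "$1" |
        awk -F 'href="' '{
            for (i = 2; i <= NF; i++) {
                split($i, part, "\"")   # cut the field at the closing quote
                print part[1]
            }
        }'
}

extract_hrefs "$html"
# prints:
# one.html
# two.html
```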

