Split row into meaningful records

Name: nimi
Date: April 19, 2007 at 00:45:33 Pacific
OS: Unix
CPU/Ram: N/A
Product: N/A
Comment:

Hi,

Please help me convert the following:

..cnai #Generated on Tue Mar 20 15:47:23 2007 by CNAI R12T02, user echuyau
..capabilities BASIC
.utctime 2007-03-20 07:47:23
.subnetwork ONRM_RootMo:MAXIS2G
.domain NREL
.set BSC2:AIA1B10:*:HAFID11
BSC_NAME="BSC2"
CELL_NAME="AIA1B10"
NREL_NAME="HAFID11"
AWOFFSET=5
BQOFFSET=3
BQOFFSETAFR=3
CAND="BOTH"
CS="NO"
HIHYST=5
KHYST=3
KOFFSET=0
LHYST=3
LOHYST=3
LOFFSET=0
OFFSET=0
TRHYST=2
TROFFSET=0
.set BSC2:ANTAM16:BSC2:ANTAM17
BSC_NAME="BSC2"
CELL_NAME="ANTAM16"
NREL_NAME="ANTAM17"
AWOFFSET=3
BQOFFSET=3
BQOFFSETAFR=3
CAND="BOTH"
CS="YES"
HIHYST=5
KHYST=3
KOFFSET=0
LHYST=3
LOHYST=3
LOFFSET=0
OFFSET=0
TRHYST=2
TROFFSET=0

into this:

AIA1B10|HAFID11|BSC2|AIA1B10|HAFID11|5|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0
ANTAM16|ANTAM17|BSC2|ANTAM16|ANTAM17|3|3|3|BOTH|YES|5|3|0|3|3|0|0|2|0

Please help


Nimi



Response Number 1
Name: nails
Date: April 19, 2007 at 17:10:41 Pacific
Reply:

One way: when a line starts with .set, grab the second and last colon-separated fields. For every line after a .set line has been found, grab the last field, using the equal sign as the separator:

#!/bin/ksh

s_set=0
recset=""
# get rid of the double quotes
tr -d '"' < myfile |
while read line
do
    case "$line" in
    .set*)
        # a new .set line: print the previous record, if there is one
        if [[ -n $recset ]]
        then
            # get rid of the last pipe
            recset=$(echo "$recset" | sed 's/|$//')
            echo "$recset"
        fi
        # start a new record with the second and last colon-separated fields
        recset=$(echo "$line" | awk ' BEGIN { FS=":" } { print $2"|"$NF"|" } ')
        s_set=1
        ;;

    *)
        # after the first .set, append the value after the equal sign
        if [[ $s_set -eq 1 ]]
        then
            nl=$(echo "$line" | awk ' BEGIN { FS="=" } { print $NF"|" } ')
            recset=${recset}${nl}
        fi
        ;;
    esac

done
# print the last record
if [[ -n $recset ]]
then
    recset=$(echo "$recset" | sed 's/|$//')
    echo "$recset"
fi
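
The two awk calls in the script above do all of the field picking. As a minimal sketch (the function names set_fields and value_field are made up for this illustration and are not part of the posted script), this is what each one returns for a single line:

```shell
#!/bin/sh
# Hypothetical helper names, used only for this illustration.

# For a ".set" line: split on ":" and keep the second and last fields.
set_fields() {
    printf '%s\n' "$1" | awk -F: '{ print $2 "|" $NF }'
}

# For a NAME=value line: strip the double quotes, split on "=",
# and keep the last field.
value_field() {
    printf '%s\n' "$1" | tr -d '"' | awk -F= '{ print $NF }'
}

set_fields '.set BSC2:AIA1B10:*:HAFID11'   # prints AIA1B10|HAFID11
value_field 'CAND="BOTH"'                  # prints BOTH
```

Note that the posted script starts new awk processes for every input line, so its run time grows quickly with the size of the file.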



Response Number 2
Name: nimi
Date: April 19, 2007 at 20:21:16 Pacific
Reply:

Hi,

Thank you for the solution. Where do I put my input file name, and where do I write my output file name, in the above code?

Thanks
Nimi



Response Number 3
Name: nimi
Date: April 19, 2007 at 20:38:51 Pacific
Reply:

What I meant was: how do I change the echo so that it writes to a file?

Nimi



Response Number 4
Name: nails
Date: April 20, 2007 at 07:41:29 Pacific
Reply:

First, I called the input file myfile:

tr -d '\"' < myfile ....

Change myfile to the name of your file.

Second, there are several ways of writing to an output file. The two echo lines that print recset can each be changed to this:

echo "$recset" >> outputfile

Perhaps the easiest way is to send the output of the whole script to a file. If the script is called 'myscript', the command is:

myscript > outputfile
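
A third option, sketched here under assumptions (the loop body is a placeholder, and a temporary file stands in for outputfile), is to redirect the whole while loop once at its done line instead of redirecting each echo:

```shell
#!/bin/sh
# Sketch: one redirection on the loop captures every echo inside it.
# "$out" stands in for outputfile; the loop body is only a placeholder.
out=$(mktemp)

printf '%s\n' 'a=1' 'b=2' |
while IFS= read -r line
do
    echo "record: $line"
done > "$out"          # ">" truncates the file once, before the loop starts

cat "$out"
```

With >> the file keeps growing from one run to the next, so it has to be removed or emptied before the script is run again. A single > on the loop or on the script avoids that.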



Response Number 5
Name: ghostdog
Date: April 21, 2007 at 21:18:25 Pacific
Reply:

you can do everything in awk
[code]
#!/bin/sh


awk 'BEGIN {FS = "=";c=0; }
NR <=5 { next}
/\.set/{ c=c+1 ;next}
{ array[c] = array[c]$2"|"}
END {
    for (e in array) {
        b = gensub(/(\")|(\|$)/,"","g",array[e])
        n = split(b,f,"|")
        printf("%s|%s|" , f[2] , f[3])

        for (i=1; i<=n ; i++) {
            printf("%s|", f[i])
        }
        print ""
    }
}' "file" > "outputfile"
[/code]



Response Number 6
Name: nimi
Date: April 22, 2007 at 19:26:52 Pacific
Reply:

Hi Ghostdog,

Tried your code but I am getting the following error message:

awk: Syntax error near line 7
awk : illegal statement near line 7
awk : new line in string near line 7

FYI, line 7 is END {

I replaced the file with my input file name.


Nimi



Response Number 7
Name: nimi
Date: April 22, 2007 at 20:17:41 Pacific
Reply:

Nails,

My file is about 6823009 bytes. Your script takes a very long time to complete on it. Please suggest a way to improve this.

Thanks
Nimi



Response Number 8
Name: nimi
Date: April 22, 2007 at 20:54:24 Pacific
Reply:

Nails,

Please explain this code to me:
if [[ -n $recset ]]
then
    # get rid of the last pipe
    recset=$(echo "$recset" | sed 's/|$//')
    echo "$recset"
fi


Nimi



Response Number 9
Name: ghostdog
Date: April 22, 2007 at 21:01:33 Pacific
Reply:

use gawk instead of awk.



Response Number 10
Name: nimi
Date: April 22, 2007 at 22:33:30 Pacific
Reply:

ghostdog,

I replaced awk with gawk in your script and I got this:


a.sh: gawk: not found

Please help


Nimi



Response Number 11
Name: ghostdog
Date: April 22, 2007 at 23:14:19 Pacific
Reply:

Well, gensub is a gawk extension. Since you don't have gawk, you have to use gsub. Try this:

[code]
awk 'BEGIN {FS = "=";c=0; }
NR <=5 { next}
/\.set/{ c=c+1 ;next}
{ array[c] = array[c]$2"|"}
END {
    for (e in array) {
        gsub(/(\")|(\|$)/,"",array[e])
        n = split(array[e],f,"|")
        printf("%s|%s|" , f[2] , f[3])
        for (i=1; i<=n ; i++) {
            printf("%s|", f[i])
        }
        print ""
    }
}' "file" > "outputfile"
[/code]
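
The practical difference between the two functions, as a small sketch that should run in any POSIX awk (strip_quotes is a made-up name): gsub edits its target in place and returns the number of replacements, whereas gawk's gensub returns the changed string and leaves the original alone. That is why the gensub version needed the extra variable b and the gsub version does not.

```shell
#!/bin/sh
# gsub() modifies its third argument in place and returns the number
# of substitutions it made. strip_quotes is a hypothetical helper name.
strip_quotes() {
    printf '%s\n' "$1" | awk '{ n = gsub(/"/, "", $0); print n ":" $0 }'
}

strip_quotes '"BSC2"|"AIA1B10"|'    # prints 4:BSC2|AIA1B10|
```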



Response Number 12
Name: nimi
Date: April 22, 2007 at 23:44:19 Pacific
Reply:

Hi Ghostdog/nails,

I still get the same error. Furthermore, I just realised that I don't need the values from the .set line. I only need the values starting from BSC_NAME.

old row
AIA1B10|HAFID11|BSC2|AIA1B10|HAFID11|5|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0

new row should be
BSC2|AIA1B10|HAFID11|5|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0

TQ


Nimi



Response Number 13
Name: ghostdog
Date: April 23, 2007 at 00:06:02 Pacific
Reply:

OK, can you try nawk? If you are using Solaris, another option is /usr/xpg4/bin/awk.
nawk works for me too.

If you don't need AIA1B10|HAFID11|, then remove printf("%s|%s|" , f[2] , f[3]) from my code.
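
If the script has to run on machines with different awk implementations, one possible approach is to probe for one at the start. This is only a sketch: the function name pick_awk and the order of the candidate list are assumptions, and it relies on a POSIX shell that provides command -v.

```shell
#!/bin/sh
# Sketch: use the first awk implementation found, preferring the newer ones.
# The candidate list is an assumption; adjust it for your system.
pick_awk() {
    for candidate in gawk nawk /usr/xpg4/bin/awk awk
    do
        if command -v "$candidate" >/dev/null 2>&1
        then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

AWK=$(pick_awk) || { echo "no awk found" >&2; exit 1; }
echo 'CS="NO"' | "$AWK" -F= '{ print $2 }'   # prints "NO" (quotes included)
```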



Response Number 14
Name: nimi
Date: April 23, 2007 at 00:43:39 Pacific
Reply:

Hi, Ghostdog,

Finally it worked (with nawk) and it was super fast. Thanks for the code. I need help with one more thing. The last line of my file is ..end:

.set SHTI25:ZIGYMR8:SHTI25:ZIGYMR7
BSC_NAME="SHTI25"
CELL_NAME="ZIGYMR8"
NREL_NAME="ZIGYMR7"
AWOFFSET=3
BQOFFSET=3
BQOFFSETAFR=3
CAND="BOTH"
CS="YES"
HIHYST=5
KHYST=3
KOFFSET=0
LHYST=3
LOHYST=3
LOFFSET=0
OFFSET=0
TRHYST=2
TROFFSET=0
..end

How do I ignore the ..end line? I realise that this last set of records causes the output to go haywire:

RWNG05|KIRAM13|PLT1B17|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0| (ok)
SHTI25|ZIGYMR8|ZIGYMR7|3|3|3|BOTH|YES|5|3|0|3|3|0|0|2|0|| (not ok)
RWNG05|KIRAM13|RIS1U11|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0| (ok)

Ghostdog, can you explain your script to me? I am really interested in picking up awk. How can I get good at it?


Nimi



Response Number 15
Name: ghostdog
Date: April 23, 2007 at 01:28:05 Pacific
Reply:

The ..end line has no '=', so its $2 is empty and the record gets an extra '|'. To skip lines beginning with ..end, you can use:

...
/^\.\.end/ { next }
...


explanation:
FS is the field separator. I set it to '=', so that the first field $1 will be BSC_NAME, CELL_NAME etc. and the second field $2 will be "BSC2", "ANTAM16" and so on.
I set c=0 so that I can use it as the array index; I will come to that later.
NR <=5 { next} : skip the first 5 lines (the header).
/\.set/{ c=c+1 ;next} : when awk sees a line with .set, I increment c, so c = 1 at the first one, and awk moves on to the next record.
{ array[c] = array[c]$2"|"} : this stores into the array. array[c] will be "" (null) the first time. Since c is 1, the value of array[1] is concatenated with $2 (field 2) and then a "|". The result is one line per .set block with all the $2 values concatenated.
END { : after awk has processed every line and before it exits, it runs the code inside END {}.
for (e in array) { : loop through the array.
gsub(/(\")|(\|$)/,"",array[e]) : substitute every double quote, and the pipe at the end of the line, with "".
printf("%s|\n", array[e]) : after the substitution, print the result followed by a "|" and a newline.
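
The accumulation step described above can be tried on its own with a few lines of input. This is a minimal sketch, not the posted script: the function name accumulate is made up, a c > 0 guard replaces the NR <= 5 header skip, and the records are printed by counting from 1 to c so that the output order is predictable.

```shell
#!/bin/sh
# Sketch of the accumulation step only: each ".set" line starts a new slot,
# and every following line appends its second "="-separated field.
# The "c > 0" guard is an assumption used here instead of skipping 5 lines.
accumulate() {
    awk 'BEGIN { FS = "="; c = 0 }
    /^\.set/ { c = c + 1; next }
    c > 0    { array[c] = array[c] $2 "|" }
    END      { for (e = 1; e <= c; e++) print e ": " array[e] }'
}

printf '%s\n' '.set A:B' 'X="1"' 'Y=2' '.set C:D' 'X="3"' | accumulate
# prints:
# 1: "1"|2|
# 2: "3"|
```

The double quotes and the trailing pipe are still present at this stage; removing them is the job of the gsub call in the END block.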

finally, the final code can be like this:
[code]
awk 'BEGIN {FS = "=";c=0; }
NR <=5 { next}
/^\.\.end/ { next }
/\.set/{ c=c+1 ;next}
{ array[c] = array[c]$2"|"}
END {
    for (e in array) {
        gsub(/(\")|(\|$)/,"",array[e])
        printf("%s|\n", array[e])
    }

}' "file"

[/code]
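
To see what the ..end rule changes, here is a small sketch that applies the same accumulation rule with and without the skip. The function name join_values and the awk variable skip are made up for this illustration.

```shell
#!/bin/sh
# Sketch: same accumulation rule, with and without the "..end" skip.
# join_values is a hypothetical name; "skip" is passed in as an awk variable.
join_values() {
    awk -v skip="$1" 'BEGIN { FS = "=" }
    skip == 1 && /^\.\.end/ { next }
    { row = row $2 "|" }
    END { print row }'
}

printf '%s\n' 'A=1' 'B=2' '..end' | join_values 0   # prints 1|2||
printf '%s\n' 'A=1' 'B=2' '..end' | join_values 1   # prints 1|2|
```

Without the skip, the ..end line contributes an empty $2 plus a pipe, which is where the doubled pipe at the end of the last record comes from.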


As for an awk reference, you can google for GNU awk and look through the tutorial on its website. Happy awking!



Response Number 16
Name: nimi
Date: April 23, 2007 at 01:51:39 Pacific
Reply:

Thanks Ghostdog. I owe you a treat. :)

Nimi



Response Number 17
Name: nimi
Date: April 23, 2007 at 02:29:45 Pacific
Reply:

Ghostdog,

The output is not in the same order as the input. Is that expected?

This is the actual sequence in the input file:

BSC2|AIA1B10|HAFID11|5|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0
BSC2|ANTAM16|ANTAM17|3|3|3|BOTH|YES|5|3|0|3|3|0|0|2|0
BSC2|ANTAM16|ANTAM18|3|3|3|BOTH|YES|5|3|0|3|3|0|0|2|0
BSC2|ANTAM16|EESTM18|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0
BSC2|ANTAM16|FOOYM16|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0
BSC2|ANTAM16|HAWPM16|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0
BSC2|ANTAM16|JPN1U11|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0
BSC2|ANTAM16|PIN1U12|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0
BSC2|ANTAM16|TOAHU15|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0
BSC2|ANTAM17|ANTAM16|3|3|3|BOTH|YES|5|3|0|3|3|0|0|2|0
BSC2|ANTAM17|ANTAM18|3|3|3|BOTH|YES|5|3|0|3|3|0|0|2|0
BSC2|ANTAM17|EESTM18|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0
BSC2|ANTAM17|FOOYM16|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0
BSC2|ANTAM17|FOOYM18|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0
BSC2|ANTAM17|GMB1U11|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0
BSC2|ANTAM17|GMB1U12|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0


and this is the output after running your script:
BSC2|AIA1B10|HAFID11|5|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0|
BSC2|ANTAM16|ANTAM17|3|3|3|BOTH|YES|5|3|0|3|3|0|0|2|0|
BSC2|ANTAM16|ANTAM18|3|3|3|BOTH|YES|5|3|0|3|3|0|0|2|0|
BSC2|ANTAM16|EESTM18|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0|
BSC2|ANTAM16|FOOYM16|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0|
BSC2|ANTAM16|HAWPM16|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0|
BSC2|ANTAM16|JPN1U11|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0|
BSC2|ANTAM16|PIN1U12|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0|
BSC2|ANTAM16|TOAHU15|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0|
BSC2|ANTAM17|ANTAM16|3|3|3|BOTH|YES|5|3|0|3|3|0|0|2|0|
BSC2|YOWCM28|KRU1B11|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0|
BSC2|YOWCM28|NCSBM11|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0|
BSC2|YOWCM28|NIN1U10|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0|
BSC2|YOWCM28|PIREB11|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0|
BSC2|YOWCM28|SEPAU11|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0|
BSC2|YOWCM28|SQU1B11|3|3|3|BOTH|NO|5|3|0|3|3|0|0|2|0|

Nimi



Response Number 18
Name: ghostdog
Date: April 23, 2007 at 07:16:12 Pacific
Reply:

Can you amend the END block to something like the one below and try again?

END {
    for (e=1;e<=c;e++) {
        gsub(/(\")|(\|$)/,"",array[e])
        printf("%s|\n", array[e])
    }
}



Response Number 19
Name: nimi
Date: April 23, 2007 at 18:03:16 Pacific
Reply:

Ghostdog.

It worked. Explanation please?

Nimi



Response Number 20
Name: ghostdog
Date: April 23, 2007 at 18:37:56 Pacific
Reply:

With reference to the previous code:
/\.set/{ c=c+1 ;next}
{ array[c] = array[c]$2"|"}

This says the items are stored in the array with indices given by the value of c, which are just numbers from 1 up to the number of .set lines. So the array values are referred to like this:
array[1], array[2]....
However, when we displayed the array elements using this form of the for loop:
for (e in array) { }
it lists the items in arbitrary order. If you want them in order, you have to go by the indices. Fortunately the array indices are numbers starting from 1, so it is easy to walk through them using this form of the for loop:
for (e=1;e<=c;e++) { }
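
A different way to keep the input order is not to store the records at all, and to print each one as soon as the next .set line (or the end of the input) is reached. The following is a sketch of that idea, not the code posted in this thread. The names cnai_to_rows, emit, row and inset are made up, it assumes every value line has exactly one '=', and it needs an awk with user-defined functions (nawk, gawk or a POSIX awk).

```shell
#!/bin/sh
# Sketch of a streaming variant (not the code posted in this thread):
# print each record when the next ".set" or the end of input is reached.
# Only NAME=value lines are used, so the header lines and "..end" are
# ignored without counting lines. All names here are hypothetical.
cnai_to_rows() {
    awk 'BEGIN { FS = "=" }
    function emit() { if (row != "") print row; row = "" }
    /^\.set/ { emit(); inset = 1; next }
    inset && NF == 2 {
        gsub(/"/, "", $2)                       # drop the double quotes
        row = (row == "" ? $2 : row "|" $2)     # join without a trailing pipe
    }
    END { emit() }' "$@"
}

printf '%s\n' '..cnai header' \
    '.set BSC2:AIA1B10:*:HAFID11' 'BSC_NAME="BSC2"' 'CELL_NAME="AIA1B10"' 'AWOFFSET=5' \
    '.set BSC2:ANTAM16:BSC2:ANTAM17' 'BSC_NAME="BSC2"' 'CS="YES"' \
    '..end' | cnai_to_rows
# prints:
# BSC2|AIA1B10|5
# BSC2|YES
```

Because nothing is kept in an array, the rows come out in input order and memory use does not grow with the file. It can also be given a file name, for example cnai_to_rows file > outputfile.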

