
sort data


Original Message
Name: bjorb
Date: September 15, 2005 at 08:09:23 Pacific
Subject: sort data
OS: Win XP
CPU/Ram: Intel Pentium
Comment:

How do I sort data?
I have a set of data that I want to sort with a Unix script. The data is similar to the set below. The columns are the x-, y- and z-coordinates, and the rows should be sorted by the y-coordinate (second column). Each block must be sorted individually, and the blank row separating the blocks must be kept.

Data set should be sorted:

From
100.000 23.000 150.000
99.000 83.000 369.000
110.000 15.000 123.000

25.000 23.000 23.000
9.000 63.000 81.000
15.000 38.000 23.000

to
110.000 15.000 123.000
100.000 23.000 150.000
99.000 83.000 369.000

25.000 23.000 23.000
15.000 38.000 23.000
9.000 63.000 81.000

Data is retrieved from data.dat and is to be re-entered into data.dat.

How can I do this?

regards

bjorb



Response Number 1
Name: nails
Date: September 15, 2005 at 10:16:39 Pacific
Subject: sort data
Reply:

Perhaps I cheated, but I split bigfile after line 3 on the first blank:

csplit -f ss bigfile '/^$/'

This created two files, ss00 & ss01

I then sorted on the second field:

sort -k 2,2 ss00 > bigfile
sort -k 2,2 ss01 >> bigfile




Response Number 2
Name: bjorb
Date: September 15, 2005 at 11:05:49 Pacific
Subject: sort data
Reply:

Thanks for a quick reply Nails!

It would have worked, but I should have mentioned that more blocks follow the first two, so the method you presented would become quite laborious.

I hope there is a simpler way to do this, if there is any.



Response Number 3
Name: Jim Boothe
Date: September 15, 2005 at 11:39:45 Pacific
Subject: sort data
Reply:

This solution prepends a 4-digit group number to each line to control the first sort level.  That data is then piped into the sort command, then the sorted data is piped into cut to delete the first 5 characters of each line.

I added group number as fixed-width 4 digits, not for the benefit of the sort, but because it makes it a bit easier to chop it off afterward.

Your second fields all have the same decimal alignment, but in case they do not, you need to specify a numerical sort for that sort key, as I have.
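
To illustrate why the numeric key matters, here is a small sketch using made-up values of unequal width (sample.dat and its contents are hypothetical, not from the original data). A plain key compares the second field as text, while the n modifier compares it as a number.

```shell
# Made-up sample: y-values (second column) of different widths.
printf '1.000 100.000 5.000\n2.000 23.000 6.000\n3.000 9.000 7.000\n' > sample.dat

# Text comparison on field 2: "100.000" sorts before "23.000" before "9.000".
lexical=$(sort -k2,2 sample.dat)

# Numeric comparison on field 2: 9 < 23 < 100.
numeric=$(sort -k2,2n sample.dat)

printf 'text sort:\n%s\n\nnumeric sort:\n%s\n' "$lexical" "$numeric"
rm -f sample.dat
```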

This code outputs to data.new. Your script would need to rename it to data.dat.

awk '{
if (NF==0)
   group++
printf "%4.4d %s\n",group,$0
}' data.dat |
sort -k1,1 -k3,3n |
cut -c6- > data.new

Output of the awk step, before sort and cut:
0000 100.000 23.000 150.000
0000 99.000 83.000 369.000
0000 110.000 15.000 123.000
0001
0001 25.000 23.000 23.000
0001 9.000 63.000 81.000
0001 15.000 38.000 23.000

cat data.new
110.000 15.000 123.000
100.000 23.000 150.000
99.000 83.000 369.000

25.000 23.000 23.000
15.000 38.000 23.000
9.000 63.000 81.000
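
The rename step mentioned above could be sketched as follows. This is only an illustration, assuming it is run in a scratch directory: it first writes the sample data from the original question to data.dat, then runs the same awk, sort and cut pipeline and moves data.new over data.dat. It also assumes the y-values are not negative, because the blank separator line counts as 0 in the numeric sort and must come first in its group.

```shell
# Sample input from the original question: two blocks separated by a blank line.
cat > data.dat <<'EOF'
100.000 23.000 150.000
99.000 83.000 369.000
110.000 15.000 123.000

25.000 23.000 23.000
9.000 63.000 81.000
15.000 38.000 23.000
EOF

# Tag each line with a 4-digit block number, sort by block and then
# numerically by y (field 3 once the tag is in front), strip the
# 5-character tag, and write the result to data.new.
# Note: && only checks the exit status of the last stage (cut).
awk '{
if (NF==0)
   group++
printf "%4.4d %s\n",group,$0
}' data.dat |
sort -k1,1 -k3,3n |
cut -c6- > data.new &&
mv data.new data.dat

cat data.dat
```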



Response Number 4
Name: nails
Date: September 18, 2005 at 12:08:05 Pacific
Subject: sort data
Reply:

Sorry it took so long to get back to you. You can try this method:


csplit -ks -f xss bigfile '/^$/' {9}

rm -f bigfile
ls -1 xss*|while read y
do
echo $y
sort -k 2,2 "$y" >> bigfile
done
# end script

I've assumed the file has no more than 9 blocks with {9}. All you have to do is make sure that number is greater than the number of blocks you'll ever have. You'll get an 'out of range' error on the csplit command, but the -k flag saves the created files.
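
A possible variation on the script above, assuming GNU csplit is available: GNU csplit accepts '{*}' as the repeat count, meaning "repeat the pattern as many times as possible", so no upper limit has to be guessed and no 'out of range' error appears. The sketch below is not part of the original script. It uses the sample data from the question, writes to a hypothetical temporary file bigfile.new instead of deleting bigfile first, and sorts numerically on the second field. It assumes the y-values are not negative, because the blank line that starts each piece counts as 0 and must sort first.

```shell
# Sample input from the original question: two blocks separated by a blank line.
cat > bigfile <<'EOF'
100.000 23.000 150.000
99.000 83.000 369.000
110.000 15.000 123.000

25.000 23.000 23.000
9.000 63.000 81.000
15.000 38.000 23.000
EOF

# Split at every blank line (GNU extension '{*}'); pieces are xss00, xss01, ...
# Every piece after the first begins with its blank separator line.
csplit -s -f xss bigfile '/^$/' '{*}'

# Sort each piece numerically on field 2 and collect the results in order.
for piece in xss*
do
    sort -k 2,2n "$piece"
done > bigfile.new &&
mv bigfile.new bigfile

rm -f xss*
cat bigfile
```

With the default two-digit suffix the pieces stay in order up to 100 blocks; for more, the -n option of csplit sets a wider suffix.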


