Sorting issues

November 9, 2009 at 15:40:27
Specs: KSH
Hi,
I have a file which has data as
path/aa1.txt
path/aa2.txt,...(each in separate lines)
I want a Unix script to sort this file in order. Running
"sort filename" does not give the order I want.

Thanks,



#1
November 9, 2009 at 20:21:04

If you are sorting the contents of filename, redirect the contents to sort this way:

sort < filename



#2
November 10, 2009 at 03:36:48
The question is not clear. How is the result incorrect? The sort command would be correct for most cases, but I'm guessing the thread starter needs something special.

What?



#3
November 10, 2009 at 08:19:16
My file has values like the ones below, and I want 101 to come after 100. I tried "<" but had no luck. (:

/home/dd/1/1.xml
/home/dd/1/10.xml
/home/dd/1/100.xml
/home/dd/1/1000.xml
/home/dd/1/101.xml




#4
November 10, 2009 at 14:56:17
The problem you are having is that you are mixing character and numeric data. Let me think about it.
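
A minimal sketch of why this happens, assuming the sample paths from post #3 (held here in a hypothetical `data` variable) and a byte-wise comparison forced with LC_ALL=C: sort compares the lines character by character, so "1000.xml" lands before "101.xml" because the third character "0" sorts before "1".

```shell
#!/bin/sh
# Sample data copied from post #3; the variable name is illustrative only.
data='/home/dd/1/1.xml
/home/dd/1/10.xml
/home/dd/1/100.xml
/home/dd/1/1000.xml
/home/dd/1/101.xml'

# A plain sort treats each line as text, not as a number.
# LC_ALL=C is an assumption made here so the comparison is byte by byte.
plain=$(printf '%s\n' "$data" | LC_ALL=C sort)
printf '%s\n' "$plain"
# The output keeps 1000.xml ahead of 101.xml, exactly as in the question.
```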





#5
November 11, 2009 at 03:48:38
Ow, indeed, the old problem of sorting mixed numeric and character data.

One way of solving this issue is to write "00001" instead of "1", and "00011" instead of "11".
The key here is to always have the same number of digits. What that number should be is for you to determine, but be sure all numbers will fit, or you will have the "nobody will ever use more than 640k of memory" issue. But I'm drifting away from the core of the issue here.
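
The zero-padding idea could be sketched like this. It is only an illustration: the `data` variable, the width of 5 digits, and the use of a POSIX awk (nawk on Solaris) are assumptions, not something from the original post. Once every number has the same width, a plain text sort gives numeric order.

```shell
#!/bin/sh
# Sample data from post #3; the variable name is illustrative only.
data='/home/dd/1/1.xml
/home/dd/1/10.xml
/home/dd/1/100.xml
/home/dd/1/1000.xml
/home/dd/1/101.xml'

# Rewrite the last path component as a 5-digit, zero-padded number.
# The width 5 is an assumed value; pick one that fits your largest number.
padded=$(printf '%s\n' "$data" | awk 'BEGIN { FS = OFS = "/" }
{
    n = $NF
    sub(/\.xml$/, "", n)          # drop the extension, leaving the number
    $NF = sprintf("%05d.xml", n)  # 1 -> 00001.xml, 101 -> 00101.xml
    print
}' | LC_ALL=C sort)
printf '%s\n' "$padded"
```

Note that this changes the names themselves, so it only helps if the files can actually be named that way.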

Alternatively, you could strip away the text so that only the numbers remain, and then sort again, because the sort command can do a "smart" sort based on numbers only. For example, it can sort into this order:

1
2
11
100
111
120
200
201

The -n option of sort does that. But the key it sorts on must start with the number!

For the stripping there are lots of ways; this will work:

sed "s/\/home\/dd\/1\///g"
sed "s/\.xml//g"
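
Put together, the strip-then-sort idea might look like the sketch below. It assumes every line shares the same /home/dd/1/ prefix and .xml suffix (true for the sample, not necessarily for real data), and the `data`, `numbers`, and `restored` names are made up for the example. The last step glues the prefix and suffix back on, which the two sed commands above do not do.

```shell
#!/bin/sh
# Sample data from post #3; the variable name is illustrative only.
data='/home/dd/1/1.xml
/home/dd/1/10.xml
/home/dd/1/100.xml
/home/dd/1/1000.xml
/home/dd/1/101.xml'

# Strip the fixed prefix and the extension, then sort the bare numbers.
# "~" is used as the sed delimiter to avoid escaping every slash.
numbers=$(printf '%s\n' "$data" |
    sed -e 's~^/home/dd/1/~~' -e 's~\.xml$~~' |
    sort -n)

# Re-attach the prefix and extension to get full paths back.
restored=$(printf '%s\n' "$numbers" |
    sed -e 's~^~/home/dd/1/~' -e 's~$~.xml~')
printf '%s\n' "$restored"
```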



#6
November 11, 2009 at 11:25:57
Thanks Nails & Tvc. I did as you suggested, using the "0000" logic. Cool.


#7
November 11, 2009 at 13:33:11
I did a variation of what tvc suggested. If you want to keep the original text, you need to do more than strip out the numbers. Here is what the following kludge does:

1) save a copy of the last field, $NF.
2) strip out the .xml extension, leaving a number.
3) print the original line with a new 2nd field which is numeric.
4) sort numerically by the second field.
5) discard the second field with the last awk script.

Yes, it is a kludge, but maybe somebody smarter than I can do something better:

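#!/bin/ksh

# Append the numeric part of the file name as a second field, sort on it,
# then print only the original path (assumes paths contain no whitespace).
nawk ' BEGIN { FS="/" }
{
   var=$NF
   sub(/\.xml$/, "", var)   # strip only a trailing ".xml"
   printf("%s  %d\n", $0, var)
} ' datafile.txt | sort -n -k 2,2 | nawk ' { print $1 } '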

I am using nawk because I am on Solaris.
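
For comparison, a shorter variant that skips the extra field and lets sort pick the key itself. This is a sketch under two assumptions: every path has the same depth, so the number always sits in the fifth slash-separated field (the first field is empty because of the leading slash), and the local sort supports the POSIX -t and -k options. The `data` variable is illustrative only.

```shell
#!/bin/sh
# Sample data from post #3; the variable name is illustrative only.
data='/home/dd/1/1.xml
/home/dd/1/10.xml
/home/dd/1/100.xml
/home/dd/1/1000.xml
/home/dd/1/101.xml'

# -t /     use "/" as the field separator
# -k 5,5n  sort numerically on field 5 only ("1.xml", "10.xml", ...);
#          a numeric key reads the leading digits and ignores ".xml"
sorted=$(printf '%s\n' "$data" | sort -t / -k 5,5n)
printf '%s\n' "$sorted"
```

If the paths vary in depth, the field number is no longer fixed and the awk version above is the safer choice.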



#8
April 5, 2010 at 04:05:06
How about this.

$ cat filename | sed 's~.*/\([0-9]*\)\.xml$~\1 &~' | sort -n | awk '{ print $2 }'
/home/dd/1/1.xml
/home/dd/1/10.xml
/home/dd/1/100.xml
/home/dd/1/101.xml
/home/dd/1/1000.xml
$



#9
April 5, 2010 at 07:17:24
virgil:

Yes, that works. Can you explain what you did - especially in the sed part? Thanks.



#10
April 7, 2010 at 01:44:21
I expressed the filename as a regular expression.

The file name consists of 3 parts:
- any number of characters until a slash .*/
- followed by a series of digits [0-9]*
- followed by a ".xml".

The part of interest is the digits, so put them in escaped parentheses:
\([0-9]*\). We use this as the sorting key for sort, and sed captures it in the back-reference \1.

Finally we put the sort key "\1" in front of the entire matched filename "&", and run it through sort and awk.

Once the lines have been sorted on their keys by sort -n, use awk to remove the keys and keep only the filename portion.
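
To make the intermediate step visible, the pipeline can be split in two, as in this sketch. The `data`, `keyed`, and `sorted` variable names are made up for the illustration; the sed and awk commands are the ones from post #8. As there, it assumes the paths contain no spaces, since awk takes the second whitespace-separated field.

```shell
#!/bin/sh
# Sample data from post #3; the variable name is illustrative only.
data='/home/dd/1/1.xml
/home/dd/1/10.xml
/home/dd/1/100.xml
/home/dd/1/1000.xml
/home/dd/1/101.xml'

# Step 1: sed prefixes each line with its numeric key.
#   \1 is the captured run of digits, & is the whole matched line.
keyed=$(printf '%s\n' "$data" | sed 's~.*/\([0-9]*\)\.xml$~\1 &~')
printf '%s\n' "$keyed"
# The first line is now: 1 /home/dd/1/1.xml

# Step 2: sort numerically on the leading key, then drop the key with awk.
sorted=$(printf '%s\n' "$keyed" | sort -n | awk '{ print $2 }')
printf '%s\n' "$sorted"
```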


