I have a file with essentially two fields. I want to extract only the first occurrence of each distinct entry in field 1. For example, from the file
name1 info1
name2 info2
name1 info3
name2 info4
I would want
name1 info1
name2 info2
I've tried things like
$ sort -u -k 1,1 filename
but there seems to be no consistency in which of the duplicate records the -u flag keeps.
Many thanks to anyone who can help.

In the previous reply, where did that $1 come from? In a shell it expands to the first positional parameter, and if it is non-null, sort would try to read it as an additional input file, which is certainly not what you want.
There is an implied final sort key of the entire line, so your results, although maybe not what you want, should at least be consistent and predictable. Do the same sort without the -u and maybe you can see how it is sorting.
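The check suggested above (the same sort with and without -u) can be run on the sample data from the question as follows. This is a sketch assuming a POSIX shell; LC_ALL=C is set only so that the ordering does not depend on the locale, and the file name sample.txt is hypothetical. Without -u, lines that tie on field 1 come out in whole-line order (the implied last key); with -u, only one line per field-1 key remains.

```shell
#!/bin/sh
# Recreate the sample input from the question (sample.txt is a hypothetical name).
printf '%s\n' 'name1 info1' 'name2 info2' 'name1 info3' 'name2 info4' > sample.txt

# Without -u: lines with equal field-1 keys are ordered by the
# last-resort comparison of the whole line.
echo '--- sort -k1,1'
LC_ALL=C sort -k1,1 sample.txt

# With -u: only one line per distinct field-1 key is printed.
echo '--- sort -u -k1,1'
LC_ALL=C sort -u -k1,1 sample.txt
```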
If you want to pick up the lowest "info" for each field1, of course you could not do:
sort -u -k1,1 -k2,2 filename
because unique suppression would then be based on both k1 and k2. But you can control the sort with multiple keys, then suppress based on fewer keys with:
sort -k1,1 -k2,2 filename | sort -mu -k1,1
But if you do not want the LOWEST "info" but rather the first chronological "info" for each field1, then you have a tougher problem. Once you sort, you lose chronology. awk easily addresses chronology with NR (record number). If sort had a special designator like NR, then you could write something like this (hypothetical syntax):
sort -u -k1,1 -kNR filename
but I can find no chronology control in the sort command.
A messy solution is to use awk to add a chronology field (NR) to your file, then sort, then use awk or cut to remove the extra field.
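The decorate/sort/undecorate idea just described can be sketched as one pipeline: awk prefixes each line with its record number NR, sort orders by the original field 1 and then numerically by NR so the earliest line of each group comes first, the merge-unique trick from earlier keeps that first line, and cut strips the prefix again. This is a sketch under stated assumptions: input fields separated by single spaces, and a sort (such as GNU sort) whose -u keeps the first line of each run of equal keys. The function name first_by_key_sorted and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first (chronological) line for each value
# of field 1 in file $1, with the output ordered by field 1.
# Assumes -u keeps the first line of a run of equal keys (as GNU sort does).
first_by_key_sorted() {
    # 1. Decorate: prefix every line with its record number (NR).
    awk '{ print NR, $0 }' "$1" |
    # 2. Sort by the original field 1 (now field 2), then numerically by NR,
    #    so the earliest line of each group comes first. LC_ALL=C fixes the
    #    collation order so it does not depend on the locale.
    LC_ALL=C sort -k2,2 -k1,1n |
    # 3. Keep only the first line of each run of equal field-2 keys.
    LC_ALL=C sort -mu -k2,2 |
    # 4. Undecorate: drop the NR prefix and the space after it.
    cut -d' ' -f2-
}

printf '%s\n' 'name1 info1' 'name2 info2' 'name1 info3' 'name2 info4' > sample.txt
first_by_key_sorted sample.txt
```

Unlike the lowest-"info" pipeline, this keeps whichever line appeared first for each key, even when a later line has a smaller second field.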
Following is a simple solution that uses awk instead of sort:
awk '{if (k[$1]!=1) {print;k[$1]=1}}' fname
awk prints a line only the first time it sees that particular field1 value, and sets a flag in an array to remember it. Since nothing is sorted, the output stays in chronological order; if you want it sorted by field1, just pipe the awk output into sort. Some notes about the awk solution:
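- The number of field1 values the array can hold is not fixed: awk arrays grow as needed, so in common implementations the practical limit is available memory.
- If your target field is actually field3 instead of field1, change both occurrences of $1 to $3.
- By default awk treats any run of white space as a single field delimiter. sort without -t instead keeps leading blanks as part of each field, so a different number of blanks can change how keys compare unless you add -b; with -t ' ', every single blank is a delimiter and a run of blanks produces empty fields. awk can use a non-default field delimiter with the -F option.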
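The same first-occurrence filter is often written in the shorter form below: seen[$f]++ evaluates to 0 (false) the first time a key appears, so !seen[$f]++ is true and the line is printed only then. The sketch also passes the field number in with -v instead of editing $1 to $3, pipes into sort when output ordered by field 1 is wanted, and uses -F for a comma-separated file. The function name first_by_field, the variable f, and the sample file names are illustrative assumptions, not part of the original post; only standard awk features are used.

```shell
#!/bin/sh
# Hypothetical helper: print the first line seen for each distinct value of
# field number $2 (default 1) in file $1, keeping chronological order.
first_by_field() {
    awk -v f="${2:-1}" '!seen[$f]++' "$1"
}

printf '%s\n' 'name1 info1' 'name2 info2' 'name1 info3' 'name2 info4' > sample.txt

# Chronological order, keyed on field 1 (same result as the if/flag version).
first_by_field sample.txt

# Same filter, with the output then sorted by field 1.
first_by_field sample.txt | LC_ALL=C sort -k1,1

# Non-default separator: keyed on field 2 of a comma-separated file.
printf '%s\n' 'a,x,1' 'b,x,2' 'c,y,3' > sample.csv
awk -F',' '!seen[$2]++' sample.csv
```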
James

James,
Many thanks - the awk solution does the job nicely. And it has a certain elegance too.
Cheers,
PP
