I don't know whether Perl is the right tool for this or not. It's just that I have nearly 30 years of DOS or DOS-like programming experience and more than 30 years of BASIC experience.
This is what I would do.
1 - Sort the text file on the required position (column) of the string, using the DOS SORT command or a freeware sort utility.
2 - Use a BASIC program to read through the sorted file and save only one occurrence of each record.
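Steps 1 and 2 can also be sketched in Python for anyone not working under DOS. This is a minimal sketch and not a reimplementation of DOS SORT: it assumes the sort key is everything from a 1-based column onward (roughly what SORT /+n does) and that two records count as duplicates when their keys match. The names `sort_unique` and `start_col` are hypothetical.

```python
def sort_unique(lines, start_col=1):
    """Sort lines on the text from 1-based column start_col onward,
    then keep only the first line for each distinct key."""
    def key(line):
        return line[start_col - 1:]

    result = []
    last_key = None
    # sorted() is stable, so among equal keys the earliest input line wins.
    for line in sorted(lines, key=key):
        k = key(line)
        # After sorting, duplicates are adjacent: keep a line only when its key changes.
        if k != last_key:
            result.append(line)
            last_key = k
    return result


if __name__ == "__main__":
    records = ["b 2", "a 1", "b 2", "c 3", "a 1"]
    print(sort_unique(records))
```

The idea is the same as in the BASIC approach: once the file is sorted, each record only has to be compared with the one before it.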
The following example is the BASIC program I used to solve the problem in Response 3. Your problem is similar, except that the fields should be separated before the first sort. And since you need only one of each record, the count does not have to be saved and the last sort can be omitted.
rem ranklist.bas
rem input  = TEXT.LST (one number per line)
rem output = COUNT.SRT (value and count, most frequent first)
rem originally written in PowerBASIC v3.1 for DOS
rem should be compatible with QBASIC
rem tested under W98SE
t1$ = time$
rem sort the list so that equal values are adjacent
shell "SORT TEXT.LST > TEXT.SRT"
rem process the sorted list
open "TEXT.SRT" for input as #1
open "COUNT.LST" for output as #2
rem cnt = occurrences of the current value
rem lcnt = total lines read, ncnt = distinct values written
cnt = 0
lcnt = 0
ncnt = 0
lastnum = 0
nextline:
if eof(1) then
  goto endlist
end if
line input #1, xnum$
znum = val(xnum$)
lcnt = lcnt + 1
rem a new value ends the previous group, so write that group out first
if (lcnt > 1) and (znum <> lastnum) then
  gosub makestr
end if
cnt = cnt + 1
lastnum = znum
goto nextline
endlist:
close #1
rem write the last group (skipped if the input file was empty)
if lcnt > 0 then
  gosub makestr
end if
close #2
t2$ = time$
rem sort by most occurrences (the count field starts in column 8)
shell "SORT /R /+8 COUNT.LST > COUNT.SRT"
print lcnt; "numbers distributed among"; ncnt; "distinct values"
print "Results in file COUNT.SRT"
print "Start: "; t1$; "  End: "; t2$
end
makestr:
rem pad on the left to right-align each value in a 7-character field
ynum$ = right$(space$(7) + str$(lastnum), 7)
ycnt$ = right$(space$(7) + str$(cnt), 7)
print #2, ynum$; ycnt$
cnt = 0
ncnt = ncnt + 1
return
Note: Some lines were wrapped by the editor.
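For comparison, the ranking program above can be sketched in Python. This sketch assumes each input line holds a single number (unlike VAL, float() raises an error on non-numeric text) and sorts the values numerically rather than as text. The names `rank_list` and `format_row` are hypothetical; `format_row` mimics the 7-character right-aligned fields built in makestr.

```python
from itertools import groupby


def rank_list(lines):
    """Return (value, count) pairs, most frequent value first."""
    # Sort so that equal values are adjacent (the role of the first SORT).
    values = sorted(float(s) for s in lines if s.strip())
    # Count each run of equal values (the role of the BASIC read loop).
    counts = [(value, sum(1 for _ in run)) for value, run in groupby(values)]
    # Sort on the count only, highest first (the role of SORT /R /+8).
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return counts


def format_row(value, count):
    """Right-align value and count in 7-character fields."""
    return f"{value:>7g}{count:>7d}"


if __name__ == "__main__":
    data = ["3", "1", "3", "2", "3", "1"]
    for value, count in rank_list(data):
        print(format_row(value, count))
```

Because Python's sort stays stable even with reverse=True, values that share the same count remain in ascending numeric order.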