Perl script

Original Message
Name: kgbolger
Date: July 29, 2005 at 07:27:22 Pacific
Subject: Perl script
OS: XP Pro SP2
CPU/Ram: Centrino 2 GHz, 1 GB RAM
Comment:

Hey,

I'm trying to sort info from some log files I'm working with. The files are .txt and the info on each line is separated by inverted commas.

I basically need to parse through the info and, if a line has appeared before, remove it and bring the next line up. I want to end up with all different lines, so each entry only appears once.

Files can contain 400,000+ lines with up to 80% of the lines repeated.

I'm using Perl 5 but I'm not sure of the compiler.
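
For illustration, here is a minimal sketch of the requirement described above: read the file once, remember every line already seen, and write each line only the first time it appears. It is written in Python rather than Perl; the same idea maps onto a Perl hash keyed by the line. The function names, the file paths and the UTF-8 encoding are assumptions, not details from the original post.

```python
from typing import Iterable, Iterator


def unique_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in order of first appearance."""
    seen = set()
    for line in lines:
        key = line.rstrip("\r\n")  # compare without the line terminator
        if key not in seen:
            seen.add(key)
            yield key


def dedupe_file(src_path: str, dst_path: str, encoding: str = "utf-8") -> int:
    """Copy src to dst, dropping repeated lines; return the number of lines written.

    The encoding default is an assumption; adjust it to match the real logs.
    """
    written = 0
    with open(src_path, "r", encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for line in unique_lines(src):
            dst.write(line + "\n")
            written += 1
    return written
```

Because only the distinct lines are held in memory, a file where most lines are repeats needs far less memory than its size suggests. This approach also keeps the original line order, which sorting would not.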




Response Number 1
Name: wizard-fred
Date: July 29, 2005 at 08:41:42 Pacific
Reply:

This is not a Perl solution.
1 - Sort the file (This assumes that the identical lines are completely identical. As they are log files, I think they may differ by date and time.)
2 - Remove duplicate lines
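
The two steps above can be sketched in a few lines of Python (used here only for illustration; the function name is hypothetical). Sorting brings identical lines next to each other, so keeping one line from each run of equal neighbours removes the duplicates. As noted in step 1, this only works when the repeated lines are identical character for character.

```python
from itertools import groupby


def sort_and_dedupe(lines):
    """Sort the lines, then keep one line from each run of identical neighbours."""
    return [line for line, _run in groupby(sorted(lines))]


if __name__ == "__main__":
    print(sort_and_dedupe(["b", "a", "b", "c", "a"]))
```

Unlike a single pass with a lookup table, this loses the original order of the log, which may or may not matter for the task.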



Response Number 2
Name: kgbolger
Date: August 1, 2005 at 14:44:40 Pacific
Reply:

Hey, thanks for the reply,

I was going to use this solution by pasting the contents of the logs into Excel and going through the steps mentioned above.

The problem here is that Excel will only allow about 65,500 lines.

I desperately need help getting started.

thanks

Kev



Response Number 3
Name: wizard-fred
Date: August 1, 2005 at 18:24:44 Pacific
Reply:

There is another method using a database. Import the text with an index field set as unique. Only one of each item will be imported. This again assumes that the lines to be eliminated are true duplicates.
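
As a rough sketch of the database method, the Python example below uses the standard `sqlite3` module with an in-memory database. SQLite, the table name `log_lines` and the function name are assumptions chosen for illustration; the post does not name a database product. The `UNIQUE` constraint plays the role of the unique index, and `INSERT OR IGNORE` silently skips any line that is already stored.

```python
import sqlite3


def unique_via_database(lines):
    """Import lines into a table with a unique column; return what was kept."""
    conn = sqlite3.connect(":memory:")  # hypothetical choice: in-memory SQLite
    conn.execute("CREATE TABLE log_lines (line TEXT UNIQUE)")
    # Rows that would violate the UNIQUE constraint are skipped, not reported.
    conn.executemany(
        "INSERT OR IGNORE INTO log_lines (line) VALUES (?)",
        ((line,) for line in lines),
    )
    rows = conn.execute("SELECT line FROM log_lines ORDER BY rowid").fetchall()
    conn.close()
    return [row[0] for row in rows]
```

As with the other methods, only true duplicates are dropped: two lines that differ in a single character, such as a timestamp, are both kept.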

I don't run XP, only W98 and DOS. I would use DOS to sort and a BASIC program to eliminate the duplicates. In a similar problem it took 2min 38sec to process a file of 1,000,000 records of 1,000 different numbers and count and rank their occurrences in a W98 DOS Prompt window on a 266 MHz laptop.
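
The counting and ranking task mentioned above can be sketched in Python as follows. This is not the BASIC program referred to in the post, just an illustration of the same idea; the function name is hypothetical.

```python
from collections import Counter


def rank_occurrences(values):
    """Return (value, count) pairs, most frequent first."""
    return Counter(values).most_common()


if __name__ == "__main__":
    for value, count in rank_occurrences(["7", "3", "7", "7", "3", "9"]):
        print(value, count)
```

No sort is needed before counting here, because the counter looks each value up directly instead of relying on equal values being adjacent.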



Response Number 4
Name: kgbolger
Date: August 2, 2005 at 00:24:08 Pacific
Reply:

Hey Fred, thanks again,

How would I go about using DOS or a BASIC program to solve this problem?

Is Perl completely the wrong way to go?

I'm an extreme novice at all this, so any help would be brilliant.

thanks

Kev



Response Number 5
Name: wizard-fred
Date: August 2, 2005 at 00:54:25 Pacific
Reply:

I don't know whether Perl is right or wrong. It's just that I have nearly 30 years of DOS or DOS-like program experience and more than 30 years of BASIC experience.

This is what I would do.
1 - sort the text file using a DOS or freeware utility on the required position of the string.
2 - use a BASIC program to read through the file and save only one occurrence of each record.

The following example is the BASIC program I used to solve the problem in Response 3. Your problem is similar except the fields should be separated before the first sort. And since you only need one of each, the count does not have to be saved, and the last sort is omitted.
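
Separating the fields first matters when lines should count as duplicates even though one field, such as a timestamp, differs. The Python sketch below illustrates that idea under loud assumptions: each field is wrapped in double quotes and fields are separated by commas, and the sample lines and column numbers in the usage example are invented. The real log layout is not shown in this thread, so the parsing would need adjusting.

```python
import csv


def unique_by_fields(lines, key_columns):
    """Keep the first line for each distinct combination of the chosen fields.

    key_columns holds zero-based field positions (hypothetical layout).
    """
    seen = set()
    kept = []
    for line in lines:
        rows = list(csv.reader([line]))  # split one quoted, comma-separated line
        fields = rows[0] if rows else []
        key = tuple(fields[i] for i in key_columns if i < len(fields))
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


if __name__ == "__main__":
    sample = [  # invented sample data, not taken from the real logs
        '"2005-07-29 07:27","login","kev"',
        '"2005-07-29 07:28","login","kev"',
        '"2005-07-29 07:29","logout","kev"',
    ]
    # Compare on fields 1 and 2 only, ignoring the timestamp in field 0.
    for line in unique_by_fields(sample, [1, 2]):
        print(line)
```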


rem ranklist.bas
rem input = TEXT.LST
rem output = COUNT.SRT

rem original written in PowerBasic v3.1 for DOS
rem should be compatible with QBASIC
rem tested under W98SE

t1$ = time$

rem sort list so that identical values are adjacent
shell "SORT TEXT.LST > TEXT.SRT"

rem process list
open "TEXT.SRT" for input as #1
open "COUNT.LST" for output as #2
rem cnt = lines in current run, lcnt = total lines, ncnt = distinct values
cnt = 0
lcnt = 0
ncnt = 0
lastnum = -999

nextline:
if eof(1) then
    goto endlist
end if
line input #1, xnum$
znum = val(xnum$)
lcnt = lcnt + 1
rem value changed: write out the run that just ended
if (znum <> lastnum) and (lastnum <> -999) then
    gosub makestr
end if
cnt = cnt + 1
lastnum = znum
goto nextline

endlist:
close #1
rem write out the final run, unless the file was empty
if lcnt > 0 then
    gosub makestr
end if
close #2
t2$ = time$

rem sort by most occurrences (the count starts in column 8)
shell "SORT /R /+8 COUNT.LST > COUNT.SRT"
print lcnt; "numbers distributed among"; ncnt; " numbers"
print "Results in file COUNT.SRT"
print t2$, t1$
end

makestr:
rem left space filled to right align values in 7-character columns
ynum$ = right$(space$(7) + str$(lastnum), 7)
ycnt$ = right$(space$(7) + str$(cnt), 7)
print #2, ynum$; ycnt$
cnt = 0
ncnt = ncnt + 1
return

Note: Some lines were wrapped by the editor.

