List traverse, batch (NT)
|
Original Message
|
Name: Shr0Om
Date: June 22, 2005 at 01:14:42 Pacific
Subject: List traverse, batch (NT)
OS: Win XP Pro SP2
CPU/Ram: AMD 64
Comment: Hi. I hope some batch gurus can help me with this. What I want to do: I have a list.txt with about 100-4000 numbers. In some cases there is redundancy, so one or more numbers may occur several times in the list. I'm trying to write a batch file that prints out which numbers occur several times. So far I have started by giving every number on the list a line number (findstr /n /g:list.txt list.txt >>listWnumbers.txt). My plan was to write a FOR loop to do the rest. I use SET /P var=<list.txt to get the first number. Then I remove the first line from list.txt, search the rest of the list, and echo %var% exists several times in list>>rapport.txt. I hoped it would repeat this until it had traversed the entire list, but that didn't quite work out. I need to find a way to do this, preferably a somewhat efficient one (if it takes at most 2-3 minutes for 4000 lines, that's acceptable). I hope what I'm asking for is possible.
Response Number 1
|
Name: Mechanix2Go
Date: June 22, 2005 at 03:33:01 Pacific
|
Reply: I did something similar several years ago to get rid of duplicate lines. First I sorted:
sort < list.txt > list.sor
Then the BAT looked at the first line and set a var, looked at the second line and set a var, and compared the two. In my case, if they were the same, one got dumped. In your case, you'd probably want to:
echo %var1%>> report.log
HTH
M2
If at first you don't succeed, you're about average.
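A minimal batch sketch of the sort-then-compare idea described above; it is not Mechanix2Go's original BAT. It assumes list.txt holds one number per line and contains no "!" characters, and the variable names prev and last are made up for illustration.

```batch
@echo off
setlocal enabledelayedexpansion
rem Assumed names: list.txt (input), list.sor (sorted copy), report.log (output).
sort < list.txt > list.sor
if exist report.log del report.log
set "prev="
set "last="
for /f "usebackq delims=" %%a in ("list.sor") do (
    rem A line equal to the previous one is a duplicate; "last" remembers the
    rem value most recently reported so each duplicate is written only once.
    if "%%a"=="!prev!" if not "%%a"=="!last!" (
        >>report.log echo %%a
        set "last=%%a"
    )
    set "prev=%%a"
)
endlocal
```

Because the file is sorted first, equal numbers always sit next to each other, so a single pass over the list is enough.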
Response Number 2
|
Name: Shr0Om
Date: June 22, 2005 at 05:27:57 Pacific
|
Reply: OK, you pointed me in the right direction :) I didn't think of sort ;) Still, there is a problem. When I try to delete line 1 it also deletes line 11, and so on. So it works only up to a certain point. I wonder if you or someone else can give me a tip here. Also, I know this script is highly inefficient, especially with a list of 4000 lines. So if someone has a better idea, please speak out :)
Here's my script so far:
------------------------
@echo off
sort < Test.csv > Sorted.csv
set count=1
::Make a new list with line numbers in front
findstr /n /g:Sorted.csv Sorted.csv >> SortedWnr.csv

:start
::Retrieve number of line x
set /p var1=<SortedWnr.csv
set var1=%var1:~2,8%

::Remove line on top of list
findstr /v /l "%count%:" SortedWnr.csv >>tmp.txt
del SortedWnr.csv
ren tmp.txt SortedWnr.csv
set /a count=count+1

::Retrieve number of line x
set /p var2=<SortedWnr.csv
set var2=%var2:~2,8%

::Compare line x&y. If equal echo the number to file
if %var1% equ %var2% echo %var1% >>report.txt
::Don't know how the prog should figure out how many lines there are yet
::I guess if I can sort the list so it's backwards, then retrieve
::the last line number, and set it here below
if %count% equ 100 goto exit
goto start

:exit
exit
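On the "deleting line 1 also deletes line 11" problem: findstr /v /l "1:" drops every line that contains 1: anywhere, which includes 11:, 21: and so on. The sketch below shows two possible adjustments, assuming the SortedWnr.csv layout produced by the script above: anchoring the match at the start of the line, and counting the lines instead of hard-coding 100. The variable name total is made up for illustration. Note too that %var1:~2,8% assumes a one-digit line number, so it also goes wrong from line 10 onward.

```batch
@echo off
setlocal
rem Assumes SortedWnr.csv exists with lines such as 1:101577 (from findstr /n).
set count=1
rem /b anchors the match at the start of the line and /c: gives the literal
rem search text, so "1:" no longer removes lines 11:, 21:, 31: and so on.
findstr /v /b /c:"%count%:" SortedWnr.csv > tmp.txt
rem find /c /v "" counts every line, which gives the loop its end value.
for /f %%n in ('find /c /v "" ^< SortedWnr.csv') do set total=%%n
echo %total% lines in SortedWnr.csv
endlocal
```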
Response Number 3
|
Name: Mechanix2Go
Date: June 22, 2005 at 05:51:27 Pacific
|
Reply: Hi,
This is an interesting and perhaps useful exercise, but if you just want to get rid of duplicate lines, get this nifty filter: UNIQUE
Then:
sort < test1 | unique > trst1.unq
***
Since it's a CSV you could use Excel. But that seems like a hard way to do an easy thing.
***
If you want to make a REPORT of duplicate lines, as originally stated, I guess it's back to the BAT.
HTH
M2
If at first you don't succeed, you're about average.
Response Number 4
|
Name: Shr0Om
Date: June 23, 2005 at 00:19:04 Pacific
|
Reply: Hi, I tested the unique filter with this command:
sort < test1 | unique > trst1.unq
but that didn't work. It gives an error saying the process cannot complete because the file is already in use, which it isn't. Is something wrong with the command? Besides that, removing duplicates instead of just reporting them is fine. And yep, it's possible to remove duplicates in Excel by making a pivot and s---, but as you said, that's a hard way to do an easy thing :) Later I also plan to expand the BAT so it can remove the thousands separators (,) and some other stuff. This is because our Oracle DB is VERY picky, and it's a lot of work if we receive a file that doesn't follow the standard. I hope you can give me another tip, I would appreciate it a lot ;)
Response Number 5
|
Name: Mechanix2Go
Date: June 23, 2005 at 00:26:38 Pacific
|
Reply: Hi, I have no idea why you're getting the message about the file in use. Better check ALL other tasks. Let me know.
M2
If at first you don't succeed, you're about average.
Response Number 6
|
Name: Shr0Om
Date: June 23, 2005 at 00:45:08 Pacific
|
Reply: I tried creating a new list, same result. The file isn't open anywhere, but I get the error that it's in use. For some reason, the output file contained this:
C:\Documents and Settings\aandersen\Desktop>sort 0<test1.txt | unique 1>trst1.txt
Response Number 8
|
Name: wizard-fred
Date: June 23, 2005 at 01:05:31 Pacific
|
Reply: I think trying to do it all in a batch file is a bit difficult without using a scripting language or a program.
Given: a list of numbers, one per line.
Find: the occurrences of the numbers.
1. Sort the list of numbers.
2. Count the occurrences of the numbers in a second list.
3. Optional: sort the second list by number of occurrences.
My preference would be to use a small BASIC program calling the SORT utility. An example will follow in a subsequent post.
Response Number 9
|
Name: Shr0Om
Date: June 23, 2005 at 01:12:51 Pacific
|
Reply: Here's a copy of the error message:
"The process cannot access the file because it is being used by another process."
M2: The file trst1.txt contained this:
C:\Documents and Settings\aandersen\Desktop>sort 0<test1.txt | unique 1>trst1.txt
Wizard-fred: Yes, I know it would be better with something other than batch, but I just want to try batch, as I'm trying to avoid the hassle of learning new languages at the moment. Still, I'm interested in having a look at your idea.
Response Number 10
|
Name: Mechanix2Go
Date: June 23, 2005 at 01:24:05 Pacific
|
Reply: "The file trst1.txt contained this:
C:\Documents and Settings\aandersen\Desktop>sort 0<test1.txt | unique 1>trst1.txt"
I have no idea how you got this mess. Can you say what directory you are in and what you typed?
M2
If at first you don't succeed, you're about average.
Response Number 11
|
Name: Shr0Om
Date: June 23, 2005 at 02:47:35 Pacific
|
Reply: I just pasted "sort < test1 | unique > trst1.unq" into Notepad and saved it as a .cmd file. I changed the file names, of course. I also tried typing it manually at the CMD prompt. I get the same error either way. All the files are located on my desktop. None of the files are open or in use, so I don't understand why it gives the error. I'm going to try this on my computer at home later today. Hmm, maybe I should consider brushing up my crappy Java skills. At least list sorting worked great in Java ;)
Response Number 12
|
Name: Mechanix2Go
Date: June 23, 2005 at 03:03:38 Pacific
|
Reply: I would go to a CMD prompt, make a new directory, change to that directory, put the file to be worked on there, put unique.com there and type:
sort < infile | unique > outfile
***
And when you get to the point of creating a BATCH, you save it as FILENAME.BAT
M2
If at first you don't succeed, you're about average.
Response Number 13
|
Name: Shr0Om
Date: June 23, 2005 at 03:34:00 Pacific
|
Reply: Waddaya know, it worked now. Weird indeed. What's wrong with my desktop? Hehe. Anyway, it didn't quite remove the duplicates. I assume this is because it looks at the whole line, not just the first number before the semicolon (101577;STK;456,21;0;0). In the batch I used parsing to get this number. I guess that can't be done with the unique filter. Well, while doing this, I realised I can't have it remove the duplicates. These are price lists, and if there are duplicates, I need to check which price is the correct one! So I guess it's back to the batch :/
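A hedged sketch of how a batch file could compare only the first field and still report the whole lines, so the conflicting prices can be checked by hand. It assumes semicolon-separated lines like 101577;STK;456,21;0;0 in a file called List.txt, with no "!" characters in the data; the variable names prevkey, prevline and open are invented for this example.

```batch
@echo off
setlocal enabledelayedexpansion
rem Assumed names: List.txt (input), sorted.txt (work file), report.txt (output).
sort < List.txt > sorted.txt
if exist report.txt del report.txt
set "prevkey="
set "prevline="
set "open="
rem The outer loop keeps the whole line in %%L, the inner loop cuts the key
rem (everything before the first semicolon) into %%k.
for /f "usebackq delims=" %%L in ("sorted.txt") do (
    for /f "tokens=1 delims=;" %%k in ("%%L") do (
        if "%%k"=="!prevkey!" (
            rem Same key as the line before: write the earlier line once,
            rem then every further line that shares the key.
            if not defined open >>report.txt echo !prevline!
            >>report.txt echo %%L
            set "open=1"
        ) else (
            set "open="
        )
        set "prevkey=%%k"
        set "prevline=%%L"
    )
)
endlocal
```

With this layout report.txt ends up holding every line whose product number appears more than once, grouped together, and nothing is deleted from the original list.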
Response Number 14
|
Name: Mechanix2Go
Date: June 23, 2005 at 03:40:41 Pacific
|
Reply: If I remember right, without reading back through all this, you were adding line numbers. Yes, it DOES look at the whole line. How else would it know if there were duplicates?
M2
If at first you don't succeed, you're about average.
Response Number 15
|
Name: Shr0Om
Date: June 23, 2005 at 04:49:57 Pacific
|
Reply: This was a list without line numbers. But yes, I just forgot that I only have to find duplicates in the first, eh... row? The part before the semicolon, hence the parsing. Sorry ;)
Response Number 16
|
Name: wizard-fred
Date: June 23, 2005 at 11:37:40 Pacific
|
Reply: If the lines consist of more than just the numbers, you will need an initial pass to extract the number field, or a sort routine that can sort by fields. The sample program results are for a list of 100,000 numbers in 100 groups, run from the DOS prompt in Win98SE.
REM --- Begin -----
REM ranklist.bas
REM input = TEXT.LST
REM output = COUNT.SRT
REM 15 sec 233mhz pii qbasic
REM 4 sec powerbasic
REM original written in PowerBasic v3.1 for DOS
REM tested with QBASIC under W98SE
t1$ = TIME$
REM sort list
SHELL "SORT TEXT.LST > TEXT.SRT"
REM process list
OPEN "TEXT.SRT" FOR INPUT AS #1
OPEN "COUNT.LST" FOR OUTPUT AS #2
cnt = 0
lcnt = 0
ncnt = 0
lastnum = -999
nextline:
IF EOF(1) THEN
  GOTO endlist
END IF
LINE INPUT #1, xnum$
znum = VAL(xnum$)
lcnt = lcnt + 1
IF (znum <> lastnum) THEN
  IF (lastnum = -999) THEN
    lastnum = znum
  ELSE
    GOSUB makestr
  END IF
END IF
REM count the current line only after the previous group has been written,
REM otherwise every group except the last is reported one too high
cnt = cnt + 1
lastnum = znum
GOTO nextline
endlist:
CLOSE #1
GOSUB makestr
CLOSE #2
t2$ = TIME$
REM sort by most occurrences
SHELL "SORT /R /+8 COUNT.LST > COUNT.SRT"
PRINT lcnt; "numbers distributed among"; ncnt; " numbers"
PRINT "Results in file COUNT.SRT"
PRINT t2$, t1$
END
makestr:
REM left space filled to right align values in two 7-character columns
ynum$ = RIGHT$(SPACE$(7) + STR$(lastnum), 7)
ycnt$ = RIGHT$(SPACE$(7) + STR$(cnt), 7)
PRINT #2, ynum$; ycnt$
cnt = 0
ncnt = ncnt + 1
RETURN
REM --- End -----
Response Number 17
|
Name: Shr0Om
Date: June 24, 2005 at 01:11:26 Pacific
|
Reply: Hi wizard-fred, don't you need QBASIC installed to make this run? (I don't remember if QBASIC was standard in Win98 SE.) Still, the problem is that all our systems run Win XP. Is there any way of making this script run on an XP system?
Response Number 19
|
Name: Shr0Om
Date: June 24, 2005 at 02:45:51 Pacific
|
Reply: Hey again. I have hopefully found a much more efficient way of finding duplicates in the list. No need for substring parsing anymore, which wouldn't work anyway, since some numbers in the list are longer than others. But since I'm working with a CSV file, it contains ";"-separated columns, so a FOR loop with "eol=; tokens=1,1 delims=;" extracted the first column, which is the interesting one. It's only in this column that I'm going to look for duplicates. Now, I'm not used to working with FOR loops, so I don't know if what I want to do is possible. I'll paste the code here, with a comment where the problem lies.
---
@echo off
set count=0
if exist tmp.txt del tmp.txt
::Get interesting data only
echo Collecting data from "Product number"
echo.
FOR /F "eol=; tokens=1,1 delims=;" %%i in (List.txt) do @echo %%i %%j >>tmp.txt
::Sort list
echo Sorting list
echo.
sort <tmp.txt> sorted.txt
del tmp.txt
::Find duplicates
echo Looking for duplicates
echo.
FOR %%i in (sorted.txt) do call :FindDuplicatesX
:FindDuplicatesX
if count equ 1 goto FindDuplicatesY
set /p varX=<sorted.txt
set count=1
******************
OK, here is my problem. I want to go back to the FOR body here, and NOT continue with FindDuplicatesY until the FOR loop is on to the next line. Is it possible to call the FOR body again and then make it go to FindDuplicatesY?
******************
:FindDuplicatesY
set /p varY=<sorted.txt
set count=0
if %varX% equ %varY% echo %varX% >>report.txt
---
I don't know if what I want is possible, but I sure hope so, as this code turned out to be very efficient!
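On returning to the FOR body: a subroutine reached with call :label goes back to the loop when it hits goto :eof, and a variable can carry the previous line from one iteration to the next. Three details in the script above get in the way: plain FOR without /F loops over the file name rather than the lines, set /p var=<file always reads the first line only, and if count equ 1 compares the literal word count because the percent signs are missing. The sketch below is one possible arrangement, assuming sorted.txt already holds one product number per line; the label :compare and the variable prev are made up for illustration.

```batch
@echo off
setlocal
rem Assumes sorted.txt holds one product number per line, already sorted.
set "prev="
rem FOR /F reads the file line by line; plain FOR would only see the file name.
for /f "usebackq delims=" %%i in ("sorted.txt") do call :compare "%%i"
echo Done.
endlocal
rem Skip the subroutine below so the script does not fall into it at the end.
goto :eof

:compare
rem %~1 is the current line, prev is the line from the previous iteration.
if "%~1"=="%prev%" >>report.txt echo %~1
set "prev=%~1"
rem goto :eof returns to the FOR loop, which then moves on to the next line.
goto :eof
```

A number that occurs three times is written to report.txt twice with this version, once for each repeat.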
Response Number 20
|
Name: Shr0Om
Date: June 24, 2005 at 04:47:11 Pacific
|
Reply: OK, I have almost found the solution (I think). I make two identical sorted lists, delete the first row in the second list, and use a nested(?) FOR loop. Like this:
@echo off
if exist tmp.txt del tmp.txt
::Get interesting data only
echo Collecting data from "Product number"
echo.
FOR /F "eol=; tokens=1,1 delims=;" %%i in (List.txt) do @echo %%i %%j >>tmp.txt
echo Data collected
echo.
::Sort list
echo Sorting list
echo.
sort <tmp.txt> sorted.txt
del tmp.txt
copy sorted.txt sorted2.txt
echo Please remove 1'st row in Sorted2.txt
::Just a temporary solution
pause
::Find duplicates
echo Looking for duplicates
echo.
FOR /F %%i in (sorted.txt) do call :FindDuplicatesX
:FindDuplicatesX
set /p varX=<sorted.txt
call sort2.cmd
(Sort2.cmd only contains:
:FindDuplicatesY
set /p varY=<sorted2.txt)
if %varX% equ %varY% echo %varX% >>report.txt
Still, something isn't quite working with the FOR loops, so could anyone be so kind as to have a quick look at the code? :)
Response Number 21
|
Name: wizard-fred
Date: June 24, 2005 at 06:47:22 Pacific
|
Reply: QBASIC is in the \tools\oldmsdos directory of the WIN98SE CD. I don't remember if QBASIC works in XP, but my PowerBasic (1995, for 16-bit DOS) does, with the exception of long file names. Besides QB there are quite a few new versions of BASIC, both console and Windows versions, that are free. If you are using a list with items of various lengths on the same line, you can either do a quick rewrite with the elements in constant-width fields, or use a sort that will handle delimited lists.
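The "constant-width fields" rewrite could be sketched in batch roughly as follows. This is only an illustration under several assumptions: the first field is a number of at most 10 digits, lines are semicolon-separated as in the price list example, the data contains no "!" characters, and the file names padded.txt and sorted.txt are invented. The key is zero-padded on the left so that the text-based SORT puts the numbers in numeric order.

```batch
@echo off
setlocal enabledelayedexpansion
rem Assumed names: List.txt (input), padded.txt and sorted.txt (outputs).
rem Assumes the first field is a number with at most 10 digits.
if exist padded.txt del padded.txt
rem tokens=1* puts the first field in %%a and the rest of the line in %%b.
for /f "usebackq tokens=1* delims=;" %%a in ("List.txt") do (
    set "key=0000000000%%a"
    rem Keep the last 10 characters, giving a fixed-width, zero-padded key.
    >>padded.txt echo !key:~-10!;%%b
)
sort < padded.txt > sorted.txt
endlocal
```

Note that this changes the key text itself (101577 becomes 0000101577), so the padding would have to be stripped again before the data goes anywhere that expects the original numbers.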