Hi,
I hope a batch guru can help me with this. What I want to do:
I have a list.txt with about 100-4000 numbers.
In some cases there is redundancy, so one or more numbers might occur several times in the list. I'm trying to write a batch file that prints out which numbers occur several times.
So far I have started by giving every number on the list a line number:
(findstr /n /g:list.txt list.txt >>listWnumbers.txt)
My plan was to write a FOR loop to do this.
I use SET /P var=<list.txt to get the first number. Then I remove the first line from list.txt, search the rest of the list, and echo "%var% exists several times in list" >>rapport.txt.
I hoped it would repeat this until it had traversed the entire list, but that didn't quite work out.
I need to find a way to do this, preferably a somewhat efficient one (if it takes at most 2-3 minutes for 4000 lines, that's acceptable).
I hope what I'm asking for is possible.

I did something similar several years ago to get rid of duplicate lines.
First I sorted:
sort < list.txt > list.sor
Then the BAT looked at the first line, set a var; looked at the second line, set a var and compared the two.
In my case, if they were the same, one got dumped.
In your case, you'd want to, probably:
echo %var1%>> report.log
HTH
M2
If at first you don't succeed, you're about average.
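
The sort-then-compare idea above can be sketched outside of batch. This is only an illustration in Python, not M2's original BAT; the function name is made up, and it assumes one value per line. Once the list is sorted, equal values sit next to each other, so each line only has to be compared with the one before it.

```python
def repeated_after_sorting(lines):
    """Return each value that occurs more than once, via sort + neighbour comparison."""
    # Sort as text, the same way the SORT command orders lines.
    ordered = sorted(line.strip() for line in lines if line.strip())
    duplicates = []
    previous = None
    for current in ordered:
        # Equal neighbours in a sorted list mean the value is repeated.
        # The second check keeps a value from being reported twice.
        if current == previous and (not duplicates or duplicates[-1] != current):
            duplicates.append(current)
        previous = current
    return duplicates


print(repeated_after_sorting(["101577", "200", "101577", "300", "200", "200"]))
```

To run it on a real file, something like `repeated_after_sorting(open("list.txt"))` should work, assuming the file is in the current directory.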

OK, you pointed me in the right direction :)
I didn't think of sort ;) Still, there is a problem.
When I try to delete line 1, it also deletes line 11, and so on. So it works up to a certain point. I wonder if you or someone else can give me a tip here. Also, I know this script is highly inefficient, especially if I get a list with 4000 lines. So if someone has a better idea, please speak up :)
Here's my script so far:
------------------------
@echo off
sort < Test.csv > Sorted.csv
set count=1
::Make a new list with line numbers in front
findstr /n /g:Sorted.csv Sorted.csv >> SortedWnr.csv
:start
::retrieve number of line x
set /p var1=<SortedWnr.csv
set var1=%var1:~2,8%
::Remove line on top of list
findstr /v /l "%count%:" SortedWnr.csv >>tmp.txt
del SortedWnr.csv
ren tmp.txt SortedWnr.csv
set /a count=count+1
::retrieve number of line x
set /p var2=<SortedWnr.csv
set var2=%var2:~2,8%
::Compare line x&y. If equal echo the number to file
if %var1% equ %var2% echo %var1% >>report.txt
::Don't know yet how the program should figure out how many lines there are
::I guess if I can sort the list so it's backwards, then retrieve
::the last line number, and set it here below
if %count% equ 100 goto exit
goto start
:exit
exit
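
On the "deleting line 1 also deletes line 11" problem: a literal findstr search for "1:" matches that text anywhere in a line, so "11:", "21:" and so on match too. The Python sketch below only illustrates that effect with made-up data and made-up function names; it is not a replacement for the script. In batch, findstr's /b switch (match only at the beginning of a line) may be the closest equivalent of the prefix version.

```python
# Twelve numbered lines in the same "N:value" shape that findstr /n produces.
numbered = [f"{n}:{value}" for n, value in enumerate(range(1000, 1012), start=1)]


def drop_by_substring(lines, count):
    """Drop every line that contains 'count:' anywhere (what the script does now)."""
    needle = f"{count}:"
    return [line for line in lines if needle not in line]


def drop_by_prefix(lines, count):
    """Drop only the line whose line-number prefix is exactly 'count:'."""
    needle = f"{count}:"
    return [line for line in lines if not line.startswith(needle)]


# Number of lines removed when asking to delete line 1.
print(len(numbered) - len(drop_by_substring(numbered, 1)))
print(len(numbered) - len(drop_by_prefix(numbered, 1)))
```

The substring version removes both "1:1000" and "11:1010", while the prefix version removes only the first line.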

Hi,
This is an interesting and perhaps useful exercise, but if you just want to get rid of duplicate lines, get this nifty filter:
Then:
sort < test1 | unique > trst1.unq
***
Since it's a csv you could use excel. But that seems like a hard way to do an easy thing.
***
If you want to make a REPORT of duplicate lines, as originally stated, I guess it's back to the BAT.
HTH
M2

Hi,
I tested the unique filter with this command:
sort < test1 | unique > trst1.unq
but that didn't work. It gives an error that the process cannot complete because the file is already in use, which it isn't.
Is something wrong with the command?
Besides that, removing duplicates instead of just reporting them is fine.
And yep, it's possible to remove duplicates in Excel by making a pivot and s---, but as you said, that's a hard way to do an easy thing :)
Later I also plan on expanding the BAT so it can remove the "," thousands separators and some other stuff. This is because our Oracle DB is VERY picky, and it's a lot of work if we receive a file that doesn't follow the standard.
I hope you can give me another tip; I would appreciate it a lot ;)

Hi,
I have no idea why you're getting the msg about the file in use.
Better check ALL other tasks.
Lemme know.
M2

I tried creating a new list, with the same result. The file isn't open anywhere, but I still get the error that it's in use. For some reason, the output file contained this:
C:\Documents and Settings\aandersen\Desktop>sort 0<test1.txt | unique 1>trst1.txt

I think trying to do it all in a batch is a bit difficult without using a scripting language or a program:
Given: List of numbers, one on a line
Find: The occurrences of the numbers.
1. Sort the list of numbers
2. Count the occurrences of the numbers in a second list.
3. Option - Sort the second list by number of occurrences.
My preference would be to use a small BASIC program calling the SORT utility. An example will follow in a subsequent post.
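
As a rough illustration of those three steps, here is a small Python sketch; it is not the BASIC program referred to above. It swaps the explicit sort in step 1 for a `collections.Counter`, which tallies values without needing sorted input, and the function name is made up.

```python
from collections import Counter


def rank_occurrences(lines):
    """Count each number and return (number, count) pairs, most frequent first."""
    counts = Counter(line.strip() for line in lines if line.strip())
    # Sort by descending count, then by the number text so ties have a fixed order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for number, count in rank_occurrences(["7", "3", "7", "9", "3", "7"]):
    # Right-align both columns in 7-character fields.
    print(f"{number:>7}{count:>7}")
```

Any number whose count is greater than 1 is a duplicate, so the same list doubles as the duplicate report.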

Here's a copy of the error msg.
"The process cannot access the file because it is being used by another process."
M2:
The file trst1.txt contained this:
C:\Documents and Settings\aandersen\Desktop>sort 0<test1.txt | unique 1>trst1.txt
Wizard-fred: Yeah, I know something other than batch would be better, but I just want to try batch, as I'm trying to avoid the hassle of learning new languages at the moment.
Still, I'm interested in having a look at your idea.

"The file trst1.txt contained this:
C:\Documents and Settings\aandersen\Desktop>sort 0<test1.txt | unique 1>trst1.txt"
I have no idea how you got this mess.
Can you say what directory you are in and what you typed?
M2

I just pasted "sort < test1 | unique > trst1.unq" into Notepad and saved it as a .cmd file.
I changed the filenames, of course. I also tried typing it manually at the CMD prompt.
I get the same error either way.
All the files are located on my desktop.
None of the files are open or in use, so I don't understand why it gives the error.
I'm going to try this on my computer at home later today.
Hmm, maybe I should consider brushing up my crappy Java skills. At least list sorting worked great in Java ;)

I would go to a CMD prompt, make a new directory, change to that directory, put the file to be worked on there, put unique.com there and type:
sort < infile | unique > outfile
***
And when you get to the point of creating a BATCH, you save it as FILENAME.BAT
M2

Whaddaya know, it worked now. Weird indeed. What's wrong with my desktop? Hehe.
Anyway, it didn't quite remove the duplicates.
I assume this is because it looks at the whole line, not just the first number before the semicolon.
(101577;STK;456,21;0;0)
In the batch I used parsing to get this number. I guess that can't be done with the unique filter.
Well, while doing this, I realised I can't have it remove the duplicates. These are price lists, and if there are duplicates, I need to check which price is the correct one!
So I guess it's back to the batch :/
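
A sketch of what that report could look like, in Python rather than batch and purely as an illustration: rows are grouped by the part before the first semicolon, and every row of a repeated product number is kept so the prices can be compared by hand. The function name is made up, and the second and third sample rows are invented for the example; only the first row comes from the post above.

```python
from collections import defaultdict


def rows_with_repeated_key(lines, delimiter=";"):
    """Map each first-column value that appears more than once to all of its rows."""
    rows_by_key = defaultdict(list)
    for line in lines:
        row = line.strip()
        if not row:
            continue  # skip blank lines
        key = row.split(delimiter, 1)[0]  # text before the first delimiter
        rows_by_key[key].append(row)
    return {key: rows for key, rows in rows_by_key.items() if len(rows) > 1}


sample = [
    "101577;STK;456,21;0;0",
    "101578;STK;12,50;0;0",   # made-up row
    "101577;STK;460,00;0;0",  # made-up row with a conflicting price
]
for key, rows in rows_with_repeated_key(sample).items():
    print(key)
    for row in rows:
        print("   ", row)
```

Nothing is deleted here; the output only lists the conflicting rows under each repeated number.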

If I remember right, without reading back through all this, you were adding line numbers.
Yes, it DOES look at the whole line.
How else would it know if there were duplicates?
M2

This was a list without line numbers.
But yeah, I just forgot that I only need to find duplicates in the first, eh.. row? The part before the semicolon, hence the parsing. Sorry ;)

If the lines consist of more than line numbers you will need an initial pass to extract the number field, or use a sort routine that can sort by fields.
Sample Program
Results are for a list of 100,000 numbers in 100 groups.
Run from DOS prompt in Win98SE.
REM --- Begin -----
REM ranklist.bas
REM input = TEXT.LST
REM output = COUNT.SRT
REM 15 sec 233mhz pii qbasic
REM 4 sec powerbasic
REM original written in PowerBasic v3.1 for DOS
REM tested with QBASIC under W98SE
t1$ = TIME$
REM sort list
SHELL "SORT TEXT.LST > TEXT.SRT"
REM process list
OPEN "TEXT.SRT" FOR INPUT AS #1
OPEN "COUNT.LST" FOR OUTPUT AS #2
cnt = 0
lcnt = 0
ncnt = 0
lastnum = -999
nextline:
IF EOF(1) THEN
    GOTO endlist
END IF
LINE INPUT #1, xnum$
znum = VAL(xnum$)
lcnt = lcnt + 1
IF (znum <> lastnum) THEN
    IF (lastnum <> -999) THEN
        REM number changed: write out the finished group
        GOSUB makestr
    END IF
END IF
REM count the current line only after a finished group has been written,
REM otherwise the new line is added to the previous number's total
cnt = cnt + 1
lastnum = znum
GOTO nextline
endlist:
CLOSE #1
GOSUB makestr
CLOSE #2
t2$ = TIME$
REM sort by most occurrences
SHELL "SORT /R /+8 COUNT.LST > COUNT.SRT"
PRINT lcnt; "numbers distributed among"; ncnt; " numbers"
PRINT "Results in file COUNT.SRT"
PRINT t2$, t1$
END
makestr:
REM left space filled to right align values in 7-character fields
ynum$ = RIGHT$(SPACE$(7) + STR$(lastnum), 7)
ycnt$ = RIGHT$(SPACE$(7) + STR$(cnt), 7)
PRINT #2, ynum$; ycnt$
REM reset the counter for the next group
cnt = 0
ncnt = ncnt + 1
RETURN
REM --- End -----

Hi wizard-fred,
Don't you need QBasic installed to run this? (I don't remember whether QBasic was standard in Win98 SE.)
Still, the problem is that all our systems run Windows XP. Is there any way of making this script run on XP?

You need QB.
You can compile it one of two ways: as a 'standalone', or as the version which requires the DLL [?]
M2

Hey again,
I have hopefully found a much more efficient way of finding duplicates in the list. There's no need for substring parsing anymore, which wouldn't work anyway since some numbers in the list are longer than others. Since I'm working with a CSV file, it contains ";"-separated columns, so a FOR /F loop with "eol=; tokens=1,1 delims=;" extracts the first column, which is the interesting one.
It's only in this column that I'm going to look for duplicates.
Now, I'm not used to working with FOR loops, so I don't know if what I want to do is possible.
I'll paste the code here, and write a comment where the problem lies.
---
@echo off
set count=0
if exist tmp.txt del tmp.txt
::Get interesting data only
echo Collecting data from "Product number"
echo.
FOR /F "eol=; tokens=1,1 delims=;" %%i in (List.txt) do @echo %%i %%j >>tmp.txt
::Sort list
echo Sorting list
echo.
sort <tmp.txt> sorted.txt
del tmp.txt
::Find duplicates
echo Looking for duplicates
echo.
FOR %%i in (sorted.txt) do call :FindDuplicatesX
:FindDuplicatesX
if count equ 1 goto FindDuplicatesY
set /p varX=<sorted.txt
set count=1
******************
OK, here is my problem. I want to go back to the FOR body here, and NOT continue with FindDuplicatesY until the FOR loop is on to the next line. Is it possible to call the FOR body again and then make it go to FindDuplicatesY?
******************
:FindDuplicatesY
set /p varY=<sorted.txt
set count=0
if %varX% equ %varY% echo %varX% >>report.txt
I don't know if what I want is possible, but I sure hope so, as this code proved to be very efficient!

OK, I have almost found the solution (I think).
I make two identical sorted lists.
I delete the first row in the second list, and I use a nested(?) FOR loop.
Like this:
@echo off
if exist tmp.txt del tmp.txt
::Get interesting data only
echo Collecting data from "Product number"
echo.
FOR /F "eol=; tokens=1,1 delims=;" %%i in (List.txt) do @echo %%i %%j >>tmp.txt
echo Data collected
echo.
::Sort list
echo Sorting list
echo.
sort <tmp.txt> sorted.txt
del tmp.txt
copy sorted.txt sorted2.txt
echo Please remove 1'st row in Sorted2.txt
::Just a temporary solution
pause
::Find duplicates
echo Looking for duplicates
echo.
FOR /F %%i in (sorted.txt) do call :FindDuplicatesX
:FindDuplicatesX
set /p varX=<sorted.txt
call sort2.cmd
(Sort2.cmd only contains:
:FindDuplicatesY
set /p varY=<sorted2.txt)
if %varX% equ %varY% echo %varX% >>report.txt
Still, something isn't quite working with the FOR loops, so could anyone be so kind as to have a quick look at the code? :)
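
The "two sorted lists, the second one shifted by a row" idea can be sketched as follows. This is a Python illustration with a made-up function name, not a fix for the batch file. It does point at one likely cause of the trouble, though: `set /p var=<file` reads the first line of the file every time it runs, so the batch loop keeps comparing the same two values, whereas the pairing has to advance through both lists together.

```python
def duplicates_by_offset(values):
    """Pair each sorted value with its successor and collect the equal pairs."""
    first = sorted(values)
    second = first[1:]  # the "second list" with its first row removed
    found = []
    # zip walks both lists in step: (row 1, row 2), (row 2, row 3), ...
    for x, y in zip(first, second):
        if x == y and (not found or found[-1] != x):
            found.append(x)
    return found


print(duplicates_by_offset(["300", "100", "200", "100", "300", "300"]))
```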

QBASIC is in the \tools\oldmsdos directory of the WIN98SE CD.
I don't remember if QBASIC works in XP, but my PowerBasic (1995, for 16-bit DOS) does, with the exception of long file names. Besides QB there are quite a few new versions of BASIC, both console and Windows versions, that are free.
If you are using a list with items of various lengths on the same line, you can either do a quick rewrite with the elements in constant width fields, or use a sort that will do delimited lists.
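
As an illustration of the second option, sorting a delimited list by one field, here is a short Python sketch. The function name and sample rows are made up, and it assumes the chosen field always holds a whole number. Comparing the field as a number rather than as text means that values of different lengths still sort in numeric order.

```python
def sort_by_field(lines, field=0, delimiter=";"):
    """Sort delimited rows by one numeric field instead of by the whole line."""
    rows = [line.strip() for line in lines if line.strip()]
    # int() makes 9 sort before 20 and 101577, which a plain text sort would not.
    return sorted(rows, key=lambda row: int(row.split(delimiter)[field]))


print(sort_by_field(["20;B;1,00", "101577;STK;456,21", "9;A;3,10"]))
```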

