Computing.Net > Forums > Programming > List traverse, batch (NT)


List traverse, batch (NT)


Original Message
Name: Shr0Om
Date: June 22, 2005 at 01:14:42 Pacific
Subject: List traverse, batch (NT)
OS: Win XP Pro SP2
CPU/RAM: AMD 64
Comment:

Hi.
I hope some batch gurus can help me with this.

What I want to do:

I have a list.txt with about 100 to 4000 numbers.
In some cases there is redundancy, so one or more numbers may occur several times in the list. I'm trying to write a batch file that prints out which numbers occur more than once.

So far, I've started by giving every number in the list a line number:
(findstr /n /g:list.txt list.txt >>listWnumbers.txt)

My plan was to write a FOR loop to do this.
I use SET /P var=<list.txt to get the first number. Then I remove the first line from list.txt, search the rest of the list, and, if %var% turns up again, echo "%var% exists several times in list" >>rapport.txt.

Then I hoped it would repeat this until it had traversed the entire list, but that didn't quite work out.

I need a way to do this that is reasonably efficient (if it takes at most 2-3 minutes for 4000 lines, that's acceptable).

I hope what I'm asking for is possible.
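
For comparison, here is a minimal Python sketch (not batch, and not something posted in this thread) of the goal itself: read the list once, remember every number already seen, and report a number the first time it shows up again. A set lookup is roughly constant time, so this stays a single pass even for 4000 lines. The file name list.txt comes from the post; the function names and sample values are illustrative.

```python
def read_numbers(path):
    """Read one number per line from a text file, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def find_repeated(numbers):
    """Return every value that occurs more than once, in order of its first repeat."""
    seen = set()      # values met so far
    reported = set()  # values already reported, so each one is listed only once
    repeated = []
    for n in numbers:
        if n in seen and n not in reported:
            repeated.append(n)
            reported.add(n)
        seen.add(n)
    return repeated


if __name__ == "__main__":
    # Hypothetical sample standing in for the contents of list.txt
    sample = ["101577", "200", "101577", "35", "200", "101577"]
    print(find_repeated(sample))  # ['101577', '200']
```

To run it on the real file, pass read_numbers("list.txt") instead of the sample.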




Response Number 1
Name: Mechanix2Go
Date: June 22, 2005 at 03:33:01 Pacific

I did something similar several years ago to get rid of duplicate lines.

First I sorted:

sort < list.txt > list.sor

Then the BAT read the first line into one variable, read the second line into another, and compared the two.

In my case, if they were the same, one got dumped.

In your case, you'd probably want:

echo %var1%>> report.log
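
The sort-then-compare idea can be sketched in Python (a swap-in for illustration, not the original BAT). Because equal values end up next to each other after sorting, each line only has to be compared with the one before it, which plays the role of var1 and var2. Python's sorted() orders strings character by character, much like SORT; that is fine here, since identical numbers are still adjacent even though "1000" sorts before "200".

```python
def report_duplicates_sorted(lines):
    """Sort the lines, then compare each one with its predecessor."""
    report = []
    previous = None  # plays the role of var1; `current` is var2
    for current in sorted(lines):
        # In a sorted list duplicates are neighbours; report each value only once.
        if current == previous and (not report or report[-1] != current):
            report.append(current)
        previous = current
    return report


if __name__ == "__main__":
    print(report_duplicates_sorted(["35", "101577", "200", "101577", "200", "101577"]))
    # ['101577', '200']
```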

HTH

M2


If at first you don't succeed, you're about average.



Response Number 2
Name: Shr0Om
Date: June 22, 2005 at 05:27:57 Pacific

OK, you pointed me in the right direction :)
I hadn't thought about sort ;)

Still, there is a problem.
When I try to delete line 1, it also deletes line 11, and so on. So it works only up to a certain point. I wonder if you or someone else can give me a tip here. Also, I know this script is highly inefficient, especially with a list of 4000 lines. So if someone has a better idea, please speak up :)

Here's my script so far:
------------------------
@echo off
sort < Test.csv > Sorted.csv
set count=1


::Make a new list with line numbers in front
findstr /n /g:Sorted.csv Sorted.csv >> SortedWnr.csv


:start
::Retrieve the number on line x
set /p var1=<SortedWnr.csv
set var1=%var1:~2,8%

::Remove line on top of list
findstr /v /l "%count%:" SortedWnr.csv >>tmp.txt
del SortedWnr.csv
ren tmp.txt SortedWnr.csv


set /a count=count+1


::Retrieve the number on line y
set /p var2=<SortedWnr.csv
set var2=%var2:~2,8%

::Compare line x&y. If equal echo the number to file
if %var1% equ %var2% echo %var1% >>report.txt


::Don't know yet how the program should figure out how many lines there are
::I guess if I can sort the list backwards, then retrieve
::the last line number, and set it here below
if %count% equ 100 goto exit
goto start


:exit
exit
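
The "line 11 also disappears" symptom looks like a substring match: without /B, findstr matches "1:" anywhere in a line, and "11:..." (or "21:...") contains "1:". The Python sketch below is an illustration, not a batch fix; it contrasts that behaviour with matching only at the start of the line. In batch, findstr's /B switch restricts a match to the beginning of a line, so something like findstr /v /b /l "%count%:" should behave like the second function (untested here). A related assumption sits in %var1:~2,8%: it skips exactly two characters, which only fits one-digit line numbers.

```python
def drop_containing(lines, count):
    """Like `findstr /v /l "N:"`: drop every line that contains "N:" anywhere."""
    tag = f"{count}:"
    return [line for line in lines if tag not in line]


def drop_starting_with(lines, count):
    """Drop only the line whose number prefix is exactly "N:"."""
    tag = f"{count}:"
    return [line for line in lines if not line.startswith(tag)]


if __name__ == "__main__":
    # Hypothetical numbered list in findstr /n style: "1:101", "2:102", ..., "12:112"
    numbered = [f"{i}:{100 + i}" for i in range(1, 13)]
    print(len(drop_containing(numbered, 1)))     # 10, because "11:111" is removed too
    print(len(drop_starting_with(numbered, 1)))  # 11
```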



Response Number 3
Name: Mechanix2Go
Date: June 22, 2005 at 05:51:27 Pacific

Hi,

This is an interesting and perhaps useful exercise, but if you just want to get rid of duplicate lines, get this nifty filter:

UNIQUE

Then:

sort < test1 | unique > trst1.unq
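
Assuming UNIQUE behaves like the Unix uniq filter (dropping repeated adjacent lines, which is why the input is sorted first), the pipeline amounts to the Python sketch below. It is an illustration only; the real UNIQUE.COM may have options or edge cases not covered here. Like the filter, it compares whole lines, so two rows with the same number but different prices both survive.

```python
from itertools import groupby


def sort_unique(lines):
    """Rough equivalent of `sort < infile | unique > outfile`:
    sort, then collapse each run of identical lines into one."""
    return [line for line, _run in groupby(sorted(lines))]


if __name__ == "__main__":
    rows = ["200", "101577", "200", "35", "101577"]
    print(sort_unique(rows))  # ['101577', '200', '35']
```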

***
Since it's a CSV you could use Excel. But that seems like a hard way to do an easy thing.

***
If you want to make a REPORT of duplicate lines, as originally stated, I guess it's back to the BAT.

HTH

M2



Response Number 4
Name: Shr0Om
Date: June 23, 2005 at 00:19:04 Pacific

Hi,
I tested the unique filter with this command:
sort < test1 | unique > trst1.unq
but that didn't work. It gives an error saying the process cannot complete because the file is already in use, which it isn't.
Is it something wrong with the command?

Besides that, it's fine to remove duplicates instead of just reporting them.

And yep, it's possible to remove duplicates in Excel by making a pivot table and s---, but that's, as you said, a hard way to do an easy thing :)

Later I also plan to expand the BAT so it can remove thousands separators (,) and some other stuff. This is because our Oracle DB is VERY picky, and it's a lot of work if we receive a file that doesn't follow the standard.

I hope you can give me another tip; I would appreciate it a lot ;)



Response Number 5
Name: Mechanix2Go
Date: June 23, 2005 at 00:26:38 Pacific

Hi,

I have no idea why you're getting the msg about the file in use.

Better check ALL other tasks.

Lemme know.

M2




Response Number 6
Name: Shr0Om
Date: June 23, 2005 at 00:45:08 Pacific

I tried creating a new list: same result. The file isn't open anywhere, but I get the error that it's in use. For some reason, the output file contained this:
C:\Documents and Settings\aandersen\Desktop>sort 0<test1.txt | unique 1>trst1.txt




Response Number 7
Name: Mechanix2Go
Date: June 23, 2005 at 01:01:27 Pacific

WHICH file contained that?

M2



Response Number 8
Name: wizard-fred
Date: June 23, 2005 at 01:05:31 Pacific

I think trying to do it all in a batch is a bit difficult without using a scripting language or a program:

Given: List of numbers, one on a line
Find: The occurrences of the numbers.

1. Sort the list of numbers
2. Count the occurrences of the numbers in a second list.
3. Option - Sort the second list by number of occurrences.
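
The three steps translate directly into a short Python sketch (a stand-in for illustration, not the BASIC program mentioned below): sort, count each run of equal values, then optionally order by count.

```python
from itertools import groupby


def rank_by_occurrences(numbers):
    """Return (value, count) pairs, most frequent first."""
    # Steps 1 and 2: after sorting, equal values form runs; count each run.
    counts = [(value, len(list(run))) for value, run in groupby(sorted(numbers))]
    # Step 3 (optional): order by count, highest first. Python's sort is stable,
    # so values with the same count stay in their sorted order.
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return counts


if __name__ == "__main__":
    print(rank_by_occurrences(["7", "5", "7", "9", "7", "5"]))
    # [('7', 3), ('5', 2), ('9', 1)]
```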

My preference would be to use a small BASIC program calling the SORT utility. An example will follow in a subsequent post.



Response Number 9
Name: Shr0Om
Date: June 23, 2005 at 01:12:51 Pacific

Here's a copy of the error msg.

"The process cannot access the file because it is being used by another process."

M2:
The file trst1.txt contained this:
C:\Documents and Settings\aandersen\Desktop>sort 0<test1.txt | unique 1>trst1.txt

Wizard-fred: Yeah, I know it would be better with something other than batch, but I just want to try batch, since I'm trying to avoid the hassle of learning new languages at the moment.
Still, I'm interested in having a look at your idea.



Response Number 10
Name: Mechanix2Go
Date: June 23, 2005 at 01:24:05 Pacific

"The file trst1.txt contained this:
C:\Documents and Settings\aandersen\Desktop>sort 0<test1.txt | unique 1>trst1.txt"

I have no idea how you got this mess.

Can you say what directory you are in and what you typed?

M2



Response Number 11
Name: Shr0Om
Date: June 23, 2005 at 02:47:35 Pacific

I just pasted "sort < test1 | unique > trst1.unq" into Notepad and saved it as a .cmd file.
I changed the filenames, of course. I also tried typing it manually at the CMD prompt.
I get the same error either way.

All the files are located on my desktop.
None of the files are open or in use, so I don't understand why it gives the error.
I'm gonna try this on my computer at home later today.

Hmmf. Maybe I should consider brushing up my crappy Java skills. At least list sorting worked great in Java ;)



Response Number 12
Name: Mechanix2Go
Date: June 23, 2005 at 03:03:38 Pacific

I would go to a CMD prompt, make a new directory, change to that directory, put the file to be worked on there, put unique.com there and type:

sort < infile | unique > outfile

***
And when you get to the point of creating a BATCH, you save it as FILENAME.BAT

M2



Response Number 13
Name: Shr0Om
Date: June 23, 2005 at 03:34:00 Pacific

Whaddaya know, it works now. Weird indeed. What's wrong with my desktop? Hehe.

Anyway, it didn't quite remove the duplicates.
I assume this is because it looks at the whole line, not just the first number before the semicolon.
(101577;STK;456,21;0;0)
In the batch I used parsing to get this number. I guess it can't be done with the unique filter.

Well, while doing this, I realized I can't have it remove the duplicates. These are price lists, and if there are duplicates, I need to check which price is the correct one!
So I guess it's back to the batch :/
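
Since the goal is now to review conflicting prices rather than delete rows, one option is to group the complete lines by their first field and print every group that has more than one line. A minimal Python sketch, assuming ";" separates the fields as in the 101577;STK;456,21;0;0 example; the other sample rows are made up.

```python
from collections import defaultdict


def duplicate_keys(lines, sep=";"):
    """Group whole lines by their first field; keep only keys that occur more than once."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key = line.split(sep, 1)[0]  # product number: the part before the first ';'
        groups[key].append(line)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


if __name__ == "__main__":
    sample = [
        "101577;STK;456,21;0;0",  # row from the thread
        "101578;STK;12,00;0;0",   # hypothetical
        "101577;STK;459,90;0;0",  # hypothetical conflicting price
    ]
    for key, rows in duplicate_keys(sample).items():
        print(key)
        for row in rows:
            print("   ", row)
```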



Response Number 14
Name: Mechanix2Go
Date: June 23, 2005 at 03:40:41 Pacific

If I remember right, without reading back through all this, you were adding line numbers.

Yes, it DOES look at the whole line.

How else would it know if there were duplicates?

M2



Response Number 15
Name: Shr0Om
Date: June 23, 2005 at 04:49:57 Pacific

This was a list without line numbers.
But yeah, I just forgot: I only need to find duplicates in the first, eh... column? The part before the semicolon, hence the parsing. Sorry ;)



Response Number 16
Name: wizard-fred
Date: June 23, 2005 at 11:37:40 Pacific

If the lines contain more than just the numbers, you will need an initial pass to extract the number field, or a sort routine that can sort by fields.


Sample Program
Results are for a list of 100,000 numbers in 100 groups.
Run from a DOS prompt in Win98SE.

REM --- Begin -----
REM ranklist.bas
REM input = TEXT.LST
REM output = COUNT.SRT

REM 15 sec 233mhz pii qbasic
REM 4 sec powerbasic

REM originally written in PowerBasic v3.1 for DOS
REM tested with QBASIC under W98SE

t1$ = TIME$

REM sort list
SHELL "SORT TEXT.LST > TEXT.SRT"

REM process list
OPEN "TEXT.SRT" FOR INPUT AS #1
OPEN "COUNT.LST" FOR OUTPUT AS #2
cnt = 0
lcnt = 0
ncnt = 0
lastnum = -999

nextline:
IF EOF(1) THEN
GOTO endlist
END IF
LINE INPUT #1, xnum$
znum = VAL(xnum$)
lcnt = lcnt + 1
IF (znum <> lastnum) THEN
IF (lastnum = -999) THEN
lastnum = znum
ELSE
REM a new number starts: write out the finished group first
GOSUB makestr
END IF
END IF
REM count this line only after the previous group has been written
cnt = cnt + 1
lastnum = znum
GOTO nextline

endlist:
CLOSE #1
GOSUB makestr
CLOSE #2
t2$ = TIME$

REM sort by most occurrences
SHELL "SORT /R /+8 COUNT.LST > COUNT.SRT"
PRINT lcnt; "numbers distributed among"; ncnt; " numbers"
PRINT "Results in file COUNT.SRT"
PRINT t2$, t1$
END

makestr:
REM left-pad with spaces to right-align values in 7-character columns
ynum$ = RIGHT$(SPACE$(7) + STR$(lastnum), 7)
ycnt$ = RIGHT$(SPACE$(7) + STR$(cnt), 7)
PRINT #2, ynum$; ycnt$
cnt = 0
ncnt = ncnt + 1
RETURN
REM --- End -----



Response Number 17
Name: Shr0Om
Date: June 24, 2005 at 01:11:26 Pacific

Hi wizard-fred,

Don't you need QBasic installed to run this or something? (I don't remember if QBasic was standard in Win98 SE.)

Still, the problem is that all our systems run Win XP. Is there any way to make this script run on XP systems?



Response Number 18
Name: Mechanix2Go
Date: June 24, 2005 at 02:10:40 Pacific

You need QB.

You can compile it one of two ways: the 'standalone' or the version which requires the DLL [?]

QBasic

M2



Response Number 19
Name: Shr0Om
Date: June 24, 2005 at 02:45:51 Pacific

Hey again.

I think I've found a much more efficient way of finding duplicates in the list. No need for substring parsing anymore, which wouldn't work anyway, since some numbers in the list are longer than others. Since I'm working with a CSV file, it contains ";"-separated columns, so a FOR /F loop with "eol=; tokens=1,1 delims=;" extracts the first column, which is the interesting one.
It's only in this column that I'm going to look for duplicates.

Now, I'm not used to working with FOR loops, so I don't know if what I want to do is possible.
I'll paste the code here, and write a comment where the problem lies.
---
@echo off
set count=0
if exist tmp.txt del tmp.txt

::Get interesting data only
echo Collecting data from "Product number"
echo.
FOR /F "eol=; tokens=1,1 delims=;" %%i in (List.txt) do @echo %%i %%j >>tmp.txt

::Sort list
echo Sorting list
echo.
sort <tmp.txt> sorted.txt
del tmp.txt

::Find duplicates
echo Looking for duplicates
echo.
FOR %%i in (sorted.txt) do call :FindDuplicatesX

:FindDuplicatesX
if count equ 1 goto FindDuplicatesY
set /p varX=<sorted.txt
set count=1

******************
OK, here is my problem. I want to go back to the FOR body here and NOT continue with FindDuplicatesY until the FOR loop has moved on to the next line. Is it possible to call the FOR body again and then make it go to FindDuplicatesY?
******************

:FindDuplicatesY
set /p varY=<sorted.txt
set count=0

if %varX% equ %varY% echo %varX% >>report.txt


I don't know if what I want is possible, but I sure hope so, as this code turned out to be very efficient!



Response Number 20
Name: Shr0Om
Date: June 24, 2005 at 04:47:11 Pacific

OK, I've almost found the solution (I think).
I make two identical sorted lists.
I delete the first row in the second list, and I use a nested(?) FOR loop.

Like this:
@echo off
if exist tmp.txt del tmp.txt

::Get interesting data only
echo Collecting data from "Product number"
echo.
FOR /F "eol=; tokens=1,1 delims=;" %%i in (List.txt) do @echo %%i %%j >>tmp.txt
echo Data collected
echo.

::Sort list
echo Sorting list
echo.
sort <tmp.txt> sorted.txt
del tmp.txt
copy sorted.txt sorted2.txt
echo Please remove the first row in Sorted2.txt
::Just a temporary solution
pause

::Find duplicates
echo Looking for duplicates
echo.
FOR /F %%i in (sorted.txt) do call :FindDuplicatesX

:FindDuplicatesX
set /p varX=<sorted.txt

call sort2.cmd

(Sort2.cmd only contains:
:FindDuplicatesY
set /p varY=<sorted2.txt)

if %varX% equ %varY% echo %varX% >>report.txt

Still, something isn't quite working with the FOR loops, so could anyone be so kind as to have a quick look at the code? :)
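
The two-copies idea (sorted.txt next to sorted2.txt without its first row) boils down to walking a sorted list alongside itself shifted by one line. In Python that is zip(lines, lines[1:]); the sketch below illustrates that pairing and is not a fix for the FOR loops. On the batch side, more +1 sorted.txt > sorted2.txt is a commonly used way to skip the first line without editing by hand, but it is worth checking that it behaves as expected on your system.

```python
def adjacent_duplicates(sorted_lines):
    """Compare each line with the next one, like reading sorted.txt and sorted2.txt in lockstep."""
    report = []
    # sorted_lines[1:] is the "second list" with its first row removed.
    for x, y in zip(sorted_lines, sorted_lines[1:]):
        if x == y and (not report or report[-1] != x):
            report.append(x)
    return report


if __name__ == "__main__":
    print(adjacent_duplicates(["101577", "101577", "101577", "200", "35", "35"]))
    # ['101577', '35']
```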




Response Number 21
Name: wizard-fred
Date: June 24, 2005 at 06:47:22 Pacific

QBASIC is in the \tools\oldmsdos directory of the WIN98SE CD.
I don't remember if QBASIC works in XP, but my PowerBasic (1995, for 16-bit DOS) does, with the exception of long file names. Besides QB, there are quite a few newer versions of BASIC, both console and Windows versions, that are free.
If you are using a list with items of various lengths on the same line, you can either do a quick rewrite with the elements in constant-width fields, or use a sort that can handle delimited lists.
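
The constant-width rewrite is the same trick as the RIGHT$ padding in the BASIC program: once the numbers are right-aligned in a fixed-width field, a plain text sort (like SORT) puts them in numeric order. A minimal Python sketch; the width of 10 is an arbitrary assumption and must be at least as wide as the longest number.

```python
def pad_key(line, width=10, sep=";"):
    """Right-align the first field in a fixed-width column so text order matches numeric order."""
    key, found, rest = line.partition(sep)
    return key.rjust(width) + found + rest


if __name__ == "__main__":
    rows = ["1000;A", "999;B", "20;C"]
    print(sorted(rows))  # ['1000;A', '20;C', '999;B'] (plain text order)
    print([r.lstrip() for r in sorted(pad_key(r) for r in rows)])  # ['20;C', '999;B', '1000;A']
```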

