Compare two files and output

December 3, 2009 at 18:43:26
Specs: Windows XP SP3
I have two text files. The first has data like:
000328
001708
001761
011772
014963

The second one, a very large file, has data on each line like:

"SMREC05 ",00000006,"001708","210","1100","CDA TREAS BILL 15JA04","BONS TRES CDA 15JA04","C ", 00000, 000000000, 00000000100, 20040115,"1350Z7D43 ",""," ", 000010000000000, 000000000000000, 000000000,"N", 20030702,"0","CA1350Z7D435 ","","1","1","0","CANADA TREASURY BILL 15JAN04","BONS DU TRESOR DU CANADA 15JAN04"," "," ", 000000000000000, 00000000, 00000000, 20030702, 20040129

I have to compare the data in the first line of the first file to the string starting at the 22nd character in each line of the second file (001708 in this case) and then output lines that match to a new file.

Then the data in the next line of the first file is compared and the process continues until the end of the first file has been reached.

The data in the first file is in ascending order, as it is in the second file, if this makes it any easier.

Any help is appreciated. I have Windows XP SP3

I have tried but I am not that good at this.




#1
December 3, 2009 at 19:07:00
do you mean the data in the big file (#2) is sorted on the value at column 22, or just sorted ascending by entire line? if it's sorted on col 22, that would speed things up, but it's not mission-critical.
worst-case (slowest):

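setlocal enabledelayedexpansion
rem start from an empty output file
del newfile 2>nul
rem %%D is the third field, still wrapped in its quotes; %%E is the rest of the line
for /f %%A in (shortfil) do (
for /f "tokens=1-3* delims=," %%B in (bigfile) do (
if "%%A" equ %%D >> newfile echo %%B,%%C,%%D,%%E
)
)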

i've probably got something wrong (i always do...), not tested



#2
December 3, 2009 at 19:59:15
@echo off > newfile & setLocal enableDELAYedexpansion

for /f "tokens=* delims= " %%i in (f1) do (
for /f "tokens=1-3* delims=," %%a in (f2) do (
if %%i equ %%~c (
>> newfile echo %%a,%%b,%%c,%%d
)
)
)


=====================================
Helping others achieve escape felicity

M2



#3
December 4, 2009 at 10:53:11
OK, this (the second reply) worked great the first time. Now, I am wondering if we can add a little piece that would speed it up immensely.

File2 is large. File1 has the data in the same order it occurs in File2. So once the program finds a File1 value in a line of File2, it only has to keep checking until a following line in File2 has a different value. After that there will be no further instances of that File1 value in File2, since the matching lines have already been found and copied to the new file.

Some sort of loop that goes back to the next File1 data once the File1 Data in File2 changes.

Just a thought. Believe me, I am more than happy with what has been done after spending a lot of time on this.

Genuflections and "I am not worthy"s for all.
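
The speed-up described above amounts to a single pass over both files in step. Here is a minimal Python sketch of that idea, not a tested drop-in for the batch file. It assumes both files are sorted ascending on the six-digit value, that the value always sits at characters 22-27 of each File2 line, and that the values are zero-padded so plain string comparison agrees with numeric order. The name merge_matches and its parameters are made up for illustration.

```python
def merge_matches(keys, lines, start=21, width=6):
    """Yield the lines whose fixed-column value appears in keys.

    Assumption: keys and lines are both sorted ascending on that value.
    start=21 is the zero-based offset of the 22nd character.
    """
    key_iter = iter(keys)
    key = next(key_iter, None)
    for line in lines:
        field = line[start:start + width]
        # Move forward in File1 until its value catches up with this line.
        while key is not None and key < field:
            key = next(key_iter, None)
        if key is None:
            break  # File1 is exhausted, so nothing further can match
        if key == field:
            yield line


keys = ["000328", "001708", "001761"]
lines = [
    '"SMREC05 ",00000006,"001708","210"',
    '"SMREC05 ",00000007,"001708","211"',
    '"SMREC05 ",00000008,"001750","300"',
    '"SMREC05 ",00000009,"001761","400"',
    '"SMREC05 ",00000010,"999999","500"',
]
matched = list(merge_matches(keys, lines))
```

With real files, the stripped lines of File1 would be passed as keys and the open File2 handle as lines, so neither file is read more than once.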



#4
December 5, 2009 at 17:29:18
** Now, I am wondering if we can add a little piece that would speed it up immensely. **
i made the least amount of change to M2's code as i could get by with, hoping not to blow it up, but the mods req'd were more than i anticipated at first. be sure to safeguard your working copy!

@echo off > newfile & setLocal enableDELAYedexpansion

set /a Lct=0
for /f "tokens=* delims= " %%i in (f1) do (
call :fil2)
goto :ex

:fil2
set /a ct=0
for /f "tokens=1-3* delims=," %%a in (f2) do (
set /a ct+=1
if !ct! gtr !Lct! (
if %%i equ %%~c (
call :foundit
goto :ex)))
goto :ex

:foundit
for /L %%Z in (1,1,1) do (
>> newfile echo %%a,%%b,%%c,%%d)
set Lct=!ct!
:ex



#5
December 6, 2009 at 01:08:41
Hi gang,

I get the general idea of 'picking up where you left off' but today feels like a day when my brain is on strike.

nbrane,

You got me mystified with this:

for /L %%Z in (1,1,1) do (


=====================================
Helping others achieve escape felicity

M2



#6
December 6, 2009 at 10:57:10
inside of FOR-loop was the only way i could get access to the %% var.s without having to assign them to user-variables which i wanted to avoid doing, so i made a dummy FOR to capture them directly.


#7
December 7, 2009 at 02:25:12
nbrane,

Cool idea.

Since we're broadcasting this week from the FWIW department, you can skip the commas.

for /L %%a in (1 1 1) do (


=====================================
Helping others achieve escape felicity

M2



#8
December 7, 2009 at 02:31:59
A very good idea indeed, I thought I had seen it all.....



#9
December 7, 2009 at 03:18:25
Hi Judago,

I learn something every day. And I've been at this longer than I'd care to admit.


=====================================
Helping others achieve escape felicity

M2



#10
December 7, 2009 at 03:29:04
Hello M2,

"I learn something every day. And I've been at this longer than I'd care to admit."

It's a cliché, but it's true. The most amazing part, from my point of view, is that it seems to be people who are new to [insert subject] who do it most often.



#11
December 11, 2009 at 12:47:50
Thank you all. I will give it a rip this weekend and let you know how it goes.


#12
December 11, 2009 at 18:44:08
Please, unless you have serious constraints, use a language with a csv module that takes care of parsing csv for you. Your batch will break if your csv fields' data have commas in them, e.g. "one,two,three" , "field". How is your batch going to take care of that without writing more batch code to check for these legitimate commas?

Below is a Python script using the csv module. It will take care of those commas in the fields for you.

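import csv

filename1 = "file1"
filename2 = "file2"

# Values from the first file, one per line; a set gives fast membership tests.
with open(filename1) as f:
    data = set(f.read().split())

# Python 3: the csv module docs recommend opening csv files with newline="".
with open(filename2, newline="") as src, open("newfile.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src, delimiter=","):
        # The third field holds the value to compare; skip short or blank rows.
        if len(row) > 2 and row[2] in data:
            print("match:", ",".join(row))
            writer.writerow(row)  # write to newfile.csv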

similarly, for Perl as well...use a Perl csv module.
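
To make the comma problem concrete, here is a small sketch comparing a plain split on commas with csv.reader. The sample line is invented for illustration (the data shown in this thread has no comma inside its first field). It only shows what would happen if a quoted field ahead of the third one contained a comma.

```python
import csv
import io

# Hypothetical line: the first quoted field contains a comma.
line = '"SMREC,05",00000006,"001708","210"\n'

# Splitting on every comma shifts the fields, much like comma-delimited tokens would.
naive = line.rstrip("\n").split(",")

# csv.reader treats the quoted comma as data and strips the quotes.
parsed = next(csv.reader(io.StringIO(line)))

print(naive[2])   # the third piece is no longer the six-digit value
print(parsed[2])  # the third field is the six-digit value
```

The batch scripts above only tokenize the first three fields, so commas in later fields end up in the remainder token and would not affect the comparison.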
