Find value in file from text list

June 7, 2010 at 08:54:48
Specs: Windows XP

Hi,

I am trying to find out how to search a text file for particular values and then delete or replace them. The values are themselves in a text file.

As an example, let's say I have a text file called 'Data.txt' which contains the following plain text:

"An Aardvark saw a Bat, then told a Cat who told a Dog, who wrote it down for the Elephant"

and a second text file called 'Variables.txt' which contains the following list:

Aardvark
Bat
Cat
Dog
Elephant

...and I want to script something that will read the first line of Variables.txt (i.e. Aardvark) and look for it in 'Data.txt'. If it finds it in Data.txt, it needs to be deleted or replaced with a different value.

I then want the script to loop so that it repeats for all values in 'Variables.txt'

I'm fairly okay at writing batch files, but can't get my head around the For /F command etc.

It seems to me that I need a loop within a loop, i.e. loop 1 reads through the entire data.txt file looking for the variable on line 1 of Variables.txt (i.e. 'Aardvark'), and once data.txt has been checked for that first variable, it loops round to check again, this time for the second variable (i.e. 'Bat').
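The loop-within-a-loop described above can be sketched in Python for illustration (the thread itself is about batch). This is a minimal sketch, assuming plain case-sensitive substring matching and a single constant replacement string; `remove_values` and `process_files` are hypothetical names, and the file names are only the ones from the example.

```python
from pathlib import Path


def remove_values(data, variables, replacement=""):
    # Outer loop: one pass per entry in the variables list
    for value in variables:
        # Inner loop: str.replace scans the whole data text for this value
        data = data.replace(value, replacement)
    return data


def process_files(data_path, variables_path, out_path):
    # One variable per line; blank lines in the list are skipped
    lines = Path(variables_path).read_text().splitlines()
    variables = [v.strip() for v in lines if v.strip()]
    text = Path(data_path).read_text()
    Path(out_path).write_text(remove_values(text, variables))


if __name__ == "__main__":
    sample = ("An Aardvark saw a Bat, then told a Cat who told a Dog, "
              "who wrote it down for the Elephant")
    print(remove_values(sample, ["Aardvark", "Bat", "Cat", "Dog", "Elephant"], "X"))
    # process_files("Data.txt", "Variables.txt", "Data_out.txt")  # hypothetical output name
```

Deleting with an empty replacement leaves doubled spaces behind, and a value such as `Cat` would also be removed from inside a longer word; both can be addressed by matching whole words instead.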

I hope the above makes sense!

Thanks




#1
June 7, 2010 at 10:26:36

You did not specify where the replacement comes from, or whether it's a constant for all values or a specific value paired with each data value. Also, should the matching be case sensitive or not?
Knowing that would help to solve it.


#2
June 7, 2010 at 12:55:55

Thanks very much for the reply! Apologies for being so vague.
Okay, well, for now let's forget the replacement, so it will be a simple find-and-remove. Also, the matching doesn't need to be case sensitive, although the text is likely to be in capitals where it's a word, and otherwise it will be a number.

So, to summarise again:

I have two files. File 1 is the 'data' and file 2 is a text file with keywords. I want to search file 1 for the keywords in file 2 and remove them.

Another analogy:

**Contents of File 1**
'hello this is a test to delete words'

**Contents of File 2**
delete
hello
test

Then after running the script, I should end up with File 1 reading:

'this is a to words'
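For the simplified find-and-remove, here is a Python sketch that reproduces the expected result above, assuming whole-word, case-insensitive matching applied to one line at a time (`remove_words` is a hypothetical name). Joining all keywords into one regular expression means each line is scanned once rather than once per keyword.

```python
import re


def remove_words(text, words):
    if not words:
        return text
    # One alternation for every keyword, e.g. \b(?:delete|hello|test)\b;
    # re.escape keeps characters such as '.' literal, \b limits matches to whole words
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    removed = pattern.sub("", text)
    # Collapse the runs of spaces left behind and trim both ends
    return " ".join(removed.split())


print(remove_words("hello this is a test to delete words", ["delete", "hello", "test"]))
# prints: this is a to words
```

Because the final step collapses all whitespace, it should be applied per line so that line breaks in File 1 are kept.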

I hope that makes sense and is enough for someone to help.

Thanks all,

A.



#3
June 7, 2010 at 17:07:22

Here's something to start with. If the lines in "data.txt" are indeed quoted,
you'll need (set xx= %%~a ) at line 3, and then add the quotes back at output.
Null lines are not saved, and words are delimited only by spaces, not punctuation.

@echo off > mmm & setlocal enabledelayedexpansion
for /f "tokens=*" %%a in (data.txt) do (
(set xx= %%a )
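rem remove each variable only as a whole space-delimited word, ignoring case
for /f %%b in (variables.txt) do set xx=!xx: %%b = !
call :trim
>> mmm echo.!xx!
)
goto :eof
:trim
rem strip leading spaces, then trailing spaces; a line emptied of all words stays empty
if not defined xx goto :eof
if "!xx:~0,1!" equ " " set xx=!xx:~1!& goto :trim
:3
if "!xx:~-1!" equ " " set xx=!xx:~0,-1!& goto :3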
::------ end
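The rule described for this script (words separated only by spaces, punctuation left attached, case ignored, as batch `!var:old=new!` substitution ignores case) can be mirrored in Python as a rough sketch; `remove_space_delimited` is a hypothetical helper, not part of the original reply. A set lookup per word avoids rescanning the line once per variable.

```python
def remove_space_delimited(line, words):
    # Lower-cased set, so lookups ignore case and cost O(1) per token
    targets = {w.lower() for w in words}
    # Split on single spaces only: "Bat," is a different token from "Bat",
    # which mirrors the "words defined only by space" rule
    kept = [tok for tok in line.split(" ") if tok and tok.lower() not in targets]
    return " ".join(kept)


print(remove_space_delimited("An Aardvark saw a Bat, then told a Cat", ["aardvark", "bat", "cat"]))
# prints: An saw a Bat, then told a
```

Here `Bat,` survives because the comma is part of the token, which is the punctuation limitation noted above.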



#4
June 8, 2010 at 03:05:42

Thanks nbrane, that's a great start for me!
Unfortunately, as the list in variables.txt was quite long, it took ages to complete for a 250 KB data file. A potential shortcut would be if I could use wildcards, but I'm not sure how to implement them.

For example, if the variables file contains:

apple
banana
1234500000
1234567001
1234567890
1234554321

then potentially I could specify 12345* or 12345##### or 12345????? if I could work out the syntax etc. The only problem is that the entries in variables.txt can be numeric or alphabetic, so I'm not sure how that works with changing the EQUs to GTR/GEQ.
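One option outside batch is a regular-expression language. A minimal Python sketch, assuming a made-up wildcard syntax where `*` is any run of letters or digits, `?` is exactly one letter or digit, and `#` is exactly one digit; `wildcard_to_regex` and `remove_wildcards` are hypothetical names. Because the patterns match text rather than compare values, numeric and alphabetic entries are handled the same way, with no EQU/GTR/GEQ comparison involved.

```python
import re


def wildcard_to_regex(pattern):
    # Hypothetical syntax: '*' = any run of word characters,
    # '?' = exactly one word character, '#' = exactly one digit
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        elif ch == "#":
            parts.append(r"\d")
        else:
            parts.append(re.escape(ch))
    # \b on both sides so 12345????? cannot match inside a longer number
    return r"\b" + "".join(parts) + r"\b"


def remove_wildcards(text, patterns):
    # All patterns joined into one regex, so the text is scanned only once
    combined = re.compile("|".join(wildcard_to_regex(p) for p in patterns), re.IGNORECASE)
    return combined.sub("", text)


print(repr(remove_wildcards("apple 1234567890 99 1234554321", ["12345?????"])))
# prints: 'apple  99 '
```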

I'll keep playing with it as it's a fab start - so thanks again.
Any subsequent advice would be appreciated.

Thanks



#5
June 8, 2010 at 05:06:44

I'm stuck now trying to remove strings of text that are within the delimiters. As an example, the data file contains:

"01/01/2010","SYSTEM 432563773 9875 ","503998","",123.86,"DD test"

So in the second field, which contains 'SYSTEM<SPACE>432....', I want to change it to, say, '432....'. I was hoping for a wildcard way (or any other way to perform partial matching) to remove text within the " delimiters, i.e. removing Sys* would remove 'SYSTEM' etc.
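For the quoted-CSV case, one possible Python sketch edits only the text between quotes, assuming no field contains an embedded or doubled quote character. `strip_prefix_in_fields` is a hypothetical name; it turns `Sys*` into a case-insensitive whole-word pattern that also consumes the spaces after the word, so it removes a matching word anywhere in a field, not only at its start.

```python
import re

# A double-quoted field with no embedded quote characters (assumption)
QUOTED_FIELD = re.compile(r'"([^"]*)"')


def strip_prefix_in_fields(line, wildcard):
    # 'Sys*' -> \bSys\w*\b\s* : a whole word starting with Sys, plus trailing spaces
    word = re.escape(wildcard).replace(r"\*", r"\w*").replace(r"\?", r"\w")
    inner = re.compile(r"\b" + word + r"\b\s*", re.IGNORECASE)
    # Edit only the text between each pair of quotes; commas, the quotes themselves
    # and unquoted fields such as 123.86 are written back unchanged
    return QUOTED_FIELD.sub(lambda m: '"' + inner.sub("", m.group(1)) + '"', line)


row = '"01/01/2010","SYSTEM 432563773 9875 ","503998","",123.86,"DD test"'
print(strip_prefix_in_fields(row, "Sys*"))
# prints: "01/01/2010","432563773 9875 ","503998","",123.86,"DD test"
```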

Any thoughts anyone?

Thanks,

Andy



#6
June 8, 2010 at 11:43:29

You'll probably need SED (for Windows). I'm not sure of the URL, so just google it if you're interested. There's some learning curve involved, however, regarding regular expressions, which are their own little (well, not so little) universe.
If you want to email me a copy of the variables file (zip or raw text preferred), I may be able to work up a kludge using FINDSTR, which is the only Windows tool for regular-expression evaluation, and a very limited one. It can, however, find strings at the beginnings and ends of words and do wildcards. It can't replace, but it might feed a batch file that can do the replacement. Maybe.


#7
June 8, 2010 at 11:54:55

Thanks for the reply - I've been experimenting today with SED, and a few others (ALTER, FART, GSAR, CHANGE, MTR, SSR etc) and the conclusion I've drawn is that wildcards within the text don't appear to be supported in any tool I've found. Unfortunately the list of variables to replace is quite long, and the data file quite large.

Using the straight DOS scripts (and thanks again nbrane for your help), the loops took too long. Imagine a 1 MB data file with 1000 lines and a variable list of 200 line items: each variable takes ~3 seconds to check the file, which means 200 × 3 = 600 seconds to get through, i.e. 10 minutes!

Fortunately the tool CHANGE is pretty quick, so rather than wildcards I'm using the 200+ line variable list in a batch file, i.e. 200 lines in a batch as follows:

change /i test2.csv /FROM "1234567" /TO 987654
change /i test2.csv /FROM "1234568" /TO 987653
change /i test2.csv /FROM "1234569" /TO 987652

Very cumbersome, but it works!
Thanks again for your assistance

Andy
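The 200-line list of `change` commands is effectively a FROM/TO table. A Python sketch of applying such a table in one scan of the file, assuming exact, case-sensitive literal matches; `replace_pairs` is a hypothetical name, and the pairs are the three shown in the post.

```python
import re


def replace_pairs(text, pairs):
    if not pairs:
        return text
    # Longest keys first, so a key that is a prefix of another cannot win early
    keys = sorted(pairs, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)))
    # One scan of the text; each match is looked up in the FROM -> TO table
    return pattern.sub(lambda m: pairs[m.group(0)], text)


pairs = {"1234567": "987654", "1234568": "987653", "1234569": "987652"}
print(replace_pairs('"x","1234567","1234569"', pairs))
# prints: "x","987654","987652"
```

Unlike a chain of separate commands, replaced text is not rescanned, so one pair's TO value can never be caught by a later FROM value. There are also no word boundaries, so `1234567` inside a longer number would be replaced too.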



#8
June 8, 2010 at 12:40:34

Findstr could possibly narrow down the target file: output the matching lines to an intermediate work file using regular expressions (r/e), run a batch (or "change") on that smaller work file, then plug the work file back into the main file.
For example:
findstr /n /g:variables.txt test2.csv > tempfile
finds every line that contains any of the strings in variables.txt and outputs that line prefixed with its line number.
"Change" would then only have to operate on the tempfile.
Then feed the tempfile to a .bat file which, using the line numbers (delimited by ":") and a counter, splices the changed lines back into the main file.
The main time saving would come from operating on the smaller tempfile, but the trade-off might or might not be worth the effort.
At least you have something working... ;-)
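The findstr narrowing idea can be sketched in Python, where the file is small enough to hold in memory, so splicing back is just writing each edited line at its original index. A minimal sketch: `edit_matching_lines` and its `edit` callback are hypothetical, and the `finder` regex stands in for `findstr /g:variables.txt` using plain substring matching.

```python
import re


def edit_matching_lines(lines, variables, edit):
    if not variables:
        return list(lines)
    # Stands in for: findstr /n /g:variables.txt test2.csv > tempfile
    finder = re.compile("|".join(map(re.escape, variables)))
    hits = [(n, line) for n, line in enumerate(lines) if finder.search(line)]
    result = list(lines)  # untouched lines are copied as-is
    for n, line in hits:
        result[n] = edit(line)  # splice the edited line back at its original index
    return result


data = ["a 1234567", "no match", "b 1234569"]
print(edit_matching_lines(data, ["1234567", "1234569"], str.upper))
# prints: ['A 1234567', 'no match', 'B 1234569']
```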


#9
June 8, 2010 at 16:28:09

Cool! Thanks. If I try that, I'll let you know how I get on.

Thanks,

Andy

