batch processing text files.

Original Message
Name: edman747
Date: April 26, 2007 at 17:58:00 Pacific
Subject: batch processing text files.
OS: xp home
CPU/Ram: p4/1gb
Model/Manufacturer: hp/pavilion
Comment:

Hello,
I have about 100 text files that need to be cleaned up. They contain runs of one or more blank lines, trailing spaces, comment lines that start with a "#", and one byte that I want to change from 0 to 1.

Goal:
1. Delete blank lines.
2. Search and replace a string.
3. Delete comment lines.
4. Remove trailing spaces.

:cleanup.bat
@Echo Off
setLocal EnableDelayedExpansion

For /F "tokens=* delims=" %%A in ('dir /b *.cfg') Do (
Call :sub1 %%A
)
Goto :eof

:sub1
findstr /R /V "^$" %1 >t1.txt
:grep -ve "^$" %1 >t1.txt

findstr /R /V /C:"mp_multiplespawn 0" t1.txt >t2.txt
:sed -e "s/tiplespawn 0/tiplespawn 1/" t1.txt >t2.txt

Del %1
For /F "eol=# tokens=1,2" %%B in (t2.txt) Do (
If not %%C.==. (Echo %%B %%C>>%1) Else Echo %%B>>%1)
Del t1.txt t2.txt

I put this batch file in the directory with the messy text files and double-click it in Windows Explorer. It reads the directory looking for the *.cfg files. For each file it uses grep to strip the blank lines and sed to search for the unique text and replace the one byte.

An unfortunate side effect of using GNU Unix utilities such as grep is that each line now ends with only LF (0A). In Notepad it looks like a little box, and all the lines run together. So the script rewrites the output file and adds a CRLF pair (0D 0A) to each line, which is also a good time to delete comment lines and remove trailing spaces. Can anyone think of a cleaner way to do this? Maybe without using the GNU Unix utilities and/or with fewer temp files.
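
For the LF-only line endings described above, one possible repair is a single pass over the raw bytes. This is only a minimal Python sketch, assuming each file fits in memory; the function name `to_crlf` is made up for this illustration and is not part of any tool mentioned in the thread.

```python
def to_crlf(data: bytes) -> bytes:
    """Return data with every line ending normalized to CRLF (0D 0A)."""
    # Collapse existing CRLF pairs and bare CRs to LF first so nothing is doubled,
    # then expand every LF to a CRLF pair.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


if __name__ == "__main__":
    # Mixed endings, similar to what a grep/sed pass might leave behind.
    sample = b"skill 2\nmp_multiplespawn 0\r\nkillnpc 0\n"
    print(to_crlf(sample))
```

Working on bytes rather than text avoids any guess about the file encoding, since only the 0D and 0A bytes are touched.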

I have commented out grep and sed and am using findstr to delete the blank lines and the matching line. I have not figured out the logic for testing the errorlevel and using the character offset printed before each matching line to build a replace-string routine. Any help?

Note: I tried a free utility, change v9.11, and found it problematic. The control file had to be carefully constructed, with the /from and /to statements in just the right order, or the output files would still have blank lines or spaces. Then I would have to adjust the sequence for the next group of files. For whatever reason, change would not execute when copied to another directory. When running in a command prompt, the screen would sometimes become garbled when calling change within a For loop. It would also overwrite the original file and change the file name to all caps. Maybe it could be used to strip all spaces and replace them with smiles.

partial sample file:
## SvenCo-op map CFG file - HL SP Map
## Map By: Valve

nomaptrans c1a0a
nextmap c1a0e


skill 2
mp_multiplespawn 0
mp_radiovoice 1
mp_allowgaussjump 0
killnpc 0
nomedkit
## Setting of 1 forces players to respawn about a sec after death
mp_forcerespawn 0

Thanks,


Response Number 1
Name: ghostdog
Date: April 26, 2007 at 18:46:31 Pacific
Reply:

If you have GNU awk, you can do this:
[code]
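gawk ' { sub(/[ \t]+$/, "") }   # strip trailing whitespace first
/^$/ { next }                  # skip blank lines
/^#/ { next }                  # skip comment lines
/mp_multiplespawn/ { $2 = 1 }  # change the value from 0 to 1
{ print }
' "file"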
[/code]
Any other tools you have, like Perl, Python, Java, etc., can also easily do this kind of text processing.
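
As an illustration of the Python route, here is a rough sketch of the four cleanup steps from the original post. The function name `clean_cfg_lines` is hypothetical, and it assumes the setting appears exactly as `mp_multiplespawn 0` with a single space, as in the sample file.

```python
def clean_cfg_lines(lines):
    """Apply the four cleanup steps from the original post to an iterable of lines."""
    cleaned = []
    for line in lines:
        line = line.rstrip()              # drop trailing spaces, tabs and the line ending
        if not line:                      # skip blank (or whitespace-only) lines
            continue
        if line.startswith("#"):          # skip comment lines
            continue
        if line == "mp_multiplespawn 0":  # assumed exact form; flip 0 to 1
            line = "mp_multiplespawn 1"
        cleaned.append(line)
    return cleaned


if __name__ == "__main__":
    sample = [
        "## Map By: Valve\n",
        "\n",
        "nomaptrans c1a0a  \n",
        "skill 2\n",
        "mp_multiplespawn 0\n",
        "mp_forcerespawn 0\n",
    ]
    for row in clean_cfg_lines(sample):
        print(row)
```

Comparing the whole line rather than a substring keeps other settings that end in 0, such as `mp_forcerespawn 0`, untouched.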


Response Number 2
Name: IVO
Date: April 27, 2007 at 02:32:53 Pacific
Reply:

The following batch script achieves what you want:

@Echo Off
SetLocal EnableDelayedExpansion
For %%J in (*.cfg) Do (
For /F "tokens=* delims=" %%A in (%%J) Do (
Set Row=%%A
If not "!Row:~0,1!"=="#" (
:LOOP
If "!Row:~-1,1!"==" " (
(Set Row=!Row:~0,-1!) & GoTo :LOOP)
Set Row=!Row:tiplespawn 0=tiplespawn 1!
Echo !Row!>> %%~nJ.new)))


Response Number 3
Name: Mechanix2Go
Date: April 27, 2007 at 06:21:22 Pacific
Reply:

Hi IVO,

After several hours I made one that 'works'.

Yours is much better. [surprise]


=====================================
If at first you don't succeed, you're about average.

M2



Response Number 4
Name: IVO
Date: April 27, 2007 at 06:43:05 Pacific
Reply:

Hi M2,

Ever ready to go back to the sixties and have a "Drag Race" with you, XMK vs Duetto, with Jacqueline Bisset as the gift for the winner!


Response Number 5
Name: edman747
Date: April 27, 2007 at 18:14:51 Pacific
Reply:

Thank you all for your replies. IVO, that is really slick.
It dies on about six of the files, when there is a trailing space after the last token on a line, so I made a small change to remove trailing spaces. It is not exactly silent: it makes a file named "%~nJ.new)))" which contains the line it choked on.

:slick_ivo.bat
@Echo Off
SetLocal EnableDelayedExpansion
For %%J in (*.cfg) Do (
For /F "tokens=1,2" %%A in (%%J) Do (
If not %%B.==. (
Set Row=%%A %%B) Else Set Row=%%A
If not "!Row:~0,1!"=="#" (
:LOOP
If "!Row:~-1,1!"==" " (
(Set Row=!Row:~0,-1!) & GoTo :LOOP)
Set Row=!Row:tiplespawn 0=tiplespawn 1!
Echo !Row!>>%%~nJ.new)))

Also, since I really want to keep the original file names, I call it from another batch file. slick_ivo runs, then the original files are deleted and the *.new files are renamed to *.cfg.

:cleanup.bat
call slick_ivo.bat
del *.cfg
ren *.new *.cfg
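
The delete-then-rename step above can also be folded into one move per file. The following Python sketch is only an illustration under stated assumptions: `rewrite_via_new` is a made-up helper name, the file name `c1a0.cfg` is invented for the demo, and the lines passed in are assumed to be already cleaned.

```python
import os
import tempfile
from pathlib import Path


def rewrite_via_new(path, lines):
    """Write lines to '<name>.new' next to path, then move it over the original."""
    path = Path(path)
    new_path = path.with_suffix(".new")
    # newline="\r\n" makes every "\n" written below come out as a CRLF pair.
    with open(new_path, "w", newline="\r\n") as out:
        for line in lines:
            out.write(line + "\n")
    # os.replace overwrites the destination, so no separate delete step is needed.
    os.replace(new_path, path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "c1a0.cfg"  # hypothetical file name
        cfg.write_bytes(b"skill 2  \n\nkillnpc 0\n")
        rewrite_via_new(cfg, ["skill 2", "killnpc 0"])
        print(cfg.read_bytes())
```

Because the original is only replaced after the .new file has been written completely, a failure partway through leaves the original .cfg intact.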



Response Number 6
Name: edman747
Date: April 28, 2007 at 19:26:26 Pacific
Reply:

Hello,
It seems that using a goto inside a for loop breaks the loop, and %%J becomes undefined. This works a little better for removing one or more trailing spaces.

@Echo Off
SetLocal EnableDelayedExpansion
For %%J in (*.cfg) Do (
For /F "eol=# tokens=* delims=" %%A in (%%J) Do (
Set Row=%%A
call :loop
Set Row=!Row:tiplespawn 0=tiplespawn 1!
Echo !Row!>> %%~nJ.new))
:loop
If "!Row:~-1,1!"==" " (
Set Row=!Row:~0,-1!) & goto :loop
goto :EOF

----
Yet here is my solution. Setting eol=# ignores comment lines. Using tokens=1,2 with spaces as the delimiter strips all trailing spaces, and saves me the two extra if statements.

@Echo Off
SetLocal EnableDelayedExpansion
For %%J in (*.cfg) Do (
For /F "eol=# tokens=1,2" %%A in (%%J) Do (
If not %%B.==. (
Set Row=%%A %%B) Else Set Row=%%A
Set Row=!Row:tiplespawn 0=tiplespawn 1!
Echo !Row!>>%%~nJ.new))


Response Number 7
Name: Mechanix2Go
Date: May 2, 2007 at 17:16:24 Pacific
Reply:

This may be more generally useful. It's not limited to 2 tokens.

::== clean#5.bat
:: remove comments: #blabla, trailing spaces and do str subst
:: lesson learned: echo tiplespawn 0>>file puts tiplespawn to con; stderr?

@echo off
setLocal EnableDelayedExpansion
for %%F in (*.new *.t) do if exist %%F del %%F

for /f "tokens=* delims= " %%c in ('dir/b/a-d *.cfg') do (
set n=%%~nc
for /f "eol=# tokens=* delims= " %%a in (%%c) do (
call :sub1 %%a
)
for /f "tokens=* delims= " %%t in (!n!.t) do (
set s=%%t
set s=!s:tiplespawn 0=tiplespawn 1!
echo !s!>>!n!.new
)
)
goto :eof

:sub1
>> !n!.t echo %*
goto :eof
::==
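
To make the two-token limit mentioned above concrete, here is a small Python sketch of the difference between keeping only the first two words of a line and keeping the whole line. Both function names and the three-word sample value are hypothetical; the sample .cfg in this thread only shows one- and two-word lines.

```python
def first_two_tokens(line):
    """Mimic keeping only tokens 1 and 2: any later words are dropped."""
    return " ".join(line.split()[:2])


def whole_line(line):
    """Keep every word; only leading and trailing whitespace is trimmed."""
    return line.strip()


if __name__ == "__main__":
    row = "hostname My Test Server   "  # hypothetical setting with a multi-word value
    print(repr(first_two_tokens(row)))
    print(repr(whole_line(row)))
```

For the one- and two-word settings in the sample file both approaches give the same result; they only diverge once a value contains spaces.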



=====================================
If at first you don't succeed, you're about average.

M2


