Computing.Net > Forums > Programming > remove duplicate blank lines in.txt

remove duplicate blank lines in.txt

Original Message
Name: ctt2
Date: April 29, 2007 at 23:43:40 Pacific
Subject: remove duplicate blank lines in.txt
OS: windows xp
CPU/Ram: p4 3.2ghz
Model/Manufacturer: intel
Comment:

I need help writing a batch file that will eliminate duplicate blank lines in thousands of .txt files, starting in one directory and scanning all of its subdirectories for text files to process the same way.

For example, most of the text files look like this:

Subject: blah blah


1. part 1
2. part 2
3. part 3


I want to eliminate the 2nd, 3rd, 4th, etc. carriage returns so it looks like:

Subject: blah blah

1. part 1
2. part 2
3. part 3

more text from another block that had 6 carriage returns previously.
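
The transformation described above can be stated as a rule: wherever two or more blank lines occur in a row, keep only the first. Below is a minimal Python sketch of that rule. It is an editorial illustration rather than one of the batch solutions from this thread, the function name `collapse_blank_lines` is made up for the example, and it assumes a line counts as blank only when it is completely empty (no spaces or tabs).

```python
def collapse_blank_lines(text):
    """Return text with each run of consecutive blank lines reduced to one."""
    result = []
    previous_blank = False
    # keepends=True preserves each line's own terminator ("\n" or "\r\n").
    for line in text.splitlines(keepends=True):
        is_blank = line.rstrip("\r\n") == ""
        # Drop a blank line only when the line before it was also blank.
        if is_blank and previous_blank:
            continue
        result.append(line)
        previous_blank = is_blank
    return "".join(result)


sample = "Subject: blah blah\n\n\n1. part 1\n2. part 2\n3. part 3\n\n\n\nmore text\n"
print(collapse_blank_lines(sample))
```

Tracking a single "previous line was blank" flag keeps the logic to one pass over the text, which should matter little for small files but avoids rereading anything when there are thousands of them.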




Response Number 1
Name: Mike (by mmcconaghy)
Date: April 30, 2007 at 04:15:10 Pacific
Subject: remove duplicate blank lines in.txt
Reply:

Try this:

sed "/^$/{
N
/^\n$/D
}" oldfile > newfile

The \n is the Unix newline. If you're in DOS/Windows you may also need to account for the carriage return, which I believe is \r (often displayed as ^M), but check to be sure.

This is a fancier way of doing the same thing:

sed "/^$/{
:start
N
s/^\n$//
t start
}" oldfile > newfile
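
The caveat about DOS/Windows line endings matters because a blank line in such a file is "\r\n" rather than a bare "\n". As a rough illustration in Python rather than sed (an editorial sketch, not the method posted above), a regular expression over the raw bytes can accept either ending. The names `BLANK_RUN` and `squeeze_blank_lines` are invented for this example.

```python
import re

# A run of two or more empty lines, each ending in "\n" or "\r\n".
# With re.MULTILINE, "^" matches at the start of the data and after each "\n".
BLANK_RUN = re.compile(rb"^(\r?\n)(?:\r?\n)+", re.MULTILINE)


def squeeze_blank_lines(data):
    """Replace each run of empty lines in a bytes object with its first line."""
    return BLANK_RUN.sub(rb"\1", data)


unix_text = b"Subject: blah blah\n\n\n1. part 1\n"
dos_text = b"Subject: blah blah\r\n\r\n\r\n1. part 1\r\n"
print(squeeze_blank_lines(unix_text))
print(squeeze_blank_lines(dos_text))
```

Working on bytes means the file's original line endings are written back untouched, whichever kind they are.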




Response Number 2
Name: edman747
Date: April 30, 2007 at 05:10:15 Pacific
Subject: remove duplicate blank lines in.txt
Reply:

Hello,
Not sure if this is what you want. It will remove all blank lines, then insert one blank line after each line that contains the string "subject".

::one_blankline.bat
@echo off
SetLocal EnableDelayedExpansion

For /F "tokens=* delims=" %%A in (sample.txt) Do (
echo %%A|find /I "subject" >nul
if errorlevel 1 (
echo %%A>>sample.new
) else (
echo %%A>>sample.new
echo.>>sample.new))



Response Number 3
Name: Mechanix2Go
Date: April 30, 2007 at 05:18:19 Pacific
Subject: remove duplicate blank lines in.txt
Reply:

There's probably some slick way to do it with plain vanilla batch, but I don't immediately see how. I guess others here may figure it out.

Meanwhile, there are 3rd party utils that will do it.

http://golden-triangle.com/UNIQUE.COM

and use this bat:

::== blank3.bat
@echo off
setLocal EnableDelayedExpansion

for /f "tokens=* delims= " %%a in ('dir/s/b/a-d *.txt') do (
unique < %%a > %%~na.new
)
::==


=====================================
If at first you don't succeed, you're about average.

M2




Response Number 4
Name: ctt2
Date: April 30, 2007 at 06:48:38 Pacific
Subject: remove duplicate blank lines in.txt
Reply:

Wow, you guys are fast. I will try these methods out and let you know what happens. Thank you very much for your help!



Response Number 5
Name: ctt2
Date: April 30, 2007 at 07:10:19 Pacific
Subject: remove duplicate blank lines in.txt
Reply:

Mechanix2Go, your method is working. When I run the bat file in a folder with no subfolders, it makes a .new file for each text file in that folder.

When I run the bat file in a higher-level folder, it creates .new files for all the files in the subfolders, but it puts them in that higher-level folder.

Can it be changed so the files in the subfolders stay there, the old .txt gets renamed to .old, and the .new files get renamed to .txt?

If this is a big pain or a time-consuming task, never mind; I can live with it as it is.

Thanks for your help.
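
The change requested here (process every subfolder, leave each result beside its source, and keep the original as .old) can be sketched in Python as follows. This is an editorial illustration, not one of the batch files posted in this thread. `clean_tree` and `collapse_blank_lines` are hypothetical names, files are handled as raw bytes so both \n and \r\n endings survive, and the sketch assumes any existing .old file with the same name is safe to overwrite.

```python
import os


def collapse_blank_lines(lines):
    """Yield lines (bytes), skipping a blank line that directly follows another."""
    previous_blank = False
    for line in lines:
        is_blank = line.strip(b"\r\n") == b""
        if not (is_blank and previous_blank):
            yield line
        previous_blank = is_blank


def clean_tree(root):
    """Rewrite every .txt file under root in place, keeping the original as .old."""
    cleaned = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(".txt"):
                continue
            txt_path = os.path.join(folder, name)
            old_path = os.path.splitext(txt_path)[0] + ".old"
            # Assumption: overwriting an existing .old file is acceptable.
            os.replace(txt_path, old_path)
            # The cleaned copy is written into the same folder as its source.
            with open(old_path, "rb") as src, open(txt_path, "wb") as dst:
                dst.writelines(collapse_blank_lines(src))
            cleaned.append(txt_path)
    return cleaned
```

Calling `clean_tree` on the top-level folder returns the list of rewritten .txt paths. Trying it on a copy of the data first would be prudent, since the originals are renamed.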




Response Number 6
Name: edman747
Date: April 30, 2007 at 09:32:02 Pacific
Subject: remove duplicate blank lines in.txt
Reply:

Hello,
This will remove the extra blank lines. It processes one or more sample*.txt files in the current directory and then processes each subdirectory, creating sample.new, sample1.new, subdir\sample2.new, subdir\sample3.new, and so on from the original file names. The find "subject" line is a marker: it tells the script where to insert one blank line, right after the subject line.

::process_textfiles.bat
@echo off
SetLocal EnableDelayedExpansion

For /F %%J in ('dir/s/b sample*.txt') Do (
For /F "tokens=* delims=" %%A in (%%J) Do (
echo %%A|find/I "subject" >nul
if errorlevel 1 (
echo %%A>>%%~dpnJ.new
) else (
echo %%A>>%%~dpnJ.new && echo.>>%%~dpnJ.new)))


A second batch file acts as a driver and renames the old and new files after processing. This is the one you would run:

::call_process_textfiles.bat
call process_textfiles.bat
For /F %%J in ('dir/b/s sample*.txt') Do (
copy /y %%J %%~dpnJ.old >nul && del %%J)
For /F %%E in ('dir/b/s sample*.new') Do (
copy /y %%E %%~dpnE.txt >nul && del %%E)

One of our sys admins is always trying to get me to back up files before I begin testing. It is usually not the best idea to overwrite your original files, but this will do it. You may want to back up the directory first.
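
As a hedged illustration of backing up the directory before running anything destructive, here is a small Python sketch (an editorial addition, not part of the batch files above). The function name `backup_tree` and the "_backup_" naming scheme are assumptions made for the example.

```python
import os
import shutil
import time


def backup_tree(root):
    """Copy root to a sibling folder with a timestamp suffix; return its path."""
    root = os.path.normpath(root)
    stamp = time.strftime("%Y%m%d_%H%M%S")
    destination = root + "_backup_" + stamp
    # copytree raises an error if the destination already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(root, destination)
    return destination
```

The copy lands next to the original folder, so the drive needs enough free space for a second copy of the text files.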

hope this helps.




Response Number 7
Name: ctt2
Date: April 30, 2007 at 20:27:54 Pacific
Subject: remove duplicate blank lines in.txt
Reply:

edman, thank you for your help on this. Your 2nd batch file to rename the files will be helpful. The first one does not really apply to what I wanted. I should have been clearer in the original message: the text in these documents varies widely. One document could say
"
how to tame a wild elephant

in the wild you would do this but at a zoo you do this
"

and another text document could say

"
walking up the street with your friends is fun

you can kick over trash cans and look at speeding cars
"

In both cases I would want the 5 or so carriage returns eliminated and brought down to 1. That was the point of my original message. The batch file that Mechanix2go wrote works perfectly for this. The only problems I had were the ones mentioned in my later post. It appears as though you solved this for me with the batch file that renames the .new files to .txt and the .txt to .old.

The only other thing I need the original batch file to do is keep the .txt files it is working on and the new files it produces in the same subfolder when it is run in a directory with subdirectories.

For example, upon execution in a folder with these subfolders and files:
\Atlanta\
    \1.txt
    \atlantafools.txt
    \atlantabasketball.txt
\Boxes\
    \bigbox.txt
    \cardboard.txt
\Cards\
    \collecting.txt
    \values.txt

It would produce
\Atlanta\
    \1.txt
    \1.new
    \atlantafools.txt
    \atlantafools.new
    \atlantabasketball.txt
    \atlantabasketball.new
\Boxes\
    \bigbox.txt
    \bigbox.new
    \cardboard.txt
    \cardboard.new
\Cards\
    \collecting.txt
    \collecting.new
    \values.txt
    \values.new

Then I would run the batch file to convert the extensions of .txt to .old and .new to .txt, and we would end up with:
\Atlanta\
    \1.old
    \1.txt
    \atlantafools.old
    \atlantafools.txt
    \atlantabasketball.old
    \atlantabasketball.txt
\Boxes\
    \bigbox.old
    \bigbox.txt
    \cardboard.old
    \cardboard.txt
\Cards\
    \collecting.old
    \collecting.txt
    \values.old
    \values.txt

With the batch files you have provided so far and some simple file management I can now complete my tasks quickly and efficiently. Thank you all for your help and consider my issue solved.

This entire forum topic should be tagged with keywords like:

duplicate line removal, extra line removal, extra carriage return removal, extra blank line removal, multi-file extra blank line removal

It was almost impossible to find software that would do it. Then I found you guys. :-)



Response Number 8
Name: Mechanix2Go
Date: May 2, 2007 at 14:00:32 Pacific
Subject: remove duplicate blank lines in.txt
Reply:

::== no-two3.bat
@echo off
setLocal EnableDelayedExpansion

for /f "tokens=* delims= " %%d in ('dir/s/b/ad \temp') do (
pushd "%%d"
copy *.txt *.old
for /f "tokens=* delims= " %%a in ('dir/b/a-d *.old') do (
unique < %%a > %%~na.txt
)
popd
)
::==



=====================================
If at first you don't succeed, you're about average.

M2




Response Number 9
Name: nshram
Date: June 11, 2007 at 23:01:03 Pacific
Subject: remove duplicate blank lines in.txt
Reply:

Hello!

I saw your message regarding duplicate blank line removal. I was also looking for a solution. You can use the MORE command with the /S option, which does the job! See the batch file below:

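for /r .\projects %%X in (*.txt) do (
rem /S squeezes multiple blank lines into one; write next to the source file
more < "%%X" /S > "%%~dpnX.tmp"
del "%%X"
rem rename the temporary file back to the original .txt name
ren "%%~dpnX.tmp" "%%~nX.txt"
)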

Here "projects" is your directory containing the text files and/or more sub directories also containing text files.

Pure batch without extra tools! Hope it helps.



Response Number 10
Name: Mechanix2Go
Date: June 12, 2007 at 02:29:35 Pacific
Subject: remove duplicate blank lines in.txt
Reply:

Interesting use of MORE. How do you keep it from pausing?


=====================================
If at first you don't succeed, you're about average.

M2


