Remove characters from a csv file

September 28, 2010 at 20:16:40
Specs: Windows XP, P4
Hello,

I have a csv file which consist of tab delimited file. I want to delete the double quote on the first line and the double quote after the word on the first line and the comma's following it. Delete the second line of the text file and the extra comma on the end of the line

I have made this but with by using many batch file 1 for the header, next is for deleting the 2nd line, 3rd is for getting the body and last batch is for removing the last comma on the eof.

Is there a way to do it only using 1 batchfile only and if possible to do it on all csv files inside a folder

Any help would be very much appreciated. Please see example below.

"*****abc,defg,hih,sfgsdf,sfdsfs,sdfsf,,sdfsf*****",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
LI,Seq.No,Inf_Flag,Inf Country,Sender_Qual,Sender Code,Receiv. Qual.,Reciever,<
LI=,1,1,GB,SA,0100076004,SA,20100916,,6,82334913,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,2,1,GB,SA,0100076004,SA,20100916,,7,81962287,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,3,1,GB,SA,0100076004,SA,20100916,,39,77620733,,LN1709AC,,,,PCV,3,0,,,,,,,,,<
LI=,4,1,GB,SA,0100076004,SA,20100916,,5,82237935,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,5,1,GB,SA,0100076004,SA,20100916,,10,82481259,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,6,1,GB,SA,0100076004,SA,20100906,,29,82266076,,LN1709AC,,,,PCV,3,0,,,,,,,,,<
*****+NORC+6+SQTY+96+EOF,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

Thanks for the Big Help in advance
Tsirhclig


See More: Remove characters from a csv file

Report •

#1
September 29, 2010 at 02:09:51
download sed for windows then do this

C:\test> sed -n "1{s/\"//g};2!{p}" file
*****abc,defg,hih,sfgsdf,sfdsfs,sdfsf,,sdfsf*****,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
LI=,1,1,GB,SA,0100076004,SA,20100916,,6,82334913,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,2,1,GB,SA,0100076004,SA,20100916,,7,81962287,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,3,1,GB,SA,0100076004,SA,20100916,,39,77620733,,LN1709AC,,,,PCV,3,0,,,,,,,,,<
LI=,4,1,GB,SA,0100076004,SA,20100916,,5,82237935,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,5,1,GB,SA,0100076004,SA,20100916,,10,82481259,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,6,1,GB,SA,0100076004,SA,20100906,,29,82266076,,LN1709AC,,,,PCV,3,0,,,,,,,,,<
*****+NORC+6+SQTY+96+EOF,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

GNU win32 packages | Gawk


Report •

#2
September 29, 2010 at 03:03:48
Sorry not familiar with SED. It will be better if it can be done thru a batchfile - thanks for the fast reply ghostdog

Report •

#3
September 29, 2010 at 04:22:41
Test this. Note that when the REM commands are removed the original input .csv file(s) will be deleted and .new files created are then renamed to the original filename. Please ensure that the .new files are formatted correctly before removing the REM commands. Strongly advise you to backup original input .csv files just in case. No allowance has been made for filenames containing spaces.

I wasn't sure what you meant by for removing the last comma on the eof. the script will remove a single comma on the final line of the example you posted.

@echo off
cls
setlocal enabledelayedexpansion

pushd Path\to\csv\files\||echo Pushd failed - job terminated&&exit /b

set nbr=1
for /f "tokens=*" %%1 in ('dir /b *.csv') do (
    set fil=%%1

    if !nbr! equ 1 (
       set /p line1=<%%1
       set line1=!line1:"=#!
       )

       for /f "tokens=1* delims=#" %%A in ("!line1!") do (
           set line1=%%A
           )

       set /a nbr+=1
       >!fil!.new echo !line1!
       )
                                                 
    for /f "tokens=*" %%1 in ('dir /b *.csv') do (
        set fil=%%1

        for /f "skip=2 tokens=*" %%A in (!fil!) do (
            set newline=%%A
            
        if "!newline:~0,5!" equ "*****" (
           set newline=!newline:~0,-1!
           )

           >>!fil!.new echo !newline!
           set nbr=1

))

for /f "tokens=*" %%1 in ('dir /b *.new') do (
    set oldfile=%%~n1
    set newfile=%%1

    echo Old file name = !oldfile!
    echo New file name = !newfile!
    echo.
REM    del !oldfile!
REM    ren !newfile! !oldfile!
)


Report •

Related Solutions

#4
September 29, 2010 at 04:37:02
>>It will be better if it can be done thru a batchfile

No, it would not be better because batch is not suited to manipulate files and strings. You can only do that much with it before every thing gets slow, messy and hard to maintain.

sed is just a file processing tool. Just like windows tools like findstr, ping, etc....
Just download it once only and use anywhere. Its just an exe file. Put the command in your batch file and you can run it as well.

If you are adamant to install good file processing tools, you can use native vbscript

Set objFS = CreateObject("Scripting.FileSystemObject")
strFolder="c:\test"
Set objFolder = objFS.GetFolder(strFolder)
For Each strFile In objFolder.Files
	If objFS.GetExtensionName(strFile) = "csv" Then	   
		strFileName = strFile.Name
		Set objFile = objFS.OpenTextFile(strFileName)
		Set objTempFile = objFS.CreateTextFile("temp",True)
       	Do Until objFile.AtEndOfStream
			linenum=objFile.Line
			strLine=objFile.ReadLine			
			If linenum = 1 Then				
				strLine=Replace(strLine,"""","")
				WScript.Echo strLine
				objTempFile.Write(strLine & vbCrLf)
			ElseIf linenum <> 2 Then
				WScript.Echo strLine	
				objTempFile.Write(strLine & vbCrLf)
			End If 
		Loop  	   
		objFile.Close
		objTempFile.Close
		objFS.DeleteFile(strFile)
		objFS.MoveFile "temp", strFileName 
	End If	
Next 

usage:

c:\test> cscript //nologo myscript.vbs

GNU win32 packages | Gawk


Report •

#5
September 29, 2010 at 04:50:20
@Wahine, why do you need so many for loops, especially you are using it 2 times to iterate the csv files... ?

GNU win32 packages | Gawk


Report •

#6
September 29, 2010 at 19:38:54
I have used VBS but the computers here in the office doesnt allow it to run thats why i insisted on using batch file instead. I think SED is a really great tool but im not familiar on how to use it. I already installed it on my computer but only seen GAWK as and executable file and not SED. Sorry for my ignorance on SED or GAWK. Thanks guys for all the replies, i appreaciate it.

Report •

#7
September 29, 2010 at 20:14:42
if its official work that you are doing, ask your administrator to enable vbscript to run.

GNU win32 packages | Gawk


Report •

#8
September 30, 2010 at 19:00:30
Its official but they didnt allow vbs to run on PC's here due to the ISMS - by the way the batch file above didnt work - tnx

Report •

#9
September 30, 2010 at 20:01:18
@tsirhclig - by the way the batch file above didnt work

Please advise what went wrong e.g. no output etc. Were Old and New filenames displayed?


Report •

#10
September 30, 2010 at 22:48:37
Actually nothing happened. the batch is exiting on the first pushd command - thanks

Report •

#11
September 30, 2010 at 23:40:17
@tsirhclig -the batch is exiting on the first pushd command

pushd Path\to\csv\files\||echo Pushd failed - job terminated&&exit /b
      =================

If the batch script exits after displaying the message Pushd failed - job terminated it is because the Path\to\csv\files\ which you have entered as a replacement does not exist. You must replace this with the actual path to the csv files which is not known to us.

Please advise outcome.


Report •

#12
October 1, 2010 at 01:32:24
Oh ok my bad, i'll try this ASAP

Report •

#13
October 3, 2010 at 19:26:50
The many comma on the end of the line on the 1st line and end line was not removed. Plus the asterisk at the end of the line in line 1 should still be there. Please see below output - tnx

*****+NORC+6+SQTY+96+EOF,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
LI=,1,1,GB,SA,100076004,SA,5000480620,90,1,20100916,,6,82334913,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,2,1,GB,SA,100076004,SA,10029075,90,1,20100916,,7,81962287,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,3,1,GB,SA,100076004,SA,5000482373,90,1,20100916,,39,77620733,,LN1709AC,,,,PCV,3,0,,,,,,,,,<
LI=,4,1,GB,SA,100076004,SA,5000482242,90,1,20100916,,5,82237935,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,5,1,GB,SA,100076004,SA,100261074,90,1,20100916,,10,82481259,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,6,1,GB,SA,100076004,SA,5000482242,90,1,20100906,,29,82266076,,LN1709AC,,,,PCV,3,0,,,,,,,,,<
*****+NORC+6+SQTY+96+EOF,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


Report •

#14
October 4, 2010 at 01:38:18
Geez, I don't understand how the last line became the first line at all. In post #3 I stated I did not understand for removing the last comma on the eof. so wrote the script to remove only the LAST comma not all commas.

Try this:

@echo off
cls
setlocal enabledelayedexpansion

pushd Path\to\.csv\files\||echo Pushd failed - job terminated&&exit /b

for /f "tokens=*" %%1 in ('dir /b *.csv') do (
    set fil=%%1

    set /p line1=<%%1
    set line1=!line1:"=#!
       
    for /f "tokens=1* delims=#" %%A in ("!line1!") do (
        >!fil!.new echo %%A
        )

    for /f "skip=2 tokens=*" %%A in (!fil!) do (
        set newline=%%A

        if "!newline:~0,5!" equ "*****" (
            for /f "tokens=1* delims=," %%A in ("!newline!") do (
                set newline=%%A
                ))

    >>!fil!.new echo !newline!
))

for /f "tokens=*" %%1 in ('dir /b *.new') do (
    set oldfile=%%~n1
    set newfile=%%1

    echo Old file name = !oldfile!
    echo New file name = !newfile!
    echo.
REM    del !oldfile!
REM    ren !newfile! !oldfile!
)

The input file is as you posted:

"*****abc,defg,hih,sfgsdf,sfdsfs,sdfsf,,sdfsf*****",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
LI,Seq.No,Inf_Flag,Inf Country,Sender_Qual,Sender Code,Receiv. Qual.,Reciever,<
LI=,1,1,GB,SA,0100076004,SA,20100916,,6,82334913,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,2,1,GB,SA,0100076004,SA,20100916,,7,81962287,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,3,1,GB,SA,0100076004,SA,20100916,,39,77620733,,LN1709AC,,,,PCV,3,0,,,,,,,,,<
LI=,4,1,GB,SA,0100076004,SA,20100916,,5,82237935,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,5,1,GB,SA,0100076004,SA,20100916,,10,82481259,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,6,1,GB,SA,0100076004,SA,20100906,,29,82266076,,LN1709AC,,,,PCV,3,0,,,,,,,,,<
*****+NORC+6+SQTY+96+EOF,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

Output is:

*****abc,defg,hih,sfgsdf,sfdsfs,sdfsf,,sdfsf*****
LI=,1,1,GB,SA,0100076004,SA,20100916,,6,82334913,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,2,1,GB,SA,0100076004,SA,20100916,,7,81962287,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,3,1,GB,SA,0100076004,SA,20100916,,39,77620733,,LN1709AC,,,,PCV,3,0,,,,,,,,,<
LI=,4,1,GB,SA,0100076004,SA,20100916,,5,82237935,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,5,1,GB,SA,0100076004,SA,20100916,,10,82481259,,LN1709AC,,,,PCV,2,0,,,,,,,,,<
LI=,6,1,GB,SA,0100076004,SA,20100906,,29,82266076,,LN1709AC,,,,PCV,3,0,,,,,,,,,<
*****+NORC+6+SQTY+96+EOF


Report •

Ask Question