Solved How to speed up a file conversion bat script

December 27, 2015 at 02:36:41
Specs: Windows 7, i5 2520M / 8 GB
I have a bat script that converts a file. The file contains about 7 million records and is about 241 MB in size.

The script only strips the first 3 bytes from each record, but it takes about 1.5 hours to run, which is very slow. How can I improve the performance of this script?

The script is below:
echo off
setlocal enabledelayedexpansion
set infile=Input-File.csv
set outfile=Output-File.csv

del %outfile%

for /f "tokens=1 delims=" %%C IN (%infile%) do (
    set out-rec1=%%C
    set out-rec1=!out-rec1:~3!
    echo !out-rec1! >> %outfile%
)

:end
pause

Sample input data:

IN,NAME,ID,ACCT
w ,wxyz,A1234567890,5555444433332222
$ ,ABCD,A2345678901,5555333344441111
$ ,zzzz,D1345678901,5555111122223333
# ,sssssss,A1341678931,5555999922223333
q ,wwww,A1267109732,4444111188880000
$ ,uuuuuuu,B1327630912,3333111166662222

Sample output file:
NAME,ID,ACCT
wxyz,A1234567890,5555444433332222
ABCD,A2345678901,5555333344441111
zzzz,D1345678901,5555111122223333
sssssss,A1341678931,5555999922223333
wwww,A1267109732,4444111188880000
uuuuuuu,B1327630912,3333111166662222

How can I modify this script to improve its performance? Please help me resolve this issue. Thanks!




#1
December 27, 2015 at 08:51:07
Not natively with Batch, but if PowerShell's an option, you could probably crunch 7 million lines in under 30 seconds with this:
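# Open the files via .NET; "$pwd\" gives full paths, since .NET does not follow PowerShell's current location
$in = [IO.StreamReader]"$pwd\some.csv"
$out = [IO.StreamWriter]"$pwd\out.csv"
try {
    # Stream one line at a time so the whole file never has to sit in memory
    while (!$in.EndOfStream) {
        # Drop the first 3 characters; Remove throws on a line shorter than 3 characters
        $out.WriteLine($in.ReadLine().Remove(0,3))
    }
} finally {
    # Close both files even if the loop fails, so buffered output is flushed
    $in.Close()
    $out.Close()
}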
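
One thing to watch in the loop above: Remove(0,3) throws on any line shorter than three characters. Below is a minimal sketch of a guarded version, wrapped in a function so the paths aren't hard-coded. The function name Remove-LinePrefix and its parameters are made up for illustration, and writing an empty line for a too-short record is an assumption about what you'd want, not something the original poster specified.

```powershell
# Hypothetical helper; the name and parameters are illustrative only.
function Remove-LinePrefix {
    param(
        [string]$InPath,   # full path to the input file
        [string]$OutPath,  # full path to the output file
        [int]$Count = 3    # number of leading characters to drop
    )
    $reader = New-Object IO.StreamReader $InPath
    try {
        $writer = New-Object IO.StreamWriter $OutPath
        try {
            # ReadLine returns $null once the end of the file is reached
            while ($null -ne ($line = $reader.ReadLine())) {
                # Clamp the start index so short lines become empty lines
                # instead of throwing (assumed behavior, see note above)
                $start = [Math]::Min($Count, $line.Length)
                $writer.WriteLine($line.Substring($start))
            }
        } finally {
            $writer.Close()
        }
    } finally {
        $reader.Close()
    }
}
```

It would be called with full paths, for example: Remove-LinePrefix -InPath "$pwd\Input-File.csv" -OutPath "$pwd\Output-File.csv"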

How To Ask Questions The Smart Way

message edited by Razor2.3



#2
December 27, 2015 at 18:22:45
Thanks for your assistance. I tested this PowerShell script but got an error message from PowerShell: "can't execute command" (translated from Chinese). Could you help me resolve this issue?


#3
December 27, 2015 at 23:36:58
Hi
I fixed the PowerShell issue, but the script fails when I run it in the PowerShell environment. The error message says that a record contains a Unicode character that can't be translated. Could you tell me how to redirect records to different files depending on their code page, including the error records that don't belong to any code page?

Our files may use ASCII, UTF-8, or Unicode code pages. Thanks!



#4
December 28, 2015 at 06:40:15
✔ Best Answer
Not sure what you're asking. StreamReader will look for an encoding BOM (byte order mark) in the file, and use UTF-8 if it doesn't find one. Unless told otherwise, StreamWriter will write files in UTF-8 without a BOM.

If you want to specify the encoding yourself, you'd have to change

$in = [IO.StreamReader]"$pwd\some.csv"
$out = [IO.StreamWriter]"$pwd\out.csv"
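to (where $desiredEncoding holds a System.Text.Encoding object, for example [Text.Encoding]::Unicode for UTF-16)
$in = New-Object IO.StreamReader "$pwd\some.csv", $desiredEncoding
$out = New-Object IO.StreamWriter "$pwd\out.csv", $false, $desiredEncoding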

If your question is how to keep the output file in the same encoding as the one detected for the input file, I'd suggest

$in = [IO.StreamReader]"$pwd\some.csv"
$in.Peek() | Out-Null
$out = New-Object IO.StreamWriter "$pwd\out.csv", $false, $in.CurrentEncoding
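
Putting the three lines above together with the read/write loop, a rough sketch of the whole thing might look like this. The function name Convert-PreservingEncoding and its parameters are hypothetical, chosen only so the paths aren't hard-coded. Note that if the input has no BOM, the reader falls back to UTF-8, and the writer may then add a UTF-8 BOM that the input did not have.

```powershell
# Hypothetical wrapper; the name and parameters are illustrative only.
function Convert-PreservingEncoding {
    param(
        [string]$InPath,   # full path to the input file
        [string]$OutPath   # full path to the output file
    )
    $reader = New-Object IO.StreamReader $InPath
    try {
        # Peek forces the first read, which is when BOM detection happens;
        # before that, CurrentEncoding only reports the default
        $reader.Peek() | Out-Null
        # $false = overwrite rather than append
        $writer = New-Object IO.StreamWriter $OutPath, $false, $reader.CurrentEncoding
        try {
            while (!$reader.EndOfStream) {
                # Same transformation as earlier in the thread:
                # drop the first 3 characters of every line
                $writer.WriteLine($reader.ReadLine().Remove(0,3))
            }
        } finally {
            $writer.Close()
        }
    } finally {
        $reader.Close()
    }
}
```

Usage would be along the lines of: Convert-PreservingEncoding -InPath "$pwd\some.csv" -OutPath "$pwd\out.csv"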

How To Ask Questions The Smart Way

message edited by Razor2.3



