batch merge/substitute tokens

Original Message
Name: maxbre
Date: March 20, 2008 at 08:01:07 Pacific
Subject: batch merge/substitute tokens
OS: win xp
CPU/Ram: 2 gb
Comment:

Hi all you great masters of batch scripting!

Let's suppose I have a file named 'output_utm.txt' with 5 tokens per line, of which I need to process the first two (in fact a fairly complex computation, performed by another batch using a specific *.exe).
The result of this processing (of the first two tokens) is then temporarily stored in a file called gbo.txt.
Up to this point I have managed quite well, but now comes the tricky part: I need to merge/substitute the new tokens into the corresponding positions of the old file 'output_utm.txt' and save everything in a new file 'output_gbo.txt'.
My rough attempt is the following:

::batch merge/substitute tokens

@Echo Off > output_gbo.txt

setLocal EnableDelayedExpansion

For /F "skip=2 tokens=1-5" %%a in (output_utm.txt) Do (
set /a N+=1
if !N! equ 1 (echo %%a %%b %%c %%d %%e >> output_gbo.txt
) else (
For /F "skip=1 tokens=1-2" %%f in (gbo.txt) Do (
echo %%f %%g %%c %%d %%e >> output_gbo.txt
)
)
)
:: end

But obviously this messes everything up, because the nested 'for' produces a 'cartesian product' of the lines, so I need some way to restrict it...
Any help?
Thank you
max
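
The cartesian-product effect described above can be reproduced outside batch. A minimal sketch in Python (a language not used elsewhere in this thread; the sample rows and function names are made up for illustration): a nested loop pairs every data line of one file with every data line of the other, while walking both in step gives one output line per input line.

```python
# Illustrative rows only; the real data lives in output_utm.txt and gbo.txt.
utm_rows = [["a1", "b1", "c1", "d1", "e1"],
            ["a2", "b2", "c2", "d2", "e2"],
            ["a3", "b3", "c3", "d3", "e3"]]
gbo_rows = [["a1new", "b1new"],
            ["a2new", "b2new"],
            ["a3new", "b3new"]]

def nested_merge(utm, gbo):
    """Mimics one FOR loop nested inside another: every row meets every row."""
    return [new[:2] + old[2:] for old in utm for new in gbo]

def paired_merge(utm, gbo):
    """Walks both lists in step, so row i only ever meets row i."""
    return [new[:2] + old[2:] for old, new in zip(utm, gbo)]

print(len(nested_merge(utm_rows, gbo_rows)))  # 9 lines: 3 x 3
print(len(paired_merge(utm_rows, gbo_rows)))  # 3 lines: one per row
```

Batch has no direct equivalent of walking two files in step, which is why the replies below either number the lines and match on the number, or avoid the second file altogether.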


Response Number 1
Name: Mechanix2Go
Date: March 20, 2008 at 08:55:09 Pacific

I'm not quite with you. It may help to post the original files and the desired output.


=====================================
If at first you don't succeed, you're about average.

M2


Response Number 2
Name: maxbre
Date: March 20, 2008 at 09:12:26 Pacific

hi M2

1 - sample of the original file 'output_utm.txt'

x y m_1h max_1h hh
724450 5022775 0.375 14.172 61


2 - sample of 'gbo.txt' (the result of processing the first two tokens of 'output_utm.txt')

x y
1724898.270 5022597.66


3 - sample of the desired final output 'output_gbo.txt' (a blend of the first two tokens from 'gbo.txt' and the remaining tokens from 'output_utm.txt')

x y m_1h max_1h hh
1724898.270 5022597.661 0.375 14.172 61

practically, my problem is how to deal with point 3

cheers

max


Response Number 3
Name: Mechanix2Go
Date: March 20, 2008 at 10:31:47 Pacific

uh... I think I'm having a senior moment.

Let's say you have two files, A.txt and B.txt, which contain:

::== A.txt

one two three four five six
seven eight nine ten eleven twelve

::== B.txt

thirteen fourteen fifteen sixteen seventeen eighteen
nineteen twenty twentyone twentytwo twentythree twentyfour

::==

What does the output need to be?


=====================================
If at first you don't succeed, you're about average.

M2


Response Number 4
Name: ghostdog
Date: March 20, 2008 at 20:45:06 Pacific

if you can use gawk, download and install from here: http://gnuwin32.sourceforge.net/pac...

assuming the header -> x y m_1h max_1h hh is not included.

FNR==NR{ a[FNR]=$3" "$4" "$5; next}
{
print $1,$2,a[FNR]
}


save the above as script.awk and run it from the command line:

C:\test>gawk -f script.awk output_utm.txt gbo.txt
1724898.270 5022597.66 0.375 14.172 61


Response Number 5
Name: maxbre
Date: March 21, 2008 at 00:31:45 Pacific

hi M2 and ghostdog

to m2
sorry it's my fault I did not explain myself well;

I have the file a.txt with headers a b c d e
a b c d e
a1 b1 c1 d1 e1
a2 b2 c2 d2 e2
a3 b3 c3 d3 e3
a4 b4 c4 d4 e4
a5 b5 c5 d5 e5

I have the file b.txt with headers a b
a b
a1new b1new
a2new b2new
a3new b3new
a4new b4new
a5new b5new

I want to get the file c.txt like

a b c d e
a1new b1new c1 d1 e1
a2new b2new c2 d2 e2
a3new b3new c3 d3 e3
a4new b4new c4 d4 e4
a5new b5new c5 d5 e5

to ghostdog
I very much appreciate your hints on gawk; I have already installed it and I'm really struggling to use it, but for now I do not feel very confident with it (nor with batch, actually); I'm still in the middle of the gawk manual! In any case your help is a good way to learn this powerful language; thank you again for that.
At the moment I'm also interested in comparing different solutions (batch vs gawk) and learning as much as possible...

bye

max



Response Number 6
Name: ghostdog
Date: March 21, 2008 at 01:08:59 Pacific

you can read the gawk user guide here: http://www.gnu.org/software/gawk/manual/gawk.html

the above code produces your desired output.

c:\test>gawk -f script.awk file1 file2
a b c d e
a1new b1new c1 d1 e1
a2new b2new c2 d2 e2
a3new b3new c3 d3 e3
a4new b4new c4 d4 e4
a5new b5new c5 d5 e5


gawk first processes file1, then file2.
FNR is the input record number in the current input file. NR is the total number of input records seen so far.
So FNR==NR is true only while the first file is being read. Arrays in gawk are associative arrays. The default field separator, if none is specified, is whitespace. Fields are denoted by $1, $2 and so on: $1 is the first field, $2 the second.

Therefore, a[FNR]=$3" "$4" "$5 stores in array "a", under the current line number, the 3rd field, a space, the 4th field, a space and the 5th field.
This means a[1] = c d e, a[2] = c1 d1 e1, etc. The "next" statement skips to the next record.

{
print $1,$2,a[FNR]
}

means: print the 1st and 2nd fields of each record of the next file, i.e. file2, followed by a[FNR], the value stored earlier in array "a" for the same line number.
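
For readers more at home outside awk, the same two-pass idea can be sketched in Python. This is only an illustration of the logic explained above, not a replacement for the gawk script; the function name merge_by_line and the in-memory sample files are made up for the example.

```python
import io

def merge_by_line(wide_file, narrow_file):
    """Return merged lines: fields 1-2 from narrow_file, fields 3-5 from wide_file."""
    tails = {}
    # First pass (the FNR==NR branch in awk): remember fields 3-5 by line number.
    for lineno, line in enumerate(wide_file, start=1):
        fields = line.split()
        tails[lineno] = " ".join(fields[2:5])
    # Second pass: fields 1-2 of the other file plus the tail stored for that line.
    merged = []
    for lineno, line in enumerate(narrow_file, start=1):
        fields = line.split()
        merged.append(" ".join(fields[:2] + [tails.get(lineno, "")]).rstrip())
    return merged

# In-memory stand-ins for a.txt and b.txt from the sample earlier in the thread.
a_txt = io.StringIO("a b c d e\na1 b1 c1 d1 e1\na2 b2 c2 d2 e2\n")
b_txt = io.StringIO("a b\na1new b1new\na2new b2new\n")
for row in merge_by_line(a_txt, b_txt):
    print(row)
```

As in the awk version, the file holding the five columns goes first and the file holding the two new columns goes second, and each file is read exactly once.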


Response Number 7
Name: IVO
Date: March 21, 2008 at 06:11:53 Pacific

Beware the following is NOT tested

@echo off > output_gbo.txt
setlocal enabledelayedexpansion

set count=0
for /f "tokens=1-5" %%a in (output_utm.txt) do (
set /a count+=1
if !count! equ 1 (
echo.%%a %%b %%c %%d %%e>> output_gbo.txt
) else (
for /f "tokens=1-3 delims=[] " %%f in ('type gbo.txt ^| find /N /V ""') do (
if !count! equ %%f echo.%%g %%h %%c %%d %%e>> output_gbo.txt
)
)
)
:: End_Of_Batch

I suggest you follow ghostdog's tips, as the above batch is highly inefficient. You can use a batch to solve this issue, but with a modified logic.


Response Number 8
Name: maxbre
Date: March 21, 2008 at 06:40:51 Pacific

yes, I do agree with you all;
I promise I'll try to learn gawk (it really seems to be the proper language for my frequent text manipulation tasks); on the other hand I'm still curious to explore the 'limits of batch', and I'll let you know where my thoughts lead...
Thank you all for now, and happy Easter to everybody!
max

ps to ivo: what do you mean by 'modified logic'? Does it mean that you 'steered' my really goofy attempt in a direction that could work, but that you would otherwise have found a different batch solution? If so, I would be really happy to know how you would have tackled the problem (imagining you had only batch in your powerful hands...)
bye again


Response Number 9
Name: IVO
Date: March 21, 2008 at 06:57:13 Pacific

To maxbre,

I posted a straightforward solution based on your code that should work (as I said, it was just coded, NOT tested, and is based on your samples).

As the script needs to read the whole intermediate file gbo.txt for each line of the native output_utm.txt, the code is highly inefficient.

I have figured out an alternative way to perform your task, but I need to examine it more closely. I'll post it later if it is viable.

Happy Easter
Ivo


Response Number 10
Name: Mechanix2Go
Date: March 23, 2008 at 04:43:59 Pacific

This one is tested.

::== mergeA1

@echo off
setLocal EnableDelayedExpansion

for /f "tokens=* delims= " %%a in (a.txt) do (
echo %%a> c.txt& goto :HDRdone
)
:HDRdone

for /f "tokens=* delims= " %%L in ('find /v /c "/\/\" ^< a.txt') do (
set /a LAST=%%L
)

for /L %%i in (2 1 !LAST!) do (
set /a CURR=%%i
call :subA
call :subB
echo !HEAD! !TAIL!>> c.txt
)
goto :eof

:subA
for /f "tokens=1-3* delims=[] " %%L in ('find /n /v "///" ^< a.txt') do (
if !CURR! equ %%L set TAIL=%%O
)
goto :eof

:subB
for /f "tokens=1* delims=[] " %%T in ('find /n /v "///" ^< b.txt') do (
if !CURR! equ %%T set HEAD=%%U
)
goto :eof


=====================================
If at first you don't succeed, you're about average.

M2


Response Number 11
Name: IVO
Date: March 23, 2008 at 10:37:44 Pacific

Answering maxbre's postscript in post #8.

Let me summarize the initial problem: there is a file named output_utm.txt with the layout

a b c d e
a1 b1 c1 d1 e1
a2 b2 c2 d2 e2
a3 b3 c3 d3 e3
a4 b4 c4 d4 e4
a5 b5 c5 d5 e5

whose first two elements have to be replaced via a complex computation performed by an external .exe program. The result is the output_gbo.txt file outlined here:

a b c d e
a1new b1new c1 d1 e1
a2new b2new c2 d2 e2
a3new b3new c3 d3 e3
a4new b4new c4 d4 e4
a5new b5new c5 d5 e5

maxbre's proposed solution is:

Read output_utm.txt, generate the computed tokens for each row and store them in an intermediate gbo.txt file

a b
a1new b1new
a2new b2new
a3new b3new
a4new b4new
a5new b5new

then merge it with the native one to build the required target file output_gbo.txt.

The problem arises because batch scripts can only browse one file at a time, so the merge has to read the whole of the other file for each line of the main one. This approach is implemented in the following code (tested and working)

@echo off > output_gbo.txt

find /N /V "" < output_utm.txt > utm.tmp
find /N /V "" < gbo.txt > gbo.tmp

for /f "tokens=1-6 delims=[] " %%a in (utm.tmp) do (
if %%a equ 1 (
echo.%%b %%c %%d %%e %%f>> output_gbo.txt
) else (
for /f "tokens=1-3 delims=[] " %%g in (gbo.tmp) do (
if %%a equ %%g echo.%%h %%i %%d %%e %%f>> output_gbo.txt
)
)
)
del *.tmp
:: End_Of_Batch

Given that output_utm.txt has N lines, the total I/O is

N (numbering utm) + N (numbering gbo) + N (reading utm) + N*N (merging) + N (previous reading of utm) = 4N + N^2

E.g. for N=100 the I/O is 10400

A better way is to read the input file and generate the target one on the fly, as the I/O is then just N (or 2N, as we will see).

The suggested script is (tested and working too)

@echo off > output_gbo.txt
setLocal EnableDelayedExpansion
set tk=?
for /f "tokens=1-5" %%a in (output_utm.txt) do (
if !tk!==? (
echo.%%a %%b %%c %%d %%e>> output_gbo.txt
set tk=#
) else (
call maxsub %%a %%b
set /P tk=<gbo.tmp
echo.!tk! %%c %%d %%e>> output_gbo.txt
)
)
del gbo.tmp
:: End_Of_Batch

where maxsub is the batch performing the token manipulation, here simulated by

@echo.%1new %2new>gbo.tmp

which in real life would embed the .exe program that performs the computation.

Because a called batch script invokes a secondary command processor, its results have to be passed back using a temporary file holding just one line (that accounts for N I/O).

In the above case the total I/O for N=100 is just 200, as opposed to 10400 for the first script. A dramatic improvement (for really large tabular files).
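
The two figures can be checked with a few lines of Python. This is only a sketch of the arithmetic, taking the 4N + N^2 and 2N counts from this post as given; the function names are made up.

```python
def io_nested(n):
    """Line reads for the nested script: four passes of N lines plus N*N merge reads."""
    return 4 * n + n * n

def io_on_the_fly(n):
    """Line reads for the on-the-fly script: N input lines plus N one-line temp files."""
    return 2 * n

for n in (100, 1000):
    print(n, io_nested(n), io_on_the_fly(n))
```

For N=100 this prints 10400 and 200, matching the numbers above; for N=1000 the nested count passes one million while the on-the-fly count is 2000, which shows how quickly the quadratic term takes over.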

Now the lesson is over; remember: even scripts have to deal with performance factors.


Response Number 12
Name: maxbre
Date: March 25, 2008 at 01:48:36 Pacific

hi ivo, ghostdog & m2

what a great lesson from all of you!
Thank you again; I think I have enough brain food here for the next 3 years, assuming a good 'performance factor' on my part in understanding your help!

bye
max


Response Number 13
Name: maxbre
Date: March 26, 2008 at 08:20:35 Pacific

sorry for going back to this long and troubled story, but real life is unfortunately always much more complicated...

that's because when running the following:

::merge.bat
@echo off > output_gbo.tmp
setLocal EnableDelayedExpansion

set tk=?
for /f "skip=2 tokens=1-5" %%a in (output_utm.txt) do (
if !tk!==? (
echo.%%a %%b %%c %%d %%e>> output_gbo.tmp
set tk=#
) else (
call traspunto.bat
set /p tk=<gbo.tmp
echo.!tk! %%c %%d %%e>> output_gbo.tmp
)
)
::end of batch

::traspunto.bat
:: this batch is performing calculation on first two tokens
traspunto.exe f pia pia en utm.tmp ED50 ROMA40 32 O 32 O >gbo.tmp
::end of batch

I ran into the problem that echoing !tk! always reproduces just the tokens of the first row of gbo.tmp: why?
(please consider that I must necessarily go through the creation of the intermediate gbo.tmp);
...in the end it seems that batch is definitely not the proper solution for this case...
... but I still have to explore m2's post #10, which is really hard in places; I'll let you know ...
thanks again
max


Response Number 14
Name: IVO
Date: March 26, 2008 at 09:46:49 Pacific

Dear maxbre,

I guess you have not understood at all how my "optimized" script works. It passes the first two tokens of each row of output_utm.txt to the batch subroutine "maxsub" via the actual parameters %%a %%b

call maxsub %%a %%b

If you rename "maxsub" to "traspunto.bat", where are its tail parameters, and how can traspunto.exe be aware of the current tokens being processed?

traspunto.exe f pia pia en utm.tmp ED50 ROMA40 32 O 32 O >gbo.tmp

This command is a nightmare for me, but I do not see any formal parameters in it (%1 for %%a and %2 for %%b); moreover, why is utm.tmp referenced here?

What a mess!

Anyway, if you explain the meaning of the command tail to me, maybe I can give you a tip.

Last but not least, it is not a good idea to name the subroutine "traspunto.bat" when there is also a "traspunto.exe": the native suffix precedence is .com .exe .bat, which may lead to a big mess.
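
The precedence issue can be illustrated with a toy lookup in Python. This is a deliberately simplified sketch: it assumes both files sit in the same directory, that the command is typed without an extension, and that the extension order is the start of the default PATHEXT list; the function name resolve is made up.

```python
def resolve(command, files_present, pathext=(".COM", ".EXE", ".BAT", ".CMD")):
    """Return the file an extension-less command would run, or None.

    Assumes all candidates are in one directory and that pathext lists
    the extensions in the order the command processor tries them.
    """
    present = {name.lower() for name in files_present}
    for ext in pathext:
        candidate = (command + ext).lower()
        if candidate in present:
            return candidate
    return None

print(resolve("traspunto", ["traspunto.bat", "traspunto.exe"]))  # traspunto.exe
```

So typing just traspunto would start the .exe and never the .bat; spelling out the extension, or giving the batch a different name, avoids the ambiguity.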


Response Number 15
Name: maxbre
Date: March 28, 2008 at 00:10:47 Pacific

dear Ivo

You are perfectly right, but I was forced into the mess I posted in #13.
The main reason is that 'traspunto.exe' only works with an input file, here named 'utm.tmp' (passed with the obscure parameter 'f' in the command tail inside traspunto.bat, along with many other strange parameters that I think are not important to explain here: they are just options specifying the type of transformation of geographic coordinates from one reference system to another).
I tried echoing the results of 'traspunto.exe' in order to follow the direction you suggested, generating the target on the fly, but I was unsuccessful.
'traspunto.exe' accepts an input file, here named utm.tmp (made up of just two fields, x and y), and I have redirected its final output (the calculation on the two fields x and y) to the file named gbo.tmp.
Do you think it is possible to redirect the final output to standard output (the screen) and thereby catch the results for these two fields on the fly?
Thank you again for your patience and precious help, and sorry for wasting your time with my trivial questions.

Max


PS to M2: I finally managed to understand the code you posted in #10, for which I thank you greatly (but what hard work for me!). I will keep it, among many others posted here, as a reference. It works perfectly, but unfortunately it suffers from the same inefficiency problem highlighted by ivo (about a 10 min run to complete the task with my real data, 1600 rows); nevertheless I consider it a great lesson in batch scripting.


Response Number 16
Name: IVO
Date: March 28, 2008 at 03:54:23 Pacific

Hi maxbre,

I don't understand the exact behavior of traspunto.exe:

- Is it forced to read the whole utm.tmp when running, creating a resultant output file holding the transposed lines from the input, or does utm.tmp contain just ONE row with two tokens?

- Where does utm.tmp come from in your post #13, and how is it related to output_utm.txt?

- As far as I can see, there is no f parameter on the traspunto.bat tail; indeed there are no parameters at all.

It is not right to say that batch is not the way to go here, especially if it is the scripting language you master best; it just needs to be properly exploited (on a dramatic night years ago I was forced to code a utility in Fortran (!) on the fly so that a financial system could be restarted as soon as possible).

P.S.: Ten minutes is not a biblical length of time for a math run.


Response Number 17
Name: maxbre
Date: March 28, 2008 at 06:22:40 Pacific

Dear IVO

ok, now sit back, because this is the complete story (hoping not to bore you too much!).
I'm working on environmental modelling of air pollution, and to do that I'm using software compiled in Fortran (please do not ask me to modify it, because it really is a nightmare: more than 5000 lines of code with hundreds of includes!).
The final output of the run is a file called output_utm.txt containing, among others, two fields X and Y with geographic coordinates (longitude and latitude) referring to a specific geographic reference system called UTM (Universal Transverse Mercator).
The file utm.tmp comes from output_utm.txt: it is the extraction of the two tokens X and Y, about 1600 rows (i.e. the information to be processed by traspunto.exe).
traspunto.exe is a compiled Fortran exe whose source code I UNFORTUNATELY do not have access to; it is a third-party exe performing a really complex transformation of coordinates (fourier transformation). It is forced to read the whole utm.tmp when running, creating the resultant output file holding the transposed lines from the input. The following command:
traspunto.exe f pia pia en utm.tmp ED50 ROMA40 32 O 32 O >gbo.tmp
means: apply traspunto.exe to the file (f) utm.tmp in order to transform planar (pia pia) geographic coordinates in two fields X Y, longitude and latitude (en), from one reference system (ED50 32 O) to another (ROMA40 32 O), and finally store the results in the file gbo.tmp.
The input file utm.tmp must have a fixed format: just two fields X and Y, with rows containing the information to be processed.
Finally I need to merge/substitute the information in gbo.tmp with that in output_utm.txt and save the result in a file output_gbo.txt, to be imported into a GIS (geographic information system) that only works with a specific reference system (GBO, Gauss-Boaga).
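
The workflow described above (extract X and Y, transform them in one run, merge them back) can be sketched in Python. This is only an illustration under stated assumptions: the header is taken to be the first line, the transform callable is a hypothetical stand-in for one run of the external program over the whole two-column file, and it is assumed to return exactly one output pair per input row, in the same order.

```python
def merge_with_bulk_transform(lines, transform):
    """Replace fields 1-2 of every data line with transformed coordinates.

    lines:     header line followed by data lines of whitespace-separated fields
    transform: callable taking a list of (x, y) string pairs and returning a
               list of (x_new, y_new) pairs in the same order (hypothetical
               stand-in for the external program)
    """
    header = lines[0]
    data = [ln.split() for ln in lines[1:]]
    xy_in = [(f[0], f[1]) for f in data]   # plays the role of utm.tmp
    xy_out = transform(xy_in)              # plays the role of gbo.tmp
    if len(xy_out) != len(data):
        raise ValueError("transform must return one pair per input row")
    merged = [header.strip()]
    for (x_new, y_new), f in zip(xy_out, data):
        merged.append(" ".join([x_new, y_new] + f[2:]))
    return merged

# Made-up transform: it only tags each value, like maxsub earlier in the thread.
def fake_transform(pairs):
    return [(x + "new", y + "new") for x, y in pairs]

sample = ["x y m_1h max_1h hh", "724450 5022775 0.375 14.172 61"]
for row in merge_with_bulk_transform(sample, fake_transform):
    print(row)
```

The point of the sketch is that the expensive step runs once for all rows and the merge is a single pass over both results, so the work grows with N rather than N*N.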

Now you may wonder why I'm not knocking out a piece of Fortran code to perform this task and am instead clinging so obstinately to batch scripting. The short answer is that I need to redistribute the tool to others who are not very familiar with programming (can you imagine someone less familiar than me?) and give them a one-click solution.
Bye
Max

ps ten minutes repeated hundreds of times makes a big difference!


Response Number 18
Name: IVO
Date: March 28, 2008 at 06:46:11 Pacific

Well maxbre,

I think I have understood the core of the problem; give me some time to work out the solution you need using what you have at hand.

See you as soon as possible
Ciao
Ivo


Response Number 19
Name: IVO
Date: March 28, 2008 at 07:12:41 Pacific

Hi maxbre,

the following slightly modified batch should be tailored to your purpose. It generates a one-row intermediate file (utm.tmp) for each native input line to be transposed. There is no need to call subroutines, just the traspunto.exe application. If the run time is acceptable and the solution correct, you are done; otherwise let me know, as other ways are available.

::merge.bat
@echo off > output_gbo.tmp
setLocal EnableDelayedExpansion

set tk=?
for /f "skip=2 tokens=1-5" %%a in (output_utm.txt) do (
if !tk!==? (
echo.%%a %%b %%c %%d %%e>> output_gbo.tmp
set tk=#
) else (
echo.%%a %%b> utm.tmp
traspunto.exe f pia pia en utm.tmp ED50 ROMA40 32 O 32 O >gbo.tmp
set /p tk=<gbo.tmp
echo.!tk! %%c %%d %%e>> output_gbo.tmp
)
)
del utm.tmp gbo.tmp
::end of batch


Response Number 20
Name: maxbre
Date: March 31, 2008 at 01:28:49 Pacific

great ivo

it works perfectly, also the run time is really optimised!

thanks
max

ps how many alternative solutions do you have up your sleeve?


Response Number 21
Name: IVO
Date: March 31, 2008 at 02:39:21 Pacific

Salve maxbre,

there are indeed alternative ways to handle that problem in batch, had the proposed one failed, but to quote a sci-fi movie, "the future must be left unknown". Let us just say the roads not taken are quite tricky.

Your questions are unconventional and challenging, so besides posting future questions here, you may also contact me by e-mail through the Computing.net messaging system (in a private message we can speak our mother tongue).

Ciao
Ivo


Response Number 22
Name: maxbre
Date: March 31, 2008 at 02:49:28 Pacific

ciao ivo

thanks again, I agree with you that it's now time to write "the end" on this long thread

max

ps I'll keep in touch if needed; your mastery of batch has been of great use indeed

