Solved File Manipulation based on column data

January 27, 2019 at 17:13:34
Specs: Windows 10
Hi,

I am new to batch scripting. Can someone give me an idea of how to do the file manipulation below?


I have a pipe-delimited file in the format below:


ABC|123|456|789|H687|9ghf5|896988
TRE|112453|456|7489|H5687|9ghf5|896988
ABC|1233|4526|3789|H6t87|9ghuf5|89698
ABC|1232|4156|789|H68r7|9ghf5h|898698
CDR|123345|456|789|H6r87|9gyhf5|898698


For each record, I want to check whether the second column is longer than 4 characters. If it is, I need to take its last two characters and look them up in the reference file below:


53 - 432|567|789|4566
45 - 455|56545|456|4666

Then I need to create a new file with the data below. The reference file can have any number of values per key. For each value, I need to create one record by taking the source record and replacing its second column with the reference value.

TRE|432|456|7489|H5687|9ghf5|896988
TRE|567|456|7489|H5687|9ghf5|896988
TRE|789|456|7489|H5687|9ghf5|896988
TRE|4566|456|7489|H5687|9ghf5|896988

CDR|455|456|789|H6r87|9gyhf5|898698
CDR|56545|456|789|H6r87|9gyhf5|898698
CDR|456|456|789|H6r87|9gyhf5|898698
CDR|4666|456|789|H6r87|9gyhf5|898698

Thank you,
Nee Ayya



#1
January 28, 2019 at 00:29:55
✔ Best Answer
OK, here's a trial. Replace 'for1' and 'for2' with the actual filenames (the source file and the reference file, respectively).
Not elegant, but hell, it's 3am and I've imbibed many beers.
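@echo off & setlocal
rem Split each source line: %%a = column 1, %%b = column 2, %%c = the rest of the line
for /f "tokens=1,2* delims=|" %%a in (for1) do call :xx %%a %%b "%%c"
goto :eof

:xx
rem Keep what follows the first 4 characters of column 2; skip the line if nothing is left
set a=%2
set a=%a:~4%
if not defined a goto :eof
rem Look up the key in the reference file; %%c is the first value after the key
for /f "tokens=1,2 delims=-|" %%b in ('find "%a% - " ^<for2') do (
if "%%c" equ "" goto :eof
set kk=%%c
)
rem Strip spaces from the value, then rebuild the record with the new column 2
set kk=%kk: =%
set b=%3
set b=%b:|=^|%
echo %1^|%kk%^|%b:"=%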



#2
January 28, 2019 at 05:49:58
Personally, I suggest not using batch scripting for file modification. First, if the file is huge, you will end up waiting for the script to finish and it may never finish. Second, you can't add any exception handling.

The sample script posted above is missing basic design. What if you need to change the lookup to the 3rd and 4th characters instead of the 5th and 6th? How will you parameterize the filenames, and how will you clean them? What if there are more reference values in the reference file? None of this is handled by the above script, and none of it will be handled by batch scripting, for that matter.

You can achieve this by following a few simple steps. To know more, email me at markprobenzing@gmail.com

message edited by Marklsh



#3
January 28, 2019 at 07:09:49
I prefer Win10's scripting language, PowerShell, to batch script, so I would normally do:
$rep = @{} 
gc file2.psv | 
 % { 
   $r = $_ -split ' - ' 
   $rep.Add($r[0], ($r[1] -split '\|'))
 }
gc file1.psv | 
 % {
   $line = $_ -split '\|', 3
   if ($line[1].Length -gt 4) {
     $key = $line[1].Substring($line[1].Length - 2)
     $rep[$key] | 
      % {
        $line[0], $_, $line[2] -join '|'
      } -End { [string]::Empty }
   }
 } |
 Out-File outFile.psv


message edited by Razor2.3



#4
January 29, 2019 at 18:40:50
Thanks for the script. I tried it and it's working fine for the first value. I am getting the lines below:

TRE|432|456|7489|H5687|9ghf5|896988
CDR|455|456|789|H6r87|9gyhf5|898698

But the lookup file can have any number of values separated by pipes. How do I loop the script to generate all the values?

Lookup file values

53 - 432|567|789|4566
45 - 455|56545|456|4666

Generated file based on the lookup file

TRE|432|456|7489|H5687|9ghf5|896988
TRE|567|456|7489|H5687|9ghf5|896988
TRE|789|456|7489|H5687|9ghf5|896988
TRE|4566|456|7489|H5687|9ghf5|896988

CDR|455|456|789|H6r87|9gyhf5|898698
CDR|56545|456|789|H6r87|9gyhf5|898698
CDR|456|456|789|H6r87|9gyhf5|898698
CDR|4666|456|789|H6r87|9gyhf5|898698

Thank you,
NeeAyya

message edited by NeeAyya



#5
January 29, 2019 at 19:43:09
OK, here's a looped version; I missed that detail! This delivered the goods in my tests with your sample files/data...
@echo off & setlocal
for /f "tokens=1,2* delims=|" %%a in (for1) do call :xx %%a %%b "%%c"
goto :eof

:xx
set a=%2
set a=%a:~4%
if not defined a goto :eof
for /f "tokens=1* delims=-" %%b in ('find "%a% - " ^<for2') do (
if "%%c" equ "" goto :eof
set kk=%%c
)
:: now loop on each item delimited with the pipes in file2
:: sending: 1=alpha, 2=main key, 3=sub-key, 4="junk"-string
set kk=%kk:|= %
for %%k in (%kk%) do call :aa %1  %a%  %%k %3
goto :eof

:aa
set b=%4
set b=%b:|=^|%
rem oops: on examining your output, I saw that element #2 was not wanted.
rem echo %1^|%2^|%3^|%b:"=%
echo %1^|%3^|%b:"=%
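
The looping trick in this version can be seen in isolation in the minimal sketch below: replace the pipes with spaces, then let a plain for walk the space-separated items. The variable name kk and the sample values come from the thread; the hard-coded record fragment is only an illustration, not part of the posted script.

```batch
@echo off
setlocal
rem Value list as it appears in the reference file after the key
set "kk=432|567|789|4566"
rem Replace every pipe with a space so a plain FOR can iterate the items
set "kk=%kk:|= %"
rem Emit one line per item; the caret escapes each pipe so that echo prints it literally
for %%k in (%kk%) do echo TRE^|%%k^|456^|7489
```

This should print four lines, from TRE|432|456|7489 through TRE|4566|456|7489. One caveat: a plain for treats * and ? as filename wildcards, so the approach assumes the reference values never contain them.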

message edited by nbrane



#6
January 30, 2019 at 07:04:19
Thanks a lot for the script. I am trying to understand how it works; I have another requirement, and I am trying to do it on my own before asking for help.

Where in the script are we taking the 5th and 6th characters of the source file's second column and looking them up in the reference file?

Thanks a lot for your time and help



#7
January 30, 2019 at 13:29:20
I respect someone who actually tries to learn and understand things!
This is where it both obtains the value and tests for it:
set a=%2
set a=%a:~4%
:: (the following references the var. 'a', but the "if defined" syntax does NOT want the percent signs)
if not defined a goto :eof
The following:
CDR|123345|456|789|H6r87|9gyhf5|898698
sends the second token, 123345, to the subroutine, which assigns it to var. 'a' (%a%).
The next statement says "skip the first 4 characters of %a% and keep the rest" (see: set /?). If %a% is 4 characters or shorter, nothing is left and a becomes undefined. Otherwise %a% becomes whatever follows the 4th character: for this 6-character value, "45" (the 5th and 6th characters).
See also, if you haven't already: for /?, if /?, call /?, etc.
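
To make the substring behaviour concrete, here is a small sketch that can be saved as a .cmd file and run on its own. The variable names and sample values are only illustrative. It also shows the negative-offset form, which may be closer to the "last two characters" wording in the original question if the second column can be longer than 6 characters.

```batch
@echo off
setlocal
set "a=123345"
rem Offset 4, no length: skip the first 4 characters and keep the rest
echo %a:~4%
rem Negative offset: keep only the last 2 characters, whatever the length
echo %a:~-2%

set "b=123"
rem Nothing follows the first 4 characters here, so the result is empty
rem and assigning an empty value clears the variable
set "b=%b:~4%"
if not defined b echo b is undefined
```

Both echo lines should print 45 for this 6-character value, and the last line should report that b is undefined. The two forms differ for longer values: with 1234567, the first form would give 567 and the second 67.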
