batch merge files on common column

November 21, 2010 at 12:45:43
Specs: Windows 7, 2.2GHz / 4GB
I am struggling to combine two or more files that share a common identifier into a single file using a DOS batch file. I am using DOS version 6.1.7600. The files look like this:

File1

ID stuff1 stuff2 stuff3 stuff4 goodies_N_1

File2 has the same format.

File2

ID stuff1 stuff2 stuff3 stuff4 goodies_N_2

I would like to join these to get File3 with the following:

File3

ID goodies_N_1 goodies_N_2

I could add more files and the column with goodies_N_x increases.

Can any one help?


See More: batch merge files on common column

Report •

#1
November 21, 2010 at 14:26:26
Windows 7, 2.2GHz / 4GB
DOS version 6.1.7600
Huh? Are you using a 17-year-old OS or the one released last year?

How To Ask Questions The Smart Way


Report •

#2
November 21, 2010 at 14:31:38
No, not 17 years old. If I type ver at the DOS prompt I get
Microsoft Windows [Version 6.1.7600]

Running Windows 7 if that is a help.

Is this not the correct way to get the version?


Report •

#3
November 22, 2010 at 10:57:42
Are both, or all, files one-liners?


=====================================
Life is too important to be taken seriously.

M2


Report •

Related Solutions

#4
November 22, 2010 at 12:53:59
Both files have exactly the same format but the files have hundreds of lines. Speed is not a factor in combining the files.

ID stuff1 stuff2 stuff3 stuff4 goodies_N_1
ID stuff1 stuff2 stuff3 stuff4 goodies_N_1
ID stuff1 stuff2 stuff3 stuff4 goodies_N_1
ID stuff1 stuff2 stuff3 stuff4 goodies_N_1

The "stuff" are characters or words separated by white space.


Report •

#5
November 22, 2010 at 13:04:00
Please post three lines of each.


=====================================
Life is too important to be taken seriously.

M2


Report •

#6
November 23, 2010 at 08:58:06
Here is 3 lines of file1

ID stuff1 stuff2 stuff3 stuff4 goodies_N_1
ID stuff1 stuff2 stuff3 stuff4 goodies_N_1
ID stuff1 stuff2 stuff3 stuff4 goodies_N_1

Here is 3 lines of file 2

ID stuff1 stuff2 stuff3 stuff4 goodies_N_1
ID stuff1 stuff2 stuff3 stuff4 goodies_N_1
ID stuff1 stuff2 stuff3 stuff4 goodies_N_1

the output file should look like

ID goodies_N_1 goodies_N_2
ID goodies_N_1 goodies_N_2
ID goodies_N_1 goodies_N_2

Thanks for considering this problem.


Report •

#7
November 23, 2010 at 21:16:37
@echo off > newfile & setLocal enableDELAYedeXpansion

set N=
for /f "tokens=* delims= " %%a in (file1) do (
  set /a N+=1 & call :sub1 %%a & set O!N!=!C!
)

set N=
for /f "tokens=* delims= " %%a in (file2) do (
  set /a N+=1 & call :sub1 %%a & set T!N!=!C!
)

for /L %%a in (1 1 !N!) do (
  >> newfile echo.ID !O%%a! !T%%a!
)
goto :eof

:sub1 set C to last token
:loop
  if '%2' neq '' (
    shift
    goto :loop
  )
set C=%1
goto :eof


=====================================
Life is too important to be taken seriously.

M2


Report •

#8
November 24, 2010 at 08:18:36
Thank you for the response and your solution. As I am new to posting questions, I have learned the value of posting a clear description of the problem. Your script does exactly what I requested. It also shows me where I had critical omissions in my question and description.
In my example of a file, I used "ID" to imply a variable like stuff1, etc. You interpreted this as a constant and your script writes the letters ID at the start of each line. Can this be modified to use instead the value of ID found in column 1 of the input files. The ID could be a text string or a number. When the two files are joined, can they be matched on the value of ID so that the values in the "goodies" columns correspond to the same ID?
In my search I found two scripts that sparked ideas for improvements. You contributed to a solution at http://www.computing.net/answers/pr... where a DIR command was done and the file names were used in the output file as column headers. Very nice and useful! Is this a possible enhancement? You also contributed to a solution at http://www.computing.net/answers/pr... where two files were joined. Nice. Thanks for these contributions.

Given all of that, here is a modified version of the file formats:

Here is 3 lines of file1

175 stuff1 stuff2 stuff3 stuff4 goodies_N_1
120 stuff1 stuff2 stuff3 stuff4 goodies_N_1
100 stuff1 stuff2 stuff3 stuff4 goodies_N_1

Here is 3 lines of file 2

100 stuff1 stuff2 stuff3 stuff4 goodies_N_2
120 stuff1 stuff2 stuff3 stuff4 goodies_N_2
175 stuff1 stuff2 stuff3 stuff4 goodies_N_2

the output file should look like

ID Filename1 Filename2
100 goodies_N_1 goodies_N_2
120 goodies_N_1 goodies_N_2
175 goodies_N_1 goodies_N_2

Thanks again for considering this request and the time you spend on it.


Report •

#9
November 24, 2010 at 18:23:11
Not clear. What's the output from these?

file1

175 stuff1 stuff2 stuff3 stuff4 goodies_N_1a
120 stuff1 stuff2 stuff3 stuff4 goodies_N_1b
100 stuff1 stuff2 stuff3 stuff4 goodies_N_1c

file2

100 stuff1 stuff2 stuff3 stuff4 goodies_N_2d
120 stuff1 stuff2 stuff3 stuff4 goodies_N_2e
175 stuff1 stuff2 stuff3 stuff4 goodies_N_2f


=====================================
Life is too important to be taken seriously.

M2


Report •

#10
November 24, 2010 at 19:56:53
the output file must have been cut from my reply but it should look like this:

ID Filename1 Filename2
100 goodies_N_1 goodies_N_2
120 goodies_N_1 goodies_N_2
175 goodies_N_1 goodies_N_2

The contents of the two files are matched by the ID field and then combined on the same line. The column headers are the names of the file from which the data are taken.


Report •

Ask Question