
Merging two data files


Original Message
Name: cocoacat
Date: January 28, 2006 at 23:33:37 Pacific
Subject: Merging two data files
OS: unix
CPU/Ram: -
Comment:

I have a problem merging two files. This is my Perl code, which is not complete.

use strict;
use warnings;

my $filename = "delete.txt";
unless ( -e $filename ) {
    print "The file $filename does not seem to exist\n";
    exit;
}
print "\nThe file $filename exists and will be loaded.\n";
my $fileA;
unless ( open( $fileA, '<', $filename ) ) {
    print "Cannot open $filename\n";
    exit;
}
my @readdata = <$fileA>;
close $fileA;

my $filenameB = "delete2.txt";
unless ( -e $filenameB ) {
    print "The file $filenameB does not seem to exist\n";
    exit;
}
print "\nThe file $filenameB exists and will be loaded.\n";
my $fileB;
unless ( open( $fileB, '<', $filenameB ) ) {
    print "Cannot open $filenameB\n";
    exit;
}
my @readdataB = <$fileB>;
close $fileB;

#---extracting the term from the first array (third column)
my @exgenesA;
for my $line (@readdata) {
    my @splitdataA = split( " ", $line );
    push @exgenesA, $splitdataA[2] if defined $splitdataA[2];
}

#---extracting the term from the second array (second column)
my @exgenesB;
for my $line (@readdataB) {
    my @splitdataB = split( " ", $line );
    push @exgenesB, $splitdataB[1] if defined $splitdataB[1];
}

#---collecting the terms that occur in both files
my ( %union, %isect );
$union{$_} = 1 for @exgenesB;
for my $e (@exgenesA) {
    $isect{$e} = 1 if $union{$e};
    $union{$e} = 1;
}
print join( ",", sort keys %isect );
exit;
--------------------
input file (delete.txt)
cd A1 B1 0.1
cd A2 B2 0.3
cd A3 B4 0.2
cd A5 B3 0.2
cd A6 B3 0.2
---------------------------
input file (delete2.txt)
ab B1 A1 0.2
ab B2 A2 0.3
ab B3 A4 0.2
ab B5 A3 0.2
---------------------------
I am trying to find pairs of matching lines between the two files.
The result should be:
Output :
A1 B1 0.1 B1 A1 0.2
A2 B2 0.3 B2 A2 0.3
A3 B3 0.2 B3 A4 0.2
---------------------------
But with my code I only get the intersection between them, which is B1, B2 and B3.
Could you please suggest how to produce output like the one above?
Thank you so much.
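
One way to get the paired lines rather than just the shared names is to index the second file by its (third column, second column) pair and then look up each line of the first file by its (second column, third column) pair. The following is only a minimal sketch, assuming whitespace-separated fields laid out as in the samples above. The sub name match_pairs is hypothetical, and the sample data is inlined for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: pair up lines where columns 2 and 3 of file A
# equal columns 3 and 2 of file B (the same pair, swapped).
sub match_pairs {
    my ( $lines_a, $lines_b ) = @_;

    # Index file B by "A-name B-name" so each lookup is one hash access.
    # Assumption: each pair occurs at most once in file B.
    my %b_by_pair;
    for my $line (@$lines_b) {
        my @f = split ' ', $line;
        next unless @f >= 3;    # skip blank or short lines
        $b_by_pair{"$f[2] $f[1]"} = join ' ', @f[ 1 .. $#f ];
    }

    my @out;
    for my $line (@$lines_a) {
        my @f = split ' ', $line;
        next unless @f >= 3;
        my $hit = $b_by_pair{"$f[1] $f[2]"};
        push @out, join( ' ', @f[ 1 .. $#f ], $hit ) if defined $hit;
    }
    return @out;
}

my @file_a = ( "cd A1 B1 0.1", "cd A2 B2 0.3", "cd A3 B4 0.2",
               "cd A5 B3 0.2", "cd A6 B3 0.2" );
my @file_b = ( "ab B1 A1 0.2", "ab B2 A2 0.3", "ab B3 A4 0.2",
               "ab B5 A3 0.2" );

print "$_\n" for match_pairs( \@file_a, \@file_b );
```

With the sample data this prints the A1/B1 and A2/B2 lines. The third line of the expected output above does not correspond to an exact swapped pair in the samples, so this sketch does not produce it.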



Response Number 1
Name: cocoacat
Date: January 30, 2006 at 18:50:31 Pacific
Reply:

I tried writing code to search for matching pairs in the two files, shown here:

use strict;
use warnings;

my $cols = 8;    # total number of columns per row

open( my $if1, '<', "data1.txt" ) or die "Error opening data file: $!\n";
my @a;
while (<$if1>) {
    chomp;
    push @a, split;    # append every field of the row to one flat list
}
close($if1);

open( my $if2, '<', "data2.txt" ) or die "Error opening data file: $!\n";
my @b;
while (<$if2>) {
    chomp;
    push @b, split;
}
close($if2);

my $count = 0;
my @c;
# $i and $t are row numbers; offsets 1 and 2 are the positions of the two name columns
for ( my $i = 0 ; $i < @a / $cols ; $i++ ) {
    for ( my $t = 0 ; $t < @b / $cols ; $t++ ) {
        if (    ( $a[ $cols * $i + 1 ] eq $b[ $cols * $t + 2 ] )
            and ( $a[ $cols * $i + 2 ] eq $b[ $cols * $t + 1 ] ) )
        {
            $count++;
            # keep columns 1..7 of the matching row from each file
            push @c, @a[ $cols * $i + 1 .. $cols * $i + $cols - 1 ];
            push @c, @b[ $cols * $t + 1 .. $cols * $t + $cols - 1 ];
        }
    }
}
print @c;
---------
data1.txt
cc A1 B1 7e-14 149 33 74.3 181
cc A2 B3 5e-13 72 45 71.6 174
cc A3 B5 1e-11 152 30 64.7 156
cc A4 B6 1e-10 175 26 63.5 153
cc A5 B7 5e-10 95 33 62.8 151
---------
data2.txt
aa B1 A1 4e-13 207 23 56.6 135
aa B2 A1 5e-13 207 23 56.6 135
aa B3 A2 6e-13 72 45 71.6 174
aa B4 A3 7e-12 163 31 69.3 168
aa B5 A3 8e-11 152 30 64.7 156
aa B6 A3 9e-10 175 26 63.5 153
--------
The result looks like this:
A1
B1
7e-14
149
33.5570469798658
74.3
181
B1
A1
4e-13
207
23.6714975845411
56.6
135
A1
B1
6e-14
149
33.5570469798658
74.3
181
B1
A1
4e-13
207
23.6714975845411
56.6
135
A2
B3
5e-13
72
45.8333333333333
71.6
174
B3
A2
6e-13
72
45.8333333333333
71.6
174
A3
B5
1e-11
152
30.9210526315789
64.7
156
B5
A3
8e-11
152
30.9210526315789
64.7
156
-----------
I have a problem with the line breaks. I want each match printed on one line, like this:

A1 B1 7e-14 149 33 74.3 181 B1 A1 4e-13 207 23 56.6 135

A2 B3 5e-13 72 45 71.6 174 B3 A2 6e-13 72 45 71.6 174

A3 B5 1e-11 152 30 64.7 156 B5 A3 8e-11 152 30 64.7 156

You can see the matching pairs above. Please suggest code that produces output like that. Thank you so much. My code also feels very slow on data with a hundred thousand lines.
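
The nested loops in the code above compare every row of one file with every row of the other, which is probably why it slows down on large inputs. A possible single-pass alternative is sketched below, assuming whitespace-separated rows with a leading tag column as in data1.txt and data2.txt, and assuming the second file fits in memory. The sub name join_files is hypothetical. A pair that appears several times in the second file yields one output line per occurrence.

```perl
use strict;
use warnings;

# Hypothetical helper: load $file_b into a hash keyed on the swapped
# name pair, then stream $file_a and build one joined line per match.
sub join_files {
    my ( $file_a, $file_b ) = @_;

    my %rows_b;    # "A-name\tB-name" => list of rows from file B
    open my $fh_b, '<', $file_b or die "Cannot open $file_b: $!\n";
    while ( my $line = <$fh_b> ) {
        my @f = split ' ', $line;
        next unless @f >= 3;    # skip blank or short lines
        shift @f;               # drop the leading tag column
        push @{ $rows_b{"$f[1]\t$f[0]"} }, join( ' ', @f );
    }
    close $fh_b;

    my @joined;
    open my $fh_a, '<', $file_a or die "Cannot open $file_a: $!\n";
    while ( my $line = <$fh_a> ) {
        my @f = split ' ', $line;
        next unless @f >= 3;
        shift @f;
        my $matches = $rows_b{"$f[0]\t$f[1]"} or next;
        # All fields of both rows go on the same output line.
        push @joined, join( ' ', @f, $_ ) for @$matches;
    }
    close $fh_a;
    return @joined;
}

# Usage, with the file names from the post:
# print "$_\n" for join_files( "data1.txt", "data2.txt" );
```

Each row of the first file costs one hash lookup instead of a scan of the whole second file, and joining the fields with a space before printing a single newline keeps each match on one line.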


