Extracting rows from pipe delimited file

Dell / INSPIRON 9300
June 8, 2009 at 12:52:32
Specs: Windows XP
Hi everyone,

I am new to this forum and would appreciate your help.

I am trying to extract rows from a pipe-delimited flat file that is too large to load into excel.

One of my search criteria is to compare on the same line, two address fields. If different, then output the entire line into a different text file. Then I would load this into excel and manipulate accordingly.

A second search criteria is more complicated. Data (property address) can be on multiple lines (one after another). For each unique property address, if there is more than one row that has the same value for a date field, output all lines with same address and same date field to a different flat file.

Any help would be appreciated. I used to program in Perl in Unix, but it has been over 15 years. Now, I would like to do this in Windows XP but don't know how to maneuver anymore. I would be open to other Windows XP scripting language if it would make life simpler such as batch.

Thank you in advance for all your help.
Duane.


See More: Extracting rows from pipe delimited file

Report •


#1
June 8, 2009 at 16:47:40
if you know Perl on Unix, then you can do the same with Windows. Go download Perl for windows from Activestate and you can start programming your task.

Report •

#2
June 9, 2009 at 08:08:48
Hello ghostdog,

Thank you for your reply. Unfortunately, I have been out of programming for 15 years now and am looking for help on how to do this in perl or in any other language.

If you have any tips on simple programming to accomplish above, I would certainly appreciate it.

Thanks again for your response,
Duane.


Report •

#3
June 9, 2009 at 08:31:56
>>I have been out of programming for 15 years now and am looking for help on how to do this in perl or in any other language.

firstly , remember 15 yrs ago, how did you learn Perl. reading a Perl book? it will be the same. Please type perldoc perl and read the documentation.

secondly, i (or anybody ) work well with examples , especially if you describe with example of sample input files, sample output you expect. after reading your whole first post, i still don't understand what you are trying to do.

lastly, on a personal note, while Perl can be used to do stuff, i prefer Python due to its clear syntax and powerful string manipulation capabilities (among other things, like modules). anyway, i am most willing to help , but first put your problem across more clearly with examples.


Report •

Related Solutions

#4
June 10, 2009 at 14:53:14
If you post enough lines and explain the field layout, I'll work on it.


=====================================
If at first you don't succeed, you're about average.

M2


Report •

#5
June 12, 2009 at 13:35:51
Hello ghostdog and Mechanix2Go,

Thank you again for your help. I've documented this as specific as I can. Please read below and let me know if I'm missing anything.

Have a good one!
Duane.

Given:
8th field=Property Location
17th field=Deed Date
Property location is not sorted and can appear in any order.

Criteria:
For each unique property location, if the deed date of one row is the same as another row, output all rows with the same deed date for that property location.

Example: "157 TEMPLE ST" is a property location that has matching deed dates. The output would be 2 rows with deed date "5/21/2002" for this one location.

Goal: Write code to run in Windows XP to extract rows into another file that meet the criteria above.
I would like to have the application syntax as such: runprogram <input filename> <output filename>

Sample CSV file (if there's a formatting issue, i can send by email if you wish):
Municipality,Grantor's Name (Seller),Grantor's Mailing Address,Grantor's City/State/Zip,Grantee's Name (Buyer),Grantee's Mailing Address,Grantee's City/State/Zip,Property Location,Block,Lot,Qual,Property Class,Land Assmnt,Building Assmnt,Total Assmnt,Recorded Date,Deed Date,Book,Page,Sale Price,NU Code,Sq. Ft.,Cl. 4 Use,SR1A #,Yr. Built
1608,SALAFIA JOSEPH,187 VALLEY RD,WAYNE NJ 00000,GREGORIO ANTONIO,157 SHERIDAN AVE,PATERSON NJ 00000,157 SHERIDAN AVE,B0176,21,,2,1200,20200,21400,8/7/2001,6/28/2001,283,272,173200,,0,,6647996,0
1608,"RIVERA,EDDIE & ENES",165 20TH AVE,PATERSN NJ 00000,"RIVERA,ENES",157 TEMPLE ST,PATERSON NJ 00000,157 TEMPLE ST,A0058,1,,2,1400,15000,16400,9/6/1989,3/28/1989,B125,459,0,26,0,8,1663663,0
1608,"RIVERA,ENES",128 NOERH 15TH ST,PROSPECT PARK NJ 07508,"MALDONADO,JOSE",353 VAN HOUTEN ST,PATERSON NJ 07501,157 TEMPLE ST,A0058,1,,2,1400,15000,16400,5/9/1991,4/29/1991,T128,515,60000,,0,9,1969940,0
1608,MALDONADO J JOSE,932 EAST 19TH ST,PATERSON NJ 00000,MALDONADO JOSE & MALDONADO N ETALS,932 EAST 19TH ST,PATERSON NJ 00000,157 TEMPLE ST,A0058,1,,2,1400,15000,16400,7/10/1998,6/25/1998,G156,291,0,25,0,,5372202,0
1608,SHERIFF RON FAVA,COUNTY OF PASSAIC,PATERSON NJ 00000,FAIRBANKS CAPITAL CORP,3815 SOUTH WEST TEMPLE,SALT LAKE CITY UTAH 00000,157 TEMPLE ST,A0058,1,,2,1400,15000,16400,1/9/2002,12/10/2001,351,107,0,12,0,,6653087,0
1608,TCIF RE02 LLC,3815 S WEST TEMPLE,SALT LAKE CITY UT 00000,BARNA NORMAN,12 RAVINE CT,CLIFTON NJ 00000,157 TEMPLE ST,A0058,1,,2,1400,15000,16400,6/6/2002,5/21/2002,427,174,100000,,0,,6657954,0
1608,TCIF LLC,3815 SOUTH W TEMPLE,SALT LAKE CITY UT 00000,TCIF RE02 LLC,3815 S OUTH W TEMPLE,SALT LAKE CITY UT 00000,157 TEMPLE ST,A0058,1,,2,1400,15000,16400,6/6/2002,5/21/2002,427,167,0,25,0,,6657953,0
1608,BARNA NORMAN,12 RAVINE CT,CLIFTON NJ 00000,WILLIAMS CAROLYN,218 PROSPECT AVE 2A,HACKENSACK NJ 00000,157 TEMPLE ST,A0058,1,,2,1400,15000,16400,10/18/2002,9/30/2002,493,273,300000,,0,,7202142,0
1608,WALKER JOHN & VIVLOIS H/W,2595 HALLEY BERRY TRAIL,SNELLVILLE GEORGIA 00000,MEDRANO MEREDITH,32 N 6TH ST,PATERSON NJ 00000,157 TEMPLE ST,A0058,1,,2,1400,15000,16400,2/11/2004,1/30/2004,736,111,106000,,0,,7756035,0
1608,LUCIANO DANERIS,19 12TH AVE,PATERSON NJ 00000,MIDWAY PROPERTIES LLC,25 AVE PORT IMPERIAL 802,WEST NY NJ 00000,157 TEMPLE ST,A0058,1,,2,1400,15000,16400,7/25/2006,6/28/2006,1274,226,300000,,0,,9271608,0
1608,FERREIRA ANGEL,34 N 6TH ST,PATERSON NJ 00000,FERREIRA ANGEL,34 N 6TH ST,PATERSON NJ 00000,34 N 6TH ST,A0058,2,,2,1400,20600,22000,4/17/2002,3/11/2002,391,182,0,25,0,,6656776,0
1608,FERREIRA ANGEL ETAL,34 N 6TH ST,PATERSON NJ 00000,FERREIRA ANGEL,34 N 6TH ST,PATERSON NJ 00000,34 N 6TH ST,A0058,2,,2,1400,20600,22000,12/29/2003,10/21/2003,715,151,0,25,0,,7755284,0
1608,WILLIAMS CAROLYN,157 TEMPLE ST,PATERSON NJ 00000,LUCIANO DANERIS,157 TEMPLE ST,PATERSON NJ 00000,34 N 6TH ST,A0058,2,,2,1400,20600,22000,7/8/2004,6/29/2004,804,237,360000,,0,,8249296,0
1608,MANGUAL ESTHER & JUANITO H/W,34 N 8TH ST,PATERSON NJ 00000,GUERRERO MARCUS & GUERRERO YANIRIS,34 N 8TH ST,PATERSON NJ 00000,34 N 8TH ST,B0123,5.A,,2,1400,11200,12600,4/1/2005,8/2/2004,969,268,200000,,0,,8258811,0
1608,GUERRERO MARCOS ETAL,34 N 8TH ST,PATERSON NJ 00000,VARGAS CARMEN,34 N 8TH ST,PATERSON NJ 00000,34 N 8TH ST,B0123,5.A,,2,1400,11200,12600,8/11/2006,6/20/2006,1285,296,330000,,0,,9275061,0
1608,"ENGLEHARDT,EDWIN SHERIFF",COUNTY OF PASSAIC,PASSAIC NJ 00000,BANKERS TRUST,6700 S POINT PKY SUT 500,X 00000,34 NORTH 6TH ST,A0058,2,,2,1400,15500,16900,2/15/1995,1/20/1995,V138,270,100,12,0,9,3348495,0
1608,BANKERS TRUST CO OF CA,11000 BROKEN LAND PKWY,COLUMBIA MD 00000,PRP ASSOC,34 N SIXTH ST,PATERSON NJ 00000,34 NORTH 6TH ST,A0058,2,,2,1400,15500,16900,11/22/1995,10/16/1995,L142,90,35900,26,0,9,3711500,0


Report •

#6
June 12, 2009 at 15:47:02
uh... I don't see any pipes there.


=====================================
If at first you don't succeed, you're about average.

M2


Report •

#7
June 12, 2009 at 23:21:04
yes, you are correct. The pipe delimited file to which I was referring was another file that I was attempting to manipulate at the time. Therefore, I should have stated a CSV file vs. pipe delimited.

Apologies for that,
Duane.


Report •

#8
June 12, 2009 at 23:45:27
if you have Python on Windows
import os,sys,csv
filename = sys.argv[1]
outputfile = sys.argv[2]
pos1=int(sys.argv[3])
pos2=int(sys.argv[4])
lines={}
count={}
reader = csv.reader(open(filename))
writer = csv.writer(open(outputfile,"w"),delimiter=",")
for n,row in enumerate(reader):
    row[pos2] = row[pos2].strip()
    row[pos1] = row[pos1].strip()
    count.setdefault(row[pos2],0)
    lines.setdefault(row[pos2] , []) 
    count[row[pos2]] = count[row[pos2]] + 1
    lines[row[pos2]].append(row)
for i,j in count.iteritems():
    if j > 1:
        for o in lines[i]:           
           writer.writerow(o)

usage:

c:\test> python test.py <input> <outfile> <pos1> <pos2>

here's an executableyou can run in batch if you don't want to download Python

C:\test>test.exe file.txt  outfile.txt 7 16

C:\test>more outfile.txt
1608,"ENGLEHARDT,EDWIN SHERIFF",COUNTY OF PASSAIC,PASSAIC NJ 00000,BANKERS TRUST,6700 S POINT
TH 6TH ST,A0058,2,,2,1400,15500,16900,2/15/1995,1/20/1995,V138,270,100,12,0,9,3348495,0
1608,BANKERS TRUST CO OF CA,11000 BROKEN LAND PKWY,COLUMBIA MD 00000,PRP ASSOC,34 N SIXTH ST,P
6TH ST,A0058,2,,2,1400,15500,16900,11/22/1995,1/20/1995,L142,90,35900,26,0,9,3711500,0
1608,"RIVERA,EDDIE & ENES",165 20TH AVE,PATERSN NJ 00000,"RIVERA,ENES",157 TEMPLE ST,PATERSON
58,1,,2,1400,15000,16400,9/6/1989,5/21/2002,B125,459,0,26,0,8,1663663,0
1608,TCIF RE02 LLC,3815 S WEST TEMPLE,SALT LAKE CITY UT 00000,BARNA NORMAN,12 RAVINE CT,CLIFTO
0058,1,,2,1400,15000,16400,6/6/2002,5/21/2002,427,174,100000,,0,,6657954,0
1608,TCIF LLC,3815 SOUTH W TEMPLE,SALT LAKE CITY UT 00000,TCIF RE02 LLC,3815 S OUTH W TEMPLE,S
 TEMPLE ST,A0058,1,,2,1400,15000,16400,6/6/2002,5/21/2002,427,167,0,25,0,,6657953,0



Report •

#9
June 13, 2009 at 00:42:38
hello ghostdog,

thank you so much for your help! i will test this out tomorrow as it is getting late. but i did want to let you know how much i appreciate it :)

thanks again,
duane.


Report •


Ask Question