Computing.Net > Forums > Programming > Combine duplicate records

Computer Problems? Computing.Net has over 1,000,000 posts about all things technology related! Over 90% answered within 24 hours! Click here to start participating now! Also, be sure to check out the New User Guide.

Combine duplicate records

Reply to Message Icon

Name: gdet
Date: November 17, 2008 at 05:15:33 Pacific
OS: UNIX
CPU/Ram: UNIX box
Product: UNIX
Comment:

I have an extract of a csv file that looks like this:

5619,Marcus Lindsay Asher,2005/06,3500,525
5806,John Clifford,1999/00,3000,450
5806,John Clifford,2000/01,2000,300
5887,Geoffery William,2005/06,5000,750

I would like to combine duplicate records into a single record with the new single record containing additional fields (for example the last 3 fields). In the example file above, the first field is the unique field. So I would like my output to be like below:

5619,Marcus Lindsay Asher,2005/06,3500,525
5806,John Clifford,1999/00,3000,450,2000/01,2000,300
5887,Geoffery William,2005/06,5000,750

Can someone please please help me with a solution using Unix scripting, awk or C, whichever is simpler.

Thank you.



Sponsored Link
Ads by Google

Response Number 1
Name: nails
Date: November 17, 2008 at 09:58:00 Pacific
Reply:

One way is this Korn shell script. In a normal break point report such as this, the script assumes the datafile is sorted by the first field key:



#!/bin/ksh

fflag=0
while IFS="," read f1 f2 f3 f4 f5
do
if [[ $fflag = 0 ]]
then
fflag=1
sf1="$f1"
printf "%s,%s,%s,%s,%s" "$f1" "$f2" "$f3" "$f4" "$f5"
continue
fi

if [[ "$f1" != "$sf1" ]]
then # changes
printf "\n%s,%s,%s,%s,%s" "$f1" "$f2" "$f3" "$f4" "$f5"
else # does NOT change
printf ",%s,%s,%s" "$f3" "$f4" "$f5"
fi

sf1="$f1"
done < datafile
echo ""

Of course, this particular script only works for a comma-delimited record of 5 fixed fields.


0

Response Number 2
Name: gdet
Date: November 17, 2008 at 16:16:31 Pacific
Reply:

Thank you nails. Almost there. Only problem left is the third printf (when it does not change) writes the f3, f4 and f5 at the beginning of the line. So the output becomes as below:

5619,Marcus Lindsay Asher,2005/06,3500,525
,2000/01,2000,300d,1999/00,3000,450
5887,Geoffery William,2005/06,5000,750


0

Response Number 3
Name: gdet
Date: November 17, 2008 at 16:52:43 Pacific
Reply:

Actually nails, the output is correct. Only thing is the third printf (when it does not change) writes f3, f4 and f5 on a new line instead of appending at the end. So my output becomes as below:

5619,Marcus Lindsay Asher,2005/06,3500,525
5806,John Clifford,1999/00,3000,450
,2000/01,2000,300
5887,Geoffery William,2005/06,5000,750


0

Response Number 4
Name: nails
Date: November 17, 2008 at 19:43:45 Pacific
Reply:

I checked my code again, and I'm not having any trouble. The only way I can duplicate your error is placing a newline at the end of the 2nd printf:

printf "\n%s,%s,%s,%s,%s\n" "$f1" "$f2" "$f3" "$f4" "$f5"

This is incorrect as it should be this:

printf "\n%s,%s,%s,%s,%s" "$f1" "$f2" "$f3" "$f4" "$f5"

I don't know what else to tell you. BTW, what unix are you using? I'm using Solaris 9.

If your code is different from mine, you can post it and I'll take a look at it.


0

Response Number 5
Name: gdet
Date: November 17, 2008 at 23:00:07 Pacific
Reply:

I think you're right. I got that issue when I ran the script against a small test file. When I modified the script to process a bigger csv file with more fields, I didn't have the issue. So I'm not quite sure why the output behaves like that.
Anyway, for now thank you so much for your help! :)


0

Related Posts

See More



Response Number 6
Name: Sen Hu
Date: February 13, 2009 at 09:45:11 Pacific
Reply:


OK. I could not resist the temptation to write a script for Windows - I am sure that there are users who want to do similar things with CSV files in Windows also.

Assuming that the input CSV (Comma-Separated-Values) file is X.csv, and the output is to go to Y.csv, the following script will combine duplicate records in the input CSV file. (For biterscripting, go to http://www.biterscriptng.com . I think is free.)

Sen

# SATRT OF SCRIPT
# Read in the entire file
var str data, records, new_records, record, new_record, key ; cat X.csv > $records

# Process records one by one
while ($records <> "")
do
lex -e "1" $records > $record

# Get unique key and data
stex -p "]^,^2" $record > $key # The first field is key.
stex -p "[^,^2" $record > $data # The second comma onward is data.

# Check if this key exists in new_data.
var str key_search ; set $key_search = "^"+$key+"^"
if ( {sen $key_search $new_records} == 0)
# This key does not exist in new_data. Just add this entire row.
set $new_records = $new_records + $record + "\n"
else
do
# This key already exists in new_data. Just append data to that row.
var str key_arg ; set $key_arg = "["+$key_search
stex $key_arg $new_records > $new_record
lap "1" $data $new_record >null
set $new_records = $new_records + $new_record +"\n"

done
endif
done

# Write new csv to Y.csv
echo $new_records > Y.csv


0

Sponsored Link
Ads by Google
Reply to Message Icon






Post Locked

This post is quite old and has been locked from receiving new replies. Please create a new posting instead.


Go to Programming Forum Home


Sponsored links

Ads by Google


Results for: Combine duplicate records

Doubt in Sql query building www.computing.net/answers/programming/doubt-in-sql-query-building/17694.html

Duplicate rows in SQL Joins www.computing.net/answers/programming/duplicate-rows-in-sql-joins/12430.html

error trap on P Key duplicate vb6 www.computing.net/answers/programming/error-trap-on-p-key-duplicate-vb6/12746.html