Tom's Guide | Tom's Hardware | Tom's Games
![]() |
![]() |
![]() |
I have an extract of a csv file that looks like this:
5619,Marcus Lindsay Asher,2005/06,3500,525
5806,John Clifford,1999/00,3000,450
5806,John Clifford,2000/01,2000,300
5887,Geoffery William,2005/06,5000,750I would like to combine duplicate records into a single record with the new single record containing additional fields (for example the last 3 fields). In the example file above, the first field is the unique field. So I would like my output to be like below:
5619,Marcus Lindsay Asher,2005/06,3500,525
5806,John Clifford,1999/00,3000,450,2000/01,2000,300
5887,Geoffery William,2005/06,5000,750Can someone please please help me with a solution using Unix scripting, awk or C, whichever is simpler.
Thank you.

One way is this Korn shell script. In a normal break point report such as this, the script assumes the datafile is sorted by the first field key:
#!/bin/kshfflag=0
while IFS="," read f1 f2 f3 f4 f5
do
if [[ $fflag = 0 ]]
then
fflag=1
sf1="$f1"
printf "%s,%s,%s,%s,%s" "$f1" "$f2" "$f3" "$f4" "$f5"
continue
fiif [[ "$f1" != "$sf1" ]]
then # changes
printf "\n%s,%s,%s,%s,%s" "$f1" "$f2" "$f3" "$f4" "$f5"
else # does NOT change
printf ",%s,%s,%s" "$f3" "$f4" "$f5"
fisf1="$f1"
done < datafile
echo ""Of course, this particular script only works for a comma-delimited record of 5 fixed fields.

Thank you nails. Almost there. Only problem left is the third printf (when it does not change) writes the f3, f4 and f5 at the beginning of the line. So the output becomes as below:
5619,Marcus Lindsay Asher,2005/06,3500,525
,2000/01,2000,300d,1999/00,3000,450
5887,Geoffery William,2005/06,5000,750

Actually nails, the output is correct. Only thing is the third printf (when it does not change) writes f3, f4 and f5 on a new line instead of appending at the end. So my output becomes as below:
5619,Marcus Lindsay Asher,2005/06,3500,525
5806,John Clifford,1999/00,3000,450
,2000/01,2000,300
5887,Geoffery William,2005/06,5000,750

I checked my code again, and I'm not having any trouble. The only way I can duplicate your error is placing a newline at the end of the 2nd printf:
printf "\n%s,%s,%s,%s,%s\n" "$f1" "$f2" "$f3" "$f4" "$f5"
This is incorrect as it should be this:
printf "\n%s,%s,%s,%s,%s" "$f1" "$f2" "$f3" "$f4" "$f5"
I don't know what else to tell you. BTW, what unix are you using? I'm using Solaris 9.
If your code is different from mine, you can post it and I'll take a look at it.

I think you're right. I got that issue when I ran the script against a small test file. When I modified the script to process a bigger csv file with more fields, I didn't have the issue. So I'm not quite sure why the output behaves like that.
Anyway, for now thank you so much for your help! :)

OK. I could not resist the temptation to write a script for Windows - I am sure that there are users who want to do similar things with CSV files in Windows also.Assuming that the input CSV (Comma-Separated-Values) file is X.csv, and the output is to go to Y.csv, the following script will combine duplicate records in the input CSV file. (For biterscripting, go to http://www.biterscriptng.com . I think is free.)
Sen
# SATRT OF SCRIPT
# Read in the entire file
var str data, records, new_records, record, new_record, key ; cat X.csv > $records# Process records one by one
while ($records <> "")
do
lex -e "1" $records > $record# Get unique key and data
stex -p "]^,^2" $record > $key # The first field is key.
stex -p "[^,^2" $record > $data # The second comma onward is data.# Check if this key exists in new_data.
var str key_search ; set $key_search = "^"+$key+"^"
if ( {sen $key_search $new_records} == 0)
# This key does not exist in new_data. Just add this entire row.
set $new_records = $new_records + $record + "\n"
else
do
# This key already exists in new_data. Just append data to that row.
var str key_arg ; set $key_arg = "["+$key_search
stex $key_arg $new_records > $new_record
lap "1" $data $new_record >null
set $new_records = $new_records + $new_record +"\n"
done
endif
done# Write new csv to Y.csv
echo $new_records > Y.csv

![]() |
![]() |
![]() |

This post is quite old and has been locked from receiving new replies. Please create a new posting instead.
| Ads by Google |