Tom's Guide | Tom's Hardware | Tom's Games
![]() |
![]() |
![]() |
Hi.Hope someone can help me.
Situation:
I have a CSV file which contains 90 columns. I want to delete all columns but retain only 4 columns i.e. col 41,col 9.col 80 and col 74. I also want to rearrange these 4 columns so that they are in that sequence.What I have so far:
for i in *.csv
do
cut -d, -f41,9,80,74 > done/"$i"
doneWhen I run the above, nothing happens. Hope someone out there can help me to get this script working. Thanks in advance.

It's not doing anything because you didn't give it input file name $i:
cut -d, -f 4,8,9 $i > done/"$i"

Thanks nails. It works now, it removes the unwanted columns and retains only the column I specify but it doesn't rearrange the columns in the sequence I want.
I tried this and but nothing happens:
awk -F, -v OFS=, '{ print $4,$1,$3,$2 }'Can I combine the awk command above into the loop? so that the script does two things while looping:
1.removes unwanted columns
2.rearranges remaining columns in specified sequence
3.move output to done folder

The OFS should be set in the BEGIN section of the awk script:
awk -F, ' BEGIN { OFS="," }
{
print $4,$1,$3,$2
}' datafile > done/newfileAnother way is to leave the OFS alone, and use the printf statement to do the formating:
awk -F, '
{
printf("%s,%s,%s,%s\n", $4,$1,$3,$2)
}' datafile > done/newfile

for i in *.csv
do
sed '1d' $i
cut -d, -f 9,41,70,84 $i
awk -F, '
{
printf("%s,%s,%s,%s\n", $4,$1,$3,$2)
}' $i > done/"$i"
doneI am very new to shell scripting and you have been a great help. Thank you for helping and guiding me along. So,would the above script do the following?
1.loop through a directory and process all files ending in .csv
2.delete the first line in each csv
3.delete all columns except columns 9,41,70 and 84 in each csv
4.rearranges columns 9,41,70,84 into 84,9,70,41 in each csv
5.outputs processed csvs into /done folderThanks nails.

No, your script is NOT right. The old unix tools like sed, cut, and awk can not edit files in place. That means you have to use pipes, |, so the output of one file becomes the input of another: for i in *.csv do sed '1d' $i | cut -d, -f 9,41,70,84 | awk -F, ' { printf("%s,%s,%s,%s\n", $4,$1,$3,$2) }' > done/"$i" done Or you have to use temporary files: for i in *.csv do sed '1d' $i > 1.tmp cut -d, -f 9,41,70,84 1.tmp > 2.tmp awk -F, ' { printf("%s,%s,%s,%s\n", $4,$1,$3,$2) }' 2.tmp > done/"$i" done Finally, have you thought about solving the entire problem in awk? If the number of fields is greater than 3 (skip the blank lines) print the four fields in whatever order you want. the only problem I see is if your awk version will not support that large number of fields: # untested for i in *.csv do awk -F, ' { if(NF > 3) printf("%s,%s,%s,%s\n", $9,$41,$70,$84) }' $i > done/"$i" done

Thanks nails. I tried it but it jumbles up the columns wrongly.
The script did delete the first line of every csv file, kept only the wanted columns i.e. 9,41,70,84 but it did not rearrange the columns correctly. It didn't rearrange them into the correct column sequence which is 84,9,70,41(i.e. $4,$1,$3,$2 )

Just rearrange the columns in the order you wish:
# untested for i in *.csv do awk -F, ' { if(NF > 3) printf("%s,%s,%s,%s\n", $84,$9,$70,$41) }' $i > done/"$i" done

Everything is ok now but the field containing "price information(i.e. 8.99 or 615.80)" did not output any data and only zeros appeared. You see, field 70 is actually not a string and is actually a float field containing price information with two decimal points(for e.g. 89.99, 7.05 or 715.27).
I tried this:
printf("%s,%s,%.2f,%s\n", $84,$9,$70,$41)but it still does not work. Instead of displaying the price, it just displays 00.00 or 00.000000. Have also tried replacing %.2f with %d and just %f but all does not work. By the way, I opted to use the pipe that you suggested above instead of temp files.
Another thing that I would like to do after the above is working is to also convert all fields to be double quoted e.g. "<field>" because of the need for it to be a valid csv.
Let me know if you need more details in order to help me solve this problem.Thanks nails for your help.

This is the script in its entirety where field 80 is "price information(i.e. 89.99, 7.05 or 715.27)". Everything is ok now except this field.
for i in *.csv
do
sed '1d' $i |
cut -d"," -f 9,13,26,41,74,80 |
awk -F, '
{
printf("%s,%s,%.2f,%s,%s,%s\n", $4,$1,$6,$5,$3,$2)
}' > done/"$i"
done

I am unable to duplicate your problem, but I think I understand what is happening. For some reason, $6 is not converted to decimal. awk thinks the field is a string which is why printf displays 0.0.
You might try adding 0.0 to the field. This might do the decimal converstion. Here is an example:
echo "A1,B2,C3,D4,E5, 715.27 "|awk -F, '
{
$6=$6+0.0
printf("%s,%s,%.2f,%s,%s,%s\n", $4,$1,$6,$5,$3,$2)
}'

I tried adding the below.
$6=$6+0.0But it still does not work. It is still displaying 0.00.
I am using CentOs 5.3, could this be the problem or could it be a DOS to UNIX text file format file issue?
All other fields are ok , only this field 6($6) which contains price information in the form of numbers with two decimal that is having problems....strange....

I don't know what the problem is. If you can send me a small sample of your data file, I'll take a look at it. Leave me a message in computing.net's private message center with your email address, and I'll let you know how you can contact me.

hi nails. I have emailed you the sample csv file already. Please check your email. Did you receive it?
Thanks.

Hi There,
I have a strange problem:The csv which i need to process has 3 columns as shown below
Test.csv
Column1,Coulmn2,Column3
"Floor 1, Suite 2, Rack3",122333,222222The command
cut -d, -f 2,3 Test.csv > Test2.csv
gives Test2.csv
Coulmn2,Column3
Suite 2, Rack3But i need
Coulmn2,Column3
122333,222222
Is this possible.Please help

![]() |
![]() |
![]() |
| Login or Register to Reply | |
| Login | Register |
| Ads by Google |