Computing.Net > Forums > Unix > manipulate columns in a csv using shell

Computer Problems? Computing.Net has over 1,000,000 posts about all things technology related! Over 90% answered within 24 hours! Click here to start participating now! Also, be sure to check out the New User Guide.

manipulate columns in a csv using shell

Reply to Message Icon

Name: gizmoconcept
Date: May 13, 2009 at 12:13:35 Pacific
OS: Windows XP
Subcategory: General
Comment:

Hi.Hope someone can help me.

Situation:
I have a CSV file which contains 90 columns. I want to delete all columns but retain only 4 columns i.e. col 41,col 9.col 80 and col 74. I also want to rearrange these 4 columns so that they are in that sequence.

What I have so far:
for i in *.csv
do
cut -d, -f41,9,80,74 > done/"$i"
done

When I run the above, nothing happens. Hope someone out there can help me to get this script working. Thanks in advance.



Sponsored Link
Ads by Google

Response Number 1
Name: nails
Date: May 13, 2009 at 14:29:20 Pacific
Reply:

It's not doing anything because you didn't give it input file name $i:

cut -d, -f 4,8,9 $i > done/"$i"


0

Response Number 2
Name: gizmoconcept
Date: May 19, 2009 at 10:12:21 Pacific
Reply:

Thanks nails. It works now, it removes the unwanted columns and retains only the column I specify but it doesn't rearrange the columns in the sequence I want.

I tried this and but nothing happens:
awk -F, -v OFS=, '{ print $4,$1,$3,$2 }'

Can I combine the awk command above into the loop? so that the script does two things while looping:
1.removes unwanted columns
2.rearranges remaining columns in specified sequence
3.move output to done folder


0

Response Number 3
Name: nails
Date: May 19, 2009 at 16:15:54 Pacific
Reply:

The OFS should be set in the BEGIN section of the awk script:
awk -F, ' BEGIN { OFS="," }
{
print $4,$1,$3,$2
}' datafile > done/newfile

Another way is to leave the OFS alone, and use the printf statement to do the formating:
awk -F, '
{
printf("%s,%s,%s,%s\n", $4,$1,$3,$2)
}' datafile > done/newfile


0

Response Number 4
Name: gizmoconcept
Date: May 20, 2009 at 13:13:14 Pacific
Reply:

for i in *.csv
do
sed '1d' $i
cut -d, -f 9,41,70,84 $i
awk -F, '
{
printf("%s,%s,%s,%s\n", $4,$1,$3,$2)
}' $i > done/"$i"
done

I am very new to shell scripting and you have been a great help. Thank you for helping and guiding me along. So,would the above script do the following?
1.loop through a directory and process all files ending in .csv
2.delete the first line in each csv
3.delete all columns except columns 9,41,70 and 84 in each csv
4.rearranges columns 9,41,70,84 into 84,9,70,41 in each csv
5.outputs processed csvs into /done folder

Thanks nails.


0

Response Number 5
Name: nails
Date: May 20, 2009 at 18:30:05 Pacific
Reply:

No, your script is NOT right.  The old unix tools like sed, cut, and awk can not edit files in place.  That means you
have to use pipes, |, so the output of one file becomes the input of another:

for i in *.csv
do
   sed '1d' $i |
   cut -d, -f 9,41,70,84 |
   awk -F, '
   {
      printf("%s,%s,%s,%s\n", $4,$1,$3,$2)
   }' > done/"$i"
done

Or you have to use temporary files:

for i in *.csv
do

   sed '1d' $i > 1.tmp
   cut -d, -f 9,41,70,84 1.tmp > 2.tmp
   awk -F, '
   {
      printf("%s,%s,%s,%s\n", $4,$1,$3,$2)
   }' 2.tmp > done/"$i"
done

Finally, have you thought about solving the entire problem in awk?  If the number of fields is greater than 3
(skip the blank lines) print the four fields in whatever order you want.  the only problem I see is if your awk
version will not support that large number of fields:

# untested
for i in *.csv
do
   awk -F, '
   {
    if(NF > 3)
      printf("%s,%s,%s,%s\n", $9,$41,$70,$84)
   }' $i > done/"$i"
done


0

Related Posts

See More



Response Number 6
Name: gizmoconcept
Date: May 22, 2009 at 11:31:53 Pacific
Reply:

Thanks nails. I tried it but it jumbles up the columns wrongly.

The script did delete the first line of every csv file, kept only the wanted columns i.e. 9,41,70,84 but it did not rearrange the columns correctly. It didn't rearrange them into the correct column sequence which is 84,9,70,41(i.e. $4,$1,$3,$2 )


0

Response Number 7
Name: nails
Date: May 22, 2009 at 12:12:39 Pacific
Reply:

Just rearrange the columns in the order you wish:

# untested
for i in *.csv
do
   awk -F, '
   {
    if(NF > 3)
      printf("%s,%s,%s,%s\n", $84,$9,$70,$41)
   }' $i > done/"$i"
done


0

Response Number 8
Name: gizmoconcept
Date: May 22, 2009 at 12:58:03 Pacific
Reply:

Everything is ok now but the field containing "price information(i.e. 8.99 or 615.80)" did not output any data and only zeros appeared. You see, field 70 is actually not a string and is actually a float field containing price information with two decimal points(for e.g. 89.99, 7.05 or 715.27).

I tried this:
printf("%s,%s,%.2f,%s\n", $84,$9,$70,$41)

but it still does not work. Instead of displaying the price, it just displays 00.00 or 00.000000. Have also tried replacing %.2f with %d and just %f but all does not work. By the way, I opted to use the pipe that you suggested above instead of temp files.

Another thing that I would like to do after the above is working is to also convert all fields to be double quoted e.g. "<field>" because of the need for it to be a valid csv.

Let me know if you need more details in order to help me solve this problem.Thanks nails for your help.


0

Response Number 9
Name: gizmoconcept
Date: May 23, 2009 at 01:38:34 Pacific
Reply:

This is the script in its entirety where field 80 is "price information(i.e. 89.99, 7.05 or 715.27)". Everything is ok now except this field.

for i in *.csv
do
sed '1d' $i |
cut -d"," -f 9,13,26,41,74,80 |
awk -F, '
{
printf("%s,%s,%.2f,%s,%s,%s\n", $4,$1,$6,$5,$3,$2)
}' > done/"$i"
done


0

Response Number 10
Name: nails
Date: May 23, 2009 at 11:59:29 Pacific
Reply:

I am unable to duplicate your problem, but I think I understand what is happening. For some reason, $6 is not converted to decimal. awk thinks the field is a string which is why printf displays 0.0.

You might try adding 0.0 to the field. This might do the decimal converstion. Here is an example:

echo "A1,B2,C3,D4,E5, 715.27 "|awk -F, '
{
$6=$6+0.0
printf("%s,%s,%.2f,%s,%s,%s\n", $4,$1,$6,$5,$3,$2)
}'


0

Response Number 11
Name: gizmoconcept
Date: May 23, 2009 at 13:02:59 Pacific
Reply:

I tried adding the below.
$6=$6+0.0

But it still does not work. It is still displaying 0.00.

I am using CentOs 5.3, could this be the problem or could it be a DOS to UNIX text file format file issue?

All other fields are ok , only this field 6($6) which contains price information in the form of numbers with two decimal that is having problems....strange....


0

Response Number 12
Name: nails
Date: May 24, 2009 at 08:37:29 Pacific
Reply:

I don't know what the problem is. If you can send me a small sample of your data file, I'll take a look at it. Leave me a message in computing.net's private message center with your email address, and I'll let you know how you can contact me.


0

Response Number 13
Name: gizmoconcept
Date: May 28, 2009 at 03:48:30 Pacific
Reply:

hi nails. I have emailed you the sample csv file already. Please check your email. Did you receive it?

Thanks.


0

Response Number 14
Name: mazrkhan
Date: September 10, 2009 at 02:27:43 Pacific
Reply:

Hi There,


I have a strange problem:

The csv which i need to process has 3 columns as shown below

Test.csv

Column1,Coulmn2,Column3
"Floor 1, Suite 2, Rack3",122333,222222

The command

cut -d, -f 2,3 Test.csv > Test2.csv

gives Test2.csv

Coulmn2,Column3
Suite 2, Rack3

But i need

Coulmn2,Column3
122333,222222


Is this possible.

Please help


0

Sponsored Link
Ads by Google
Reply to Message Icon





Use following form to reply to current message:

Login or Register to Reply
LoginRegister


Sponsored links

Ads by Google


Results for: manipulate columns in a csv using shell

how to seach and replace a value using shell www.computing.net/answers/unix/how-to-seach-and-replace-a-value-using-shell-/8492.html

count the occurence of a char in a string www.computing.net/answers/unix/count-the-occurence-of-a-char-in-a-string/8516.html

displaying words in a row to column www.computing.net/answers/unix/displaying-words-in-a-row-to-column/4945.html