create csv file based on info from a column

June 30, 2011 at 14:54:07
Specs: Windows 7
hello

I have a csv file and I want to create a new file by filtering information from a certain column, I have tried findstr but it looks at all the columns and not only one column

thank you


See More: create csv file based on info from a column

Report •

#1
June 30, 2011 at 21:40:27
Use a For loop to access the column. In the simplest For loop the maximum number of columns is 31.

If you need further assistance please post more info such as the number of columns and a sample line, and what you want to do with the value contained in the selected column.


Please come back & tell us if your problem is resolved.


Report •

#2
July 1, 2011 at 06:13:12
the column that I want to check its value is number 17 and I want to put the line that I find in another file
so for example I want to look for the word buy in this column and if I find this word I want to put the whole line in a separate file.
I tried the for loop but I didn't know how to filter for the required column, I also have another filter which will be adding only the lines that have a value in the first column, can these two steps be done together?

right now here is what I have

FOR /F "usebackq tokens=1,* delims=$" %%i in ("test.csv") do (
echo %%i>>"test2.csv"
GoTo:OUT)
:Out

findstr "Buy" test.csv>>test2.csv
I did the first command to get the headings and then I am searching for the word buy but the word buy can appear in other columns and I only want to search the column number 17

thank you for your help


Report •

#3
July 1, 2011 at 17:48:09
You didn't show a typical line from your .csv file so the following script creates a file for testing only.

@echo off
cls
setlocal enabledelayedexpansion

:: Create test file.
(
echo Header line,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11,h12,^
h13,h14,h15,h16,h17,h18,h19,h20,h21,h22,h23,h24
echo 1st,1,2,3,4,buy,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
echo 2nd,1,2,3,4,5,6,7,8,9,10,buy,12,13,14,15,16,17,18,19,20,21,22,23
echo 3rd,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
echo 4th,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,buy 20000 at 1.50,18,19,20,21,22,23
echo 5th,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,buy,23
echo 6th,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,you can buy,18,19,20,21,22,buy
echo 7th,1,2,3,4,5,6,7,8,9,10,11,buy,13,14,15,16,do not buy,18,19,20,21,22,23
)>%temp%\trial.csv

:: Extract Header line>
set /p header=<%temp%\trial.csv&echo !header!>%temp%\toutput.csv

:: Extract lines 
for /f "skip=1 tokens=*" %%1 in (%temp%\trial.csv) do (
    set line=%%1
        for /f "tokens=1-24 delims=," %%A in ("!line!") do (

            rem Extract lines containing buy in column 17
            echo %%R|findstr "buy">nul&&echo !line!>>%temp%\toutput.csv

            rem Extract lines containing 2 in column 1
            echo %%A|findstr "2">nul&&echo !line!>>%temp%\toutput.csv

    )
)

type %temp%\toutput.csv
del %temp%\t*.csv


Please come back & tell us if your problem is resolved.


Report •

Related Solutions

#4
July 1, 2011 at 21:07:52
the code is working thank you, but I have noticed something in my file and that is some numbers have commas so for example if the price is 12000 it is written as 12,000 so is there a way to over come that because it seems that this code is counting the 12 at a column and the 000 at another column so it isn't able to find all the rows that have the word buy in them
and as for the second condition I want to remove the rows that are blank in their first column

here is a sample from the file
CurrentIdentifier,Part,PartDivision,ReportName,ReportDate,Currency,Originator,Shop,Manager,cost,price,Officer,ClosingDate,Percent,LocalCurrency,Pool,recommendation,Form,Focus,Stage,BackupOfficer,GeographicFocus,Rank,Year,Class,Liquid,PurchaseDate,Liquidity,Name,TotalOrderSize,TotalPrice,TotalPercent,FormerIdentifier,SecurityIDType,AccountNumber,ClientID,CommitID,ParentCommitID
a0AA000000087dFMAQ,"CORE, L.P.",,ALL Reports,12/31/2010,USD,,"Odabsld, L.P.",Odevsld ,"83,167,403.89","13,251,500.00",John Smith ,9/28/2007,15.9335,EUR,,Buy,LP,Multiple,,Jeremy,Intl - Germany,,2007,,F,,Illiquid,"Odabsld Private, L.P.",,,,,,,32,189,189

again thank you very much for your help


Report •

#5
July 3, 2011 at 23:01:13
Yes, formats of .csv files. That's why I wrote in my first response If you need further assistance please post more info such as the number of columns and a sample line, and what you want to do with the value contained in the selected column. to which you didn't respond so the work done to produce my first script was a waste of time.

I have made the assumption that column 17 will either hold the single word Buy or some other info or perhaps be blank. As usual, paths and filenames will need to be amended to suit your setup. I have used the @ character as a delimiter, if that character appears anywhere in your file you should change the char in the script to some other printable char, if necessary one of the Extended ASCII Codes.

Good luck, hope I've helped.

Please test the following script.

@echo off
cls
setlocal enabledelayedexpansion

set /p header=<trial.csv
echo %header%>%temp%\output.csv

for /f "skip=1 tokens=*" %%A in (trial.csv) do (
    set inline=%%A&set inline1=!inline:"=@!

    for /f "tokens=1-10* delims=@" %%A in ("!inline1!") do (
        if not "%%A" equ "," (
            set linepart=%%I

                for /f "tokens=1-7* delims=," %%1 in ("!linepart!") do (
                    if /i "%%5" equ "buy" (
                        echo !inline!>>%temp%\output.csv
        )
      )
    )
  )
)


type %temp%\output.csv



Please come back & tell us if your problem is resolved.


Report •

#6
July 5, 2011 at 07:20:24
thank you for your reply and sorry for my misunderstanding at the beginning, I tested the file it is bringing some records but not all of them so I am trying to play with it a little bit to see why it is bringing some records and not the rest
can I ask you a question, I am trying to read the code and here is what I got
so you are diving each line into two parts
part 1 from 1-10 and it starts at column A and in this part you are checking to see if the first column is empty
and the second part starts from column I and you are counting 7 columns so why is the important column number 5 and not 7?
thank you very much for your help

Report •

#7
July 5, 2011 at 17:56:03
so you are diving each line into two parts
part 1 from 1-10 and it starts at column A and in this part you are checking to see if the first column is empty

No, the line is being split into several segments depending on how many times the delimiter (@) appears in the line. I chose an arbitrary number of tokens as 1-10* so that the line could be split into 11 segments if necessary, this could probably be reduced to 10 by using tokens=1-9* and still achieve the desired result as the For variable assigned to the required segment is %%I (the 9th letter of the alphabet).

See this:
for /f "tokens=1-10* delims=@" %%A in ("!inline1!") do (
if not "%%A" equ "," (
set linepart=%%I

I don't understand your final query.
and the second part starts from column I and you are counting 7 columns so why is the important column number 5 and not 7?

You will appreciate that the script does not deal with a .csv file but with an ASCII text file (which happens to be a .csv file) or a simple environment variable. When the For loop examines the file or environment variable it looks for the characters which appear between the delimiter(s) in this case a single , (comma). The delimiter otherwise is ignored. If there are no characters between delimiters, not even a space, the For variable is not set as there is nothing to which to set it.

Env Variable LINEPART consists of:
,John Smith ,9/28/2007,15.9335,EUR,,Buy,LP,Multiple,,Jeremy,Intl - Germany,,2007,,F,,Illiquid,

Therefore the only viable fields are:
%%1 - John Smith
%%2 - 9/28/2007
%%3 - 15.9335
%%4 - EUR
%%5 - Buy (the only field of interest to us)
%%6 - LP
%%7 - Multiple
%%8 - ,Jeremy,Intl - Germany,,2007,,F,,Illiquid,

See this:
for /f "tokens=1-7* delims=," %%1 in ("!linepart!") do (
if /i "%%5" equ "buy" (
echo !inline!>>%temp%\output.csv

Again the number of tokens could have been reduced to 1-5* and achieve the desired result.

Hope this helps you.

W.


Please come back & tell us if your problem is resolved.


Report •

#8
July 7, 2011 at 19:42:55
the result of the first part isn't consistent among all lines so the resulting columns aren't always in the same order, sometimes the word buy is appearing in column 4 and other times in column 5, I am not sure what is causing this difference in behavior.


CurrentIdentifier,Part,PartDivision,ReportName,ReportDate,Currency,Originator,Shop,Manager,cost,price,Officer,ClosingDate,Percent,LocalCurrency,Pool,recommendation,Form,Focus,Stage,BackupOfficer,GeographicFocus,Rank,Year,Class,Liquid,PurchaseDate,Liquidity,Name,TotalOrderSize,TotalPrice,TotalPercent,FormerIdentifier,SecurityIDType,AccountNumber,ClientID,CommitID,ParentCommitID
a0AA000000087dFMAQ,"CORE, L.P.",,ALL Reports,12/31/2010,USD,,"Odabsld, L.P.",Odevsld ,"83,167,403.89","13,251,500.00",John Smith ,9/28/2007,15.9335,EUR,,Buy,LP,Multiple,,Jeremy,Intl - Germany,,2007,,F,,Illiquid,"Odabsld Private, L.P.",,,,,,,32,189,189
a0AA000000087dLMAQ,WP Opportunity,,ALL Reports,12/31/2010,USD,,"Arlington, L.P.",Arlingt,"585,000,000.00","5,000,000.00",John Smith,3/17/2006,0.8547,USD,,Buy,LP,Multiple,,Jeremy,US,,2005,,F,,Illiquid,"Arlington, L.P.",,,,,,,10,57,57

so the first line is appearing but not the second line


Report •

#9
July 8, 2011 at 02:28:51
The reason for this is that column 2 in the second record posted is not enclosed in "..." therefore the script I posted is not consistently creating the various segments of the record in the second For loop. The easiest cure for this is for you to ensure that all fields are in the same format otherwise I can probably code a workaround but it will have to wait till next week as this weekend is fully booked.

W.


Please come back & tell us if your problem is resolved.


Report •

#10
July 8, 2011 at 06:13:11
well this file is automatically generated so I want to keep the current format of the file without changing.
I can wait till next week no problem but there are two things what do I need to do if I want to filter based on different phrases like "buy - big" and "sell - small"
the other thing is I want to include only the lines that have a value in the first column because some rows don't have a value for current identifier
thank you very much for your help

Report •

#11
July 8, 2011 at 15:24:21
there are two things what do I need to do if I want to filter based on different phrases like "buy - big" and "sell - small"

You can do this by a using a Find command line instead of If in:
for /f "tokens=1-7* delims=," %%1 in ("!linepart!") do (
if /i "%%5" equ "buy" (
echo !inline!>>%temp%\output.csv

Second query:
I want to include only the lines that have a value in the first column because some rows don't have a value for current identifier

This was taken care of in my script by testing for a comma as the first character, did it not work for you? It worked here.

See:
for /f "tokens=1-10* delims=@" %%A in ("!inline1!") do (
if not "%%A" equ "," (
set linepart=%%I

If the first column has a value Linepart would be set else the script would return to the first For loop and the next record would be read. Of course if the first column does not contain a value and the second column is not enclosed in "..." the script fails.

I am becoming increasingly concerned that if we succeed in correcting the problems you have so far presented some other anomaly will arise and we will be amending the script for the foreseeable future. This could be avoided if every record in the .csv file had the same format.


Please come back & tell us if your problem is resolved.


Report •

#12
July 9, 2011 at 09:53:44
well if I am going to do any manual operations on the file then I wouldn't need the batch file but I have checked the columns that might have the same problem as the second column and they are the 8th, 9th and 29th column.
I know that this is a weird way to create the csv file but I wanted to create an automated way to deal with the file but if it will take a long time to develop I think I will have to the process manually

Report •

#13
July 9, 2011 at 14:22:58
I have checked the columns that might have the same problem as the second column and they are the 8th, 9th and 29th column.

With so many variables in record formats I regret I cannot assist you further. Perhaps if you start a new post laying out your requirements and giving examples of all the formats someone else may be able to assist, perhaps using a different scripting or programming language.

Good luck.


Please come back & tell us if your problem is resolved.


Report •

#14
July 11, 2011 at 06:08:52
ok thank you for your help

Report •

Ask Question