Name: ld123
Date: February 20, 2008 at 23:26:31 Pacific
Subject: Parsing Question
OS: Win XP
CPU/Ram: Dual Core
Model/Manufacturer: Pavilion DV6000
Comment:
I have a directory with .txt files in it. I'm trying to parse a value out of each file and output it to a .csv file. The value is always the letter V followed by 7 digits, for example V1234567. The letter never changes; the digits differ from file to file, but there are always exactly 7 of them.
I've tried some of the examples I've found on the board, but I am stumped.
Here are the first three lines of a sample. This will be the general format for all the files. I have added the 1.), 2.), and 3.) in front of the lines:
As an FYI, this is a custom report we run on workstations. TID is terminal id, I have no idea what CID is, some date thing that I don't really need to parse out.
Also, in addition to M2's question, on the TID line is there always exactly one instance of the letter V (i.e. the one that's followed by the number)? If so, things get easier still.
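If the V-number is the only token of that shape on the line, it can be picked out by pattern instead of by position. Here is a minimal sketch, assuming a made-up TID line (the real report layout isn't shown in this thread). match(), RSTART and RLENGTH are standard awk; the seven digits are spelled out as bracket expressions because not every awk build accepts the {7} repeat syntax by default. It is written for a POSIX shell, so under the Windows command prompt the single quotes around the program would need to become double quotes.

```shell
line='TID|A1|V1234567|X|Y'   # hypothetical sample line, not the real report format

# match() sets RSTART/RLENGTH to the position and length of the first hit,
# so substr() returns just the V-number wherever it sits on the line.
found=$(printf '%s\n' "$line" |
  awk 'match($0, /V[0-9][0-9][0-9][0-9][0-9][0-9][0-9]/) {
         print substr($0, RSTART, RLENGTH)
       }')

echo "$found"
```

This does not depend on the field count, which may help if the number of "|" separators varies between files.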
I'm downloading a copy of Gawk for Windows now. I'll install it on a VM server I have and test it out. With the example you gave:
c:\> gawk -F "|" "/TID/{print $(NF-2)}" file
V1234567
wouldn't that only extract V1234567? The next file may have the value V0987654.
I haven't had a chance to look at the software yet, but how would you handle wildcards?
Well, a whole different thing to learn now. I'm going through the PDFs that came with the program. Could you explain the command you wrote? I can't find any mention of a print command with NF- and so on.
Your original post indicated that you needed to output to a csv file, which presumably means that you need to incorporate this parsed value with other data. Is that correct? If so, the gawk command may not do everything you need.
Here's a Perl command that accomplishes the same thing as the gawk command, but my assumption is that we need to expand the logic to handle the additional csv data.
NF is gawk's built-in variable for the number of fields on the current line. -F sets the field delimiter.
gawk -F "|" means set the field delimiter to "|".
If you try this on the command line (with double quotes, since the Windows command prompt doesn't treat single quotes as quoting): gawk -F "|" "{print NF}" file
it shows you how many "|"-separated fields each line has. You can verify by counting.
/TID/ simply matches lines that contain TID.
$(NF-2) gets the VALUE of the field that is 2 places before the final field. That would be your V1234567, assuming VXXXXXXX is always in that field.
$NF gets the VALUE of the last field.
NF on its own is the number of fields, which is also the position of the last field.
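To make the field variables above concrete, here is a small sketch run against a made-up line. The sample line and the file name sample.txt are assumptions for illustration only, since the real report layout isn't shown in this thread. It is written for a POSIX shell; under the Windows command prompt the single quotes around each program would become double quotes.

```shell
# Hypothetical sample line with six "|"-separated fields.
dir=$(mktemp -d)
printf '%s\n' 'TID|A1|B2|V1234567|X|Y' > "$dir/sample.txt"

# NF: the number of fields on the line.
nf=$(awk -F "|" '{print NF}' "$dir/sample.txt")

# $NF: the value of the last field.
last=$(awk -F "|" '{print $NF}' "$dir/sample.txt")

# $(NF-2): the value two places before the last field, on lines containing TID.
val=$(awk -F "|" '/TID/{print $(NF-2)}' "$dir/sample.txt")

echo "$nf $last $val"
```

With this sample line the script prints 6 Y V1234567: six fields, Y as the last one, and the V-number sitting two places before it.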
To get that VXXXXXXX value into a new file, just redirect the command's output with >.
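For a whole directory, something along these lines would write one CSV row per file. This is only a sketch: the file names a.txt, b.txt and out.csv and the two sample lines are hypothetical, and FILENAME is awk's built-in variable for the file currently being read. It is written for a POSIX shell, which expands *.txt itself. The Windows command prompt does not expand wildcards on its own, so whether *.txt works there depends on the gawk build; a for %f in (*.txt) do ... loop is the usual fallback.

```shell
# Hypothetical sample files; the real report layout may differ.
dir=$(mktemp -d)
printf '%s\n' 'TID|A1|V1234567|X|Y' > "$dir/a.txt"
printf '%s\n' 'TID|B2|V0987654|X|Y' > "$dir/b.txt"

# One CSV row per TID line: the file name, a comma, then the field
# two places before the last. ">" sends the output to out.csv.
(cd "$dir" && awk -F "|" '/TID/{print FILENAME "," $(NF-2)}' *.txt > out.csv)

cat "$dir/out.csv"
```

Keeping the file name in the first column makes it possible to tell afterwards which report each value came from.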
I've run the bat file against a sample file, but no dice. It runs but produces no output, even though the sample file has the format and value mentioned above. Any ideas?