I have a long list (5000+ lines) with two columns, separated by a semicolon.
The words in the first column are "standard", and each of them needs to be searched for in the second column.
Example:

LOREM IPSUM DOLOR;IREM HOPSUM FILUR
SIT AMET, CONSECTETUER ADIPISCING ELIT;MAT SIMET, CONSECTETUER DIPSCUT TILE
FUSCE ULTRICIES VELIT A IPSUM;EFUSCES ESTRICIES ALIT A HOPSUM
QUISQUE VEL PEDE;QUI SQUE VELPE DE

Now to explain what this program needs to do. It needs to take one word at a time from the first column of a line (everything before the separator) and search for it in the second column (everything after the separator), until every word from the first column has been searched for. It also needs to print a report: it has to add a new column showing how many occurrences were found on each line AND, if possible, it should count the words in the first column and print either a hit percentage or just the number of words, into the new column OR into yet another column. The program should leave out colons, dots, slashes and so on.
So after running the program, the list should look like this:

LOREM IPSUM DOLOR;IREM HOPSUM FILUR;0;0
SIT AMET, CONSECTETUER ADIPISCING ELIT;MAT SIMET, CONSECTETUER DIPSCUT TILE;1;20
FUSCE ULTRICIES VELIT A IPSUM;EFUSCES ESTRICIES ALIT A HOPSUM;2;40
QUISQUE VEL PEDE;QUI SQUE VELPE DE;1;33

So, in the first line it found no occurrences and therefore printed 0 hits and 0% of the words as hits.
In the second line, the word "CONSECTETUER" was found in the next column, and therefore we get 1 hit, and 20% of the words were hits.
In the third line, we get 2 hits, since the word "FUSCE" was found within the next column (inside "EFUSCES") and the word "A" was also found. We don't need to know how many times each word is found, only whether it is found at all. We got 2 hits out of five words, which should give us a 40% hit percentage.
The fourth line is another simple one. "VEL" is found within the next column (1 hit, inside "VELPE"), and since there are 3 words, we should get a percentage of "33,3333333333333333.....", but I think we will get along just fine with "33".
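The rules walked through above can be sketched roughly as follows. This is only an illustration in Python rather than Perl, under a few assumptions that the post does not spell out: the first semicolon is the separator, every ASCII punctuation character is dropped before matching, a word counts as a hit if it appears anywhere in the second column as a substring, and the percentage is truncated to a whole number. The names `score_line` and `annotate` are made up for this sketch.

```python
import string

# Assumption: every ASCII punctuation character is dropped before matching.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def score_line(line, sep=";"):
    """Return (hits, percent) for one 'first;second' line."""
    # Split at the first separator only.
    first, _, second = line.partition(sep)
    words = first.translate(_STRIP_PUNCT).split()
    haystack = second.translate(_STRIP_PUNCT)
    # Plain substring test: "FUSCE" counts as found inside "EFUSCES".
    hits = sum(1 for word in words if word in haystack)
    # Integer division truncates, so 1 hit out of 3 words gives 33.
    percent = hits * 100 // len(words) if words else 0
    return hits, percent


def annotate(lines, sep=";"):
    """Yield each non-empty line with ';hits;percent' appended."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        hits, percent = score_line(line, sep)
        yield f"{line}{sep}{hits}{sep}{percent}"


if __name__ == "__main__":
    sample = [
        "LOREM IPSUM DOLOR;IREM HOPSUM FILUR",
        "SIT AMET, CONSECTETUER ADIPISCING ELIT;MAT SIMET, CONSECTETUER DIPSCUT TILE",
        "FUSCE ULTRICIES VELIT A IPSUM;EFUSCES ESTRICIES ALIT A HOPSUM",
        "QUISQUE VEL PEDE;QUI SQUE VELPE DE",
    ]
    for row in annotate(sample):
        print(row)
```

Run on the four sample lines, this appends `;0;0`, `;1;20`, `;2;40` and `;1;33`, matching the expected list. `annotate` accepts any iterable of lines, so an open file handle can be passed in place of the list. The same logic should carry over to Perl using `split`, `tr` and `index`.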
I might not need a 100% ready solution (although I won't complain if someone offers me one :) ). I would just appreciate it if you guys have coded something that might help me in creating this program. I am a newbie at writing more complex Perl scripts (with regexps) and I know absolutely nothing about C, so I think Perl might be the solution for me.
Any hints or pointers? Thanks!
I found this while looking for a way to count the words. Is it helpful?
http://www.nntp.perl.org/group/perl...
Perl's "compare"-array module might be helpful for the comparison itself, but as I understand it, it compares whole words rather than substrings, so for example "THE" would not be found even if "THEME" appears in the other column.
http://search.cpan.org/~davecross/A...
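To make the distinction between whole-word and substring matching concrete, here is a tiny Python illustration. It does not use or model the linked module; the two helper names are made up for this sketch, and the sample text is invented.

```python
def found_as_substring(word, text):
    """True if word appears anywhere in text, even inside a longer word."""
    return word in text


def found_as_whole_word(word, text):
    """True only if word equals one of the whitespace-separated words in text."""
    return word in text.split()


text = "THEME PARK"
print(found_as_substring("THE", text))   # True: "THE" is inside "THEME"
print(found_as_whole_word("THE", text))  # False: no standalone word "THE"
```

The expected results in the example list (such as "VEL" counting as a hit because of "VELPE") correspond to the substring variant.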
