
perl/c find words from string

Name: slartsa
Date: January 22, 2009 at 23:29:35 Pacific
OS: windows xp/2k/nt
CPU/Ram: -
Product: - / -
Subcategory: General
Comment:

I have a long list (5000+ lines) with two columns, separated by a semicolon.

The words in the first column are "standard", and I need to check whether each of them occurs in the second column.

Example:

LOREM IPSUM DOLOR;IREM HOPSUM FILUR
SIT AMET, CONSECTETUER ADIPISCING ELIT;MAT SIMET, CONSECTETUER DIPSCUT TILE
FUSCE ULTRICIES VELIT A IPSUM;EFUSCES ESTRICIES ALIT A HOPSUM
QUISQUE VEL PEDE;QUI SQUE VELPE DE

Here is what the program needs to do. For each line, it should take the words of the first column (everything before the separator) one at a time and search for each of them in the second column (everything after the separator), until every word has been searched for. It also needs to produce a report: a new column with the number of first-column words that were found on that line. If possible, it should also count the words in the first column and print either a hit percentage or just the number of words, either in that same new column or in yet another one. Colons, dots, slashes and similar punctuation should be left out.

So after running the program, the list should look like this:

LOREM IPSUM DOLOR;IREM HOPSUM FILUR;0;0
SIT AMET, CONSECTETUER ADIPISCING ELIT;MAT SIMET, CONSECTETUER DIPSCUT TILE;1;20
FUSCE ULTRICIES VELIT A IPSUM;EFUSCES ESTRICIES ALIT A HOPSUM;2;40
QUISQUE VEL PEDE;QUI SQUE VELPE DE;1;33

So, in the first line it found no occurrences and therefore printed 0 hits, and 0% of the words were hits.

In the second line, the word "CONSECTETUER" was found in the second column, so we get 1 hit, and 20% of the words were hits.

In the third line we get 2 hits, since the word "FUSCE" was found within the second column (inside "EFUSCES") and the word "A" was also found. We don't need to know how many times each word is found, only whether it is found at all. We got 2 hits out of five words, which gives a 40% hit percentage.

The fourth line is another simple one. "VEL" is found within the second column (1 hit), and since there are 3 words, the percentage would be "33,3333333333333333.....", but I think we can get along just fine with "33".
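
The procedure walked through above could be sketched in Perl roughly as follows. This is only a minimal sketch under some assumptions that the description leaves open: matching is a case-sensitive substring search (which is what the "FUSCE" and "VEL" hits imply), punctuation is replaced by spaces in both columns before comparing, a word repeated in the first column is counted each time it appears, and the percentage is truncated rather than rounded. The names strip_punct and score_line are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn punctuation into spaces so "AMET," becomes "AMET".
sub strip_punct {
    my ($text) = @_;
    $text =~ s/[[:punct:]]/ /g;
    return $text;
}

# Hypothetical helper: return the input line with ";hits;percent" appended.
sub score_line {
    my ($line) = @_;

    # Split on the first semicolon only; everything after it is column two.
    my ($first, $second) = split /;/, $line, 2;
    $first  = '' unless defined $first;
    $second = '' unless defined $second;

    my @words    = split ' ', strip_punct($first);
    my $haystack = strip_punct($second);
    my $total    = scalar @words;

    # index() is a plain substring search, so "FUSCE" hits inside "EFUSCES".
    my $hits = grep { index($haystack, $_) >= 0 } @words;

    # int() truncates, so 1 hit out of 3 words gives 33.
    my $percent = $total ? int(100 * $hits / $total) : 0;

    return join ';', $line, $hits, $percent;
}

# The four example lines from the post.
my @sample = (
    'LOREM IPSUM DOLOR;IREM HOPSUM FILUR',
    'SIT AMET, CONSECTETUER ADIPISCING ELIT;MAT SIMET, CONSECTETUER DIPSCUT TILE',
    'FUSCE ULTRICIES VELIT A IPSUM;EFUSCES ESTRICIES ALIT A HOPSUM',
    'QUISQUE VEL PEDE;QUI SQUE VELPE DE',
);

print score_line($_), "\n" for @sample;
```

For the four example lines this appends ;0;0, ;1;20, ;2;40 and ;1;33 respectively. For the real list, the same sub could be called on each line read from a file, after chomp has removed the trailing newline.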

I might not need a 100% ready solution (although I won't complain if someone offers me one :) ); I would just appreciate it if you guys have coded something that might help me create this program. I am a newbie at writing more complex Perl scripts (with regexps) and I know absolutely nothing about C, so I think Perl might be the solution for me.

Any hints or pointers? Thanks!


I found this while looking for a way to count the words. Is it helpful?
http://www.nntp.perl.org/group/perl...


Perl's own "compare"-array might be helpful for the comparison itself, but as I understand it, it compares whole words rather than substrings, so for example "THE" would not be found even if "THEME" appears in the other column.
http://search.cpan.org/~davecross/A...
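
The difference between the two kinds of comparison can be illustrated with core Perl alone, without any CPAN module. This is just a sketch, and the sub names found_as_substring and found_as_word are made up for illustration. The expected results in the example list (such as "VEL" counting as a hit inside "VELPE") only come out with the substring test.

```perl
use strict;
use warnings;

# Substring test: true if $word occurs anywhere inside $text.
sub found_as_substring {
    my ($word, $text) = @_;
    return index($text, $word) >= 0 ? 1 : 0;
}

# Whole-word test: true only if $word stands alone between word boundaries.
# \Q...\E quotes any regex metacharacters that the word might contain.
sub found_as_word {
    my ($word, $text) = @_;
    return $text =~ /\b\Q$word\E\b/ ? 1 : 0;
}

my $column = 'MAIN THEME';
printf "substring: %d, whole word: %d\n",
    found_as_substring('THE', $column),
    found_as_word('THE', $column);
# prints "substring: 1, whole word: 0"
```

Both tests are case-sensitive as written.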


