I have this text file:
Word1/PRO_UBST word2/ADJ word3/N word4/V word5/PREP word6/N
The "/N" is a grammatical tag of the word.
/N=Noun
/PRO_****=Pronouns
/V=Verb
/PREP=preposition
I need some kind of program that identifies the complete nouns and puts brackets around them. Not just the words tagged with "/N", but also those that have the following structure:
/PRO_**** + /N
and
/ADJ + /N
and
/PRO_**** + /ADJ + /N
Is that possible?

Those + characters were inserted by you? And they are not actually part of your text, right?
What about variations of those, such as:
/ADJ /ADJ /N

Yes - the "+" is inserted by me.
And yes - it would be nice if it could cover all the variations, but that would probably be too difficult. To me, even placing brackets around the "/N" was a problem... Somebody did help me with that earlier:
#!/usr/bin/perl -w
while (<STDIN>) {
  s/\b([^ ]+\/N)\b/\[ $1 ]/g;
  print;
}

I can code that fairly easily - just want to get the rules straight first.
So you want brackets placed around each /N, and included within those brackets any number of preceding /ADJ and/or /PRO?
Is an awk solution OK?
Do I have to worry about exact spacing? In other words, if two words are separated by multiple spaces, do I have to keep those multiple spaces? Or would a single space be OK?
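(The spacing question matters because awk, with its default field separator, splits on runs of blanks and does not remember how many there were. A minimal illustration, using a made-up two-word line rather than your real text:)

```shell
# Default field splitting treats any run of spaces/tabs as one separator,
# so a line rebuilt from its fields comes back single-spaced.
printf 'word1/ADJ    word2/N\n' | awk '{ $1 = $1; print }'
# prints: word1/ADJ word2/N
```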

OK, try this ...
This awk program examines each word and builds a new output line in out. Any run of /ADJ and /PRO words is accumulated in adj. When a /N is encountered, a bracketed expression is appended to out, consisting of the /N preceded by anything in adj. When a word other than /ADJ, /PRO or /N is encountered, any accumulated adj and that word are appended to out without brackets.
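awk '{
  adj = "" ; out = "" ; adjsp = "" ; outsp = ""
  for (i = 1 ; i <= NF ; i++) {
    # /PRO or /ADJ word: hold it until we see what follows
    if ( match($i, "/PRO") || match($i, "/ADJ") ) {
      adj = adj adjsp $i
      adjsp = " "
      continue
    }
    # /N word: bracket it together with any held /PRO and /ADJ words
    if ( match($i, "/N") ) {
      out = out outsp "[ " adj adjsp $i " ]"
      outsp = " "
      adj = ""
      adjsp = ""
      continue
    }
    # Any other word: output the held words and this word without brackets
    out = out outsp adj adjsp $i
    outsp = " "
    adj = ""
    adjsp = ""
  }
  # Held words at the end of the line with no /N after them
  if ( adj != "" ) out = out outsp adj
  print out
}' infile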

THANKS!!! :o)
I'm not able to test this until tomorrow (no Linux here at work), and I just want to be sure that this keeps the entire original text and just adds the brackets to the "complete" nouns. Meaning that it adds brackets around N's preceded by either /PRO, /ADJ or both - and includes these within the brackets...

Yes, that should do it. I put spaces around the opening and closing brackets, but adjustments can be made as needed.
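For instance, the bracket strings can be passed in as awk variables instead of being hard-coded. What follows is only a sketch under a few assumptions of mine: the function name bracket_nouns and the variables lb and rb are made up for illustration, the noun tag is assumed to sit at the very end of the word (so it tests for /N at the end rather than anywhere), and single spacing in the output is assumed to be acceptable.

```shell
# Hypothetical wrapper (the name bracket_nouns is an assumption, not from the thread).
# Usage: bracket_nouns [left] [right] < infile
bracket_nouns() {
    awk -v lb="${1:-[ }" -v rb="${2:- ]}" '{
        adj = ""; out = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /\/(PRO|ADJ)/) {
                # Modifier: hold it until we know whether a noun follows
                adj = adj (adj == "" ? "" : " ") $i
                continue
            }
            # Held modifiers plus the current word
            chunk = adj (adj == "" ? "" : " ") $i
            # Bracket the chunk only when the current word is tagged /N
            if ($i ~ /\/N$/)
                chunk = lb chunk rb
            out = out (out == "" ? "" : " ") chunk
            adj = ""
        }
        # Modifiers at the end of a line with no noun after them
        if (adj != "") out = out (out == "" ? "" : " ") adj
        print out
    }'
}

echo 'Word1/PRO_UBST word2/ADJ word3/N word4/V word5/PREP word6/N' | bracket_nouns
# prints: [ Word1/PRO_UBST word2/ADJ word3/N ] word4/V word5/PREP [ word6/N ]
echo 'word2/ADJ word3/N' | bracket_nouns '[' ']'
# prints: [word2/ADJ word3/N]
```

Called with no arguments it keeps the spaced brackets; passing '[' and ']' gives tight ones.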
