Grep expression help

Original Message
Name: Trent
Date: May 2, 2003 at 15:26:03 Pacific
Subject: Grep expression help
OS: Mac OS X 10.2.5
CPU/Ram: 500Mhz G4/768 MB
Comment:

Hi all, here's hoping someone can give me a pointer or two with this problem...
I've got a huge text file that I'm cleaning up into tab-delimited data for import into a database. I've got it all cleaned up except for some of the tabbing. Here's the problem:

Using grep, I need to come up with an expression that finds lines that either 1. contain fewer than 5 tabs or 2. contain more than 5 tabs.

I know enough about grep to be dangerous and have been reading over the docs on it, but haven't been able to construct anything useful yet. I'm using BBEdit on Mac OS X to put together this text file, so all I need is the expression that actually does the search. I don't need to worry about returning or formatting the returned data; BBEdit handles all that. Thanks.




Response Number 1
Name: David Perry
Date: May 2, 2003 at 20:24:30 Pacific
Reply:

Try these options to see if BBEdit supports the same syntax.

? - matches zero or one of the preceding character

{n} - matches n copies of the preceding character

{n,m} - matches at least n but not more than m copies of the preceding character

{n,} - matches at least n copies of the preceding character.
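
A quick way to try these quantifiers from the Terminal is sketched below. The sample strings are made up for illustration, and grep -E is used because plain grep expects the braces to be written as \{n,m\}.

```shell
# Made-up sample strings: one per line, with 1 to 4 copies of the letter "a".
sample=$(printf '%s\n' a aa aaa aaaa)

# ? : zero or one of the preceding character (the "u" is optional here).
optional_u=$(printf '%s\n' color colour colouur | grep -E '^colou?r$')

# {2} : exactly two copies (anchored so the whole line must match).
exactly_two=$(printf '%s\n' "$sample" | grep -E '^a{2}$')

# {2,3} : at least two but not more than three copies.
two_to_three=$(printf '%s\n' "$sample" | grep -E '^a{2,3}$')

# {3,} : at least three copies.
three_or_more=$(printf '%s\n' "$sample" | grep -E '^a{3,}$')

printf '%s\n' "--- ?" "$optional_u" "--- {2}" "$exactly_two" \
              "--- {2,3}" "$two_to_three" "--- {3,}" "$three_or_more"
```

Without the ^ and $ anchors, a{2} would also match inside "aaa" and "aaaa", which is why the anchors matter when counting.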



Response Number 2
Name: Trent
Date: May 5, 2003 at 09:19:36 Pacific
Reply:

It does indeed support all of those options, and everything else that grep supports. However, I'm having trouble constructing an expression that flags, line by line, which lines have either fewer than 5 tabs or more than 5 tabs. Those are the lines I want to know about and fix. Each line will be a record in a database and there are over 8000 lines. Total pain, but this is the final cleanup. Thanks.



Response Number 3
Name: James Boothe
Date: May 5, 2003 at 13:55:13 Pacific
Reply:

These expressions use the letter X, so you need to substitute a tab character representation for each X.

First expression uses REs:

grep -v '^[^X]*X[^X]*X[^X]*X[^X]*X[^X]*X[^X]*$' myfile

And this expression uses EREs, which allow us to represent that repeating multi-character pattern with the {5} construct:

egrep -v '^([^X]*X){5}[^X]*$' myfile
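
One way to do the X substitution in a shell is to hold a real tab in a variable, as in this rough sketch. The TAB variable, the temporary sample file, and the -n flag (print line numbers) are additions for illustration and are not part of the expressions above. In an editor whose grep dialect understands \t, writing \t in place of each X should be the equivalent.

```shell
# Hold one literal tab character in a variable (illustrative name).
TAB=$(printf '\t')

# Build a small made-up sample file: lines with 4, 5 and 6 tabs.
sample=$(mktemp "${TMPDIR:-/tmp}/tabs.XXXXXX")
printf 'a\tb\tc\td\te\n'       >  "$sample"   # 4 tabs
printf 'a\tb\tc\td\te\tf\n'    >> "$sample"   # 5 tabs
printf 'a\tb\tc\td\te\tf\tg\n' >> "$sample"   # 6 tabs

# Same ERE as above with each X replaced by a real tab.
# -v inverts the match: report every line that does NOT have exactly 5 tabs.
# -n prefixes each reported line with its line number; cut keeps only that number.
bad_lines=$(grep -n -v -E "^([^${TAB}]*${TAB}){5}[^${TAB}]*\$" "$sample" | cut -d: -f1)

printf 'lines without exactly 5 tabs: %s\n' "$(printf '%s' "$bad_lines" | tr '\n' ' ')"

rm -f "$sample"
```

The pattern is in double quotes so the shell expands ${TAB}, which is why the closing $ anchor is escaped as \$.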



Response Number 4
Name: gcl
Date: May 15, 2003 at 09:37:21 Pacific
Reply:

GREP SEARCHING FOR TABS USING THE TERMINAL

Broken down, there are 4 possibilities for how a run of tabs might
occur on a line ([[:cntrl:]] matches any control character, tab included):
1. find tabs only
^([[:cntrl:]]{n,m})$

2. find non-control chars, then n tabs
^([^[:cntrl:]]*)[[:cntrl:]]{n,m}$

3. find n tabs, then non-control chars
^[[:cntrl:]]{n,m}[^[:cntrl:]]*$

4. find non-control chars, n tabs, non-control chars
^[^[:cntrl:]]*[[:cntrl:]]{n,m}[^[:cntrl:]]*$


Now, pipe it all together!

find more than 5 tabs (ugly, ain't it!):
grep -E \
'^([[:cntrl:]]{6,})$|^([^[:cntrl:]]*)[[:cntrl:]]{6,}$|^[[:cntrl:]]{6,}[^[:cntrl:]]*$|^[^[:cntrl:]]*[[:cntrl:]]{6,}[^[:cntrl:]]*$' \
filename

find fewer than 5 tabs: just substitute `0,4' for `6,' above
(there are 4 occurrences)
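
As far as I can tell, the [[:cntrl:]] patterns above only match tabs that sit next to each other in a single run. For tabs spread across the fields of a record, the two cases can be written as separate positive matches, which may suit an editor search that has no -v option. This is a sketch only: the TAB variable, the sample file, and the -n flag are illustrative assumptions.

```shell
# Hold one literal tab character in a variable (illustrative name).
TAB=$(printf '\t')

# Made-up sample file: lines with 0, 4, 5 and 6 tabs.
sample=$(mktemp "${TMPDIR:-/tmp}/tabs.XXXXXX")
{
  printf 'no tabs here\n'
  printf 'a\tb\tc\td\te\n'
  printf 'a\tb\tc\td\te\tf\n'
  printf 'a\tb\tc\td\te\tf\tg\n'
} > "$sample"

# Fewer than 5 tabs: the whole line is non-tab text containing at most 4 tabs.
too_few=$(grep -n -E "^[^${TAB}]*(${TAB}[^${TAB}]*){0,4}\$" "$sample" | cut -d: -f1)

# More than 5 tabs: the line starts with at least 6 "text then tab" groups.
# No end anchor is needed, since anything may follow the sixth tab.
too_many=$(grep -n -E "^([^${TAB}]*${TAB}){6}" "$sample" | cut -d: -f1)

printf 'fewer than 5 tabs, line(s): %s\n' "$(printf '%s' "$too_few" | tr '\n' ' ')"
printf 'more than 5 tabs, line(s): %s\n' "$(printf '%s' "$too_many" | tr '\n' ' ')"

rm -f "$sample"
```

Keeping the two searches separate also tells you which kind of repair each flagged line needs: a missing field or an extra one.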

