Hi all, here's hoping someone can give me a pointer or two with this problem...
I've got a huge text file that I'm cleaning up into tab-delimited data for import into a database. I've got it all cleaned up except for some of the tabbing. Here's the problem: using grep, I need to come up with an expression that finds lines that either 1. contain fewer than 5 tabs or 2. contain more than 5 tabs.
I know enough about grep to be dangerous and have been reading over the docs on it, but haven't been able to construct anything useful as of yet. I'm using BBEdit on Mac OS X to put together this text file, so all I need is the expression that actually does the search. I don't need to worry about returning or formatting the returned data; BBEdit handles all that. Thanks.

Try these options to see if BBEdit supports the same syntax.
? - matches zero or one of the preceding character
{n} - matches exactly n copies of the preceding character
{n,m} - matches at least n but not more than m copies of the preceding character
{n,} - matches at least n copies of the preceding character.
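
A quick way to see how these quantifiers behave is to try them on short strings. The sketch below uses Python's re module rather than BBEdit or grep (that choice, and the sample strings, are assumptions made purely for illustration), with \t standing for the tab character.

```python
import re

# Hypothetical sample strings; "\t" is a tab character.
samples = ["ab", "a\tb", "a\t\tb", "a\t\t\tb"]

def matching(pattern):
    """Return the samples that the pattern matches."""
    return [s for s in samples if re.search(pattern, s)]

zero_or_one = matching(r"^a\t?b$")      # ?     : zero or one tab
exactly_two = matching(r"^a\t{2}b$")    # {n}   : exactly two tabs
one_to_two = matching(r"^a\t{1,2}b$")   # {n,m} : one or two tabs
two_or_more = matching(r"^a\t{2,}b$")   # {n,}  : two or more tabs
```

Note that each quantifier applies only to the item right before it, so these patterns count consecutive tabs, not tabs scattered through a line.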

It does indeed support all of those options and all other options that grep supports. However, I'm having trouble constructing a string that alerts me, on a line-by-line basis, as to which lines have either fewer than 5 tabs or more than 5 tabs. Those are the lines I want to know about and fix. Each line will be a record in a database and there are over 8000 lines. Total pain, but this is the final cleanup. Thanks.

These expressions use the letter X, so you need to substitute a tab character representation for each X.
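The first expression uses basic REs and matches a line with exactly 5 X's; the -v option inverts the match, so grep prints every line that does not have exactly 5:
grep -v '^[^X]*X[^X]*X[^X]*X[^X]*X[^X]*X[^X]*$' myfile
And this expression uses EREs, which allow us to represent that repeating multi-character pattern with the {5} construct: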
egrep -v '^([^X]*X){5}[^X]*$' myfile
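
For anyone who wants to check the idea outside of grep, here is a rough sketch of the same approach in Python. It assumes \t in place of X and uses made-up sample lines; the name bad_lines is hypothetical and not part of any tool mentioned in this thread.

```python
import re

# Same shape as the egrep pattern above, with X replaced by a tab (\t).
EXACTLY_FIVE_TABS = re.compile(r"^([^\t]*\t){5}[^\t]*$")

def bad_lines(lines):
    """Yield (line_number, line) for lines without exactly 5 tabs.

    This mirrors grep -v: lines that match the pattern are skipped.
    """
    for number, line in enumerate(lines, start=1):
        if not EXACTLY_FIVE_TABS.match(line.rstrip("\n")):
            yield number, line

# Hypothetical records: 5 tabs, 2 tabs, and 6 tabs respectively.
sample = [
    "a\tb\tc\td\te\tf\n",
    "a\tb\tc\n",
    "a\tb\tc\td\te\tf\tg\n",
]
flagged = [number for number, _ in bad_lines(sample)]
```

Here the second and third sample lines are flagged, since one has too few tabs and the other too many.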

GREP SEARCHING FOR TABS USING THE TERMINAL
Broken down, there are 4 possibilities for how tabs might occur on a line:
1. find tabs only
^([[:cntrl:]]{n,m})$
2. find non-control chars, then n tabs
^([^[:cntrl:]])[[:cntrl:]]{n,m}$
3. find n tabs, then non-control chars
^[[:cntrl:]]{n,m}[^[:cntrl:]]$
4. find non-control chars, n tabs, non-control chars
^[^[:cntrl:]][[:cntrl:]]{n,m}[^[:cntrl:]]$
Now, pipe it all together!
Find more than 5 tabs (ugly, ain't it!):
grep -E '^([[:cntrl:]]{6,})$|^([^[:cntrl:]])[[:cntrl:]]{6,}$|^[[:cntrl:]]{6,}[^[:cntrl:]]$|^[^[:cntrl:]][[:cntrl:]]{6,}[^[:cntrl:]]$' /filename
Find fewer than 5 tabs: just substitute `6,' above with `0,4' (there are 4 occurrences).
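
One caveat on the patterns above: [[:cntrl:]] matches any control character, not only tabs, and a {n,m} quantifier counts a consecutive run, so tabs separated by field data would not be counted together. If that turns out to be a problem, counting the tabs on each line directly may be simpler. The following is only a sketch in Python, not something from the original posts; the function name classify and the sample lines are hypothetical.

```python
def classify(line, expected=5):
    """Label a line by comparing its tab count with the expected count."""
    tabs = line.count("\t")
    if tabs < expected:
        return "fewer"
    if tabs > expected:
        return "more"
    return "ok"

# Hypothetical records: 5 tabs, 2 tabs, and 6 tabs respectively.
sample = [
    "a\tb\tc\td\te\tf",
    "a\tb\tc",
    "a\tb\tc\td\te\tf\tg",
]

# Map 1-based line numbers to labels, keeping only the lines that need fixing.
report = {
    number: classify(line)
    for number, line in enumerate(sample, start=1)
    if classify(line) != "ok"
}
```

This reports the two cases separately, which might help when deciding whether a record is missing a field or has a stray tab.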
