Grep expression help
Original Message
Name: Trent
Date: May 2, 2003 at 15:26:03 Pacific
Subject: Grep expression help
OS: Mac OS X 10.2.5
CPU/Ram: 500 MHz G4/768 MB
Comment: Hi all, here's hoping someone can give me a pointer or two with this problem. I've got a huge text file that I'm cleaning up into tab-delimited data for import into a database. I've got it all cleaned up except for some of the tabbing. Here's the problem: using grep, I need to come up with an expression that finds lines that 1. contain fewer than 5 tabs, or 2. contain more than 5 tabs. I know enough about grep to be dangerous and have been reading over the docs on it, but haven't been able to construct anything useful yet. I'm using BBEdit on Mac OS X to put together this text file, so all I need is the expression that actually does the search. I don't need to worry about returning or formatting the returned data; BBEdit handles all that. Thanks.
Response Number 1
Name: David Perry
Date: May 2, 2003 at 20:24:30 Pacific
Reply: Try these options to see if BBEdit supports the same syntax:
  ?     - matches zero or one of the preceding character
  {n}   - matches exactly n copies of the preceding character
  {n,m} - matches at least n but not more than m copies of the preceding character
  {n,}  - matches at least n copies of the preceding character
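For anyone who wants to see these quantifiers in action, here is a small sketch run through grep -E in the Terminal. The sample lines are made up for illustration, and BBEdit's grep dialect may differ in small ways, so treat this as a check of the syntax rather than of BBEdit itself.

```shell
# Sample lines made up for illustration: runs of the letter "a", length 1 to 4.
sample=$(printf 'a\naa\naaa\naaaa\n')

# {2}: exactly two copies (the ^ and $ anchors make the whole line count).
exactly_two=$(printf '%s\n' "$sample" | grep -E '^a{2}$' | paste -s -d ' ' -)

# {2,3}: at least two but not more than three copies.
two_to_three=$(printf '%s\n' "$sample" | grep -E '^a{2,3}$' | paste -s -d ' ' -)

# {3,}: at least three copies.
three_or_more=$(printf '%s\n' "$sample" | grep -E '^a{3,}$' | paste -s -d ' ' -)

# ?: zero or one of the preceding character, so "aa?" matches "a" or "aa".
one_or_two=$(printf '%s\n' "$sample" | grep -E '^aa?$' | paste -s -d ' ' -)

printf '{2}   matched: %s\n' "$exactly_two"
printf '{2,3} matched: %s\n' "$two_to_three"
printf '{3,}  matched: %s\n' "$three_or_more"
printf '?     matched: %s\n' "$one_or_two"
```

Each variable ends up holding the lines that matched, joined by spaces.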
Response Number 2
Name: Trent
Date: May 5, 2003 at 09:19:36 Pacific
Reply: It does indeed support all of those options, and all the other options that grep supports. However, I'm having trouble constructing a string that alerts me, on a line-by-line basis, to which lines have either fewer than 5 tabs or more than 5 tabs. Those are the lines I want to know about and fix. Each line will be a record in a database, and there are over 8000 lines. Total pain, but this is the final cleanup. Thanks.
Response Number 3
Reply: These expressions use the letter X, so you need to substitute a tab character representation for each X. Both use -v, which prints the lines that do not match, that is, the lines that do not contain exactly five X's.

The first expression uses basic regular expressions (REs):

    grep -v '^[^X]*X[^X]*X[^X]*X[^X]*X[^X]*X[^X]*$' myfile

This expression uses extended regular expressions (EREs), which allow us to represent that repeating multi-character pattern with the {5} construct:

    egrep -v '^([^X]*X){5}[^X]*$' myfile
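Here is one way to try the approach above with real tab characters, sketched for a POSIX shell. The `tab` variable and the sample records are assumptions made for the demonstration. The last two patterns are a variation on the {5} expression that separates the "fewer than 5" and "more than 5" cases; they are not taken from the reply itself.

```shell
# A literal tab character; printf avoids relying on shell-specific $'\t'.
tab=$(printf '\t')

# Made-up records: line 1 has 4 tabs, line 2 has 5, line 3 has 6.
records=$(printf 'a\tb\tc\td\te\na\tb\tc\td\te\tf\na\tb\tc\td\te\tf\tg\n')

# Lines that do NOT have exactly five tabs (-v inverts, -n adds line numbers).
not_five=$(printf '%s\n' "$records" |
    grep -E -n -v "^([^$tab]*$tab){5}[^$tab]*\$")

# Fewer than five tabs: zero to four "field + tab" groups, then a tab-free tail.
fewer=$(printf '%s\n' "$records" |
    grep -E -n "^([^$tab]*$tab){0,4}[^$tab]*\$")

# More than five tabs: at least six "field + tab" groups from the line start.
more=$(printf '%s\n' "$records" |
    grep -E -n "^([^$tab]*$tab){6,}")

printf 'not exactly 5 tabs:\n%s\n' "$not_five"
printf 'fewer than 5 tabs:\n%s\n' "$fewer"
printf 'more than 5 tabs:\n%s\n' "$more"
```

In an editor whose grep dialect accepts \t for a tab (an assumption worth checking against your BBEdit version), the same patterns can be entered without the surrounding grep command, with \t in place of the `tab` variable.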
Response Number 4
Name: gcl
Date: May 15, 2003 at 09:37:21 Pacific
Reply: GREP SEARCHING FOR TABS USING THE TERMINAL

Broken down, there are 4 possibilities for how tabs might occur on a line:

1. find tabs only
   ^[[:cntrl:]]{n,m}$
2. find non-control chars, then n tabs
   ^[^[:cntrl:]]+[[:cntrl:]]{n,m}$
3. find n tabs, then non-control chars
   ^[[:cntrl:]]{n,m}[^[:cntrl:]]+$
4. find non-control chars, n tabs, non-control chars
   ^[^[:cntrl:]]+[[:cntrl:]]{n,m}[^[:cntrl:]]+$

Now, pipe it all together!

Find more than 5 tabs (ugly, ain't it!):

   grep -E '^[[:cntrl:]]{6,}$|^[^[:cntrl:]]+[[:cntrl:]]{6,}$|^[[:cntrl:]]{6,}[^[:cntrl:]]+$|^[^[:cntrl:]]+[[:cntrl:]]{6,}[^[:cntrl:]]+$' filename

Find fewer than 5 tabs: just substitute `0,4' for `6,' above (there are 4 occurrences).
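Two things to keep in mind with the patterns above: [[:cntrl:]] matches any control character, not only a tab, and each pattern describes a single run of consecutive control characters. In a tab-delimited file the tabs are usually separated by field text, so a per-line count may be easier to work with. The following is a minimal sketch of that idea using awk, which is a different technique from the grep approach in this reply; the sample records are made up for illustration.

```shell
# Made-up records with 4, 5, and 6 tabs, spread between fields.
records=$(printf 'a\tb\tc\td\te\na\tb\tc\td\te\tf\na\tb\tc\td\te\tf\tg\n')

# With tab as the field separator, a line with N tabs has N+1 fields (NF).
# An empty line has NF == 0, so it is treated as having 0 tabs.
report=$(printf '%s\n' "$records" | awk -F '\t' '
    { tabs = (NF > 0) ? NF - 1 : 0 }
    tabs < 5 { print NR ": fewer than 5 tabs (" tabs ")" }
    tabs > 5 { print NR ": more than 5 tabs (" tabs ")" }
')

printf '%s\n' "$report"
```

To run it on a real file, replace the printf pipeline with the file name as the last argument to awk.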