Solved Ignore diacritics and accents in grep

August 15, 2012 at 11:12:26
Specs: Windows XP
Hello, I am using gnuwin32 tools in Windows.

I need a way to do a batch search that ignores diacritics and accents.

The input text file looks like this:

english
énglish
énglish
ènglish
ĕnglish
ênglish

When I run "grep english file.txt" it only shows me the first line, but I need the script to return all of those lines.

If grep can't do it, is there a better command-line search tool?


#1
August 15, 2012 at 11:40:10
I think the issue is that all six of those characters are different as far as batch is concerned. They are not e's with a modification; they are separate extended (non-ASCII) character codes such as 136, 137, 138, 130 and so on. So every tool that I have seen will not recognize e as the same as è, for example. Theoretically you might be able to search for .nglish, since in grep's regular expressions . is the single-character wildcard (* on its own is not).

:: mike


#2
August 15, 2012 at 12:46:06
✔ Best Answer
grep '[eééèĕê]nglish' file.txt

should do the job.

As mikelinus says, they are different characters so there's no easy way to just drop the accents (other than specifying each character).
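
For the "better tool" part of the question, one option that nobody in the thread suggested is to strip the accents from each line before matching, instead of listing every accented variant. Below is a minimal Python sketch of that idea. It assumes Python 3 is available and that file.txt is saved as UTF-8; the function names strip_accents and grep_ignoring_accents are made up for illustration.

```python
import re
import sys
import unicodedata


def strip_accents(text):
    """Return text with combining marks removed, e.g. 'é' becomes 'e'."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def grep_ignoring_accents(pattern, lines):
    """Yield the original lines whose accent-stripped form matches pattern."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(strip_accents(line)):
            yield line


# Hypothetical command-line use: python agrep.py PATTERN FILE
if __name__ == "__main__" and len(sys.argv) == 3:
    pattern, path = sys.argv[1], sys.argv[2]
    # Assumption: the input file is UTF-8 encoded.
    with open(path, encoding="utf-8") as handle:
        for line in grep_ignoring_accents(pattern, handle):
            sys.stdout.write(line)
```

Saved as, say, agrep.py, it would be run as "python agrep.py english file.txt". The matching lines are printed unchanged, accents included; only the comparison is done on the stripped text. Letters that have no base-plus-accent decomposition (ø or ß, for example) are not folded by this approach.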


#3
August 15, 2012 at 19:55:21
Thanks ijack. To be honest, after I posted it I wasn't sure that I was on firm ground with the toolkit, as I have utterly no experience with that kit.

:: mike


#4
August 15, 2012 at 23:11:36
Grep is an extremely powerful search tool but to make the most of it you have to understand its regular expression syntax. Have a look at http://www.zytrax.com/tech/web/rege... for example.
