Removing non-ASCII characters
I was parsing XML files, but sometimes the parser would error out because there were some non-ASCII characters in the input. To deal with this, I could have written a simple C program to strip those characters. But there had to be an easier way. After some effort (more effort than it would have taken me to write the equivalent C program), I came up with a solution.
$ ex -c '%s/[^[:alnum:][:punct:][:space:]]/ /g|wq' $HOME/data.xml
This command processes the file data.xml. The % address applies the substitution to every line; without it, ex changes only the current line. Each character that is not alphanumeric, punctuation, or whitespace is replaced with a space. Then wq writes the changes back to the file and quits. All of this happens in one step, with no need to resort to programming. Caution: ex reads the whole file into its editing buffer, so the command may not run successfully on a very large file.
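For a file that is too large for ex, a streaming filter is one way around the limit. The sketch below swaps in tr, which is not the ex method described here but uses the same character classes. The file names sample.xml and sample.clean.xml are made up for illustration, and LC_ALL=C is assumed so that the classes cover ASCII only.

```shell
#!/bin/sh
# Hypothetical sample input: "cafe" with an accented e, which is the
# two bytes 0xC3 0xA9 in UTF-8 (written here as octal escapes).
printf '<name>caf\303\251</name>\n' > sample.xml

# LC_ALL=C makes tr work byte by byte, so the classes match ASCII only.
# -c complements the set: every byte that is NOT alphanumeric,
# punctuation, or whitespace is replaced with a space.
LC_ALL=C tr -c '[:alnum:][:punct:][:space:]' ' ' < sample.xml > sample.clean.xml

cat sample.clean.xml
```

Unlike ex, tr does not edit in place, so the cleaned text goes to a second file. Because tr sees bytes rather than characters in this locale, a multi-byte character turns into one space per byte.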
This was a good exercise because I've used this idea on a number of other occasions:
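$ ex -c '%s/[[:space:]]*$//|wq' file              # remove trailing whitespace on each line
$ ex -c 'g/^[[:space:]]*$/d' -c 'wq' file         # delete blank lines from file
The second command passes wq as a separate -c option because g treats everything after the pattern, including |wq, as the command to run on each matching line. Vim's ex accepts multiple -c options; other ex implementations may not.
rishi.goel@alumni.usc.edu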