ambassadors of technology
Bob Fabry, N6EK, brought UNIX to the University of California at Berkeley in
1973, and it was from this system that Bill Joy (and others) were able to hack
it into the shape that became BSD. Bob also arranged the contract with DARPA
to add TCP/IP to their BSD operating system, making it the first
Internet-capable O/S in 1980. In
1988, after Bob had retired from Berkeley, he helped design and build radio
beacons for the International Amateur Radio Union. These beacons now help
amateur and professional HF radio users determine which HF radio bands will
best provide a path to which location around the world.
vi, ed, more, lynx, sed, csh, grep: Regex-aware applications
The regular expression, or RegEx, is a language for specifying
text to search for. Regular expressions are used by vi, ed, more,
sed, csh, grep, and many other Unix applications (and sometimes
documentation!) for text searching and search-and-replace operations.
Most programs that support regular expressions allow you to use them to
search for something using the [/] key. This works in vi, ed, more, and
lynx, to my knowledge.
Grep allows you to search in a file or multiple files for a chunk of
text or a regular expression:
grep joe /etc/passwd
This will print every line of the file `/etc/passwd' that contains `joe'.
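Here is a minimal sketch of the same search run against throwaway files instead of the real /etc/passwd. The directory, the file names and the account lines are all made up for illustration. It also shows what happens with more than one file: grep prefixes each matching line with the name of the file it came from.

```shell
# Hypothetical sample data; these account lines are invented for the sketch.
dir=$(mktemp -d)
printf '%s\n' 'root:x:0:0:root:/root:/bin/sh' \
              'joe:x:1000:1000:Joe:/home/joe:/bin/csh' > "$dir/passwd"
printf '%s\n' 'wheel:x:10:root,joe' 'staff:x:20:ann' > "$dir/group"

# One file: only the matching lines are printed.
one=$(grep joe "$dir/passwd")
echo "$one"

# Several files: each match is prefixed with "filename:".
many=$(cd "$dir" && grep joe passwd group)
echo "$many"

rm -r "$dir"
```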
Most programs that support search-and-replace operations usually expect you
to type something along the lines of
s/thing to change/thing to change it to/
This syntax works in ed and ex. In vi, you need to start with a colon to
put vi into ex mode.
:s/thing to change/thing to change it to/
So, in vi, you can change water into wine by typing this in command
mode (make sure you're in command mode by hitting the [ESC] key):
:s/water/wine/
You can put the text that you'd matched in your search into the replacement
with an ampersand (&).
:s/water/& buffalo/
This replaces the first occurrence of `water' on the current line with
`water buffalo'.
This comes in more useful when you use actual Regular Expressions instead
of just verbatim text searches.
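vi is interactive, so as a rough sketch you can try the same substitutions from the command line with sed, which accepts the same s/before/after/ syntax. The sample sentences are made up. Like :s in vi, sed changes only the first match on each line unless you add the g flag.

```shell
line='water, water everywhere'

# & in the replacement stands for whatever the search matched.
first=$(printf '%s\n' "$line" | sed 's/water/& buffalo/')
echo "$first"      # water buffalo, water everywhere

# With the g flag, every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/water/wine/g')
echo "$all"        # wine, wine everywhere

# & earns its keep once the search is a real regular expression,
# because the matched text is no longer known in advance.
wrapped=$(printf '%s\n' 'call 555 now' | sed 's/[0-9][0-9]*/<&>/')
echo "$wrapped"    # call <555> now
```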
Learn more about the variations of the :s/before/after/ command in vi on
the vi reference page.
The C Shell's command history can use regular expressions to rerun previous
commands with modifications:
!!:s/file1/file2/
This will run the last command you typed, except `file1' will be replaced
with `file2'. Learn more about the variations of C Shell history modification
on our C Shell Reference.
Instead of typing just the text you want to search for in these programs,
you can use a regular expression, also known as a regex or RE.
Regular Expressions allow you to search for text you don't know exactly.
Regular Expressions work somewhat like mathematical expressions, and multiple
Regular Expressions can be put together to make a bigger Regular Expression.
The simplest Regular Expressions are the ones that match one character.
Most characters, including all letters and numbers, are regular expressions
which match themselves. When we typed:
grep joe /etc/passwd
we were searching for the regex `joe' in the file /etc/passwd. The Regular
Expression `joe' is a complex Regular Expression made up of three
single-character Regular Expressions which match themselves, `j', `o', and
`e'.
Any character which has special meaning in a Regular Expression can be
preceded by a backslash (\) to use it literally. Normally when typing
complex Regular Expressions on the command line for grep, I enclose the
whole thing in single quotes (') to avoid having to deal with any character
of my regular expression having special meaning to the shell.
grep '\*\*\* \$100 \*\*\*' myfile
This will search for `*** $100 ***' in the file `myfile'. The single quotes
stop the shell from globbing the asterisks (*), and the backslashes (\) stop
grep from trying to interpret the *'s and $ as part of the Regular Expression
(as you'll see shortly, * means zero or more of the previous character in
Regular Expressions, whereas $ means the end of a line). Take a deep breath
and read on ...
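A small sketch of the escaping described above, using made-up sample lines instead of a real `myfile'. As an alternative that this page does not otherwise cover, grep's -F option treats the pattern as a fixed string, so no backslashes are needed at all.

```shell
# Made-up sample lines for illustration.
data=$(printf '%s\n' '*** $100 ***' 'costs 100 dollars')

# Backslashes make * and $ literal; single quotes keep the shell out of it.
# -c prints the number of matching lines instead of the lines themselves.
escaped=$(printf '%s\n' "$data" | grep -c '\*\*\* \$100 \*\*\*')
echo "$escaped"

# -F (fixed strings): the pattern is not a regex, so nothing needs escaping.
fixed=$(printf '%s\n' "$data" | grep -c -F '*** $100 ***')
echo "$fixed"
```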
A list of characters can be put into a Regular Expression to be matched in
one position by enclosing the list in [ and ].
grep '[Jj]oe' /etc/passwd
This will find any line in /etc/passwd which contains `joe' or `Joe'.
Inside lists, a range of characters can be specified with a hyphen (-),
and if the first character of the list is a ^, then the list will match any
character not included in the list.
grep 'Joe[0-9A-Fa-f]' /etc/passwd
This will print any line in /etc/passwd containing Joe0, Joe1, JoeA, JoeB,
Joea, Joeb, etc. (but not joe0, joeA, joea, etc.).
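The negated list mentioned above has no example of its own, so here is a minimal sketch with made-up names. Setting LC_ALL=C is my own precaution, not something the original commands do: it makes ranges like A-F follow plain ASCII order regardless of your locale.

```shell
# Made-up sample names for illustration.
data=$(printf '%s\n' 'Joe7' 'JoeB' 'Joez' 'joe7')

# Joe followed by a hex digit: Joe7 and JoeB match.
# Joez fails (z is outside every range) and joe7 fails (lowercase j).
hex=$(printf '%s\n' "$data" | LC_ALL=C grep 'Joe[0-9A-Fa-f]')
echo "$hex"

# A leading ^ inside the brackets negates the list:
# Joe followed by any character that is NOT a digit.
nondigit=$(printf '%s\n' "$data" | LC_ALL=C grep 'Joe[^0-9]')
echo "$nondigit"
```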
A period matches any single character.
grep 'j.e' /etc/passwd
This will print lines in /etc/passwd containing `jae', `jxe', `j@e', `j3e',
etc.
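A quick sketch of what the period does and does not match, again with made-up input. The period stands for exactly one character, so `je' and `jooe' are left out. To match a literal period, escape it with a backslash as described earlier.

```shell
# Made-up sample lines for illustration.
data=$(printf '%s\n' 'jae' 'j@e' 'j.e' 'je' 'jooe')

# . matches exactly one character, whatever it is.
any=$(printf '%s\n' "$data" | grep 'j.e')
echo "$any"

# \. matches only a real period.
literal=$(printf '%s\n' "$data" | grep 'j\.e')
echo "$literal"
```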
A Regular Expression can begin with a caret (^) to specify
that the line should start with the match, and can end with a dollar
symbol ($) to specify that the line should end with the match.
grep 'joe$' /etc/passwd
This will print any lines in /etc/passwd which end in joe.
grep '^joe' /etc/passwd
This will print any lines in /etc/passwd which begin with joe.
grep '^joe$' /etc/passwd
This will print any lines in /etc/passwd which contain nothing but joe.
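To see the three anchored forms side by side, here is a small sketch on made-up lines. It uses grep -c, which prints the number of matching lines rather than the lines themselves.

```shell
# Made-up sample lines for illustration.
data=$(printf '%s\n' 'joe' 'joe smith' 'hello joe' 'banjoes')

starts=$(printf '%s\n' "$data" | grep -c '^joe')    # joe, joe smith
ends=$(printf '%s\n' "$data" | grep -c 'joe$')      # joe, hello joe
whole=$(printf '%s\n' "$data" | grep -c '^joe$')    # joe
anywhere=$(printf '%s\n' "$data" | grep -c 'joe')   # all four lines

echo "$starts $ends $whole $anywhere"   # 2 2 1 4
```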
A regular expression may be followed by one of several
repetition operators:
?       The preceding item is optional and matched at most once.
*       The preceding item will be matched zero or more times.
+       The preceding item will be matched one or more times.
{n}     The preceding item is matched exactly n times.
{n,}    The preceding item is matched n or more times.
{n,m}   The preceding item is matched at least n times, but not more
        than m times.
Some of this functionality exists in only the `extended' variety of the
Regular Expression, thus I'll use the `egrep' (or Extended GREP) program.
egrep '[Hh#][@Aa][Cc]?[KkXx]+[0Oo3Ee][Rr]' /etc/passwd
Whew! This will find lines containing any of the following in
the file /etc/passwd:
- Hacker
- Haker
- Haxer
- haxxer
- hAXX3R
- Hax0r
- #@XX0R
- H@KXxXk3r
The regular expression says, "H or h or # followed by @ or A or a followed
by possibly (at most one) C or c followed by one or more K or k or X or x
followed by an 0 or O or o or 3 or E or e followed by an R or r." Isn't this
fun? If you thought script kiddies wrote things in indecipherable gibberish...
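The example above exercises ? and +, but not the brace forms, so here is a minimal sketch of {n}, {n,} and {n,m} on made-up lines. I use grep -E, which is the POSIX spelling of egrep. The ^ and $ anchors are there so that a longer run of a's cannot satisfy a shorter count by matching only part of the line.

```shell
# Made-up sample lines: one to four a's followed by b.
data=$(printf '%s\n' 'ab' 'aab' 'aaab' 'aaaab')

exactly2=$(printf '%s\n' "$data" | grep -E '^a{2}b$')    # aab
atleast3=$(printf '%s\n' "$data" | grep -E '^a{3,}b$')   # aaab, aaaab
range=$(printf '%s\n' "$data" | grep -E '^a{2,3}b$')     # aab, aaab

echo "$exactly2"
echo "$atleast3"
echo "$range"
```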
Finally, you can match one of two multiple-character regular expressions
using the pipe (|) character and possibly some parentheses:
egrep '[Hh](owdy|i|ello)' mytext
This prints out any lines in the file `mytext' which contain one of the
following:
- Howdy
- howdy
- Hi
- hi
- Hello
- hello
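A short sketch of why the parentheses matter, using made-up lines in place of `mytext'. Without them, the pipe splits the whole expression into three separate alternatives, so a bare `i' anywhere on the line is enough to match.

```shell
# Made-up sample lines for illustration.
data=$(printf '%s\n' 'Howdy, partner' 'oh hello there' 'Hey you' 'a bit late')

# With parentheses: H or h, followed by owdy, i, or ello.
grouped=$(printf '%s\n' "$data" | grep -E '[Hh](owdy|i|ello)')
echo "$grouped"

# Without parentheses: "[Hh]owdy" OR "i" OR "ello".
# 'a bit late' now matches too, because it contains an i.
ungrouped=$(printf '%s\n' "$data" | grep -E -c '[Hh]owdy|i|ello')
echo "$ungrouped"
```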
Have fun, and remember ...
[Gg6][Rr][3Ee][Pp] ?[Tt+][Hh#]([3Ee@Aa]|/\\) ?[Pp][Ll1]([Aa@]|/\\)(\|\\\||[Nn])[3Ee][Tt+]