LinuxGuruz

  Regular expressions

   Wildcards belong to the shell, which uses them for matching
   filenames. UNIX has a more general and widely used mechanism for
   matching strings: regular expressions.

   Regular expressions are used by the egrep utility, by text editors
   such as ed, vi and emacs, and by sed and awk. They are also used in C
   programs for matching input, in the Perl programming language and in
   the lex tokenizer. Here are some examples using the egrep command,
   each of which prints the lines of the file /etc/rc that match a
   certain condition. In each example the regular expression is written
   as '(...)': the single quotes keep the shell from interpreting it,
   and the parentheses are egrep grouping, which is optional around a
   whole expression. Inside single quotes the shell leaves characters
   such as * and & alone. The exception is csh and tcsh, where ! still
   triggers history substitution and must be preceded by a backslash \.

# Print all lines beginning with the comment character #

egrep '(^#)'           /etc/rc

# Print all lines which DON'T begin with # (empty lines are skipped too)

egrep '(^[^#])'        /etc/rc

# Print all lines beginning with e, f or g.

egrep '(^[efg])'       /etc/rc

# Print all lines beginning with an uppercase letter

egrep '(^[A-Z])'       /etc/rc

# Print all lines NOT beginning with an uppercase letter

egrep '(^[^A-Z])'      /etc/rc

# Print all lines containing ! * or &
# (inside [...] no backslashes are needed; csh/tcsh users write \! for !)

egrep '([!*&])'        /etc/rc

# All lines containing ! * or & but not starting with #

egrep -v '(^#)' /etc/rc | egrep '([!*&])'
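   As a rough check of how quoting and bracket lists interact, the
   sketch below compares a list written with and without backslashes.
   It assumes a POSIX shell such as sh or bash (csh and tcsh treat !
   differently), uses grep -E as the standard equivalent of egrep, and
   works on three made-up sample lines rather than on /etc/rc.

```shell
#!/bin/sh
# Hypothetical sample lines; the second contains a backslash but no ! * or &.
lines='wait!
back\slash
plain'

# Single quotes already stop the shell from touching ! * and &.
plain=$(printf '%s\n' "$lines" | grep -E '[!*&]')

# Inside [...] a backslash is an ordinary character, so this list
# also matches the line that contains a backslash.
escaped=$(printf '%s\n' "$lines" | grep -E '[\!\*\&]')

printf 'without backslashes:\n%s\n' "$plain"
printf 'with backslashes:\n%s\n' "$escaped"
```

   The first command should print only the line containing !, while the
   second also prints the line containing a backslash.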

   These examples assume that the file `/etc/rc' exists. If it doesn't
   exist on the machine you are using, try an equivalent file, for
   instance by replacing /etc/rc with /etc/rc*, which the shell expands
   to every name in /etc that begins with rc.

   Regular expressions are made up of the following `atoms'.

   `.'
          Match any single character except the end of line.
   `^'
          Match the beginning of a line when used as the first
          character.
   `$'
          Match the end of a line when used as the last character.
   `[..]'
          Match any single character in the list between the square
          brackets (see below).
   `*'
          Match zero or more occurrences of the preceding expression.
   `+'
          Match one or more occurrences of the preceding expression.
   `?'
          Match zero or one occurrence of the preceding expression.
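
   A minimal sketch of these atoms in action, run against a small
   made-up word list instead of a real file. The helper name `show' and
   the sample words are assumptions for illustration only; grep -E is
   used as the standard equivalent of egrep.

```shell
#!/bin/sh
# Hypothetical sample data: one word per line.
words='ct
cat
caat
cart
cot'

# show PATTERN: print the sample words that match PATTERN on one line.
show() {
    printf '%s\n' "$words" | grep -E "$1" | paste -s -d ' ' -
}

show '^c.t$'    # . is any one character:     cat cot
show '^ca*t$'   # * is zero or more a:        ct cat caat
show '^ca+t$'   # + is one or more a:         cat caat
show '^ca?t$'   # ? is zero or one a:         ct cat
show 't$'       # $ anchors the end of line:  ct cat caat cart cot
```

   The ^ and $ anchors are what make each pattern describe the whole
   word; without them the pattern may match anywhere within a line.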

   You can find a complete list in the UNIX manual pages. The square
   brackets above are used to define a class of characters to be matched.
   Here are some examples:
     * If the square brackets contain a list of characters, such as
       `[a-z156]', then a single occurrence of any character in the list
       will match the regular expression: in this case any lowercase
       letter or one of the digits 1, 5 and 6.
     * If the first character in the brackets is the caret symbol `^'
       then any character except those in the list will be matched.
     * Normally a dash or minus sign `-' means a range of characters. If
       it is the first character after the `[' or after `[^' then it is
       treated literally.
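
   The three rules above can be tried out with a sketch like the
   following. The sample characters and the helper name `match' are
   made up for illustration, grep -E stands in for egrep, and the locale
   is set to C on the assumption that ranges such as a-z should follow
   plain ASCII order.

```shell
#!/bin/sh
# Ranges such as a-z depend on the locale's ordering; C gives plain ASCII.
LC_ALL=C
export LC_ALL

# Hypothetical sample data: one character per line.
chars='a
z
1
7
-
Q'

# match PATTERN: print the sample characters matched by PATTERN on one line.
match() {
    printf '%s\n' "$chars" | grep -E "$1" | paste -s -d ' ' -
}

match '^[a-z156]$'    # lowercase letter or 1, 5, 6:   a z 1
match '^[^a-z156]$'   # anything not in that list:     7 - Q
match '^[-a]$'        # - right after [ is literal:    a -
match '^[^-a]$'       # - right after [^ is literal:   z 1 7 Q
```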