Alphabet of the languages L described by the regular expressions: AL = {a, b, c}
Now we define the language of all regular expressions that define a language over AL. This language of regular expressions is a meta-language. Let us call it RE. Alphabet of RE: = {a, b, c, epsi, con, or, star, open, close} or {a, b, c, є, ., |, *, (, )} or {a, b, c, "є", ".", "|", "*", "(", ")"}
Recursive language equations - definition of the language RE:
RE = C | RE "|" C -- union - Note: distinguish | from "|" , the latter being a lexical unit of the RE language.
Finding a solution (note that LG = L1 | LG .L2 has solution LG = L1 . L2 * ): we get C = B ( "." B )* -- RE = C ( "|" C )* ---- therefore RE = ( ( B ( "." B )* ) ( "|" ( B ( "." B )* ) )* ) where B = Sim ("*" )*
But we also have the parenthesis : B = Sim | B "*" | "(" RE ")"
The diagram below is an finite state automaton (assuming that the "(" RE ")" branch at the bottom is not present). With the "(" RE ")" branch, it is a recursive finite state automaton. It can be used to check whether a given sequence is part of the RE language. However, no automaton (without recursion) exists that could check that language. Why not ?
level 0 |
level 1 |
level 2 |
level 3 |
level 4 |
level 5 |
|
RE = Ø |
RE = Ø |
RE = Ø |
RE = Ø |
RE = a | b | c | "є" |
RE = a | b | c | "є" | ( a | b | c | "є" ) "." (a | b | c | "є" | a* | b* | c* ) | (a | b | c | "є" ) "|" (a | b | c | "є" | ( a | b | c | "є" ) "." (a | b | c | "є" | a* | b* | c* )) |
|
C = Ø |
C = Ø |
C = Ø |
C = a | b | c | "є" |
C = a | b | c | "є" | ( a | b | c | "є" ) "." (a | b | c | "є" | a* | b* | c* ) |
C = a | b | c | "є" | a* | b* | c* | a** | b** | c** | (a | b | c | "є" | ( a | b | c | "є" ) "." (a | b | c | "є" | a* | b* | c* ) ) "." (a | b | c | "є" | a* | b* | c* | a** | b** | c**) |
|
B = Ø |
B = Ø |
B = a | b | c | "є" |
B = a | b | c | "є" | a* | b* | c* |
B = a | b | c | "є" | a* | b* | c* | a** | b** | c** |
B = a | b | c | "є" | a* | b* | c* | a** | b** | c** | a*** | b*** | c*** | "(" ( a | b | c | "є" ) ")" |
|
Sim = Ø |
Sim = a | b | c | "є" |
Sim = a | b | c | "є" |
Sim = a | b | c | "є" |
Sim = a | b | c | "є" |
Sim = a | b | c | "є" |
|
Note: Each sequence belonging to RE, C, B or Sim can also be represented by a tree where the alphabet symbols form the leaf nodes and the variables representing sets of strings (that is, RE, C, B and Sim) form the internal nodes of the tree. Some examples are given in class.
We can write the recursive equations above in the form of a context-free grammar as follows:
Context-free grammar |
The same grammar written with separate production rules, each having an identifying number |
|
|
"a" , "b" , "c" are so-called terminals of the grammar. They represent symbols of the alphabet used by the regular expressions.
The syntax tree of a sequence of terminal symbols explains why the sequence is part of the language according to the syntax rules of the context-free grammar. Consider the following example (a):