CS310 : Automata Theory 2020
Lecture 8: Regular expressions and NFA
Instructor: S. Akshay
IIT Bombay, India
28-01-2020
Recap
I Deterministic finite state automata (DFA): run, accepting run, (extended) transition relation
I Regular languages: closure under Union, intersection and complementation
I Non-deterministic finite state automata (NFA): the Subset construction, worst-case exponential blowup
I Applications 1: text search using automata.
I -NFA: conversion to NFA.
I Yesterday and today: Regular expressions
Recap
I Deterministic finite state automata (DFA): run, accepting run, (extended) transition relation
I Regular languages: closure under Union, intersection and complementation
I Non-deterministic finite state automata (NFA): the Subset construction, worst-case exponential blowup
I Applications 1: text search using automata.
I -NFA: conversion to NFA.
I Yesterday and today: Regular expressions
Another view of regular languages
Consider any regular language
Another view of regular languages
Consider any regular language I it can be letters 1 ora or etc.
I it can be emptyset.
I it can be a word, e.g., 110: obtained as a (finite) concatenation of letters.
I it can be several words: obtained as a (finite) union of sets of words
Another view of regular languages
Consider any regular language I it can be letters 1 ora or etc.
I it can be emptyset.
I it can be a word, e.g., 110: obtained as a (finite) concatenation of letters.
I it can be several words: obtained as a (finite) union of sets of words Can these be used to generate all regular languages?
Another view of regular languages
Consider any regular language I it can be letters 1 ora or etc.
I it can be emptyset.
I it can be a word, e.g., 110: obtained as a (finite) concatenation of letters.
I it can be several words: obtained as a (finite) union of sets of words I it can be something that “repeats”!
Another view of regular languages
Consider any regular language I it can be letters 1 ora or etc.
I it can be emptyset.
I it can be a word, e.g., 110: obtained as a (finite) concatenation of letters.
I it can be several words: obtained as a (finite) union of sets of words I it can be something that “repeats”!
The Kleene Star operation
For a language L, itsKleene closure, denoted L∗ is the set of all strings obtained by taking any number of strings from Lwith possible repetitions and concatenating all of them.
Another view of regular languages
Consider any regular language I it can be letters 1 ora or etc.
I it can be emptyset.
I it can be a word, e.g., 110: obtained as a (finite) concatenation of letters.
I it can be several words: obtained as a (finite) union of sets of words I it can be something that “repeats”!
The Kleene Star operation
For a language L, itsKleene closure, denoted L∗ is the set of all strings obtained by taking any number of strings from Lwith possible repetitions and concatenating all of them.
L∗ =∪i≥0Li
Regular expressions as a grammar
Given an alphabet Σ,
Regular expressions are all those generated by following grammar L::= Σ| ∅ |L+L|L◦L|L∗
Regular expressions as a grammar
Given an alphabet Σ,
Regular expressions are all those generated by following grammar
L::= Σ| ∅ |L+L|L◦L|L∗ I L is being recursively defined.
I |separates the constructors I + is the union
I ◦ is concatenation, often we just writeL1L2
Regular expressions as a grammar
Given an alphabet Σ,
Regular expressions are all those generated by following grammar
L::= Σ| ∅ |L+L|L◦L|L∗ I L is being recursively defined.
I |separates the constructors I + is the union
I ◦ is concatenation, often we just writeL1L2
I In the above notation, different occurrences ofLare not required to be same expressions
Examples
Given regular expressions for the following languages over Σ ={a,b,c} 1. All words containing at least one a.
2. words containing at least one aor oneb.
3. words containing at least one aand one b.
Order of precedence
Paranthesis are used or define precedence order: Kleene>Concat >union I ((0)1) + (0)∗ =?
I (01)2 =?
I 01∗+ 10 + 11 =?
Is a regular expression unique?
Simplify this: 1∗01 + 1∗01(0 ++ 11)∗(0 ++ 11)
Examples
Given regular expressions for the following languages over Σ ={a,b,c} 1. All words containing at least one a.
2. words containing at least one aor oneb.
3. words containing at least one aand one b.
Order of precedence
Paranthesis are used or define precedence order: Kleene>Concat >union
I ((0)1) + (0)∗ =? I (01)2 =?
I 01∗+ 10 + 11 =?
Is a regular expression unique?
Simplify this: 1∗01 + 1∗01(0 ++ 11)∗(0 ++ 11)
Examples
Given regular expressions for the following languages over Σ ={a,b,c} 1. All words containing at least one a.
2. words containing at least one aor oneb.
3. words containing at least one aand one b.
Order of precedence
Paranthesis are used or define precedence order: Kleene>Concat >union I ((0)1) + (0)∗ =?
I (01)2 =?
I 01∗+ 10 + 11 =?
Is a regular expression unique?
Simplify this: 1∗01 + 1∗01(0 ++ 11)∗(0 ++ 11)
Examples
Given regular expressions for the following languages over Σ ={a,b,c} 1. All words containing at least one a.
2. words containing at least one aor oneb.
3. words containing at least one aand one b.
Order of precedence
Paranthesis are used or define precedence order: Kleene>Concat >union I ((0)1) + (0)∗ =?
I (01)2 =?
I 01∗+ 10 + 11 =?
Is a regular expression unique?
Simplify this: 1∗01 + 1∗01(0 ++ 11)∗(0 ++ 11)
Examples
Given regular expressions for the following languages over Σ ={a,b,c} 1. All words containing at least one a.
2. words containing at least one aor oneb.
3. words containing at least one aand one b.
Order of precedence
Paranthesis are used or define precedence order: Kleene>Concat >union I ((0)1) + (0)∗ =?
I (01)2 =?
I 01∗+ 10 + 11 =?
Is a regular expression unique?
Simplify this: 1∗01 + 1∗01(0 ++ 11)∗(0 ++ 11)
Algebraic laws of regular expressions
Let Land M be regular expressions. We often identify the regular expression with the language of strings it represents.
Associativity and commutativity
I Associativity: L+ (M+N) = (L+M) +N,L(MN) = (LM)N I Commutativity: Only for +: L+M =M +L
Identity and Annihilators I L+∅=∅+L=L I L=L=L I ∅L=∅
Union (denoted +) acts as addition and concat as multiplication. Distributivity: L(M +N) = LM+LN,(M +N)L=ML+NL
Algebraic laws of regular expressions
Let Land M be regular expressions. We often identify the regular expression with the language of strings it represents.
Associativity and commutativity
I Associativity: L+ (M+N) = (L+M) +N,L(MN) = (LM)N
I Commutativity: Only for +: L+M =M +L Identity and Annihilators
I L+∅=∅+L=L I L=L=L I ∅L=∅
Union (denoted +) acts as addition and concat as multiplication. Distributivity: L(M +N) = LM+LN,(M +N)L=ML+NL
Algebraic laws of regular expressions
Let Land M be regular expressions. We often identify the regular expression with the language of strings it represents.
Associativity and commutativity
I Associativity: L+ (M+N) = (L+M) +N,L(MN) = (LM)N I Commutativity:
Only for +: L+M =M +L Identity and Annihilators
I L+∅=∅+L=L I L=L=L I ∅L=∅
Union (denoted +) acts as addition and concat as multiplication. Distributivity: L(M +N) = LM+LN,(M +N)L=ML+NL
Algebraic laws of regular expressions
Let Land M be regular expressions. We often identify the regular expression with the language of strings it represents.
Associativity and commutativity
I Associativity: L+ (M+N) = (L+M) +N,L(MN) = (LM)N I Commutativity: Only for +: L+M =M+L
Identity and Annihilators I L+∅=∅+L=L I L=L=L I ∅L=∅
Union (denoted +) acts as addition and concat as multiplication. Distributivity: L(M +N) = LM+LN,(M +N)L=ML+NL
Algebraic laws of regular expressions
Let Land M be regular expressions. We often identify the regular expression with the language of strings it represents.
Associativity and commutativity
I Associativity: L+ (M+N) = (L+M) +N,L(MN) = (LM)N I Commutativity: Only for +: L+M =M+L
Identity and Annihilators I L+∅=∅+L=L I L=L=L I ∅L=∅
Union (denoted +) acts as addition and concat as multiplication. Distributivity: L(M +N) = LM+LN,(M +N)L=ML+NL
Algebraic laws of regular expressions
Let Land M be regular expressions. We often identify the regular expression with the language of strings it represents.
Associativity and commutativity
I Associativity: L+ (M+N) = (L+M) +N,L(MN) = (LM)N I Commutativity: Only for +: L+M =M+L
Identity and Annihilators I L+∅=∅+L=L I L=L=L I ∅L=∅
Union (denoted +) acts as addition and concat as multiplication.
Distributivity: L(M +N) = LM+LN,(M +N)L=ML+NL
Algebraic laws of regular expressions
Let Land M be regular expressions. We often identify the regular expression with the language of strings it represents.
Associativity and commutativity
I Associativity: L+ (M+N) = (L+M) +N,L(MN) = (LM)N I Commutativity: Only for +: L+M =M+L
Identity and Annihilators I L+∅=∅+L=L I L=L=L I ∅L=∅
Union (denoted +) acts as addition and concat as multiplication.
Distributivity: L(M +N) = LM+LN,(M +N)L =ML+NL
Algebraic laws of regular expressions (Contd)
I (L∗)∗ =L∗ I ∅∗ =
I L+=LL∗ =L∗L I +L∗=L∗ I L∗ =+L+
Simplify the following expressions: 1. 0 + 01∗
2. 1∗01 + 1∗01(0 ++ 11)∗(0 ++ 11)
Algebraic laws of regular expressions (Contd)
I (L∗)∗ =L∗ I ∅∗ =
I L+=LL∗ =L∗L I +L∗=L∗ I L∗ =+L+
Simplify the following expressions:
1. 0 + 01∗
2. 1∗01 + 1∗01(0 ++ 11)∗(0 ++ 11)
Proving “new” laws
Consider
(L+S)∗ = (L∗S∗)∗
ReplacingLby 0 and S by 1 we get
(0 + 1)∗ = (0∗1∗)∗
which is of course same. From this can we conclude identity?
Proving “new” laws
Consider
(L+S)∗ = (L∗S∗)∗ Replacing Lby 0 and S by 1 we get
(0 + 1)∗ = (0∗1∗)∗ which is of course same.
From this can we conclude identity?
Proving “new” laws
Consider
(L+S)∗ = (L∗S∗)∗ Replacing Lby 0 and S by 1 we get
(0 + 1)∗ = (0∗1∗)∗
which is of course same. From this can we conclude identity?
A test for laws of regular expression
I Let R be a regular expression withL1, ..,Ln as variables
I Let R0 be the concrete regular expression by replacing L1, ...,Ln by fresh symbols a1, ...,an in R.
Theorem
For all languagesL1, . . .Ln, each stringw ∈L(R) can be written as w =w1w2...wk ∈L(R) such thatwi is in some languageLji, and the word aj1, ...,ajn ∈L(R0).
Proof sketch.
SinceR and R0 have same structure, the word produced in R must have a counterpart inR0.
Exercise: (read the) formal proof via structural induction.
Due to the above theorem, we can only analyze concrete languages for proving laws of regular expressions. Use it carefully!
A test for laws of regular expression
I Let R be a regular expression withL1, ..,Ln as variables
I Let R0 be the concrete regular expression by replacing L1, ...,Ln by fresh symbols a1, ...,an in R.
Theorem
For all languages L1, . . .Ln, each stringw ∈L(R) can be written as w =w1w2...wk ∈L(R) such thatwi is in some languageLji, and the word aj1, ...,ajn ∈L(R0).
Proof sketch.
SinceR and R0 have same structure, the word produced in R must have a counterpart inR0.
Exercise: (read the) formal proof via structural induction.
Due to the above theorem, we can only analyze concrete languages for proving laws of regular expressions. Use it carefully!
A test for laws of regular expression
I Let R be a regular expression withL1, ..,Ln as variables
I Let R0 be the concrete regular expression by replacing L1, ...,Ln by fresh symbols a1, ...,an in R.
Theorem
For all languages L1, . . .Ln, each stringw ∈L(R) can be written as w =w1w2...wk ∈L(R) such thatwi is in some languageLji, and the word aj1, ...,ajn ∈L(R0).
Proof sketch.
Since R and R0 have same structure, the word produced in R must have a counterpart in R0.
Exercise: (read the) formal proof via structural induction.
Due to the above theorem, we can only analyze concrete languages for proving laws of regular expressions. Use it carefully!
A test for laws of regular expression
I Let R be a regular expression withL1, ..,Ln as variables
I Let R0 be the concrete regular expression by replacing L1, ...,Ln by fresh symbols a1, ...,an in R.
Theorem
For all languages L1, . . .Ln, each stringw ∈L(R) can be written as w =w1w2...wk ∈L(R) such thatwi is in some languageLji, and the word aj1, ...,ajn ∈L(R0).
Proof sketch.
Since R and R0 have same structure, the word produced in R must have a counterpart in R0.
Exercise: (read the) formal proof via structural induction.
Due to the above theorem, we can only analyze concrete languages for
Use it carefully!
A test for laws of regular expression
I Let R be a regular expression withL1, ..,Ln as variables
I Let R0 be the concrete regular expression by replacing L1, ...,Ln by fresh symbols a1, ...,an in R.
Theorem
For all languages L1, . . .Ln, each stringw ∈L(R) can be written as w =w1w2...wk ∈L(R) such thatwi is in some languageLji, and the word aj1, ...,ajn ∈L(R0).
Proof sketch.
Since R and R0 have same structure, the word produced in R must have a counterpart in R0.
Exercise: (read the) formal proof via structural induction.
Due to the above theorem, we can only analyze concrete languages for
Regular expressions and regular languages
Going from regular expressions to DFA and back
1. Given any regular expression, show that the language it defines is regular.
2. Given any regular language, show that there exists a regular expression defining it.
From Regular expressions to DFA
Theorem
Every language defined by a regular expression is regular!
Proof: exercise! Try structural induction.
From Regular expressions to DFA
Theorem
Every language defined by a regular expression is regular!
Proof:
exercise! Try structural induction.
From Regular expressions to DFA
Theorem
Every language defined by a regular expression is regular!
Proof: exercise! Try structural induction.
Regular expressions to -NFA
Theorem
For a given RE R, there is an-NFAA such thatL(R) =L(A).
Proof.
We have six constructors for RE. Three of them are base cases
I I ∅ I a
The other three are inductive I R1+R2
I R1R2
I R∗ ...
Regular expressions to -NFA
Theorem
For a given RE R, there is an-NFAA such thatL(R) =L(A).
Proof.
We have six constructors for RE.
Three of them are base cases I
I ∅ I a
The other three are inductive I R1+R2
I R1R2
I R∗ ...
Regular expressions to -NFA
Theorem
For a given RE R, there is an-NFAA such thatL(R) =L(A).
Proof.
We have six constructors for RE.
Three of them are base cases I
I ∅ I a
The other three are inductive I R1+R2
I R1R2
I R∗ ...
Regular expressions to -NFA
Theorem
For a given RE R, there is an-NFAA such thatL(R) =L(A).
Proof.
We have six constructors for RE.
Three of them are base cases I
I ∅ I a
The other three are inductive I R1+R2
I R1R2
I R∗ ...
Regular expressions to NFA II
Proof(contd.) base case:
-NFAs for the base cases I
q1
start q2
I ∅
q1
start q2
I a
q1
start a q2
...
Regular expressions to NFA II
Proof(contd.) base case:
-NFAs for the base cases I
q1
start q2
I ∅
q1
start q2
I a
q1
start a q2
...
Regular expressions to NFA II
Proof(contd.) base case:
-NFAs for the base cases I
q1
start q2
I ∅
q1
start q2
I a
q1
start a q2
...
Regular expressions to NFA III
Proof(contd.) induction step:
-NFAs for the inductive constructors I L1+L2
q0
start
q01 q02
qf1 qf2
qf
L1 L2
I L1L2
q0
start q01 L1 qf1 q02 L2 qf2 qf
I L1∗
q0
start q01 L1 qf1 qf
Regular expressions to NFA III
Proof(contd.) induction step:
-NFAs for the inductive constructors I L1+L2
q0
start
q01 q02
qf1 qf2
qf
L1 L2
I L1L2
q0
start q01 L1 qf1 q02 L2 qf2 qf
I L1∗
q0
start q01 L1 qf1 qf
Regular expressions to NFA III
Proof(contd.) induction step:
-NFAs for the inductive constructors I L1+L2
q0
start
q01 q02
qf1 qf2
qf
L1 L2
I L1L2
q0
start q01 L1 qf1 q02 L2 qf2 qf
I L1∗
L1