CS310 : Automata Theory 2020
Lecture 7: Regular expressions
Instructor: S. Akshay
IIT Bombay, India
27-01-2020
Recap
I Deterministic finite state automata (DFA)
I Formal definition, run, accepting run, (extended) transition relation I Regular languages, closure under Union, intersection and
complementation
I Non-deterministic finite state automata (NFA)
I From NFA to DFA: The Subset construction, worst-case exponential blowup
I Applications to text search using automata.
I -NFA and conversion to NFA.
I Today: An algebraic view of regular languages.
Recap
I Deterministic finite state automata (DFA)
I Formal definition, run, accepting run, (extended) transition relation I Regular languages, closure under Union, intersection and
complementation
I Non-deterministic finite state automata (NFA)
I From NFA to DFA: The Subset construction, worst-case exponential blowup
I Applications to text search using automata.
I -NFA and conversion to NFA.
I Today: An algebraic view of regular languages.
Recap
I Deterministic finite state automata (DFA)
I Formal definition, run, accepting run, (extended) transition relation I Regular languages, closure under Union, intersection and
complementation
I Non-deterministic finite state automata (NFA)
I From NFA to DFA: The Subset construction, worst-case exponential blowup
I Applications to text search using automata.
I -NFA and conversion to NFA.
I Today: An algebraic view of regular languages.
Adding transitions
An -nondeterministic finite-state automaton (-NFA) is a 5-tuple
(Q,Σ, δ,q0,F) where
I Q is a finite set of states
I Σ is a finite alphabet, i.e., set of input symbols
I δ :Q×(Σ∪)→P(Q) is a function that takes a state and input symbol and returns the set of possible next states,
I q0∈Q is the start/initial state
I F ⊆Q is the set of final/accepting states.
-closure
Definition
Let (Q,Σ, δ,q0,F) be an-NFA. For each setS ⊆Q,EClose(S) is the set of states reachable via -transitions from S.
Inductive definition ofEClose(S): I for all q ∈S,q ∈EClose(S)
I if q∈EClose(S) andq0 ∈δ(q, ), then q0 ∈EClose(S).
-closure
Definition
Let (Q,Σ, δ,q0,F) be an-NFA. For each setS ⊆Q,EClose(S) is the set of states reachable via -transitions from S.
Inductive definition of EClose(S):
I for all q ∈S,q ∈EClose(S)
I if q∈EClose(S) andq0 ∈δ(q, ), then q0 ∈EClose(S).
Extended transition relation for -NFA
Extended transition relation
Let ˆδ:Q×Σ∗→ P(Q) be defined as:
δ(q, ) =ˆ EClose({q}) δ(q,ˆ wa) = [
q0∈ˆδ(q,w)
EClose(δ(q0,a))
Acceptance
An-NFAA acceptsw iff ˆδ(q0,w)∩F 6=∅.
Extended transition relation for -NFA
Extended transition relation
Let ˆδ:Q×Σ∗→ P(Q) be defined as:
δ(q, ) =ˆ EClose({q}) δ(q,ˆ wa) = [
q0∈ˆδ(q,w)
EClose(δ(q0,a))
Acceptance
An -NFAA acceptsw iff ˆδ(q0,w)∩F 6=∅.
Removing -transitions
Theorem
For any-NFAA, there exists an NFA A0 (without -transitions) such that L(A) =L(A0).
Proof: Let A= (Q,Σ, δ,q0,F) be an-NFA. Then, we construct NFA A0 = (Q0,Σ, δ0,q0,F0) s.t.,
I Q0 =Q∪˜q0,q00 = ˜q0, Σ is the same but no -transitions are used.
I F =
(F ∪ {˜q0} ifEClose({q0})∩F 6=∅
F otherwise
I δ0(q,a) =
(EClose(δ(EClose(q0,a))) ifq = ˜q0
EClose(δ(q,a)) otherwise
Removing -transitions
Theorem
For any-NFAA, there exists an NFA A0 (without -transitions) such that L(A) =L(A0).
Proof: Let A= (Q,Σ, δ,q0,F) be an-NFA. Then, we construct NFA A0 = (Q0,Σ, δ0,q0,F0) s.t.,
I Q0 =Q∪˜q0,q00 = ˜q0, Σ is the same but no -transitions are used.
I F =
(F ∪ {˜q0} ifEClose({q0})∩F 6=∅
F otherwise
I δ0(q,a) =
(EClose(δ(EClose(q0,a))) ifq = ˜q0
EClose(δ(q,a)) otherwise
Removing -transitions
Theorem
For any-NFAA, there exists an NFA A0 (without -transitions) such that L(A) =L(A0).
Proof: Let A= (Q,Σ, δ,q0,F) be an-NFA. Then, we construct NFA A0 = (Q0,Σ, δ0,q0,F0) s.t.,
I Q0 =Q∪˜q0,q00 = ˜q0, Σ is the same but no -transitions are used.
I F =
(F ∪ {˜q0} ifEClose({q0})∩F 6=∅
F otherwise
I δ0(q,a) =
(EClose(δ(EClose(q0,a))) ifq = ˜q0
EClose(δ(q,a)) otherwise
Removing -transitions
q0
start q1 q2
b
a
a,b
a
q0 q1 q2
˜ q0 start
a b a
a
a,b
a a
a
a
Can we simplify this?
Removing -transitions
q0
start q1 q2
b
a
a,b
a
q0 q1 q2
˜ q0 start
a b a
a
a,b
a a
a
a
Removing -transitions
Theorem
For any-NFAA= (Q,Σ, δ,q0,F), there exists an NFA
A0 = (Q0,Σ, δ0,q0,F0) (without -transitions) such thatL(A) =L(A0).
Proof: Construction in 3 steps:
1. Saturate: repeatedly add shortcuts that make -transitions redundant.
2. Fix final states: if some state reachable from initial state by -transitions is final, then make initial state as final!
3. Remove-transitions.
Can we do this in one shot?
I Q0=Q,q00 =q0, Σ is the same but no-transitions are used. I F =
(F ∪ {q0} ifEClose({q0})∩F 6=∅
F otherwise
I δ0(q,a) =
(EClose(δ(EClose(q0,a))) ifq =q0 EClose(δ(q,a)) otherwise
Removing -transitions
Theorem
For any-NFAA= (Q,Σ, δ,q0,F), there exists an NFA
A0 = (Q0,Σ, δ0,q0,F0) (without -transitions) such thatL(A) =L(A0).
Proof: Construction in 3 steps:
1. Saturate: repeatedly add shortcuts that make -transitions redundant.
2. Fix final states: if some state reachable from initial state by -transitions is final, then make initial state as final!
3. Remove-transitions.
Can we do this in one shot?
I Q0=Q,q00 =q0, Σ is the same but no-transitions are used. I F =
(F ∪ {q0} ifEClose({q0})∩F 6=∅
F otherwise
I δ0(q,a) =
(EClose(δ(EClose(q0,a))) ifq =q0 EClose(δ(q,a)) otherwise
Removing -transitions
Theorem
For any-NFAA= (Q,Σ, δ,q0,F), there exists an NFA
A0 = (Q0,Σ, δ0,q0,F0) (without -transitions) such thatL(A) =L(A0).
Proof: Construction in 3 steps:
1. Saturate: repeatedly add shortcuts that make -transitions redundant.
2. Fix final states: if some state reachable from initial state by -transitions is final, then make initial state as final!
3. Remove-transitions.
Can we do this in one shot?
I Q0 =Q,q00 =q0, Σ is the same but no-transitions are used.
I F =
(F ∪ {q0} ifEClose({q0})∩F 6=∅
F otherwise
I δ0(q,a) =
(EClose(δ(EClose(q0,a))) ifq =q0 EClose(δ(q,a)) otherwise
Removing -transitions
Theorem
For any-NFAA= (Q,Σ, δ,q0,F), there exists an NFA
A0 = (Q0,Σ, δ0,q0,F0) (without -transitions) such thatL(A) =L(A0).
Proof: Construction in 3 steps:
1. Saturate: repeatedly add shortcuts that make -transitions redundant.
2. Fix final states: if some state reachable from initial state by -transitions is final, then make initial state as final!
3. Remove-transitions.
Can we do this in one shot?
I Q0 =Q,q00 =q0, Σ is the same but no-transitions are used.
I F =
(F ∪ {q0} ifEClose({q0})∩F 6=∅
F otherwise
I δ0(q,a) =
(EClose(δ(EClose(q0,a))) ifq =q0 EClose(δ(q,a)) otherwise
Removing -transitions
Theorem
For any-NFAA= (Q,Σ, δ,q0,F), there exists an NFA
A0 = (Q0,Σ, δ0,q0,F0) (without -transitions) such thatL(A) =L(A0).
Proof: Construction in 3 steps:
1. Saturate: repeatedly add shortcuts that make -transitions redundant.
2. Fix final states: if some state reachable from initial state by -transitions is final, then make initial state as final!
3. Remove-transitions.
Can we do this in one shot?
I Q0 =Q,q00 =q0, Σ is the same but no-transitions are used.
I F =
(F ∪ {q0} ifEClose({q0})∩F 6=∅
F otherwise
Another view of regular languages
Consider any regular language
Another view of regular languages
Consider any regular language I it can be letters 1 ora or etc.
I it can be emptyset.
I it can be a word, e.g., 110: obtained as a (finite) concatenation of letters.
I it can be several words: obtained as a (finite) union of sets of words
Another view of regular languages
Consider any regular language I it can be letters 1 ora or etc.
I it can be emptyset.
I it can be a word, e.g., 110: obtained as a (finite) concatenation of letters.
I it can be several words: obtained as a (finite) union of sets of words Can these be used to generate all regular languages?
Another view of regular languages
Consider any regular language I it can be letters 1 ora or etc.
I it can be emptyset.
I it can be a word, e.g., 110: obtained as a (finite) concatenation of letters.
I it can be several words: obtained as a (finite) union of sets of words I it can be something that “repeats”!
Another view of regular languages
Consider any regular language I it can be letters 1 ora or etc.
I it can be emptyset.
I it can be a word, e.g., 110: obtained as a (finite) concatenation of letters.
I it can be several words: obtained as a (finite) union of sets of words I it can be something that “repeats”!
The Kleene Star operation
For a language L, itsKleene closure, denoted L∗ is the set of all strings obtained by taking any number of strings from Lwith possible repetitions and concatenating all of them.
Another view of regular languages
Consider any regular language I it can be letters 1 ora or etc.
I it can be emptyset.
I it can be a word, e.g., 110: obtained as a (finite) concatenation of letters.
I it can be several words: obtained as a (finite) union of sets of words I it can be something that “repeats”!
The Kleene Star operation
For a language L, itsKleene closure, denoted L∗ is the set of all strings obtained by taking any number of strings from Lwith possible repetitions and concatenating all of them.
Regular expressions as a grammar
Given an alphabet Σ,
Regular expressions are all those generated by following grammar L::= Σ| ∅ |L+L|L◦L|L∗
Regular expressions as a grammar
Given an alphabet Σ,
Regular expressions are all those generated by following grammar
L::= Σ| ∅ |L+L|L◦L|L∗
I L is being recursively defined.
I |separates the constructors I + is the union
I ◦ is concatenation, often we just writeL1L2
Regular expressions as a grammar
Given an alphabet Σ,
Regular expressions are all those generated by following grammar
L::= Σ| ∅ |L+L|L◦L|L∗
I L is being recursively defined.
I |separates the constructors I + is the union
I ◦ is concatenation, often we just writeL1L2
I In the above notation, different occurrences ofLare not required to be same expressions
Examples
Given regular expressions for the following languages over Σ ={a,b,c} 1. All words containing at least one a.
2. words containing at least one aor oneb.
3. words containing at least one aand one b.
Order of precedence
Paranthesis are used or define precedence order: Kleene>Concat >union I ((0)1) + (0)∗ =?
I (01)2 =?
I 01∗+ 10 + 11 =?
Is a regular expression unique?
Simplify this: 1∗01 + 1∗01(0 ++ 11)∗(0 ++ 11)
Examples
Given regular expressions for the following languages over Σ ={a,b,c} 1. All words containing at least one a.
2. words containing at least one aor oneb.
3. words containing at least one aand one b.
Order of precedence
Paranthesis are used or define precedence order: Kleene>Concat >union
I ((0)1) + (0)∗ =? I (01)2 =?
I 01∗+ 10 + 11 =?
Is a regular expression unique?
Simplify this: 1∗01 + 1∗01(0 ++ 11)∗(0 ++ 11)
Examples
Given regular expressions for the following languages over Σ ={a,b,c} 1. All words containing at least one a.
2. words containing at least one aor oneb.
3. words containing at least one aand one b.
Order of precedence
Paranthesis are used or define precedence order: Kleene>Concat >union I ((0)1) + (0)∗ =?
I (01)2 =?
I 01∗+ 10 + 11 =?
Is a regular expression unique?
Simplify this: 1∗01 + 1∗01(0 ++ 11)∗(0 ++ 11)
Examples
Given regular expressions for the following languages over Σ ={a,b,c} 1. All words containing at least one a.
2. words containing at least one aor oneb.
3. words containing at least one aand one b.
Order of precedence
Paranthesis are used or define precedence order: Kleene>Concat >union I ((0)1) + (0)∗ =?
I (01)2 =?
I 01∗+ 10 + 11 =?
Is a regular expression unique?
Simplify this: 1∗01 + 1∗01(0 ++ 11)∗(0 ++ 11)
Examples
Given regular expressions for the following languages over Σ ={a,b,c} 1. All words containing at least one a.
2. words containing at least one aor oneb.
3. words containing at least one aand one b.
Order of precedence
Paranthesis are used or define precedence order: Kleene>Concat >union I ((0)1) + (0)∗ =?
I (01)2 =?
I 01∗+ 10 + 11 =?
Is a regular expression unique?
From Regular expressions to DFA
Theorem
Every language defined by a regular expression is regular!
Proof: exercise! Try structural induction.
From Regular expressions to DFA
Theorem
Every language defined by a regular expression is regular!
Proof:
exercise! Try structural induction.
From Regular expressions to DFA
Theorem
Every language defined by a regular expression is regular!
Proof: exercise! Try structural induction.
From DFA to Regular expressions
Theorem
Every regular language can be defined by a regular expression!