## CS310 : Automata Theory 2020

### Lecture 8: Regular expressions and NFA

Instructor: S. Akshay

IIT Bombay, India

28-01-2020

### Recap

I Deterministic finite state automata (DFA): run, accepting run, (extended) transition relation

I Regular languages: closure under Union, intersection and complementation

I Non-deterministic finite state automata (NFA): the Subset construction, worst-case exponential blowup

I Applications 1: text search using automata.

I -NFA: conversion to NFA.

I Yesterday and today: Regular expressions

### Recap

I Deterministic finite state automata (DFA): run, accepting run, (extended) transition relation

I Regular languages: closure under Union, intersection and complementation

I Non-deterministic finite state automata (NFA): the Subset construction, worst-case exponential blowup

I Applications 1: text search using automata.

I -NFA: conversion to NFA.

I Yesterday and today: Regular expressions

### Another view of regular languages

Consider any regular language

### Another view of regular languages

Consider any regular language I it can be letters 1 ora or etc.

I it can be emptyset.

I it can be a word, e.g., 110: obtained as a (finite) concatenation of letters.

I it can be several words: obtained as a (finite) union of sets of words

### Another view of regular languages

Consider any regular language I it can be letters 1 ora or etc.

I it can be emptyset.

I it can be a word, e.g., 110: obtained as a (finite) concatenation of letters.

I it can be several words: obtained as a (finite) union of sets of words Can these be used to generate all regular languages?

### Another view of regular languages

Consider any regular language I it can be letters 1 ora or etc.

I it can be emptyset.

I it can be a word, e.g., 110: obtained as a (finite) concatenation of letters.

I it can be several words: obtained as a (finite) union of sets of words I it can be something that “repeats”!

### Another view of regular languages

Consider any regular language I it can be letters 1 ora or etc.

I it can be emptyset.

I it can be a word, e.g., 110: obtained as a (finite) concatenation of letters.

I it can be several words: obtained as a (finite) union of sets of words I it can be something that “repeats”!

The Kleene Star operation

For a language L, itsKleene closure, denoted L^{∗} is the set of all strings
obtained by taking any number of strings from Lwith possible repetitions
and concatenating all of them.

### Another view of regular languages

Consider any regular language I it can be letters 1 ora or etc.

I it can be emptyset.

I it can be a word, e.g., 110: obtained as a (finite) concatenation of letters.

I it can be several words: obtained as a (finite) union of sets of words I it can be something that “repeats”!

The Kleene Star operation

For a language L, itsKleene closure, denoted L^{∗} is the set of all strings
obtained by taking any number of strings from Lwith possible repetitions
and concatenating all of them.

L^{∗} =∪i≥0L^{i}

### Regular expressions as a grammar

Given an alphabet Σ,

Regular expressions are all those generated by following grammar
L::= Σ| ∅ |L+L|L◦L|L^{∗}

### Regular expressions as a grammar

Given an alphabet Σ,

Regular expressions are all those generated by following grammar

L::= Σ| ∅ |L+L|L◦L|L^{∗}
I L is being recursively defined.

I |separates the constructors I + is the union

I ◦ is concatenation, often we just writeL_{1}L_{2}

### Regular expressions as a grammar

Given an alphabet Σ,

Regular expressions are all those generated by following grammar

L::= Σ| ∅ |L+L|L◦L|L^{∗}
I L is being recursively defined.

I |separates the constructors I + is the union

I ◦ is concatenation, often we just writeL_{1}L_{2}

I In the above notation, different occurrences ofLare not required to be same expressions

### Examples

Given regular expressions for the following languages over Σ ={a,b,c} 1. All words containing at least one a.

2. words containing at least one aor oneb.

3. words containing at least one aand one b.

Order of precedence

Paranthesis are used or define precedence order: Kleene>Concat >union
I ((0)1) + (0)^{∗} =?

I (01)2 =?

I 01^{∗}+ 10 + 11 =?

Is a regular expression unique?

Simplify this: 1∗01 + 1∗01(0 ++ 11)^{∗}(0 ++ 11)

### Examples

Given regular expressions for the following languages over Σ ={a,b,c} 1. All words containing at least one a.

2. words containing at least one aor oneb.

3. words containing at least one aand one b.

Order of precedence

Paranthesis are used or define precedence order: Kleene>Concat >union

I ((0)1) + (0)^{∗} =?
I (01)2 =?

I 01^{∗}+ 10 + 11 =?

Is a regular expression unique?

Simplify this: 1∗01 + 1∗01(0 ++ 11)^{∗}(0 ++ 11)

### Examples

Given regular expressions for the following languages over Σ ={a,b,c} 1. All words containing at least one a.

2. words containing at least one aor oneb.

3. words containing at least one aand one b.

Order of precedence

Paranthesis are used or define precedence order: Kleene>Concat >union
I ((0)1) + (0)^{∗} =?

I (01)2 =?

I 01^{∗}+ 10 + 11 =?

Is a regular expression unique?

Simplify this: 1∗01 + 1∗01(0 ++ 11)^{∗}(0 ++ 11)

### Examples

2. words containing at least one aor oneb.

3. words containing at least one aand one b.

Order of precedence

Paranthesis are used or define precedence order: Kleene>Concat >union
I ((0)1) + (0)^{∗} =?

I (01)2 =?

I 01^{∗}+ 10 + 11 =?

Is a regular expression unique?

Simplify this: 1∗01 + 1∗01(0 ++ 11)^{∗}(0 ++ 11)

### Examples

2. words containing at least one aor oneb.

3. words containing at least one aand one b.

Order of precedence

Paranthesis are used or define precedence order: Kleene>Concat >union
I ((0)1) + (0)^{∗} =?

I (01)2 =?

I 01^{∗}+ 10 + 11 =?

Is a regular expression unique?

Simplify this: 1∗01 + 1∗01(0 ++ 11)^{∗}(0 ++ 11)

### Algebraic laws of regular expressions

Let Land M be regular expressions. We often identify the regular expression with the language of strings it represents.

Associativity and commutativity

I Associativity: L+ (M+N) = (L+M) +N,L(MN) = (LM)N I Commutativity: Only for +: L+M =M +L

Identity and Annihilators I L+∅=∅+L=L I L=L=L I ∅L=∅

Union (denoted +) acts as addition and concat as multiplication. Distributivity: L(M +N) = LM+LN,(M +N)L=ML+NL

### Algebraic laws of regular expressions

Let Land M be regular expressions. We often identify the regular expression with the language of strings it represents.

Associativity and commutativity

I Associativity: L+ (M+N) = (L+M) +N,L(MN) = (LM)N

I Commutativity: Only for +: L+M =M +L Identity and Annihilators

I L+∅=∅+L=L I L=L=L I ∅L=∅

Union (denoted +) acts as addition and concat as multiplication. Distributivity: L(M +N) = LM+LN,(M +N)L=ML+NL

### Algebraic laws of regular expressions

Let Land M be regular expressions. We often identify the regular expression with the language of strings it represents.

Associativity and commutativity

I Associativity: L+ (M+N) = (L+M) +N,L(MN) = (LM)N I Commutativity:

Only for +: L+M =M +L Identity and Annihilators

I L+∅=∅+L=L I L=L=L I ∅L=∅

Union (denoted +) acts as addition and concat as multiplication. Distributivity: L(M +N) = LM+LN,(M +N)L=ML+NL

### Algebraic laws of regular expressions

Associativity and commutativity

I Associativity: L+ (M+N) = (L+M) +N,L(MN) = (LM)N I Commutativity: Only for +: L+M =M+L

Identity and Annihilators I L+∅=∅+L=L I L=L=L I ∅L=∅

### Algebraic laws of regular expressions

Associativity and commutativity

I Associativity: L+ (M+N) = (L+M) +N,L(MN) = (LM)N I Commutativity: Only for +: L+M =M+L

Identity and Annihilators I L+∅=∅+L=L I L=L=L I ∅L=∅

### Algebraic laws of regular expressions

Associativity and commutativity

I Associativity: L+ (M+N) = (L+M) +N,L(MN) = (LM)N I Commutativity: Only for +: L+M =M+L

Identity and Annihilators I L+∅=∅+L=L I L=L=L I ∅L=∅

Union (denoted +) acts as addition and concat as multiplication.

Distributivity: L(M +N) = LM+LN,(M +N)L=ML+NL

### Algebraic laws of regular expressions

Associativity and commutativity

I Associativity: L+ (M+N) = (L+M) +N,L(MN) = (LM)N I Commutativity: Only for +: L+M =M+L

Identity and Annihilators I L+∅=∅+L=L I L=L=L I ∅L=∅

Union (denoted +) acts as addition and concat as multiplication.

Distributivity: L(M +N) = LM+LN,(M +N)L =ML+NL

### Algebraic laws of regular expressions (Contd)

I (L^{∗})^{∗} =L^{∗}
I ∅^{∗} =

I L^{+}=LL^{∗} =L^{∗}L
I +L^{∗}=L^{∗}
I L^{∗} =+L^{+}

Simplify the following expressions:
1. 0 + 01^{∗}

2. 1∗01 + 1∗01(0 ++ 11)^{∗}(0 ++ 11)

### Algebraic laws of regular expressions (Contd)

I (L^{∗})^{∗} =L^{∗}
I ∅^{∗} =

I L^{+}=LL^{∗} =L^{∗}L
I +L^{∗}=L^{∗}
I L^{∗} =+L^{+}

Simplify the following expressions:

1. 0 + 01^{∗}

2. 1∗01 + 1∗01(0 ++ 11)^{∗}(0 ++ 11)

### Proving “new” laws

Consider

(L+S)^{∗} = (L^{∗}S^{∗})^{∗}

ReplacingLby 0 and S by 1 we get

(0 + 1)^{∗} = (0^{∗}1^{∗})^{∗}

which is of course same. From this can we conclude identity?

### Proving “new” laws

Consider

(L+S)^{∗} = (L^{∗}S^{∗})^{∗}
Replacing Lby 0 and S by 1 we get

(0 + 1)^{∗} = (0^{∗}1^{∗})^{∗}
which is of course same.

From this can we conclude identity?

### Proving “new” laws

Consider

(L+S)^{∗} = (L^{∗}S^{∗})^{∗}
Replacing Lby 0 and S by 1 we get

(0 + 1)^{∗} = (0^{∗}1^{∗})^{∗}

which is of course same. From this can we conclude identity?

### A test for laws of regular expression

I Let R be a regular expression withL1, ..,Ln as variables

I Let R^{0} be the concrete regular expression by replacing L1, ...,Ln by fresh
symbols a_{1}, ...,a_{n} in R.

Theorem

For all languagesL_{1}, . . .L_{n}, each stringw ∈L(R) can be written as
w =w1w2...wk ∈L(R) such thatwi is in some languageLji, and the word
a_{j}_{1}, ...,a_{j}_{n} ∈L(R^{0}).

Proof sketch.

SinceR and R^{0} have same structure, the word produced in R must have a
counterpart inR^{0}.

Exercise: (read the) formal proof via structural induction.

Due to the above theorem, we can only analyze concrete languages for proving laws of regular expressions. Use it carefully!

### A test for laws of regular expression

I Let R be a regular expression withL1, ..,Ln as variables

I Let R^{0} be the concrete regular expression by replacing L1, ...,Ln by fresh
symbols a_{1}, ...,a_{n} in R.

Theorem

For all languages L_{1}, . . .L_{n}, each stringw ∈L(R) can be written as
w =w1w2...wk ∈L(R) such thatwi is in some languageLji, and the word
a_{j}_{1}, ...,a_{j}_{n} ∈L(R^{0}).

Proof sketch.

SinceR and R^{0} have same structure, the word produced in R must have a
counterpart inR^{0}.

Exercise: (read the) formal proof via structural induction.

Due to the above theorem, we can only analyze concrete languages for proving laws of regular expressions. Use it carefully!

### A test for laws of regular expression

I Let R be a regular expression withL1, ..,Ln as variables

I Let R^{0} be the concrete regular expression by replacing L1, ...,Ln by fresh
symbols a_{1}, ...,a_{n} in R.

Theorem

For all languages L_{1}, . . .L_{n}, each stringw ∈L(R) can be written as
w =w1w2...wk ∈L(R) such thatwi is in some languageLji, and the word
a_{j}_{1}, ...,a_{j}_{n} ∈L(R^{0}).

Proof sketch.

Since R and R^{0} have same structure, the word produced in R must have a
counterpart in R^{0}.

Exercise: (read the) formal proof via structural induction.

Due to the above theorem, we can only analyze concrete languages for proving laws of regular expressions. Use it carefully!

### A test for laws of regular expression

I Let R be a regular expression withL1, ..,Ln as variables

^{0} be the concrete regular expression by replacing L1, ...,Ln by fresh
symbols a_{1}, ...,a_{n} in R.

Theorem

For all languages L_{1}, . . .L_{n}, each stringw ∈L(R) can be written as
w =w1w2...wk ∈L(R) such thatwi is in some languageLji, and the word
a_{j}_{1}, ...,a_{j}_{n} ∈L(R^{0}).

Proof sketch.

Since R and R^{0} have same structure, the word produced in R must have a
counterpart in R^{0}.

Exercise: (read the) formal proof via structural induction.

Due to the above theorem, we can only analyze concrete languages for

Use it carefully!

### A test for laws of regular expression

I Let R be a regular expression withL1, ..,Ln as variables

^{0} be the concrete regular expression by replacing L1, ...,Ln by fresh
symbols a_{1}, ...,a_{n} in R.

Theorem

_{1}, . . .L_{n}, each stringw ∈L(R) can be written as
w =w1w2...wk ∈L(R) such thatwi is in some languageLji, and the word
a_{j}_{1}, ...,a_{j}_{n} ∈L(R^{0}).

Proof sketch.

Since R and R^{0} have same structure, the word produced in R must have a
counterpart in R^{0}.

Exercise: (read the) formal proof via structural induction.

Due to the above theorem, we can only analyze concrete languages for

### Regular expressions and regular languages

Going from regular expressions to DFA and back

1. Given any regular expression, show that the language it defines is regular.

2. Given any regular language, show that there exists a regular expression defining it.

### From Regular expressions to DFA

Theorem

Every language defined by a regular expression is regular!

Proof: exercise! Try structural induction.

### From Regular expressions to DFA

Theorem

Every language defined by a regular expression is regular!

Proof:

exercise! Try structural induction.

### From Regular expressions to DFA

Theorem

Every language defined by a regular expression is regular!

Proof: exercise! Try structural induction.

### Regular expressions to -NFA

Theorem

For a given RE R, there is an-NFAA such thatL(R) =L(A).

Proof.

We have six constructors for RE. Three of them are base cases

I I ∅ I a

The other three are inductive I R1+R2

I R1R2

I R^{∗} ...

### Regular expressions to -NFA

Theorem

For a given RE R, there is an-NFAA such thatL(R) =L(A).

Proof.

We have six constructors for RE.

Three of them are base cases I

I ∅ I a

The other three are inductive I R1+R2

I R1R2

I R^{∗} ...

### Regular expressions to -NFA

Theorem

For a given RE R, there is an-NFAA such thatL(R) =L(A).

Proof.

We have six constructors for RE.

Three of them are base cases I

I ∅ I a

The other three are inductive I R1+R2

I R1R2

I R^{∗} ...

### Regular expressions to -NFA

Theorem

For a given RE R, there is an-NFAA such thatL(R) =L(A).

Proof.

We have six constructors for RE.

Three of them are base cases I

I ∅ I a

The other three are inductive I R1+R2

I R1R2

I R^{∗} ...

### Regular expressions to NFA II

Proof(contd.) base case:

-NFAs for the base cases I

q_{1}

start q_{2}

I ∅

q1

start q2

I a

q1

start a q2

...

### Regular expressions to NFA II

Proof(contd.) base case:

-NFAs for the base cases I

q_{1}

start q_{2}

I ∅

q1

start q2

I a

q1

start a q2

...

### Regular expressions to NFA II

Proof(contd.) base case:

-NFAs for the base cases I

q_{1}

start q_{2}

I ∅

q1

start q2

I a

q1

start a q2

...

### Regular expressions to NFA III

Proof(contd.) induction step:

-NFAs for the inductive constructors I L1+L2

q0

start

q_{0}^{1}
q_{0}^{2}

q_{f}^{1}
q_{f}^{2}

q_{f}

L_{1}
L2

I L1L2

q_{0}

start q_{0}^{1} L1 q_{f}^{1} q_{0}^{2} L2 q_{f}^{2} q_{f}

I L_{1}^{∗}

q0

start q_{0}^{1} L1 q_{f}^{1} q_{f}

### Regular expressions to NFA III

Proof(contd.) induction step:

-NFAs for the inductive constructors I L1+L2

q0

start

q_{0}^{1}
q_{0}^{2}

q_{f}^{1}
q_{f}^{2}

q_{f}

L_{1}
L2

I L1L2

q_{0}

start q_{0}^{1} L1 q_{f}^{1} q_{0}^{2} L2 q_{f}^{2} q_{f}

I L_{1}^{∗}

q0

start q_{0}^{1} L1 q_{f}^{1} q_{f}

### Regular expressions to NFA III

Proof(contd.) induction step:

-NFAs for the inductive constructors I L1+L2

q0

start

q_{0}^{1}
q_{0}^{2}

q_{f}^{1}
q_{f}^{2}

q_{f}

L_{1}
L2

I L1L2

q_{0}

start q_{0}^{1} L1 q_{f}^{1} q_{0}^{2} L2 q_{f}^{2} q_{f}

I L_{1}^{∗}

L1