Regular Expression & Regular Languages

(1)

Regular Expression & Regular Languages

(2)

Regular Language

• A language L is known as regular if and only if it is recognized by a finite accepter (FA).

• Language L is regular if and only if it is recognized by a DFA. (??)

• Language L is regular if and only if it is recognized by an NFA. (??)

• A language L is known as regular if and only if it is described by a regular expression (RE).

• A language L is recognized by a FA if and only if L is described by a regular expression.

• NFAs recognize exactly the regular languages.

• Regular expressions describe exactly the regular languages.

How do we show that a given language is regular?

(3)

Regular Expression

• A regular expression consists of strings of symbols from some alphabet Σ, parentheses ( ), and the operators +, . and *.

• Let Σ be a given alphabet. Then,

• ∅, λ and a ∈ Σ are all regular expressions. These are known as primitive regular expressions.

• Recursive Definition:

• If r₁ and r₂ are regular expressions (REs), then the following are also regular expressions:

• r₁ + r₂ OR r₁ | r₂ ⇒ (r₁ or r₂)

• r₁.r₂ OR r₁r₂ ⇒ (r₁ followed by r₂)

• r₁* ⇒ (r₁ repeated zero or more times)

• (r₁)

• A string of symbols is a regular expression if and only if it can be derived from the primitive regular expressions by finitely many applications of the recursive definition.

(4)

Rules for Specifying Regular Expressions:

Precedence of Operators (highest to lowest)

1. ( ) parentheses

2. * Kleene star

3. . concatenation

4. | union

(5)

Rules for REs

For r, s and t regular expressions over Σ:

• r + ∅ = ∅ + r = r

• r.∅ = ∅.r = ∅

• ∅* = λ

• r + λ = λ + r (this equals r only when λ ∈ L(r); for example, r* + λ = r*)

• r.λ = λ.r = r

• λ* = λ

• (r + λ)⁺ = r*

• r + s = s + r

• r.(s + t) = r.s + r.t

• r.(s.t) = (r.s).t

• r⁺ = r r*

• r* = r*(r + λ) = r* r* = (r*)*

• (r*s*)* = (r + s)*
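The identities above can be spot-checked by brute force. The following is a minimal sketch, assuming r and s are instantiated with the single symbols a and b and that textbook notation is translated to Python's `re` syntax by a small helper (`to_python`, `language` and `agree` are illustrative names, not part of the slides). Agreement on all strings up to a length bound is evidence for an identity, not a proof of it.

```python
import re
from itertools import product


def to_python(expr):
    """Translate textbook notation into Python `re` syntax (illustrative helper)."""
    return (expr.replace(" ", "")
                .replace(".", "")        # explicit concatenation dot
                .replace("+", "|")       # union
                .replace("λ", "(?:)"))   # the empty string


def language(expr, alphabet="ab", max_len=6):
    """Strings over `alphabet` of length <= max_len that `expr` matches."""
    pattern = re.compile(to_python(expr))
    words = ("".join(t) for n in range(max_len + 1)
             for t in product(alphabet, repeat=n))
    return {w for w in words if pattern.fullmatch(w)}


def agree(left, right, **options):
    """True if both expressions denote the same strings up to the length bound."""
    return language(left, **options) == language(right, **options)


print(agree("(a*b*)*", "(a+b)*"))    # (r*s*)* = (r + s)* with r = a, s = b
print(agree("a*(a+λ)", "a*"))        # r*(r + λ) = r*
print(agree("a.(b+a)", "a.b+a.a"))   # r.(s + t) = r.s + r.t
```

The same helper also exposes non-identities: comparing `a+λ` with `a` shows that adding λ changes the language whenever λ is not already in it.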

(6)

Valid Regular Expressions: Example

• Let Σ ={a, b, c}

• ∅, λ, a, b, c

• a*, b*

• a.b, b.a, a+b, (a.b)*, (b.a)*, (a+b)*

• a + b is equivalent to b + a

• a.b is not equivalent to b.a

• (a + b.c)*

• (c + ∅)

Why?

(7)

Invalid Regular Expressions: Example

• Let Σ ={a, b, c}

• *a, *b*, +a*, .b*

• +a.b, *b.a, .*a+b, (++a.b)*, (..*b.a***)*, (++a++b**)*

• (+a + b.c)*

• (c + ∅ *+*)

Why?

(8)

Some Notations

• Parentheses in regular expressions can be omitted when the order of evaluation is clear.

• ((0+1)*) = (0+1)* ≠ 0+1*

• ((0*)+(1*)) = 0* + 1*

• For concatenation, the operator . can be omitted.

• r.r.r … r (n times) is denoted by rⁿ.

(9)

Simple Examples over Σ = {0,1}

• {α ∈ Σ* | α does not contain 1’s}

• 0*

• {α ∈ Σ* | α contains 1’s only}

• 1.(1*) (which can be denoted by 1⁺)

• {α ∈ Σ* | α contains only 0’s or only 1’s}

• (00*) + (11*)

• Σ*

• (0+1)*

• Note: 0* + 1* ≠ (0+1)*

(10)

Examples over Σ = {0,1}

• Strings of even length, L = {00, 01, 10, 11}*

• (00+01+10+11)* or

• ((0+1)(0+1))*

• Strings of length 6, L = {α ∈ Σ* | the length of α is 6}

• 000000 + … + 111111

• (0+1)(0+1)(0+1)(0+1)(0+1)(0+1) = (0+1)⁶

• Strings of length 6 or less, L = {α ∈ Σ* | the length of α is less than or equal to 6}

• λ + 0 + 1 + 00 + 01 + 10 + 11 + … + 111111

• (0+1+λ)⁶

(11)

Examples over Σ = {0,1}

• {α ∈ Σ* | α is a binary number divisible by 4}

• (0+1)*00

• {α ∈ Σ* | α does not contain 11}

• (0+10)*(1+λ)

• {α ∈ Σ* | α contains an odd number of 1’s}

• 0*(10*10*)*10*

• {α ∈ Σ* | any two 0’s in α are separated by three 1’s}

• 1*(0111)*01* + 1*
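Several of the examples over Σ = {0,1} can be checked mechanically against their set descriptions. The sketch below is an illustration rather than part of the slides: each regular expression is rewritten by hand in Python `re` syntax (`|` for +, an empty alternative for λ, and the power r⁶ written out as six copies), and `counterexamples` is a hypothetical helper that lists strings on which the expression and the description disagree. An empty list up to the length bound supports the claim but does not prove it.

```python
import re
from itertools import product

# Each entry: description -> (RE in Python syntax, direct membership test).
EXAMPLES = {
    "even length": (r"((0|1)(0|1))*", lambda w: len(w) % 2 == 0),
    "length 6 or less": ("(0|1|)" * 6, lambda w: len(w) <= 6),
    "does not contain 11": (r"(0|10)*(1|)", lambda w: "11" not in w),
    "odd number of 1's": (r"0*(10*10*)*10*", lambda w: w.count("1") % 2 == 1),
}


def counterexamples(pattern, predicate, max_len=8):
    """Strings over {0,1} up to max_len on which the RE and the predicate differ."""
    compiled = re.compile(pattern)
    words = ("".join(t) for n in range(max_len + 1)
             for t in product("01", repeat=n))
    return [w for w in words
            if (compiled.fullmatch(w) is not None) != predicate(w)]


for name, (pattern, predicate) in EXAMPLES.items():
    print(name, counterexamples(pattern, predicate))
```

The same helper can be pointed at a deliberately wrong pair, for instance `0*|1*` against "all strings", to see concrete strings such as 01 that separate 0* + 1* from (0+1)*.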

(12)

Regular Expressions: Example

• All strings of 1s and 0s

(0 | 1)*

• All strings of 1s and 0s beginning with a 1

1 (0 | 1)*

• All strings containing two or more 0s

(1|0)*0(1|0)*0(1|0)*

• All strings containing an even number of 0s

(1*01*01*)* | 1*

(13)

Regular Expressions : Example

• All strings containing an even number of 0s and even number of 1s

Assume that (00 | 11) is X

X* | (X* (01 | 10) X* (01 | 10) X*)*

OR

(00 | 11)*((01 | 10)(00 | 11)*(01 | 10)(00 | 11)*)*

• All strings of alternating 0s and 1s

(λ | 1)(01)*(λ | 0)

• Strings over the alphabet {a, b} in which substrings ab and ba occur an unequal number of times

• (a⁺b⁺)⁺ | (b⁺a⁺)⁺
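The two expressions given for "an even number of 0s and an even number of 1s" are not obviously equivalent, so a bounded brute-force comparison is a useful sanity check. This is a sketch under the assumption that the expressions are transcribed into Python `re` syntax as shown; `agrees` is an illustrative helper name.

```python
import re
from itertools import product

X = "(00|11)"
Y = "(01|10)"
FIRST = f"{X}*|({X}*{Y}{X}*{Y}{X}*)*"     # X* | (X* Y X* Y X*)*
SECOND = f"{X}*({Y}{X}*{Y}{X}*)*"         # X*(Y X* Y X*)*


def agrees(pattern, max_len=10):
    """Compare the RE with 'even number of 0s and even number of 1s'."""
    compiled = re.compile(pattern)
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            w = "".join(chars)
            expected = w.count("0") % 2 == 0 and w.count("1") % 2 == 0
            if (compiled.fullmatch(w) is not None) != expected:
                return False
    return True


print(agrees(FIRST), agrees(SECOND))
```

Dropping the second alternative and keeping only X* fails the check, because a string such as 0110 has even counts but is not a sequence of 00 and 11 blocks.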

(14)

Regular Expressions : Example

• Strings over the alphabet {0, 1} with no consecutive 0's

• (1 | 01)* (0 | ε)

• 1*(01⁺)* (0 | ε)

• 1*(011*)* (0 | ε)

• Strings over the alphabet {a, b} with exactly three b's

• a*ba*ba*ba*

• Strings over the alphabet {a, b, c} containing (at least once) bc

• (a|b|c)*bc(a|b|c)*

(15)

Regular Expressions : Example

• (1 | 10)*

• all strings that do not start with “0” and contain no “00” (that is, λ and every string that starts with “1” and contains no “00”)

• (0 | 1)*011

• all strings ending with “011”

• 0*1*

• all strings with no “0” after “1”

• 00*11*

• all strings with at least one “0” and one “1”, and no “0” after “1”

(16)

Regular Expressions : Example

• What language does the following RE represent?

• ((0 | 1)(0 | 1))* | ((0 | 1)(0 | 1)(0 | 1))*

(17)

Regular Languages

• Each RE denotes a regular language (RL).

• A language L is regular if there is a regular expression r such that L = L(r).

• The language L(r) denoted by any regular expression r is defined by the following rules.

• ∅ is a regular expression. L(∅) = { } = ∅

• λ is a regular expression. L(λ) = {λ}

• Every a ∈ Σ is a regular expression. L(a) = {a}

(18)

Regular Languages: Cont..

• If r₁ and r₂ are regular expressions (REs), then:

• r₁ + r₂ is an RE, and L(r₁ + r₂) = L(r₁) ∪ L(r₂) = {w | w ∈ L(r₁) or w ∈ L(r₂)}

• r₁.r₂ is an RE, and L(r₁.r₂) = L(r₁).L(r₂) = {w₁.w₂ : w₁ ∈ L(r₁) and w₂ ∈ L(r₂)}

• r₁* is an RE, and L(r₁*) = (L(r₁))* = L(r₁)⁰ ∪ L(r₁)¹ ∪ L(r₁)² ∪ L(r₁)³ ∪ …

• (r₁) is an RE, and L((r₁)) = L(r₁)
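The rules for L(r) translate almost line by line into a recursive function. The sketch below is one possible encoding, not the notation of the slides: the class names (`Empty`, `Lam`, `Sym`, `Union`, `Concat`, `Star`) and the function `lang` are assumptions made for illustration, and because L(r) may be infinite the function only returns the strings of L(r) up to a chosen length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:
    """The primitive expression ∅."""


@dataclass(frozen=True)
class Lam:
    """The primitive expression λ."""


@dataclass(frozen=True)
class Sym:
    a: str


@dataclass(frozen=True)
class Union:
    r1: object
    r2: object


@dataclass(frozen=True)
class Concat:
    r1: object
    r2: object


@dataclass(frozen=True)
class Star:
    r1: object


def lang(r, max_len):
    """Return {w in L(r) : |w| <= max_len}, following the rules for L(r)."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Lam):
        return {""}
    if isinstance(r, Sym):
        return {r.a} if max_len >= 1 else set()
    if isinstance(r, Union):
        return lang(r.r1, max_len) | lang(r.r2, max_len)
    if isinstance(r, Concat):
        left, right = lang(r.r1, max_len), lang(r.r2, max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if isinstance(r, Star):
        base = lang(r.r1, max_len)
        result, frontier = {""}, {""}
        while frontier:  # L⁰ ∪ L¹ ∪ L² ∪ ... until no new string fits
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")


# (a + b).a*
example = Concat(Union(Sym("a"), Sym("b")), Star(Sym("a")))
print(sorted(lang(example, 3)))
```

Parentheses need no case of their own here, since the tree structure already records the grouping, which mirrors the rule L((r₁)) = L(r₁).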

(19)

Regular Expression to Regular Language

Regular Expression: (a + b).a*

L((a + b).a*) = L((a + b)) L(a*)

= L(a + b) L(a*)

= (L(a) ∪ L(b)) (L(a))*

= ({a} ∪ {b}) ({a})*

= {a, b}{λ, a, aa, aaa, ...}

= {a, aa, aaa, ..., b, ba, baa, ...}

(20)

RE to RL

r = (0 + 1)* 00 (0 + 1)*

L(r) = { all strings containing substring 00 }

r = (aa)*(bb)*b

L(r) = {a²ⁿ b²ᵐ b : n, m ≥ 0}

r = (a + b)*(a + bb)

L(r) = {a, bb, aa, abb, ba, bbb, ...}

(21)

RE & RL: Example

• λ* is an RE, and its language is L(λ*) = {λ}* = {λ}

• ∅* is an RE, and its language is L(∅*) = { }* = {λ}

• 0* is an RE, and its language is

L(0*) = {0}* = {λ, 0, 00, 000, 0000, …}

• (0+1).(00+11) is an RE, and its language is

L( (0+1).(00+11) ) = {0, 1}{00, 11} = {000, 011, 100, 111}

• (10+01)* is an RE, and its language is

L((10+01)*) = {10, 01}* = {λ, 10, 1010, 101010, …, 01, 0101, 010101, …, 1001, 100101, 10010101, …, 0110, 011010, 01101010, …}

(22)

RE & RL: Example

• Let L be a language over {a, b} in which each string contains the substring bb

• L = {a, b}*{bb}{a, b}*

• L is a regular language (RL). Why?

• {a} and {b} are RLs

• {a, b} is an RL

• {a, b}* is an RL

• {b}{b} = {bb} is also an RL

• Then L = {a, b}*{bb}{a, b}* is an RL

(23)

RE & RL: Example

• Let L be a language over {a, b} in which each string

• begins and ends with an a AND contains at least one b

• L = {a}{a, b}*{b}{a, b}*{a}

• L is a regular language (RL). Why?

• {a} and {b} are RLs

• {a, b} is an RL

• {a, b}* is an RL

• Then L = {a}{a, b}*{b}{a, b}*{a} is an RL

(24)

RL - Example

• The RE (b + ab*a)*ab* represents the strings over {a, b} with an odd number of a’s

• Note: this is a set equality; to prove it you have to show the following:

• strings with an odd number of a’s are in this language; and

• any string in this language has an odd number of a’s.
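Both directions of the set equality can be explored by enumeration before attempting a proof. The sketch below assumes the expression is read with its Kleene stars written out, as (b + ab*a)*ab*, and transcribes it into Python `re` syntax; `check_set_equality` is an illustrative name. It reports bounded counterexamples to each inclusion, so two empty lists are supporting evidence only.

```python
import re
from itertools import product

ODD_A = re.compile(r"(b|ab*a)*ab*")   # (b + ab*a)*ab* in Python syntax


def check_set_equality(max_len=10):
    """Return (missing, extra): bounded counterexamples to each inclusion."""
    missing, extra = [], []
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            w = "".join(chars)
            odd = w.count("a") % 2 == 1
            matched = ODD_A.fullmatch(w) is not None
            if odd and not matched:
                missing.append(w)   # odd number of a's, yet not in L(r)
            if matched and not odd:
                extra.append(w)     # in L(r), yet an even number of a's
    return missing, extra


print(check_set_equality())
```

The first list corresponds to "strings with an odd number of a’s are in this language" and the second to "any string in this language has an odd number of a’s".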

(25)

RE & RL: Example

• Let Σ = {a, b}

• RE a|b ⇒ L = {a, b}

• RE (a|b)(a|b) ⇒ L = {aa, ab, ba, bb}

• RE aa|ab|ba|bb ⇒ same as above

• RE a* ⇒ L = {λ, a, aa, aaa, …}

• RE (a|b)* ⇒ L = set of all strings of a’s and b’s, including λ

• RE (a*b*)* ⇒ same as above

• RE a|a*b ⇒ L = {a, b, ab, aab, aaab, …}

(26)

RE & RL

• 01*

• {0, 01, 011, 0111, …}

• (01*)(01)

• {001, 0101, 01101, 011101, …}

• (0 | 1)*

• {λ, 0, 1, 00, 01, 10, 11, …}

• i.e., all strings of 0s and 1s

• (0 | 1)* 00 (0 | 1)*

• {00, 1001, …}

• i.e., all strings of 0s and 1s containing “00”

(27)

EQUIVALENT REs

• Two regular expressions r and s are equivalent (r = s) if and only if r and s represent/generate the same language.

• Example:

• r = a|b, s = b|a ⇒ r = s. Why?

• Since L(r) = L(s) = {a, b}

• RE = (a)|((b)(c)) is equivalent to a + bc

(28)

EQUIVALENT REs

• Examples:

• (a*b*)* = (a+b)*

• (a+b)*ab(a+b)* + b*a* = (a+b)*

• The first equality is rather clear.

• For the second equality, note that (a+b)* denotes all strings over a and b, and that a string either contains ab or it doesn’t. The first half of the left-hand expression describes the strings that contain the substring ab, and the second half describes those that don’t; the + says “take the union”.

(29)

Regular Expressions: Exercise

• Construct a RE over Σ = {0,1} for each of the following languages:

• The set of all strings that do not contain two consecutive “0”s

• The set of all strings in which no prefix has two or more “0”s in excess of “1”s, nor two or more “1”s in excess of “0”s

• The set of all strings ending with “00”

• The set of all strings with 3 consecutive 0’s

• The set of all strings beginning with “1” which, when interpreted as a binary number, are divisible by 5

• The set of all strings with a “1” at the 5th position from the right

• The set of all strings not containing 101 as a substring

• Construct a RE for the set {aⁿbᵐ : n ≥ 3, m is even}.

• Construct a RE for the set {aⁿbᵐ : n ≥ 4, m ≤ 3}.

• Construct a RE for the set {w : |w| mod 3 = 0}.

• Construct a RE for the set {w : |w| mod 3 = 1}.

(30)

Regular Expression to NFA

(31)

Regular Language

• A language L is known as regular if and only if it is recognized by a finite accepter (FA).

• Language L is regular if and only if it is recognized by a DFA. (??)

• Language L is regular if and only if it is recognized by an NFA. (??)

• A language L is known as regular if and only if it is described by a regular expression (RE).

• A language L is recognized by a FA if and only if L is described by a regular expression.

How do we show that a given language is regular?

(32)

Connection Between RE & RL

• A language L is called regular if and only if there exists some DFA M such that L = L(M).

• Since a DFA has an equivalent NFA, then

• A language L is called regular if and only if there exists some NFA N such that L = L(N).

• If we have a RE r, we can construct an NFA that accepts L(r).

(33)

Connection Between RE & RL

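NFAs for Primitive Regular Expressions

1. For regular expression ∅, construct NFA

[Figure: start state q₀ and final state q_f with no transition between them] L(∅) = { } = ∅

2. For regular expression λ, construct NFA

[Figure: start state q₀ with a λ-move to final state q_f] L(λ) = {λ}

3. For regular expression a ∈ Σ, construct NFA

[Figure: start state q₀ with a transition labelled a to final state q_f] L(a) = {a}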

(34)

Connection Between RE & RL

If r₁ and r₂ are regular expressions, M_r1 and M_r2 are their NFAs.

Convert M_r1 into an NFA with a single final state.

Convert M_r2 into an NFA with a single final state.

Then, r₁ + r₂ has the NFA:

[Figure: new start state q_i with λ-moves to M_r1 and M_r2, whose final states have λ-moves to the new final state q_f]

where q_i and q_f are new initial / final states, and λ-moves are introduced from q_i to the old start states of M_r1 and M_r2 as well as from all of their final states to q_f.

L( (r₁ + r₂) ) = L(M_r1) ∪ L(M_r2)

(35)

Connection Between RE & RL

If r₁ and r₂ are regular expressions, M_r1 and M_r2 are their NFAs.

Then, r₁.r₂ has the NFA:

[Figure: start state q_i, a λ-move into M_r1, a λ-move from M_r1 into M_r2, and a λ-move from M_r2 to q_f]

where q_i is the new initial state, placed before M_r1, and q_f is the new final state, placed after M_r2.

A λ-move is introduced from the final state of M_r1 to the initial state of M_r2.

L( (r₁.r₂) ) = L(M_r1).L(M_r2)

(36)

Connection Between RE & RL

If r₁ is a regular expression and M_r1 its NFA, then (r₁)* (Kleene star) has the NFA:

[Figure: new start state q_i and new final state q_f around M_r1, connected by λ-moves]

where:

• q_i is the new start state and q_f is the new final state

• λ-move from q_i to q_f (to accept the null string)

• λ-move from q_i to the old start state

• λ-moves from the old final state(s) to q_f

• λ-move from the old final state to the old start state (why? repetition)

L( (r₁)* ) = ( L(r₁) )*
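The constructions for the primitive expressions, union, concatenation and Kleene star can be combined into a small recursive builder. The following is a minimal sketch, not the slides' own implementation: regular expressions are encoded as nested tuples such as `("union", r1, r2)`, the label `None` stands for a λ-move, and the names `build`, `closure` and `accepts` are assumptions chosen for illustration.

```python
from itertools import count

LAMBDA = None        # label used for λ-moves (an arbitrary choice for this sketch)
_fresh = count()     # supplies fresh state numbers


def build(r, delta):
    """Add an NFA-λ for r to `delta`; return its (start, final) states."""
    def edge(p, label, q):
        delta.setdefault((p, label), set()).add(q)

    qi, qf = next(_fresh), next(_fresh)     # new initial and final state
    kind = r[0]
    if kind == "empty":                     # ∅: no transition at all
        pass
    elif kind == "lam":                     # λ
        edge(qi, LAMBDA, qf)
    elif kind == "sym":                     # a ∈ Σ
        edge(qi, r[1], qf)
    elif kind == "union":                   # r1 + r2
        for sub in r[1:]:
            s, f = build(sub, delta)
            edge(qi, LAMBDA, s)
            edge(f, LAMBDA, qf)
    elif kind == "concat":                  # r1 . r2
        s1, f1 = build(r[1], delta)
        s2, f2 = build(r[2], delta)
        edge(qi, LAMBDA, s1)
        edge(f1, LAMBDA, s2)
        edge(f2, LAMBDA, qf)
    elif kind == "star":                    # (r1)*
        s, f = build(r[1], delta)
        edge(qi, LAMBDA, qf)                # accept the null string
        edge(qi, LAMBDA, s)
        edge(f, LAMBDA, qf)
        edge(f, LAMBDA, s)                  # repetition
    else:
        raise ValueError(f"unknown node {kind!r}")
    return qi, qf


def closure(states, delta):
    """All states reachable from `states` using λ-moves only."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for q in delta.get((p, LAMBDA), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def accepts(r, w):
    """Build the NFA-λ for r and simulate it on the string w."""
    delta = {}
    start, final = build(r, delta)
    current = closure({start}, delta)
    for a in w:
        moved = set()
        for p in current:
            moved |= delta.get((p, a), set())
        current = closure(moved, delta)
    return final in current


# (a|b)*ba
R1 = ("concat", ("star", ("union", ("sym", "a"), ("sym", "b"))),
      ("concat", ("sym", "b"), ("sym", "a")))
print(accepts(R1, "abba"), accepts(R1, "ab"))
```

Every sub-automaton has exactly one start and one final state, which is why each case can be wired up with λ-moves alone; the price is a larger number of states than a hand-drawn NFA would need.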

(37)

Example-1

• Build an NFA-ε that accepts r₁ = (a|b)*ba

• The RE r₁ consists of a, b, ba and a|b

[Figure: NFA-ε fragments for the sub-expressions a, b, ba and a|b]

(38)

Example-1

• Build an NFA-ε that accepts (a|b)*ba

[Figure: NFA-ε for the sub-expression (a|b)*]

(39)

Example-1

• Build an NFA-ε that accepts (a|b)*ba

[Figure: complete NFA-ε for (a|b)*ba, combining the fragment for (a|b)* with the fragment for ba]

(40)

Example-2

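R.E. a ( b | c )*

1. a, b, & c

[Figure: three two-state NFAs from S₀ to S₁, labelled a, b and c respectively]

2. b | c

[Figure: NFA-ε for b | c with states S₀ to S₅; a transition on b from S₁ to S₂ and a transition on c from S₃ to S₄, joined by ε-moves]

3. ( b | c )*

[Figure: NFA-ε for ( b | c )* with states S₀ to S₇; a transition on b from S₂ to S₃ and a transition on c from S₄ to S₅, joined by ε-moves]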

(41)

Example-2

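4. a ( b | c )*

[Figure: NFA-ε for a ( b | c )* with states S₀ to S₉; a transition on a from S₀ to S₁, a transition on b from S₄ to S₅ and a transition on c from S₆ to S₇, joined by ε-moves]

[Figure: equivalent simplified automaton; a transition on a from S₀ to S₁ and a loop on S₁ labelled b | c]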

(42)

Example-3

NFA for: a | abb | a*b⁺

NFA’s:

[Figure: three separate NFAs, one each for a, abb and a*b⁺, using states 1 to 8]

(43)

Example-3

NFA for: a | abb | a*b⁺

[Figure: combined NFA; a new start state 0 with λ-moves to the NFAs for a, abb and a*b⁺ (states 1 to 8)]

(44)

Example-4

Regular Expression: (ab*c) | (a(b|c*))

(45)

Example-4

[Figure: NFA-ε with states 1 to 17 for (ab*c) | (a(b|c*))]

(46)

Construct NFAs for RE over Σ = {0,1}

• Strings of even length, L = {00, 01, 10, 11}*

• (00+01+10+11)* or

• ((0+1)(0+1))*

• Strings of length 6, L = {α ∈ Σ* | the length of α is 6}

• 000000 + … + 111111

• (0+1)(0+1)(0+1)(0+1)(0+1)(0+1) = (0+1)⁶

• Strings of length 6 or less, L = {α ∈ Σ* | the length of α is less than or equal to 6}

• λ + 0 + 1 + 00 + 01 + 10 + 11 + … + 111111

• (0+1+λ)⁶

(47)

Construct NFAs for RE over Σ = {0,1}

• {α ∈ Σ* | α is a binary number divisible by 4}

• (0+1)*00

• {α ∈ Σ* | α does not contain 11}

• (0+10)*(1+λ)

• {α ∈ Σ* | α contains an odd number of 1’s}

• 0*(10*10*)*10*

• {α ∈ Σ* | any two 0’s in α are separated by three 1’s}

• 1*(0111)*01* + 1*

• Strings over the alphabet {a, b} with exactly three b's

• a*ba*ba*ba*

• Strings over the alphabet {a, b, c} containing (at least once) bc

• (a|b|c)*bc(a|b|c)*

(48)

Construct FAs for RE over Σ = {a, b}

• Construct NFA for the language L(ab*aa + bba*ab)

• Construct NFA for the language L( (a + b)*b(a + bb)* )

• Construct NFA for the set {aⁿbᵐ : n ≥ 3, m is even}.

• Construct NFA for the set {aⁿbᵐ : n ≥ 4, m ≤ 3}.

• Construct NFA for the set {w : |w| mod 3 = 0}.

• Construct NFA for the set {w : |w| mod 3 = 1}.

• Construct DFA for the language L( ab*a* ) ∪ L( (ab)*ba )

• Construct DFA for the language L( ab*a* ) ∩ L( (ab)*ba )

• Find the minimal DFA for the language L( a*bb ) ∩ L( ab*ba )

(49)

NFA to Regular Expression

(50)

NFA to RE

• If L is accepted by some NFA-ε, then L is represented by some regular expression.

• A regular expression for an NFA-ε consists of the labels of all the walks from the initial state (q₀) to the final state(s) q_f.

• Computing the labels of all the walks does not look too difficult, but it is complicated by the existence of cycles (a cycle can be traversed arbitrarily often, in any order).

(51)

Generalized Transition Graph

• A generalized transition graph (GTG or expression graph) is like a transition diagram, but it can have regular expressions as labels on arcs.

• An NFA-ε is a GTG.

• An NFA is a GTG.

• A DFA is a GTG.

(52)

Complete GTG

• A complete GTG is a graph in which all edges are present.

• A complete GTG with |V| vertices has exactly |V|² edges.

• If a GTG has some edges missing

• Add edges with label ∅.

[Figure: an incomplete GTG]

(53)

Complete GTG

[Figure: an incomplete GTG and the corresponding complete GTG]

(54)

Complete GTG

[Figure: an incomplete GTG and the corresponding complete GTG]

(55)

GTG Reduction to Regular Expression

• A GTG G can be reduced to a GTG G′ with just two states (the initial and final states).

• If we reduce an NFA-ε in this way, the arc labels of G′ then yield the regular expression representing it.

(56)

RE for GTG

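• For a two-state complete GTG, the regular expression is given as

r = (r₁)* r₂ (r₄ + r₃ (r₁)* r₂)*

where r₁ labels the self-loop on q₀, r₂ the edge from q₀ to q_f, r₃ the edge from q_f to q₀, and r₄ the self-loop on q_f.

• The regular expression r covers all possible paths from the initial state to the final state.

• First, reach q_f: take the self-loop on q₀ any number of times and then the direct edge from q₀ to q_f, giving (r₁)* r₂.

• Then, any number of times, either take the self-loop on q_f (r₄) or follow the indirect path q_f to q₀ to q_f (r₃ (r₁)* r₂).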

(57)

Example

[Figure: example GTG. Convert it to a complete GTG and convert the labels into REs.]

r = (r₁)* r₂ (r₄ + r₃ (r₁)* r₂)*

r = (a)* (a+b) ( c + ∅ (a)* (a+b) )*

r = (a)* (a+b) ( c + ∅ )*

r = a* (a|b) c*

(58)

RE for GTG

• When a GTG has more than two states (initial and final states are distinct).

• Add missing edges in order to make it complete GTG

• Find an equivalent GTG by removing one state at a time.

• Remove non-final and non-initial states only.

• Next, find its regular expression.

(59)

RE for GTG

• Remove state q₂

• Consider the paths between every pair of the remaining states q₁ and q₃:

• q₁ to q₁

• q₁ to q₃

• q₃ to q₁

• q₃ to q₃

• Regular expressions for

• Path q₁ to q₁ is e + af*b

• Path q₁ to q₃ is h + af*c

• Path q₃ to q₁ is i + df*b

• Path q₃ to q₃ is g + df*c

• The above regular expressions become the labels of the transitions.

(60)

RE for GTG

• After removal of state q₂

• Find RE

(61)

RE for GTG

• Find RE

• r = (r₁)* r₂ (r₄ + r₃ (r₁)* r₂)*

• r = (e + af*b)* (h + af*c) ( (g + df*c) + (i + df*b) (e + af*b)* (h + af*c) )*

(62)

RE for GTG: Example

Convert transitions into RE

[Figure: NFA with states q₀, q₁ and q₂; the transition labelled "a, b" is rewritten as the regular expression "a + b", so that all transition labels are regular expressions]

(63)

RE for GTG: Example

Convert to Complete GTG

Remove q₁

Simplify It

(64)

RE for GTG: Example

Find regular expression r = (r₁)* r₂ (r₄ + r₃ (r₁)* r₂)*

r = (bb*a)* (bb*(a+b)) ( b + ∅ (bb*a)* (bb*(a+b)) )*

r = (bb*a)* (bb*(a+b)) (b + ∅)*

r = (bb*a)* (bb*(a|b)) b*

(65)

NFA to RE :Procedure

• Step-1:

• Convert NFA N into NFA N’ with single final State distinct from its initial state.

• Step-2:

• Convert NFA N’ into complete GTG G.

• Let r_ij stand for the label of the edge from q_i to q_j.

• Step-3:

• If the GTG has only two states, with q_i as its initial state and q_j its final state.

• The associated regular expression is r.

r = (r_ii)* r_ij (r_jj + r_ji . (r_ii)* . r_ij )*

(66)

NFA to RE :Procedure

• Step-4:

• If the GTG has three states, with initial state q_i, final state q_j and third state q_k,

• introduce new edges, labeled r_pq + r_pk (r_kk)* r_kq, for p = i, j and q = i, j.

• Remove vertex q_k and its associated edges.

(67)

NFA to RE :Procedure

• Step-5:

• If the GTG has four or more states, pick a state q_k to be removed.

• Apply rule 4 for all pairs of states (q_i, q_j), i ≠ k, j ≠ k.

• At each step apply the simplification rules

r + ∅ = r

r.∅ = ∅.r = ∅

∅* = λ

• Step-6:

• Repeat steps 3 to 5 until the regular expression is obtained: eliminate states until only two are left, then apply the two-state formula of step 3.
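The procedure can be sketched in a few dozen lines. This is an illustrative implementation under stated assumptions, not the slides' own code: edge labels are strings in Python `re` syntax so that the result can be tested directly, `None` stands for an edge labelled ∅, the empty string stands for λ, and the helper names (`union`, `concat`, `star`, `eliminate`) are hypothetical. The three simplification rules from Step 5 are built into the helpers. The example GTG at the end is one assumed reading of a three-state automaton with states q0, q1, q2.

```python
import re
from itertools import product


def union(x, y):
    """r + ∅ = r; None stands for ∅."""
    if x is None:
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"


def concat(x, y):
    """r.∅ = ∅.r = ∅."""
    return None if x is None or y is None else x + y


def star(x):
    """∅* = λ (and λ* = λ); the empty string stands for λ."""
    return "" if not x else f"(?:{x})*"


def eliminate(states, edges, start, final):
    """Reduce a GTG to a regular expression in Python `re` syntax.

    `edges` maps (p, q) to the label of the edge from p to q; a missing pair
    plays the role of an edge labelled ∅. Assumes start != final (Step 1).
    Labels containing a top-level "|" must already be wrapped in a group.
    """
    label = dict(edges)
    remaining = list(states)
    for k in states:
        if k in (start, final):
            continue
        remaining.remove(k)
        loop = star(label.get((k, k)))
        # Step 4: r_pq + r_pk (r_kk)* r_kq for every remaining pair (p, q).
        label = {
            (p, q): union(label.get((p, q)),
                          concat(concat(label.get((p, k)), loop),
                                 label.get((k, q))))
            for p, q in product(remaining, repeat=2)
        }
    rii, rij = label.get((start, start)), label.get((start, final))
    rji, rjj = label.get((final, start)), label.get((final, final))
    # Step 3: r = (r_ii)* r_ij (r_jj + r_ji (r_ii)* r_ij)*
    back = concat(concat(rji, star(rii)), rij)
    return concat(concat(star(rii), rij), star(union(rjj, back)))


# Assumed example: q0 -b-> q1, q1 -b-> q1, q1 -a-> q0, q1 -(a+b)-> q2, q2 -b-> q2
EDGES = {("q0", "q1"): "b", ("q1", "q1"): "b", ("q1", "q0"): "a",
         ("q1", "q2"): "(?:a|b)", ("q2", "q2"): "b"}
RESULT = eliminate(["q0", "q1", "q2"], EDGES, "q0", "q2")
print(RESULT)
```

The returned expression is usually more heavily parenthesised than a hand-derived one, but it can be compared with a hand-derived answer by matching both against all short strings.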

(68)

NFA to RE :Procedure

Repeat the process until two states are left.

[Figure: the initial transition graph and the resulting GTG with the two states q0 and q_f]

(69)

Find RE for NFA: Example-1

(70)

Example-1

Convert to Complete GTG

(71)

Example-1

Remove State OE

Paths to consider:

EE to EE, EE to OO, OO to EE, OO to OO, EO to EO

(72)

Example-1

Remove State OO

Paths to consider:

EE to EE, EE to EO, EO to EE, EO to EO

(73)

Example-1

Find RE

r = (aa + ab(bb)*ba)* (b + ab(bb)*a) (a(bb)*a + (b + a(bb)*ba)(aa + ab(bb)*ba)*(b + ab(bb)*a))*

(74)

Find RE for NFA: Example-2

(75)

Example-2

Convert transitions into RE

[Figure: NFA with states q₀, q₁ and q₂; the transition labelled "a, b" is rewritten as the regular expression "a + b"]

(76)

Example-2

Convert to Complete GTG

Remove q₁

Simplify It

(77)

Example-2

Find regular expression r = (r₁)* r₂ (r₄ + r₃ (r₁)* r₂)*

r = (bb*a)* (bb*(a+b)) ( b + ∅ (bb*a)* (bb*(a+b)) )*

r = (bb*a)* (bb*(a+b)) (b + ∅)*

r = (bb*a)* (bb*(a|b)) b*

(78)

Find RE for NFAs

[Figure: exercise NFA with start state 1 and states 2 and 3; transitions labelled 0, 1 and 0,1]

(79)

Theorem

Languages Generated by Regular Expressions = Regular Languages

(80)

Theorem - Part 1

Languages Generated by Regular Expressions ⊆ Regular Languages

1. For any regular expression r, the language L(r) is regular.

(81)

Theorem - Part 2

Languages Generated by Regular Expressions ⊇ Regular Languages

2. For any regular language L there is a regular expression r with L(r) = L.

(82)

Proof - Part 1

1. For any regular expression r, the language L(r) is regular.

Proof by induction on the size of r.

(83)

Induction Basis

• Primitive Regular Expressions: ∅, λ, a

NFAs

L(M₁) = ∅ = L(∅)

L(M₂) = {λ} = L(λ)

L(M₃) = {a} = L(a)

These are regular languages.

(84)

Inductive Hypothesis

• Assume r₁ and r₂ are regular expressions.

• L(r₁) and L(r₂) are regular languages.

(85)

Inductive Step

• We will prove that

L(r₁ + r₂)

L(r₁.r₂)

L(r₁*)

L((r₁))

are regular languages.

(86)

• By definition of regular expressions:

L(r₁ + r₂) = L(r₁) ∪ L(r₂)

L(r₁.r₂) = L(r₁) L(r₂)

L(r₁*) = (L(r₁))*

L((r₁)) = L(r₁)

(87)

By inductive hypothesis we know:

L(r₁) and L(r₂) are regular languages

We also know:

There exist NFAs for the following languages

Union: L(r₁) ∪ L(r₂)

Concatenation: L(r₁) L(r₂)

Star: (L(r₁))*

(88)

• Therefore:

L(r₁ + r₂) = L(r₁) ∪ L(r₂)

L(r₁.r₂) = L(r₁) L(r₂)

L(r₁*) = (L(r₁))*

are regular languages.

(89)

• And trivially:

L((r₁)) is a regular language.

(90)

Proof – Part 2

2. For any regular language L there is a regular expression r with L(r) = L.

Proof by construction of the regular expression.

• For any regular language, there exists an NFA.

• We can construct an RE for a given NFA.

Therefore, for every regular language L there exists an RE r such that L = L(r).

(91)

Linear Grammars

(92)

Linear Grammars

A grammar G = (V, T, S, P) is said to be linear if every production has at most one variable on the right side and exactly one variable on the left side.

Examples:

S → aSb

S → λ

and

S → Ab

A → aAb

A → λ

(93)

A Non-Linear Grammar

Grammar G:

S → SS

S → λ

S → aSb

S → bSa

L(G) = {w : n_a(w) = n_b(w)}, where n_a(w) is the number of a’s in the string w.

(94)

Another Linear Grammar

Grammar G:

S → A

A → aB | λ

B → Ab

L(G) = {aⁿbⁿ : n ≥ 0}

(95)

Right-Linear Grammars

A grammar G = (V, T, S, P) is said to be right-linear if all productions are of the form

A → xB

or

A → x

where A, B ∈ V and x ∈ T*.

Example:

S → abS

S → a
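A right-linear grammar can be explored by expanding its single trailing variable step by step. The sketch below is an illustration for the example grammar S → abS | a; the function name `generate` and the table `PRODUCTIONS` are assumptions, and the regular expression (ab)*a is a conjecture about L(G) that the code checks only up to a length bound.

```python
import re
from collections import deque

# Productions of the example right-linear grammar: S → abS | a
PRODUCTIONS = {"S": ["abS", "a"]}


def generate(start="S", max_len=9):
    """Terminal strings of length <= max_len derivable from `start`."""
    words, queue = set(), deque([start])
    while queue:
        form = queue.popleft()
        # In a right-linear sentential form the only variable, if present,
        # is the last symbol.
        if form and form[-1] in PRODUCTIONS:
            for rhs in PRODUCTIONS[form[-1]]:
                new = form[:-1] + rhs
                terminals = sum(1 for c in new if c not in PRODUCTIONS)
                if terminals <= max_len:
                    queue.append(new)
        else:
            words.add(form)
    return words


CANDIDATE = re.compile(r"(ab)*a")   # conjectured RE for L(G)
print(sorted(generate(), key=len))
print(all(CANDIDATE.fullmatch(w) for w in generate()))
```

Because every sentential form has the shape xA with x ∈ T*, the derivation never needs to look anywhere but at the last symbol, which is the property that ties right-linear grammars to finite automata.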
