• No results found

Regular Languages:

N/A
N/A
Protected

Academic year: 2022

Share "Regular Languages:"

Copied!
26
0
0

Loading.... (view fulltext now)

Full text

(1)

A. Trivedi – 1 of 20

Regular Languages:

Regular Expressions and Finite Automata

Ashutosh Trivedi

CS208: Automata Theory and Logic

Department of Computer Science & Engineering, IIT Bombay.

(22nd January, 2013)

A. Trivedi Regular Languages

(2)

What are Regular Languages?

– Alanguageis asetofstringsover somealphabet – An alphabet Σ ={a,b,c}is a finiteset of letters, – Σis the set of all strings over Σ, e.g. aabbaa∈Σ, – A languageLover Σ is then a subset of Σ,

– Leven={w∈Σ:w is of even length}

– Lab={w∈Σ:w is of the formanbmforn,m≥0}

– Lanbn ={w ∈Σ:w is of the formanbn forn≥0}

– Lprime={w∈Σ:w has a prime number ofa0s}

– Deterministic and Nondeterministic Finite automatadefine languages that require finite resources (states) to recognize.

Definition (Regular Languages)

We call a languageregular if it can beacceptedby a finite automaton. Examples!

(3)

A. Trivedi – 2 of 20

What are Regular Languages?

– Alanguageis asetofstringsover somealphabet – An alphabet Σ ={a,b,c}is a finiteset of letters, – Σis the set of all strings over Σ, e.g. aabbaa∈Σ, – A languageLover Σ is then a subset of Σ,

– Leven={w∈Σ:w is of even length}

– Lab={w∈Σ:w is of the formanbmforn,m≥0}

– Lanbn ={w ∈Σ:w is of the formanbn forn≥0}

– Lprime={w∈Σ:w has a prime number ofa0s}

– Deterministic and Nondeterministic Finite automatadefine languages that require finite resources (states) to recognize.

Definition (Regular Languages)

We call a languageregular if it can beacceptedby a finite automaton.

Examples!

A. Trivedi Regular Languages

(4)

Why they are “Regular”

– A number of widely different and equi-expressive formalisms precisely capture the same class of languages:

– Deterministic finite state automata

– Nondeterministic finite state automata (also withε-transitions) – Kleene’sregular expressions, also appeared asType-3 languagesin

Chomsky’s hierarchy [Cho59].

– Monadic second-order logicdefinable languages [B¨60, Elg61, Tra62]

– Certain Algebraic connection (acceptability via finite semi-group) [RS59]

We have already seen that:

Theorem (DFA=NFA=ε-NFA)

A language is accepted by adeterministic finite automatonif and only if it is accepted by anon-deterministic finite automaton.

In this lecture we introduceRegular Expressions, and prove:

Theorem (REGEX=DFA)

A language is accepted by adeterministic finite automatonif and only if it is accepted by aregular expression.

It will be totally exciting if this course can cover MSO-logic connection!

(5)

A. Trivedi – 3 of 20

Why they are “Regular”

– A number of widely different and equi-expressive formalisms precisely capture the same class of languages:

– Deterministic finite state automata

– Nondeterministic finite state automata (also withε-transitions) – Kleene’sregular expressions, also appeared asType-3 languagesin

Chomsky’s hierarchy [Cho59].

– Monadic second-order logicdefinable languages [B¨60, Elg61, Tra62]

– Certain Algebraic connection (acceptability via finite semi-group) [RS59]

We have already seen that:

Theorem (DFA=NFA=ε-NFA)

A language is accepted by adeterministic finite automatonif and only if it is accepted by anon-deterministic finite automaton.

In this lecture we introduceRegular Expressions, and prove:

Theorem (REGEX=DFA)

A language is accepted by adeterministic finite automatonif and only if it is accepted by aregular expression.

It will be totally exciting if this course can cover MSO-logic connection!

A. Trivedi Regular Languages

(6)

Why they are “Regular”

– A number of widely different and equi-expressive formalisms precisely capture the same class of languages:

– Deterministic finite state automata

– Nondeterministic finite state automata (also withε-transitions) – Kleene’sregular expressions, also appeared asType-3 languagesin

Chomsky’s hierarchy [Cho59].

– Monadic second-order logicdefinable languages [B¨60, Elg61, Tra62]

– Certain Algebraic connection (acceptability via finite semi-group) [RS59]

We have already seen that:

Theorem (DFA=NFA=ε-NFA)

A language is accepted by adeterministic finite automatonif and only if it is accepted by anon-deterministic finite automaton.

In this lecture we introduceRegular Expressions, and prove:

Theorem (REGEX=DFA)

A language is accepted by adeterministic finite automatonif and only if it is accepted by aregular expression.

It will be totally exciting if this course can cover MSO-logic connection!

(7)

A. Trivedi – 3 of 20

Why they are “Regular”

– A number of widely different and equi-expressive formalisms precisely capture the same class of languages:

– Deterministic finite state automata

– Nondeterministic finite state automata (also withε-transitions) – Kleene’sregular expressions, also appeared asType-3 languagesin

Chomsky’s hierarchy [Cho59].

– Monadic second-order logicdefinable languages [B¨60, Elg61, Tra62]

– Certain Algebraic connection (acceptability via finite semi-group) [RS59]

We have already seen that:

Theorem (DFA=NFA=ε-NFA)

A language is accepted by adeterministic finite automatonif and only if it is accepted by anon-deterministic finite automaton.

In this lecture we introduceRegular Expressions, and prove:

Theorem (REGEX=DFA)

A language is accepted by adeterministic finite automatonif and only if it is accepted by aregular expression.

It will be totally exciting if this course can cover MSO-logic connection!

A. Trivedi Regular Languages

(8)

Regular Expressions (RegEx)

– textual (declarative) way to represent regular languages (compare automata)

– UNIX-based OS users will already be familiar with these expressions – ls *DVDRIP*.avi

– rm -rf *.*

– grep automat* /usr/share/dict/words – Also used in AWK, expr, Emacs and vi searches, – Lexical analysis tools likeflex use it for definingtokens

– Some usefulString-set operations: – unionL∪Mdef={w :w ∈Lorw ∈M} – concatenationL.Mdef={u.v:u∈Landv ∈M}

– self-concatenationletL2def=L.L, similarlyL3,L4, .... AlsoL0def={ε}. – S. C. Kleene cite proposed notationLto denoteclosureof

self-concatenation operation, i.e. Ldef=∪i≥0Li. – ExamplesL={ε}andL={0,1}

(9)

A. Trivedi – 4 of 20

Regular Expressions (RegEx)

– textual (declarative) way to represent regular languages (compare automata)

– UNIX-based OS users will already be familiar with these expressions – ls *DVDRIP*.avi

– rm -rf *.*

– grep automat* /usr/share/dict/words – Also used in AWK, expr, Emacs and vi searches, – Lexical analysis tools likeflex use it for definingtokens – Some usefulString-set operations:

– unionL∪Mdef={w :w ∈Lorw ∈M}

– concatenationL.Mdef={u.v:u∈Landv ∈M}

– self-concatenationletL2def=L.L, similarlyL3,L4, .... AlsoL0def={ε}.

– S. C. Kleene cite proposed notationLto denoteclosureof self-concatenation operation, i.e. Ldef=∪i≥0Li.

– ExamplesL={ε}andL={0,1}

A. Trivedi Regular Languages

(10)

Building Regular Expressions

For a regular expressionE we writeL(E) for its language. The set of valid regular expressionsRegEx can be defined recursively as the following:

(empty string) ε∈RegEx L(ε) ={ε}

(empty set) ∅ ∈RegEx L(∅) =∅ (single letter) a∈RegEx L(a) ={a}

(variable) L∈RegEx whereLis a language variable.

(union) E+F ∈RegEx L(E+F) =L(E)∪L(F) (concatenation) E.F∈RegEx L(E.F) =L(E).L(F) (Kleene Closure) E∈RegEx L(E) = (L(E)) (Parenthetic Expression) (E)∈RegEx L((E)) =L(E).

Precedence Rules: ∗> . >+

Example : 01+ 10def= (0.(1)) + ((1).(0))

(11)

A. Trivedi – 6 of 20

Finite Automata to Regular Expressions

Theorem

For every deterministic finite automaton A there exists a regular expression EA such that L(A) =L(EA).

Proof.

– Let states of automatonAbe{1,2, . . . ,n}.

– ConsiderRi,j(k) be the regular expression whose language is the set of labels of path fromi toj without visiting any state with label larger thank. – (Basis): Ri,j(0) collects labels of direct paths fromi toj,

– Ri,j(0)=a1+a2+· · ·+anifδ(i,ak) =jfor 1≤k≤n – ifi=jthen it also includesε.

– (Induction): ComputeRi,j(k) usingRi,j(k−1)’s.

A. Trivedi Regular Languages

(12)

Computing R

i,j(k)

using R

i,j(k−1)

(13)

A. Trivedi – 8 of 20

Finite Automata to Regular Expressions

Theorem

For every deterministic finite automaton A there exists a regular expression EA such that L(A) =L(EA).

Proof.

– Let states of automatonAbe{1,2, . . . ,n}.

– ConsiderRi,j(k) be the regular expression whose language is the set of labels of path fromi toj without visiting any state with label larger thank. – (Basis): Ri,j(0) collects labels of direct paths fromi toj,

– Ri,j(0)=a1+a2+· · ·+anifδ(i,ak) =jfor 1≤k≤n – ifi=jthen it also includesε.

– (Induction): ComputeRi,j(k) usingRi,j(k−1)’s.

Ri,j(k)=Ri,j(k−1)+Ri,k(k−1).(Rk,k(k−1)).Rk(k−1),j . – EA isRi(n)

0,f1+Ri(n)

0,f2+· · ·+Ri(n)

0,fk.

A. Trivedi Regular Languages

(14)

Alternative Method–Eliminating States

Shortcomings of previous reduction:

– The previous method works in all the settings, but is expensive (up ton3 expressions, with afactor of 4 blowup in each step).

– For eachi,j,i0,j0, bothRi,j(k+1)andRi(k+1)0,j0 store expression (Rk(k),k). This duplicationcan be avoided.

Alternative (more intuitive) method:

– A “beast” in the middle: Finite automata with regular expressions – Remove all states except final and initial states in an “intuitive” way.

– Trivial to write regular expressions for DFA with only two states: an initial and a final one.

– The regular expression is union of this construction for every final state.

– Example

(15)

A. Trivedi – 10 of 20

figure2

A. Trivedi Regular Languages

(16)

figure3

(17)

A. Trivedi – 12 of 20

Regular Expressions to Finite Automata

Theorem

For every regular expression E there exists a deterministic finite automaton AE

such that L(E) =L(AE).

Proof.

– Via induction on the structure of the regular expressions we show a reduction to nondeterministic finite automata withε-transitions.

– Result follows form the equivalence of such automata with DFA.

A. Trivedi Regular Languages

(18)

Regular Expressions to Finite Automata

(19)

A. Trivedi – 14 of 20

Regular Expressions to Finite Automata

A. Trivedi Regular Languages

(20)

Syntactic Sugar for Regular Expressions in Unix

[a1a2a3. . .ak] for a1+a2+· · ·+ak

. for a+b+· · ·+z+A+...

| for +

R{5} for RRRRR R+ for ∪i≥1R{i}

R? for ε+R

Also [A-za-z0-9] ,[:digits:], etc.

Applications:

Check the man page of “grep” (regular expression based search tool) and

“lex” (A tool to generate regular expressions based pattern matching tool) to learn more about regular expressions on UNIX based systems.

(21)

A. Trivedi – 16 of 20

Algebraic Laws for Regular Expressions

Associativity:

– L+ (M+N) = (L+M) +NandL.(M.N) = (L.M).N.

Commutativity:

– L+M =M+L. However,L.M6=M.Lin general.

Identity:

– ∅+L=L+∅=Landε.L=L.ε=L Annihilator:

– ∅.L=L.∅=∅ Distributivity:

– left distributivityL.(M+N) =L.M+L.N.

– right distributivity (M+N).L=M.L+N.L.

IdempotentL+L=L.

Closure Laws:

– (L)=L,∅=ε, ε=ε,L+=LL=LL, andL=L++ε.

DeMorgan Type Law: (L+M)= (LM)

A. Trivedi Regular Languages

(22)

Verifying laws for regular expressions

Theorem

– Let E is some regular expressions with variables L1L2, . . . ,Lm.

– Let C be a regular expression where each Li is concretized to some letters a1a2, . . .am.

– Then for any languages L1,L2, . . . ,Lm every string in w in L(E)can be written as w1w2. . .wk where wi is in some language Lji. Then

aj1aj2. . .ajk is in L(C).

– In other words , the set L(E)can be constructed by taking strings aj1aj2. . .ajk from L(C)and replacing aji with Lji.

Proof.

A simple induction over the structure of regular expressionE.

(23)

A. Trivedi – 18 of 20

Example

Theorem (Application)

Proof of a concretized law carries over to abstract law.

Example

Prove that (ε+L)=L.

We can concretize the rule as (ε+a)=a. Let’s prove the concretized law, and we know that the result will carry over to the abstract law.

(ε+a) = (ε.a)

= (ε.a)

= (a)

= a.

First equality holds since (L+M)= (L.M). The second equality holds sinceε=ε. The third equality holds asεis identity for concatenation, while the last equality follows from (L) =L.

A. Trivedi Regular Languages

(24)

Example

q1

start q2 q3

1

0

0 1 0

1

R1,1 R1,2 R1,3 R2,1 R2,2 R2,3 R3,1 R3,2 R3,3

(0) 1 +ε 0 0 +ε 1 1 0 +ε

(1) 1 10 0 +ε 1 1 0 +ε

(2) 1 100 1001 0 01 10 (0 +ε) + 101

R1,1(1) = R1,1(0)+R1,1(0)(R1,1(0))R1,1(0)

= (1 +ε) + (1 +ε)(1 +ε)(1 +ε)

= (1 +ε)ε+ (1 +ε)(1 +ε)(1 +ε)

= (1 +ε)ε+ (1 +ε)1(1 +ε)

= (1 +ε)(ε+ 1(1 +ε)) = (1 +ε)(ε+ 11 + 1ε)

= (1 +ε)(ε+ 1++ 1) = (1 +ε)(1+ 1) = (1 +ε)1

= 11+ 1= 1++ 1= 1++ 1++ε= 1++ε= 1.

(25)

A. Trivedi – 20 of 20

Example

q1

start q2 q3

1

0

0 1 0

1

R1,1 R1,2 R1,3 R2,1 R2,2 R2,3 R3,1 R3,2 R3,3

(0) 1 +ε 0 0 +ε 1 1 0 +ε

(1) 1 10 0 +ε 1 1 0 +ε

(2) 1 100 1001 0 01 10 (0 +ε) + 101

R1,3(3) = R1,3(2)+R1,3(2)(R3,3(2))R3,3(2)

= 1001 + 1001(0 +ε+ 101)(0 +ε+ 101)

= 1001ε+ 1001(0 +ε+ 101)(0 +ε+ 101)

= 1001(ε+ (0 +ε+ 101)(0 +ε+ 101))

= 1001(ε+ (0 + 101)(0 +ε+ 101))

= 1001(ε+ (0 + 101)++ (0 + 101))

= 1001((0 + 101)+ (0 + 101)) = 1001(0 + 101)

A. Trivedi Regular Languages

(26)

J. R. B¨uchi.

Weak second-order arithmetic and finite automata.

Zeitschrift f¨ur Mathematische Logik und Grundlagen der Mathematik, 6(1–6):66–92, 1960.

Noam Chomsky.

On certain formal properties of grammars.

Information and Control, 2(2):137 – 167, 1959.

C. C. Elgot.

Decision problems of finite automata design and related arithmetics.

In Transactions of the American Mathematical Society, 98(1):21–51, 1961.

M. O. Rabin and D. Scott.

Finite automata and their decision problems.

IBM Journal of Research and Developmen, 3(2):114–125, 1959.

B. A. Trakhtenbrot.

Finite automata and monadic second order logic.

Siberian Mathematical Journal, 3:101–131, 1962.

References

Related documents

Moreover, the number of states is the smallest DFA recognizing L is equal to the number of equivalence classes of R

I it can be several words: obtained as a (finite) union of sets of words Can these be used to generate all regular languages?.!. Another view of

We study a class of timed context-sensitive languages called dense-time multistack visibly pushdown languages (dtMVPLs), characterized by dense-time visibly pushdown multistack

– Nondeterministic finite state automata (also with ε-transitions) – Kleene’s regular expressions, also appeared as Type-3 languages in.. Chomsky’s

I Non-deterministic finite state automata (NFA): the Subset construction, worst-case exponential blowup.. I Applications 1: text search

Feels like all non-regular languages needed to remember infinite memory. In {0 n 1 n |n ≥ 0} we need to remember the number of seen 0s and count the 1s

Regular expressions in emacs and lexical analyzers I Search function takes regular expressions.. I Try search

cbna CS310 : Automata Theory 2019 Instructor: Ashutosh Gupta IITB, India 2..