Learning to Use Learning in Verification

Jan Křetínský
Technische Universität München, Germany

joint work with T. Brázdil (Masaryk University Brno),
K. Chatterjee, M. Chmelík, P. Daca, A. Fellner, T. Henzinger, T. Petrov (IST Austria),
V. Forejt, M. Kwiatkowska, M. Ujma (Oxford University),
D. Parker (University of Birmingham)

published at ATVA 2014, CAV 2015, TACAS 2016

Mysore Park Workshop
Trends and Challenges in Quantitative Verification
February 3, 2016


Approaches and their interaction

Formal methods
- precise
- scalability issues (MEM-OUT on larger models)

Learning
- weaker guarantees
- scalable

Interaction (as suggested by the figure):
- precise computation (from formal methods)
- focus on the important stuff (from learning)
- different objectives

[Figure: a decision tree capturing which ACTION is OK to play in the current state, with predicate nodes such as z<=0 and Y/N branches, and leaves such as
ACTION = rec
ACTION = l>0 & b=1 & ip_mess=1 -> b'=0 & z'=0 & n1'=min(n1+1,8) & ip_mess'=0]


What problems?

- Verification
  - (ε-)optimality → PAC?
  - hard guarantees → probably correct?
- Controller synthesis
  - convergence is preferable
  - at least probably correct?
- Synthesis

Markov decision processes

An MDP is a tuple (S, s0 ∈ S, A, Δ : S → A → D(S)): a set of states S with initial state s0, a set of actions A, and a transition function Δ assigning to each state and action a probability distribution over successor states.

[Figure: an example MDP with states init, p, ..., s, ..., v1, t, err and actions up (probability 1), down (probabilities 0.01/0.99), a (probability 1), b (probabilities 0.5/0.5), c (probability 1).]

Objective: compute the maximum over strategies σ of Pσ[Reach err].

[Figure: a one-node decision tree with root predicate ACTION = down and Y/N branches.]
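To make the formalism concrete, here is a minimal sketch of how such an MDP and a memoryless strategy could be represented and simulated in Python. The state and action names loosely follow the example figure, but the exact transition structure and probabilities are illustrative assumptions, not the model from the slides.

import random

# A tiny MDP as: state -> action -> list of (successor, probability).
# Names follow the figure above; the concrete probabilities are assumptions.
MDP = {
    "init": {"up":   [("t", 1.0)],
             "down": [("s", 0.99), ("err", 0.01)]},
    "s":    {"a":    [("init", 1.0)],
             "b":    [("v1", 0.5), ("t", 0.5)]},
    "v1":   {"c":    [("err", 1.0)]},
    "t":    {},          # absorbing, non-target
    "err":  {},          # absorbing target of "Reach err"
}

def step(state, action):
    """Sample a successor according to Delta(state, action)."""
    succs, probs = zip(*MDP[state][action])
    return random.choices(succs, probs)[0]

def simulate(strategy, s0="init", target="err", horizon=1000):
    """One run under a memoryless strategy; True iff it reaches the target."""
    s = s0
    for _ in range(horizon):
        if s == target:
            return True
        if not MDP[s]:          # stuck in an absorbing non-target state
            return False
        s = step(s, strategy[s])
    return False

# A memoryless strategy assigns one action per non-absorbing state.
sigma = {"init": "down", "s": "b", "v1": "c"}
print(sum(simulate(sigma) for _ in range(10000)) / 10000)

Averaging many simulate() runs estimates Pσ[Reach err] for this fixed σ; how to optimize over strategies is what the following slides address.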


Ex.1: Computing strategies faster: How?

Reinforcement learning

Fixed-point computation:
  V(s) := max_{a ∈ Δ(s)} V(s,a)
  V(s,a) := Σ_{s' ∈ S} Δ(s,a,s') · V(s')

In which order should the states be evaluated?

[ATVA'14]: evaluate more frequently those states that are visited more frequently by reasonably good schedulers.
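As a concrete illustration of this fixed-point computation, and of where the evaluation order enters, here is a small Gauss-Seidel value-iteration sketch in Python with a pluggable state order. The tiny MDP is an illustrative assumption; the [ATVA'14] approach replaces the naive round-robin order below by states sampled under reasonably good schedulers.

# Gauss-Seidel value iteration for max reachability, with a pluggable order.
MDP = {
    "init": {"up": [("t", 1.0)], "down": [("s", 0.99), ("err", 0.01)]},
    "s":    {"a": [("init", 1.0)], "b": [("v1", 0.5), ("t", 0.5)]},
    "v1":   {"c": [("err", 1.0)]},
    "t": {}, "err": {},
}
TARGET = "err"

def value_iteration(order, sweeps=2000):
    """V(s) = max_a sum_s' Delta(s,a,s') * V(s'), evaluated in the given order."""
    V = {s: (1.0 if s == TARGET else 0.0) for s in MDP}
    for _ in range(sweeps):
        for s in order:                      # the evaluation order in question
            if s == TARGET or not MDP[s]:
                continue                     # absorbing states keep their value
            V[s] = max(sum(p * V[s2] for s2, p in dist)
                       for dist in MDP[s].values())
    return V

# Naive order: all states in a fixed round-robin sweep.  The [ATVA'14] idea is
# to update more often the states visited more often by good schedulers,
# which concentrates the work where it matters.
print(value_iteration(order=list(MDP)))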


Ex.1: Computing strategies faster: Algorithm

1: U(·,·) ← 1, L(·,·) ← 0
2: L(1,·) ← 1, U(0,·) ← 0
3: repeat
4:   sample a path from s0 to {1, 0}
       ▷ actions uniformly from argmax_a U(s,a)
       ▷ successor states according to Δ(s,a,s')
5:   for all visited transitions (s,a,s') do
6:     UPDATE(s,a,s')
7: until U(s0) − L(s0) < ε
——————————————————————————
1: procedure UPDATE(s,a,·)
2:   U(s,a) := Σ_{s'∈S} Δ(s,a,s') · U(s')
3:   L(s,a) := Σ_{s'∈S} Δ(s,a,s') · L(s')
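A compact Python rendering of this loop for a reachability objective may help in reading the pseudocode. The small MDP is an illustrative assumption, ties in the argmax of line 4 are broken uniformly, and the end-component treatment that [ATVA'14] needs for general MDPs is omitted.

import random

# BRTDP-style sampling with upper/lower bounds U and L; target "one", sink "zero".
MDP = {
    "s0": {"up":   [("one", 0.5), ("zero", 0.5)],
           "down": [("s1", 1.0)]},
    "s1": {"a":    [("one", 0.9), ("zero", 0.1)]},
    "one": {}, "zero": {},
}
TARGET, SINK = "one", "zero"

U = {(s, a): 1.0 for s in MDP for a in MDP[s]}
L = {(s, a): 0.0 for s in MDP for a in MDP[s]}

def bound(B, s):
    """B(s) = max_a B(s,a); fixed to 1 at the target and 0 at the sink."""
    if s == TARGET: return 1.0
    if s == SINK:   return 0.0
    return max(B[(s, a)] for a in MDP[s])

def update(s, a):
    U[(s, a)] = sum(p * bound(U, s2) for s2, p in MDP[s][a])
    L[(s, a)] = sum(p * bound(L, s2) for s2, p in MDP[s][a])

def brtdp(s0="s0", eps=1e-6):
    while bound(U, s0) - bound(L, s0) >= eps:
        path, s = [], s0
        while s not in (TARGET, SINK):          # sample a path from s0 to {1, 0}
            best = max(U[(s, a)] for a in MDP[s])
            a = random.choice([a for a in MDP[s] if U[(s, a)] == best])
            path.append((s, a))
            succs, probs = zip(*MDP[s][a])
            s = random.choices(succs, probs)[0]
        for s, a in reversed(path):             # update all visited transitions
            update(s, a)
    return bound(L, s0), bound(U, s0)

print(brtdp())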


Ex.1: Computing strategies faster

Combination of reinforcement learning and value iteration: reinforcement learning steers the computation to the important parts of the system, value iteration provides faster & sure updates.

Guaranteed upper & lower bounds at all times + practically fast convergence.

Remark:
- PAC SMC for MDP and (unbounded) LTL [ATVA'14] (parameters: |S|, pmin)
- practical PAC SMC for MC and (unbounded) LTL + mean payoff [TACAS'16] (parameter: pmin)
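The remark refers to PAC statistical model checking. As a generic illustration of the PAC idea only (not the algorithms of [ATVA'14] or [TACAS'16], which handle unbounded properties and rely on |S| and pmin), the following Python sketch estimates a bounded-horizon reachability probability of a hypothetical Markov chain up to additive error ε with confidence 1 − δ, using Hoeffding's inequality to fix the number of simulations.

import math, random

# Illustrative Markov chain: state -> list of (successor, probability).
CHAIN = {
    "init": [("good", 0.3), ("init", 0.6), ("bad", 0.1)],
    "good": [("good", 1.0)],
    "bad":  [("bad", 1.0)],
}

def run_reaches(target="good", s="init", horizon=100):
    """One bounded-horizon run; 1 iff it reaches the target."""
    for _ in range(horizon):
        if s == target:
            return 1
        succs, probs = zip(*CHAIN[s])
        s = random.choices(succs, probs)[0]
    return 0

def pac_estimate(eps=0.01, delta=0.01):
    # Hoeffding: n >= ln(2/delta) / (2 eps^2) simulations suffice for an
    # (eps, delta)-correct estimate of the reachability probability.
    n = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    return sum(run_reaches() for _ in range(n)) / n

print(pac_estimate())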


Ex.1: Experimental results

Number of visited states:

Example      PRISM         BRTDP
zeroconf     3,001,911     760
             4,427,159     977
             5,477,150     1411
wlan         345,000       2018
             1,295,218     2053
             5,007,548     1995
firewire     6,719,773     26,508
             13,366,666    25,214
             19,213,802    32,214
mer          17,722,564    1950
             17,722,564    2902
             26,583,064    1950
             26,583,064    2900

Further examples on reinforcement learning

Sebastian Junges, Nils Jansen, Christian Dehnert, Ufuk Topcu, Joost-Pieter Katoen:
Safety-Constrained Reinforcement Learning for MDPs. TACAS 2016.
- safe and cost-optimizing strategies
- (1) compute safe, permissive strategies
- (2) learn cost-optimal strategies (convergence) among them

Alexandre David, Peter Gjøl Jensen, Kim Guldstrand Larsen, Axel Legay, Didier Lime, Mathias Grund Sørensen, Jakob Haahr Taankvist:
On Time with Minimal Expected Cost! ATVA 2014.
- priced timed games → priced timed MDPs
- time-bounded cost-optimal (convergence) strategies
- (1) Uppaal TiGa for safe strategies
- (2) Uppaal SMC and learning for cost-optimal strategies

Ex.2: Computing small strategies: Which decisions?

Importance of a node s with respect to ◊ target and a strategy σ:

  Pσ[ ◊ s | ◊ target ]

- Cut off states with zero importance (unreachable or useless).
- Cut off states with low importance (small error, ε-optimal strategy).
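As an illustration, the conditional probability above can be estimated by simulation: run the MDP under the fixed strategy σ and, among the runs that reach the target, count which states are visited. The model, strategy, and cut-off threshold below are illustrative assumptions.

import random

# Estimate Imp(s) = P^sigma[<> s | <> target] by Monte Carlo simulation.
MDP = {
    "init": {"down": [("s", 0.99), ("err", 0.01)]},
    "s":    {"b": [("v1", 0.5), ("t", 0.5)]},
    "v1":   {"c": [("err", 1.0)]},
    "t": {}, "err": {},
}
sigma = {"init": "down", "s": "b", "v1": "c"}   # fixed memoryless strategy
TARGET = "err"

def importance(runs=100_000, horizon=100):
    visits = {s: 0 for s in MDP}
    successful = 0
    for _ in range(runs):
        s, seen = "init", set()
        for _ in range(horizon):
            seen.add(s)
            if s == TARGET or not MDP[s]:
                break
            succs, probs = zip(*MDP[s][sigma[s]])
            s = random.choices(succs, probs)[0]
        if TARGET in seen:                      # condition on reaching the target
            successful += 1
            for v in seen:
                visits[v] += 1
    return {s: visits[s] / max(successful, 1) for s in MDP}

imp = importance()
kept = {s for s, i in imp.items() if i > 0.05}  # cut off low-importance states
print(imp, kept)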


Ex.2: Small strategies: Which representation?

How to make use of the exact importance?
Learn the decision in s in proportion to the importance of s (sketched below).

Advantages of decision trees over BDDs:
- more readable: predicates
- smaller due to good variable ordering: entropy
- yet smaller at the cost of an error: pruning
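One way to realize "learn the decision in s in proportion to the importance of s" is to feed (state features, action) pairs to an off-the-shelf decision-tree learner with the importance as the sample weight. A minimal sketch, assuming scikit-learn is available; the feature values, the "send" action, and the weights are hypothetical, only the variable names (l, b, ip_mess, z) and the action rec echo the figure earlier in the talk.

from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training data: each row is a state described by integer-valued
# variables, the label is the action the (epsilon-)optimal strategy plays
# there, and the weight is the importance of the state.
feature_names = ["l", "b", "ip_mess", "z"]
X = [
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 2],
    [0, 0, 0, 3],
]
y = ["send", "send", "rec", "rec"]
importance = [0.9, 0.7, 0.2, 0.05]

# Entropy-based splits give the good variable ordering mentioned above;
# sample_weight makes the tree concentrate on the important states.
tree = DecisionTreeClassifier(criterion="entropy")
tree.fit(X, y, sample_weight=importance)

# A readable, predicate-based representation of the learned strategy.
print(export_text(tree, feature_names=feature_names))

Pruning (e.g. scikit-learn's cost-complexity parameter ccp_alpha) would shrink the tree further at the cost of an error, as in the last bullet above.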


Ex.2: Experimental results

Example    #states     Value      Explicit    BDD    DT   Rel. err (DT) %
firewire   481,136     1.0        479,834     4233   1    0.0
investor   35,893      0.958      28,151      783    27   0.886
mer        1,773,664   0.200016   ——— MEM-OUT ———              *
zeroconf   89,586      0.00863    60,463      409    7    0.106

* MEM-OUT in PRISM; with BRTDP instead: Explicit 1887, BDD 619, DT 13, Rel. err (DT) 0.00014 %.


Further examples on decision trees

Pranav Garg, Daniel Neider, P. Madhusudan, Dan Roth:
Learning Invariants using Decision Trees and Implication Counterexamples. POPL 2016.
- positive examples from runs of the program
- negative examples from modifications
- implication examples

Siddharth Krishna, Christian Puhrsch, Thomas Wies:
Learning Invariants Using Decision Trees.
- positive examples: states reachable when the precondition holds
- negative examples: states leaving the loop and violating the postcondition

Summary

Machine learning in verification
- Scalable heuristics
- Example 1: Speeding up value iteration
  - technique: reinforcement learning, BRTDP
  - idea: focus on updating the "most important parts" = those most often visited by good strategies
- Example 2: Small and readable strategies
  - technique: decision tree learning
  - idea: based on the importance of states, feed the decisions to the learning algorithm

Discussion

Verification using machine learning
- How far do we want to compromise?
  - Do we have to compromise?
  - BRTDP, invariant generation, and strategy representation don't.
- Don't we want more than ML?
  - (ε-)optimal controllers?
  - arbitrary controllers – is it still verification?
- What do we actually want?
  - scalability shouldn't overrule guarantees?
  - SMC should be PAC; when is it enough?
  - oracle usage seems fine

Thank you
