(1)

Real-Time Search Algorithms

Chintan Shah (06305013)

Faraz Shahbazker (06305008)

Khem Dhaval (06305021)

Manish Aggarwal (06305005)

(2)

A Real-World Scenario

A patient in the ICU is hooked to numerous monitoring and life-supporting devices

A device senses some kind of trauma (shock) in the patient's condition

What to do?

Raise an alarm to call the doctor?

But ... in the time the doctor takes to arrive, the patient's condition may deteriorate

(3)

Continued ...

If only ...

An RTCS could diagnose the cause of the shock and have suggestions ready

Or take some precursor actions, like reducing a particular gas in a ventilator

Convergence of AI and Real-Time Systems

(4)

The Mars Pathfinder Problem

Agent needs to navigate the terrain to collect samples

Does not know the terrain in advance

Visibility limited by the strength of sensors

Hostile conditions

Can you think of any algorithm that can be applied here?

(5)

[Figure: search in a 2D grid, with Start, Current, and Goal cells marked]

(6)

Interactive Strategy Gaming

In many real-time strategy games, resources are scattered across the map

The agent has to travel from a start state to some goal state

Each state of the map is either passable or occupied by a wall

Commitment to moves in constant time, e.g. moving the army in AoE

(7)

Observations

Map may be known/unknown

Real-time response is a must

Optimum path not really required

(8)

Agenda

MiniMin

Real-Time A*

Learning – RTA*

Conclusion

(9)

Single Agent Search

Examples:

8-puzzle problem

Autonomous navigation in a network of roads

Admissible heuristic search algorithms: A*, IDA*

They find the optimal path

A* requires an exponential amount of memory

They fail when a quick, suboptimal, real-time answer is needed

(10)

Limitation

Tries to find the optimal solution

Finds the whole path before making even the first move

Cannot work “fast enough”

Cannot work with limited visibility

(11)

Two Player Games

MiniMax

Move in constant time

Limited search space

Suboptimal decisions

Can we do something similar for single-agent search (SAS)?

(12)

MiniMin lookahead search

Specialization of MiniMax for single-agent problems

Limited Search depth

determined by available resources

limited by visibility

Planning Mode:

Apply the heuristic function at the leaves

Back up the minimum value over the children of each node

Execution Mode:

A single(?) move in the direction of the best child
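The lookahead can be written as a small recursive backup. Below is a minimal Python sketch, assuming unit move costs and two problem-supplied callables, successors(state) and h(state) (illustrative names, not from the slides): the planning step backs up the minimum heuristic value from the depth-limited frontier, and the execution step commits to the best child.

```python
# A minimal sketch of MiniMin lookahead with unit move costs, assuming the
# problem supplies successors(state) and a heuristic h(state); all names
# here are illustrative, not taken from the original slides.

def minimin_value(state, depth, successors, h):
    """Back up the minimum heuristic value from a depth-limited lookahead."""
    children = successors(state)
    if depth == 0 or not children:
        return h(state)                 # planning mode: heuristic at the frontier
    return min(minimin_value(c, depth - 1, successors, h) for c in children)

def minimin_move(state, depth, successors, h):
    """Execution mode: commit to the child with the smallest backed-up value."""
    return min(successors(state),
               key=lambda c: minimin_value(c, depth - 1, successors, h))
```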

(13)

MiniMin with nonuniform cost

If the operators have nonuniform costs

Use f(n) = g(n) + h(n)

Back up the min. value of f(n)

[Figure: lookahead tree with nonuniform edge costs (g values) and leaf heuristics (h values); f = g + h is evaluated at each leaf and the minimum f is backed up toward the root]
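The earlier sketch adapts directly to nonuniform costs, again with assumed helper names, here adding cost(state, child) for edge costs: f(n) = g(n) + h(n) is evaluated at the frontier and the minimum f is backed up.

```python
# Sketch of the nonuniform-cost backup; cost(state, child) is an assumed
# helper returning the edge cost, and g accumulates cost from the lookahead root.

def minimin_f(state, g, depth, successors, h, cost):
    children = successors(state)
    if depth == 0 or not children:
        return g + h(state)             # f(n) = g(n) + h(n) at the frontier
    return min(minimin_f(c, g + cost(state, c), depth - 1, successors, h, cost)
               for c in children)

def minimin_move_with_costs(state, depth, successors, h, cost):
    return min(successors(state),
               key=lambda c: minimin_f(c, cost(state, c), depth - 1,
                                       successors, h, cost))
```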

(14)

Time Limited A*

Run A* until time runs out

Make a single move in the direction of the best node in the OPEN list

Time exponential in search depth.

Will exhaust memory unless the time allotted per move is very small
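A rough sketch of the idea follows, with the deadline approximated by a node-expansion budget rather than wall-clock time; successors(state) yielding (child, edge_cost) pairs and h(state) are assumed helpers, and duplicate detection is omitted for brevity.

```python
# Sketch of Time-Limited A*: expand nodes A*-style until the budget runs out,
# then commit to one move toward the best node left on the OPEN list.

import heapq
from itertools import count

def time_limited_astar_move(start, successors, h, max_expansions=1000):
    tie = count()                                  # tie-breaker for the heap
    open_list = []                                 # entries: (f, tie, g, state, first_move)
    for child, c in successors(start):
        heapq.heappush(open_list, (c + h(child), next(tie), c, child, child))
    for _ in range(max_expansions):                # "time" budget per move
        if not open_list:
            break
        _, _, g, state, first = heapq.heappop(open_list)
        for child, c in successors(state):
            heapq.heappush(open_list,
                           (g + c + h(child), next(tie), g + c, child, first))
    # single move in the direction of the best node remaining on OPEN
    return min(open_list)[4] if open_list else None
```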

(15)

Generating a sequence of moves

Run MiniMin to select the next state, and continue in this manner until the goal state is found

Cycles?

Exclude previously visited states

But all successors of the current state may already have been visited ...
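A sketch of this loop, building on the minimin_move sketch above with assumed is_goal(state) and successors(state) helpers: previously visited states are filtered out, which also surfaces the dead-end case where every successor of the current state has already been visited.

```python
def run_agent(start, successors, h, depth, is_goal):
    """Repeatedly plan with MiniMin and execute one move until the goal is found."""
    visited = {start}
    path = [start]
    state = start
    while not is_goal(state):
        def unvisited(s):
            return [c for c in successors(s) if c not in visited]
        if not unvisited(state):
            return None                 # every successor already visited: stuck
        state = minimin_move(state, depth, unvisited, h)
        visited.add(state)
        path.append(state)
    return path
```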

(16)

RTA* (Real-Time A*)

Let s be the current node

h(n) – found using a depth-bounded search algorithm such as MiniMin, Time-Limited A*, etc.

g(n) – distance from s (and NOT the distance from the origin)

Choose the best neighbour

Store the f value of the 2nd-best neighbour with s

(17)

[Figure: grid example for RTA*, showing h-values (5, 6, 10, 11, 12) around the current state and how the stored values change after a move]

(18)

RTA*

1. s = S_start
2. If s ∈ G, then return success
3. a = one_of(arg_min(h, succ(s)))
4. b = one_of(arg_min_2(h, succ(s)))
5. h(s) = 1 + h(b)
6. Execute the action leading to a
7. s = a
8. goto 2
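A minimal Python rendering of these steps, assuming unit edge costs and illustrative helpers successors(state), h0(state) (the base heuristic) and is_goal(state); the revised h-values are kept in a hash table, as in the slides.

```python
def rta_star(start, successors, h0, is_goal):
    """Real-Time A* with unit costs: commit to the best neighbour and store
    the 2nd-best estimate (1 + h of the 2nd-best successor) with the state left."""
    h = {}                                        # revised h-values (hash table)
    def h_val(s):
        return h.get(s, h0(s))
    s, path = start, [start]
    while not is_goal(s):
        ranked = sorted(successors(s), key=h_val)
        best = ranked[0]
        second = ranked[1] if len(ranked) > 1 else ranked[0]
        h[s] = 1 + h_val(second)                  # step 5: h(s) = 1 + h(b)
        s = best                                  # steps 6-7: execute and move
        path.append(s)
    return path
```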

(19)

Observations

Requires memory linear in the number of EXECUTED moves

Lookahead requires constant time because of the limited search depth

Execution interleaved with planning

(20)

Learning and Search

Learn ... “how to search?”

Different problem instances in the same search space

Different Start states, same Goal state

Improve with each successful search

How to apply previously acquired knowledge (+ve/-ve) to find the Goal state?

(21)

Learning and Search

Naïve Approach

Remember all optimal paths found so far.

Once you come across a node on a previously found optimal path, follow the saved trail to the Goal

Learning or Memorization?

General Approach

Learn the heuristic function

h(x) should become more precise (less approximate) with every successful search

h(x) => h*(x)

(22)

Learning in vanilla A*

Optimal Path Memorization

Hash-table of known nodes

Insert each node on the optimal path along with its next pointer.

New data structure (hash-table) needed for persistent memory

Can vouch only for nodes on the optimal path, but with full confidence

No learning from explored nodes outside the optimal path – negative examples are ignored.
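A tiny sketch of this memorization, assuming the optimal path comes back from an ordinary A* search as a list of states: the hash table maps each node on the path to its successor, and lookups can only vouch for states that appear in it.

```python
next_on_path = {}                                  # persists across searches

def remember_optimal_path(path):
    """Insert each node on an optimal path along with its 'next' pointer."""
    for node, nxt in zip(path, path[1:]):
        next_on_path[node] = nxt

def follow_if_known(state, goal):
    """If this state lies on a memorized optimal path, return the saved trail."""
    trail = [state]
    while state in next_on_path and state != goal:
        state = next_on_path[state]
        trail.append(state)
    return trail if state == goal else None
```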

(23)

Learning in vanilla A*

Improve the Heuristic ... but how?

Apply ML techniques to learn h(x)

Each successful search adds to the training set by providing actual values of h*(x)

Use the revised heuristic for the next search

Simple Revision

Update h(x) to h*(x) for known paths

Danger: unknown siblings that under-approximate h*(x) appear more attractive

Incremental Updates

Step-by-step revision of h values

All h values (globally) converge to h*(x)
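A sketch of the "simple revision" step, assuming a successful search returns its optimal path as a list of states and edge_cost(a, b) gives operator costs (both assumed helpers): h(x) is overwritten with the exact cost-to-go h*(x) for every node on that path. Siblings off the path keep their old underestimates, which is exactly the danger noted above.

```python
learned_h = {}                                     # persistent heuristic table

def revise_along_path(path, edge_cost):
    """Simple revision: set h(x) = h*(x) for nodes on a known optimal path."""
    learned_h[path[-1]] = 0.0                      # the goal has zero cost-to-go
    cost_to_go = 0.0
    for node, nxt in zip(reversed(path[:-1]), reversed(path[1:])):
        cost_to_go += edge_cost(node, nxt)         # accumulate cost walking back
        learned_h[node] = cost_to_go               # exact h*(node) along this path
```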

(24)

Why is learning so important?

Fits in well with our real-world notions

first survive, then thrive

Find a satisficing solution as soon as possible

Try to improve/optimize at leisure

Most real-time search algorithms settle for non-optimal results

Each subsequent execution has to pay a big price for the initial fast search

With learning, this cost decreases over time

It's easy ... at least for RTA*

(25)

Learning RTA*

In RTA* we already have a hash-table

We store all explored nodes with (2nd-best) heuristic

Make hash-table persistent across searches

Remember previously visited nodes and

Remember heuristic calculated at previous visit

But ... using the 2nd-best heuristic for the next search can cause problems

A good node may appear less attractive than its siblings because (2nd-best > best)

(26)

Consequence of using the 2nd-Best h(n)

[Figure: states S1 and S2 with successors a, b, c, d, e and the Goal; h(c)=8 while h*(c)=20, h(d)=9, h(e)=6; storing the 2nd-best estimate gives h(b) = 9+1 = 10, even though h*(b)=12]

(27)

What if we use Best h(n)?

[Figure: the same states; storing the best estimate instead gives h(b) = 6+1 = 7, while h*(b)=12]

(28)

Learning RTA* - Algorithm

1. s := S_start
2. If s ∈ G, then return success
3. a := one_of(arg_min(h, succ(s)))
4. h(s) := max(h(s), 1 + h(a))
5. Execute action a
6. s := a
7. goto 2
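The same steps in Python, with unit costs and the illustrative helpers successors(state), h0(state) and is_goal(state) used earlier; keeping the hash table outside the function is what lets the learned values persist across successive searches.

```python
learned = {}                                       # revised h-values, persistent across trials

def lrta_star(start, successors, h0, is_goal):
    """Learning RTA* with unit costs: back up the best successor's estimate."""
    def h_val(s):
        return learned.get(s, h0(s))
    s, path = start, [start]
    while not is_goal(s):
        a = min(successors(s), key=h_val)          # step 3: best successor
        learned[s] = max(h_val(s), 1 + h_val(a))   # step 4: h(s) = max(h(s), 1 + h(a))
        s = a                                      # steps 5-6: execute and move
        path.append(s)
    return path
```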

(29)

2nd-Best vs. Best

Conservative approach (back up the best h, as in LRTA*):

Bad approximation of h(n)

Less commitment (prone to backtracking)

Reaches the solution slowly

Wanders around unexplored areas and learns more about the terrain

Aggressive approach (back up the 2nd-best h, as in RTA*):

Better approximation of h(n)

Progressive and committed (moves forward)

Reaches the solution quickly

Learns less about the general terrain

(30)

Example of RTA*

[Figure: four snapshots of RTA* on a row of states with initial h-values 4, 5, 6, 10 on the way to the goal; as the agent advances, the stored h-values of the states it leaves are raised to 11, 7, and 8]

(31)

Example of LRTA*

[Figure: four snapshots of LRTA* on the same row of states (initial h-values 4, 5, 6, 10); the stored h-values rise only gradually, to 5 and then 6]

Prefers to move back rather than forward

(32)

Example of LRTA* (contd.)

[Figure: further snapshots; the h-values keep rising gradually, to 7 and then 8, as the agent backtracks]

See how a new path is explored here

(33)

Example of LRTA* (contd.)

[Figure: final snapshots; the h-values along the path settle at 7, 7, 7, 8, 10]

Cycle is broken. Move forward.

We had gone backwards the last time we were here!!

(34)

Observations

Final state of RTA* (h-values 11, 7, 8, 6, 10):

Reached the goal in 5 steps

Did not explore the extra path

Moves forward quickly

h-values rise suddenly and only along the exact path

Final state of LRTA* (h-values 7, 7, 7, 8, 10):

Reached the goal in 10 steps

Explored the extra path (shown in bold)

Reluctant to move forward

h-values rise slowly and uniformly all around the actual path => the algorithm “learns” h(n)

(35)

Conclusions

Known search methods are inadequate for certain applications

Key Issues

Limited Visibility

Real-time action required

Solutions:

Forfeit optimality

Make locally optimal decisions

Interleave search (planning) and movement

MiniMin – limited-depth planning phase

RTA* – interleave planning and execution

LRTA* - Learn from previous searches

(36)

References

BL06. Vadim Bulitko and Greg Lee. Learning in Real-Time Search: A Unifying Framework. Journal of Artificial Intelligence Research (JAIR), 25:119-157, 2006.

KL06. Sven Koenig and Maxim Likhachev. Real-time Adaptive A*. In AAMAS '06: Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pages 281-288, New York, NY, USA, 2006. ACM.

Koe01. Sven Koenig. Minimax real-time heuristic search. Artif. Intell., 129(1-2):165-197, 2001.

(37)

References (Contd.)

Koe04. Sven Koenig. A comparison of fast search methods for real-time situated agents. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, volume 2, pages 864-871. IEEE Computer Society, Washington, DC, USA, 2004.

Kor90. Richard E. Korf. Real-time heuristic search. Artif. Intell., 42(2-3):189-211, 1990.

MHA+94. David John Musliner, James Hendler, Ashok K. Agrawala, Edmund H. Durfee, Jay K. Strosnider, and C. J. Paul. The challenges of real-time AI. Technical Report CS-TR-3290, 1994.
