Real-Time Search Algorithms
● Chintan Shah (06305013)
● Faraz Shahbazker (06305008)
● Khem Dhaval (06305021)
● Manish Aggarwal (06305005)
A Real World scenario
● A patient in the ICU is hooked to numerous monitoring and life-supporting devices
● A device senses some type of trauma (shock) in the patient's condition
● What to do?
  – Raise an alarm to call for a doctor?
  – But ... in the time the doctor takes to arrive, the patient's condition may deteriorate
continued ...
● If only ...
  – an RTCS could diagnose the cause of the shock and have suggestions ready
  – or take some preliminary actions, such as reducing a particular gas in the ventilator
● Convergence of AI and Real-Time systems
Mars Path Finder problem
● The agent needs to navigate the terrain to collect samples
● It does not know the terrain in advance
● Visibility is limited by the strength of its sensors
● Hostile conditions
● Can you think of an algorithm that can be applied here?
[Figure: search in a 2D grid, with Start, Current, and Goal cells marked]
Interactive Strategy Gaming
● In many real-time strategy games
  – resources are scattered across the map
  – the agent has to travel from a start state to some goal state
● Each cell of the map is either passable or occupied by a wall
● Commitment to moves in constant time – e.g., moving the army in Age of Empires (AoE)
Observations
● The map may be known or unknown
● A real-time response is a must
● The optimal path is not really required
Agenda
● MiniMin
● Real-Time A*
● Learning RTA*
● Conclusion
Single Agent Search
● Examples:
  – 8-puzzle problem
  – Autonomous navigation in a network of roads
● Admissible heuristic search algorithms:
  – A*, IDA*
  – Find the optimal path
  – A* requires an exponential amount of memory
  – Cannot give a quick, suboptimal, real-time answer
Limitation
● Tries to find the optimal solution
● Finds the whole path before making even the first move
  – Cannot act "fast enough"
● Cannot work with limited visibility
Two Player Games
● MiniMax
  – Moves in constant time
  – Limited search space
  – Suboptimal decisions
● Can we do something similar for single-agent search (SAS)?
MiniMin lookahead search
● A specialization of MiniMax for single-agent problems
● Limited search depth
  – determined by available resources
  – limited by visibility
● Planning mode:
  – Apply the heuristic function at the leaves
  – Back up the minimum value of the children for each node
● Execution mode:
  – Make a single(?) move in the direction of the best child
MiniMin with nonuniform cost
● If the operators have nonuniform costs
● use f(n) = g(n) + h(n)
● and back up the minimum value of f(n) (see the sketch below the figure)
[Figure: MiniMin lookahead tree with edge costs g, leaf heuristic values h, and backed-up values f = g + h at the leaves (f = 16, 12, 14, 16, 13); the minimum f (here 12) is backed up toward the root]
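The planning/execution split described above can be sketched in a few lines of Python. This is only an illustration, not code from the presentation; the problem interface (`successors`, `cost`, `h`) is an assumption, and with uniform costs every edge cost is 1.

```python
# Minimal MiniMin lookahead sketch (illustrative, not the authors' code).
# Assumed interface: successors(state) -> iterable of (action, next_state),
# cost(state, action, next_state) -> non-negative edge cost, h(state) -> heuristic.

def minimin(state, depth, g, successors, cost, h):
    """Back up the minimum f = g + h found within the depth bound."""
    if depth == 0:
        return g + h(state)                      # apply the heuristic at the frontier
    children = list(successors(state))
    if not children:                             # dead end: fall back to the heuristic
        return g + h(state)
    return min(minimin(child, depth - 1, g + cost(state, action, child),
                       successors, cost, h)
               for action, child in children)

def minimin_move(state, depth, successors, cost, h):
    """Planning mode: score every child with minimin.
    Execution mode: return the (action, child) pair with the smallest backed-up f."""
    return min(successors(state),
               key=lambda ac: minimin(ac[1], depth - 1,
                                      cost(state, ac[0], ac[1]),
                                      successors, cost, h))
```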
Time Limited A*
● Run A* until time runs out
● Make a single move in the direction of the best node on the OPEN list
● Time is exponential in the search depth
● Will exhaust memory unless the time allotted per move is very small
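A rough sketch of this per-move loop, under stated assumptions: states are hashable, the `successors`/`cost`/`h`/`is_goal` names are placeholders, and the goal check is kept minimal.

```python
import heapq
import time

def time_limited_astar_move(start, is_goal, successors, cost, h, budget_seconds):
    """Expand A* nodes until the per-move time budget runs out, then commit to
    the first action on the path toward the best node left on the OPEN list."""
    deadline = time.monotonic() + budget_seconds
    counter = 0                                  # tie-breaker so states are never compared
    # OPEN entries: (f, counter, g, state, first_action_on_path)
    open_list = [(h(start), counter, 0, start, None)]
    best_g = {start: 0}
    while open_list and time.monotonic() < deadline:
        _, _, g, state, first = heapq.heappop(open_list)
        if is_goal(state):
            return first
        for action, child in successors(state):
            child_g = g + cost(state, action, child)
            if child_g < best_g.get(child, float("inf")):
                best_g[child] = child_g
                counter += 1
                heapq.heappush(open_list,
                               (child_g + h(child), counter, child_g, child,
                                first if first is not None else action))
    # Single move toward the best (lowest-f) node still on OPEN.
    return min(open_list)[4] if open_list else None
```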
Generating a sequence of moves
● Run minimin to select the next state and continue in this manner until the goal state is found (see the sketch below)
● Cycles?
● Exclude previously visited states
● But all successors of the current state may already have been visited ...
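Continuing the illustration, this move-selection loop with cycle avoidance might look like the sketch below. It reuses the hypothetical `minimin` helper from the earlier sketch and assumes an `is_goal` test.

```python
def plan_and_execute(start, is_goal, successors, cost, h, depth):
    """Interleave planning and execution: run a depth-bounded minimin lookahead,
    commit to one move, and exclude previously visited states to avoid cycles."""
    visited, state, path = {start}, start, [start]
    while not is_goal(state):
        current = state
        fresh = [(a, s) for a, s in successors(current) if s not in visited]
        if not fresh:                  # every successor was already visited: give up
            return None
        _, state = min(fresh,
                       key=lambda ac: minimin(ac[1], depth - 1,
                                              cost(current, ac[0], ac[1]),
                                              successors, cost, h))
        visited.add(state)
        path.append(state)
    return path
```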
RTA* (Real Time A*)
● Let s be the current node
● h(n) – found using a depth-bounded search algorithm such as MiniMin, Time Limited A*, etc.
● g(n) – distance from s (and NOT the distance from the origin)
● Choose the best neighbour
● Store the f value of the 2nd-best neighbour with s
[Figure: RTA* lookahead from the current node s, with neighbouring subtrees carrying h-values between 5 and 12; the agent moves toward the best neighbour and the f value of the 2nd-best neighbour is stored with s]
RTA*
1. s = S_start
2. If s ∈ G, then return success
3. a = one_of(arg_min(h, succ(s)))
4. b = one_of(arg_min_2(h, succ(s)))
5. h(s) = 1 + h(b)
6. Execute the action for a
7. s = a
8. goto 2
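A minimal Python sketch of these steps (illustrative only): the problem interface (`successors`, `cost`, `h0`, `is_goal`) is assumed, and the depth-bounded lookahead that would normally supply h(n) is collapsed into a direct heuristic call. With unit edge costs the update reduces to the 1 + h(b) of the pseudocode above.

```python
def rta_star(start, is_goal, successors, cost, h0, max_steps=10_000):
    """Real-Time A* sketch: always move to the neighbour with the lowest
    f = edge cost + h, and store the SECOND-best f as the new h of the state
    being left, so returning only pays off if everything ahead turns out worse."""
    learned_h = {}                               # h values revised during this trial
    def h(s):
        return learned_h.get(s, h0(s))
    s, path = start, [start]
    for _ in range(max_steps):                   # safety bound for the sketch
        if is_goal(s):
            return path
        scored = sorted(((cost(s, a, n) + h(n), a, n) for a, n in successors(s)),
                        key=lambda t: t[0])
        _, _, nxt = scored[0]
        # f value of the 2nd-best neighbour (infinity if there is only one neighbour)
        learned_h[s] = scored[1][0] if len(scored) > 1 else float("inf")
        s = nxt
        path.append(s)
    return None                                  # step budget exhausted
```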
Observations
● Requires memory linear in the number of EXECUTED moves
● Lookahead requires constant time because of the limited search depth
● Execution is interleaved with planning
Learning and Search
● Learn ... "how to search"
● Different problem instances in the same search space
● Different start states, same goal state
● Improve with each successful search
● How can previously acquired knowledge (positive and negative) be applied to find the goal state?
Learning and Search
● Naïve approach
  – Remember all optimal paths found so far
  – Once you hit a previously found optimal path, follow the saved trail to the goal
  – Learning or memorization?
● General approach
  – Learn the heuristic function
  – h(x) should become more precise (less approximate) with every successful search
  – h(x) → h*(x)
Learning in vanilla A*
● Optimal path memorization
  – Hash-table of known nodes
  – Insert each node on the optimal path along with its next pointer (see the sketch below)
● A new data structure (the hash-table) is needed for persistent memory
● Can vouch only for nodes on the optimal path, but with full confidence
● No learning from explored nodes outside the optimal path – negative examples are ignored
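A tiny sketch of this memorization idea (names are illustrative; `next_on_path` is an ordinary dict playing the role of the hash-table, and all searches are assumed to share the same goal state):

```python
def memorize_path(path, next_on_path):
    """Record node -> next-node pointers for every node on a freshly found optimal path."""
    for node, successor in zip(path, path[1:]):
        next_on_path[node] = successor

def follow_if_known(node, next_on_path, goal):
    """If this node lies on a previously memorized optimal path (to the same goal),
    replay the stored trail; otherwise return None and keep searching normally."""
    if node not in next_on_path:
        return None
    trail = [node]
    while trail[-1] != goal and trail[-1] in next_on_path:
        trail.append(next_on_path[trail[-1]])
    return trail if trail[-1] == goal else None
```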
Learning in vanilla A*
● Improve the heuristic ... but how?
  – Apply ML techniques to learn h(x)
    ● each successful search adds to the training set by providing actual values of h*(x)
    ● use the revised heuristic for the next search
  – Simple revision
    ● update h(x) to h*(x) for known paths
    ● Danger: unknown siblings that under-approximate h*(x) appear more attractive
  – Incremental updates
    ● step-by-step revision of the h values
    ● all h values (globally) converge to h*(x)
Why is learning so important?
● It fits in well with our real-world notion: first survive, then thrive
  – Find a satisficing solution as soon as possible
  – Try to improve/optimize at leisure
● Most real-time search algorithms settle for non-optimal results
● Each subsequent execution has to pay a big price for the initial fast search
● With learning, this cost decreases over time
● It's easy ... at least for RTA*
Learning RTA*
● In RTA* we already have a hash-table
● We store all explored nodes with the (2nd-best) heuristic
● Make the hash-table persistent across searches
  – Remember previously visited nodes and
  – the heuristic calculated at the previous visit
● But ... using the 2nd-best heuristic for the next search can cause problems
  – A good node may appear less attractive than its siblings because (2nd-best > best)
Consequence of using the 2nd-best h(n)
[Figure: states S1, S2, a, b, c, d, e and the goal, with h(c)=8, h*(c)=20, h(d)=9, h(e)=6, h*(b)=12; the 2nd-best rule stores h(b) = 9+1 = 10, so on a later search from S2 node b looks less attractive than c even though c is actually far worse]
What if we use the best h(n)?
[Figure: the same states; storing the best value gives h(b) = 6+1 = 7, which still under-approximates h*(b)=12 but keeps b more attractive than c (h(c)=8, h*(c)=20)]
Learning RTA* - Algorithm
1. s = S_start
2. If s ∈ G, then return success
3. a = one_of(arg_min(h, succ(s)))
4. h(s) = max(h(s), 1 + h(a))
5. Execute action a
6. s = a
7. goto 2
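A minimal sketch under the same assumed interface as the RTA* sketch above; the only change is the update rule, and the h table is passed in from outside so it can persist across searches. Assuming the initial heuristic is admissible, the stored values never overestimate and can therefore be reused across trials.

```python
def lrta_star(start, is_goal, successors, cost, h0, learned_h, max_steps=10_000):
    """Learning RTA* sketch: move to the best neighbour, but store
    max(h(s), best f) rather than the 2nd-best value; the table stays
    admissible, can be reused across trials, and converges toward h*."""
    def h(s):
        return learned_h.get(s, h0(s))
    s, path = start, [start]
    for _ in range(max_steps):
        if is_goal(s):
            return path
        scored = sorted(((cost(s, a, n) + h(n), a, n) for a, n in successors(s)),
                        key=lambda t: t[0])
        best_f, _, nxt = scored[0]
        learned_h[s] = max(h(s), best_f)   # best (not 2nd-best) value, never decreased
        s = nxt
        path.append(s)
    return None

# Persistence across searches: pass the SAME dictionary to every trial.
#   learned_h = {}
#   for trial_start in starts:
#       lrta_star(trial_start, is_goal, successors, cost, h0, learned_h)
```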
2nd-Best vs. Best
● 2nd-best (RTA*):
  – Aggressive approach
  – Better approximation of h(n)
  – Progressive and committed (moves forward)
  – Reaches the solution quickly
  – Learns less about the general terrain
● Best (LRTA*):
  – Conservative approach
  – Worse approximation of h(n)
  – Less commitment (prone to backtracking)
  – Reaches the solution slowly
  – Wanders around unexplored areas and learns more about the terrain
Example of RTA*
[Figure: four snapshots of RTA* on a row of states with initial h-values 4, 5, 6, 10 and the goal to the right (h = 6); as the agent advances, the h-values of the states it leaves are raised to 11, 7, and 8, so it keeps moving toward the goal]
Example of LRTA*
[Figure: snapshots of LRTA* on the same row of states with initial h-values 4, 5, 6, 10; the updated values rise only gradually (to 5, then 6), and the agent prefers to move back rather than forward]
Example of LRTA* (contd.)
[Figure: further snapshots; the h-values of revisited states creep up to 7 and 8, and a new path is explored]
Example of LRTA* (contd.)
[Figure: final snapshots with h-values 7, 7, 7, 8, 10; the cycle is broken and the agent moves forward, since going back looks no better than it did the last time it was here]
Observations
[Figure: final state of RTA* – h-values 11, 7, 8, 6, 10 along the path to the goal; final state of LRTA* – h-values 7, 7, 7, 8, 10]
● RTA*:
  – Reached the goal in 5 steps
  – Did not explore the extra path
  – Moves forward quickly
  – h-values rise suddenly, and only along the exact path taken
● LRTA*:
  – Reached the goal in 10 steps
  – Explored the extra path (shown in bold in the figure)
  – Reluctant to move forward
  – h-values rise slowly and uniformly all around the actual path => the algorithm "learns" h(n)
Conclusions
● Known search methods are inadequate for certain applications
  – Key issues:
    ● Limited visibility
    ● Real-time action required
  – Solutions:
    ● Forfeit optimality
    ● Make locally optimal decisions
    ● Interleave search (planning) and movement
● MiniMin – limited-depth planning phase
● RTA* – interleaves planning and execution
● LRTA* – learns from previous searches