NOVEL STRATEGIES IN REINFORCEMENT LEARNING FOR THE DESIGN OF ROBUST CONTROL SYSTEMS

HITESHKUMAR BHUPENDRABHAI SHAH

DEPARTMENT OF ELECTRICAL ENGINEERING
INDIAN INSTITUTE OF TECHNOLOGY DELHI
INDIA

DECEMBER 2011


© Indian Institute of Technology Delhi (IITD), New Delhi, 2011.

 

NOVEL STRATEGIES IN REINFORCEMENT LEARNING FOR THE DESIGN OF ROBUST CONTROL SYSTEMS

by

HITESHKUMAR BHUPENDRABHAI SHAH
DEPARTMENT OF ELECTRICAL ENGINEERING

Submitted in fulfillment of the requirements of the degree of

DOCTOR OF PHILOSOPHY

to the

INDIAN INSTITUTE OF TECHNOLOGY DELHI

DECEMBER 2011


Certificate

This is to certify that the thesis entitled, “Novel Strategies in Reinforcement Learning for the Design of Robust Control Systems”, being submitted by Hiteshkumar Bhupendrabhai Shah to the Department of Electrical Engineering, Indian Institute of Technology Delhi, for the award of the degree of Doctor of Philosophy, is a record of bona fide research work carried out by him under my supervision. In my opinion, the thesis has reached the standard of fulfilling the requirements of the regulations relating to the degree.

The results obtained herein have not been submitted to any other University or Institute for the award of any degree or diploma.

(Prof. M. Gopal)
Thesis Supervisor
Department of Electrical Engineering
Indian Institute of Technology Delhi
Hauz Khas, New Delhi, 110016, India


Acknowledgements

Any piece of work remains incomplete if gratitude and respect are not accorded to those who have been supportive during its development. Though words cannot do full justice, may they be a humble tribute of respect and admiration.

I pay my tributes to the ALMIGHTY BHAGWAN SWAMINARAYAN, H.D.H. Guru Pramukh Swami Maharaj, and the Sadhus at Swaminarayan Akshardham in New Delhi, for blessing me in this work and in my future endeavors.

It gives me great pleasure to take this opportunity to thank and express my deep sense of gratitude and appreciation to my supervisor, Prof. M. Gopal, for offering me the opportunity to discover the world of machine learning, and reinforcement learning in particular. Sir provided me with ‘right from scratch’ support in formulating the research problem and in tackling the difficulties that arose, both motivational and logistical.

He taught me the finer details of research and encouraged me to perform beyond my own capabilities. Sir rescued me from many a difficult situation through his enormous experience and knowledge. My demands on his valuable time, for manuscript writing and discussions, have been quite phenomenal. However, he has always been generous with his time, a constant source of inspiration, and exceptionally supportive. More than just my thesis advisor, he has been my friend, philosopher, and guide in the truest sense. I can only summarize his contribution by saying that he will always be a true role model for me in pursuing my future academic goals.

I am also extremely grateful to my SRC members - Prof. Suresh Chandra, Prof. A. N. Jha, Prof. I. N. Kar, and Dr. S. Janardhanan - for their helpful suggestions and enduring patience during my research progress presentations. I also convey my sincere gratitude and respect to Prof. B. Chandra and Prof. R K Bhatt, who taught me all the relevant courses at IIT Delhi.


I owe my thanks to the Board of Management, G. H. Patel College of Engineering and Technology, and especially to Dr. C. L. Patel, Chairman of Charutar Vidya Mandal, Vallabh Vidyanagar, Gujarat, for sponsoring me to pursue my Ph.D. program at IIT Delhi.

My sincerest thanks to all my colleagues of the EC department who have given more love, respect, encouragement and acceptance than I could have ever hoped for.

I am grateful to the staff of the PG section, Electrical Engineering, and the Central Library for their valuable cooperation. I am indebted to the control lab staff members, Shri Jaipal Singh and Shri Virender Singh, for providing the facilities and assistance I needed to carry out my research work. I would also like to thank the Zanskar Hostel caretaker and mess employees for providing a homely environment and lovingly preparing special food. A very special thanks to former and present research scholars for their help - Dr. Rajneesh Sharma, Dr. Arunkumar, Dr. Deepak Adhyaru, Dr. Bharat Sharma, Dr. Mahendra Kumar, C.M.C Krishnan, Ethesham Hassan, Seema Sharma, Pankaj, Paramita, and Neeli Satyanarayana.

I would also like to thank Rohit Khandelwal and Ruchir Shukla for their compassionate suggestions at every pitfall and for their loving company. Their friendship was like a gentle breeze in a sultry atmosphere.

Finally, I would like to express my deepest personal gratitude to my father for teaching me the value of knowledge and time, and of course, for his love. Many thanks to my sister Falguni, brother-in-law Amitkumar, niece Akshara, my entire family and longtime friends for their unconditional support before and during this study.

Thank you to my son Yogi for his encouraging daily prayer and smiles.

Hitesh Shah


ABSTRACT

In this dissertation, the focus is on the design of control systems that are robust with respect to external disturbances and modeling uncertainties.

Most controllers are designed not on the physical plant to be controlled, but on a mathematical model of the plant; hence, these controllers often do not perform well on the physical plant and are sometimes unstable. Conventional robust control methods overcome this problem by accounting for uncertainty in the model. The result is a more general, less aggressive controller, which performs well on both the model and the physical plant. However, conventional robust control methods sacrifice some control performance in order to achieve stability.

Conventional robust control exploits significant a priori system knowledge in order to construct a high-performing controller that still guarantees stability; the resulting controller is fixed and rigid for all time. Conventional adaptive control methods, on the other hand, typically presuppose only a model structure for the plant a priori, and the parameters of this model are altered on-line. However, these schemes are limited in representational flexibility.

Another approach is to use soft-computing techniques such as neural networks, fuzzy logic, etc., with supervised learning. However, in this machine intelligence approach, the modeling and identification procedures for the dynamics of the given uncertain nonlinear system, and the controller design procedures, often become time-consuming iterative exercises. Much of this machine intelligence approach is still an empirical science.

In this dissertation, the emphasis is on a different design philosophy: reinforcement learning. Reinforcement learning (RL) is a machine intelligence approach that emphasizes learning by an agent from direct interaction with its environment, rather than learning from exemplary supervision or from expert knowledge of the environment. Recent advances relating reinforcement learning to dynamic programming are providing it with a solid mathematical foundation.
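For concreteness, the learning-from-interaction mechanism referred to here can be illustrated by the standard Q-learning update (a generic sketch; the specific algorithms and notation are developed in Chapter 1):

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \big],$$

where $(s_t, a_t, r_{t+1}, s_{t+1})$ is a transition observed through interaction with the plant, $\alpha$ is the learning rate, and $\gamma$ is the discount factor.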

Work in this dissertation is an attempt towards developing novel strategies in reinforcement learning for the design of robust control systems.

• Despite successful application of RL methods to large or continuous state spaces via various function approximation architectures, their applicability to control problems of interest has remained elusive. Loss of convergence guarantees and the choice of an approximator from among various alternatives are some of the problems. We discuss the bottlenecks that confront RL algorithms when they are applied to control, and the attempts made to overcome them.

• RL based approaches frame the controller optimization problem as finding an optimal control policy for an environment modeled as a Markov decision process (MDP). This modeling assumption severely limits the scope of application of RL methods, as it places a strong constraint on the structure of the environment. Further, the MDP formalism is quite restrictive in the sense that it allows only a single agent operating in a stationary environment. An alternative framework based on the theory of Markov games, for adaptive-optimal control of unknown nonlinear systems in the presence of external disturbances and uncertainties, gives better results. In the Markov game based formulation of controllers, the controller-disturber tussle is viewed as a two-player zero-sum Markov game, in which a ‘disturbing’ agent tries to produce the worst possible disturbance while a ‘control’ agent tries to produce the best control input. This problem is framed as finding the ‘minmax’ solution of a value function, as sketched below. We formulate the Markov game based RL control problem for decision making under uncertainty, wherein the control agent exploits the suboptimalities of the opponent. The use of function approximators in the Markov game setup is also thoroughly investigated.
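As an illustration of the minmax formulation (a standard minimax-Q sketch; the controller-specific algorithms are developed in later chapters), the value of a state $s$ and the corresponding Q-update can be written as

$$V(s) = \max_{\pi \in PD(A)} \min_{o \in O} \sum_{a \in A} \pi(a)\, Q(s, a, o),$$
$$Q(s, a, o) \leftarrow Q(s, a, o) + \alpha \big[ r + \gamma V(s') - Q(s, a, o) \big],$$

where $a \in A$ is the control agent's action, $o \in O$ is the disturbing agent's action, $PD(A)$ is the set of probability distributions over $A$, $\alpha$ is the learning rate, and $\gamma$ is the discount factor.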

• Control systems generally operate on-line. The mapping of on-line learning methods to RL control problems has not been given much attention in the literature. In this thesis, we propose to examine this aspect of RL-based control in depth. We aim to investigate the existing methods and to develop new methods for on-line learning in RL control applications.


• Model predictive control (MPC) is the most popular advanced control technique in the process industry. The essence of MPC is to optimize, over the manipulable inputs, forecasts of process behavior. The forecasting is accomplished with a process model, and therefore the model is the essential element of the MPC controller. However, the difficulty of obtaining a good model of a nonlinear process and the excessive computational burden associated with the control optimization have been serious obstacles to the widespread use of MPC in industrial implementations. We have used the RL framework, wherein the design and on-line learning are not based on a model; rather, they are driven only by evaluative feedback obtained during interaction with the plant. Explicit and exact modeling of the system dynamics is not required, and the machine learning algorithm realizes adaptivity to uncertainties without requiring any prior knowledge. The RL framework is, in fact, a means to deal with the issues arising in MPC: the system-model requirement, the computational complexity, and the suboptimality of actions due to the limited horizon over which actions are considered (see the sketch following this item).
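For reference, the repeated model-based optimization that MPC performs at each sampling instant can be written in the generic finite-horizon form below (a standard sketch; not the specific cost, constraints, or horizon used in Chapter 7). It is this model-based optimization that the RL framework replaces with learning from evaluative feedback:

$$\min_{u_k, \ldots, u_{k+N-1}} \; \sum_{i=0}^{N-1} \Big( \| x_{k+i} - x^{\mathrm{ref}} \|_{W_x}^2 + \| u_{k+i} \|_{W_u}^2 \Big) \quad \text{subject to} \quad x_{k+i+1} = f(x_{k+i}, u_{k+i}),$$

where $f$ is the process model, $N$ is the prediction horizon, and $W_x$, $W_u$ are weighting matrices.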

In addition to the work related to the above points, our contributions include control using a priori knowledge in the form of an approximate mathematical model of the plant, and on-line model learning for adaptive control of nonlinear systems.


Contents

Certificate . . . i
Acknowledgements . . . ii
Abstract . . . iv
Contents . . . vii
List of Figures . . . xiii
List of Tables . . . xvii

1 Introduction . . . 1
1.1 A Reinforcement Learning Formulation for Control . . . 3
1.1.1 Model-free reinforcement learning algorithms . . . 5
1.1.2 Exploration/exploitation dilemma . . . 8
1.1.3 Q-learning controller . . . 9
1.2 Reinforcement Learning and Function Approximation . . . 10
1.3 Game Theory Based Reinforcement Learning . . . 12
1.3.1 Markov game framework . . . 13
1.3.2 Minmax-Q . . . 14
1.3.3 Markov game based controller . . . 15
1.4 Dissertation Outline . . . 17

2 Value Function Approximation in Reinforcement Learning Control: A Comparative Study . . . 21
2.1 Fuzzy Q-learning . . . 22
2.2 Neural Q-learning . . . 25
2.3 Decision Tree Q-learning . . . 27
2.3.1 The decision tree structure . . . 28
2.3.2 The decision tree realization . . . 29
2.4 Support Vector Machine Q-learning . . . 31
2.5 Empirical Performance Comparison . . . 32
2.5.1 Controller learning details . . . 32
2.5.2 Simulation results and discussion . . . 35
2.6 Dynamic Fuzzy Q-learning . . . 42
2.6.1 Generation of fuzzy rules . . . 45
2.6.2 The DFQC learning algorithm . . . 48
2.6.3 Simulation results . . . 48
2.7 Hybridization of Model-based Approach with Reinforcement Learning . . . 50
2.7.1 Related work . . . 51
2.7.2 Basic concept and architecture . . . 52
2.7.3 Simulation results . . . 53
2.8 Concluding Remarks . . . 61

3 Fuzzy Decision Tree Function Approximation in Reinforcement Learning . . . 63
3.1 Background and Motivation . . . 63
3.2 Fuzzy Decision Tree Q-learning . . . 64
3.2.1 A fuzzy decision tree structure . . . 65
3.2.2 The fuzzy decision tree algorithm . . . 66
3.2.3 Fuzzy decision tree realization . . . 69
3.2.4 Empirical performance study . . . 71
3.3 Markov Game Fuzzy Decision Tree Controller (MGFDTC) . . . 81
3.3.1 The MGFDTC algorithm . . . 82
3.3.2 Pseudo-code for MGFDTC realization . . . 88
3.3.3 Empirical performance study . . . 89
3.4 Concluding Remarks . . . 96

4 Kernel Recursive Least Squares Function Approximation in Game Theory Based Control with Worst Case Design Strategies for Games Against Nature . . . 97
4.1 Background and Motivation . . . 97
4.2 Q-Learning System based on KRLS-SVM . . . 101
4.3 Markov Game Controller based on KRLS-SVM . . . 102
4.4 On-line KRLS-SVM Learning . . . 105
4.5 Learning Algorithm of KRLS-SVM Controller . . . 107
4.6 Controller Learning Details for Simulation . . . 108
4.7 Simulation Results . . . 110
4.7.1 Learning performance study . . . 110
4.7.2 Robustness study . . . 112
4.8 Concluding Remarks . . . 115

5 Minimal Resource Allocation Neural Network (mRAN) for a Reinforcement Learning Algorithm . . . 117
5.1 Background and Motivation . . . 117
5.2 Approximation of Value Function Using mRAN . . . 120
5.3 mRAN Learning Algorithm . . . 123
5.3.1 Growing step . . . 124
5.3.2 Pruning step . . . 126
5.4 Network Topologies and Learning Details . . . 126
5.4.1 mRAN Q-learning controller (mRANQC) . . . 127
5.4.2 Neural network Q-learning controller (NNQC) . . . 128
5.5 Simulation Results and Discussion . . . 128
5.5.1 Learning performance study . . . 128
5.5.2 Robustness study . . . 131
5.6 Concluding Remarks . . . 134

6 A Reinforcement Learning Algorithm with Evolving Neuro-Fuzzy Networks . . . 135
6.1 Background and Motivation . . . 135
6.2 Neuro-Fuzzy Systems . . . 137
6.2.1 Evolving fuzzy neural network (EFuNN) . . . 137
6.2.2 Dynamic evolving fuzzy neural network (DENFIS) . . . 140
6.3 Approximation of Value Function Using DENFIS . . . 142
6.4 Learning Process in DENFIS Online Model . . . 144
6.5 Network Topologies and Learning Details . . . 146
6.5.1 DENFIS Q-learning controller (DENFISQC) . . . 146
6.5.2 Dynamic fuzzy Q-learning controller (DFQC) . . . 147
6.6 Simulation Results . . . 147
6.6.1 Learning performance study . . . 148
6.6.2 Robustness study . . . 150
6.7 Concluding Remarks . . . 153

7 Model-Free Predictive Control Based on Reinforcement Learning . . . 155
7.1 Background and Motivation . . . 155
7.2 Model-free Predictive Control . . . 159
7.2.1 Learning framework . . . 160
7.3 Reinforcement Learning Controller . . . 161
7.3.1 Controller framework . . . 161
7.3.2 Controller architecture . . . 162
7.3.3 Pseudo-code for controller design algorithm . . . 166
7.4 Model Predictive Control . . . 167
7.5 Controller Realization . . . 168
7.5.1 CSTR dynamics and control . . . 169
7.5.2 Controller learning details . . . 169
7.6 Simulation Results and Discussion . . . 171
7.6.1 Learning performance study . . . 171
7.6.2 Robustness study . . . 173
7.7 Concluding Remarks . . . 176

8 On-line Model Learning for Policy Iteration . . . 179
8.1 Background and Motivation . . . 179
8.2 Model Learning Algorithm . . . 181
8.2.1 Aggregating information from successive trajectories . . . 183
8.2.2 Policy update step . . . 184
8.3 Model and Data for Simulation . . . 185
8.3.1 CSTR modeling and control . . . 186
8.3.2 Data for simulation . . . 186
8.4 Simulation Results and Discussion . . . 187
8.4.1 Learning performance study . . . 187
8.4.2 Robustness study . . . 189
8.5 Concluding Remarks . . . 193

9 Conclusions and Future Scope of Work . . . 195

Appendix A Inverted Pendulum Swing-up . . . 199
Appendix B Cart-Pole Balancing Task . . . 201
Appendix C Two-link Robot Arm Control . . . 203
Appendix D Continuous Stirred Tank Reactor (CSTR) . . . 205

References . . . 207
List of Publications . . . 219
Technical Biography of Author . . . 221
