Automated machine learning using particle swarm optimization

Academic year: 2022


AUTOMATED MACHINE LEARNING USING

PARTICLE SWARM OPTIMIZATION

PRATIBHA SINGH

DEPARTMENT OF ELECTRICAL ENGINEERING
INDIAN INSTITUTE OF TECHNOLOGY DELHI

MAY 2021


©Indian Institute of Technology Delhi (IITD), New Delhi, 2021


AUTOMATED MACHINE LEARNING USING

PARTICLE SWARM OPTIMIZATION

by

PRATIBHA SINGH

DEPARTMENT OF ELECTRICAL ENGINEERING

Submitted

in fulfillment of the requirements of the degree of Doctor of Philosophy to the

INDIAN INSTITUTE OF TECHNOLOGY DELHI

MAY 2021


Certificate

This is to certify that the thesis titled Automated Machine Learning using Particle Swarm Optimization, being submitted by Ms. Pratibha Singh to the Department of Electrical Engineering, Indian Institute of Technology Delhi, for the award of the degree of Doctor of Philosophy, is a record of bona fide research work carried out by her under our guidance and supervision. In our opinion, the thesis has reached the standard fulfilling the requirements of the regulations relating to the degree. The work presented in this thesis has not been submitted elsewhere, either in part or in full, for the award of any other degree or diploma.

Prof. Santanu Chaudhury

Department of Electrical Engineering
Indian Institute of Technology Delhi
New Delhi - 110016, India

Prof. Bijaya Ketan Panigrahi

Department of Electrical Engineering
Indian Institute of Technology Delhi
New Delhi - 110016, India


Acknowledgements

I would like to extend my sincere gratitude to my advisors, Prof. Santanu Chaudhury and Prof. Bijaya Ketan Panigrahi. Their important observations and suggestions helped me determine the direction of my research and widen its scope. This thesis was possible only because of their guidance, continuous support, and motivation. I will always be grateful for their kindness, patience, encouragement, and immense knowledge.

I am thankful to the members of my review committee, Prof. Niladri Chatterje, Prof. Sumantra Dutta Roy, Prof. K. K. Vishwas, and Prof. Jayadeva, for their valuable suggestions during my presentations. They were always very supportive and generous throughout the discussions.

I admire and express my gratitude to Prof. P. S. Gill and Prof. Shailesh Tiwari, who were always very supportive and encouraging of my finishing my Ph.D. throughout my tenure working under them. I am heartily thankful to my friends Ms. Deepti Goyal and Ms. Nidhi Singh for always being there with their unstinting support and continuous encouragement.

I acknowledge the significant contribution of my parents (Sri Shiv Nath Singh and Mrs. Gaya Kumari) and my brothers (Mr. Pradeep Singh and Mr. Sandeep Singh). Their support and encouragement have always been my inspiration.

I would like to express my heartiest gratitude and sincere appreciation to my husband, Mr. Shyamal Mishra, for his continuous support of all my efforts and his encouragement in pursuing my Ph.D.

Pratibha Singh


Abstract

The new era of Artificial Intelligence has seen rapid advances in machine learning algorithms and their applications across widespread domains, with many more capabilities still to emerge. The performance of these algorithms is highly sensitive to, and typically dependent on, the choice of parameters used to accomplish the assigned task. Human engineers remain central to the task of selecting parameters that yield satisfactory performance, and they are key contributors to the success of many machine learning algorithms; the quality of an algorithm's design therefore carries the overhead of this manual effort. Identifying a suitable set of parameters is difficult for non-experts in machine learning and, much of the time, even for experts. The solution to this problem is to automate the process of parameter selection, which is the objective of Automated Machine Learning (AutoML). AutoML makes machine learning algorithms convenient for everyone to use by automatically determining appropriate parameters from a large search space of all possible parameters, often surpassing the performance of human experts. Our work is inspired by the fast-evolving field of AutoML and by Particle Swarm Optimization (PSO), one of the most important population-based metaheuristic methods, which explores and exploits the search space for an optimal solution to improve the design of an algorithm. This thesis presents insight into the state-of-the-art methods already popular in AutoML. It has also been observed that random search, metaheuristic, and Bayesian optimization methods have made eminent contributions to the design of automated machine learning pipelines. We propose optimal architectures that use PSO for feature selection and for creating optimally weighted decision trees in a Random Forest Classifier (RFC).
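As a toy illustration of how binary PSO can drive feature selection for a classifier such as an RFC, the following sketch evolves bit-mask particles over a feature set. The fitness function here is a deliberately simple stand-in for the cross-validated classifier accuracy optimized in the thesis, and all names, constants, and swarm parameters are illustrative assumptions, not the thesis implementation.

```python
import math
import random

random.seed(0)

N_FEATURES = 10
INFORMATIVE = {0, 1, 2}          # toy ground truth; a stand-in for real data

def fitness(mask):
    # Stand-in for cross-validated classifier accuracy: reward selecting
    # the informative features, penalise every redundant feature kept.
    hits = sum(1 for i in INFORMATIVE if mask[i])
    extras = sum(mask) - hits
    return hits - 0.3 * extras

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_pso(n_particles=20, iters=60, w=0.7, c1=1.5, c2=1.5):
    # Each particle is a bit vector: bit d = 1 means feature d is selected.
    pos = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(n_particles)]
    vel = [[0.0] * N_FEATURES for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(N_FEATURES):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Velocity squashed to a bit-flip probability (binary PSO).
                pos[i][d] = 1 if random.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

mask, score = binary_pso()
print("selected features:", [i for i, b in enumerate(mask) if b])
```

In the actual pipeline the fitness evaluation would train and cross-validate the RFC on the selected feature subset; the swarm mechanics stay the same.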
The machine learning literature further guided us toward the most demanding problem of architecture design for neural networks in deep learning, which outperform many machine learning algorithms. Deep learning architectures are very popular but come with very high computational costs. The main challenge in deep learning is the optimal design of the neural network architecture and its hyperparameter optimization; it is infeasible to evaluate every candidate design comprising different combinations of architecture and hyperparameters. This motivated us to look for a metaheuristic algorithm that optimizes the search over a large space of possible configurations and drives the solution toward the global optimum. The proposed PSO-optimized architecture of the Convolutional Neural Network (CNN) is optimal, as confirmed by our results, but the learning model is expensive to generate, requiring many steps of evolution. This swarm-optimized, pre-trained CNN model is therefore subsequently deployed for other applications.
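The search over architecture and hyperparameter configurations described above can be sketched with a standard continuous PSO. Because actually training a CNN per evaluation is out of scope here, the sketch uses a smooth surrogate loss as a hedged stand-in for validation error; the search ranges, the "sweet spot", and all swarm parameters are hypothetical.

```python
import random

random.seed(1)

# Illustrative search ranges: (#conv layers, #filters, log10 learning rate).
LOW  = [1.0,  8.0, -4.0]
HIGH = [6.0, 64.0, -1.0]

def surrogate_loss(x):
    # Stand-in for validation error after training a CNN with this
    # configuration; a real run would train and evaluate the network.
    target = [3.0, 32.0, -2.5]      # hypothetical best configuration
    return sum(((a - b) / (h - l)) ** 2
               for a, b, l, h in zip(x, target, LOW, HIGH))

def pso(n=15, iters=100, w=0.6, c1=1.4, c2=1.4):
    dim = len(LOW)
    pos = [[random.uniform(LOW[d], HIGH[d]) for d in range(dim)]
           for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    pbest_f = [surrogate_loss(p) for p in pos]
    gi = min(range(n), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[gi][:], pbest_f[gi]
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp each coordinate to its legal hyperparameter range.
                pos[i][d] = min(HIGH[d], max(LOW[d], pos[i][d] + vel[i][d]))
            f = surrogate_loss(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

best, loss = pso()
layers, filters, lr = round(best[0]), round(best[1]), 10 ** best[2]
print(f"layers={layers} filters={filters} lr={lr:.4g} loss={loss:.4f}")
```

Decoding the continuous particle position into discrete settings (rounding layer and filter counts, exponentiating the learning rate) mirrors how a mixed discrete/continuous configuration space can be handled inside a standard PSO.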

We used transfer learning to extend the contribution of the PSO-optimized CNN model to various other applications without much additional effort. Our swarm-optimized models are robust and show significant improvements on the most popular standard classification datasets. The results are also compared with many approaches suggested in the most eminent and recent research in the relevant fields.
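The transfer step can be illustrated in miniature: reuse a trained feature extractor with frozen parameters and fit only a new classifier head on the target task. Everything below is a toy sketch under stated assumptions; the extractor, the target task, and the perceptron-style update are illustrative stand-ins for the pre-trained CNN and its retrained output layer.

```python
import random

random.seed(2)

# Hypothetical "pre-trained" feature extractor: its parameters stay
# frozen; only the classifier head below is trained on the new task.
def extractor(x):
    return [x[0] + x[1], x[0] - x[1], 1.0]   # fixed features plus a bias term

# Toy target task: label is 1 when x0 + x1 > 0.
data = []
for _ in range(200):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    data.append((x, 1 if x[0] + x[1] > 0 else 0))

head = [0.0, 0.0, 0.0]                        # the only trainable weights

def predict(x):
    z = sum(w * f for w, f in zip(head, extractor(x)))
    return 1 if z > 0 else 0

# Perceptron-style fine-tuning of the head; the extractor is never updated.
for _ in range(20):
    for x, y in data:
        err = y - predict(x)
        if err:
            for d, f in enumerate(extractor(x)):
                head[d] += 0.1 * err * f

acc = sum(predict(x) == y for x, y in data) / len(data)
print(f"head-only fine-tuning accuracy: {acc:.2f}")
```

Freezing the extractor is what makes transfer cheap: only the small head is optimized on the target data, which is the same economy the thesis exploits when redeploying the swarm-optimized CNN.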



Contents

Certificate i

Acknowledgements iii

Abstract v

List of figures xiii

List of Tables xv

1 Introduction 1

1.1 AutoML methods 2

1.2 Objectives and scope of work 5

1.3 Major contributions of the Thesis 9

1.4 Layout of thesis 12

2 AutoML: A Review 15

2.1 State-of-the-art methods in AutoML 16

2.1.1. Data Preprocessing 16

2.1.2. Feature engineering 17

2.1.3 Pipeline Synthesis 18

2.1.4 Hyperparameter optimization 19


2.1.5. Architectural search 20

2.1.6 Transfer Learning 21

2.1.7 AutoML packages 25

2.2 Automated Feature selection and model selection in Random Forest 27

2.3 Motivation for the proposed work 30

2.4 Summary 32

3 Optimal feature selection and weighted voting optimization in Random Forest Classifier using PSO 35

3.1 Particle Swarm Optimization (PSO) 38

3.1.1 Formulation of PSO approach 40

3.1.2 Algorithm of Original PSO 41

3.2 Random Forest Classifier 42

3.3. Proposed Approach of Particle Swarm optimized RFC 43

3.4. Sampling feature subsets from image 54

3.5. Hierarchical Classification 55

3.6. Experiments and Results 55

3.6.1. Experiments on dataset of historical monuments 57

3.6.2. Experiments on UCI repository dataset 60

3.6.3 Comparison of results with state-of-the-art methods 61

3.7. Conclusion 65

4 Multi-level Particle Swarm optimized Architecture and Hyperparameters in CNN 67

4.1. Introduction 67

4.2. Convolutional Neural Networks 70


4.3. Adaptive Particle Swarm Optimization 73

4.4. Proposed MPSO-CNN method 75

4.5. Hybrid MPSO-CNN Architecture 80

4.6. Implementation Results 84

4.6.1. Benchmark Datasets 85

4.6.2. Result Analysis 85

4.7. Conclusion and future work 87

5 PSO optimized Convolutional Neural Network with Transfer Learning 89

5.1. Introduction and Motivation 89

5.2. Proposed Method 95

5.3. Transfer Learning in PSO optimized CNN Architecture 101

5.4. Implementation and Result analysis 103

5.4.1. Benchmark Datasets 103

5.4.2. Implementation 104

5.4.3. Result Analysis 106

5.5. Conclusion 112

6 Conclusion 115

6.1 Contribution of PSO optimized Random Forest Classifier 116

6.2 Contribution of PSO optimized Convolution Neural Network 117

6.3 Contribution of Transfer Learning of PSO optimized Convolution Neural Network 119

6.4 Scope of future work 120

Bibliography 125

Publication 149

Biography 151


List of Figures

2.1 Automated Machine Learning Methods ……… 16

3.1 Original PSO ……… 42

3.2 PSO-RFC basic structure ……… 44

3.3 Flow chart of proposed PSO-RFC ……… 45

3.4 Feature generation with SIFT ……… 55

3.5 Hierarchical Classification: PSO-RFC ……… 56

3.6 Hierarchical Structure of Historical Monuments Dataset ……… 57

3.7 Two level classification for Historical Monuments’ dataset using Hybrid PSO-RFC and RFC ……… 59

4.1 Convolution Neural Network Architecture ……… 71

4.2 Hybrid MPSO-CNN Basic Building Block Diagram ……… 76

4.3 Hybrid MPSO-CNN Flow diagram ……… 78

4.4 Hybrid MPSO-CNN System Architecture ……… 80

4.5 Evolving CNN architecture using PSO for MNIST, CIFAR10 and CIFAR100 datasets ……… 83

4.6 Evolving CNN architecture using PSO for MNIST, CIFAR10 and CIFAR100 datasets ……… 87

5.1 Flow diagram of PSO optimized CNN Model ……… 100

5.2 Basic Framework of proposed Hybrid PSO optimized Deep Transfer Learning ……… 102

5.3 Network of CNN Model ……… 105

5.4 Accuracy vs. Epoch: Optimized CNN model using PSO ……… 106

5.5 Accuracy vs. Epoch on training and validation data and Loss vs. Epoch on training and validation data ……… 111


List of Tables

3.1 Various Versions of PSO ……… 39

3.2 (a) Nomenclature used in algorithm ……… 46

3.2 (b) Particle Swarm Optimization parameters ……… 47

3.3 Confusion Matrices of RFC and RFC-PSO for Historical Monuments dataset ……… 58

3.4 Comparison of RFC-PSO to RFC on selected UCI repository datasets ……… 61

3.5 Comparison of Accuracy on Test data ……… 62

3.6 Comparison of accuracy (%) on test data ……… 63

3.7 Comparison of accuracy (%) on test data ……… 63

4.1 Range of parameters ……… 82

4.2 Value of parameters ……… 82

4.3 Benchmark Datasets ……… 85

4.5 Comparison of experimental results ……… 86

5.1 Source Data: MNIST, Target Data: CIFAR-10 ……… 108

5.2 Source Data: CIFAR-100, Target Data: CIFAR-10 ……… 108

5.3 Source Data: MNIST, Target Data: CIFAR-100 ……… 109

5.4 Source Data: CIFAR-10, Target Data: CIFAR-100 ……… 109

5.5 Source Data: CIFAR-10, Target Data: MNIST ……… 110

5.6 Source Data: CIFAR-100, Target Data: MNIST ……… 110

5.7 Computation time using Transfer Learning and without using Transfer Learning ……… 112
