Design of Statistical and Swarm
Intelligence based Declustering Models for Spatio-Temporal Seismicity Analysis
Rahul Kumar Vijay 2014REC9004
Advisor: Dr. Satyasai Jagannath Nanda
Department of Electronics and Communication Engineering Malaviya National Institute of Technology, Jaipur
This dissertation is submitted for the degree of Doctor of Philosophy
MNIT Jaipur January 2020
I would like to dedicate this thesis to
my loving parents, my wife, and my angel: Pranika . . .
Declaration
I,Rahul Kumar Vijay, declare that this thesis titled, “Design of Statistical and Swarm Intelligence based Declustering models for Spatio-Temporal Seismicity Analysis” and the work presented in it, are my own. I confirm that:
• This work was done wholly or mainly while in candidature for a research degree at this university.
• Where any part of this thesis has previously been submitted for a degree or any other qualification at this university or any other institution, this has been clearly stated.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
• I have acknowledged all main sources of help.
• Where the thesis is based on work done by myself, jointly with others, I have made clear exactly what was done by others and what I have contributed myself.
Date:
Mr. Rahul Kumar Vijay Ph.D. Scholar (2014REC9004) Dept. of Electronics and Communication Engineering Malaviya National Institute of Technology, Jaipur
Certificate
This is to certify that the thesis entitled“Design of Statistical and Swarm Intelligence based Declustering models for Spatio-Temporal Seismicity Analysis” being submitted by Mr. Rahul Kumar Vijay (2014REC9004) is the bonafide research work carried out under my supervision and guidance in fulfillment of the requirement for an award of the degree of Doctor of Philosophy in the Department of Electronics and Communication Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan, India. The matter embodied in this thesis is original and has not been submitted to any other university or institute for awards of any other degree.
Place: Jaipur Date:
Dr. Satyasai Jagannath Nanda Assistant Professor Dept. of Electronics and Communication Engineering Malaviya National Institute of Technology, Jaipur
Acknowledgements
With my immense gratitude, I would like to thank Dr. Satyasai Jagannath Nanda for giving me this opportunity to pursue my PhD under his supervision. The insightful guidance and encouraging support has marked every step of my research work which has been an enjoyable learning experience both professionally and personally.
I would also like to thank Prof. Vishwanath Sinha, Prof. K.K. Sharma, Prof. M. Salim and Dr. Ravi Maddila for being my DREC members during the tenure of my Ph.D. They have provided me with all the suggestion and their valuable comments through the journey of my Ph.D. I am thankful to the DPGC convener Prof. Ghanshyam Singh for his immense support during my Ph.D.
I would like to express my deep gratitude to Prof. D. Boolchandani, Head, Department of Electronics and Communication Engineering, Malaviya National Institute of Technology Jaipur for providing friendly and stimulating atmosphere to complete the thesis work. I am grateful to University Grants Commission, Govt. of India for funding my PhD thesis work through NET fellowship. Also I would like to extend my thanks to our DSP and Image processing lab, MNIT library and Computer Center, Prabha Bhawan for providing me all the necessary hardware and software platform to make this work possible.
I would like to thanks my colleagues Dr. Urvashi P. Shukla, Rahul Ratnakumar, Dinesh Kumar, Rachana Gupta, Vikas Pathak for their cooperation at every stage during the entire dissertation work. At last but not the least unconditional love and support of my parents and wife have enabled me to pursue such a wonderful academic journey.
Finally, I would like to thank all the people who have been associated with this work directly or indirectly.
Rahul Kumar Vijay
Abstract
Earthquake clustering is the most prominent form of seismicity with signatures in space, time, and size (magnitude, intensity, and energy) domains that provide a deep description of the earthquake dynamics. Clustering in the spatio-temporal domain reveals a strong tendency of events to form compact clusters of irregular shapes and various sizes (comprised of foreshock-aftershock events) that are correlated in space and time over large distances. In this thesis, statistical and swarm-intelligence based methods are developed for comprehensive detection and analysis of earthquake clusters in space-time-energy domain. These proposed methods also have the potential to identify uniform seismic areas in the form of independent background (declustering seismicity) along with the earthquake clusters. The background seismicity has stationary behavior (represented by temporally homogeneous Poisson Process) that can be used for further hazard analysis.
Initially, well-known K-means and density based approach are implemented which considers the event’s occurrence time, location, magnitude and depth information present in an earthquake catalog. AK-means based tetra-stage clustering (TSC) model is proposed to classify the seismicity with analysis in a spatio-temporal domain. Due to difficulty in predicting the number of clusters (K-value) in TSC model, density based declustering model is developed which exploit the use of Gaussian Kernel in density estimation for spatio- temporal clustering analysis. Here, earthquake magnitude is also considered as a weight on each event for enhancing the cluster quality and finding the accurate number of clusters for a given catalog.
Furthermore, an intensity-based optimized space-time window is developed by using the two statistical parameters: Coefficient of Variation (Time domain) and Morisita index (Spatial domain). A shared-nearest neighborhood intensity based declustering model is proposed with strong validation and accurate analysis of results based on these parameters. Last contributed work is dedicated for swarm-intelligence based declustering model that effectively utilize the ergodic characteristics of a catalog using Thirumalai-Mountain (TM) metric. Here, an optimization-based framework is developed for solving the problem of seismicity declustering with the concept of TM metric. An effective Quantum Grey Wolf Optimization algorithm is proposed and tested on different benchmark functions. Then, a Quantum Grey Wolf
Optimizer based declustering model in an ergodic framework is proposed that is effectively applied to the real-life earthquake catalog (for solving the problem of seismicity declustering).
In this thesis, prominent earthquake catalogs are used for long-term spatio-temporal seismicity analysis (at long timescales). The proposed models are applied to the historical catalog of Southern California, Himalaya, Indonesia, Japan, Taiwan, Switzerland, Sumatra- Andaman, Philippines and Iran. Results are demonstrated with the use of epicenter plot, space vs time plot,λ plot, cumulative plot, and inter-event time vs distance plot. The superior performance of the proposed declustering models are demonstrated by comparing them with benchmark declustering methods.
Contents
List of Figures xix
List of Tables xxv
1 Introduction 1
1.1 Background . . . 1
1.2 Motivation . . . 2
1.3 Objective of the Thesis . . . 3
1.4 Structure of the Thesis . . . 5
1.5 Conclusion . . . 5
2 Literature Review 7 2.1 Background . . . 7
2.2 Benchmark declustering algorithms . . . 11
2.2.1 Gardner and Knopoff’s declustering approach . . . 11
2.2.2 Reasenberg’s declustering approach . . . 12
2.2.3 Stochastic declustering algorithm . . . 13
2.2.4 Model independent stochastic declustering (MISD) . . . 14
2.3 K-means clustering based approaches . . . 15
2.4 Fuzzy based clustering approaches . . . 16
2.5 Density based clustering approaches . . . 17
2.5.1 Density-based spatial clustering of applications with noise (DBSCAN) 17 2.5.2 Clustering by fast search and finding of density peaks (CFSFDP) . . 19
2.6 Ergodicity based approaches . . . 21
2.7 Entropy based approaches . . . 22
2.8 Inter-event time and distance based approaches . . . 23
2.9 Other approaches . . . 25
2.9.1 Hidden Markov Model (HMM) based declustering approach . . . . 25
2.9.2 Seismic density index based grid network . . . 25
2.9.3 Schuster spectrum-based approach . . . 26
2.9.4 Multi point (m)-Morisita index based approach . . . 26
2.9.5 A 3D nearest-neighbor distance (NND) based approach . . . 27
2.9.6 An approach based on Single link cluster analysis . . . 28
2.9.7 An approach based on multi-resolution wavelet analysis . . . 28
2.9.8 Non-overlapping clustering with evolutionary algorithm . . . 29
2.10 Earthquake catalogs . . . 29
2.10.1 California . . . 30
2.10.2 Himalayan . . . 30
2.10.3 Indonesia . . . 32
2.10.4 Japan . . . 32
2.10.5 Switzerland . . . 32
2.10.6 Taiwan . . . 32
2.10.7 Philippines . . . 33
2.10.8 Iran . . . 33
2.10.9 Sumatra-Andaman . . . 33
2.11 Conclusion . . . 34
3 A Tetra-stage Model for Declustering Earthquake Catalogs 35 3.1 Background . . . 36
3.2 Proposed Tetra-stage clustering model . . . 37
3.2.1 Earthquake catalog . . . 38
3.2.2 Depth thresholding . . . 38
3.2.3 Shallow events . . . 38
3.2.4 Mainshock identification . . . 38
3.2.5 Temporal cluster identification . . . 39
3.2.6 Spatial based cluster identification . . . 40
3.2.7 Categorization . . . 41
3.2.8 Magnitude based cluster identification . . . 41
3.2.9 Deep events . . . 42
3.3 Benchmark catalogs declustering . . . 42
3.3.1 Japan Catalog Analysis . . . 43
3.3.2 Himalaya Catalog Analysis . . . 44
3.3.3 Taiwan Catalog Analysis . . . 45
3.4 Result and Discussions . . . 49
3.4.1 Epicenter plot . . . 51
Contents xv
3.4.2 Space-time plot . . . 52
3.4.3 λ and Cumulative plots . . . 53
3.4.4 Inverse TM metric plot . . . 55
3.4.5 Comparison with benchmark declustering algorithms . . . 58
3.4.6 Comparison with the catalogs used by Nanda et al. (2013) . . . 59
3.5 Conclusion . . . 60
4 Density Peak Space-Time Clustering for Classification of Seismicity 61 4.1 Background . . . 62
4.2 Proposed spatio-temporal density peak clustering . . . 63
4.3 Earthquake catalogs used for analysis . . . 72
4.4 Classification results and analysis . . . 72
4.4.1 Epicenter plot . . . 74
4.4.2 Space-time plot . . . 75
4.4.3 λ and Cumulative plots . . . 76
4.4.4 Inverse TM metric plot . . . 80
4.4.5 Comparison with benchmark declustering methods . . . 81
4.5 Conclusion . . . 82
5 Shared Nearest Neighborhood Intensity Based Declustering Model for Spatio- Temporal analysis of Seismicity 83 5.1 Background . . . 84
5.2 Proposed shared nearest neighborhood intensity based declustering algorithm 85 5.2.1 Coefficient of Variation . . . 86
5.2.2 m-Morisita Index . . . 88
5.3 Earthquake catalogs used in the analysis . . . 89
5.3.1 Philippines . . . 89
5.3.2 Iran . . . 89
5.4 Simulation Results and Discussions . . . 90
5.4.1 Epicenter plot . . . 90
5.4.2 Space-time plot . . . 91
5.4.3 Inter-event time vs inter-event distance plot . . . 92
5.4.4 Cumulative andλ plot . . . 93
5.5 COV based temporal analysis of seismicity . . . 95
5.6 m-MI based spatial analysis of seismicity . . . 96
5.7 Comparative analysis of proposed method with benchmark declustering algorithms . . . 98
5.8 Conclusion . . . 100
6 Ergodicity based Earthquake Declustering Model using Meta-heuristics Opti- mization 101 6.1 Background . . . 102
6.2 An overview of Grey Wolf Optimizer . . . 103
6.2.1 Social hierarchy . . . 103
6.2.2 Hunting mechanism . . . 103
6.2.3 Exploration and Exploitation . . . 104
6.3 Quantum Grey Wolf Optimizer . . . 105
6.3.1 Simulation and comparative analysis of QGWO for benchmark func- tion optimization . . . 108
6.4 Thirumalai-Mountain metric for ergodicity analysis . . . 110
6.5 Proposed declustering model using QGWO algorithm . . . 113
6.5.1 Discretization of the Spatial region . . . 113
6.5.2 Identification of effective Ergodic periods . . . 113
6.5.3 Removal of events from the catalog . . . 114
6.5.4 Selection of the Events . . . 115
6.6 Earthquake catalogs used in the study . . . 116
6.6.1 Himalaya . . . 116
6.6.2 Indonesia . . . 116
6.6.3 Japan . . . 118
6.6.4 Switzerland . . . 118
6.6.5 Taiwan . . . 118
6.7 Results and Discussion . . . 118
6.7.1 Epicenter Plot . . . 118
6.7.2 λ and Cumulative plot . . . 123
6.7.3 Variance plot . . . 123
6.7.4 Convergence plot . . . 128
6.7.5 Comparison of proposed method with PGWO and variableε-DBSCAN based declustering approach . . . 129
6.7.6 Comparison with benchmark declustering methods . . . 129
6.8 Conclusion . . . 130
7 Conclusion and Future work 131 7.1 Conclusions . . . 131
7.2 Future works . . . 132
Contents xvii
Bibliography 135
Appendix A Euclidean and Haversine formula 145
Appendix B Benchmark Functions for Optimization 147
Publications from the Thesis work 151
Curriculam Vitae 153
List of Figures
1.1 Earthquake multi-event classification: It highlights a single (empty circle) event and family of event which consists of foreshocks (empty squares), mainshock (dark circle), and aftershocks (dark squares). . . 2 1.2 Foreshock-mainshock-aftershock sequence: It represents the occurrence
of foreshock and aftershocks related to (a) Tohoku mainshock with 9.0 magnitude and (b) Gorkha (Nepal) mainshock with 7.8 Magnitude. . . 3 1.3 Scope of the thesis. . . 4 1.4 Structure of the thesis. . . 5 2.1 Tohoku earthquake (2011) analysis: (a) Occurrence of earthquake epicenters
and (b) observed Bath law. . . 8 2.2 Tohoku earthquake (2011) analysis: (a) Frequency-Magnitude histogram and
(b) Observed Gutenberg-Richter law. . . 9 2.3 Tohoku earthquake (2011) analysis: (a) Observed Omori-Ustu law and its
(b) cumulative form. . . 11 3.1 A Tetra-stage cluster identification model for seismic catalogs. . . 37 3.2 Earthquake catalog of Japan: (a) epicenters with mainshocks (cluster cen-
troids highlighted in big black dots), (b) cumulative number of events w.r.t.
time, (c) Longitude-time plot and (d) Inverse TM metric plot. . . 43 3.3 Earthquake catalog of Himalaya: (a) epicenters with mainshocks (cluster
centroids highlighted by big black dots), (b) cumulative number of events w.r.t. time, (c) Longitude-time plot and (d) Inverse TM metric plot. . . 44 3.4 Earthquake catalog of Taiwan: (a) epicenters with highlighted mainshocks
(cluster centroids in big black dots) (b) cumulative number of events w.r.t.
time, (c) Longitude-time plot and (d) Inverse TM metric plot. . . 45
3.5 Cluster analysis for Japan: (a) Temporal clustering, (b) Time zone based clustering, Coordinate based clustering on events belong to (c) regular time zone (RTZ), Coordinate based clustering on events belong to (d) danger time zone (DTZ), Coordinate thresholding on (e) RTZ and (f) DTZ. . . 46 3.6 Cluster analysis of Himalaya: (a) Temporal clustering, (b) Time zone based
clustering, Coordinate based clustering on events belong to (c) Regular time zone (RTZ), and events belong to (d) Danger time zone (DTZ), Coordinate thresholding on (e) RTZ and (f) DTZ. . . 47 3.7 Cluster analysis of Taiwan: (a) Temporal clustering, (b) Time zone based
clustering, Coordinate based clustering on events belong to (c) Regular time zone (RTZ) and events belong to (d) Danger time zone (DTZ), (e) Coordinate thresholding on RTZ and (f) DTZ. . . 48 3.8 Earthquake epicenters in Japan for: (a) clustered, (b) declustered catalog
obtained from Nanda et al. model, (c) clustered and (d) declustered catalog obtained from proposed model. . . 49 3.9 Earthquake epicenters in Himalaya for: (a) clustered, (b) declustered catalog
obtained from Nanda et al. model, (c) clustered and (d) declustered catalog obtained from proposed model. . . 50 3.10 Earthquake epicenters in Taiwan for: (a) clustered, (b) declustered catalog
obtained from Nanda et al. model, (c) clustered and (d) declustered catalog obtained from proposed model. . . 51 3.11 Space-time plot of events occurred in Japan for: (a) clustered and (b) declus-
tered catalog obtained by Nanda et al. model, (c) clustered and (d) declustered catalog obtained from the proposed model. . . 52 3.12 Space-time plot of events occurred in Himalaya for: (a) clustered and (b)
declustered catalog obtained from Nanda et al. model, (c) clustered and (d) declustered catalog obtained from the proposed model. . . 53 3.13 Space-time plot of events occurred in Taiwan for: (a) clustered and (b)
declustered catalog obtained from Nanda et al. model, (c) clustered and (d) declustered catalog obtained from the proposed model. . . 54 3.14 Japan catalog analysis in terms of λ and cumulative plot: (a) number of
events (λ) per year and their (b) cumulative sum obtained from true, clustered and declustered catalog using Nanda et al. model, (c) number of events (λ) per year and their (d) cumulative sum obtained from true, clustered and declustered catalog using proposed model.. . . 55
List of Figures xxi 3.15 Himalaya catalog analysis in terms ofλ and cumulative plot: (a) number of
events (λ) per year and their (b) cumulative sum obtained from true, clustered and declustered catalog using Nanda et al. model, (c) number of events (λ) per year and their (d) cumulative sum obtained from true, clustered and declustered catalog using proposed model. . . 56 3.16 Taiwan catalog analysis in terms ofλ and cumulative plot: (a) number of
events (λ) per year and their (b) cumulative sum obtained from true, clustered and declustered catalog using Nanda et al. model, (c) number of events (λ) per year and their (d) cumulative sum obtained from true, clustered and declustered catalog using proposed model. . . 57 3.17 Inverse TM metric plot: Ergodic behavior of aftershocks obtained using
Nanda et al. (grey) and proposed model (black) for (a) Japan (b) Himalaya, and (c) Taiwan. . . 57 3.18 Comparative study on catalog used by with Nanda et al. model: (a)λ plot for
Japan using Nanda et al. model, (b) proposed model, (c)λ plot for Indonesia using Nanda et al. model and (d) proposed model. . . 59 4.1 Estimated temporal density: The figure represents position of local maxima
(square red) and minima (green circle) for catalog of: (a) California (b) Himalaya (c) Japan and (d) Sumatra-Andaman. . . 66 4.2 Determination of temporal classes: It represents the location of maxima (red
square) belongs to each class whose duration decided by the two nearby local minima points (green circle) for (a) California (b) Himalaya (c) Japan and (d) Sumatra-Andaman. . . 67 4.3 Proposed two-stage clustering model for analysis of Earthquake catalogs. . 68 4.4 Demonstration of proposed approach on California: (a) Unweighted lo-
cal density ρ w.r.t. 2D locations (neglect size of an earthquake) and the corresponding (b) decision graph (c) weighted local density ρw and their corresponding (d) Weighted decision graph, (e) 3D distribution of seismic events occurred in California during 1983-2004. (f) 3D earthquake clusters obtained from the proposed approach. . . 69 4.5 Epicenter plot: Earthquake epicenters of total (true) events present in the
catalog of (a) California (b) Himalaya (c) Japan and (d) Sumatra-Andaman. 74 4.6 Epicenter plot: Earthquake epicenters of obtained AFs using proposed ap-
proach for (a) California, (b) Himalaya, (c) Japan, and (d) Sumatra-Andaman. 75 4.7 Epicenter plot: Earthquake epicenters of obtained BGs using proposed ap-
proach for (a) California (b) Himalaya (c) Japan and (d) Sumatra-Andaman. 76
4.8 Space-time plot: Longitude vs. time distribution of a total events (true events) in the first column, obtained AFs (second column) and BGs (last column).
Here each row corresponds for catalog of California, Himalaya, Japan, and Sumatra-Andaman respectively. . . 77 4.9 λ plot where occurrence of events in a year for true, AFs and BGs events
present in: (a) California (b) Himalaya (c) Japan and (d) Sumatra-Andaman. 78 4.10 Cumulative plot where cumulative sum of true events, AFs and BGs for: (a)
California (b) Himalaya (c) Japan and (d) Sumatra-Andaman. . . 79 4.11 Cumulative rate of (a) Himalaya and (b) Sumatra-Andaman, obtained from
the variableε-DBSCAN algorithm [82]. . . 79 4.12 Inverse TM metric plot where ergodic behavior shown by true events, AFs
and BGs for: (a) California (b) Himalaya (c) Japan and (d) Sumatra-Andaman. 80 5.1 ProposedCOV andm-MI based SNN-IBD model to classify the seismicity . 87 5.2 Earthquake catalog of Philippines and Iran: (a) Longitude-Latitude plot, (b)
Latitude-time plot for Philippines catalog, (c) Longitude-Latitude plot and (d) Latitude-time plot for Iran. . . 90 5.3 FMD plot: Frequency-Magnitude distribution with (a)Mc=4.4 for Philip-
pines and (b)Mc=4.2 for Iran. . . 91 5.4 Cluster analysis using proposed SNN-IBD model: Epicenter plots of AFs (in
black dots) and BGs (in gray dots) for (a) Philippines (b) Iran. . . 91 5.5 Cluster analysis using proposed SNN-IBD model: Latitude vs time distribu-
tion of classified AFs for (a) Philippines and (b) Iran, BGs for (c) Philippines and (d) Iran. . . 92 5.6 ∆tvs∆rplot for classified events using SNN-IBD: The red points are AFs
and blue are BGs for (a) Philippines and (b) Iran. . . 93 5.7 Cumulative andλ plots: cumulative sum of events occurred in a year for (a)
Philippines (b) Iran, number of AFs in a year (λ) for (c) Philippines and (d) Iran, number of BGs in a year for (c) Philippines and (d) Iran. . . 94 5.8 Probability density function of∆t: Inter-event time of classified BGs (His-
togram) and their approximation (in red) for (a) Philippines and (b) Iran. . . 96 5.9 m-Morisita index: Im,δ (log-log scale), m∈2,3,4,5 for (a,d) true events,
(b,e) AFs, (c,f) BGs for Philippines and Iran. . . 97 6.1 Social hierarchy in Grey wolves. . . 103 6.2 Position update mechanism in GWO. . . 104 6.3 Implementation of QGWO. . . 105
List of Figures xxiii 6.4 Flowchart of Quantum behaved Grey Wolf Optimizer. . . 107 6.5 Convergence of QGWO and its comparison with GWO on benchmark functions.109 6.6 Epicenter plot: Earthquake epicenters present in (a) Himalaya, (b) Indonesia,
(c) Japan, (d) Switzerland and (e) Taiwan. . . 117 6.7 Inverse TM metric plot: ergodic behavior quantified from inverse TM metric
for (a) Himalaya, (b) Indonesia, (c) Japan, (d) Switzerland and (e) Taiwan catalog. . . 119 6.8 Cluster analysis using QGWO based model: epicenter of clustered events
obtained for: (a) Himalaya, (b) Indonesia, (c) Japan, (d) Switzerland, and (e) Taiwan. . . 120 6.9 Cluster analysis using QGWO based model: epicenter of declustered events
obtained for: (a) Himalaya, (b) Indonesia, (c) Japan, (d) Switzerland and (e) Taiwan. . . 121 6.10 λ plot obtained using QGWO based declustering model: It represents the
number of true events, AFs and BGs occurred in a year for (a) Himalaya, (b) Indonesia, (c) Japan, (d) Switzerland and (e) Taiwan. . . 122 6.11 Cumulative plot using QGWO based declustering model: It represents the
cumulative sum of true events, AFs and BGs w.r.t. time using (a) Himalaya, (b) Indonesia, (c) Japan, (d) Switzerland and (e) Taiwan. . . 124 6.12 Variance plot and its comparative analysis for: (a) Himalaya, (b) Indonesia,
(c) Japan, (d) Switzerland and (e) Taiwan. . . 125 6.13 Comparison of convergence achieved by QGWO and PSO based declustering
model for: (a) Himalaya, (b) Indonesia, (c) Japan, (d) Switzerland and (e) Taiwan. . . 126 6.14 Comparison analysis: (a) convergence and (b) variance obtained from the
PSO [91], PGWO [103] and proposed QGWO optimization algorithm for benchmark Southern California earthquake catalog. . . 127 6.15 Comparative analysis of cumulative plot: cumulative sum of events obtained
from QGWO and variableε-DBSCAN for (a) Sumatra-Andaman and (b) Himalaya earthquake catalog used in [82]. . . 127 A.1 Distance calculation based on: (a) Laws of Haversines (Spherical Trigonom-
etry) (b) Laws of Cosines (Plane Trigonometry). . . 145 B.1 2D representation of benchmark functions used in the study. . . 149
List of Tables
2.1 Magnitude difference between mainshock and largest aftershock for New Zealand seismic region . . . 8 2.2 Earthquake sequences and correspondingcand pvalues . . . 10 2.3 Magnitude dependent space-time windows for aftershock identification ac-
cording to [13] . . . 12 2.4 Magnitude dependent space-time windows for aftershock identification ac-
cording to [14] . . . 12 2.5 Default input parameters and their range used in Reasenberg’s method . . . 13 2.6 Parameter setting to obtain catalogs used in the study . . . 31 3.1 Categorization of the shallow events . . . 41 3.2 Comparison of proposed model with benchmark declustering algorithms . . 58 3.3 Comparison of proposed model with Nanda et al. model [1] (same catalog:
Japan, Indonesia and California) . . . 60 4.1 Parameter list for temporal density estimation for each catalog using [83]. . 65 4.2 List of significant earthquakes occurred in California (1983-2004) . . . 71 4.3 List of significant earthquakes occurred in Himalaya (1965-2015) . . . 71 4.4 List of significant earthquakes occurred in Japan (1965-2015) . . . 73 4.5 List of significant earthquakes occurred in Sumatra-Andaman (1965-2015) . 73 4.6 Comparison of the proposed approach with benchmark declustering methods 81 5.1 Morisita slopeSmof Philippines . . . 96 5.2 Morisita slopeSmof Iran . . . 98 5.3 Comparison of proposed SNN-IBD with benchmark declustering methods . 99 6.1 Best mean fitness values over 100 trial runs of different algorithms. . . 110 6.2 Comparison of mean fitness obtained from proposed QGWO and other
algorithms . . . 111
6.3 Constraints used in proposed declustering model based on QGWO algorithm 115 6.4 Comparison of proposed method with recent declustering approaches on
same datasets. . . 128 6.5 Comparison of proposed model with benchmark declustering methods . . . 128 B.1 Information about the 2D benchmark test functions . . . 147 B.2 Information about of high-dimensional benchmark functions . . . 148
C
HAPTER1
Introduction
Earthquakes are considered as the most devastating natural calamities that arrive without a prior warning. Its after-effects, cause widespread damage and loss of human lives within a few seconds. The deadliest earthquakes followed by tsunamis occurred in past like Sumatra- Andaman (Indian Ocean) in 2004, Tohuku (Japan) in 2011, and there is a probability that more may occur in the future due to the reshaping of earth’s appearance as an active geological system. Nowadays, efforts have been put up to develop tools and techniques that determine the causes and consequences of these unpredictable natural disasters. This is going to help the governments to take precautionary measures in time and save human lives.
1.1 Background
The occurrence of earthquakes is considered as a complex phenomenon in the space-time domain. An in-depth analysis of earthquake occurrence reveals that they do not occur randomly in space and time. The spatial and temporal distributions of regional earthquakes indicate that they primarily concentrate on the tectonic plate boundaries (spatial clustering) and close in time to large events (temporal clustering). Generally, earthquake population is divided into two components: (1) Earthquakes which are independent of all previous earthquakes. They are called eitherBackgroundorSingleevent. Independent earthquakes occur due to the regular movement or loading of tectonic plates. These independent events are homogeneous in time, i.e., a stationary Poisson process. (2) Earthquake events which are dependent on each other. These type of earthquakes belong to a family. The earthquake with the largest magnitude in the family is calledmainshock. If there are several earthquakes with the largest magnitude within a family, the first one is considered to be the mainshock; hence, each family has a single mainshock. The events that occur after the mainshock are known as aftershocks/offsprings. All events that occur prior to the mainshock are termed asforeshocks.
Fig. 1.1. Earthquake multi-event classification: It highlights a single (empty circle) event and family of event which consists of foreshocks (empty squares), mainshock (dark circle), and aftershocks (dark squares).
The multiple events classification is shown in Fig.1.1. The foreshock-aftershock sequences are related to the mainshock, which forms earthquake clusters. The family of offspring events is categorized as dependent earthquakes because they depend on their ancestor. The events that comprise of foreshock and aftershocks (offspring) are part of the triggering process due to the changes in static and dynamic stress. Thus these earthquakes are also called triggered or correlated events.
A real-life example of foreshock-mainshock-aftershocks occurrence is shown in Fig.1.2(a) before and after the deadliest Tohoku (Japan) earthquake on 11th March 2011. The second example (in Fig.1.2(b)) belongs to Gorkha (Nepal) earthquake which occurred on 25th April 2015. In both the cases, earthquake events are observed with respect to their occurrence time and magnitude to distinguish the forshock (few events before the occurrence of mainshock in red), mainshock (large event with black star) itself and aftershocks (successive events after the occurrence of mainshock with blue lines). However, these characteristics are highly dependent on the environment conditions, earthquake fault type and soil behavior etc.
1.2 Motivation
Modeling the behavior of earthquake occurrence remains a challenging task due to the inher- ent presence of clustering phenomenon in terms of foreshock, aftershock and a mainshock.
1.3 Objective of the Thesis 3
Date
9/3/2011 11/3/2011 13/3/2011 15/3/2011 17/3/2011 19/3/2011 21/3/2011 23/3/2011
Magnitude
5 5.5 6 6.5 7 7.5 8 8.5 9 9.5 10
Foreshocks Aftershocks Mainshock
(a)
Date
21/4/2015 24/4/2015 27/4/2015 01/5/2015 04/5/2015 07/5/2015 10/5/2015
Magnitude
4 4.5 5 5.5 6 6.5 7 7.5 8
Foreshocks Aftershocks Mainshock
(b)
Fig. 1.2. Foreshock-mainshock-aftershock sequence: It represents the occurrence of fore- shock and aftershocks related to (a) Tohoku mainshock with 9.0 magnitude and (b) Gorkha (Nepal) mainshock with 7.8 Magnitude.
The correlation among the earthquakes and clustering (aftershock identification) in multidi- mensional space (location, time, magnitude and depth, etc.) needs to analyze carefully for any short-term forecast and hazard assessment of a seismic region. Accurate identification of foreshock-aftershock sequences and their removal from an earthquake catalog leads to seismicity declustering. It also represents the unbiased estimation of seismicity.
The clustered nature of earthquakes has motivated many researchers to investigate a deep understanding of the spatio-temporal organization of seismicity. It defines the rela- tion/condition by which events are related in space and time. This leads to the development of declustering models for effective separation of clustered and background seismicity. The random and independent nature of background seismicity is useful to evaluate the pre and post-seismic hazards from the earthquake. The short and long-term earthquake forecasting and prediction researches are feasible after the effective seismicity declustering.
1.3 Objective of the Thesis
The identification and separation of background (independent) and clustered events from an earthquake catalog is a very challenging task due to the overlapping and complex nature of the problem in multiple domains: time, location, magnitude, and depth, etc. The objective of the thesis is to analyze the occurrence of seismic activities in earthquake-prone regions by considering prominent features of the catalog. The thesis proposes a number of new models which determine the effective solutions of this problem. Scope of the thesis is shown in Fig.1.3 and main contributions are summarized as follows:
Fig. 1.3.Scope of the thesis.
• The work proposed by Nanda et al. [1] is extended by developing a tetra-stage clustering model. The model considers the depth information of a catalog along with time, location and magnitude information for analysis. It is based on K-means clustering approach. In the model, large events having shallow (less depth) focus are treated as cluster centroids. Then, the events get associated with the centroids using Euclidean distance. This leads to an effective cluster formulation.
• The density-based algorithms [2, 3] are implemented for declustering in the space- time-magnitude domain. They accurately determine the arbitrary shape and varying size of the clusters along with backgrounds.
• The Coefficient of Variation (COV) andm-Morisita index have been explored for the spatio-temporal analysis of seismic activities in a region. A selective space-time- magnitude window is adjusted to detect earthquake clusters and removal of them leads to background seismicity.
• A swarm intelligence based Quantum Grey Wolf Optimizer (QGWO) is developed to remove the clustered events in a seismic prone region covered by the grid cells.
This approach considers the Thirumalai-Mountain (TM) metric to study the ergodic properties of an earthquake catalog. It formulates the declustering phenomenon as an optimization problem.
1.4 Structure of the Thesis 5
Fig. 1.4.Structure of the thesis.
1.4 Structure of the Thesis
The thesis is organized into seven significant chapters as shown in Fig 1.4.Chapter onebegins with the introductory part of the thesis. It highlights the background, motivation, problem statement and organization of the thesis. Chapter twoprovides a detailed literature review about the fundamental laws and algorithms used in the thesis for clustering and classification of seismic activities. Chapter three, four, five, and sixare the backbone of the thesis which describes the contributed work with the development of new approaches to analyzing the long term seismic activities in terms of identification of clustering (foreshock-aftershocks) and regular patterns (independent backgrounds). The conclusion and future scope are given inChapter seven.
1.5 Conclusion
In this chapter, a blueprint of the thesis is presented. The phenomenon and complexity associated with the occurrence of earthquakes are discussed. Multiple event classification of earthquakes and their characteristics to discriminate them have been addressed. The problem of seismicity declustering and the effective solutions have been mentioned along with motivation, objectives/contribution, scope and structure of the thesis in a nutshell.
C
HAPTER2
Literature Review
This chapter reviews fundamental laws and relation for understanding the behavior of earthquake occurrence in a multi-dimensional domain. It helps to do the cluster analysis with respect to time, event position, magnitude, and depth. It also discusses the benchmark algorithms and methods used in the thesis work for comparative analysis. It also addresses the tools and techniques which are explored by the researchers to quantify the spatio-temporal clustering of an active seismic region.
2.1 Background
Firstly, a brief overview of the fundamental laws of aftershocks is described based on their spatial, temporal and amount of the energy (magnitude) released by them. It helps to define a condition or relation for modeling the aftershock activities in a seismic region. These laws are demonstrated on real-life data available from the Northern California Earthquake Data center [4] which belongs to Tohoku earthquake sequences by taking 100 days time interval as shown in Fig.2.1(a). A time series to quantify the presence of foreshock-mainshock-aftershock sequences due to the Tohoku earthquake is illustrated in Fig.1.2 (a) in Chapter one.
1. Bath law: It has been reported in [5] that the earthquake magnitude of the largest aftershock is about 1-2 units less than that of the mainshock. This is a purely empirical relation followed by shallow earthquakes. The difference of mainshock magnitude and largest aftershock for New Zealand (NZ) earthquake sequences is given in Table 2.1.
The occurrence of Tohoku earthquake and their associated events follows the Bath law as depicted in Fig.2.1(b) where it shows the difference between mainshock and largest aftershock about 1.2M in Richter scale.
Longitude
120 125 130 135 140 145 150
Latitude
20 25 30 35 40 45 50
(a)
Time (Year)
2011.15 2011.2 2011.25 2011.3 2011.35 2011.4 2011.45 2011.5 2011.55
Magnitude
4 5 6 7 8 9 10
=1.2M (Difference)
(b)
Fig. 2.1.Tohoku earthquake (2011) analysis: (a) Occurrence of earthquake epicenters and (b) observed Bath law.
2. Gutenberg-Richter (GR) law: The frequency of earthquake occurrence with respect to the magnitude is described by G-R relation [6]. It is a power law given by
logn(m) =a−b×m
n(m) =10a−b×m (2.1)
wheren(m)is the number of earthquake occurrence having magnitude≥m. Thea andbare constants for a particular sequence. The "a" indicates the total seismicity rate in a region. Theb-value is close to unity. These values are determined either from the least square method or maximum likelihood estimation. The frequency-magnitude histogram in a time interval of 100 days after the occurrence of the Tohoku earthquake is shown in Fig.2.2(a). It shows that moderate seismicity (intermediate magnitude) has the highest frequency of earthquake and then decreases as the magnitude increases.
Table 2.1 Magnitude difference between mainshock and largest aftershock for New Zealand seismic region
Sequence Magnitude
Mainshok Largest Aftershock Difference
Fiordland 1960 7.0 5.6 1.4
Westport 1962 5.9 5. 6 0.3
Gisborne 1966 6.2 5.0 1.2
Sedden 1966 6.1 4.1 2.0
Inangahua 1968 7.0 5. 9 1.1
2.1 Background 9
Magnitude
3 4 5 6 7 8 9 10
n(m)
0 100 200 300 400 500 600
(a)
Magnitude (m)
2.5 3 3.5 4 4.5 5 5.5 6 6.5 7
log n(m)
1 2 3 4 5 6 7 8 9 10 11
(b)
Fig. 2.2.Tohoku earthquake (2011) analysis: (a) Frequency-Magnitude histogram and (b) Observed Gutenberg-Richter law.
Thus the Gutenberg-Richter law holds true as defined in Eq.(2.1). The parameters a=11.99 andb=1.46 are determined from maximum likelihood method.
3. Omori-Ustu law (Time domain):
Initially, Omori [7] reported an inverse power law to describe the aftershock frequency with time in the following form:
n(t) = K
t+c (2.2)
where n(t) is the aftershock decay rate from occurrence time (t) of the mainshock, K and care constants and their values depend on a given sequence. The ‘c’ value is usually less than a day typically in the range 0.5-20 Hr. This relation is reported for aftershocks generated from Nobi earthquake (1891, 8.0 M). Ustu [8] revisited the analytical relation described in Eq.(2.2) and modified it which provides a very close approximation to the data. It is given by
n(t) = K
(t+c)p (2.3)
This expression is known as “modified Omori formula” or “Omori-Utsu law” which has an additional exponent parameter p. This law is extensively used to analyze, model and forecast aftershock productivity in time domain. After applying the above relation to more than 200 aftershock sequences, pvalue (generally greater than 1) is found in the range 0.6-2.5 [9]. The typical value of ‘c’ and ‘p’ for regional sequences are
mentioned in Table 2.2. Reasenberg and Jones [10, 11] modulated the Omori-Utsu relation by considering magnitude distribution of aftershocks and described it, in the form of conditional intensity function. Mathematically, it is written as
λ(t,m) = K×s(m)
(t+c)p (2.4)
where s(m) is the magnitude probability density function as described in GR law.
The temporal decay of aftershock activity after the occurrence of Tohoku mainshock (shown in Fig.2.3(a) by blue dots and a corresponding fit is shown in red line as Omori-Ustu formula). The value of parameters defined in Eq.(2.3) are determined as K=921.8,p=1.251 andc=9.479×10−9. The cumulative number of earthquake events after the occurrence of mainshock in 100 days interval is shown in Fig.2.3(b) as a cumulative Omori-Ustu formula.
4. Ustu formula (spatial domain): Ustu [12] also suggested an empirical relation to describe the nature of aftershocks (in the spatial domain) according to earthquake magnitude. It is given by
logD=0.5×M−1.8 (2.5)
where D is the linear dimension of aftershocks in Km following an earthquake of magnitude M. It has been observed that the areal extent of the largest aftershocks of the Inangahua earthquake (New Zealand) withM=7 cover an area 45×25 Km approximately. It is a good agreement to a relation defined in Eq.(2.5).
These fundamental laws are helpful to analyze the aftershock activities in the space-time- magnitude domain. These laws are also used extensively by many of the researchers to do the in-depth analysis of seismic activities which are applicable to forecast, hazard evaluation, development of declustering models, etc. for an earthquake catalog.
Table 2.2 Earthquake sequences and correspondingcand pvalues Earthquake c(days) p
Fiordland, 1960 0.10 1.12 Gisborne, 1966 0.22 1.47
2.2 Benchmark declustering algorithms 11
t ( days after the Mainshock)
0 10 20 30 40 50 60 70 80 90 100
n(t)
0 100 200 300 400 500 600 700 800 900 1000
Oberved data Omori-Ustu Predicted
(a)
t (days after the Mainshock)
0 10 20 30 40 50 60 70 80 90 100
Cumulative no. of Events
500 1000 1500 2000 2500 3000 3500
∆ t=10 d
∆ t=10 d
(b)
Fig. 2.3. Tohoku earthquake (2011) analysis: (a) Observed Omori-Ustu law and its (b) cumulative form.
2.2 Benchmark declustering algorithms
This section discusses benchmark algorithms mostly used for declustering an earthquake catalog. It is narrated as follows:
2.2.1 Gardner and Knopoff’s declustering approach
Declustering based on windowing techniques in a spatio-temporal domain is the simplest approach which identifies the mainshocks, foreshock-aftershocks, and backgrounds present in an earthquake catalog. Initially, Gardner and Knopoff [13] described a space-time window by taking time intervalT(m)and distance intervalL(m)for identifying subsequent quakes for each earthquake having magnitudem. All events lie within the window are considered as aftershocks otherwise treated as backgrounds. They manually identified the aftershocks in a shortened sample of the catalog which corresponds to the magnitude dependent space-time window as given in Table 2.3. Gardner and Knopoff [14] again revisited the concept of space-time window [L(m),T(m)] by providing the specific length (Km) and duration (days) of the window as a function of magnitudem(Table 2.4) to identify the aftershocks. They conducted a chi-square test on the aftershock-depleted catalog for Southern California and shown that after removal of aftershocks from the catalog, residual catalog are Poissonian in time. An approximation of GK window [L(m),T(m)] [14] with optimized parameters for Southern California is given by:
Table 2.3 Magnitude dependent space- time windows for aftershock identifica- tion according to [13]
M L(Km) T(days)
≥4.99 20 100
5.0-5.49 40 150
5.5-5.99 70 200
6.0-6.49 100 280
6.5-6.99 180 400
7.0-7.49 300 650
7.5-7.99 400 1000 8.0-8.49 700 1000 8.5-8.99 900 1000
Table 2.4 Magnitude dependent space- time windows for aftershock identifica- tion according to [14]
M L(Km) T(days)
2.5 19.5 6
3.0 22.5 11.5
3.5 26 22
4.0 30 42
4.5 35 83
5.0 40 155
5.5 47 290
6.0 54 510
6.5 61 790
7.0 70 915
7.5 81 960
8.0 94.0 985
L(m) =100.1238×m+0.983 and T(m) =
100.032×m+2.7389, ifm≥6.5 100.5409×m−0.547, otherwise
(2.6)
An alternative choice of window with different parameter settings are reported by Gru- enthal given at Community Online Resource for Statistical Seismicity Analysis (CORSSA [15]) and Uhrhammer [16] with Eq. (2.7) and Eq. (2.8) respectively.
L(m) =e1.77+(0.037+1.02×m)2
and T(m) =
|e−3.95+(0.62+17.32×m)2|, ifm≥6.5 102.8+0.024×m, otherwise
(2.7)
L(m) =e−1.024+0.804×m and T(m) =e−2.87+1.235×m (2.8)
2.2.2 Reasenberg’s declustering approach
Reasenberg [17] proposed a method by linking earthquake events to make a pair (cluster) based on their spatial and temporal interaction zones. The boundary of a spatial zone is considered according to the stress distribution near the mainshock by taking the following threshold:
2.2 Benchmark declustering algorithms 13 Table 2.5 Default input parameters and their range used in Reasenberg’s method
Parameter Standard Min Max
τmin(days) 1 0.5 2.5
τmax(days) 10 3 15
p1 0.95 0.9 0.99
xk 0.5 0 1
Mminc 1.5 1.6 1.8
rf act 10 5 20
logd(Km) =0.4m0−1.943+k (2.9) wherekis ‘1’ for the distance to a largest earthquake or ‘0’ for the distance to the last one.
Whereas the temporal boundary of the zone is defined by Omori’s law. The Reasenberg method is applied to detect the foreshock-aftershocks present in Central California during 1969-1982 for events havingm≥4. The algorithm is demonstrated with the use of standard input parameters and their range are mentioned in Table 2.5. The parameterτmin andτmax represent the minimum and maximum look- ahead time of observing the next earthquake at a certain probability p1. The look-ahead timeτ in the algorithm is determined as:
τ= −ln(1−p1)×t
10[2×(∆m−1)/3] (2.10)
where Omori rate decay exponent assume to be 1 and∆m=mmainshock−mcmin. Themcmin denotes the minimum magnitude cutoff for the earthquake catalog. Thexkis a factor by which mcmin value can be increased during the cluster formation andmcmin=mcmin+xk×ml, where ml is the magnitude of the largest earthquake in a cluster. The parameterrf act represents number of crack radii [18] surrounding each earthquake within new event, considered to be a part of the cluster.
2.2.3 Stochastic declustering algorithm
Instead of using a deterministic approach (like earlier two), Zhuang et al. [19] proposed a probabilistic approach, called “Stochastic Declustering”. This approach is based on Epidemic Type Aftershock Sequence (ETAS) clustering model proposed by Ogata et al. [20, 21] and it is represented by a conditional intensity function as
λ(t,x,y) =µ(x,y) +
∑
i:ti<t
k(mi)g(t−ti)f(x−xi,y−yi|mi) (2.11)
where µ(x,y)is a time independent background intensity function. Theg(t)and f(x,y|mi) are probability density function of the occurrence time and location respectively. The k(mi)denotes expected number of offspring (aftershocks) generated from a mainshock of magnitudemk.
Stochastic declustering method uses a thinning procedure to assign a probability for choosing an event either as a background or a triggered. Initially, events are ordered in a chronological way from 1 toN. Then, probability of an event jbeing triggered byithevent is estimated by observing the relative contribution of theith event to the occurrence rate at the time and location of event j.
ρi j= k(mi)g(tj−ti)f(xj−xi,yi−yj|mi)
λ(tj,xj,yj) (2.12)
Similarly, the probability to define an event jis a background or a triggered event is given by
ηj= µ(xj,yj)
λ(tj,xj,yj) and ρj=1−ηj=
j−1
∑
i=1
ρi j (2.13)
2.2.4 Model independent stochastic declustering (MISD)
Marsan and Lengline [22] developed a robust stochastic declustering algorithm which is applicable to any other seismicity model (other than ETAS) and hence termed as Model Independent Stochastic Declustering (MISD). According to this approach, an earthquakee of magnitudeme lies in the interval[mi,mi+1]and occurred at timete, triggers aftershocks at locationxand timet>tehaving conditional intensity function as
λe(x,t) =
∑
j
∑
k
λi,j,kθ(tj≤t−te<tj+1)θ(rk≤re(x)<rk+1) (2.14) where indexi, j, andkrepresents magnitude, time and distance respectively for unknownλi jk. θ(X) =1 ifX is true, ‘0’ otherwise. The[tj,tj+1]and[rk,rk+1]represents discrete intervals in time and distance. There(x)is the 2D or 3D distance between triggered earthquake and event location atx. In the beginning, MISD algorithm considers intervals in magnitude, time and distance. Then Expectation-Maximization (EM) algorithm is applied to determine the bestλi jk. The EM algorithm is based on iterative count of probabilitiesρabthat earthquakea triggered by earthquakeb, andφbthatbis a background earthquake.
In this approach, the final solution does not depend on initial choice ofλi jkandµ. Here, expectation and maximization operation is iterated until convergence ofλi jk andµ. At the
2.3K-means clustering based approaches 15 last, declustered catalog is obtained fromφbwhere earthquakebis considered background if its value is less than uniformly generated random number.
2.3 K -means clustering based approaches
K-means algorithm [23] is the most straightforward approach to do cluster analysis of any unlabeled dataset. This algorithm is highly popular because of its easier implementation and has the capability to handle large datasets with quick convergence in most practical cases.
k-means algorithm partitions a set into K clusters; C={ck,k=1,2,3. . . .} of dataset X ={xi},i=1,2,3. . . .nin such a way that the squared error between the empirical mean of a cluster and the points in cluster is minimized. Here, information ofK is given by the user.
Weatherill and Burton [24] described a K-means clustering approach to partition an earthquake catalog of Aegean. They first described the point-sourceK-means approach and then introduced a line source development of the algorithm which is suitable for (the cluster having an analogous shape) known active faults present in the region. Krzanowski and Lai (KL) index and Monte Carlo seismic hazard analysis is used to determine the optimum number of clusters or seismic source zones. Rehman et al. [25] is also reported a similar kind of approach to determine the potential seismogenic sources present in Pakistan. The model is highly useful to determine earthquake hazards and to mitigate high earthquake risk around the country. The optimum number of clusterKis determined in the model by using KL (Krzanowski and Lai) index as follows:
KL(K) =
DIFF(K) DIFF(K+1)
(2.15)
whereDIFF(K) = (K−1)2/dW KK−1−K2/dW KK;d is the number of dimension of data, W KK−1 andW Kk is sum of square distances within the cluster of (K−1) and K partition.
Based on this index,K=19 is determined as an appropriate choice for the number of clusters.
Morales-Esteban et al. [26] proposed a pattern recognition scheme based onK-means clustering algorithm to forecast earthquakes with magnitude greater or equal to 4.5. In this method, each earthquake event is represented by three features: seismic magnitude,b-value and the date of occurrence of an earthquake. The Silhouette index (SI) is used as a cluster validity index in this method and obtained the optimum 27 seismic zones for Spain and Portugal. The results are demonstrated on Spanish seismic temporal data with a sensitivity and a specificity which are 90.00 % and 82.56% in an area no: 26 and 79.31% and 90.38%
in area no.27.
Morales-Esteban et al. [27] again introduced an adaptive MahalanobisK-means algorithm to determine a globally optimal partition of a seismic region with an optimum number of clusters. In this algorithm, an adaptive Mahalanobis distance function is used to discover not only circular but also elliptical shapes clusters which effectively fitted to the earthquake faults present in a region. This algorithm is successfully applied to construct the seismic zoning map of Croatia and Iberian Peninsula. The authors determined the most appropriate number of cluster based on the four validity indexes: Silhouette Width Criterion (SWC), Davies–Bouldin index (VDB), Calinski–Harabasz index (VCH), and Area index.
Nanda et al. [1] considered a K-means clustering approach by developing a tri-stage model for cluster analysis of historical seismic catalogs of Indonesia, Japan, and California.
The model groups earthquake events according to their coordinate, occurrence time and magnitude in the three different stages and finally classified them either as background or foreshock-aftershocks. In this method, they manually determined the choice of a fixed number of cluster centroids based on the magnitude information present in the catalog.
2.4 Fuzzy based clustering approaches
Fuzzy based clustering approaches [28] are mostly suitable for overlapping datasets where points may belong to more than one cluster based on a membership grade. It indicates the degree by which points belong to different clusters. In Fuzzy based approaches, boundaries defined between different classes are not crisp, whereasK-means based approaches partition the data points into crisp clusters.
Ansari et al. [29] reported an unsupervised fuzzy clustering algorithm to determine seismic zones in Iran. They used the Gath and Giva [30] fuzzy clustering algorithm to carry out the fuzzy classification without a priori assumptions on a number of clusters in the catalog.
The evaluation of cluster validity by an algorithm is measured using fuzzy hyper-volume and particle density index. This algorithm is derived from a combination of fuzzyK-mean algorithm and the fuzzy maximum likelihood estimation (FMLE).
Wang et al. [31] proposed a method based on a fuzzy clustering with principal component analysis. This method determines the number of earthquake fault segments in a region. The method incorporates Gustafson and Kessel fuzzy clustering method (GK-FCM) with an adaptive distance norm which can detect clusters of different geometrical shapes from the given data set. The cluster validity index: Partition index, Separation index, and Xie and Beni’s index are used to find the best partition of data. This approach is demonstrated on a synthetic dataset and then applied to the Landers earthquake (1992) sequences occurred at Southern California. The method found optimum 12 number of faults which is matched
2.5 Density based clustering approaches 17 that shown by surface mapping. Similarly, Gvishiani et al. [32] introduced a fuzzy based Discrete Perfect Sets (DPS) algorithm to recognize the earthquake-prone areas of California.
Here, the location of epicenter clusters recognized by the DPS algorithm is compared with the location recognized for Earthquake-Prone Areas (EPA) [33] method. This method uses only information of earthquake coordinates instead of any geophysical, geomorphological and geological attributes associated with a region.
2.5 Density based clustering approaches
Density-based clustering (DBC) algorithms categorize a dataset into meaningful sub-classes based on the crowdedness of points in large spatial databases. They consider clusters as high-density areas in the database separated by regions of low density. Among others, these algorithms are popular due to their potential for detecting clusters of arbitrary shape and ability to handle outliers present in a dataset. Further, these algorithms do not need a priori information about the number of clusters or their shapes as input parameters. The number of clusters evolves with the progress of the algorithm. In this thesis work, two most prominent algorithms are utilized and they are discussed below:
2.5.1 Density-based spatial clustering of applications with noise (DB- SCAN)
Density-based spatial clustering of applications with noise (DBSCAN) is proposed by Ester et al.[2] which utilizes a minimum number of input parameters to cluster large spatial databases effectively. It is a fundamental density-based approach for cluster analysis which does not require prior information about the number of clusters. It determines the number of clusters automatically, based on the estimated density with the progress of the algorithm. This uses a predefined neighborhood radius (ε) to estimate the density and discover clusters having a significant number of data points (MinPts). The algorithm is described with the help of some fundamental definitions, outlined as:
Definition 1: ε-distance neighborhood of a point – Theε neighborhood of a point pin the datasetEN×Dis defined as
Nε(p) ={q∈P:d(p,q)≤ε} (2.16) whered(.)represents Euclidean distance between two points.
Definition 2: Direct density-reachable– A pointqis said to be directly density reachable from point pafter considerationε andMinPtsif
q∈Nε(p) (2.17) and
|Nε(p)| ≥MinPts (2.18)
where|Nε(p)|represents number of data points inNε(p).
Here, there are two types of points lying in a cluster, points inside the cluster (core points) and points on the border of the cluster (border points). The density reachable is symmetric when it involves two core points (points p andq). But in case of one core point and one border point it becomes non-symmetric (border pointqis directly reachable from pbut not vice-versa).
Definition 3Density-reachable – A pointqis density-reachable from a point pif there is a sequence of data points q1,q2, ...,qj with q1= p and qj=q such that qj+1 is directly density-reachable fromqj.
Definition 4Density-connected – Two points pandqare density-connected with respect to ε and MinPtsif there exists a point xsuch that both pandqare density-reachable fromx after considerationε andMinPts.
Definition 5Cluster – A clusterC∈EN×Dis a non-empty set satisfying the condition of maximality and connectivity defined as follows:
• Maximality: For all p,q∈N×D,if p∈Candqare density-reachable from pwith respect toεandMinPtsthenq∈C.
• Connectivity: For all p,q∈C, pandqare density connected whenε andMinPtsare considered.
Definition 6Outliers – LetC1, ..Ck be clusters ofEN×D, then Outliers are defined as a set of points inEN×Dwhich do not belong to any clusterCiwith respect toε andMinPts.
These definitions allow the DBSCAN algorithm to discover clusters of arbitrary shape in the presence of outliers. Nanda and Panda [34] reduced the computational complexity associated with the DBSCAN by efficiently implementing new merging criteria at the initial stage of evolution of clusters. The algorithm is then applied to detect the hazardous regions present in the earthquake catalog of Japan.
Georgoulas et al. [35] proposed a seismic-mass density-based clustering (SM-DBSCAN) scheme to determine the potential seismic sources in the vicinity of the Hellenic arc. The algorithm considers a weight to each data point by its earthquake magnitude (seismic mass) in the estimation of density. Space and time information is incorporated in the algorithm to discover the clusters that come from the same fault but at different time instances.
2.5 Density based clustering approaches 19 Recently, Scitovski introduced a modified implementation of the DBSCAN algorithm called ‘Rough DBSCAN’ [36] to determine non-convex shapes of seismic zones present in an area of the Republic of Croatia. The author also paid attention to determine the ε parameter which can influence the number and configuration of earthquake zones.
2.5.2 Clustering by fast search and finding of density peaks (CFSFDP)
Clustering by fast search and finding of density peaks also known as density peak clustering (DPC) is reported in [3]. It is based on the assumption that cluster centers are surrounded by neighbors with less local density and they are at a relatively large distance from the points with a higher local density. By this assumption, the algorithm primarily determines the cluster centers accurately; then remaining points are assigned to the same clusters as its nearest neighbor with higher density. Further, the advantage is, the algorithm finds out cluster number intuitively and able to detect clusters regardless of their shapes and the dimension of given data. The algorithm also determines the membership of a point to a cluster by considering not only connectivity but also the separation of points. Therefore its performance is robust for the radius of a neighborhood, compared to DBSCAN [2]. The clusters obtained from DPC mainly depends on computation of two important quantities for eachithdata point:
one is its local densityρiand another is its distanceδifrom points of higher density.
DPC algorithm determines the local density (ρ) for each point present in the dataset EN×Din three different ways.
1.Counting method: Theρiofithpoint can be determined by counting the number of points with a hard threshold. It is defined as
ρi =∑
j
χ(d(xi,xj)−dc)where χ(x) =
1 if x<0 0 otherwise
(2.19)
It increases the counter by one wheneverρiis equal to the number of points that are closer thandctoithpoint. Thedcis a predefined cut-off distance parameter. The similarity measure d(xi,xj)between pointxiandxjis taken as Euclidean distance.
2. Kernel based method: In this approach, DPC adopts kernel density estimation technique.
Here, Gaussian function is utilized for kernel andρi(soft threshold) is determined as ρi=
∑
j
exp
−d(xi,xj)2 dc2
(2.20) In both approaches, choice ofdcis important to obtain the robust local densityρi. Generally, it is taken as the average number of neighbors around 1-2% of the total number of points in
the dataset.
3.K-nearest neighbor (kNN) based method:K-nearest neighbor (KNN) based approach given in [37, 38] for determining the local densityρ. It require ‘K’ as input to find the K-nearest neighbor instead of ‘dc’. The setting of ‘K’ is feasible compare to ‘dc’ to get the robust performance of the algorithm. Ifηk is theKth nearest neighbor ofxithen KNN ofxi is
κη(xi) ={ j ∈ EN×D|d(xi,xj)≤d(xi,ηk(xi))} (2.21) Then,ρiis defined as:
ρi=exp
−1 k
∑
xj
d(xi,xj)2
,∀xj∈κη(xi) (2.22) After obtaining the local densityρifrom any of the above methods,δiis easily computed by finding the minimum distance between pointxi and any other point with higher local density as follows:
δi=
minj d(xi,xj) ifρj>ρi
maxj d(xi,xj) otherwise
(2.23)
Based on these two quantities ρ, δ: a decision graph is drawn between ρ and δ to determine the cluster centroids. The points in a graph for which value of δ and ρ are comparatively high (i.e., points have density peak and large separation) are considered as cluster centroids in the DPC. After that, the remaining data points are assigned to the same cluster as its nearest neighbor of higher density.
Outliers present in a catalog are determined by finding a border region for each cluster in DPC (without using a noise-signal cutoff as in DBSCAN algorithm). It is also defined as the set of points assigned to that cluster (within a distancedc from data points belonging to other clusters. Then one finds the point of highest density within its border region for each cluster by defining its density asρb. The points of the cluster). whose density is higher than ρbare assigned as a part of the cluster, and the rest of the points assigned to the cluster are considered as outliers. An improved method for outlier detection in DPC [38] is presented in the following steps:
Step 1: If E is the whole dataset of sizeN, then the kNN distance δiK of a point iis determined as:
δiK=max
j∈κη