
5.2 Proposed shared nearest neighborhood intensity based declustering algorithm

The proposed algorithm, SNN-IBD, handles clustering patterns that are irregular or elongated in shape, have variable densities, and occur within a short time interval. It is therefore well suited to seismic patterns in which the events are triggered spontaneously by main-shocks of moderate to large magnitude. There are also background (independent) patterns, which occur randomly in time, in space, or in both domains. The flow chart of the proposed algorithm is shown in Fig. 5.1, and the step-by-step procedure is outlined as follows:

1. Input: An earthquake catalog comprising $N$ samples and $D$ features (here $D = 4$, representing time $t$, longitude $\theta$, latitude $\varphi$, and magnitude $M$), as follows:

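$$
E_{N \times D} =
\begin{bmatrix}
e_{11} & e_{12} & e_{13} & \cdots & e_{1D} \\
e_{21} & e_{22} & e_{23} & \cdots & e_{2D} \\
\vdots & \vdots & \vdots &        & \vdots \\
e_{N1} & e_{N2} & e_{N3} & \cdots & e_{ND}
\end{bmatrix}
\tag{5.1}
$$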

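2. Calculation of similarity between events: The algorithm determines the Euclidean distance between events $i$ and $j$ in space, $d_s(\cdot)$, and in time, $d_t(\cdot)$, as follows:

$$
d_s(\theta_i, \varphi_i, \theta_j, \varphi_j) = \sqrt{(\theta_i - \theta_j)^2 + (\varphi_i - \varphi_j)^2} \tag{5.2}
$$

$$
d_t(t_i, t_j) = \sqrt{(t_i - t_j)^2} \tag{5.3}
$$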

3. Set initial parameters: The algorithm first sets values for three input parameters: the temporal neighborhood radius $\varepsilon_t$, the spatial neighborhood radius $\varepsilon_s$, and the average seismic magnitude threshold ($SM$). These parameters determine the neighborhood points around each event and correlate them in the space-time-magnitude domain to form a compact cluster. The initial values of these parameters are set randomly in the range [0, 5]. They are then updated iteratively using the COV and the m-Morisita index, defined in Sections 5.2.1 and 5.2.2.

4. Determine shared nearest neighborhood core points: The algorithm declares any observable event $e$ a core point if it satisfies the shared nearest neighborhood criterion:

$$
\alpha \in E \;\big|\; \big(d_s(e, \alpha) \le \varepsilon_s \ \text{and} \ d_t(e, \alpha) \le \varepsilon_t\big) \tag{5.4}
$$

where $\alpha$ denotes the $\varepsilon$-shared nearest neighborhood events of $e$, which have a space-time correlation with $e$.

5. Identify intensity based clusters: If the average magnitude ($\mu_M$) of all the $\alpha$ obtained from Eq. (5.4) is greater than or equal to a pre-defined threshold $SM$, then the neighborhood region is considered to be an intense zone. A cluster is then formed, which represents the space-time-magnitude correlated events as aftershocks 'AF'.

$$
\mu_M = \frac{M_{\alpha_1} + M_{\alpha_2} + \cdots + M_{\alpha_p}}{p}; \quad \text{if } \mu_M \ge SM \text{ then } e_i \in AF \tag{5.5}
$$

6. Assign label to every event: If the condition in Eq. (5.5) holds, event $e$ and its neighbors are assigned to a new cluster and marked as visited; otherwise, the event is labeled as background 'BG'. Step 4 is repeated until all the events are marked.

7. Parameter update and result validation: The selection of the $\varepsilon_t$, $\varepsilon_s$, and $SM$ cutoffs and the validation of the classified seismicity are carried out using the COV and the m-Morisita index. The cutoff values are changed if the COV and the m-Morisita index do not reflect the optimum classification in terms of AFs and BGs. These two statistical parameters are presented in the following subsections.
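Steps 2 to 6 can be sketched in a few lines of Python. This is only an illustrative reading of the procedure, not the reference implementation: the function name `snn_ibd_labels` is hypothetical, the catalog columns are assumed to be ordered (t, longitude, latitude, magnitude), the event itself is assumed to be excluded from its own neighborhood when averaging magnitudes, and no cluster expansion beyond the direct neighbors is attempted. The iterative update of the cutoffs (step 7) is left out.

```python
import numpy as np

def snn_ibd_labels(catalog, eps_s, eps_t, sm):
    """Label every event 'AF' (aftershock) or 'BG' (background).

    catalog: array of shape (N, 4), columns assumed to be
             (time, longitude, latitude, magnitude).
    eps_s, eps_t: spatial and temporal neighborhood radii.
    sm: average seismic magnitude threshold.
    """
    catalog = np.asarray(catalog, dtype=float)
    t, lon, lat, mag = catalog.T
    n = len(catalog)
    labels = np.full(n, "BG", dtype=object)  # default label: background
    visited = np.zeros(n, dtype=bool)

    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        # Eq. (5.2) and Eq. (5.3): distances in space and in time.
        d_s = np.sqrt((lon - lon[i]) ** 2 + (lat - lat[i]) ** 2)
        d_t = np.abs(t - t[i])
        # Eq. (5.4): events within both radii of event i.
        nbrs = np.flatnonzero((d_s <= eps_s) & (d_t <= eps_t))
        nbrs = nbrs[nbrs != i]  # assumption: exclude the event itself
        # Eq. (5.5): intense zone if the mean neighbor magnitude >= SM.
        if nbrs.size and mag[nbrs].mean() >= sm:
            labels[i] = "AF"
            labels[nbrs] = "AF"
            visited[nbrs] = True
    return labels
```

Because neighbors absorbed into a cluster are marked as visited, they are not re-examined as core candidates in this sketch; a fuller implementation might expand clusters through them.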

5.2.1 Coefficient of Variation

The coefficient of variation (COV) [50] is the standard deviation of the inter-event times normalized by their mean:

$$
COV(T) = \frac{\sqrt{E[\Delta t^2] - (E[\Delta t])^2}}{E[\Delta t]} \tag{5.6}
$$

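where $\Delta t$ is the inter-event time between successive pairs of earthquake events over a specified duration $T$, given by

$$
\Delta t = t_{i+1} - t_i, \quad \forall\, i = 1, 2, \ldots, N-1 \in T \tag{5.7}
$$

The inter-event distance $\Delta r$, in kilometers, for the $i$th event is

$$
\Delta r = R_E \cos^{-1}\!\big(\sin\varphi_i \sin\varphi_{i+1} + \cos\varphi_i \cos\varphi_{i+1} \cos(|\theta_{i+1} - \theta_i|)\big) \tag{5.8}
$$

where $\varphi$ and $\theta$ represent latitude and longitude (in radians), respectively, and $R_E = 6371$ km is the approximate radius of the Earth.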
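The inter-event distance can be sketched with the spherical law of cosines, in which the central angle between two events is the inverse cosine of the bracketed term. The function name `inter_event_distance_km` and the clamping of the cosine argument are illustrative choices, not part of the original formulation.

```python
import math

R_E_KM = 6371.0  # approximate radius of the Earth in km

def inter_event_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two events.

    All angles are in radians, as in the text.
    """
    c = (math.sin(lat1) * math.sin(lat2)
         + math.cos(lat1) * math.cos(lat2) * math.cos(abs(lon2 - lon1)))
    # Rounding can push c slightly outside [-1, 1]; clamp before acos.
    c = max(-1.0, min(1.0, c))
    return R_E_KM * math.acos(c)
```

For very small separations the inverse cosine is numerically ill-conditioned, so a haversine form may be preferable in practice.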


Fig. 5.1. Proposed COV and m-MI based SNN-IBD model to classify the seismicity

The COV discriminates seismic activities (in the time domain) into three categories as follows:

• If $COV(T) = 0$, then $\Delta t$ is constant and the distribution of the events is said to be periodic.

• If $COV(T) \approx 1$, then the process is said to be Poisson distributed, where $\Delta t$ has exponential behavior.

• If $COV(T) \ge 1$, then the process follows a power law distribution with $\Delta t$ growing with $T$.

Here, background events correspond to $COV(T) \approx 1$. The $COV(T)$ value obtained from the model for the background events is compared with the threshold value 1, and the resulting error is used to fine-tune the $\varepsilon_s$, $\varepsilon_t$, and $SM$ parameters.
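As a minimal sketch of Eq. (5.6) and Eq. (5.7), the COV of a set of event times can be computed as below. The function names `cov_inter_event` and `cov_error` are hypothetical, and taking the error as the absolute deviation from 1 is an assumption; the text does not specify the exact form of the error used for tuning.

```python
import numpy as np

def cov_inter_event(times):
    """Coefficient of variation of inter-event times, Eq. (5.6).

    times: event times in any consistent unit; at least three events
           are assumed so that two inter-event times exist.
    """
    t = np.sort(np.asarray(times, dtype=float))
    dt = np.diff(t)                        # Eq. (5.7): t[i+1] - t[i]
    mean = dt.mean()                       # E[dt]
    var = np.mean(dt ** 2) - mean ** 2     # E[dt^2] - (E[dt])^2
    return np.sqrt(max(var, 0.0)) / mean   # guard against tiny negatives

def cov_error(times):
    """Assumed error measure: deviation of the background COV from 1."""
    return abs(cov_inter_event(times) - 1.0)
```

A strictly periodic sequence gives a COV of 0, while the background events retained by the classification would be expected to give a value close to 1.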

5.2.2 m-Morisita Index

The m-Morisita index (m-MI) [56, 87] represents the degree of spatial clustering present in the data. It is computed on a grid of $Q$ cells (quadrants) of changing size $\delta$ (i.e., the length of the cell diagonal), which is superimposed on the points of the studied dataset. The index measures how many times more likely it is that $m$ ($m \ge 2$) randomly selected points belong to the same cell than it would be for a random distribution generated from a Poisson process.

Mathematically,

$$
I_{m,\delta} = Q^{m-1} \, \frac{\sum_{i=1}^{Q} n_i (n_i - 1) \cdots (n_i - m + 1)}{N (N - 1) \cdots (N - m + 1)} \tag{5.9}
$$

where $Q$ is the number of quadrants (cells) necessary to cover the study area, $n_i$ is the number of points belonging to the $i$th cell, and $N$ is the total number of points in the data set. The multi-point refers to the value of $m$ (the default value is 2). For a given value of $m$, a plot can be drawn by computing $I_{m,\delta}$ starting from a relatively large cell size $\delta$, which is then decreased until a minimum value is reached. The plot categorizes the events into three types of distribution:

• If the sampled distribution has been generated from a Poisson process, $I_{m,\delta}$ fluctuates around 1.

• If the samples are dispersed (i.e., repel one another), $I_{m,\delta}$ decreases to zero as $\delta$ is reduced.

• If the samples are clustered, then, at small scales, the number of empty cells increases, increasing the value of $I_{m,\delta}$.
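Eq. (5.9) can be sketched for two-dimensional points as follows. The function name `morisita_index` and the parameter `cells_per_side` are hypothetical, and the study area is assumed to be the bounding box of the points, divided into a regular grid of `cells_per_side` by `cells_per_side` cells (so $Q$ is `cells_per_side` squared and $\delta$ shrinks as `cells_per_side` grows).

```python
import numpy as np

def morisita_index(points, cells_per_side, m=2):
    """m-Morisita index I_{m,delta} of 2-D points, Eq. (5.9).

    points: array of shape (N, 2).
    cells_per_side: grid resolution; Q = cells_per_side ** 2 cells
                    cover the bounding box (assumed study area).
    """
    pts = np.asarray(points, dtype=float)
    n_total = len(pts)
    lo = pts.min(axis=0)
    span = pts.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for degenerate boxes
    # Cell index of each point along each axis, clipped to the grid.
    idx = np.floor((pts - lo) / span * cells_per_side).astype(int)
    idx = np.clip(idx, 0, cells_per_side - 1)
    flat = idx[:, 0] * cells_per_side + idx[:, 1]
    q = cells_per_side ** 2
    counts = np.bincount(flat, minlength=q).astype(float)  # n_i per cell
    num = np.ones_like(counts)
    den = 1.0
    for k in range(m):
        num *= counts - k   # n_i (n_i - 1) ... (n_i - m + 1)
        den *= n_total - k  # N (N - 1) ... (N - m + 1)
    return q ** (m - 1) * num.sum() / den
```

Evaluating this function over a decreasing sequence of cell sizes gives the curve whose behavior is described in the three cases above.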

The m-Morisita index is used to study fractal patterns in the region of interest. Multi-fractal (self-similar) behavior is related to Rényi's generalized dimensions $D_q$, for $q = m$, through the following power law:

$$
I_{m,\delta} \propto \delta^{(m-1)(D_m - E)} \tag{5.10}
$$

and

$$
\lim_{\delta \to \infty} \frac{\log(I_{m,\delta})}{\log \delta} \approx (m-1)(D_m - E) \approx -S_m \tag{5.11}
$$

where

$$
D_m \approx E - \frac{S_m}{m-1}, \quad m \in \{2, 3, 4, 5\} \tag{5.12}
$$

Here, $E$ is the dimension of the Euclidean space in which the earthquake events are located (here, $E = 2$ corresponds to the coordinates in the XY plane), $S_m$ is termed the m-Morisita slope, and its dependence on $m$ is referred to as the m-Morisita slope spectrum. $S_m$ is estimated from the slope of a linear regression (by fitting the data points of the plot relating $\log(I_{m,\delta})$