• No results found

is given in Appendix A. This stage classifies the entire seismic region into distinct zones based on the spatial correlation between the centroid and remaining events. In each zone, cluster analysis in the time domain is achieved using a variable ‘ε’ density-based clustering algorithm. In the algorithm, neighborhood radiusε (time window) is varied to determine magnitude based temporal correlation among the events of each seismic zone. The variation of ‘ε’ is a time-dependent function of ‘magnitude’ of the event (empirical relation makes out higher magnitude leads to a larger value of ‘ε’ i.e. the number of days in time). This method is applied to analyze the seismic activities of Himalaya and Sumatra-Andaman region for the time interval between 1965 to 2015 (51 Years).

Here, this chapter introduces a two-stage density based earthquake clustering approach which considers the event’s magnitude, time and location to identify AFs clustering events and BGs present in the earthquake catalogs. In the first stage, events are categorized into different temporal classes by finding the density peak using Gaussian kernel density estimation in the time domain. Then, a decision graph-based approach (inspired by a literature ‘Clustering by fast search and find of density peaks’ [3] as described in Section 2.5.2 in Chapter 2) is used. It considers the event’s local density and their distance with a point having higher local density along with weighting strategy. The decision graph finds the cluster centroid effectively and then events are assigned to nearby the cluster centroid which determines irregular and compact space-time clusters (comprised of AFs) and BGs present in the catalog. Performance analysis of the proposed work is carried out on historical catalogs of California, Himalaya, Japan, Sumatra-Andaman with the use of epicenter plot, space-time plot, cumulative plot,λ plot and inverse TM metric plot.

4.2 Proposed spatio-temporal density peak clustering

Conventional clustering approaches do not directly support the spatial, temporal, and non- spatial attribute (magnitude) present in the earthquake catalog. Therefore, clustering of earthquake database by conventional algorithms is not meaningful due to the biased nature of events with dependency on location, occurrence time and event’s size (magnitude). Here, a two-stage clustering model is proposed which finds effective density peak in the space-time domain and thereafter clustering of events with the separation of AFs and BGs from the catalogs.

Let us consider an earthquake catalogEN×D

EN×D=

 e1 e2 e3 ... eN

=

t1 θ1 φ1 m1 t2 θ2 φ2 m2 . . . . tN θN φN mN

(4.1)

where each ith, i=1,2,3, ....N event is represented by its occurrence time (ti), events position (longitudeθi, and latitudeφi) and magnitude (mi) in Richter scale i.e. here dimen- sion/feature vector D=4. The earthquake depth is not considered in the analysis. The similarity between events with respect to 2D coordinates (longitude;θ, latitude;φ) is cal- culated using Euclidean distance as given in (4.2) and time interval among the events is determined using (4.3).

dsiijj) = q

i−θj)2+ (φi−φj)2 (4.2) dt(ti,tj) =

q

(ti−tj)2 (4.3)

Initially, events present in the catalogEN×D are sorted in chronological order of their occurrence time. This makes an ordered time series, comprise of events having location and magnitude information. This time series needs to be accurately cut using an appropriate temporal threshold to classify the events into different classes. The objective here is to isolate the events so that they will not interfere with each other during the next phase of clustering in the spatial domain. It also avoids the merging of clusters due to the belongingness of events with the different classes. This can be achieved by using a Gaussian kernel density estimation on 1D temporal data with a suitable bandwidth. The temporal density (Kτ) is written as

Kτ(t) = 1 NhD

N n=1

(2π)−D/2exp

−(t−tn) 2h

2

(4.4) wheret represents time, N is the total number of events, D=1 is the dimension, and h is the bandwidth. Here, a method reported by Botev et al. [83] is used to estimate the density with reliable and extremely fast manner, by assuming Gaussian kernel, described in Eq.(4.4). The method automatically selects the optimized value of bandwidth hafter selecting the number of points p which is used for uniform discretization of the interval [min(t), max(t)]. The number of equally spaced points p (at which the density is to be estimated) is generally a power of 2. When p≥29, it is rounded up to a power of 2 during the calculations (as implemented very efficiently using the Fast Fourier Transform), and the

4.2 Proposed spatio-temporal density peak clustering 65 Table 4.1 Parameter list for temporal density estimation for each catalog using [83].

Catalog p Classes (k) Bandwidth (h)

California 128 33 0.3970

Himalaya 128 33 0.1170

Japan 128 30 0.0681

Sumatra-Andaman 128 30 0.0828

final result is interpolated by approximation. So it is appropriate to specify pas a power of two. Here, p=128 is chosen for all catalogs and obtained bandwidth for each catalog is mentioned in Table 4.1. The estimated temporal density for each catalog is shown in Fig.4.1. The selection of pis decided in such a way that events in each class should not be empty and sufficient enough to do the next stage of clustering on events of each class. It is also important because it directly influences the maximum time interval allowed between each class. From the Fig.4.1, observed density variation allows determining the position of local maxima (peak) and minima which is indicated by a red square and a green circle respectively. The peaks (red square) in the estimated density belong to each class whose boundary (extreme positions) is defined by the position of local minima. Breaking the series on points of lowest density in each class does not cut the aftershock tail to each other. Events of each class are represented by a different color in Fig.4.2 for the catalog of California, Himalaya, Japan, and Sumatra-Andaman. Mathematically, it is written as

EN×D=

iε τ1

E(i,:) +

iε τ2

E(i,:), ...+

iε τk

E(i,:) (4.5)

where

i

ε τ1

E(i,:) =Cτ1,

iε τ3

E(i,:) =Cτ2, ...,

iε τk

E(i,:) =Cτk (4.6) In the Eq.(4.6),Cτkrepresents the events ofktemporal class which hasτktime interval.

Now, events belong to each temporal class is processed independently for clustering with an approach which is inspired by an article “clustering by fast search and find of density peaks”. It is proposed by Rodriguez and Laio [3], which is based on the assumption that cluster centroids are surrounded by neighbors with less local density and they have a relatively large distance from the points with a higher local density. On the basis of this assumption, the algorithm primarily determines the centroids effectively; then the remaining points are assigned to the same clusters as its nearest neighbor is having a higher density. A further advantage is, the algorithm finds out the cluster number intuitively and able to detect clusters regardless of their shapes and dimension of given data. The algorithm also determines the membership of a point to a cluster by considering not only the connectivity but also the

Time (years)

1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004

Temporal density

0 0.1 0.2 0.3 0.4 0.5 0.6

Estimated temporal density Local and Global maxima Local and Global minima

(a)

Time (years)

1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015

Temporal density

0 0.02 0.04 0.06 0.08 0.1 0.12

Estimated density Global and Local maxima Global and Local minima

(b)

Time (years)

1970 1975 1980 1985 1990 1995 2000 2005 2010 2015

Temporal density

0 0.05 0.1 0.15 0.2 0.25

Estimated density Global and local maxima Global and local minima

(c)

Time (years)

1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015

Temporal density

0 0.05 0.1 0.15 0.2 0.25 0.3

Estimated density Global maxima and Local minima Global minima and Local minima

(d)

Fig. 4.1.Estimated temporal density: The figure represents position of local maxima (square red) and minima (green circle) for catalog of: (a) California (b) Himalaya (c) Japan and (d) Sumatra-Andaman.

separation of points. Therefore its performance is robust compared to ‘ε’ and ‘MinPts’

dependent density-based algorithms like DBSCAN.

The clustering procedure used here, depends on computation of two important quantities for eachith∈Cτk event: one is its local densityρiand another is its distanceδifrom events of higher density.

Local density for eachith∈Cτk event is determined based on the event’s 2D location and its magnitude given for the each temporal class. Here also, the Gaussian function is used as a kernel to determine density which is given by

ρi=

j

exp

−ds2(θ,φ) dc2

(4.7)

4.2 Proposed spatio-temporal density peak clustering 67

Time (years)

1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004

Temporal density

0 0.1 0.2 0.3 0.4 0.5 0.6

(a)

Time (years)

1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015

Temporal density

0 0.02 0.04 0.06 0.08 0.1 0.12

(b)

Time (years)

1970 1975 1980 1985 1990 1995 2000 2005 2010 2015

Temporal density

0 0.05 0.1 0.15 0.2 0.25

(c)

Time (years)

1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015

Temporal density

0 0.05 0.1 0.15 0.2 0.25 0.3

(d)

Fig. 4.2. Determination of temporal classes: It represents the location of maxima (red square) belongs to each class whose duration decided by the two nearby local minima points (green circle) for (a) California (b) Himalaya (c) Japan and (d) Sumatra-Andaman.

The choice of cut-offdc is important to obtain the robust local densityρi. Generally, it is settled as the average number of neighbors around one to two percent of the total number of events in the class.

Local density of eachith∈Csevent is determined not only based on event’s neighborhood 2D locations but also impose a weight according to their size given by the magnitudemion each event. Hence, the weighted local densityρw is estimated by assigning a weight to each ithevent based on their normalized earthquake’s magnitude (mn(i)) as follows.

ρw(i) =ρ(i)×mn(i) (4.8) where

mn(i) = mi

max(M) (4.9)

Earthquake catalog < 𝜃, 𝜑, 𝑡, 𝑚, 𝑑 >

Estimate: Temporal Density estimation using Gaussian kernel 𝐾 (𝑡)

Parallel Selection: Select events 𝒆𝒊∈ 𝐶 class for 𝑠 = 1,2,3, … . 𝑞 during time interval T

Distance Metric: Calculate the Euclidean distance 2D (𝜽𝒊, 𝝋𝒋) from (4.2)

Decision Graph: Find the number of cluster 𝒌 and the Cluster centroids 𝑸 by observing the Decision Graph Delta Metric: Calculate 𝜹𝒊 for each 𝒊𝒕𝒉 point from (4.10)

Cluster Output: Get the cluster output for belongs to 𝐶 and remaining are considered as Outliers Assignment: Assign each point to nearest neighbour

centroid 𝑸𝒋

𝐶

𝐶

𝐶

𝐶

….. …..

Density Metric 𝝆𝒊 : Estimate the density 𝝆𝒊 based on kernel using (4.7) for

each 𝒊𝒕𝒉 point 𝒆𝒊

… ..

...

..

… … . . . . .

. . . . . . . . . . . . . . . . . . . . Non-Clustered Seismicity

(Backgrounds) Clustered seismicity

(Aftershocks)

Space-time density peak clustering

Weighted Density Metric 𝝆𝒘𝒊: Assign a weight based on the Magnitude to each ith point density 𝝆𝒊

𝐶 = (𝐴𝐹 ∪ 𝐴𝐹 ∪ 𝐴𝐹 … … 𝐴𝐹 ) Final Clustering output for each class

Determine: Local minima, local maxima and temporal threshold from 𝐾 (𝑡)

Fig. 4.3.Proposed two-stage clustering model for analysis of Earthquake catalogs.

ρw(i)is the weighted local density,ρ(i)is the true local density (unweighted) determine from Eq.(4.7). Themiis magnitude of earthquake forith event.

After calculating the weighted local density ρw for eachith event from Eq.(4.8),δiis easily computed by finding the minimum distance between events and any other event with

4.2 Proposed spatio-temporal density peak clustering 69

Data Points

0 1000 2000 3000 4000 5000 6000 7000 8000

estimated density ρ

200 400 600 800 1000 1200 1400 1600

(a)

ρ

0 500 1000 1500

δ

0 0.5 1 1.5 2 2.5 3 3.5 4

(b)

Data Points

0 1000 2000 3000 4000 5000 6000 7000 8000 Weighted estimated density ρw

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Hector Mine, 1999 Northridge, 1994 Landers,1992

San Simeon, 2003 North Palm Springs,1986

Coalinga, 1983

Superstition Hills, 1987

(c)

ρ

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

δ

0 0.5 1 1.5 2 2.5 3 3.5 4

Coalinga Earthquake, 1983

North Palm Springs Earthquake,1986

Northridge Earthquake,1994

San Simeon, 2003

Hector Mine Earthquake,1999 Superstition Hilles,1987

Landers Earthquake, 1992

(d)

-114 -116

Longitude -118 -120 32 -122 34 Latitude 36 2005

1980 1985 1990 1995 2000

38

Time (Years)

(e)

-114 -116

Longitude -118 -120 -122 32 34 Latitude 36 2000 2005

1990 1995

1985 1980 38

Time (years)

20 40 60 80 100 120 140

(f)

Fig. 4.4. Demonstration of proposed approach on California: (a) Unweighted local densityρ w.r.t. 2D locations (neglect size of an earthquake) and the corresponding (b) decision graph (c) weighted local densityρwand their corresponding (d) Weighted decision graph, (e) 3D distribution of seismic events occurred in California during 1983-2004. (f) 3D earthquake clusters obtained from the proposed approach.

higher weighted local density as follows:

δi=





minj dsijij) ifρw(j)>ρw(i) maxj dsijij) otherwise

(4.10)

Based on these two quantitiesρw,δ: a decision graph is drawn among them to deter- mine the cluster centroids. The events in the graph for which the value ofδ and ρw are comparatively high (i.e., events have density peak and large separation) are considered as cluster centroids. After that, each remaining event is assigned to the same cluster as its nearest neighbor of higher density. This process is applied to each of the temporal class in a concurrent manner, to determine the effective overall non-overlapping space-time clusters for each catalog used in the analysis. The flow chart of the proposed approach is shown in Fig.4.3.

Outliers present in the databases are determined by finding a border region for each cluster in the algorithm without using a noise-signal cutoff as in DBSCAN. It is defined as the set of events assigned to that cluster but being within a distancedcfrom events belonging to other clusters. Then one finds the point of highest density within its border region for each cluster by defining its density asρb. Points of the cluster whose density is higher thanρb are assigned as a part of the cluster, and the rest events (points) assigned to the cluster are considered as outliers.

The demonstration of decision graph (with weightedρw(i)and unweighted densityρ(i)) and analysis is carried out on the benchmark earthquake catalog of California. The catalog is downloaded from Northern California earthquake data center (description of catalog given in section 4.3). Here, events used in the analysis have a magnitude greater than 3.0 in the Richter scale. The 3D representation of events is shown in Fig.4.4(e) w.r.t. longitude, latitude and the occurrence year.

Initially, the unweighted local density ρ of events according to their 2D locations is determined using Eq.(4.7) by taking the value of dc=0.1 which is shown in Fig.4.4(a).

The large variation in density shows the existence of clusters in the spatial domain at the occurrence time of mainshock (large earthquake) in California. The earthquake clusters are not only localized in space-time domain but also have high magnitude with a mainshock.

Because in the earthquake catalog, the density of points either in space or in time also depends on the size of an event, i.e., magnitude. Therefore, the local density of points also considers the earthquake magnitude by computing the weighted ρw. From the Fig.4.4(b), ρw(i)is obtained from Eq.(4.8), and it shows the effective variation in density and finds the peak which is not seen in Fig.4.4(a). The envelope of computedρw(i)remains the same asρ(i), but variation in density is sharp enough to localized the events based on the magnitude as well. Theρw(i)helps to find out the significant earthquakes (have high density and intensity) shown by the arrow in Fig.4.4(b). The decision graph is also drawn after computingδ based on both ρw(i) and ρi. The comparative analysis (in Fig.4.4(c) and Fig.4.4(d) shows the information about the cluster centroids. The true cluster centroid is the significant large

4.2 Proposed spatio-temporal density peak clustering 71 Table 4.2 List of significant earthquakes occurred in California (1983-2004)

Date Earthquake’s location Latitude Longitude Magnitude Depth (0N) (0E) (Richter scale) (Km)

02-05-1983 Coalinga 36.2317 -120.312 6.7 9.58

08-07-1986 North Palm Springs 33.99 -116.608 6 9.47

24-11-1987 Superstition Hills 33.015 -115.852 6.6 11.18

28-06-1992 Landers 34.20 -116.827 6.3 214.50

17-01-1994 Northridge 34.213 -118.537 6.7 18.20

16-10-1999 Hector Mine 34.6033 -116.265 7.1 13.73

22-12-2003 San Simeon 35.7005 -121.1005 6.5 8.38

Table 4.3 List of significant earthquakes occurred in Himalaya (1965-2015)

Date location Latitude Longitude Magnitude Depth

(0N) (0E) (Richter) (Km)

14-03-1965 Hindu-Kush 36.300 70.700 7.5 219.00

11-08-1974 Markansu val. Tajikistan 39.457 73.830 7.3 09.00

29-05-1976 Longling, China 24.531 98.710 7.2 10.00

30-12-1983 Kunduz 36.372 70.738 7.2 214.50

23-08-1985 Wuqia County, China 39.431 75.224 7.3 6.80

06-08-1988 India-Myanmar border 25.149 95.127 7.2 90.50

19-10-1991 Chamoli 30.780 78.774 7.0 10.30

09-08-1993 Hindu-Kush 36.379 70.868 7.0 214.50

11-07-1995 Myanmar-China Border 21.966 99.196 7.1 12.50

08-11-1997 Manyi, Tibat 35.069 87.325 7.5 33.00

26-01-2001 Bhuj, Gujrat India 23.419 70.232 7.7 16.00

14-11-2001 Kunlun, China 35.946 90.541 7.8 10.00

03-03-2002 Hindu-Kush 36.502 70.482 7.4 225.6

08-10-2005 Kashmir 34.539 73.588 7.6 26.0

20-03-2008 Xinjiang-Xizang border 35.490 81.467 7.2 10.0

25-04-2015 Gorkha, Nepal 28.147 84.708 7.8 15.0

26-10-2015 Hindu-Kush 36.440 70.717 7.5 212.52

earthquake which has a much high magnitude as well as high local density, clearly indicated in Fig.4.4(d) after considering the weighted local density from Eq.(4.8).

In space-time clustering of an earthquake catalog, two or more clusters get overlapped due to their presence at the same location but occur in large time-interval and maybe vice- versa. Overlapping of clusters in the spatio-temporal domain gives the relatively low value ofδ either in time or in the spatial domain. In the proposed method, time-based separation among the clusters is done in the first stage and then applied the weight mechanism (decided by earthquake’s magnitude) to do the effective grouping of events in space (coordinates)

belonging to each temporal class. It determines the true cluster centroids in each class based on the decision graph and avoids the merging of two more clusters. This method gives the total 152 number of clusters in all 33 temporal class for California as shown in Fig.4.4(f).