NDVI = (NIR − RED) / (NIR + RED)
3.12 Partial Least Squares Regression (PLSR)
PLS is a bilinear calibration method that compresses the data by reducing the large number of measured, collinear spectral variables to a few non-correlated principal components (PCs). The PCs represent the relevant structural information present in the reflectance measurements and are used to predict the dependent variable (Hansen and Schjoerring 2003).
PLS regression uses component projection successively to find latent structures.
Visual inspection of score plots and validation residual variance plots was used to find the optimal number of PCs, so that over-fitting was prevented. In most cases, this procedure reduces the number of spectral variables to a few independent PCs. The final model predicting ŷi had the following form (Eq. 3.15):
$$\hat{y}_i = b_0 + b_1 t_{1i} + b_2 t_{2i} + \dots + b_n t_{ni} \qquad (3.15)$$
where t1i to tni are the scores from principal component (PC) 1 to n for variable i. The scores were calculated from mean-centered data. The regression coefficients b1 to bn were obtained by linear regression of t against y during the calibration iterations.
Because y was centered initially, the mean b0 was added back to obtain ŷi. The models were validated by comparing their R² and root mean square error (RMSE) values.
RMSE values were calculated according to Eq. (3.16):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2} \qquad (3.16)$$
where ŷi and yi are the predicted and measured crop variables, respectively, and n is the number of samples (n = 360). RMSE provides a direct estimate of the modelling error expressed in the original measurement units (Kvalheim, 1987).
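As a minimal sketch of Eq. (3.16), the RMSE can be computed as follows. The function name rmse and the plain-list inputs are assumptions made for illustration; this is not the code used in the study.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between predicted and measured values (Eq. 3.16)."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Squared difference between each predicted and measured value.
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    # Mean of the squared errors, then the square root.
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the errors are squared before averaging, a few large misfits raise the RMSE more than many small ones, and the result stays in the units of the measured variable.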
The flexibility of the PLS approach, its graphical orientation and its inherent ability to handle incomplete and noisy data with many variables (and observations) make PLS a simple but powerful approach for analysing data from complicated problems (Wold et al., 2001).
It can also be extended in various directions. PLSR provides an approach to the quantitative modelling of the often complicated relationships between predictors (X) and responses (Y) that, for complex problems, is often more realistic than multiple linear regression (MLR), including its stepwise selection variants.
Hence PLSR was considered appropriate for determining the combination of wavelengths used to build a model for assessing Arecanut crop water demand.
Partial Least Squares (PLS) regression is a multivariate analysis technique used in cases where there are a large number of independent variables or predictors and these independent variables are highly collinear (Wold et al. 2001). The PLS method reduces the entire reflectance spectra to a small number of relevant factors and regresses them to the dependent variable (Gomez et al. 2008).
A number of variants of PLS exist for estimating the factor and loading matrices for modelling. The most common of these are Non-linear Iterative Partial Least Squares (NIPALS) and Statistically Inspired Modification of PLS (SIMPLS) algorithms. This study employed Partial Least Squares regression for modelling the Arecanut crop water requirement from the Hyperion reflectance spectra. The regression was performed in MATLAB®.
In the present study, the response (Y) for the PLSR model is the set of crop water requirement values obtained from the Arecanut crop water requirement map, and the predictors (X) are the corresponding spectral signatures of the pre-processed Hyperion imagery pixels.
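To illustrate how a single-response PLS model of this kind can be fitted, the sketch below implements a NIPALS-style PLS1 loop in plain Python. It is not the MATLAB® code used in this study. The function names pls1_nipals and pls1_predict, the dictionary layout and the toy numbers are all hypothetical. X is assumed to hold one reflectance spectrum per row and y the matching crop water requirement values.

```python
def pls1_nipals(X, y, n_components):
    """Fit a PLS1 model with a NIPALS-style loop (illustrative sketch only)."""
    n, p = len(X), len(X[0])
    # Mean-center predictors and response.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]
    W, P, q, T = [], [], [], []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # nothing left in y to explain
        w = [v / norm for v in w]
        # Scores, X-loadings and y-loading for this component.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q_a = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q_a * t[i] for i in range(n)]
        W.append(w)
        P.append(p_load)
        q.append(q_a)
        T.append(t)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "q": q, "T": T}


def pls1_predict(model, x):
    """Predict y for one spectrum x as y_mean + sum of q_a * t_a (cf. Eq. 3.15)."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]  # plays the role of b0
    for w, p_load, q_a in zip(model["W"], model["P"], model["q"]):
        t = sum(ej * wj for ej, wj in zip(e, w))  # score of x on this component
        y_hat += q_a * t                          # q_a plays the role of b_a
        e = [ej - t * pj for ej, pj in zip(e, p_load)]  # deflate x
    return y_hat


# Toy data (hypothetical): 4 samples, 2 "bands", y = 2*x1 + x2 + 0.5.
X_toy = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y_toy = [4.5, 5.5, 10.5, 11.5]
model = pls1_nipals(X_toy, y_toy, n_components=2)
```

Predicting by sequential projection and deflation avoids a matrix inverse, which keeps the sketch free of external libraries. In practice an established PLS implementation would normally be preferred.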
3.13 Identification of significant wavelengths
The spectral response of functional groups or molecules is often dispersed over several adjacent wavelengths, leading to strong collinearity in some regions of the spectra, while other regions may be corrupted by noise, or in general, may contain irrelevant information (Gosselin et al. 2010). Hence it is necessary that the wavelengths that are relevant for modelling a particular property be identified. This can be carried out either through projection methods, variable selection or a combination of both.
3.13.1 VIP scores and β coefficients
The output of the PLSR algorithm, optionally modified, can be used directly to identify a subset of important variables. The Variable Importance in PLS Projections (VIP) is one such measure: it accumulates the importance of each variable j, as reflected by the loading weights w, over all components. The VIP measure vj is computed as in Eq. (3.17):
$$v_j = \sqrt{\frac{p \sum_{a=1}^{A} \left[ q_a^2 \, t_a' t_a \left( w_{aj} / \lVert w_a \rVert \right)^2 \right]}{\sum_{a=1}^{A} q_a^2 \, t_a' t_a}} \qquad (3.17)$$
where p is the number of predictor variables, A is the number of PLS components, (waj/‖wa‖)² represents the importance of the jth variable in component a, and qa² ta′ta is the variance explained by component a. The vj weights measure the contribution of each variable according to the variance explained by each PLS component (Mehmood et al. 2012). Variable j can be eliminated if vj < u, for some user-defined threshold u ∈ (0, ∞). It is generally accepted that a variable should be selected if vj > 1.
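A minimal sketch of the VIP computation in Eq. (3.17) is given below. It assumes that the loading weights W, the score vectors T and the y-loadings q are already available from a fitted PLS model, stored as plain Python lists with one entry per component. The function name vip_scores and this data layout are hypothetical.

```python
def vip_scores(W, T, q):
    """VIP score v_j for every variable j (Eq. 3.17).

    W: list of A weight vectors, each of length p (one value per variable)
    T: list of A score vectors, each of length n (one value per sample)
    q: list of A y-loadings
    """
    n_vars = len(W[0])
    # Variance explained by each component: q_a^2 * t_a' t_a.
    explained = [q_a ** 2 * sum(t * t for t in t_a) for q_a, t_a in zip(q, T)]
    total = sum(explained)
    scores = []
    for j in range(n_vars):
        acc = 0.0
        for w_a, ss_a in zip(W, explained):
            norm_sq = sum(w * w for w in w_a)  # ||w_a||^2
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        scores.append((n_vars * acc / total) ** 0.5)
    return scores
```

With this definition the squared VIP scores sum to p, so their mean is 1. This is one way to read the conventional selection threshold vj > 1.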
Another variable selection method uses the vector of regression coefficients (β), which is a single measure of association between each variable and the response. Here too, variables with small absolute regression coefficients can be eliminated (Mehmood et al. 2012).
The wavelengths that are significant for modelling the crop water requirement from the Hyperion reflectance spectra using PLS regression were identified by setting thresholds for both Variable Importance for Projection (VIP) and the PLS regression coefficients, β. This was implemented in MATLAB®.
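The thresholding step can be sketched as below. The text does not state the threshold values or how the two criteria were combined, so the sketch assumes that a wavelength is kept only when it passes both tests. The default thresholds and the function name select_wavelengths are likewise hypothetical.

```python
def select_wavelengths(wavelengths, vip, beta, vip_threshold=1.0, beta_threshold=0.0):
    """Keep wavelengths whose VIP and |beta| both exceed their thresholds.

    Assumption: the two criteria are combined with a logical AND; the
    threshold values are placeholders, not those used in the study.
    """
    selected = []
    for wavelength, v, b in zip(wavelengths, vip, beta):
        if v > vip_threshold and abs(b) > beta_threshold:
            selected.append(wavelength)
    return selected
```

For example, calling select_wavelengths with a VIP threshold of 1.0 and a β threshold of 0.1 drops any band that has a high VIP score but a near-zero regression coefficient.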
In the following chapter, the classification of stressed Arecanut crops from Hyperion/field reflectance data is presented. The cause of a particular disorder called crown choke is also discussed.