
Flood Classification Based on Improved Principal Component Analysis and Hierarchical Cluster Analysis
HUI GE
College of Hydrology and Water Resources, Hohai University
hui_ge@126.com

Contents
• Introduction
• Principal Component Analysis
• Hierarchical Cluster Analysis
• Case Study
• Conclusion

Introduction
• Flood classification is an optimization problem of recognizing the magnitude of flood intensity.
• Flood classification affects not only real-time reservoir operation but also flood hazard assessment.
• It plays an important role in establishing effective rules for real-time reservoir operation. Flood classification is therefore important both in theory and in practice.
• Flood classes:
  • superflood
  • great flood
  • moderate flood
  • minor flood
• Flood processes are subject to the combined effect of many factors, such as weather changes, the underlying surface, and human activities. Comprehensive multi-index classification methods are therefore the major trend.
• This work presents an IPCA-HCA model based on improved principal component analysis (IPCA) and hierarchical cluster analysis (HCA).

Principal Component Analysis
• A principal component analysis is concerned with explaining the variance-covariance structure of a set of variables through a few linear combinations of these variables. Its general objectives are (1) data reduction and (2) interpretation.
• The k principal components can then replace the initial p variables, and the original data set, consisting of n measurements on p variables, is reduced to a data set consisting of n measurements on k principal components.
• Steps:
  • Data processing
  • Eigenvalue and eigenvector
  • The number of principal components
  • Weight of principal components
  • Total component score
• Data processing
If there are p indexes and n flood processes, the matrix of flood samples is
$$X = (x_{ij})_{n \times p}, \quad i = 1, \dots, n, \quad j = 1, \dots, p,$$
where $x_{ij}$ is the value of the j-th index for the i-th flood process.

Principal Component Analysis
• Traditional PCA: standardization
• Improved PCA: mean value
The original matrix carries information of two kinds:
• variance: the degree of variation of each index;
• correlation coefficient matrix: the degree of interaction between indexes.
Standardization makes the variance of every index equal to 1, which eliminates the differences in variation between indexes. The dimensionless method applied to the original matrix must therefore be improved, and the mean value method is one of the better methods:
$$y_{ij} = \frac{x_{ij}}{\bar{x}_j},$$
where $\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$ is the mean of the j-th index over the n flood processes.

Principal Component Analysis
• Data processing: mean value
The mean value processing matrix is $Y = (y_{ij})_{n \times p}$, in which every entry of the original matrix is divided by the mean $\bar{x}_j$ of its index. Mean value processing does not change the correlation coefficients between indexes, so all information of the correlation coefficient matrix is reflected in the corresponding covariance matrix. Let $U = (u_{ij})_{p \times p}$ be the covariance matrix of Y and $s_{ij}$ the covariance of the original indexes i and j. The mean value of each index in Y is 1; as a result,
$$u_{ij} = \frac{s_{ij}}{\bar{x}_i \, \bar{x}_j}.$$
In particular,
$$u_{ii} = \frac{s_{ii}}{\bar{x}_i^{2}} = \left( \frac{\sqrt{s_{ii}}}{\bar{x}_i} \right)^{2}.$$
That is, the principal diagonal elements of the covariance matrix of the mean value processed data are the squares of the variation coefficients of the indexes.
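The contrast between standardization and mean value processing can be checked numerically. The following is a minimal sketch, assuming NumPy and a small made-up sample matrix (the numbers are hypothetical, not data from the case study); variable names are illustrative only.

```python
import numpy as np

# Hypothetical sample matrix: 5 flood processes (rows) x 3 indexes (columns).
X = np.array([
    [52.0, 61000.0, 150.0],
    [50.5, 48000.0, 120.0],
    [53.2, 70000.0, 172.0],
    [49.8, 42000.0, 104.0],
    [51.4, 55000.0, 139.0],
])

mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)

Z = (X - mean) / std      # standardization (traditional PCA)
Y = X / mean              # mean value processing (improved PCA)

var_Z = Z.var(axis=0, ddof=1)      # every index ends up with variance 1
cov_Y = np.cov(Y, rowvar=False)    # covariance matrix of mean value data
cv = std / mean                    # variation coefficient of each index

# Correlation coefficients before and after mean value processing.
corr_X = np.corrcoef(X, rowvar=False)
corr_Y = np.corrcoef(Y, rowvar=False)

print("variances after standardization:", var_Z)
print("diagonal of cov(Y):", np.diag(cov_Y))
print("squared variation coefficients:", cv ** 2)
```

Standardization flattens every variance to 1, while mean value processing keeps the relative variability of each index on the diagonal and leaves the correlations untouched.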

Principal Component Analysis
• Eigenvalue and eigenvector
The eigenvalues and eigenvectors of $U_{p \times p}$ are then calculated. Arrange the eigenvalues in descending order, that is,
$$\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0,$$
with corresponding eigenvectors $a_1, a_2, \dots, a_p$, where $a_i = (a_{i1}, a_{i2}, \dots, a_{ip})^{T}$. Consider the linear combinations
$$F_i = a_{i1} y_1 + a_{i2} y_2 + \dots + a_{ip} y_p, \quad i = 1, \dots, k,$$
where $y_1, \dots, y_p$ are the mean value processed indexes and $F_1, F_2, \dots, F_k$ are respectively known as the first principal component, second principal component, …, and k-th principal component. The first principal component is the linear combination with maximum variance.
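The eigenvalue step can be sketched as follows, assuming NumPy and a made-up sample matrix (hypothetical values, not the Yichang data). `np.linalg.eigh` is used because the covariance matrix is symmetric; it returns eigenvalues in ascending order, so they are reversed to match the descending order in the text.

```python
import numpy as np

# Hypothetical flood samples: 6 flood processes (rows) x 3 indexes (columns).
X = np.array([
    [52.0, 61000.0, 150.0],
    [50.5, 48000.0, 120.0],
    [53.2, 70000.0, 172.0],
    [49.8, 42000.0, 104.0],
    [51.4, 55000.0, 139.0],
    [50.9, 51000.0, 131.0],
])

Y = X / X.mean(axis=0)         # mean value processing
U = np.cov(Y, rowvar=False)    # p x p covariance matrix

# Eigen-decomposition of the symmetric matrix U, reordered to descending.
eigenvalues, eigenvectors = np.linalg.eigh(U)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]   # column i pairs with eigenvalues[i]

# Component scores: column i is the i-th linear combination of the indexes.
scores = Y @ eigenvectors
score_variances = scores.var(axis=0, ddof=1)

print("eigenvalues:", eigenvalues)
print("variances of the component scores:", score_variances)
```

The variance of each component score equals its eigenvalue, which is why the first component carries the maximum variance.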

Principal Component Analysis
• The number of principal components
The number of principal components k is determined by the accumulative percentage of explained variance
$$E = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i},$$
where the $\lambda_i$ are the eigenvalues in descending order; k is the smallest value for which E reaches the required threshold. These k components can then "replace" the original p variables without much loss of information.
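A minimal sketch of this selection rule, assuming NumPy. The function name `choose_k`, the example eigenvalues, and the 0.85 threshold are all hypothetical choices for illustration; the slides do not preserve the threshold actually used.

```python
import numpy as np

def choose_k(eigenvalues, threshold):
    """Return the smallest k whose cumulative explained variance >= threshold."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending
    cumulative = np.cumsum(lam) / lam.sum()
    # argmax returns the first index where the condition holds.
    return int(np.argmax(cumulative >= threshold)) + 1

# Hypothetical eigenvalues; cumulative shares are 0.60, 0.90, 0.98, 1.00.
example_eigenvalues = [3.0, 1.5, 0.4, 0.1]
k = choose_k(example_eigenvalues, threshold=0.85)
print(k)
```

With these made-up eigenvalues the first two components already explain 90% of the variance, so two components are kept at a 0.85 threshold.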

Principal Component Analysis
• Weight of principal components
The weight of each principal component is its variance contribution rate, that is, the proportion of total variance explained by that component:
$$w_i = \frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j},$$
where $\lambda_i$ is the i-th largest eigenvalue.
• Total component score
The total component score is the weighted sum of the component scores $F_i$ of the k principal components:
$$F = \sum_{i=1}^{k} w_i F_i.$$
The flood intensity can be evaluated according to the total score.
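The weighting and scoring steps might be combined as in the sketch below, assuming NumPy and made-up input data. The function name `total_component_score` is hypothetical. Because the sign of an eigenvector is arbitrary, the sketch flips each one so that its entries sum to a non-negative value; this sign convention is an assumption added here, not something stated in the slides.

```python
import numpy as np

def total_component_score(X, k):
    """Weighted sum of the first k principal component scores per flood."""
    X = np.asarray(X, dtype=float)
    Y = X / X.mean(axis=0)                  # mean value processing
    U = np.cov(Y, rowvar=False)
    lam, vec = np.linalg.eigh(U)
    order = np.argsort(lam)[::-1]           # descending eigenvalues
    lam, vec = lam[order], vec[:, order]
    # Assumed sign convention: make each eigenvector's entries sum to >= 0.
    signs = np.where(vec.sum(axis=0) < 0, -1.0, 1.0)
    vec = vec * signs
    weights = lam[:k] / lam.sum()           # variance contribution rates
    scores = Y @ vec[:, :k]                 # n x k component scores
    return scores @ weights, weights

# Hypothetical flood samples: 6 flood processes x 3 indexes.
X = np.array([
    [52.0, 61000.0, 150.0],
    [50.5, 48000.0, 120.0],
    [53.2, 70000.0, 172.0],
    [49.8, 42000.0, 104.0],
    [51.4, 55000.0, 139.0],
    [50.9, 51000.0, 131.0],
])

total, weights = total_component_score(X, k=2)
ranking = np.argsort(total)[::-1]   # flood indices, largest total score first
print("weights:", weights)
print("ranking by total score:", ranking)
```

Ranking the floods by total score gives a single ordering of intensity that can be compared with the cluster assignments.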

Hierarchical Cluster Analysis
• Principal component
• Squared Euclidean distance
• Ward method
Ward considered hierarchical clustering procedures based on minimizing the "loss of information" from joining two groups. This method is usually implemented with the loss of information taken to be an increase in an error sum of squares criterion. For a given cluster, the error sum of squares is the sum of the squared deviations of every item in the cluster from the cluster mean, measured as squared Euclidean distance.
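The Ward criterion can be illustrated with a small sketch, assuming NumPy. The helper names (`ess`, `ward_increase`, `ward_cluster`) and the four sample points are hypothetical. The sketch uses the standard identity that merging two clusters raises the error sum of squares by n_a·n_b/(n_a+n_b) times the squared Euclidean distance between their means; it is a brute-force illustration, not an efficient implementation.

```python
import numpy as np

def ess(points):
    """Error sum of squares: squared deviations from the cluster mean."""
    return float(((points - points.mean(axis=0)) ** 2).sum())

def ward_increase(a, b):
    """Increase in total ESS caused by merging clusters a and b."""
    d2 = float(((a.mean(axis=0) - b.mean(axis=0)) ** 2).sum())
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward_cluster(points, n_clusters):
    """Merge the pair with the smallest ESS increase until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_increase(points[clusters[i]], points[clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical principal component scores for four floods (two clear groups).
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
clusters = ward_cluster(points, n_clusters=2)
print(clusters)
```

In the setting of the slides, the rows of `points` would be the principal component scores of the flood processes, and `n_clusters` would be the number of flood classes.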

Case Study
Table 1 Historical flood processes of Yichang station
Table 2 Variance explained
Table 3 The result of flood classification
Note: Hm, Qm, W3d, W7d, W15d respectively represent the maximum flood level, peak flow, and 3-day, 7-day, 15-day flood volume.
Note: Prin1 and Prin2 respectively represent the first principal component and the second principal component.
Fig. 1 Scatter diagram of principal component score

Conclusion
1. Better method
The IPCA-HCA model fully considers the influence of multiple factors on flood classification, is characterized by a clear principle and simple calculation, and can be regarded as a new approach to flood classification.
2. Improve and apply
The mean value method overcomes the disadvantages of traditional PCA and effectively improves the dimensionless method. The proposed model is general and can be applied to other similar systems.
3. Recommendation
Flood samples have a significant influence on classification. This paper presents a preliminary analysis because of the limited number of flood samples. With better samples, the classification and description of flood types can become more accurate and effective. The choice of indexes for flood classification also deserves further study and discussion.

Thank You!
