Flood Classification Based on Improved Principal Component Analysis and Hierarchical Cluster Analysis

HUI GE

College of Hydrology and Water Resources,

Hohai University

hui_ge@126.com

Introduction

Principal Component Analysis

Hierarchical Cluster Analysis

Case Study

Conclusion

- Flood classification is an optimization problem: recognizing the magnitude of flood intensity.
- Flood classification affects both real-time reservoir operation and flood hazard assessment.
- It plays an important role in establishing effective rules for real-time reservoir operation, so it matters both in theory and in practice.

- Flood classification
- superflood
- great flood
- moderate flood
- minor flood

- Flood processes are subject to the combined effect of many factors, such as changing weather, the underlying surface, and human activities. Comprehensive multi-index classification methods are therefore the major trend.
- This presentation proposes the IPCA-HCA model, based on improved principal component analysis (IPCA) and hierarchical cluster analysis (HCA).

- A principal component analysis is concerned with explaining the variance-covariance structure of a set of variables through a few linear combinations of these variables. Its general objectives are (1) data reduction and (2) interpretation.
- The k principal components can then replace the initial p variables, and the original data set, consisting of n measurements on p variables, is reduced to a data set consisting of n measurements on k principal components.

- Data processing

- Steps:
- Data processing
- Eigenvalue and eigenvector
- The number of principal components
- Weight of principal components
- Total component score

If there are p indexes and n flood processes, the matrix of flood samples is $X = (x_{ij})_{n \times p}$, where $x_{ij}$ is the value of index j for flood process i.

Traditional PCA: standardization, then PCA

Improved PCA: mean value processing, then PCA

The original matrix carries two kinds of information:

variance: the degree of variation of each index;

correlation coefficient matrix: the degree of interaction between indexes.

Standardization makes the variance of every index equal to 1, which erases the differences in variation between indexes.

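So the dimensionless method applied to the original matrix must be improved; the mean value method is one of the better choices:

$$y_{ij} = \frac{x_{ij}}{\bar{x}_j}$$

where $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$ is the mean value of index j.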
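The contrast between the two dimensionless methods can be sketched as follows. This is a minimal illustration assuming NumPy; the sample matrix `X` and the helper names `standardize` and `mean_value_process` are made up here and are not data or code from the case study.

```python
import numpy as np

# Hypothetical sample matrix: 5 flood processes (rows) x 3 indexes (columns).
X = np.array([
    [52.1, 61000.0, 140.0],
    [50.3, 48000.0, 120.0],
    [53.4, 70000.0, 165.0],
    [49.8, 45000.0, 118.0],
    [51.0, 56000.0, 131.0],
])

def standardize(X):
    """Traditional PCA preprocessing: zero mean and unit variance per index."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def mean_value_process(X):
    """Mean value method: divide each index by its own mean."""
    return X / X.mean(axis=0)

Z = standardize(X)
Y = mean_value_process(X)

var_Z = Z.var(axis=0, ddof=1)   # every entry is 1: variation information is lost
var_Y = Y.var(axis=0, ddof=1)   # differs per index: relative variation is kept
mean_Y = Y.mean(axis=0)         # every entry is 1
```

Comparing `var_Z` with `var_Y` shows why standardization discards the differences in variation between indexes while mean value processing keeps them.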

- Data processing —— Mean value

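The matrix after mean value processing is $Y = (y_{ij})_{n \times p}$, in which each value is the original value divided by the mean of its index.

Mean value processing does not change the correlation coefficients between indexes, so all the information in the correlation coefficient matrix is still reflected in the corresponding covariance matrix.

The covariance matrix of Y is $U = (u_{ij})_{p \times p}$.

The mean value of each index in Y is 1; as a result,

$$u_{ij} = \frac{s_{ij}}{\bar{x}_i \bar{x}_j}$$

where $s_{ij}$ is the covariance of indexes i and j in the original matrix and $\bar{x}_i$ is the mean value of index i.

In particular,

$$u_{ii} = \frac{s_{ii}}{\bar{x}_i^2} = \left(\frac{s_i}{\bar{x}_i}\right)^2$$

where $s_i = \sqrt{s_{ii}}$ is the standard deviation of index i.

That is, the principal diagonal elements of the covariance matrix of the mean value processed data are the squares of the variation coefficients of the indexes.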
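The two properties stated above can be checked numerically. The sketch below assumes NumPy and uses randomly generated positive data with deliberately different scales; the data and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical positive-valued data: 30 samples x 4 indexes on very different scales.
X = rng.uniform(1.0, 2.0, size=(30, 4)) * np.array([1.0, 10.0, 100.0, 1000.0])

Y = X / X.mean(axis=0)          # mean value processing
U = np.cov(Y, rowvar=False)     # p x p covariance matrix of Y (ddof=1 by default)

# Variation coefficient of each original index: standard deviation / mean.
cv = X.std(axis=0, ddof=1) / X.mean(axis=0)

# Diagonal of U equals the squared variation coefficients.
diag_matches_cv2 = np.allclose(np.diag(U), cv ** 2)

# Correlation coefficients are unchanged by mean value processing.
corr_unchanged = np.allclose(np.corrcoef(Y, rowvar=False),
                             np.corrcoef(X, rowvar=False))
```

Both flags hold for any positive-valued data, because dividing a column by a positive constant rescales its covariance terms without changing its correlations.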

- Eigenvalue and eigenvector

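Then the eigenvalues and eigenvectors of $U_{p \times p}$ are calculated. Arrange the eigenvalues in descending order, that is, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$, let the corresponding eigenvectors be $a_1, a_2, \ldots, a_p$, and consider the linear combinations

$$F_i = a_i^{\mathsf T} y = a_{i1} y_1 + a_{i2} y_2 + \cdots + a_{ip} y_p, \quad i = 1, 2, \ldots, k$$

where $y = (y_1, y_2, \ldots, y_p)^{\mathsf T}$ is the vector of mean value processed indexes, and $F_1, F_2, \ldots, F_k$ are known respectively as the first principal component, second principal component, …, and kth principal component. The first principal component is the linear combination with maximum variance.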
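A minimal sketch of this step, assuming NumPy. The function name `principal_components` and the random data are hypothetical, and centering the data before projecting is a choice made here (it only shifts each score by a constant).

```python
import numpy as np

def principal_components(Y):
    """Return eigenvalues (descending), eigenvectors (as columns), and scores.

    Y is an n x p matrix that has already been made dimensionless.
    """
    U = np.cov(Y, rowvar=False)               # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(U)      # symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = (Y - Y.mean(axis=0)) @ eigvecs   # column i holds the scores on component i
    return eigvals, eigvecs, scores

rng = np.random.default_rng(1)
# Hypothetical data: 20 flood processes x 3 indexes.
X = rng.uniform(1.0, 2.0, size=(20, 3)) * np.array([1.0, 50.0, 500.0])
Y = X / X.mean(axis=0)                        # mean value processing
eigvals, eigvecs, scores = principal_components(Y)
```

The sample variance of each score column equals the corresponding eigenvalue, which is why the first component is the linear combination with maximum variance.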

- The number of principal components

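The number of principal components k is determined by the accumulative percentage of explained variance E, namely the smallest k for which

$$E = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i}$$

reaches the chosen threshold, where $\lambda_i$ is the ith largest eigenvalue.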

Then these k components can “replace” the original p variables without much loss of information.
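The selection rule can be sketched in a few lines of plain Python. The function name `choose_k`, the eigenvalues, and the threshold value of 0.8 are all illustrative assumptions; the slides do not state which threshold was used.

```python
def choose_k(eigenvalues, threshold):
    """Smallest k whose cumulative explained-variance share reaches `threshold`.

    `eigenvalues` must be sorted in descending order; `threshold` is a
    fraction in (0, 1] chosen by the analyst.
    """
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(eigenvalues, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k, cumulative / total
    # Fallback for rounding error: keep every component.
    return len(eigenvalues), 1.0

# Hypothetical eigenvalues, for illustration only.
k, E = choose_k([4.0, 3.0, 2.0, 1.0], threshold=0.8)
```

With these made-up eigenvalues the cumulative shares are 0.4, 0.7, and 0.9, so three components are retained.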

- Weight of principal components

The weight of each principal component is its variance contribution rate, that is, the proportion of total variance explained by the ith principal component:

$$w_i = \frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j}$$

where $\lambda_i$ is the ith largest eigenvalue.

- Total component score

The total component score is the weighted sum of the component scores of the k principal components:

$$F = \sum_{i=1}^{k} w_i F_i$$

where $F_i$ is the score on the ith principal component.

The flood intensity can be evaluated according to the total score.
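The weighting and scoring step might look like the sketch below, assuming NumPy. The function name `total_component_score` and the score values are hypothetical; the weights are taken as shares of the total variance over all p eigenvalues, following the definition above.

```python
import numpy as np

def total_component_score(scores, eigenvalues, k):
    """Weighted sum of the first k component scores.

    scores:      n x p array, column i = scores on principal component i
    eigenvalues: all p eigenvalues in descending order
    """
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    weights = eigenvalues[:k] / eigenvalues.sum()   # share of total variance
    return np.asarray(scores)[:, :k] @ weights, weights

# Hypothetical scores for 3 floods on 3 components, for illustration only.
scores = np.array([[ 2.0,  0.5,  0.1],
                   [-1.0,  0.2,  0.0],
                   [-1.0, -0.7, -0.1]])
total, weights = total_component_score(scores, [6.0, 3.0, 1.0], k=2)
ranking = np.argsort(total)[::-1]   # flood indices from highest to lowest total score
```

Sorting floods by `total` gives a single ordering of flood intensity, which is how the total score supports the evaluation described above.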

Hierarchical cluster analysis is applied to the principal component scores, using the squared Euclidean distance and the Ward method.

Ward considered hierarchical clustering procedures based on minimizing the “loss of information” from joining two groups.

This method is usually implemented with loss of information taken to be an increase in an error sum of squares criterion.

For each cluster, the error sum of squares is the sum of the squared Euclidean distances of every item in the cluster from the cluster mean.
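One merge decision under the Ward criterion can be sketched as follows, assuming NumPy. The helper names (`ess`, `ward_merge_cost`, `cheapest_merge`) and the 2-D points are hypothetical; a full agglomerative run would repeat this step until one cluster remains, and library routines such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` do this more efficiently.

```python
import numpy as np

def ess(points):
    """Error sum of squares: squared Euclidean distances to the cluster mean."""
    points = np.asarray(points, dtype=float)
    return float(((points - points.mean(axis=0)) ** 2).sum())

def ward_merge_cost(a, b):
    """Increase in ESS (the "loss of information") from merging clusters a and b."""
    return ess(np.vstack([a, b])) - ess(a) - ess(b)

def cheapest_merge(clusters):
    """Pair of cluster indices whose merge increases ESS the least."""
    pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    return min(pairs, key=lambda ij: ward_merge_cost(clusters[ij[0]], clusters[ij[1]]))

# Hypothetical 2-D component scores already grouped into three clusters.
clusters = [
    np.array([[0.0, 0.0], [0.0, 2.0]]),
    np.array([[1.0, 1.0]]),
    np.array([[10.0, 10.0], [12.0, 10.0]]),
]
pair = cheapest_merge(clusters)                   # the two nearby clusters
cost = ward_merge_cost(clusters[0], clusters[1])  # ESS increase for that merge
```

Here the first two clusters are merged because joining them raises the total error sum of squares by only 2/3, far less than any merge involving the distant third cluster.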

Table 1 Historical flood processes of Yichang station

Note: Hm, Qm, W3d, W7d, and W15d represent the maximum flood level, peak flow, and 3-day, 7-day, and 15-day flood volumes, respectively.

Table 2 Variance explained

Note: Prin1 and Prin2 represent the first and second principal components, respectively.

Fig. 1 Scatter diagram of principal component scores

Table 3 The result of flood classification

1. Better method

The IPCA-HCA model fully considers the multifactor influence on flood classification, is characterized by a clear principle and simple calculation, and can be regarded as a new approach for flood classification.

2. Improve and apply

The mean value method overcomes the disadvantages of traditional PCA and effectively improves the dimensionless method.

The proposed model is general and can be applied to other similar systems.

3. Recommendation

The choice of flood samples has a significant influence on the classification. This study is only a preliminary analysis because the flood samples were limited. With better samples, the classification and description of flood types can become more accurate and effective.

The choice of flood classification indexes also deserves further study and discussion.

Thank You !