Constructing Fuzzy Signature Based on Medical Data


Student: Bai Qifeng

Client: Prof. Tom Gedeon

Proposal

Explore an approach to automatically constructing fuzzy signatures from a medical database.

The proposal raises three questions:

• How can the group of suspected SARS patients be identified?
• How can the relationships among symptoms be explored?
• How can a fuzzy signature be constructed from the above analysis?
Fuzzy Logic Theory
• Fuzzy logic uses linguistic rules that reflect the uncertainty or vagueness of concepts in natural language.

If 50 m/h is the boundary between “slow” and “fast”, conventional bivalent sets regard 50.1 m/h as fast.

What if the current speed is 49.9 m/h?

In the real world, the shift should be smooth.

Now, assume there are three temperatures

We can get the fuzzy sets:

A fuzzy set is a set whose elements have degrees of membership.

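[Figure: membership functions of the fuzzy sets Slight, Moderate, Severe, and Extreme over temperature, with membership degrees from 0 to 1 on the vertical axis and ticks at 37.3, 37.9, 38.6, 39.1, and 40 on the horizontal axis. The three example temperatures are 37.8, 38.4, and 39.8.]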

Fuzzy Set
Assume:

IF Fever = Slight THEN dose = Low.

IF Fever = Moderate THEN dose = Ave.

The fuzzy value of fever is Slight = 0.29 and Moderate = 0.71.

The dose value will share properties of both the Low and Ave ranges.
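One way to read "sharing properties of both ranges" is a weighted average of the two rule outputs. The sketch below assumes each dose term can be summarised by a single representative value; the numbers in `DOSE_CENTRE` are hypothetical and do not come from the slides.

```python
# Hypothetical representative dose for each output term (not from the slides).
DOSE_CENTRE = {"Low": 100.0, "Ave": 200.0}

# Rule base from the slide: fever term -> dose term.
RULES = {"Slight": "Low", "Moderate": "Ave"}


def blended_dose(fever_memberships):
    """Average the dose centres, weighted by how strongly each rule fires."""
    total_weight = sum(fever_memberships.values())
    weighted_sum = sum(
        degree * DOSE_CENTRE[RULES[term]]
        for term, degree in fever_memberships.items()
    )
    return weighted_sum / total_weight


dose = blended_dose({"Slight": 0.29, "Moderate": 0.71})
print(round(dose, 1))
```

With these assumed centres the result lies between the Low and Ave values and closer to Ave, because the Moderate rule fires more strongly.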

[Figure: input (IN) and output (OUT) fuzzy sets.]

Why use fuzzy sets
Problem Definition
• A major issue in fuzzy applications is how to create the fuzzy rules.
• The number of rules grows exponentially with the number of inputs: with T terms per input and K inputs, a full rule base has T^K rules.
• A full rule base needs at least one activated rule for every input.

e.g. 5 terms, 2 inputs => 25 rules

5 terms, 5 inputs => 3,125 rules
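The two figures above follow from raising the number of terms to the power of the number of inputs. A minimal check, where the function name `full_rule_count` is only illustrative:

```python
def full_rule_count(terms, inputs):
    """Number of rules in a rule base that covers every combination of terms."""
    return terms ** inputs


for inputs in (2, 5):
    print(f"5 terms, {inputs} inputs => {full_rule_count(5, inputs)} rules")
```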

Sketch of Solution
• Three possible solutions:
• Decrease T: sparse fuzzy systems
• Decrease K: hierarchical fuzzy systems
• Decrease both simultaneously: sparse hierarchical fuzzy rule bases

Hierarchical Fuzzy Systems
• Hierarchical fuzzy systems reduce the dimension k of the sub-rule bases by using meta-levels.
Fuzzy Signatures
• Fuzzy signatures structure data into vectors of fuzzy values, each of which can be a further vector.
• Each signature corresponds to a nested vector structure or, equivalently, to a tree graph.
Fuzzy Signatures
• The relationship between higher and lower levels is governed by fuzzy aggregations.
• Fuzzy aggregations include union, average, intersection, etc.

Examples:

• Union: A ∪ B = max[A, B] = A or B
• Intersection: A ∩ B = min[A, B] = A and B
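As a rough illustration, a fuzzy signature can be held as a nested list and collapsed from the leaves upward with an aggregation operator. The sketch assumes the same operator at every node, which is a simplification, and the sample values are hypothetical.

```python
def aggregate(signature, operator=max):
    """Collapse a nested list of fuzzy values into one value, bottom-up."""
    if isinstance(signature, (int, float)):
        return float(signature)
    return operator(aggregate(child, operator) for child in signature)


# Hypothetical signature [a1, [a21, a22], [a31, [a321, a322]]].
signature = [0.3, [0.6, 0.2], [0.5, [0.9, 0.4]]]

union = aggregate(signature, max)         # "or" across all levels
intersection = aggregate(signature, min)  # "and" across all levels
print(union, intersection)
```

In practice each sub-vector may use its own aggregation, so a per-node operator would replace the single `operator` argument.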
Clustering
• The aim of cluster analysis is to classify objects based on similarities among them.
• A cluster is a group of objects that are more similar to one another than to members of other clusters.
• Clustering is unsupervised classification: no predefined classes
Clustering: Similarity
• How to evaluate the similarities of data?
• Cluster analysis adopts the distance between two points as the criterion of similarity.
• Distance-type measures include the Euclidean distance and the city-block distance.
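The two distance measures named above can be sketched for points given as equal-length tuples; the function names are illustrative.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def city_block(p, q):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q), city_block(p, q))
```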
Clustering: Fuzzy C-Means

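Bezdek defines the objective function as:

$$J_m(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} \, \lVert x_j - v_i \rVert^2$$

Here $\lVert x_j - v_i \rVert^2$ represents the deviation of data point $x_j$ from the cluster centre $v_i$. The number m governs the influence of the membership grades. $u_{ij}$ represents the degree of membership of data point $x_j$ in the cluster with centre $v_i$.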
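A minimal sketch of fuzzy c-means on one-dimensional data, using the standard alternating updates for memberships and centres. The temperature values and starting centres are hypothetical, and a fixed iteration count stands in for a proper convergence test.

```python
def memberships(points, centres, m=2.0):
    """u[i][j]: degree to which points[j] belongs to the cluster at centres[i]."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for v_i in centres:
        row = []
        for x in points:
            d_i = abs(x - v_i) or 1e-12  # guard against a zero distance
            total = sum(
                (d_i / (abs(x - v_k) or 1e-12)) ** exponent for v_k in centres
            )
            row.append(1.0 / total)
        u.append(row)
    return u


def fcm_1d(points, centres, m=2.0, iterations=50):
    """Alternate the membership update and the centre update."""
    for _ in range(iterations):
        u = memberships(points, centres, m)
        centres = [
            sum(w ** m * x for w, x in zip(row, points)) / sum(w ** m for w in row)
            for row in u
        ]
    return memberships(points, centres, m), centres


# Hypothetical body temperatures forming two loose groups.
temps = [36.8, 37.0, 37.1, 39.0, 39.2, 39.5]
u, centres = fcm_1d(temps, centres=[36.0, 40.0])
print([round(v, 2) for v in centres])
```

Each column of `u` sums to 1, so every data point distributes its full membership across the clusters.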

Clustering: Cluster Valid Index
• Xie and Beni Index
• The numerator measures the compactness of the data within each cluster, and the denominator measures the separation between different clusters.
• A smaller numerator indicates that the clusters are more compact, and a larger denominator indicates that the clusters are well separated, so a smaller index value indicates a better partition.
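The sketch below computes the index in its commonly cited form, with squared memberships in the numerator and n times the smallest squared distance between centres in the denominator, for one-dimensional data. The data and both candidate partitions are hypothetical.

```python
def xie_beni(points, u, centres):
    """Compactness divided by (n * minimum squared separation of the centres)."""
    n = len(points)
    compactness = sum(
        u[i][j] ** 2 * (x - v) ** 2
        for i, v in enumerate(centres)
        for j, x in enumerate(points)
    )
    separation = min(
        (a - b) ** 2
        for i, a in enumerate(centres)
        for k, b in enumerate(centres)
        if i != k
    )
    return compactness / (n * separation)


points = [1.0, 2.0, 9.0, 10.0]
u = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]  # crisp memberships for brevity

good = xie_beni(points, u, centres=[1.5, 9.5])  # centres sit inside the groups
poor = xie_beni(points, u, centres=[4.0, 6.0])  # centres pulled together
print(good, poor)
```

The well-placed centres give the smaller value, which matches the reading that lower is better.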
Factor Analysis
• Factor analysis is performed by examining the pattern of correlations between the observed measures.

$$X = \Lambda F + e$$

where X is a vector of p observed variables, F is a vector of r < p latent variables called factors, $\Lambda$ is a (p × r) matrix of coefficients (loadings), and e is a vector of random errors.

Factor Analysis: Principal component analysis
• Principal component analysis aims to reduce the number of variables by constructing new variables (principal components) that explain most of the variation in the data.
Factor Analysis: Principal component analysis

$$z = U^{T} x$$

where x is the p-dimensional vector of variables, z is the vector of principal components, and U is an orthogonal matrix.

• The eigenvalues $\lambda_i$ and the columns $u_i$ of U correspond to the variance and the direction vector of the i-th principal component, respectively.
• The value $\lambda_i / \sum_{k=1}^{p} \lambda_k$ is the contribution ratio, which indicates what percentage of the total variation of the variables the i-th principal component represents.
• Usually, a cumulative contribution ratio of 70 to 80 percent can effectively represent the major variations in the original data.
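The cumulative contribution ratio can be sketched as follows, assuming the variance of each principal component is available as an eigenvalue of the covariance (or correlation) matrix. The eigenvalues below are hypothetical.

```python
def contribution_ratios(eigenvalues):
    """Share of the total variance carried by each component."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


def components_needed(eigenvalues, threshold=0.70):
    """Smallest number of leading components whose cumulative ratio reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    cumulative = 0.0
    for count, ratio in enumerate(contribution_ratios(ordered), start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return count, cumulative
    return len(ordered), cumulative


# Hypothetical eigenvalues of a 5-variable data set.
eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]
count, cumulative = components_needed(eigenvalues)
print(count, cumulative)
```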
Factor Analysis: PCA vs FA
• The direction is reversed: in factor analysis the measured responses are based on the underlying factors, while in PCA the principal components are based on the measured responses.
Factor Analysis: Factor Rotation
• To identify variables that have similar factor loadings, we can rotate the factor axes in any direction without changing the relative locations of the points to each other.
Experiment: Scatter of Raw Data
• The centres of gravity of the components are shifted by noise and outliers.
Experiment: Scatter After Clustering

After clustering, the collected data represent the pattern of the disease more accurately.

Experiment: KMO and Bartlett’s Test
• The KMO test indicates whether the data are likely to contain underlying factors.
• If KMO < 0.50, factor analysis is not useful.
• Bartlett's test indicates whether the variables are unrelated.
• A significance level < 0.05 indicates significant relationships among the variables.
Experiment: PCA Model

Cumulative contribution ratio = 63%

Experiment: PCA Model

The model indicates that the variables can be divided into 3 factors.

Experiment: Constructed fuzzy signature
• Hierarchical clustering or K-means can be used to cluster each factor.
• The weighted aggregation method gave better performance in this fuzzy signature.
• 3 weights & 3 aggregations
Experiment: Possible rule bases

Aggregations:

• Min (fever, cough, chest)
• Min (dyspnea, lymphopenia)
• Max (Min (kinase, malaise), Min(aspartate, dehydrogenase) )

Rules

• If a patient has fever, cough and chest.
• If a patient has dyspnea and lymphopenia.
• If a patient has kinase and malaise, or has aspartate and dehydrogenase.
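The three aggregations listed above can be evaluated directly with `min` and `max`. In this sketch the patient's membership degrees are hypothetical, and the group names are only labels for the three aggregations.

```python
def group_scores(s):
    """Evaluate the three aggregations from the slide on one patient's symptom degrees."""
    return {
        "group_1": min(s["fever"], s["cough"], s["chest"]),
        "group_2": min(s["dyspnea"], s["lymphopenia"]),
        "group_3": max(
            min(s["kinase"], s["malaise"]),
            min(s["aspartate"], s["dehydrogenase"]),
        ),
    }


# Hypothetical membership degrees in [0, 1] for a single patient.
patient = {
    "fever": 0.9, "cough": 0.7, "chest": 0.6,
    "dyspnea": 0.3, "lymphopenia": 0.8,
    "kinase": 0.5, "malaise": 0.4,
    "aspartate": 0.2, "dehydrogenase": 0.9,
}

scores = group_scores(patient)
print(scores)
```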
Experiment: Possible rule bases

Further assumption:

• If a patient has fever, cough and chest, the possibility that he/she has SARS is 64%.
• If he/she has kinase and malaise, or has aspartate and dehydrogenase simultaneously, the possibility increases to 93%.
• If he/she has dyspnea and lymphopenia, he/she can be diagnosed as a SARS patient.
Conclusion

• Fuzzy signatures are capable of improving the applicability of fuzzy systems.
• Fuzzy signatures have the ability to cope with complex structured data and interdependent features problems.
• With weighted aggregation, fuzzy signatures can assist experts in making decisions by removing redundant information.
Further Work
• Further research can focus on evaluating the underlying relationships between the structure of fuzzy signatures, the aggregation functions, and the weights of each vector.

Thank you

---- Bai Qifeng

Appendix
• Demo of Fuzzy Control
• Sparse Fuzzy System
• Automatic Constructing Fuzzy Signature
• Fuzzy c-Means
Fuzzy Control

Fuzzy control is currently the most important application of fuzzy theory.

Fuzzy control usually involves three steps:

• Fuzzification
• Rule evaluation
• Defuzzification
Demo of Fuzzy Control
• The demo uses a procedure originated by Ebrahim Mamdani.

The application is to balance a pole on a mobile platform that can move in only two directions, to the left or to the right. The angle between the platform and the pendulum and the angular velocity of this angle are chosen as the inputs of the system. The output is the speed of the platform.

Fuzzification
• First of all, the different levels of input and output are defined by specifying the membership functions for the fuzzy sets.
• For simplicity, it is assumed that all membership functions are spread equally. This is why no actual scale is included in the graphs.
[Figures: membership functions for the input angle, the input angular velocity, and the output speed.]
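Fuzzification with equally spread membership functions can be sketched with triangular sets. The triangular shape, the normalised scale from -1 to 1, and the three set names used here are assumptions for illustration; the slides give no actual scale.

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical equally spread sets on a normalised scale.
SETS = {
    "negative low": (-1.0, -0.5, 0.0),
    "zero": (-0.5, 0.0, 0.5),
    "positive low": (0.0, 0.5, 1.0),
}


def fuzzify(x):
    """Map a crisp input to its degree of membership in every set."""
    return {name: triangular(x, *shape) for name, shape in SETS.items()}


degrees = fuzzify(0.125)
print(degrees)
```

On this assumed scale, an input of 0.125 belongs to "zero" with degree 0.75 and to "positive low" with degree 0.25.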

Fuzzification
Rule Evaluation
• The next step is to define the fuzzy rules. The fuzzy rules are a series of if-then statements.

For example:

If angle is zero and angular velocity is zero

then speed is also zero.

If angle is zero and angular velocity is low

then the speed shall be low.

Rule Evaluation
• The full set of rules is listed in the table.
Rule Evaluation
• Suppose an example has the following membership degrees:
• 0.75 and 0.25 for zero and positive low angles
• 0.4 and 0.6 for zero and negative low angular velocities.
Rule Evaluation
• Consider the rule

"if angle is zero and angular velocity is zero, the speed is zero".

Rule Evaluation
• Consider the rule

"if angle is zero and angular velocity is negative low, the speed is negative low".

Rule Evaluation
• Consider the rule

"if angle is positive low and angular velocity is zero, the speed is positive low".

Rule Evaluation
• The results overlap and are combined into the following figure.
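The three rules quoted above can be evaluated with the example membership degrees. The sketch assumes "and" is taken as the minimum, as in the intersection defined earlier, and that rules sharing an output term are merged with the maximum.

```python
# Example membership degrees from the slides.
angle = {"zero": 0.75, "positive low": 0.25}
velocity = {"zero": 0.4, "negative low": 0.6}

# (angle term, angular velocity term) -> speed term, for the three quoted rules.
RULES = {
    ("zero", "zero"): "zero",
    ("zero", "negative low"): "negative low",
    ("positive low", "zero"): "positive low",
}


def evaluate(angle, velocity):
    """Firing strength of each output term: min for 'and', max across rules."""
    strengths = {}
    for (angle_term, velocity_term), speed_term in RULES.items():
        strength = min(angle.get(angle_term, 0.0), velocity.get(velocity_term, 0.0))
        strengths[speed_term] = max(strengths.get(speed_term, 0.0), strength)
    return strengths


strengths = evaluate(angle, velocity)
print(strengths)
```

Each strength then clips the corresponding output set before the clipped sets are overlapped.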
Defuzzification
• Defuzzification chooses an appropriate representative value as the final output.
• The most common method is the centre of gravity.
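A numerical sketch of centre-of-gravity defuzzification: the output sets are clipped at their firing strengths, merged with the maximum, and the centroid is approximated on a grid. The triangular sets and normalised scale are assumptions, and the strengths are those obtained by applying the minimum to the example values in the rule evaluation slides.

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical output sets for the platform speed on a normalised scale.
SPEED_SETS = {
    "negative low": (-1.0, -0.5, 0.0),
    "zero": (-0.5, 0.0, 0.5),
    "positive low": (0.0, 0.5, 1.0),
}
STRENGTHS = {"negative low": 0.6, "zero": 0.4, "positive low": 0.25}


def aggregated(x):
    """Clip every output set at its firing strength and take the maximum."""
    return max(
        min(STRENGTHS[name], triangular(x, *shape))
        for name, shape in SPEED_SETS.items()
    )


def centre_of_gravity(low=-1.0, high=1.0, steps=2000):
    """Approximate the centroid of the aggregated set on an even grid."""
    xs = [low + (high - low) * i / steps for i in range(steps + 1)]
    numerator = sum(x * aggregated(x) for x in xs)
    denominator = sum(aggregated(x) for x in xs)
    return numerator / denominator


speed = centre_of_gravity()
print(round(speed, 3))
```

The result is slightly negative because the "negative low" rule fires most strongly.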
Sparse Fuzzy Systems
• Sparse fuzzy systems can be used in situations where full knowledge of the problem domain is not available. Problem domain experts often work with only important fuzzy rules.
• Self-learning algorithms that tune the parameters of a fuzzy system to improve accuracy can also lead to sparse fuzzy systems.

In most cases, parameter tuning involves the reshaping of the fuzzy sets in the rule antecedents. It can happen that the shrinking of the fuzzy sets leads to gaps between neighboring fuzzy sets.

• Generating a sparse fuzzy system benefits from the reduced number of rules. (Chong 2004)
Sparse Fuzzy Systems
• Sparse systems reduce T. The essential idea is to omit less important fuzzy rules to form a sparse fuzzy system.
• In sparse systems, an input may not match any of the rule antecedents.
• Fuzzy rule interpolation is used to infer conclusions for such inputs from the existing fuzzy rules in the system.
Interpolation overview
• Tomato colours:

IF colour = Red THEN it is Ripe

IF colour = Green THEN it is Unripe

• What about a yellow tomato?

[Figure: potential tomato colours.]
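The tomato example can be sketched as a linear interpolation between the two existing rules. This is a deliberately simplified stand-in for fuzzy rule interpolation: it uses single reference points instead of full fuzzy sets, and the colour axis (0 = green, 1 = red) and ripeness axis (0 = unripe, 1 = ripe) are hypothetical.

```python
# (antecedent position on the colour axis, consequent position on the ripeness axis)
RULES = [
    (0.0, 0.0),  # IF colour = Green THEN it is Unripe
    (1.0, 1.0),  # IF colour = Red THEN it is Ripe
]


def interpolate(observation, rules):
    """Linearly interpolate a conclusion between the two rules that flank the observation."""
    ordered = sorted(rules)
    for (a1, b1), (a2, b2) in zip(ordered, ordered[1:]):
        if a1 <= observation <= a2:
            t = (observation - a1) / (a2 - a1)
            return b1 + t * (b2 - b1)
    raise ValueError("observation lies outside the range covered by the rules")


yellow = 0.5  # assumed to lie halfway between green and red
ripeness = interpolate(yellow, RULES)
print(ripeness)
```

Under these assumptions a yellow tomato comes out as half ripe, even though no rule mentions yellow.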

Automatic Construct Fuzzy Signature
• Sub-structures may be hidden in a large data set.
• The more separable the elements in a subspace, the easier the sub-rule base selection is.
• Finding a suitable Π and finding a suitable Z0 affect each other.
Sugeno and Yasukawa Approach
• Sugeno and Yasukawa (1991) introduced a solution for sparse rule-base generation.
• It clusters the output data samples and induces the rules by projecting the output clusters onto the input domains.
• Drawback: it only produces the rules necessary for the input-output sample data.
Projection-based Fuzzy Rule Extraction
• Perform c-Means to cluster the data along the output space. The FS index of Fuzzy c-Means can be used to find an optimal number of clusters.
• For each fuzzy output cluster, all points contained in the cluster are projected back to input dimensions.
• The projected points in each dimension are clustered again. In this procedure, the FS index is used in conjunction with the merging index. This process will produce multiple fuzzy clusters in each dimension.
• Each of the clusters in the input dimension is a projection of the multi-dimensional input cluster to that input dimension. Then, the clusters from the individual dimensions are combined to form the multi-dimensional input cluster.
• For each of the multi-dimensional clusters identified, a rule can be created.
Fuzzy c-Means
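• Let $U = [u_{ij}]$ be a fuzzy c-partition of the data $X = \{x_1, \dots, x_n\}$, with $0 \le u_{ij} \le 1$ and $\sum_{i=1}^{c} u_{ij} = 1$ for every j.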
Fuzzy c-Means
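• Dunn defined a fuzzy objective function:

$$J_2(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{2} \, \lVert x_j - v_i \rVert^2$$

where $v_i$ is the cluster centre of the i-th set.

• Bezdek extended it to:

$$J_m(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} \, \lVert x_j - v_i \rVert^2$$

Here $\lVert x_j - v_i \rVert^2$ represents the deviation of data point $x_j$ from the centre $v_i$. The number m governs the influence of the membership grades.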

Fuzzy c-Means
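• Limitation: the number of clusters must be known in advance.
• How can an optimal number of clusters be found?
• A cluster validity index was proposed by Fukuyama and Sugeno (FS):

$$FS = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} \left( \lVert x_j - v_i \rVert^2 - \lVert v_i - \bar{x} \rVert^2 \right)$$

where $\bar{x}$ is the mean of all data points.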
Finding Suitable Subspace
• Rules: age & experience to salary

[Figure: rule table with Age terms Y, M, O and Exp terms l, M, G; the cell entries are L, A, and H.]

Finding Suitable Subspace
• Rule in a tree (Age/Exp/Con)

Prune tree

[Figure: rule trees branching on Age (Y, M, O) and Exp (l, M, G), with L, A, and H at the leaves.]

Finding Suitable Subspace
• Rule in a tree (Exp/Age/Con)

Prune rule tree

[Figure: pruned rule tree branching on Exp (l, M, G) and then Age (O, M), with L, A, and H at the leaves.]

Fuzzy Signatures in SARS Diagnosis
• The following scheme shows the daily symptom signatures of some patients:
Fuzzy Signatures in SARS Diagnosis
• Two examples with linguistic values and fuzzy signatures.
Fuzzy Signatures in SARS Diagnosis
• An aggregation method can compare components regardless of the different numbers of sub-components.
Fuzzy Signatures in SARS Diagnosis
• The aggregation methods used across different symptoms differ from those used within the signature of a single symptom.

Here, the weights are defined as:

fever = 1, Cough = 0.9, Nausea = 0.4, Sore = 0.25
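The slides do not state which weighted operator combines the symptoms, so the sketch below assumes a weighted mean purely for illustration. The weights are the ones listed above; the patient's symptom degrees are hypothetical.

```python
# Weights as listed on the slide.
WEIGHTS = {"fever": 1.0, "cough": 0.9, "nausea": 0.4, "sore": 0.25}


def weighted_mean(symptoms, weights=WEIGHTS):
    """Combine symptom degrees into one value; the weighted mean is an assumed operator."""
    total_weight = sum(weights[name] for name in symptoms)
    weighted_sum = sum(weights[name] * degree for name, degree in symptoms.items())
    return weighted_sum / total_weight


# Hypothetical aggregated degree of each symptom for one patient.
patient = {"fever": 0.8, "cough": 0.6, "nausea": 0.2, "sore": 0.4}

score = weighted_mean(patient)
print(round(score, 3))
```

Because fever and cough carry the largest weights, they dominate the combined value.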
