
Character Recognition Based on Probability Tree Model





  1. Character Recognition Based on Probability Tree Model. Presenter: Huang Kaizhu

  2. Outline • Introduction • How can probability be used in character recognition? • What is a probability tree model? • Two improvement directions • Integrate prior knowledge • Relax the tree structure into a hyper tree • Experiments in character recognition

  3. Disease Diagnosis Problem • How does a doctor get to know that a patient has a cold? • A. Does the patient have a headache? • B. Does the patient have a sore throat? • C. Does the patient have a fever? • D. Can the patient breathe well through his nose? • Now a patient has the following symptoms: A is no, B is yes, C is no, D is yes. • What is the hidden principle the doctor follows in making a judgment?

  4. Disease Diagnosis Problem (cont.) • A good doctor gets the answer by comparing: P1 = P(Cold=true, A=N, B=Y, C=N, D=Y) vs. P2 = P(Cold=false, A=N, B=Y, C=N, D=Y). If P1 > P2, the patient is judged to have a cold; if P2 > P1, the patient is judged not to have a cold. A minimal sketch of this comparison appears below.
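
As a toy illustration of the decision rule, here is a minimal Python sketch; the two joint probabilities are made-up numbers standing in for estimates from patient records.

```python
# Hypothetical joint probabilities for the observed symptoms
# A=N, B=Y, C=N, D=Y; real values would be estimated from data.
P1 = 0.002  # P(Cold=true,  A=N, B=Y, C=N, D=Y)
P2 = 0.010  # P(Cold=false, A=N, B=Y, C=N, D=Y)

diagnosis = "cold" if P1 > P2 else "no cold"
print(diagnosis)  # "no cold" under these made-up numbers
```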

  5. What is a Probability Model Classifier? • A probability model classifier is a classifier based on probabilistic inference. The focus now changes to how to calculate P(Cold=true, A=N, B=Y, C=N, D=Y) and P(Cold=false, A=N, B=Y, C=N, D=Y). A classification problem is thus turned into a distribution estimation problem.

  6. Used in Character Recognition • How can the probability model be used in character recognition? (similar to the Disease Diagnosis Problem) • Find a probability distribution of the features for every type of character: P('a', f1, f2, f3, …, fn), P('b', f1, f2, f3, …, fn), …, P('z', f1, f2, f3, …, fn). • Compute the probability with which an unknown character belongs to each type of character, and classify the character into the class with the highest probability. For example, if P('a', fu1, fu2, …, fun) > P(C, fu1, fu2, …, fun) for C = 'b', 'c', …, 'z', we judge the unknown character to be 'a'. How can we estimate the joint probability P(C, f1, f2, f3, …, fn), C = 'a', 'b', …, 'z'? (A sketch of this argmax rule follows below.)
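
The classification step itself is just an argmax over classes. A minimal sketch, assuming a `joint_prob(c, features)` estimator (hypothetical; building it is the subject of the next slides):

```python
import string

def classify(features, joint_prob):
    """Return the class c in {'a', ..., 'z'} maximizing P(c, f1, ..., fn).

    `joint_prob(c, features)` is an assumed estimator of the joint
    probability; the following slides discuss how to construct it.
    """
    return max(string.ascii_lowercase, key=lambda c: joint_prob(c, features))
```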

  7. Estimate the Joint Probability • 1. Estimation based on direct counting: P(Cold=true, A=N, B=Y, C=N, D=Y) = Num(Cold=true, A=N, B=Y, C=N, D=Y) / TotalNum. Impractical!! Reason: a huge number of samples is needed. If the number of features is n, at least 2^n samples are needed for binary features (see the counting sketch below). • 2. Estimation based on the dependence relationship between features
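
A minimal counting estimator makes the problem concrete: with n binary features, the table it implicitly fills has 2^n cells per class, so almost all counts stay zero at realistic sample sizes. (The `samples` format here is our own assumption.)

```python
from collections import Counter

def joint_prob_by_counting(samples):
    """Direct-counting estimate of P(class, f1, ..., fn).

    `samples` is a list of (class, feature_tuple) pairs. With n binary
    features there are 2**n feature tuples per class, so the counts are
    hopelessly sparse unless the training set is enormous -- the reason
    this estimator is impractical.
    """
    counts = Counter(samples)
    total = len(samples)
    return lambda c, features: counts[(c, tuple(features))] / total
```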

  8. Advantage • The joint probability can be written in a product form: P(A, B, C, D) = P(C) P(A|C) P(D|C) P(B|C). • By estimating each factor of the above with a counting process, we can avoid the sample explosion problem (see the factored sketch below). The probability tree model is a kind of model based on the above principle.
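
A minimal sketch of evaluating the factored joint, assuming the four factor tables have already been estimated by counting; the dictionary keying is our own convention:

```python
def joint_prob_factored(p_c, p_a_c, p_b_c, p_d_c, c, a, b, d):
    """Evaluate P(A,B,C,D) = P(C) P(A|C) P(D|C) P(B|C).

    Each factor is a small table (e.g. p_a_c[(a, c)] = P(A=a | C=c))
    estimated by counting, so only a handful of small tables are needed
    instead of one exponentially large joint table.
    """
    return p_c[c] * p_a_c[(a, c)] * p_d_c[(d, c)] * p_b_c[(b, c)]
```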

  9. Probability Tree Model • It assumes that the dependence relationship among the features can be represented as a tree. • It seeks a tree structure that represents the dependence relationship optimally, so that the joint probability can be written as a product over the tree edges: P(v1, v2, …, vn) = P(v_root) × ∏ P(v_child | v_parent), with one factor per edge of the tree (cf. the example P(A,B,C,D) = P(C)P(A|C)P(D|C)P(B|C) on the previous slide).

  10. Algorithm • 1. Obtain P(vi) and P(vi, vj) for each pair (vi, vj) by an accumulating (counting) process; vi is the i-th feature. • 2. Calculate the mutual information I(vi, vj) = Σ P(vi, vj) log [ P(vi, vj) / (P(vi) P(vj)) ]. • 3. Use a maximum spanning tree algorithm to find the optimal tree structure, where the edge weight between two nodes vi, vj is I(vi, vj). This algorithm was proved to be optimal in [1]. A sketch of the three steps appears below.
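
A minimal sketch of the three steps for binary (0/1) features, using numpy and networkx (our choice of tools, not the original implementation):

```python
import itertools
import numpy as np
import networkx as nx

def chow_liu_tree(X):
    """Learn the optimal tree structure from an (m, n) 0/1 data matrix X."""
    m, n = X.shape
    G = nx.Graph()
    for i, j in itertools.combinations(range(n), 2):
        mi = 0.0
        for a in (0, 1):          # Step 1: P(vi), P(vj), P(vi, vj) by counting
            for b in (0, 1):
                p_ij = np.mean((X[:, i] == a) & (X[:, j] == b))
                p_i = np.mean(X[:, i] == a)
                p_j = np.mean(X[:, j] == b)
                if p_ij > 0:      # Step 2: accumulate mutual information
                    mi += p_ij * np.log(p_ij / (p_i * p_j))
        G.add_edge(i, j, weight=mi)
    # Step 3: maximum spanning tree under mutual-information edge weights
    return list(nx.maximum_spanning_tree(G).edges())
```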

  11. Two Problems of the Tree Model • It can't process sparse data or missing data. For example, if the samples are too sparse, the nose problem may never appear in any record of patients with a cold, while it appears twice in the records of patients without a cold. Then, no matter what symptoms a patient has, a "cold = FALSE" judgment will be made, because one zero count zeroes out the whole product: P(Cold=true, A, B, C, D=false) = … × P(D=false | C) × … = 0 < P(Cold=false, A, B, C, D=false). • It can't perform well with multi-dependence relationships.

  12. Our Two Improvements • To problem 1: introduce prior knowledge to overcome it. Applied to the example on the last slide, the zero-count factor is replaced by a small prior-based probability, so the joint probability no longer collapses to zero.

  13. Key Point of Technique 1 • When a variable (feature) always takes the same value within one class, we replace its probability with a proportion of the variable's probability in the whole database (see the sketch below).
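
A minimal sketch of this back-off rule; the proportion constant `alpha` and the function shape are our assumptions, not the original exact formula:

```python
def smoothed_conditional(count_xc, count_c, count_x, total, alpha=0.1):
    """Estimate P(x=v | class=c) without hard zeros.

    If value v never occurs within class c, the raw count estimate is 0
    and would zero out the whole joint product (the "cold" example on
    slide 11). Instead we back off to a proportion (alpha, an assumed
    constant) of the value's probability in the whole database.
    """
    if count_xc == 0:
        return alpha * count_x / total  # prior-knowledge replacement
    return count_xc / count_c           # ordinary counting estimate
```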

  14. To problem 2: introduce large-node methods to overcome it. [Figure: the CLT (Chow-Liu tree) structure vs. the LNCLT (large-node Chow-Liu tree) structure, in which some nodes combine several variables.]

  15. Algorithm • 1. Find the tree model. • 2. Refine the tree model based on frequent itemsets. Basic idea: the more frequently two variables occur together, the more likely they are to be combined into a large node (see the sketch below).
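
A simplified sketch of the refinement step; the co-occurrence support measure and the `min_support` threshold are our assumptions standing in for the original frequent-itemset criterion:

```python
import numpy as np

def candidate_large_nodes(X, tree_edges, min_support=0.4):
    """Pick tree edges whose endpoint variables frequently co-occur.

    For 0/1 data X, the support of a pair (i, j) is the fraction of
    samples in which both features fire. Edges of the learned tree whose
    support exceeds min_support (an assumed threshold) are returned as
    candidates to be merged into a single large node.
    """
    return [(i, j) for (i, j) in tree_edges
            if np.mean((X[:, i] == 1) & (X[:, j] == 1)) >= min_support]
```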

  16. Experiment 1 --- Handwritten Digit Lib • Database setup: 60000-digit training lib, 10000-digit test lib. • The database is not sparse. • Purpose: evaluate the technique for problem 2. [Figure: digits recognized correctly by LNCLT but wrongly recognized by CLT, each shown with the digit CLT confused it with at the lower right.]

  17. Experiment 2 --- Printed Character Lib • Database setup: 8270-character training lib. • The database is sparse. • Purpose: evaluate the technique for problem 1 (sparse data). • Before introducing prior knowledge: recognition rate on training data: 86.9%. After introducing prior knowledge: recognition rate on training data: 97.7%.

  18. Demo
