Incrementally Learning Parameters of Stochastic CFGs using Summary Stats

Written by: Brent Heeringa and Tim Oates

Goal:
- To learn the syntax of utterances

Approach:

- SCFG (Stochastic Context-Free Grammar) M = ⟨V, Σ, R, S⟩ (a minimal representation sketch follows below):
  - V: finite set of non-terminals
  - Σ: finite set of terminals
  - R: finite set of rules; each rule r has a probability p(r), and the p(r) of rules sharing the same left-hand side sum to 1
  - S: start symbol
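As a concrete illustration, here is a minimal Python sketch of one way to represent such an SCFG; the `Rule` and `SCFG` names and the toy grammar are illustrative assumptions, not from the presentation.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    lhs: str      # a non-terminal from V
    rhs: tuple    # a sequence of terminals and/or non-terminals

class SCFG:
    """A stochastic CFG: rules R, each with a probability p(r)."""
    def __init__(self, start):
        self.start = start   # S, the start symbol
        self.prob = {}       # maps each Rule r to p(r)

    def add_rule(self, lhs, rhs, p):
        self.prob[Rule(lhs, tuple(rhs))] = p

    def is_normalized(self, tol=1e-9):
        # p(r) of rules sharing a left-hand side must sum to 1.
        totals = defaultdict(float)
        for rule, p in self.prob.items():
            totals[rule.lhs] += p
        return all(abs(t - 1.0) < tol for t in totals.values())

# Toy grammar over V = {S, A, B}, Sigma = {a, b}.
g = SCFG("S")
g.add_rule("S", ["A", "B"], 1.0)
g.add_rule("A", ["a"], 1.0)
g.add_rule("B", ["b"], 1.0)
assert g.is_normalized()
```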

Problems with the standard approach:

1) Expensive storage: a corpus of complete sentences must be stored
2) Time-consuming: algorithms need to make repeated passes over all of the data

- Inducing context-free structure from a corpus (sentences)
- Learning the production (rule) probabilities

General method: the Inside/Outside algorithm, an instance of Expectation-Maximization (EM):

- E-step: find the expected counts of the rules
- M-step: maximize the likelihood given both the expectations and the corpus (see the re-estimation sketch below)
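As a hedged sketch of the M-step only (the full Inside/Outside E-step is omitted here), each rule's probability is re-estimated as its expected count divided by the total expected count of its left-hand side; the `m_step` name and the toy counts are assumptions for illustration.

```python
from collections import defaultdict

def m_step(expected_counts):
    """Re-estimate rule probabilities from expected rule counts.

    expected_counts: maps (lhs, rhs) -> expected usage count over the
    corpus, as the E-step (Inside/Outside) would produce.
    """
    lhs_totals = defaultdict(float)
    for (lhs, _), c in expected_counts.items():
        lhs_totals[lhs] += c
    # p(r) = E[count(r)] / sum of E[count(r')] over r' with the same lhs
    return {rule: c / lhs_totals[rule[0]]
            for rule, c in expected_counts.items()}

# Two S-rules with expected counts 3 and 1 -> probabilities 0.75 and 0.25.
probs = m_step({("S", ("A", "B")): 3.0, ("S", ("a",)): 1.0})
assert abs(probs[("S", ("A", "B"))] - 0.75) < 1e-9
```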

Disadvantages of the Inside/Outside algorithm:

- The entire sentence corpus must be stored under some representation (e.g., chart parses)
- Expensive storage (unrealistic for a human agent!)

- Use Unique Normal Form (UNF): no two productions have the same right-hand side
- Replace each terminal rule A → z with 2 new rules (see the sketch below):
  - A → D, with p[A → D] = p[A → z]
  - D → z, with p[D → z] = 1
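A minimal sketch of this UNF rewrite, assuming the (lhs, rhs) → probability rule encoding used above; treating lowercase symbols as terminals and the `D_<lhs>_<terminal>` naming for the fresh non-terminal are assumptions of mine, not the authors'.

```python
def to_unf(rules):
    """Replace each terminal rule A -> z with A -> D and D -> z."""
    out = {}
    for (lhs, rhs), p in rules.items():
        # Assumption for this sketch: a terminal rule has a single
        # lowercase symbol on its right-hand side.
        if len(rhs) == 1 and rhs[0].islower():
            d = f"D_{lhs}_{rhs[0]}"        # fresh non-terminal per rule
            out[(lhs, (d,))] = p           # A -> D, p[A->D] = p[A->z]
            out[(d, rhs)] = 1.0            # D -> z, p[D->z] = 1
        else:
            out[(lhs, rhs)] = p
    return out

unf = to_unf({("A", ("z",)): 0.4, ("A", ("B", "C")): 0.6})
assert unf[("A", ("D_A_z",))] == 0.4 and unf[("D_A_z", ("z",))] == 1.0
```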

- Use histograms: each rule r keeps 2 histograms, H_O^r and H_L^r (a sketch follows below)
  - H_O^r: constructed when parsing the sentences in O
  - H_L^r: continues to be updated throughout the learning process
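One simple reading of these per-rule histograms, sketched below, is that each tracks how often rules are used in parses; the `RuleHistograms` class and its method names are hypothetical, not from the presentation.

```python
from collections import Counter

class RuleHistograms:
    """H_O: built while parsing the sentences in O.
    H_L: kept up to date throughout the learning process."""
    def __init__(self):
        self.H_O = Counter()
        self.H_L = Counter()

    def record_corpus_parse(self, rules_used):
        self.H_O.update(rules_used)

    def record_learning_parse(self, rules_used):
        self.H_L.update(rules_used)

h = RuleHistograms()
h.record_corpus_parse([("S", ("A", "B")), ("A", ("a",))])
h.record_learning_parse([("S", ("A", "B"))])
```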

- Why?
- Recently used rules have more impact on the histogram

- Relative entropy T between the histograms drives the update (see the sketch below):
  - If T decreases: increase the probabilities of the rules used (if s is large, increase the probabilities of the rules used when parsing the last sentence)
  - If T increases: decrease the probabilities of the rules used (e.g., p_{t+1}(r) = 0.01 · p_t(r))
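A hedged sketch of this update, under two assumptions of mine: that T is the KL divergence (relative entropy) between the normalized histograms, and that a fixed boost factor mirrors the 0.01 penalty from the example above. All names, thresholds, and factors other than 0.01 are hypothetical.

```python
import math
from collections import defaultdict

def relative_entropy(h_o, h_l, eps=1e-12):
    """KL divergence D(H_O || H_L) between normalized histograms."""
    zo = sum(h_o.values()) or 1.0
    zl = sum(h_l.values()) or 1.0
    return sum((c / zo) * math.log((c / zo) / (h_l.get(k, 0) / zl + eps))
               for k, c in h_o.items() if c > 0)

def update_probs(probs, rules_used, t_prev, t_curr, boost=1.1, penalty=0.01):
    """Scale the rules used in the last parse up if T decreased,
    down if T increased (e.g., p_{t+1}(r) = 0.01 * p_t(r))."""
    factor = boost if t_curr < t_prev else penalty
    for r in set(rules_used):
        probs[r] *= factor
    # Renormalize so rules sharing a left-hand side sum to 1 again.
    totals = defaultdict(float)
    for (lhs, _), p in probs.items():
        totals[lhs] += p
    for lhs, rhs in list(probs):
        probs[(lhs, rhs)] /= totals[lhs]
    return probs
```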

Comparison:

Inside/Outside:
- Complexity: O(n³)
- Good: 3-5 iterations
- Bad: needs to store the complete sentence corpus

Proposed algorithm:
- Complexity: O(n³)
- Bad: 500-1000 iterations
- Good: memory requirement is constant!