
CMSC 671 Fall 2001

Class #25-26 – Tuesday, November 27 / Thursday, November 29

Today’s class
  • Neural networks
  • Bayesian learning

Machine Learning: Neural and Bayesian

Chapter 19

Some material adapted from lecture notes by Lise Getoor and Ron Parr

Neural function
  • Brain function (thought) occurs as the result of the firing of neurons
  • Neurons connect to each other through synapses, which propagate action potentials (electrical impulses) by releasing neurotransmitters
  • Synapses can be excitatory (potential-increasing) or inhibitory (potential-decreasing), and have varying activation thresholds
  • Learning occurs as a result of the synapses’ plasticity: They exhibit long-term changes in connection strength
  • There are about 10¹¹ neurons and about 10¹⁴ synapses in the human brain
Brain structure
  • Different areas of the brain have different functions
    • Some areas seem to have the same function in all humans (e.g., Broca’s region); the overall layout is generally consistent
    • Some areas are more plastic, and vary in their function; also, the lower-level structure and function vary greatly
  • We don’t know how different functions are “assigned” or acquired
    • Partly the result of the physical layout / connection to inputs (sensors) and outputs (effectors)
    • Partly the result of experience (learning)
  • We really don’t understand how this neural structure leads to what we perceive as “consciousness” or “thought”
  • Our neural networks are not nearly as complex or intricate as the actual brain structure
Comparison of computing power
  • Computers are way faster than neurons…
  • But there are a lot more neurons than we can reasonably model in modern digital computers, and they all fire in parallel
  • Neural networks are designed to be massively parallel
  • Taken as a whole, then, the brain is effectively a billion times faster
Neural networks
  • Neural networks are made up of nodes or units, connected by links
  • Each link has an associated weight and activation level
  • Each node has an input function (typically summing over weighted inputs), an activation function, and an output
Layered feed-forward network

[Figure: a layered feed-forward network, with input units feeding hidden units, which in turn feed the output units]

“Executing” neural networks
  • Input units are set by some exterior function (think of these as sensors), which causes their output links to be activated at the specified level
  • Working forward through the network, the input function of each unit is applied to compute the input value
    • Usually this is just the weighted sum of the activation on the links feeding into this node
  • The activation function transforms this input function into a final value
    • Typically this is a nonlinear function, often a sigmoid function corresponding to the “threshold” of that node (see the sketch below)
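A minimal Python sketch of this forward pass for a single-hidden-layer network (the function names, weight values, and layer sizes below are invented for illustration, and bias weights are omitted for brevity):

```python
import math

def sigmoid(x):
    """Sigmoid activation: squashes any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    """One forward pass through a single-hidden-layer feed-forward network.

    hidden_weights[j][i] is the weight on the link from input i to hidden unit j;
    output_weights[k][j] is the weight on the link from hidden unit j to output k.
    Each unit's input function is the weighted sum of its incoming activations,
    and its activation function is the sigmoid.
    """
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in hidden_weights]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)))
            for row in output_weights]

# Example: 2 input units, 2 hidden units, 1 output unit (arbitrary weights)
print(forward([1.0, 0.0],
              [[0.5, -0.3], [0.8, 0.2]],
              [[1.0, -1.0]]))
```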
Learning neural networks
  • Backpropagation (see the sketch below)
  • Cascade correlation: adding hidden units
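Backpropagation itself is covered in the presentation that follows; as a rough illustration of the weight update at its core, here is a sketch of gradient descent for a single sigmoid unit (the function name, learning rate, and toy data are invented). The full algorithm applies the same error-times-gradient update at every unit, propagating each unit's share of the error backward along its incoming links.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_single_unit(examples, n_inputs, rate=0.5, epochs=1000):
    """Gradient-descent weight updates for one sigmoid unit
    (the same update rule backpropagation applies unit by unit)."""
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, target in examples:
            out = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
            # Error scaled by the sigmoid's derivative at the current output
            delta = (target - out) * out * (1.0 - out)
            weights = [w + rate * delta * x for w, x in zip(weights, inputs)]
    return weights

# Toy example: learn logical OR; the constant 1.0 input acts as a bias term
data = [([0, 0, 1.0], 0), ([0, 1, 1.0], 1), ([1, 0, 1.0], 1), ([1, 1, 1.0], 1)]
print(train_single_unit(data, 3))
```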

Take it away, Chih-Yun!

Next up: Sohel

Learning Bayesian networks
  • Given training set D
  • Find B that best matches D
    • model selection
    • parameter estimation

[Figure: an Inducer takes the data D as input and outputs a Bayesian network over the variables B, E, A, C]
Parameter estimation
  • Assume known structure
  • Goal: estimate BN parameters θ
    • entries in local probability models, P(X | Parents(X))
  • A parameterization θ is good if it is likely to generate the observed data:
    • L(θ : D) = P(D | θ) = ∏_m P(x[m] | θ)   (i.i.d. samples)
  • Maximum Likelihood Estimation (MLE) Principle: Choose θ* so as to maximize L(θ : D)
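As a concrete, purely illustrative reading of that formula, the sketch below computes log L(θ : D): a sum over the i.i.d. samples of log P(x[m] | θ), with each term factored into the network's local probability models. The data layout, CPT representation, and names are assumptions of the sketch, not part of the lecture.

```python
import math

def log_likelihood(data, structure, cpts):
    """log L(theta : D) = sum over i.i.d. samples of log P(x[m] | theta),
    where each P(x | theta) is a product of local terms P(X | Parents(X)).

    structure: dict mapping each node to the list of its parents
    cpts:      dict mapping each node to {(value, parent_values): probability}
    data:      list of dicts mapping node -> observed value
    """
    total = 0.0
    for record in data:
        for node, parents in structure.items():
            u = tuple(record[p] for p in parents)
            total += math.log(cpts[node][(record[node], u)])
    return total
```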

parameter estimation in bns
Parameter estimation in BNs
  • The likelihood decomposes according to the structure of the network

→ we get a separate estimation task for each parameter

  • The MLE (maximum likelihood estimate) solution: θ*_{x|u} = N(x, u) / N(u)
    • for each value x of a node X
    • and each instantiation u of Parents(X)
    • Just need to collect the counts N(x, u) and N(u) for every combination of parents and children observed in the data (these counts are the sufficient statistics)
    • MLE is equivalent to an assumption of a uniform prior over parameter values
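A small sketch of that counting (the function name, data layout, and toy records are invented; a practical version would also smooth zero counts rather than divide by raw totals):

```python
from collections import Counter

def mle_cpt(data, child, parents):
    """Maximum-likelihood CPT for one node from fully observed data:
    returns {(x, u): N(x, u) / N(u)} for each value x and parent instantiation u."""
    joint, marginal = Counter(), Counter()   # N(x, u) and N(u)
    for record in data:
        u = tuple(record[p] for p in parents)
        joint[(record[child], u)] += 1
        marginal[u] += 1
    return {(x, u): n / marginal[u] for (x, u), n in joint.items()}

# Toy data for P(Alarm | Burglary, Earthquake)
data = [
    {"Burglary": True,  "Earthquake": False, "Alarm": True},
    {"Burglary": False, "Earthquake": False, "Alarm": False},
    {"Burglary": False, "Earthquake": True,  "Alarm": True},
    {"Burglary": False, "Earthquake": False, "Alarm": False},
]
print(mle_cpt(data, "Alarm", ["Burglary", "Earthquake"]))
```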


Sufficient statistics: Example
  • Why are the counts sufficient?
  • The likelihood depends on the data only through the counts N(x, u), so the counts summarize everything the data says about the parameters

[Figure: example Bayesian network over Moon-phase, Light-level, Earthquake, Burglary, and Alarm]

Model selection

Goal: Select the best network structure, given the data

Input:

  • Training data
  • Scoring function

Output:

  • A network that maximizes the score
Structure selection: Scoring
  • Bayesian: prior over parameters and structure
    • get balance between model complexity and fit to data as a byproduct
  • Score(G : D) = log P(G | D) ∝ log [P(D | G) P(G)]
    • P(D | G) is the marginal likelihood; P(G) is the prior on the structure
  • Marginal likelihood just comes from our parameter estimates
  • Prior on structure can be any measure we want; typically a function of the network complexity
  • Same key property: Decomposability
    • Score(structure) = Σ_i Score(family of X_i)
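To make the decomposability point concrete, here is a sketch that scores a structure as a sum of per-family terms. It substitutes the maximum-likelihood log-likelihood for the Bayesian score on the slide, and the names and data layout are invented; the design point is only that the total score is a sum over families, so each family can be scored independently.

```python
import math
from collections import Counter

def family_score(data, child, parents):
    """Log-likelihood contribution of one family (a node and its parents),
    evaluated at the maximum-likelihood parameters N(x, u) / N(u)."""
    joint, marginal = Counter(), Counter()
    for record in data:
        u = tuple(record[p] for p in parents)
        joint[(record[child], u)] += 1
        marginal[u] += 1
    return sum(n * math.log(n / marginal[u]) for (x, u), n in joint.items())

def structure_score(data, structure):
    """Decomposability: Score(structure) = sum_i Score(family of X_i).

    structure maps each node to the list of its parents. A pure likelihood
    score always favors adding edges; a practical score (e.g., BIC or the
    Bayesian score) subtracts a per-family complexity penalty.
    """
    return sum(family_score(data, node, parents)
               for node, parents in structure.items())
```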

Heuristic search

[Figure: local search over structures. From the current network over B, E, A, C, each single-edge move (Add E→C, Delete E→A, Reverse E→A) produces a neighboring structure with an associated Δscore]
Exploiting decomposability
  • To recompute scores, only need to re-score the families that changed in the last move

[Figure: the same single-edge moves (Add E→C, Delete E→A, Reverse E→A); each Δscore involves only the families whose parent sets changed]
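A sketch of how the search can exploit this, restricted for brevity to "add edge" moves (the delete and reverse moves from the slide work the same way). It assumes a per-family scoring function like the family_score sketch above; the max_parents limit and helper names are my own additions. The cached family scores carry the decomposability payoff: each candidate move is evaluated by re-scoring only the one family it would change, and after a move only that family's cached score is updated.

```python
def creates_cycle(structure, parent, child):
    """Would adding the edge parent -> child create a directed cycle?
    True iff child is already an ancestor of parent."""
    stack, seen = [parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(structure[node])   # structure maps node -> its parents
    return False

def greedy_add_edges(data, nodes, family_score, max_parents=2):
    """Greedy hill-climbing over structures using only 'add edge' moves."""
    structure = {n: [] for n in nodes}                      # start from the empty graph
    cache = {n: family_score(data, n, []) for n in nodes}   # per-family scores
    while True:
        best = None
        for child in nodes:
            if len(structure[child]) >= max_parents:
                continue
            for parent in nodes:
                if parent == child or parent in structure[child]:
                    continue
                if creates_cycle(structure, parent, child):
                    continue
                # Only the child's family changes, so only it is re-scored
                new = family_score(data, child, structure[child] + [parent])
                delta = new - cache[child]
                if delta > 0 and (best is None or delta > best[0]):
                    best = (delta, parent, child, new)
        if best is None:
            return structure
        _, parent, child, new = best
        structure[child].append(parent)
        cache[child] = new                                   # update just that family
```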
Variations on a theme
  • Known structure, fully observable: only need to do parameter estimation
  • Unknown structure, fully observable: do heuristic search through structure space, then parameter estimation
  • Known structure, missing values: use expectation maximization (EM) to estimate parameters
  • Known structure, hidden variables: apply adaptive probabilistic network (APN) techniques
  • Unknown structure, hidden variables: too hard to solve!
Handling missing data
  • Suppose that in some cases, we observe earthquake, alarm, light-level, and moon-phase, but not burglary
  • Should we throw that data away??
  • Idea: Guess the missing values based on the other data

[Figure: Bayesian network over Moon-phase, Light-level, Earthquake, Burglary, and Alarm]

EM (expectation maximization)
  • Guess probabilities for nodes with missing values (e.g., based on other observations)
  • Compute the probability distribution over the missing values, given our guess
  • Update the probabilities based on the guessed values
  • Repeat until convergence
EM example
  • Suppose we have observed Earthquake and Alarm but not Burglary for an observation on November 27
  • We estimate the CPTs based on the rest of the data
  • We then estimate P(Burglary) for November 27 from those CPTs
  • Now we recompute the CPTs as if that estimated value had been observed
  • Repeat until convergence! (see the sketch below)

[Figure: Bayesian network fragment with Burglary and Earthquake as parents of Alarm]
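A minimal sketch of this loop for just the Burglary → Alarm fragment (the data layout, initial guesses, and names are invented; a fuller version would include Earthquake and all CPT entries). The E-step computes a distribution over the missing Burglary value given Alarm under the current parameters, and the M-step re-estimates P(B) and P(A | B) from the resulting expected counts.

```python
def em_burglary(records, iterations=20):
    """EM for a two-node fragment Burglary -> Alarm where Burglary is
    sometimes missing (None). Returns estimates of P(B=true) and
    P(A=true | B=b)."""
    p_b = 0.5                        # initial guess for P(Burglary = true)
    p_a = {True: 0.5, False: 0.5}    # initial guess for P(Alarm = true | B = b)

    for _ in range(iterations):
        n_b = {True: 0.0, False: 0.0}     # expected count of B = b
        n_ab = {True: 0.0, False: 0.0}    # expected count of (B = b, A = true)
        for b, a in records:
            if b is None:
                # E-step: P(B | A = a) under the current parameters
                like = {bv: (p_b if bv else 1 - p_b) *
                            (p_a[bv] if a else 1 - p_a[bv])
                        for bv in (True, False)}
                total = like[True] + like[False]
                weights = {bv: like[bv] / total for bv in (True, False)}
            else:
                weights = {True: float(b), False: float(not b)}
            for bv in (True, False):
                n_b[bv] += weights[bv]
                if a:
                    n_ab[bv] += weights[bv]
        # M-step: re-estimate the parameters from the expected counts
        p_b = n_b[True] / len(records)
        p_a = {bv: n_ab[bv] / n_b[bv] for bv in (True, False)}
    return p_b, p_a

# (burglary, alarm) pairs; None marks a missing Burglary value
data = [(True, True), (False, False), (None, True), (False, False), (None, False)]
print(em_burglary(data))
```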
