Week 7 Video 1
Presentation Transcript
Week 7 Video 1

Clustering

Clustering
  • A type of Structure Discovery algorithm
Clustering
  • You have a large number of data points
  • You want to find what structure there is among the data points
  • You don’t know anything a priori about the structure
  • Clustering tries to find data points that “group together”
Trivial Example
  • Let’s say your data has two variables
    • Probability the student knows the skill from BKT (Pknow)
    • Unitized Time
  • Note: clustering works for (and is effective in) large feature spaces
[Figure: scatterplot of the example data, pknow (0 to 1) against unitized time (-3 to +3)]

Not the only clustering algorithm
  • Just the simplest
  • We’ll discuss fancier ones as the week goes on
How did we get these clusters?
  • First, we decided how many clusters we wanted: 5
    • How did we do that? More on this in the next lecture
  • We picked starting values for the “centroids” of the clusters…
    • Usually chosen randomly
    • Sometimes there are good reasons to start with specific initial values…
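The setup above (choose k, then pick random starting centroids) can be sketched in Python; the pknow/time data here is hypothetical filler, and NumPy is assumed throughout:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one row per observation, columns = (pknow, unitized time)
points = np.column_stack([rng.random(50),             # pknow in [0, 1]
                          rng.uniform(-3, 3, 50)])    # unitized time in [-3, +3]

k = 5  # number of clusters, decided in advance

# A common choice of starting values: k distinct data points picked at random
centroids = points[rng.choice(len(points), size=k, replace=False)]
```

Starting from actual data points (rather than arbitrary coordinates) guarantees every centroid begins somewhere the data actually lives.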
[Figure: the same scatterplot with five randomly placed starting centroids]

Then…
  • We classify every point as to which centroid it’s closest to
    • This defines the clusters
    • Typically visualized as a Voronoi diagram
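The assignment step can be sketched as follows (hypothetical data; each point gets the label of its nearest centroid, which is exactly what carves the plane into Voronoi cells):

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.random((50, 2))     # hypothetical (pknow, time) data
centroids = points[:5].copy()    # five centroids, e.g. from the initialization step

# Distance from every point to every centroid, then take the nearest
dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
labels = dists.argmin(axis=1)    # labels[i] = index of the centroid closest to point i
```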
[Figure: each point assigned to its nearest centroid, shown as a Voronoi diagram]

Then…
  • We re-fit the centroids as the center of the points in each cluster
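The re-fitting step, sketched with a placeholder assignment (in practice `labels` would come from the previous assignment step):

```python
import numpy as np

rng = np.random.default_rng(2)
points = rng.random((50, 2))    # hypothetical (pknow, time) data
labels = np.arange(50) % 5      # placeholder assignment: 10 points per cluster

# Re-fit each centroid as the mean of the points currently assigned to it
# (assumes every cluster has at least one point; an empty cluster needs special handling)
centroids = np.array([points[labels == j].mean(axis=0) for j in range(5)])
```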
[Figure: centroids re-fit to the centers of their clusters]

Then…
  • Repeat the process until the centroids stop moving
  • “Convergence”
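Putting the steps together, a minimal k-means loop might look like this (a sketch under the same hypothetical-data assumptions, not a production implementation; an empty cluster simply keeps its old centroid):

```python
import numpy as np

def kmeans(points, k, rng, max_iter=100):
    """Repeat assign/re-fit until the centroids stop moving (convergence)."""
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j]      # empty cluster: leave its centroid alone
                        for j in range(k)])
        if np.allclose(new, centroids):        # convergence: centroids stopped moving
            return new, labels
        centroids = new
    return centroids, labels

rng = np.random.default_rng(0)
points = rng.random((100, 2))    # hypothetical (pknow, time) data
centroids, labels = kmeans(points, 5, rng)
```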
[Figures: successive iterations of assigning points and re-fitting centroids, until convergence]

Not very good clusters

[Figure: the converged clusters on the pknow vs. time scatterplot]

What happens?
  • What happens if your starting points are in strange places?
  • Not trivial to avoid, considering the full span of possible data distributions
One Solution
  • Run the algorithm several times, with different starting points
  • cf. Conati & Amershi (2009)
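One way to sketch the multiple-restart idea: run k-means from several different random starts and keep the run with the lowest total squared distance to the nearest centroid (often called inertia). Hypothetical data as before:

```python
import numpy as np

def kmeans(points, k, rng, max_iter=100):
    """One k-means run; returns centroids, labels, and total squared distance."""
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
    inertia = (dists.min(axis=1) ** 2).sum()
    return centroids, dists.argmin(axis=1), inertia

rng = np.random.default_rng(0)
points = rng.random((100, 2))    # hypothetical (pknow, time) data

# Several runs from different starting points; keep the best-fitting one
best = min((kmeans(points, 5, rng) for _ in range(10)), key=lambda r: r[2])
centroids, labels, inertia = best
```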
Exercises
  • Take the following examples and execute k-means for them by hand
  • (The slides will be available in the course materials so you can work through them)
  • Focus on getting the concept rather than the exact right answer
  • (Solutions were worked by hand rather than with code, and are not guaranteed to be perfect)
Exercise 7-1-1

[Figure: Exercise 7-1-1 data on the pknow (0 to 1) vs. time (-3 to +3) scatterplot]

Pause Here with In-Video Quiz
  • Do this yourself if you want to
  • Only quiz option: go ahead
Solution Step 1

[Figure: k-means step 1 for Exercise 7-1-1]

Solution Step 2

[Figure: k-means step 2 for Exercise 7-1-1]

Solution Step 3

[Figure: k-means step 3 for Exercise 7-1-1]

Solution Step 4

[Figure: k-means step 4 for Exercise 7-1-1]

Solution Step 5

[Figure: k-means step 5 for Exercise 7-1-1]

Notes
  • K-means did pretty reasonably here
Exercise 7-1-2

[Figure: Exercise 7-1-2 data on the pknow (0 to 1) vs. time (-3 to +3) scatterplot]
Pause Here with In-Video Quiz
  • Do this yourself if you want to
  • Only quiz option: go ahead
Solution Step 1

[Figure: k-means step 1 for Exercise 7-1-2]

Solution Step 2

[Figure: k-means step 2 for Exercise 7-1-2]

Solution Step 3

[Figure: k-means step 3 for Exercise 7-1-2]

Solution Step 4

[Figure: k-means step 4 for Exercise 7-1-2]

Solution Step 5

[Figure: k-means step 5 for Exercise 7-1-2]

Notes
  • The three clusters in the same data lump might move around for a little while
  • But really, what we have here is one cluster and two outliers…
  • k should be 3 rather than 5
    • See next lecture to learn more
Exercise 7-1-3

[Figure: Exercise 7-1-3 data on the pknow (0 to 1) vs. time (-3 to +3) scatterplot]
Pause Here with In-Video Quiz
  • Do this yourself if you want to
  • Only quiz option: go ahead
Solution

[Figure: converged clustering for Exercise 7-1-3, with an empty cluster at bottom right]

Notes
  • The bottom-right cluster is actually empty!
  • There was never a point where that centroid was actually closest to any point
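The empty-cluster situation can be reproduced in a toy sketch: if a starting centroid is far from all the data, no point is ever closest to it, and its cluster stays empty.

```python
import numpy as np

# All data sits near the origin; the second starting centroid is far away
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
centroids = np.array([[0.05, 0.05], [10.0, 10.0]])

dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
labels = dists.argmin(axis=1)

counts = np.bincount(labels, minlength=len(centroids))
print(counts)  # → [4 0]: the far-off centroid's cluster is empty
```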
Exercise 7-1-4

[Figure: Exercise 7-1-4 data on the pknow (0 to 1) vs. time (-3 to +3) scatterplot]

Pause Here with In-Video Quiz
  • Do this yourself if you want to
  • Only quiz option: go ahead
Solution Step 1

[Figure: k-means step 1 for Exercise 7-1-4]

Solution Step 2

[Figure: k-means step 2 for Exercise 7-1-4]

Solution Step 3

[Figure: k-means step 3 for Exercise 7-1-4]

Solution Step 4

[Figure: k-means step 4 for Exercise 7-1-4]

Solution Step 5

[Figure: k-means step 5 for Exercise 7-1-4]

Solution Step 6

[Figure: k-means step 6 for Exercise 7-1-4]

Solution Step 7

[Figure: k-means step 7 for Exercise 7-1-4]

Approximate Solution

[Figure: approximate converged clustering for Exercise 7-1-4]

Notes
  • Kind of a weird outcome
  • Due to unlucky initial positioning
    • One data lump at left became three clusters
    • Two clearly distinct data lumps at right became one cluster
Exercise 7-1-5

[Figure: Exercise 7-1-5 data on the pknow (0 to 1) vs. time (-3 to +3) scatterplot]

Pause Here with In-Video Quiz
  • Do this yourself if you want to
  • Only quiz option: go ahead
Exercise 7-1-5

[Figure: converged clustering for Exercise 7-1-5]

Notes
  • That actually kind of came out ok…
As you can see
  • A lot depends on initial positioning
  • And on the number of clusters
  • How do you pick which final position and number of clusters to go with?
Next lecture
  • Clustering – Validation and Selection of k