ADVENTURES IN DATA MINING

ADVENTURES IN DATA MINING. Margaret H. Dunham Southern Methodist University Dallas, Texas 75275 [email protected] This material is based in part upon work supported by the National Science Foundation under Grant No. 9820841

slide1

ADVENTURES IN DATA MINING

Margaret H. Dunham

Southern Methodist University

Dallas, Texas 75275

[email protected]

This material is based in part upon work supported by the National Science Foundation under Grant No. 9820841

Some slides used by permission from Dr. Eamonn Keogh, University of California, Riverside; [email protected]

slide2

The 2000 ozone hole over the Antarctic, as seen by EPTOMS

http://jwocky.gsfc.nasa.gov/multi/multi.html#hole

Data Mining Outline
  • Introduction
  • Techniques
    • Classification
    • Clustering
    • Association Rules
  • Examples

Explore some interesting data mining applications

Introduction
  • Data is growing at a phenomenal rate
  • Users expect more sophisticated information
  • How?

UNCOVER HIDDEN INFORMATION

DATA MINING

But it isn’t Magic
  • You must know what you are looking for
  • You must know how to look for it

Suppose you knew that a specific cave had gold:

  • What would you look for?
  • How would you look for it?
  • Might need an expert miner
slide6

Description

Behavior

Associations

“If it looks like a duck, walks like a duck, and quacks like a duck, then it’s a duck.”

“If it looks like a terrorist, walks like a terrorist, and quacks like a terrorist, then it’s a terrorist.”

Classification Clustering Link Analysis

(Profiling) (Similarity)

CLASSIFICATION

Assign data into predefined groups or classes.

Classification Ex: Grading

Decision tree on the score x:

  • x >= 90: A
  • 80 <= x < 90: B
  • 70 <= x < 80: C
  • 60 <= x < 70: D
  • x < 60: F
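The grading tree can be read as a chain of threshold tests on the score x. The following is a minimal sketch, not the presenter's code; the function name `grade` is made up for illustration, and the last cutoff is assumed to be 60 because the slide labels the final split with both 50 and 60.

```python
def grade(score: float) -> str:
    """Walk the grading decision tree from the root down to a leaf."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:  # assumed cutoff; the slide shows both 50 and 60 here
        return "D"
    return "F"


if __name__ == "__main__":
    for s in (95, 85, 72, 61, 40):
        print(s, grade(s))
```

Each test corresponds to one internal node of the tree, and each return statement to a leaf, so the classes are predefined and every score lands in exactly one of them.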
slide10

Katydids

Given a collection of annotated data (in this case, five instances of Katydids and five of Grasshoppers), decide what type of insect the unlabeled example is.

Grasshoppers

(c) Eamonn Keogh, [email protected]

slide11

The classification problem can now be expressed as:

  • Given a training database, predict the class label of a previously unseen instance

previously unseen instance =

(c) Eamonn Keogh, [email protected]

slide12

[Figure: scatter plot of Antenna Length and Abdomen Length, both axes running from 1 to 10, with Katydids and Grasshoppers as the two classes]

(c) Eamonn Keogh, [email protected]
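One simple way to predict the label of a previously unseen instance from a training database is to copy the label of the closest training example (nearest neighbor). The slides do not say which classifier is used, so this is only an illustrative sketch; the measurements below are hypothetical (antenna length, abdomen length) pairs, not values read from the plot.

```python
import math

# Hypothetical (antenna length, abdomen length) pairs; invented for illustration.
TRAINING = [
    ((8.0, 2.5), "Katydid"),
    ((9.0, 4.0), "Katydid"),
    ((7.5, 3.0), "Katydid"),
    ((2.0, 6.0), "Grasshopper"),
    ((3.0, 8.0), "Grasshopper"),
    ((2.5, 7.0), "Grasshopper"),
]


def classify(instance, training=TRAINING):
    """Return the label of the training example closest to `instance`."""
    nearest = min(training, key=lambda example: math.dist(example[0], instance))
    return nearest[1]


if __name__ == "__main__":
    print(classify((8.5, 3.0)))
    print(classify((2.0, 7.5)))
```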

slide14

Handwriting Recognition

[Figure: George Washington Manuscript, shown with a plot whose vertical axis runs from 0 to 1 and horizontal axis from 0 to 450]

(c) Eamonn Keogh, [email protected]

slide17

Dallas Morning News

October 7, 2005

CLUSTERING

Partition data into previously undefined groups.

Two Types of Clustering

Partitional

Hierarchical

(c) Eamonn Keogh, [email protected]
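A hierarchical method builds nested groups by repeated merging, and cutting the hierarchy at a chosen level yields a flat partition. Below is a minimal sketch of agglomerative clustering with single linkage on one-dimensional points; the function name, the linkage choice, and the sample values are assumptions for illustration and are not taken from the slides.

```python
def single_linkage(points, num_clusters):
    """Start with singleton clusters and merge the closest pair until
    only `num_clusters` remain."""
    clusters = [[p] for p in points]

    def gap(a, b):
        # Single linkage: distance between the two closest members.
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > num_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: gap(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    data = [1.0, 1.2, 5.0, 5.3, 9.0]  # hypothetical values
    print(single_linkage(data, 3))
    print(single_linkage(data, 2))
```

Stopping at three clusters and at two clusters shows two levels of the same hierarchy; no groups are defined in advance.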

Hierarchical Clustering Example: Iris Data Set

Versicolor

Setosa

Virginica

The data originally appeared in Fisher, R. A. (1936). "The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics 7, 179-188.

Hierarchical Clustering Explorer Version 3.0, Human-Computer Interaction Lab, University of Maryland, http://www.cs.umd.edu/hcil/multi-cluster

ASSOCIATION RULES / LINK ANALYSIS

Find relationships between data

ASSOCIATION RULES EXAMPLES

People who buy diapers also buy beer

If gene A is highly expressed in this disease then gene B is also expressed

Relationships between people

Book Stores

Department Stores

Advertising

Product Placement

http://www.amazon.com/Data-Mining-Introductory-Advanced-Topics/dp/0130888923/ref=sr_1_1?ie=UTF8&s=books&qid=1235564485&sr=1-1
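A rule such as "people who buy diapers also buy beer" is usually scored with support and confidence. These are standard association-rule measures that the slide itself does not spell out. The sketch below computes both; the baskets are hypothetical and the function names are chosen for illustration.

```python
# Hypothetical market baskets, invented for illustration.
BASKETS = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, baskets=BASKETS):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets=BASKETS):
    """Among baskets containing the antecedent, the fraction that also
    contain the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


if __name__ == "__main__":
    print(support({"diapers", "beer"}))
    print(confidence({"diapers"}, {"beer"}))
```

With these made-up baskets, two of five contain both items, and two of the three diaper baskets also contain beer.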

slide25

Data Mining Introductory and Advanced Topics, by Margaret H. Dunham, Prentice Hall, 2003.

DILBERT reprinted by permission of United Feature Syndicate, Inc.

Data Mining Outline
  • Introduction
  • Techniques
  • Examples
    • Vision Mining
    • Law Enforcement (Cheating, Plagiarism, Fraud, Criminal Behavior,…)
    • Bioinformatics
Vision Mining
  • License Plate Recognition
    • Red Light Cameras
    • Toll Booths
    • http://www.licenseplaterecognition.com/
  • Computer Vision
    • http://www.eecs.berkeley.edu/Research/Projects/CS/vision/shape/vid/
slide28

How Stuff Works, “Facial Recognition,” http://computer.howstuffworks.com/facial-recognition1.htm

slide29

Joshua Benton and Holly K. Hacker, “At Charters, Cheating’s off the Charts,” Dallas Morning News, June 4, 2007.

No/Little Cheating

Joshua Benton and Holly K. Hacker, “At Charters, Cheating’s off the Charts,” Dallas Morning News, June 4, 2007.

Rampant Cheating

Joshua Benton and Holly K. Hacker, “At Charters, Cheating’s off the Charts,” Dallas Morning News, June 4, 2007.

slide32

Jialun Qin, Jennifer J. Xu, Daning Hu, Marc Sageman and Hsinchun Chen, “Analyzing Terrorist Networks: A Case Study of the Global Salafi Jihad Network,” Lecture Notes in Computer Science, Springer-Verlag GmbH, Volume 3495, 2005, p. 287.

slide33

http://www.time.com/time/magazine/article/0,9171,1541283,00.html

slide34
DNA

http://www.visionlearning.com/library/module_viewer.php?mid=63

Basic building blocks of organisms

Located in nucleus of cells

Composed of 4 nucleotides

Two strands bound together

Central Dogma: DNA -> RNA -> Protein

  • DNA: CCTGAGCCAACTATTGATGAA
  • Transcription (DNA -> RNA)
  • RNA: CCUGAGCCAACUAUUGAUGAA
  • Translation (RNA -> Protein)
  • Protein: a sequence of amino acids

www.bioalgorithms.info; chapter 6; Gene Prediction
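The two steps of the central dogma can be sketched on the slide's own sequence. This is a simplified illustration: transcription is modeled as replacing T with U in the given strand, and the codon table holds only the seven codons this sequence needs, taken from the standard genetic code (the full table has 64 entries). The function names are chosen for illustration.

```python
# Subset of the standard genetic code: only the codons in the slide's sequence.
CODON_TABLE = {
    "CCU": "P", "CCA": "P",  # proline
    "GAG": "E", "GAA": "E",  # glutamic acid
    "ACU": "T",              # threonine
    "AUU": "I",              # isoleucine
    "GAU": "D",              # aspartic acid
}


def transcribe(dna: str) -> str:
    """Write the given DNA strand as RNA by replacing T with U."""
    return dna.replace("T", "U")


def translate(rna: str) -> str:
    """Read the RNA three bases at a time and map each codon to the
    one-letter code of its amino acid."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


if __name__ == "__main__":
    rna = transcribe("CCTGAGCCAACTATTGATGAA")
    print(rna)             # CCUGAGCCAACUAUUGAUGAA
    print(translate(rna))  # PEPTIDE
```

In one-letter amino acid codes, the slide's sequence spells PEPTIDE.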

Human Genome

Scientists originally thought there would be about 100,000 genes

Appear to be about 20,000

WHY?

Almost identical to that of Chimps. What makes the difference?

Answers appear to lie in the noncoding regions of the DNA (formerly thought to be junk)

RNAi – Nobel Prize in Medicine 2006

siRNA may be artificially added to cell!

Double stranded RNA

Short Interfering RNA (~20-25 nt)

RNA-Induced Silencing Complex

Binds to mRNA

Cuts RNA

Image source: http://nobelprize.org/nobel_prizes/medicine/laureates/2006/adv.html, Advanced Information, Image 3

miRNA
  • Short (20-25nt) sequence of noncoding RNA
  • Known since 1993 but significance not widely appreciated until 2001
  • Impact / Prevent translation of mRNA
  • Generally reduce protein levels without impacting mRNA levels (animal cells)
  • Functions
    • Cause some cancers
    • Guide embryo development
    • Regulate cell differentiation
    • Associated with HIV
TCGR – Mature miRNA (Window=5; Pattern=3)

[Figure: TCGR panels for C. elegans, Homo sapiens, Mus musculus, and All Mature; labeled patterns ACG, CGC, GCG, UCG]
TCGRs for Xue Training Data

C. Xue, F. Li, T. He, G. Liu, Y. Li, and X. Zhang, “Classification of Real and Pseudo MicroRNA Precursors using Local Structure-Sequence Features and Support Vector Machine,” BMC Bioinformatics, vol. 6, no. 310.

slide41

Affymetrix GeneChip® Array

http://www.affymetrix.com/corporate/outreach/lesson_plan/educator_resources.affx

Microarray Data Analysis
  • Each probe location associated with gene
  • Measure the amount of mRNA
  • Color indicates degree of gene expression
  • Compare different samples (normal/disease)
  • Track same sample over time
  • Questions
    • Which genes are related to this disease?
    • Which genes behave in a similar manner?
    • What is the function of a gene?
  • Clustering
    • Hierarchical
    • K-means
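To answer "which genes behave in a similar manner?", expression profiles can be grouped with k-means. The sketch below is a plain k-means in standard-library Python, not the analysis used in the cited work; the expression values are hypothetical, and seeding the centroids with the first k profiles is a simplification chosen to keep the example deterministic.

```python
import math


def kmeans(profiles, k, iterations=10):
    """Plain k-means; the first k profiles seed the centroids (a simplification)."""
    centroids = [list(p) for p in profiles[:k]]
    assignment = []
    for _ in range(iterations):
        # Assign each profile to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in profiles
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


if __name__ == "__main__":
    # Hypothetical expression levels of four genes at three time points.
    genes = {
        "g1": (1.0, 2.0, 3.0),
        "g2": (3.0, 2.0, 1.0),
        "g3": (1.1, 2.1, 2.9),
        "g4": (2.9, 1.9, 1.2),
    }
    labels = kmeans(list(genes.values()), k=2)
    print(dict(zip(genes, labels)))
```

Here g1 and g3 rise over time while g2 and g4 fall, so the two pairs end up in separate clusters.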
Microarray Data - Clustering

"Gene expression profiling identifies clinically relevant subtypes of prostate cancer"

Proc. Natl. Acad. Sci. USA, Vol. 101, Issue 3, 811-816, January 20, 2004

BIG BROTHER ?
  • Total Information Awareness
    • http://infowar.net/tia/www.darpa.mil/iao/index.htm
    • http://www.govtech.net/magazine/story.php?id=45918
    • http://en.wikipedia.org/wiki/Information_Awareness_Office
  • Terror Watch List
    • http://www.businessweek.com/technology/content/may2005/tc20050511_8047_tc_210.htm
    • http://www.theregister.co.uk/2004/08/19/senator_on_terror_watch/
    • http://blog.wired.com/27bstroke6/2008/02/us-terror-watch.html
  • CAPPS
    • http://www.theregister.co.uk/2004/04/26/airport_security_failures/
    • http://www.heritage.org/Research/HomelandDefense/BG1683.cfm
    • http://www.theregister.co.uk/2004/07/16/homeland_capps_scrapped/
    • http://en.wikipedia.org/wiki/CAPPS
slide45

http://ieeexplore.ieee.org/iel5/6/32236/01502526.pdf?tp=&arnumber=1502526&isnumber=32236
