ADVENTURES IN DATA MINING

Margaret H. Dunham, Southern Methodist University, Dallas, Texas 75275, [email protected]

This material is based in part upon work supported by the National Science Foundation under Grant No. 9820841.



Some slides used by permission from Dr. Eamonn Keogh, University of California, Riverside; [email protected]


The 2000 ozone hole over the Antarctic, seen by EPTOMS

http://jwocky.gsfc.nasa.gov/multi/multi.html#hole


Data Mining Outline

  • Introduction

  • Techniques

    • Classification

    • Clustering

    • Association Rules

  • Examples

Explore some interesting data mining applications


Introduction

  • Data is growing at a phenomenal rate

  • Users expect more sophisticated information

  • How?

UNCOVER HIDDEN INFORMATION

DATA MINING


But it isn’t Magic

  • You must know what you are looking for

  • You must know how to look for it

Suppose you knew that a specific cave had gold:

  • What would you look for?

  • How would you look for it?

  • Might need an expert miner


Description

Behavior

Associations

“If it looks like a duck, walks like a duck, and quacks like a duck, then it’s a duck.”

“If it looks like a terrorist, walks like a terrorist, and quacks like a terrorist, then it’s a terrorist.”

Classification (Profiling), Clustering (Similarity), Link Analysis


CLASSIFICATION

Assign data into predefined groups or classes.


Classification Ex: Grading

Decision tree on the score x:

  • x >= 90: A

  • 80 <= x < 90: B

  • 70 <= x < 80: C

  • 60 <= x < 70: D

  • x < 60: F
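
The grading tree can be sketched as a chain of threshold tests, checked from the highest cutoff down. This is only an illustration: the function name `grade` and the sample scores are made up, and the D/F boundary is assumed to be 60, since the slide labels the two branches of its last test inconsistently ("<50" and ">=60").

```python
def grade(x):
    """Walk the decision tree: test the highest cutoff first, then fall through."""
    if x >= 90:
        return "A"
    if x >= 80:
        return "B"
    if x >= 70:
        return "C"
    if x >= 60:  # assumed cutoff; the slide shows both "<50" and ">=60" at this node
        return "D"
    return "F"


# Hypothetical scores, one per class
scores = [95, 85, 72, 61, 40]
grades = [grade(s) for s in scores]
print(grades)
```

Each score follows exactly one path from the root to a leaf, which is what makes this a classification into predefined classes.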


Katydids

Given a collection of annotated data (in this case, five instances of katydids and five of grasshoppers), decide what type of insect the unlabeled example is.

Grasshoppers

(c) Eamonn Keogh, [email protected]


The classification problem can now be expressed as:

  • Given a training database, predict the class label of a previously unseen instance.

[Figure: the previously unseen instance]

(c) Eamonn Keogh, [email protected]


[Figure: scatter plot of katydids and grasshoppers by Antenna Length and Abdomen Length, with both axes running from 1 to 10]

(c) Eamonn Keogh, [email protected]
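
One simple way to predict the label of an unseen insect from antenna and abdomen length is a nearest-neighbor rule. The slides do not say which classifier is intended, so this is a swapped-in technique for illustration only. The ten measurements below are invented, not read from the plot, and the names `TRAINING` and `classify` are hypothetical.

```python
import math

# Hypothetical training database: (antenna length, abdomen length, label).
# Five katydids and five grasshoppers, as in the slide; the values are made up.
TRAINING = [
    (7.0, 2.5, "katydid"), (8.5, 3.0, "katydid"), (9.0, 4.5, "katydid"),
    (6.5, 1.5, "katydid"), (8.0, 5.0, "katydid"),
    (2.0, 3.0, "grasshopper"), (3.0, 6.0, "grasshopper"),
    (1.5, 4.5, "grasshopper"), (4.0, 7.5, "grasshopper"),
    (2.5, 8.0, "grasshopper"),
]


def classify(antenna, abdomen, training=TRAINING):
    """Return the label of the closest training instance (Euclidean distance)."""
    nearest = min(training, key=lambda row: math.dist((antenna, abdomen), row[:2]))
    return nearest[2]


print(classify(7.5, 3.5))
print(classify(2.0, 5.0))
```

The rule needs no training step beyond storing the labeled examples, which keeps the sketch short; a decision tree or linear boundary would be equally valid choices here.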


Facial Recognition

(c) Eamonn Keogh, [email protected]


Handwriting Recognition

(c) Eamonn Keogh, [email protected]

George Washington Manuscript


Rare Event Detection


Dallas Morning News

October 7, 2005


CLUSTERING

Partition data into previously undefined groups.


http://149.170.199.144/multivar/ca.htm


What is Similarity?

(c) Eamonn Keogh, [email protected]


Two Types of Clustering

Partitional

Hierarchical

(c) Eamonn Keogh, [email protected]
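
As a rough sketch of the hierarchical approach, the code below starts with every point in its own cluster and repeatedly merges the two closest clusters. It assumes single-link distance (closest pair of points) and Euclidean geometry, neither of which the slide specifies. The points and the names `single_link` and `agglomerate` are invented for illustration.

```python
import math
from itertools import combinations


def single_link(a, b):
    """Distance between two clusters: the closest pair of points across them."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerate(points, k):
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    clusters = [[p] for p in points]  # every point starts in its own cluster
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters[j])  # i < j, so deleting j leaves i valid
        del clusters[j]
    return clusters


# Hypothetical 2-D points
points = [(1.0, 1.0), (1.5, 1.2), (1.2, 0.8), (8.0, 8.0), (8.3, 7.9), (4.5, 4.4)]
for cluster in agglomerate(points, 3):
    print(sorted(cluster))
```

Recording the order of the merges instead of stopping at `k` would give the full tree (dendrogram); a partitional method, by contrast, produces a single flat grouping.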


Hierarchical Clustering Example: Iris Data Set

Versicolor

Setosa

Virginica

The data originally appeared in Fisher, R. A. (1936). "The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics 7, 179-188.

Hierarchical Clustering Explorer Version 3.0, Human-Computer Interaction Lab, University of Maryland, http://www.cs.umd.edu/hcil/multi-cluster


ASSOCIATION RULES / LINK ANALYSIS

Find relationships between data


ASSOCIATION RULES EXAMPLES

People who buy diapers also buy beer

If gene A is highly expressed in this disease then gene B is also expressed

Relationships between people

Book Stores

Department Stores

Advertising

Product Placement

http://www.amazon.com/Data-Mining-Introductory-Advanced-Topics/dp/0130888923/ref=sr_1_1?ie=UTF8&s=books&qid=1235564485&sr=1-1
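
A rule such as "people who buy diapers also buy beer" is usually judged by its support and confidence. The sketch below computes both over a handful of invented baskets; the data and the names `BASKETS`, `count`, `support`, and `confidence` are assumptions made for illustration, not taken from the slides.

```python
# Hypothetical market baskets; items and counts are invented for illustration.
BASKETS = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "beer", "milk"},
]


def count(itemset, baskets=BASKETS):
    """Number of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets)


def support(itemset, baskets=BASKETS):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets=BASKETS):
    """Of the baskets with the antecedent, the fraction that also hold the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)


print(support({"diapers", "beer"}))
print(confidence({"diapers"}, {"beer"}))
```

Support says how often the items appear together at all; confidence says how reliably the left side of the rule predicts the right side.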


Data Mining Introductory and Advanced Topics, by Margaret H. Dunham, Prentice Hall, 2003.

DILBERT reprinted by permission of United Feature Syndicate, Inc.


Data Mining Outline

  • Introduction

  • Techniques

  • Examples

    • Vision Mining

    • Law Enforcement (Cheating, Plagiarism, Fraud, Criminal Behavior,…)

    • Bioinformatics


Vision Mining

  • License Plate Recognition

    • Red Light Cameras

    • Toll Booths

    • http://www.licenseplaterecognition.com/

  • Computer Vision

    • http://www.eecs.berkeley.edu/Research/Projects/CS/vision/shape/vid/


How Stuff Works, “Facial Recognition,” http://computer.howstuffworks.com/facial-recognition1.htm


Joshua Benton and Holly K. Hacker, “At Charters, Cheating’s off the Charts,” Dallas Morning News, June 4, 2007.


No/Little Cheating

Joshua Benton and Holly K. Hacker, “At Charters, Cheating’s off the Charts,” Dallas Morning News, June 4, 2007.


Rampant Cheating

Joshua Benton and Holly K. Hacker, “At Charters, Cheating’s off the Charts,” Dallas Morning News, June 4, 2007.


Jialun Qin, Jennifer J. Xu, Daning Hu, Marc Sageman, and Hsinchun Chen, “Analyzing Terrorist Networks: A Case Study of the Global Salafi Jihad Network,” Lecture Notes in Computer Science, Springer-Verlag, vol. 3495, 2005, p. 287.


http://www.time.com/time/magazine/article/0,9171,1541283,00.html


DNA

http://www.visionlearning.com/library/module_viewer.php?mid=63

Basic building blocks of organisms

Located in nucleus of cells

Composed of 4 nucleotides

Two strands bound together


Central Dogma: DNA -> RNA -> Protein

  • DNA: CCTGAGCCAACTATTGATGAA

  • Transcription produces RNA: CCUGAGCCAACUAUUGAUGAA

  • Translation produces protein (a chain of amino acids)

www.bioalgorithms.info; chapter 6; Gene Prediction
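
The two steps can be traced on the slide's own sequence. This is a simplified sketch: it assumes the DNA shown is the coding strand, so transcription amounts to replacing T with U, and it assumes the reading frame starts at the first base. The codon table is deliberately partial (only the codons occurring in this sequence, taken from the standard genetic code), and the names `transcribe` and `translate` are hypothetical.

```python
# Partial codon table: only the codons that occur in the example sequence.
CODON_TO_AMINO_ACID = {
    "CCU": "Pro", "CCA": "Pro",
    "GAG": "Glu", "GAA": "Glu",
    "ACU": "Thr", "AUU": "Ile", "GAU": "Asp",
}


def transcribe(dna):
    """Write the DNA coding strand as RNA by replacing thymine (T) with uracil (U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Read the RNA three bases at a time and look up each complete codon."""
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TO_AMINO_ACID.get(codon, "?") for codon in codons]


rna = transcribe("CCTGAGCCAACTATTGATGAA")
print(rna)
print("-".join(translate(rna)))
```

Codons missing from the partial table are reported as "?", so the sketch should not be used on other sequences without filling in the full table.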


Human Genome

Scientists originally thought there would be about 100,000 genes

Appear to be about 20,000

WHY?

Almost identical to that of Chimps. What makes the difference?

Answers appear to lie in the noncoding regions of the DNA (formerly thought to be junk)


RNAi – Nobel Prize in Medicine 2006

siRNA may be artificially added to cell!

Double stranded RNA

Short Interfering RNA (~20-25 nt)

RNA-Induced Silencing Complex

Binds to mRNA

Cuts RNA

Image source: http://nobelprize.org/nobel_prizes/medicine/laureates/2006/adv.html, Advanced Information, Image 3


miRNA

  • Short (20-25 nt) sequence of noncoding RNA

  • Known since 1993 but significance not widely appreciated until 2001

  • Impact / Prevent translation of mRNA

  • Generally reduce protein levels without impacting mRNA levels (animal cells)

  • Functions

    • Cause some cancers

    • Guide embryo development

    • Regulate cell differentiation

    • Associated with HIV


TCGR – Mature miRNA (Window=5; Pattern=3)

[Figure: TCGRs of mature miRNA for C. elegans, Homo sapiens, Mus musculus, and all mature sequences, shown for the patterns ACG, CGC, GCG, and UCG]


TCGRs for Xue Training Data

C. Xue, F. Li, T. He, G. Liu, Y. Li, and X. Zhang, “Classification of Real and Pseudo MicroRNA Precursors using Local Structure-Sequence Features and Support Vector Machine,” BMC Bioinformatics, vol 6, no 310.


Affymetrix GeneChip® Array

http://www.affymetrix.com/corporate/outreach/lesson_plan/educator_resources.affx


Microarray Data Analysis

  • Each probe location associated with gene

  • Measure the amount of mRNA

  • Color indicates degree of gene expression

  • Compare different samples (normal/disease)

  • Track same sample over time

  • Questions

    • Which genes are related to this disease?

    • Which genes behave in a similar manner?

    • What is the function of a gene?

  • Clustering

    • Hierarchical

    • K-means
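
To make the question "Which genes behave in a similar manner?" concrete, here is a minimal k-means sketch over expression profiles. Everything in it is an assumption for illustration: the gene names and expression values are invented, the starting centroids are picked by hand rather than at random, and the loop runs a fixed number of rounds instead of testing for convergence.

```python
import math


def mean(vectors):
    """Component-wise average of equal-length vectors."""
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def k_means(profiles, centroids, rounds=10):
    """Alternate between assigning genes to the nearest centroid and re-averaging."""
    groups = []
    for _ in range(rounds):
        groups = [[] for _ in centroids]
        for name, vec in profiles.items():
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(vec, centroids[c]))
            groups[nearest].append(name)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [mean([profiles[n] for n in g]) if g else c
                     for g, c in zip(groups, centroids)]
    return groups


# Hypothetical expression levels for five genes measured in four samples
PROFILES = {
    "geneA": (0.9, 1.1, 5.0, 5.2),
    "geneB": (1.0, 0.8, 4.8, 5.1),
    "geneC": (5.1, 4.9, 1.0, 0.9),
    "geneD": (4.8, 5.2, 1.2, 1.1),
    "geneE": (1.2, 1.0, 5.3, 4.9),
}

groups = k_means(PROFILES, centroids=[PROFILES["geneA"], PROFILES["geneC"]])
print(groups)
```

Genes that land in the same group rise and fall together across the samples, which is the kind of pattern a microarray clustering study looks for.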


Microarray Data - Clustering

"Gene expression profiling identifies clinically relevant subtypes of prostate cancer"

Proc. Natl. Acad. Sci. USA, Vol. 101, Issue 3, 811-816, January 20, 2004


BIG BROTHER?

  • Total Information Awareness

    • http://infowar.net/tia/www.darpa.mil/iao/index.htm

    • http://www.govtech.net/magazine/story.php?id=45918

    • http://en.wikipedia.org/wiki/Information_Awareness_Office

  • Terror Watch List

    • http://www.businessweek.com/technology/content/may2005/tc20050511_8047_tc_210.htm

    • http://www.theregister.co.uk/2004/08/19/senator_on_terror_watch/

    • http://blog.wired.com/27bstroke6/2008/02/us-terror-watch.html

  • CAPPS

    • http://www.theregister.co.uk/2004/04/26/airport_security_failures/

    • http://www.heritage.org/Research/HomelandDefense/BG1683.cfm

    • http://www.theregister.co.uk/2004/07/16/homeland_capps_scrapped/

    • http://en.wikipedia.org/wiki/CAPPS


http://ieeexplore.ieee.org/iel5/6/32236/01502526.pdf?tp=&arnumber=1502526&isnumber=32236


Thanks!

