Burton D. Morgan Entrepreneurial Competition

Presentation Transcript

• get feedback and possibly funding from nationally known entrepreneurs and venture capitalists

• get seed funding for your business in the form of prizes totaling at least \$50,000 (possibly more)

• get space in the Purdue Technology Incubator

• The competition is open to all Purdue students.

• Callouts on the 5th and 6th September, 7-9 pm in Krannert Auditorium.

Register with [email protected] or call 4-7324

• CS 590M Fall 2001: Security Issues in Data Mining

Lecture 6: Time Series, Regression, Data Mining Process

• Problem: Prediction of Numerical Values

• Similar to Classification, but continuous class

• Strong Statistical base

• Data mining community primarily concerned with scale

• Data: Sequence of vectors (x_i, y_i), i = 1, …, n

• Goal: Find function f such that f(x) ≈ y for

• Training data (x_i, y_i)

• New pairs (x, y) where y is unknown

• Note that f captures relationship between x and y, but doesn’t imply causality
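As a minimal sketch of the goal above, the following fits a straight line by ordinary least squares to one numeric attribute and then predicts y for an unseen x. The linear form of f, the function names, and the data are assumptions made for illustration; the slides do not prescribe a particular model.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y ≈ a*x + b with a single numeric attribute."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx            # slope
    b = y_bar - a * x_bar    # intercept
    return a, b

def predict(model, x):
    """Apply the learned f to an x whose y is unknown."""
    a, b = model
    return a * x + b

# Made-up training data lying exactly on y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)
y_hat = predict(model, 5.0)
```

The fitted line summarizes how y varies with x in the training data; as the note above says, it makes no claim that x causes y.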

• Curse of dimensionality: As the number of attributes/values grows,

• Space of possible functions f grows exponentially

• Number of training examples needed to learn best f grows exponentially

• Solution: Constrain space of possible functions

• Decision Trees

• Regression Trees (e.g., CART)

• Decision tree with automatic selection of number of choices at each node

• Regression Splines (e.g., MARS)

• Handles discontinuity at choice points

• Artificial Neural Networks

• Capable of computing arbitrarily complex functions
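To make the idea of constraining the function space concrete, here is a sketch of the smallest possible regression tree: a single split on one attribute, with each side predicting the mean of its training values. This is not CART itself, which grows and prunes a full tree; the function names and data are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(xs, ys):
    """Choose the threshold on x that minimizes total squared error when each
    side predicts its own mean. Returns (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]

def predict_stump(stump, x):
    """Predict the mean of whichever side of the threshold x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

# Made-up data with an obvious jump between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
stump = fit_stump(xs, ys)
```

Restricting f to piecewise-constant functions of this kind is what keeps the search tractable: only the candidate thresholds between adjacent training values need to be examined.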

• Time/value data

• Goals:

• Learn function

• Identify repeated patterns of value change

• Given values over a time fragment, find time fragments with similar values, allowing for:

• Shift of values

• Scaling of values

• Stretching of time

• Find commonly occurring patterns of values (e.g., the time fragments that would give the most similar fragments under the above conditions)
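One common way to compare fragments regardless of shift and (positive) scaling is to normalize each fragment to zero mean and unit variance before measuring distance. The sketch below assumes equal-length fragments and does not handle stretching of time, which would need resampling or an elastic distance measure; the names and data are illustrative.

```python
from math import sqrt

def z_normalize(fragment):
    """Remove shift (subtract the mean) and scale (divide by the std deviation)."""
    n = len(fragment)
    mean = sum(fragment) / n
    std = sqrt(sum((v - mean) ** 2 for v in fragment) / n)
    if std == 0:
        return [0.0] * n  # constant fragment has no shape left after removing shift
    return [(v - mean) / std for v in fragment]

def distance(a, b):
    """Euclidean distance between two equal-length fragments after normalizing."""
    na, nb = z_normalize(a), z_normalize(b)
    return sqrt(sum((p - q) ** 2 for p, q in zip(na, nb)))

base = [1.0, 3.0, 2.0, 5.0]
shifted_scaled = [2 * v + 10 for v in base]  # same shape, different offset and scale
other = [5.0, 1.0, 4.0, 2.0]                 # a different shape

d_same = distance(base, shifted_scaled)
d_other = distance(base, other)
```

Here `d_same` comes out as zero up to rounding error, while `d_other` stays clearly positive.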

• Transformation

• Use DFT to transform to frequency domain

• Drop all but first few frequencies

• Index in R* tree and search
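The transformation steps above can be sketched as follows: compute a DFT, keep the first k coefficients as a short feature vector, and compare feature vectors instead of whole series. With the 1/sqrt(n) scaling used here, distance is preserved by the full transform, so the distance on the truncated features never exceeds the true distance. The naive O(n^2) transform, the choice k = 3, and the data are assumptions for illustration, and the R*-tree index is not implemented.

```python
import cmath
from math import sqrt

def dft(series):
    """Naive discrete Fourier transform, scaled by 1/sqrt(n) so that Euclidean
    distance is the same in the time and frequency domains."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * f * t / n) for t, x in enumerate(series)) / sqrt(n)
        for f in range(n)
    ]

def features(series, k):
    """Keep only the first k frequency coefficients as the index key."""
    return dft(series)[:k]

def dist(a, b):
    """Euclidean distance; works for real or complex coordinates."""
    return sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))

# Two made-up series of equal length.
s1 = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
s2 = [1.5, 2.5, 2.5, 4.5, 2.5, 2.0, 0.5, 0.5]

true_d = dist(s1, s2)
full_d = dist(dft(s1), dft(s2))                     # equals true_d up to rounding
reduced_d = dist(features(s1, 3), features(s2, 3))  # never larger than true_d
```

Because the reduced distance is a lower bound, a range search on the features can return false candidates but cannot miss a true match; candidates are then checked against the full series.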

• Window-based

• Sliding window across sequence

• Index key features in special data structure

• Count entries at each index point
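A sketch of the window-based approach, under stated assumptions: the "key feature" of a window is taken to be its pattern of rises and falls, and a plain `Counter` stands in for the special index structure. Both choices are hypothetical; the slides do not say which features or structure are used.

```python
from collections import Counter

def window_key(window):
    """Hypothetical key feature: the up/down/flat pattern of consecutive changes."""
    return "".join(
        "U" if b > a else "D" if b < a else "F"
        for a, b in zip(window, window[1:])
    )

def count_patterns(series, width):
    """Slide a window of the given width across the series and count each key."""
    counts = Counter()
    for start in range(len(series) - width + 1):
        counts[window_key(series[start:start + width])] += 1
    return counts

# Made-up series with a repeating rise-rise-fall shape.
series = [1, 2, 3, 2, 3, 4, 3, 4, 5]
counts = count_patterns(series, 3)
most_common_key, most_common_count = counts.most_common(1)[0]
```

Keys with high counts correspond to commonly occurring patterns of value change.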

• Cross-Industry Standard Process for Data Mining (CRISP-DM)

• European Community funded effort to develop framework for data mining tasks

• Goals:

• Encourage interoperable tools across entire data mining process

• Take the mystery/high-priced expertise out of simple data mining tasks

• Business Understanding

• Understanding project objectives and requirements

• Data mining problem definition

• Data Understanding

• Initial data collection and familiarization

• Identify data quality issues

• Initial, obvious results

• Data Preparation

• Record and attribute selection

• Data cleansing

• Modeling

• Run the data mining tools

• Evaluation

• Determine if results meet business objectives