MIS 696 Final Presentation, Fall 2008 Mary Burns, Katherine Carl, Jiesi Cheng, Soomi Cheong, Koren Elder, Li Fan, Chun-neng Huang, Brent Langhals, Matthew Pickard, Nathan Twyman, Shuo Zeng, Xinlei Zhao
What is MIS? As a discipline? As a field of research?
MIS: A Conventional Definition • Management • Information (Computer Science) • Systems (Engineering)
The Quest: From the Seven Pillars to the Tree of Decision • 1998: Seven Pillars • 1999: A Simple Model and Key Researchers • 2000: Additional Pillars • 2001: Another 2D Model, A Timeline of Researchers • 2002: Researchers, More of the Same • 2003: A 3D Model, Timeline, Endnote Library • 2004: A 2D Model, Research Institutions • 2005: Another Model, Publication Trends • 2006: Methodological Approach • 2007: A Normative Approach, Decision Tree • 2008: An IS approach to MIS?
The Brainstorm “Discovery consists of seeing what everybody has seen and thinking what nobody has thought.” –Albert Szent-Gyorgyi Nathan
The Ideagora • Journal Trends • Web of Science • Graphical Representation • Clustering • Validation of 2007 Decision Tree
The Realization We are a large and intelligent group of people, but can we deliver all of these analyses in a semester? We need a way to manage a large quantity of data.
Contribution: Database • Basic article info • Category • Web of Knowledge and Google Citations
Our Contribution: A Database • Article Dimensions • Rigor vs. Relevance • Theoretical vs. Applied • Innovation vs. Review • Behavioral vs. Technical
Purpose and Methodology • Purpose • Classify MIS papers from a different perspective: the general attributes of the papers • Provide useful information to assist trend analysis and prediction for MIS research • Methodology • Clustering: use the fuzzy k-means clustering algorithm • Validation: use the Partition Index (SC) to determine the best number of clusters • Cluster Evaluation: label each paper with its cluster number • Analysis: analyze the clustering results
Attributes of Papers • 8 Attributes / 4 Attribute Pairs • Theoretical vs. Applied • Rigor vs. Relevance • Review vs. Innovation • Technical vs. Behavioral • Scoring and Data Processing • Every attribute of a paper is given a score from 1 to 5 • The score of one attribute is treated as the reverse of its pair partner's score (e.g., scoreTheoretical = 3 implies scoreApplied = -3)
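The pair-scoring scheme above can be sketched in a few lines of Python. This is illustrative only (the names and data are assumptions, not the presenters' tooling): each rater scores a paper 1 to 5 toward the first attribute of a pair, the opposite attribute is simply the negated score, and averaging the raters' scores for each pair yields one coordinate per pair.

```python
# Illustrative sketch of the pair-scoring scheme (names assumed).
# One number per pair suffices, since the paired attribute is its negation;
# averaging several raters' scores gives one coordinate per pair.

PAIRS = ["theoretical", "rigor", "innovation", "technical"]

def to_coordinates(ratings):
    """ratings: {pair_name: [score, score, ...]} -> 4-D coordinate tuple."""
    return tuple(sum(ratings[p]) / len(ratings[p]) for p in PAIRS)

paper = {"theoretical": [3, 4], "rigor": [4, 4],
         "innovation": [2, 3], "technical": [5, 4]}
coords = to_coordinates(paper)
print(coords)  # -> (3.5, 4.0, 2.5, 4.5)
```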
Fuzzy k-Means Clustering • The average score of each attribute pair is used as one coordinate of the paper, placing it in a 4-dimensional MIS-Paper Space • The coordinates of all papers serve as the raw data for the clustering procedure • Because the best number of clusters cannot be decided in advance, the clustering procedure is run repeatedly with the predefined number of clusters varied from 3 to 15
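A minimal fuzzy k-means (fuzzy c-means) sketch of the procedure described above, in plain Python. The function name, initialization, and stopping rule are assumptions, not the presenters' implementation; the two update steps (weighted-mean centers, inverse-relative-distance memberships) are the standard algorithm.

```python
import math
import random

def fuzzy_cmeans(points, c, m=2.0, iters=50, seed=0):
    """Minimal fuzzy c-means on a list of equal-length tuples.
    Returns (centers, memberships); u[j][i] is point j's membership
    in cluster i.  Illustrative sketch, not the presenters' code."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # random initial memberships, normalized so each row sums to 1
    u = [[rng.random() for _ in range(c)] for _ in range(n)]
    u = [[v / sum(row) for v in row] for row in u]
    for _ in range(iters):
        # centers: means weighted by memberships raised to the fuzzifier m
        centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            tot = sum(w)
            centers.append(tuple(
                sum(w[j] * points[j][d] for j in range(n)) / tot
                for d in range(dim)))
        # memberships: inverse relative distances to the centers
        for j in range(n):
            dists = [math.dist(points[j], centers[i]) or 1e-12
                     for i in range(c)]
            for i in range(c):
                u[j][i] = 1.0 / sum(
                    (dists[i] / dists[k]) ** (2.0 / (m - 1))
                    for k in range(c))
    return centers, u

# toy 4-D "papers": two near (1,1,1,1), two near (5,5,5,5)
papers = [(1, 1, 1, 1), (1, 2, 1, 1), (5, 5, 5, 5), (5, 4, 5, 5)]
centers, u = fuzzy_cmeans(papers, c=2)
```

In the actual study this would be rerun with c from 3 to 15 and the results compared with the validation index.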
Validation • Goals of clustering • Group papers that are as similar as possible • Separate different groups as far from each other as possible • Choice of validation index • Partition Index (SC): the ratio of cluster compactness to cluster separation, summed over the clusters • The lower the value, the better
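One common formulation of the partition index sums, over clusters, the fuzzy compactness of each cluster divided by its separation from the other centers; the exact variant used in the talk may differ. A small sketch:

```python
import math

def partition_index(points, centers, u, m=2.0):
    """Partition index SC (one common formulation; assumed, not taken
    from the talk): per-cluster compactness / separation, summed.
    Lower values indicate tighter, better-separated clusters."""
    c, n = len(centers), len(points)
    sc = 0.0
    for i in range(c):
        compact = sum((u[j][i] ** m) * math.dist(points[j], centers[i]) ** 2
                      for j in range(n))
        fuzzy_card = sum(u[j][i] for j in range(n))
        sep = sum(math.dist(centers[k], centers[i]) ** 2 for k in range(c))
        sc += compact / (fuzzy_card * sep)
    return sc

# perfectly separated toy example: compactness is zero, so SC == 0
sc = partition_index([(0,), (0,), (4,), (4,)],
                     [(0,), (4,)],
                     [[1, 0], [1, 0], [0, 1], [0, 1]])
print(sc)  # -> 0.0
```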
Validation (Cont’d) • Best number of clusters: 7 • Reasons • 7 is the “elbow” point: the performance gain beyond 7 clusters is much smaller than the gain before 7 • Although 12 has the lowest index value, too many clusters (too few papers per cluster) would hurt the generalizability of each cluster’s characteristics
Cluster Evaluation • Label each paper with the cluster for which it has the largest membership value • Center and number of papers of every cluster
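Turning the fuzzy membership matrix into hard labels, as described above, is a one-line argmax per paper (function names here are illustrative):

```python
def hard_labels(u):
    """Assign each paper to the cluster with its largest membership value."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in u]

def cluster_sizes(labels, c):
    """Number of papers in each cluster."""
    return [labels.count(i) for i in range(c)]

u = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
print(hard_labels(u))  # -> [0, 1, 2]
```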
Possible Analysis Results • By analyzing the distribution of papers across domains and clusters, we can generate • Authors’ research maps • Universities’ research maps • Journals’ preferences for paper types • By analyzing the above results as a time series, we can generate • Trends and predictions for authors’ and universities’ research • Trends in journals’ preferences
Benefits • Catch the latest research hotspots in every domain • Follow changes in journals’ preferences • Acquire up-to-date information about the changing roles of universities and professors in the MIS community • Discover unexplored domains in the MIS area
Discussion & Future Work • Two difficulties • Additional contextual information is needed to reasonably explain the clustering results • Attribute scores may contain bias, which affects clustering performance • Future work • Select new attributes for evaluating papers • Examine the effect of score bias and design a better scoring approach • Replace manual analysis with automatic processes, such as Text Mining and Social Network Analysis
Text Mining: SQL Server 2005 Data Mining
Data Mining Algorithms • Algorithms: Association Rules, Sequence Clustering, Neural Network, Decision Trees, Naïve Bayes, Time Series, Clustering • Tasks: Classification, Regression, Segmentation, Association Analysis, Anomaly Detection, Sequence Analysis, Time Series • The original slide shows a matrix marking, for each task, the first-choice and second-choice algorithms (√)
Naïve Bayesian • Based on Bayes’ theorem with a “naïve” independence assumption • The fastest algorithm, and gives reasonable accuracy • Best used for • Advanced data exploration (correlation, attribute discrimination, etc.) • Manual feature selection • Parallel correlation counting • Parameters: • MAXIMUM_INPUT_ATTRIBUTES • MAXIMUM_OUTPUT_ATTRIBUTES • MINIMUM_NODE_SCORE • MAXIMUM_STATES
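For intuition, here is a toy discrete naive Bayes with Laplace smoothing. This is a conceptual sketch only; in SQL Server 2005 the Microsoft_Naive_Bayes algorithm is configured through the parameters above, not hand-coded, and all data, names, and the smoothing scheme below are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class priors and per-feature value counts per class."""
    prior = Counter(labels)
    cond = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, y)][v] += 1
    return prior, cond

def predict_nb(model, row):
    """Pick the class maximizing prior * product of smoothed likelihoods."""
    prior, cond = model
    n = sum(prior.values())
    best, best_p = None, -1.0
    for y, cnt in prior.items():
        p = cnt / n
        for i, v in enumerate(row):
            c = cond[(i, y)]
            # add-one (Laplace) smoothing over observed values + one unseen
            p *= (c[v] + 1) / (sum(c.values()) + len(c) + 1)
        if p > best_p:
            best, best_p = y, p
    return best

rows = [("high", "tech"), ("high", "tech"), ("low", "behav"), ("low", "behav")]
model = train_nb(rows, ["A", "A", "B", "B"])
print(predict_nb(model, ("high", "tech")))  # -> A
```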
Decision Trees • Best accuracy for classification, regression, association prediction in many cases. • Multiple internal algorithms • Bayesian with K2 prior, Uniform prior • Entropy-based • Bayesian Gaussian for regression trees • Complete/simple-binary splits • Patent-pending technologies • Automatic feature-selection • High cardinality attribute handling • Continuous attribute handling • Parallel correlation counting • Parameters: • COMPLEXITY_PENALTY • MAXIMUM_INPUT_ATTRIBUTES • MAXIMUM_OUTPUT_ATTRIBUTES • MINIMUM_LEAF_CASES • FORCE_REGRESSORS • SCORE_METHOD • SPLIT_METHOD
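The entropy-based score mentioned above amounts to choosing the split attribute with the highest information gain. A small illustrative sketch (not SQL Server's internal implementation; the data and names are assumptions):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (attribute index, information gain) for the best split:
    parent entropy minus the size-weighted entropy of the children."""
    base = entropy(labels)
    best_i, best_gain = None, -1.0
    for i in range(len(rows[0])):
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[i], []).append(y)
        rem = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        if base - rem > best_gain:
            best_i, best_gain = i, base - rem
    return best_i, best_gain

rows = [("theoretical", "technical"), ("theoretical", "behavioral"),
        ("applied", "technical"), ("applied", "behavioral")]
labels = ["A", "A", "B", "B"]
print(best_split(rows, labels))  # attribute 0 separates the classes perfectly
```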
Clustering • Segmentation, profiling • Multiple internal algorithms • K-means • EM • Automatic feature selection on input attributes, automatic high cardinality attribute handling • Parameters • CLUSTER_COUNT • MAXIMUM_INPUT_ATTRIBUTES • CLUSTER_METHOD • MAXIMUM_STATES • MINIMUM_CLUSTER_CASES • MODELLING_CARDINALITY • STOPPING_TOLERANCE
Neural Network • Classification, segmentation, association prediction • Conjugate gradient method • 0 or 1 hidden layers • Early stopping criteria • Automatic feature selection • Parameters • MAXIMUM_INPUT_ATTRIBUTES • MAXIMUM_OUTPUT_ATTRIBUTES • MAXIMUM_STATES • HIDDEN_NODE_RATIO • HOLDOUT_PERCENTAGE