MIS 696 Final Presentation Fall 2008
Presentation Transcript

  1. MIS 696 Final Presentation, Fall 2008 Mary Burns, Katherine Carl, Jiesi Cheng, Soomi Cheong, Koren Elder, Li Fan, Chun-neng Huang, Brent Langhals, Matthew Pickard, Nathan Twyman, Shuo Zeng, Xinlei Zhao

  2. What is MIS? As a discipline? As a field of research?

  3. MIS: A Conventional Definition • Management • Information (Computer Science) • Systems (Engineering)

  4. The Quest: From the Seven Pillars to the Tree of Decision • 1998: Seven Pillars • 1999: A Simple Model and Key Researchers • 2000: Additional Pillars • 2001: Another 2D Model, A Timeline of Researchers • 2002: Researchers, More of the Same • 2003: A 3D Model, Timeline, Endnote Library • 2004: A 2D Model, Research Institutions • 2005: Another Model, Publication Trends • 2006: Methodological Approach • 2007: A Normative Approach, Decision Tree • 2008: An IS Approach to MIS?

  5. The Brainstorm “Discovery consists of seeing what everybody has seen and thinking what nobody has thought.” –Albert Szent-Gyorgyi Nathan

  6. The Ideagora • Journal Trends • Web of Science • Graphical Representation • Clustering • Validation of 2007 Decision Tree

  7. The Realization We are a large and intelligent group of people, but can we deliver all of these analyses in a semester? We need a way to manage a large quantity of data.

  8. Contribution: Database (diagram: multiple data sources feeding into the database)

  9. Contribution: Database • Basic article info • Category

  10. Contribution: Database

  11. Contribution: Database

  12. Contribution: Database • Web of Knowledge and Google Citations

  13. Our Contribution: A Database • Article Dimensions • Rigor vs. Relevance • Theoretical vs. Applied • Innovation vs. Review • Behavioral vs. Technical

  14. Analysis 1: Statistical Analysis of the Corpus

  15. Attribute-Based Clustering & Analysis of MIS Papers

  16. Purpose and Methodology • Purpose • Classify the MIS papers from a different perspective – the general attributes of the papers • Provide useful information to assist trend analysis and prediction for MIS research • Methodology • Clustering: Use the Fuzzy k-Means Clustering Algorithm • Validation: Use the Partition Index (SC) to determine the best number of clusters • Cluster Evaluation: Label each paper with its cluster number • Analysis: Analyze the clustering results

  17. Attributes of Papers • 8 Attributes / 4 Attribute Pairs • Theoretical vs. Applied • Rigor vs. Relevance • Review vs. Innovation • Technical vs. Behavioral • Scoring and Data Processing • Every attribute of a paper is given a score from 1 to 5 • The score of one attribute is treated as the reverse of its partner's score in the pair (e.g., scoreTheoretical = 3 corresponds to scoreApplied = -3)
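The paired scoring scheme above can be sketched in a few lines. This is a hedged interpretation: the slide says one attribute's score is the reverse of its partner's, so here the pair is collapsed to a single signed difference (`pair_score` is a hypothetical helper, not from the original work).

```python
# Illustrative sketch of the paired attribute scoring described on the slide.
# Each of the 4 attribute pairs collapses to one signed coordinate: positive
# leans toward the first attribute, negative toward the second.

def pair_score(first, second):
    """Collapse a pair of 1-5 ratings into one signed value.

    Interpreting the slide's "reverse score" rule as a signed difference
    (an assumption; the original scoring details are not fully specified).
    """
    return first - second

# A paper rated Theoretical=4, Applied=1 leans theoretical:
print(pair_score(4, 1))  # 3
```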

  18. MIS-Paper Space Definition: A 4-Dimensional Space

  19. Fuzzy k-Means Clustering • The average of the scores within each attribute pair serves as one coordinate of the paper, so each paper is a point in the 4-dimensional MIS-Paper Space • The coordinates of all papers are the raw data for the clustering procedure • Because the best number of clusters cannot be decided in advance, the clustering procedure is run repeatedly with the number of clusters set to each value from 3 to 15
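The procedure above can be sketched as a minimal fuzzy k-means (fuzzy c-means) loop over the candidate cluster counts. This is an illustrative from-scratch implementation under standard FCM update rules, not the presenters' actual code; the random data stands in for the real 4-D paper scores.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    """Minimal fuzzy k-means sketch for the 4-D paper coordinates.

    X: (n_papers, n_features); c: number of clusters; m: fuzzifier.
    Returns cluster centers (c, n_features) and memberships (n_papers, c).
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)        # memberships sum to 1 per paper
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # weighted centroids
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))       # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Run once per candidate cluster count, 3 through 15, as on the slide:
X = np.random.default_rng(1).normal(size=(60, 4))  # stand-in for the real scores
results = {c: fuzzy_cmeans(X, c) for c in range(3, 16)}
```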

  20. Validation • Goal of clustering • Group papers that are as similar as possible • Separate different groups as far from each other as possible • Choice of validation index • Partition Index (SC): the ratio of cluster compactness to cluster separation, summed over the clusters • The lower the index, the better

  21. Validation (Cont’d)

  22. Validation (Cont’d) • Best number of clusters: 7 • Reasons • It is the “elbow” point: the performance improvement beyond 7 clusters is not as pronounced as before 7 • Although 12 has the lowest index value, too many clusters (and thus too few papers per cluster) would hurt the generalizability of each cluster’s characteristics

  23. Cluster Evaluation • Each paper is labeled with the cluster in which it has the largest membership value • Center and number of papers for every cluster
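The hard-labeling step described above is a one-line argmax over the fuzzy membership matrix (the small matrix here is illustrative):

```python
import numpy as np

# Hard labels from fuzzy memberships, as described on the slide:
U = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])   # memberships of 2 papers in 3 clusters
labels = U.argmax(axis=1)         # each paper joins its highest-membership cluster
print(labels)  # [0 2]
```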

  24. Characteristics Table

  25. Data Visualization

  26. Data Visualization (Cont’d)

  27. Data Visualization (Cont’d)

  28. Data Visualization (Cont’d)

  29. Domain-Cluster Paper Distribution

  30. Possible Analysis Results • By analyzing the distribution of papers across domains and clusters, we can generate • Authors’ research maps • Universities’ research maps • Journals’ preferences for paper types • By analyzing these results over time, we can generate • Trends and predictions for authors’ and universities’ research • Journals’ changing preferences

  31. Benefits • Catch the latest research hotspots in every domain • Follow changes in journals’ preferences • Acquire up-to-date information about the changing roles of universities and professors in the MIS community • Discover unexplored domains in the MIS area

  32. Discussion & Future Work • Two difficulties • Information from more perspectives is needed to reasonably explain the results • Attribute scores may contain bias, which will affect clustering performance • Future work • Select new attributes to evaluate papers • Examine the effect of score bias and design a better scoring approach • Replace manual analysis with automated processes, such as Text Mining and Social Network Analysis

  33. Text Mining SQL 2005 Data Mining

  34. Data Mining Algorithms (matrix slide mapping algorithms to tasks) • Algorithms: Association Rules, Sequence Clustering, Neural Network, Decision Trees, Naïve Bayes, Time Series, Clustering • Tasks: Classification, Regression, Segmentation, Association Analysis, Anomaly Detection, Sequence Analysis, Time Series • √ marks the first-choice algorithm for a task, a second √ the second choice (individual cell assignments not shown in the transcript)

  35. Naïve Bayesian • Based on Bayes’ theorem with a “naïve” independence assumption • The fastest algorithm, giving reasonable accuracy • Best used for • Advanced data exploration (correlation, attribute discrimination, etc.) • Manual feature selection • Parallel correlation counting • Parameters: • MAXIMUM_INPUT_ATTRIBUTES • MAXIMUM_OUTPUT_ATTRIBUTES • MINIMUM_NODE_SCORE • MAXIMUM_STATES
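The algorithm family the slide describes can be illustrated with a small from-scratch classifier on categorical features. This is a generic Naive Bayes sketch (with Laplace smoothing, an addition of mine), not Microsoft's implementation; `naive_bayes_predict` and the toy data are illustrative.

```python
import numpy as np

def naive_bayes_predict(X_train, y_train, x, alpha=1.0):
    """Predict the class of x, assuming features are independent given the
    class (the 'naive' assumption); alpha is Laplace smoothing."""
    classes = np.unique(y_train)
    log_post = []
    for c in classes:
        Xc = X_train[y_train == c]
        lp = np.log(len(Xc) / len(X_train))          # log prior P(class)
        for j, v in enumerate(x):                    # product of P(feature | class)
            count = np.sum(Xc[:, j] == v)
            n_vals = len(np.unique(X_train[:, j]))
            lp += np.log((count + alpha) / (len(Xc) + alpha * n_vals))
        log_post.append(lp)
    return classes[int(np.argmax(log_post))]

# Toy data: class 0 papers look like [0, 0], class 1 papers like [1, 1]
X_train = np.array([[0, 0], [0, 0], [1, 1], [1, 1]])
y_train = np.array([0, 0, 1, 1])
pred = naive_bayes_predict(X_train, y_train, np.array([0, 0]))
print(pred)  # 0
```

Counting per-class, per-attribute frequencies like this is also why the slide can claim it is the fastest of the four algorithms: training is a single pass over the data.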

  36. Decision Trees • Best accuracy for classification, regression, association prediction in many cases. • Multiple internal algorithms • Bayesian with K2 prior, Uniform prior • Entropy-based • Bayesian Gaussian for regression trees • Complete/simple-binary splits • Patent-pending technologies • Automatic feature-selection • High cardinality attribute handling • Continuous attribute handling • Parallel correlation counting • Parameters: • COMPLEXITY_PENALTY • MAXIMUM_INPUT_ATTRIBUTES • MAXIMUM_OUTPUT_ATTRIBUTES • MINIMUM_LEAF_CASES • FORCE_REGRESSORS • SCORE_METHOD • SPLIT_METHOD
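The "entropy-based" split scoring the slide lists can be made concrete: a candidate split is scored by how much it reduces label entropy (information gain). An illustrative sketch, not the SQL Server internals:

```python
import numpy as np

def entropy(y):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(y, mask):
    """Entropy reduction from splitting labels y by a boolean mask."""
    n = len(y)
    left, right = y[mask], y[~mask]
    child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(y) - child

y = np.array([0, 0, 1, 1])
gain = information_gain(y, np.array([True, True, False, False]))
print(gain)  # 1.0 (a perfect split removes all uncertainty)
```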

  37. Clustering • Segmentation, profiling • Multiple internal algorithms • K-means • EM • Automatic feature selection on input attributes, automatic high cardinality attribute handling • Parameters • CLUSTER_COUNT • MAXIMUM_INPUT_ATTRIBUTES • CLUSTER_METHOD • MAXIMUM_STATES • MINIMUM_CLUSTER_CASES • MODELLING_CARDINALITY • STOPPING_TOLERANCE
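Of the two internal algorithms the slide lists, plain (hard) k-means is the simpler; a minimal from-scratch sketch follows (illustrative only, not the product's code, and without the empty-cluster handling a robust version would need):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Bare-bones k-means: alternate nearest-center assignment and
    centroid recomputation for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        centers = np.array([X[labels == i].mean(axis=0) for i in range(k)])
    return centers, labels

# Two well-separated toy blobs recover the expected grouping:
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
_, demo_labels = kmeans(pts, 2)
print(demo_labels)
```

EM-based clustering generalizes this by replacing the hard nearest-center assignment with probabilistic responsibilities, much as fuzzy k-means does with memberships.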

  38. Neural Network • Classification, segmentation, association prediction • Conjugate gradient method • 0-1 hidden layers • Early stopping criteria • Automatic feature selection • Parameters • MAXIMUM_INPUT_ATTRIBUTES • MAXIMUM_OUTPUT_ATTRIBUTES • MAXIMUM_STATES • HIDDEN_NODE_RATIO • HOLDOUT_PERCENTAGE
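The network design the slide describes (at most one hidden layer) amounts to a short forward pass; the sketch below uses random stand-in weights and made-up layer sizes purely for illustration:

```python
import numpy as np

# Forward pass of a one-hidden-layer network (the slide's "0-1 hidden layer"
# design). Weights are random placeholders; real training would fit them,
# e.g. with the conjugate gradient method the slide mentions.
rng = np.random.default_rng(0)
x = rng.normal(size=4)                            # one case with 4 input attributes
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)     # input -> 3 hidden nodes
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)     # hidden -> 2 output states
hidden = np.tanh(x @ W1 + b1)                     # nonlinear hidden activation
logits = hidden @ W2 + b2
probs = np.exp(logits) / np.exp(logits).sum()     # softmax over output states
print(probs)
```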

  39. SQL 2005 Data Mining

  40. SQL 2005 Data Mining