March Madness Data Crunch Overview Sponsored By:
Timeline • 02/06/19 – Historical Data Released • 02/13/19 – Registration Deadline for Teams • 03/01/19 – Initial Predictions CSV Submission due • 03/18/19 – 2019 Current Season Data Released by 5PM • 03/20/19 – 2019 Final Tournament Predictions CSV due by 5PM • 03/27/19 – 2019 Final PowerPoint Report & Participation Declaration Form due by 5PM • 04/05/19 – Final Poster Session & Awards Ceremony
Where to Create Teams • Create a team of 4 and upload the Excel file to Blackboard by 2/13/19, 11:59 pm • Please email team name and members to: • mdcm@fordham.edu • Teams will then be added to the March Madness Blackboard class • Materials will be uploaded there
Objective • Based on Kaggle’s Machine Learning Mania • https://www.kaggle.com/c/march-machine-learning-mania-2017 • Predict the probability that a team wins any given game in the March Madness Tournament • Predict all 2,278 possible matchups • Use data from 2002 through 2018 to train and test until the 2019 data is released • Be creative! • See if you can find signal in the noise • Demonstrate your analytical skills • Visualize your findings
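The matchup count in the objective corresponds to every unordered pairing of a 68-team tournament field, since 68 × 67 / 2 = 2278. A minimal Python sketch of enumerating those pairings follows. The integer team IDs 1 through 68 and the placeholder probability of 0.5 are assumptions for illustration only; the real identifiers and the required CSV layout come from the competition materials.

```python
from itertools import combinations

def all_matchups(team_ids):
    """Return every unordered pair (low_id, high_id) of distinct teams."""
    return list(combinations(sorted(team_ids), 2))

# Hypothetical team IDs 1..68; real IDs would come from the provided team data.
team_ids = range(1, 69)
matchups = all_matchups(team_ids)

# Placeholder prediction: 0.5 means "no information" for every possible game.
predictions = {pair: 0.5 for pair in matchups}

print(len(matchups))  # 2278
```

Listing each pair once, with the lower ID first, means a single probability covers both teams: if P is the chance the first team wins, 1 − P is the chance the second team wins.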
Dataset Overview • Glossary available on Blackboard • Game Data: game_id, host name, host latitude and longitude, and score • KenPom Data: four factor data, tempo, efficiency, etc. • Do not share outside of Fordham • Coaching Data: Coach name, career wins, season wins, NCAA tournament appearances, Sweet 16 appearances, and Final 4 appearances • Team Location Data: Latitude and longitude of team1 and team2 • Team Data: Team Name • Poll Data: AP Preseason/Final Polls, Coaches Preseason/Final Polls • RPI Data
Grading Criteria • Judges will grade the submissions on the following factors • Model Accuracy • How well did the model perform? • Creativity of Exploratory Analysis & Methodology • Was the team able to find novel ways to improve accuracy and gain new insights into what makes teams succeed in March? • Communication & Visualization • How effectively did the team communicate its findings to the judges? • Extremely important to Deloitte!! • Note: Model accuracy is not the most important factor. It is very important to find creative ways to analyze the data and communicate effectively!
Format of Poster Board • Overview & Introduction • Hypothesis & Methodology • Variable Selection, Analytics Explored, Data Mining Techniques • Analytics & Results • Results of Analytics, Results of Data Mining Techniques • Conclusions & Suggestions for Improvement • Performance of Model
Tutorials • What is Log Loss? (Blackboard) • SPSS Logistic Regression Example (Blackboard) • Python Logistic Regression Example (Blackboard) • Other examples (Right)
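Log loss is the scoring metric of the Kaggle competition this challenge is based on; whether the judges here apply exactly the same formula is an assumption. For N games, with outcome y_i (1 if the first-listed team won, 0 otherwise) and predicted probability p_i, it is −(1/N) Σ [y_i ln(p_i) + (1 − y_i) ln(1 − p_i)]. A minimal sketch in plain Python, where the function name and the clipping value eps are illustrative choices rather than anything taken from the Blackboard tutorial:

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary log loss; lower is better."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Predicting 0.5 for every game scores ln(2), about 0.693.
baseline = log_loss([1, 0, 1], [0.5, 0.5, 0.5])

# A confident wrong pick is penalized far more than a cautious one.
confident_wrong = log_loss([0], [0.99])
cautious_wrong = log_loss([0], [0.60])
```

Because the penalty grows without bound as a wrong prediction approaches certainty, predictions of exactly 0 or 1 are risky under this metric.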
Prediction Tracking http://fordhamsportsanalytics.com/