Submit Predictions

alayna

Presentation Transcript


  1. Goal: Predict who survived the Titanic disaster. Steps: Hypotheses, Get Data, Data Management, Statistics & Analysis, Submit Predictions. Score = (Number of Passengers Whose Fate Was Correctly Predicted) / (Number of Passengers in Test Dataset)
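
As a minimal sketch of the score on slide 1, read here as correct predictions divided by the number of passengers in the test dataset (an assumption about the slide's formula), the calculation might look like the following. The helper name `accuracy_score` and the toy vectors are hypothetical and are not taken from the presentation.

```r
# Hypothetical helper: share of passengers whose fate was predicted correctly.
accuracy_score <- function(predicted, actual) {
  stopifnot(length(predicted) == length(actual))
  sum(predicted == actual) / length(actual)
}

# Toy vectors made up for illustration (not Titanic data): 1 = survived, 0 = died.
predicted <- c(1, 0, 0, 1, 0)
actual    <- c(1, 0, 1, 1, 0)

score <- accuracy_score(predicted, actual)
print(score)
```

With the real competition data the true fates of the test passengers are not available locally, so a score of this kind would only be computable on data whose outcomes are known.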

  2. Training and Test Data. All Titanic Passengers: N = 2,223. Training Data (N = 891, 39% Survived) is used to develop the model, which is then applied to the Test Data (N = 418). • How similar is the Test Data to the Training Data? • If similar, then the model should do well. • If different, then the model could perform poorly.

  3. Kitchen Sink Over-Fitting?

  4. Decision Tree Pruning: model.6 <- rpart(survived ~ sex + age + pclass + sibsp + parch + fare + embarked, data = train_data, maxdepth = 2)

  5. Hold Out and Cross-Validation
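
The slide gives only the title, so the following is a rough sketch of what a hold-out split and k-fold cross-validation could look like with `rpart`. Everything here is an assumption for illustration: the synthetic data frame `toy`, the 30% hold-out share, and the choice of 5 folds are not from the presentation.

```r
library(rpart)

set.seed(42)

# Synthetic stand-in data (NOT the Titanic data), with two predictors.
n <- 200
toy <- data.frame(
  sex    = factor(sample(c("female", "male"), n, replace = TRUE)),
  pclass = sample(1:3, n, replace = TRUE)
)
p_survive <- ifelse(toy$sex == "female", 0.75, 0.20)
toy$survived <- factor(rbinom(n, 1, p_survive), levels = c(0, 1))

# Hold-out: fit on 70% of the rows, measure accuracy on the remaining 30%.
holdout_idx <- sample(n, size = round(0.3 * n))
fit <- rpart(survived ~ sex + pclass, data = toy[-holdout_idx, ],
             method = "class", maxdepth = 2)
pred <- predict(fit, newdata = toy[holdout_idx, ], type = "class")
holdout_acc <- mean(as.character(pred) == as.character(toy$survived[holdout_idx]))

# k-fold cross-validation: each row is held out exactly once.
k <- 5
fold <- sample(rep(1:k, length.out = n))
cv_acc <- sapply(1:k, function(i) {
  fit_i  <- rpart(survived ~ sex + pclass, data = toy[fold != i, ],
                  method = "class", maxdepth = 2)
  pred_i <- predict(fit_i, newdata = toy[fold == i, ], type = "class")
  mean(as.character(pred_i) == as.character(toy$survived[fold == i]))
})

print(holdout_acc)
print(mean(cv_acc))
```

Both estimates come from rows the model did not see during fitting, which is what makes them a more honest guide to test performance than accuracy on the training rows.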

  6. Random Forest: Multiple Trees

  7. Confusion Matrix (models shown: Random Forest, Decision Tree, Gender; highlighted cells: False Negatives, False Positives)
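
For readers unfamiliar with the terms on this slide, a confusion matrix can be sketched in base R with `table()`. The label vectors below are invented for illustration and do not reproduce any of the models in the presentation.

```r
# Toy labels made up for illustration (not Titanic data).
actual <- factor(c(1, 0, 1, 1, 0, 0, 1, 0),
                 levels = c(0, 1), labels = c("died", "survived"))
predicted <- factor(c(1, 0, 0, 1, 0, 1, 1, 0),
                    levels = c(0, 1), labels = c("died", "survived"))

# Rows are predictions, columns are the true outcomes.
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# False positive: predicted "survived" but actually died.
false_positives <- cm["survived", "died"]
# False negative: predicted "died" but actually survived.
false_negatives <- cm["died", "survived"]

# Accuracy is the diagonal (correct predictions) over all cases.
accuracy <- sum(diag(cm)) / sum(cm)
```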

  8. Model Ceiling Seems Realistic (chart values: 340, 320, 418; chart label: Gender Model)

  9. Why a Model Ceiling? The slide shows 4 pairs of passengers with very similar predictor variables; yet, within each pair, one survived and the other did not. At some point the data simply do not contain a variable that would make an accurate prediction possible.
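
One way to make the ceiling idea concrete, offered as an illustration rather than as the presenter's method: when several passengers share identical predictor values but differ in outcome, any model that uses only those predictors must give them the same prediction, so the best it can do is match the majority outcome in each group. The toy data frame below is invented, and it uses exact matches where the slide speaks of "very similar" passengers.

```r
# Toy data made up for illustration (not Titanic data): 1 = survived, 0 = died.
toy <- data.frame(
  sex      = c("male", "male", "female", "female", "male", "male"),
  pclass   = c(3, 3, 1, 1, 2, 2),
  survived = c(0, 1, 1, 1, 0, 0)
)

# For each unique predictor combination, count how many passengers a model
# would get right by predicting that group's majority outcome.
best_per_group <- aggregate(
  survived ~ sex + pclass, data = toy,
  FUN = function(s) max(sum(s == 1), sum(s == 0))
)

# Upper bound on accuracy for any model restricted to these predictors.
ceiling_acc <- sum(best_per_group$survived) / nrow(toy)
print(ceiling_acc)
```

In this toy example the two third-class men have identical predictors and opposite fates, so at most 5 of the 6 passengers can be classified correctly.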
