
Annual Income Prediction Modeling Using SVM

Annual Income Prediction Modeling Using SVM. Xinjue YU 12/14/2010. Annual Income Prediction. Why this problem? Useful in industries such as insurance, banking, marketing, etc Interested in the income distribution The goal:




Presentation Transcript


  1. Annual Income Prediction Modeling Using SVM Xinjue YU 12/14/2010

  2. Annual Income Prediction • Why this problem? • Useful in industries such as insurance, banking, and marketing • Of interest for studying the income distribution • The goal: • To predict whether a person has an annual income of more than $50,000 • The information we have: • Age, gender, education level, working hours per week, etc.

  3. The Dataset • The Adult dataset: • 32,561 training records plus 16,281 test records • Extracted from the 1994 Census database • A set of reasonably clean records was extracted using the condition ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)) • http://archive.ics.uci.edu/ml/machine-learning-databases/adult/
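The extraction condition on this slide can be read as a simple record filter. The sketch below is only an illustration: it assumes each raw census record is a dict keyed by the variable names in the condition, and the sample values are made up. The published Adult files are already filtered, so this step would not normally need to be rerun.

```python
# Hypothetical raw census records: dicts keyed by the variable names
# that appear in the extraction condition on the slide.
def is_clean_record(rec):
    """Return True when the record passes the extraction condition."""
    return (
        rec["AAGE"] > 16
        and rec["AGI"] > 100
        and rec["AFNLWGT"] > 1
        and rec["HRSWK"] > 0
    )

# Made-up values, used only to exercise the filter.
records = [
    {"AAGE": 39, "AGI": 25000, "AFNLWGT": 77516, "HRSWK": 40},  # passes
    {"AAGE": 15, "AGI": 500, "AFNLWGT": 1200, "HRSWK": 10},     # fails AAGE > 16
    {"AAGE": 52, "AGI": 30000, "AFNLWGT": 2000, "HRSWK": 0},    # fails HRSWK > 0
]
clean = [r for r in records if is_clean_record(r)]
```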

  4. Preparation • The raw dataset has 14 features; 4 of them are used • The 4 features used: gender, education level, age, and working hours per week • Quantizing the features • Education level: 1(<=high school), 2(<grad school) & 3(>=grad school) • Gender: 0(female) & 1(male) • Age: 1(<30), 2(30-50) & 3(>50) • Working hours per week: 1(<=40) & 2(>40)
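The quantization above could be sketched as follows. Several details are assumptions rather than anything stated on the slides: the column positions follow the comma-separated layout of the UCI `adult.data` file, the education cut-offs use the `education-num` column (9 taken as high-school graduate, 14 and above as graduate school), and the function names are hypothetical.

```python
def quantize(age, education_num, sex, hours_per_week):
    """Map raw values to the coded features listed on the slide."""
    # Assumed cut-offs: education-num 9 = HS-grad, 14 = Masters.
    if education_num <= 9:
        edu = 1          # <= high school
    elif education_num <= 13:
        edu = 2          # < grad school
    else:
        edu = 3          # >= grad school
    gender = 1 if sex.strip().lower() == "male" else 0
    age_bin = 1 if age < 30 else (2 if age <= 50 else 3)
    hours_bin = 1 if hours_per_week <= 40 else 2
    return (gender, edu, age_bin, hours_bin)

def parse_record(line):
    """Parse one comma-separated line into (features, label).

    Assumed columns: 0 = age, 4 = education-num, 9 = sex,
    12 = hours-per-week, 14 = income class.
    """
    fields = [f.strip() for f in line.split(",")]
    features = quantize(int(fields[0]), int(fields[4]), fields[9], int(fields[12]))
    # The test file ends labels with a period, so strip it before comparing.
    label = 1 if fields[14].rstrip(".") == ">50K" else -1
    return features, label

# Sample line in the assumed layout.
sample = ("39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
          "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K")
features, label = parse_record(sample)
```

Labels are coded as +1 (more than $50K) and -1 (otherwise), which is the usual convention for SVM targets.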

  5. The Approach • Using a Support Vector Machine (SVM) in an artificial neural network setting • The data are assumed to be non-separable • Using SVM for non-separable pattern classification • Trying different kernels such as • Linear • RBF • Polynomial • Sigmoid • Using 2-D feature pairs first • gender & education level • age & working hours per week • Using all 4 features in further study (increased complexity)
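For reference, the four kernels named above are commonly defined as in the sketch below. The parameter names `gamma`, `degree`, and `coef0` and their default values are illustrative assumptions following common convention; the slides do not specify any kernel parameters.

```python
import math

def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    # K(x, y) = <x, y>
    return dot(x, y)

def rbf_kernel(x, y, gamma=0.5):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    # K(x, y) = (gamma * <x, y> + coef0) ^ degree
    return (gamma * dot(x, y) + coef0) ** degree

def sigmoid_kernel(x, y, gamma=0.1, coef0=0.0):
    # K(x, y) = tanh(gamma * <x, y> + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)

# A 2-D feature pair such as (gender, education level) for two people.
a = (1, 2)
b = (0, 3)
similarities = {
    "linear": linear_kernel(a, b),
    "rbf": rbf_kernel(a, b),
    "polynomial": polynomial_kernel(a, b),
    "sigmoid": sigmoid_kernel(a, b),
}
```

With coded features that take only a few discrete values, many records map to identical points, which is one reason the classes are unlikely to be separable in the 2-D case.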

  6. The Expected Results • Predict whether a person's annual income is more than $50K from the SVM output (classification/clustering involved) • Use the testing data to get the error rate of each kernel • Compare the results across kernels • The linear kernel is expected to have the highest error rate • Try to keep the error rate within 20%-30%
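The per-kernel comparison could be computed along these lines. This is a minimal sketch: the function names are hypothetical, and the labels and predictions are placeholder values chosen only to exercise the code, not results from the Adult dataset.

```python
def error_rate(y_true, y_pred):
    """Fraction of test samples whose predicted label is wrong."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

def compare_kernels(y_true, predictions_by_kernel):
    """Return (kernel name, error rate) pairs, lowest error first."""
    rates = {
        name: error_rate(y_true, preds)
        for name, preds in predictions_by_kernel.items()
    }
    return sorted(rates.items(), key=lambda item: item[1])

# Placeholder labels (+1: income > $50K, -1: otherwise) and predictions.
y_true = [1, -1, -1, 1, -1, -1, 1, -1, -1, -1]
predictions = {
    "kernel_a": [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
    "kernel_b": [1, -1, -1, -1, -1, -1, 1, -1, -1, 1],
}
ranking = compare_kernels(y_true, predictions)
```

Note that a classifier which always predicts the majority class can already reach a moderate error rate on imbalanced labels, so it is a useful baseline when judging a 20%-30% target.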
