Scalable Decision Tree SPRINT


Project Members

Kaushal Mittal

Abhishek Seth

Amar Agrawal

Problem Statement
  • The current decision tree implementation in Weka fails on large datasets.
  • Goal: a scalable implementation of decision trees in Weka.
  • Goal: support for disk-resident data.
Challenges
  • The Instance class in Weka loads the entire training data into memory.
  • Multiple copies of the instance data are made at several points during training.
  • Other classes assume that the instance data is memory-resident.
Changes in Weka
  • Extended the Instance class to support disk-resident data.
  • Used a cache and random access files.
  • Changed the Evaluation class to work with the new SInstance class.
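The combination of a cache and random access files can be illustrated with a minimal sketch. It is not the project's SInstance class and uses no Weka API: the class name DiskRowStore, the fixed-width row layout (one 8-byte double per attribute), and the LRU eviction policy are all assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical disk-backed store of fixed-width numeric rows (not a Weka class). */
class DiskRowStore implements AutoCloseable {
    private final RandomAccessFile file;
    private final int numAttributes;
    private final long rowBytes;
    private final Map<Integer, double[]> cache;
    private int numRows = 0;
    private int diskReads = 0;

    DiskRowStore(File path, int numAttributes, final int cacheSize) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
        this.file.setLength(0); // start from an empty file
        this.numAttributes = numAttributes;
        this.rowBytes = (long) numAttributes * Double.BYTES;
        // An access-ordered LinkedHashMap drops the least recently used row
        // once the cache holds more than cacheSize rows.
        this.cache = new LinkedHashMap<Integer, double[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, double[]> eldest) {
                return size() > cacheSize;
            }
        };
    }

    /** Appends one row at the end of the file. */
    void append(double[] row) throws IOException {
        if (row.length != numAttributes) {
            throw new IllegalArgumentException("expected " + numAttributes + " values");
        }
        file.seek(numRows * rowBytes);
        for (double v : row) {
            file.writeDouble(v);
        }
        numRows++;
    }

    /** Returns a copy of row `index`, reading from disk only on a cache miss. */
    double[] get(int index) throws IOException {
        if (index < 0 || index >= numRows) {
            throw new IndexOutOfBoundsException("row " + index);
        }
        double[] row = cache.get(index);
        if (row == null) {
            // Fixed-width rows make the byte offset a simple multiplication.
            file.seek(index * rowBytes);
            row = new double[numAttributes];
            for (int i = 0; i < numAttributes; i++) {
                row[i] = file.readDouble();
            }
            diskReads++;
            cache.put(index, row);
        }
        return row.clone();
    }

    int size() { return numRows; }

    int diskReads() { return diskReads; }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

With a cache of two rows, reading rows 0, 0, 1, 2, 0 in that order touches the disk four times: the second read of row 0 is a cache hit, and row 0 is evicted when row 2 is loaded. Memory use stays bounded by the cache size rather than by the size of the training set.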
Decision Tree Classifier
  • Design similar to the Weka classifier J48.
  • Implements the SPRINT algorithm.
  • Uses disk-resident attribute lists.
  • Generates a binary classifier tree.
  • Uses the Gini index as the split criterion.
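How an attribute list and the Gini index combine to pick a binary split on a numeric attribute can be sketched as follows. This is an illustrative in-memory version, not the project's code: the names GiniSplit, Entry, buildAttributeList, and bestSplit are hypothetical, and the disk-resident storage of the lists is left out. Each entry holds an attribute value, a class label, and a record id; the list is sorted once by value, and a single scan evaluates the midpoint between each pair of distinct neighbouring values while updating class counts for the two sides.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a Gini-based split search over one attribute list. */
class GiniSplit {
    /** One attribute-list entry: attribute value, class label, record id. */
    static final class Entry {
        final double value;
        final int label;
        final int rid;

        Entry(double value, int label, int rid) {
            this.value = value;
            this.label = label;
            this.rid = rid;
        }
    }

    /** Builds the attribute list for one numeric attribute, sorted by value. */
    static List<Entry> buildAttributeList(double[] values, int[] labels) {
        List<Entry> list = new ArrayList<>();
        for (int rid = 0; rid < values.length; rid++) {
            list.add(new Entry(values[rid], labels[rid], rid));
        }
        list.sort(Comparator.comparingDouble((Entry e) -> e.value));
        return list;
    }

    /** Gini index of a partition: 1 minus the sum of squared class frequencies. */
    static double gini(int[] counts, int total) {
        if (total == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (int c : counts) {
            double p = (double) c / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    /**
     * Scans a sorted attribute list once and returns {threshold, weighted Gini}
     * for the best split "value <= threshold", or null if all values are equal.
     */
    static double[] bestSplit(List<Entry> sorted, int numClasses) {
        int n = sorted.size();
        int[] below = new int[numClasses]; // class counts of entries already scanned
        int[] above = new int[numClasses]; // class counts of entries still ahead
        for (Entry e : sorted) {
            above[e.label]++;
        }
        double bestGini = Double.POSITIVE_INFINITY;
        double bestThreshold = Double.NaN;
        for (int i = 0; i < n - 1; i++) {
            Entry e = sorted.get(i);
            below[e.label]++;
            above[e.label]--;
            double next = sorted.get(i + 1).value;
            if (e.value == next) {
                continue; // no split point between equal values
            }
            int nBelow = i + 1;
            int nAbove = n - nBelow;
            double g = (nBelow * gini(below, nBelow) + nAbove * gini(above, nAbove)) / n;
            if (g < bestGini) {
                bestGini = g;
                bestThreshold = (e.value + next) / 2.0;
            }
        }
        return Double.isNaN(bestThreshold) ? null : new double[] {bestThreshold, bestGini};
    }
}
```

For values {3, 1, 4, 2} with labels {1, 0, 1, 0}, the scan finds the threshold 2.5 with a weighted Gini of 0, since that split separates the two classes perfectly. Because the scan is sequential and only the class counts are kept in memory, the same loop would work when the list is streamed from disk.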
Results
  • Accuracy comparable to J48.
  • Glass (214 instances)
      • J48: 100%
      • SPRINT: 91.667%
  • Adult
      • J48: 83.3%
      • SPRINT: 79.8%
  • Execution time: slower than the default J48 on small data sets because of disk I/O. On large data sets, the default Weka implementation fails.