Decision trees for stream data mining – new results. Leszek Rutkowski – firstname.lastname@example.org Lena Pietruczuk – email@example.com Maciej Jaworski – firstname.lastname@example.org Piotr Duda – email@example.com. Czestochowa University of Technology
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Decision trees for stream data mining – new results
Leszek Rutkowski – firstname.lastname@example.org
Lena Pietruczuk – email@example.com
Maciej Jaworski – firstname.lastname@example.org
Piotr Duda – email@example.com
Czestochowa University of Technology
Institute of Computational Intelligence
The commonly known algorithm called ‘Hoeffding’s Tree’ was introduced by P. Domingos and G. Hulten in . The main mathematical tool used in this algorithm was the Hoeffding’s inequality 
Theorem:If are independent random variables and , then for
and is expected value of .
If are random variables of rangethen the Hoeffding’s bound takes the form
In our papers ,  and  we challenge all the results presented in the world literature (2000-2014) based on the Hoeffding’s inequality. In particular, we challenge the following result:
with probability if
and is a split measure.
Comparison of our results with the Hoeffding’s bound
I. For information gain the value of is much better than the value of , however it can be applied only to a two class problems. For details see .
II. For Gini index the value of gives better results than for . For the value of is more efficient.For details see .