E N D
1. ???? ????
3. ????:???????????,??????? ????????,????,???????
?????? ???????
????????,?????????
?????? ???????
????????,????,????????? ??
? ????? ????????(??)
8. ???????????????
11. How does Google work? Google ???????PageRank™,??????????? Larry Page ? Sergey Brin ????????????
12. PageRank ?? PageRank ????????????,??????????????????????
????,Google ???? A ????? B ?,????? A ???? B ?????,Google ???????,???????????;?????????????????????????????,??????????????????????
13. ?????????????? PageRank,?? Google ????????????????,???????????????????,???????????????
??,Google ? PageRank ????????????,??????????????????Google ???????????????,???????????????????????????????????????????????
14. ??? Google ??????????,???????????????????????????????????????,Google ???????????????????,?????????? PageRank ????Google ???????????????,????????????????
(Summarized from Google’s own web page: http://www.google.com.tw/intl/zh-TW/why_use.html)
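As a rough illustration of the link-as-vote idea behind PageRank summarized above, the sketch below runs a simplified power iteration on a tiny link graph. The function name `pagerank`, the damping factor 0.85, the iteration count, and the four-page graph are all assumptions made for illustration; this is a textbook simplification, not Google's actual implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified power-iteration PageRank.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults (assumptions).
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # every page receives a small baseline share
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # a link from p to q acts as a vote, weighted by p's own rank
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share
            else:
                # a page with no outgoing links spreads its rank evenly
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: A, C, and D all link to B.
links = {"A": ["B"], "B": ["C"], "C": ["A", "B"], "D": ["B"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy graph, B collects the most votes and ends up ranked first, while D, which nobody links to, keeps only the baseline share. This is the "popularity" effect that the following slides comment on.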
15. Comments on Google’s Techniques PageRank is based on popularity.
Not personalized.
Not suitable for “advanced users”.
Professional vs. Amateur
Expert vs. Inexpert
PageRank may be “selection” biased.
Who put those pages on the web?
??????????????
Google is no longer efficient when you are not searching for “popular documents.”
16. ??????????????????? ??????????????
17. ???????????(selection bias)?
??????????????
????????????
18. ???????????????????
?????????????????????????,????????????
???????????????????????, ?????
Sampling: ????????!
What is a good sample (data)?
How should the sample (data) be collected?
22. ??????????? ??:
?????????????
Variable types (discrete/continuous)
??:support vector machine?!
Problem dependent
??:Looking for effective and efficient algorithms.
??: performance assessment
ROC, Precision/Recall, ……
(???????????,????????????)
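The performance measures named on this slide can be made concrete with a small example. The sketch below computes precision and recall for a single query; the function name `precision_recall` and the document IDs are hypothetical and serve only as an illustration.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for one query.

    relevant:  documents that truly match the information need
    retrieved: documents the search engine returned
    """
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)  # relevant documents that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four relevant documents exist, the engine returns three.
relevant_docs = {"d1", "d2", "d3", "d4"}
retrieved_docs = ["d1", "d2", "d7"]
p, r = precision_recall(relevant_docs, retrieved_docs)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the three returned documents are relevant (precision 2/3) and two of the four relevant documents are found (recall 1/2). Which measure matters more is problem dependent, as the slide notes.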
23. ??????????? (ex. Google), ?????????????????,??????
??????,???????? (keyword) ??, ??????????, ????????? (Pattern Match).
???????????????????, ? Google ??????????
24. ????????(?????????), ?????????????
?????????,???
????????????????,????????(popularity)??????????,????????????
25. ?????????????????????????, ???????????, ???????????????????????????????!
26. ?????????? ?????????:
????????, ???????(????)????????????????
??????????Sequential Learning,???? Semi-Supervised Learning
?????????????(Sequential Analysis) ????????(Stochastic Control)????
???? ……
27. ???????????, ?????????????????????????????, ????????????
??????????????????, ???????
28. ?????????????
??????????:
??=(??1,??2,…)
????????????
????????
???????????? (Why?)
??????????????????
??????(???)????
(????????????????!)
29. Q:????????????????? A:???????????
??????
30. ??????(????) :
??=(??1,??2,…)
???????????,???“??”???,???????“??”???????,?????????????????Pattern Recognition????
????“????”????????
?????(Data mining)??
32. Learning Theory Classification/Regression Everything Old Is New Again: A Fresh Look at Historical Approaches in Machine Learning
Ryan Michael Rifkin, Ph.D. Thesis, 2002, MIT
Past performance and future results
Carlo Tomasi, Nature, V. 428, 2004
(Learning from experience is hard, and predicting how well what we have learned will serve us in the future is even harder. The most useful lessons turn out to be those that are insensitive to small changes in our experience.)
General conditions for predictivity in learning theory
T. Poggio, R. Rifkin, S. Mukherjee, P. Niyogi, Nature, V. 428, 2004
(To determine conditions under which a learning algorithm will generalize from its finite training set to novel examples.)
33. Learning From Examples
Nonparametric (infinite parameters) Approach
No assumption about the population distribution
Hard to predict its generalization performance
Statistical Modeling
Parametric (finite parameters) Approach
Highly dependent on the distribution assumption
Curse of Dimensionality
34. Difficulty of Text Classification Dimensionality is high and dynamic.
Lack of distribution knowledge of the “population”.
Ex. Articles in economics, computer science, mathematics, statistics, etc. may all use the same vocabulary yet be about completely different things.
35. More Challenges Adaptive Text Mining: varying definition of concept
Lack of “training” examples
Need to start from scratch.
Unstable
Less efficient
On-line learning !!!!!
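One simple way to picture on-line learning when there are no training examples to start from is a mistake-driven classifier that updates after every labeled document it sees. The sketch below uses a basic perceptron over bag-of-words features; the class name `OnlinePerceptron`, the word lists, and the +1/-1 labels are hypothetical, and the perceptron is only one of many possible on-line methods, not necessarily the one intended by the slides.

```python
class OnlinePerceptron:
    """Mistake-driven text classifier that learns one document at a time."""

    def __init__(self):
        self.weights = {}  # word -> weight; grows as new vocabulary appears
        self.bias = 0.0

    def predict(self, words):
        score = self.bias + sum(self.weights.get(w, 0.0) for w in words)
        return 1 if score > 0 else -1

    def update(self, words, label):
        """Process one labeled document; return True if a mistake was corrected."""
        if self.predict(words) == label:
            return False
        for w in words:
            self.weights[w] = self.weights.get(w, 0.0) + label
        self.bias += label
        return True


# Hypothetical stream: +1 = machine learning article, -1 = economics article.
stream = [
    (["support", "vector", "kernel"], 1),
    (["market", "price", "inflation"], -1),
    (["kernel", "margin"], 1),
    (["inflation", "interest", "rate"], -1),
]
model = OnlinePerceptron()
mistakes = sum(model.update(words, label) for words, label in stream)
print("mistakes while learning:", mistakes)
print(model.predict(["kernel", "vector"]), model.predict(["price", "rate"]))
```

The model starts from scratch, so its early predictions are unreliable, which matches the "unstable" and "less efficient" points above. Because the vocabulary dictionary grows as new words arrive, the feature dimension is not fixed in advance.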
36. More Challenging Problems ??????????
???????????????
???????????????
?????????????,??????????????
????????????,???????????? (????)
37. ?????????????? ??????????????
???????????
????????“???????”???(???)???????????
38. ??????? ????????????,?????(??)????(Bayesian)???
?????“??”????“??”????????
?????????????????
39. “????????”
40. ??:???????????
??????????????????
(????????????????,???????,????????????????)
41. ?????????? ???????????? ,????????????
?????????
???,?????“??”???-“?????”??
(In what way will it be more suitable for analysis?) ????????pixel,color??,?????????????????,???????????????,?????????,???Algorithm??????????
42. ????????????????--???!
“???????????????????”
??????“??????”,?????????“??”??????????
?????????“?”??
????“???”(?????)???????
??“???”?????,????????????????,????????????????????
43. R2-D2 & C-3PO
44. Related Research GOOSE: A Goal-Oriented Search Engine With Commonsense (2002)
By Hugo Liu, Henry Lieberman, Ted Selker (MIT Media Laboratory)
Personalization of search engine services for effective retrieval and knowledge management
By Weiguo Fan and Michael D. Gordon (University of Michigan Business School) and Praveen Pathak (School of Management, Purdue University)
45. Albert Einstein - Address to the student body of California Institute of Technology It is not enough that you should understand about applied science in order that your work may increase man's blessings. Concern for man himself and his fate must always form the chief interest of all technical endeavors, concern for the great unsolved problems of the organization of labor and the distribution of goods in order that the creations of our mind shall be a blessing and not a curse to mankind.
Never forget this in the midst of your diagrams and equations.