???? ????

1. ???? ????

3. ????:???????????,??????? ????????,????,??????? ?????? ??????? ????????,????????? ?????? ??????? ????????,????,????????? ?? ? ????? ????????(??)

8. ???????????????

11. How does Google work? Google ???????PageRank™,??????????? Larry Page ? Sergey Brin ????????????

12. PageRank ?? PageRank ????????????,?????????????????????? ????,Google ???? A ????? B ?,????? A ???? B ?????,Google ???????,???????????;?????????????????????????????,??????????????????????

13. ?????????????? PageRank,?? Google ????????????????,???????????????????,??????????????? ??,Google ? PageRank ????????????,??????????????????Google ???????????????,???????????????????????????????????????????????

14. ??? Google ??????????,???????????????????????????????????????,Google ???????????????????,?????????? PageRank ????Google ???????????????,???????????????? (Summarized from Google's own web page: http://www.google.com.tw/intl/zh-TW/why_use.html)
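The link-as-vote idea described in the slides above can be sketched with the textbook power-iteration form of PageRank. This is only an illustration, not Google's production system: the damping factor 0.85, the even redistribution of rank from pages without outgoing links, and the four-page toy web are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping=0.85 is a commonly quoted value, assumed here.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A link from p to q is a "vote": p splits its rank among its targets.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical toy web: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web, page C collects the most votes and page D, which nobody links to, keeps only the baseline share. That is the "popularity" behaviour the comments on the next slide refer to.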

15. Comments on Google's Techniques: PageRank is based on popularity. Not personalized. Not suitable for "advanced users" (professional vs. amateur, expert vs. inexpert). PageRank may be "selection" biased: who put those pages on the web? ?????????????? Google is no longer efficient when you are not searching for "popular documents."

16. ??????????????????? ??????????????

17. ???????????(selection bias)? ?????????????? ????????????

18. ??????????????????? ?????????????????????????,???????????? ???????????????????????, ????? Sampling: ????????! What is a good sample (data)? How to collect sample (data)?

22. ??????????? ??: ????????????? Variable types (discrete/continuous) ??:support vector machine?! Problem dependent ??:Looking for effective and efficient algorithms. ??: performance assessment: ROC, Precision/Recall, etc. (???????????,????????????)
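As a small illustration of the performance-assessment measures named on this slide, precision and recall for a single query can be computed as below. The function name and the document IDs are hypothetical, chosen only for the example.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for one query.

    relevant:  documents that truly match the user's need
    retrieved: documents the search method returned
    """
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)  # relevant documents that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 relevant documents exist; 3 were returned, 2 of them relevant.
p, r = precision_recall({"d1", "d2", "d3", "d4"}, ["d1", "d2", "d5"])
print(f"precision={p:.3f} recall={r:.3f}")
```

Precision asks how much of what was returned is useful; recall asks how much of what is useful was returned. A ranking tuned for popular documents can score well on one and poorly on the other.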

23. ??????????? (ex. Google), ?????????????????,?????? ??????,???????? (keyword) ??, ??????????, ????????? (Pattern Match). ???????????????????, ? Google ??????????

24. ????????(?????????), ????????????? ?????????,??? ????????????????,????????(popularity)??????????,????????????

25. ?????????????????????????, ???????????, ???????????????????????????????!

26. ?????????? ?????????: ????????, ???????(????)???????????????? ??????????Sequential Learning,???? Semi-Supervised Learning ?????????????(Sequential Analysis) ????????(Stochastic Control)???? ???? ��

27. ???????????, ?????????????????????????????, ???????????? ??????????????????, ???????

28. ????????????? ??????????: ??=(??1,??2,�) ???????????? ???????? ???????????? (Why?) ?????????????????? ??????(???)???? (????????????????!)

29. Q:????????????????? A:??????????? ??????

30. ??????(????) : ??=(??1,??2,�) ???????????,???�??�???,???????�??�???????,?????????????????Pattern Recognition???? ????�????�???????? ?????(Data mining)??

32. Learning Theory: Classification/Regression. "Everything Old Is New Again: A Fresh Look at Historical Approaches in Machine Learning," Ryan Michael Rifkin, Ph.D. thesis, MIT, 2002. "Past performance and future results," Carlo Tomasi, Nature, vol. 428, 2004 (Learning from experience is hard, and predicting how well what we have learned will serve us in the future is even harder. The most useful lessons turn out to be those that are insensitive to small changes in our experience.) "General conditions for predictivity in learning theory," T. Poggio, R. Rifkin, S. Mukherjee, P. Niyogi, Nature, vol. 428, 2004 (determines conditions under which a learning algorithm will generalize from its finite training set to novel examples).

33. Learning From Examples: nonparametric (infinite parameters) approach; no assumption on the population distribution; hard to predict its generalization performance. Statistical Modeling: parametric (finite parameters) approach; depends heavily on the distribution assumption; curse of dimensionality.

34. Difficulty of Text Classification: Dimensionality is high and dynamic. Lack of distribution knowledge of the "population". Ex.: Economics, Computer Science, Mathematics, Statistics, etc. may all use the same vocabulary to write completely different articles.

35. More Challenges: Adaptive text mining: the definition of a concept varies. Lack of "training" examples: need to start from scratch; unstable; less efficient. On-line learning!

36. More Challenging Problems ?????????? ??????????????? ??????????????? ?????????????,?????????????? ????????????,???????????? (????)

37. ?????????????? ?????????????? ??????????? ????????�???????�???(???)???????????

38. ??????? ????????????,?????(??)????(Bayesian)??? ?????�??�????�??�???????? ?????????????????

39. �????????�

40. ??:??????????? ?????????????????? (????????????????,???????,????????????????)

41. ?????????? ???????????? ,???????????? ????????? ???,?????"??"???-"?????"?? (In what way will it be more suitable for analysis?) ????????pixel,color??,?????????????????,???????????????,?????????,???Algorithm??????????

42. ????????????????--???! �???????????????????� ??????�??????�,?????????�??�?????????? ?????????�?�?? ????�???�(?????)??????? ??�???�?????,????????????????,????????????????????

43. R2-D2 & C-3PO

44. Related Research: "GOOSE: A Goal-Oriented Search Engine With Commonsense" (2002), by Hugo Liu, Henry Lieberman, and Ted Selker (MIT Media Laboratory). "Personalization of search engine services for effective retrieval and knowledge management," by Weiguo Fan and Michael D. Gordon (University of Michigan Business School) and Praveen Pathak (School of Management, Purdue University).

45. Albert Einstein, address to the student body of the California Institute of Technology: "It is not enough that you should understand about applied science in order that your work may increase man's blessings. Concern for man himself and his fate must always form the chief interest of all technical endeavors, concern for the great unsolved problems of the organization of labor and the distribution of goods in order that the creations of our mind shall be a blessing and not a curse to mankind. Never forget this in the midst of your diagrams and equations."
