
Data Mining Meets Mobile Search

This article discusses mobile search using message-based, browser-based, and Java application methods. It also explores unique features, challenges, and scalability issues in mobile search.


Presentation Transcript


  1. Data Mining Meets Mobile Search • Wen-Chih Peng (彭文志) • Dept. of Computer Science, National Chiao Tung University

  2. What Is Mobile Search? • Searching via mobile devices • Types: • Message-based • Browser-based • Java application

  3. Unique Features in Mobile Search • Mobile devices • Personal devices • Wireless networks • Positioning enabled • Services in mobile search • Local search • Static: nearby entities (more vertical search) • Dynamic: live traffic or find your buddy • Entertainment: video, images

  4. Our Big Picture for Mobile Search

  5. Mobile Search@NCTU • LBS recommendation list based on blogs or Web 2.0 sites • Mobile/PDA phones with wireless interfaces and GPS

  6. Technology Highlight • Positioning • Large-Scale Wi-Fi issue • Data content • Static data: Web Puzzle problem • Dynamic data: Live traffic data • Community structures in Web 2.0 • Behavior analysis for intelligent UI

  7. Positioning • Joint work with Prof. Y.-C. Tseng

  8. Challenges • Provide Wi-Fi positioning techniques • A large-scale pattern-matching mechanism • Training phase: collect thousands or millions of training samples • Positioning phase: quickly estimate a location by matching against a huge location database

  9. Pattern-Matching Localization Algorithm • Training phase: training data collected at known locations <x1, y1>, <x2, y2>, ..., <xn, yn> are stored in the location database • Positioning phase: the real-time data [s1, s2, s3, s4] received from the access points (APs) is passed to the pattern-matching mechanism, which returns an estimated location <x, y>
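
The two phases can be sketched in a few lines. This is only an illustration, assuming that each training record pairs a location with one received-signal-strength (RSS) value per access point and that matching means nearest neighbor by Euclidean distance; the slides do not name the actual matching function, and `fingerprints` and `locate` are hypothetical names.

```python
import math

# Training phase (assumed layout): each record pairs a known location
# with the RSS vector observed there, one value per access point.
fingerprints = [
    ((0.0, 0.0), [-40, -70, -80, -75]),
    ((5.0, 0.0), [-60, -50, -78, -72]),
    ((5.0, 5.0), [-75, -62, -48, -66]),
]

def locate(fingerprints, rss):
    """Positioning phase: return the trained location whose RSS vector
    is closest (Euclidean distance) to the real-time vector `rss`."""
    best_loc, best_dist = None, math.inf
    for loc, trained in fingerprints:
        d = math.dist(trained, rss)
        if d < best_dist:
            best_loc, best_dist = loc, d
    return best_loc

# Real-time data [s1, s2, s3, s4] measured by the device.
estimate = locate(fingerprints, [-58, -52, -80, -70])
```

Every query scans the whole database, so the work per query grows with the number of training records.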

  10. The Scalability Problem • Huge calibration effort in the training phase: longer system setup time • High computation cost in the positioning phase: longer system response time

  11. The Scalability Problem • Reduce the computation cost incurred in the positioning phase • Apply a clustering technique to partition the database into a number of clusters

  12. Computation Cost Comparison • Search space of typical pattern matching: size of location database • Search space of cluster-based matching: number of clusters
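
A rough sketch of the cluster-based search follows. It assumes each cluster is summarized by a centroid RSS vector and that both steps use Euclidean distance; the slides specify neither, and `clusters`, `nearest`, and `locate_clustered` are hypothetical names.

```python
import math

def nearest(items, rss, key):
    """Return the item whose RSS vector (extracted by `key`) is closest to `rss`."""
    return min(items, key=lambda item: math.dist(key(item), rss))

def locate_clustered(clusters, rss):
    """Two-step lookup: choose the closest cluster centroid first,
    then search only the training records inside that cluster."""
    cluster = nearest(clusters, rss, key=lambda c: c["centroid"])
    loc, _ = nearest(cluster["members"], rss, key=lambda m: m[1])
    return loc

# Assumed layout: a centroid RSS vector plus (location, RSS vector) members.
clusters = [
    {"centroid": [-45, -75],
     "members": [((0, 0), [-40, -70]), ((1, 0), [-50, -80])]},
    {"centroid": [-70, -45],
     "members": [((8, 8), [-72, -42]), ((9, 8), [-68, -48])]},
]

estimate = locate_clustered(clusters, [-69, -47])
```

The first step compares against one centroid per cluster rather than every record. The trade-off is that a noisy reading can select the wrong cluster, in which case the true location is never examined.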

  13. False Cluster Selection • Considering 2 APs in the environment • [Figure: clusters C1, C2, and C3 plotted by RSS of AP 1 against RSS of AP 2, with the region in which the signal may fluctuate shaded] • A device appears at <x1, y1>, described by the received signals of AP 1 and AP 2 at <x1, y1>; the cluster that contains the true location is C2 • If the real-time received signal strength s is in the shaded region, cluster C3 is selected • C3 ≠ C2: false cluster selection occurs

  14. WebPuzzle

  15. Pieces of Le Bouquet Cake House • Name: Le Bouquet 繽紛蛋糕房 • Address: No. 63, Sec. 2, Zhongshan N. Rd., Taipei City (台北市中山北路二段63號) • Tel: 21002856 • Reviews: {summary from source articles}

  16. Goal • Problem: • Unstructured information about structured objects is distributed across the WWW • Unclear / vague / homogeneous pieces from heterogeneous sources make up an object • Goal: • Unstructured sources --> structured view • New object instance discovery • Applications • Information portals for various domains • GeoGuider: portal for GeoObjects

  17. Web Puzzle Problem • Given: the annotated corpus • Input: • Keyword: describes the conceptual tuple space (optional) • Entity: the tuple schema • E.g., Computer Science #Person #Email #Phone • Output: ranked entity tuples

  18. [Figure: system components: Keyword Search, Data Objects, Entity Vector, Object Composition, Entity Index, Binary Relation, Entity Annotation]

  19. Binary Relation • Counting • a -> b1: 10 times; a -> b2: 3 times • P(b1 | a) > P(b2 | a) • Context and text proximity • Customized functions
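
The counting step can be illustrated as below. This covers only the plain co-occurrence estimate P(b | a) = count(a -> b) / count(a); the context, text-proximity, and customized functions listed on the slide are not modeled, and `conditional_probs` is a hypothetical name.

```python
from collections import Counter

def conditional_probs(pairs):
    """Estimate P(b | a) from observed (a, b) relation pairs by counting."""
    pair_counts = Counter(pairs)
    head_counts = Counter(a for a, _ in pairs)
    return {(a, b): n / head_counts[a] for (a, b), n in pair_counts.items()}

# The slide's example: a -> b1 seen 10 times, a -> b2 seen 3 times.
pairs = [("a", "b1")] * 10 + [("a", "b2")] * 3
probs = conditional_probs(pairs)

# b1 is the stronger binary relation for a.
assert probs[("a", "b1")] > probs[("a", "b2")]
```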

  20. Search • Associate keywords to object entities • P(#entity | keywords) • Rank data objects by keywords • Similarity(tuple_keyword, tuple_object)

  21. Prototype Platform in NCTU CS • 2.7 TB disk space; 40 cores; 40 GB RAM • Distributed File System • Map/Reduce enabled (supported by Google/Yahoo)

  22. Demo GeoGuider

  23. Mining Community from Blog/Web 2.0

  24. We are buried in comments

  25. Example: K+ • [Figure: ☆ Nop ☆ linked to f1, f2, ..., f202]

  26. Definition • Popular co-cited community (PCC) • G = (Vc ∪ Vf, E) is a PCC if there exists a partition s.t. • Co-cited: ni in Vf is fully connected to nj in Vc, and • Popular: |Vf| > min_sup • E.g., • Core members: {b3, b4} • Followers: {b1, b2} • [Figure: Vf = {b1, b2} linked to Vc = {b3, b4}, with edge labels .7, .3, .5, .5]
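
The two conditions can be checked directly for a given partition. The sketch below is a simplification that assumes edges are unweighted (follower, core member) pairs, so the edge labels in the figure are ignored; `is_pcc` is a hypothetical name.

```python
def is_pcc(core, followers, edges, min_sup):
    """Check the two PCC conditions for a given partition (Vc, Vf).

    core      -- Vc, the core members
    followers -- Vf, the followers
    edges     -- set of (follower, core member) pairs; weights are ignored
    min_sup   -- minimum support threshold for |Vf|
    """
    # Co-cited: every follower is connected to every core member.
    co_cited = all((f, c) in edges for f in followers for c in core)
    # Popular: there are more than min_sup followers.
    popular = len(followers) > min_sup
    return co_cited and popular

# The slide's example: core members {b3, b4}, followers {b1, b2}.
core, followers = {"b3", "b4"}, {"b1", "b2"}
edges = {("b1", "b3"), ("b1", "b4"), ("b2", "b3"), ("b2", "b4")}
result = is_pcc(core, followers, edges, min_sup=1)
```

This only verifies a candidate partition. Finding such partitions in a large blog graph is the mining problem proper.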

  27. Example: K+ • Mining PCC • [Figure: ☆ Nop ☆, Sandy, and 小昕昕 shown with f1, f2, ..., fi, ..., fn]

  28. Example: K+ • Mining TPCC • [Figure: graph G(1) over ☆ Nop ☆, Sandy, and 小昕昕 with f2, f6, f9, f10, f200, f214, fi, fn and nodes N1 to N4, built from G(0)1, G(0)2, and G(0)3; edge labels .3, .4, .5, .5, .6, .7]

  29. CarWeb: A Traffic Data Collection Platform

  30. Motivation • Sharing GPS data • Cars with GPS and 3G mobile phones • Spatial-temporal databases • Mining traffic patterns

  31. Scenario of CarWeb

  32. CarWeb Architecture

  33. Joint Clustering in Data Streams • Highway traffic database

  34. Joint Clustering in Data Streams • Motivation • Discover useful clusters for sensor data management • Input • Highway traffic database • Parameters • Window size w, range r, average speed error ε • Output • For each window, we generate clusters • Cluster: a set of r-connected sensors
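
One possible reading of these parameters is sketched below for a single window. It assumes that two sensors are directly linked when their positions along the highway differ by at most r and their average speeds over the window differ by at most ε, and that a cluster is a connected group of linked sensors. The slides do not give the exact definition, so this is an illustration; `cluster_window` and the sample data are hypothetical.

```python
from statistics import mean

def cluster_window(positions, readings, r, eps):
    """Group sensors into clusters for one window of readings.

    positions -- sensor id -> position along the highway (e.g. km)
    readings  -- sensor id -> list of w speed readings in this window
    Two sensors are linked if they are within range r and their window
    averages differ by at most eps; clusters are connected components.
    """
    avg = {s: mean(v) for s, v in readings.items()}
    unvisited = set(positions)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            s = frontier.pop()
            linked = {t for t in unvisited
                      if abs(positions[s] - positions[t]) <= r
                      and abs(avg[s] - avg[t]) <= eps}
            unvisited -= linked
            cluster |= linked
            frontier.extend(linked)
        clusters.append(cluster)
    return clusters

positions = {"s1": 0.0, "s2": 1.0, "s3": 2.0, "s4": 3.0}
readings = {"s1": [90, 92, 91], "s2": [88, 90, 89],   # free-flowing
            "s3": [40, 42, 41], "s4": [43, 41, 42]}   # congested
window_clusters = cluster_window(positions, readings, r=1.5, eps=5)
```

With this data the free-flowing sensors s1 and s2 end up in one cluster and the congested sensors s3 and s4 in another, even though s2 and s3 are within range of each other.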

  35. Observation

  36. Traffic Estimation Problem • Input • A traffic database • Query (road segment, time) • Output • The speed of the query road segment • E.g., query: (e, T4)

  37. Spatio-Temporal Weighted Method

  38. Temporal Feature • Neighboring time slots

  39. Spatial Feature • Nearby road segments • Similar road types • [Figure: road network with lettered road segments]
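
A weighted average over neighboring time slots and nearby road segments is one simple way to combine the temporal and spatial features. The weighting used here, 1 / ((1 + hop distance) × (1 + time-slot gap)), is an assumption made for illustration and is not taken from the slides; road-type similarity is left out, and `estimate_speed`, `hops`, and the sample data are hypothetical.

```python
def estimate_speed(observations, query_slot, hops):
    """Estimate the speed of a query road segment at `query_slot`.

    observations -- (segment, time slot) -> observed speed
    hops         -- segment -> hop distance from the query segment
                    (0 for the query segment itself); segments that
                    are not listed are treated as too far away
    Closer segments and closer time slots receive larger weights.
    """
    num = den = 0.0
    for (segment, slot), speed in observations.items():
        if segment not in hops:
            continue
        weight = 1.0 / ((1 + hops[segment]) * (1 + abs(slot - query_slot)))
        num += weight * speed
        den += weight
    return num / den if den else None

# Query (e, T4): no reading exists for segment e in slot 4 itself.
observations = {("e", 3): 60.0, ("e", 5): 50.0, ("d", 4): 40.0}
hops = {"e": 0, "d": 1}
estimate = estimate_speed(observations, query_slot=4, hops=hops)
```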

  40. Demo CarWeb

  41. Behavior Analysis for Intelligent UI

  42. Why Prediction? • Inconveniences of handheld devices • Limited keyboard • Keyword prediction • Recommendation • Small screen • Search result ranking • Segment prediction

  43. Multi-Domain Sequence • User behavior of handheld devices • Location (moving patterns) • Searching (keywords) • Payment (transactions) • Integrate multiple user behaviors • MDS: Multi-domain sequences • More informative than a single domain sequence

  44. Multi-Domain Sequential Pattern

  45. Challenges • Each domain has its own sequence database • Performing join operations across sequence databases is costly

  46. PropagatedMine • Perform sequential pattern mining in the first (starting) domain • Then propagate the mining results to the other domains • [Figure: sequential pattern mining on D1 produces sequential patterns; propagated tables carry the results to D2, D3, ..., Dn, producing multi-domain sequential patterns at each step]

  47. Conclusions • Data mining helps the growth of mobile search • Positioning • Web Puzzling • Community structures • CarWeb • User behavior analysis for intelligent UI • Built a mobile search prototype

  48. Mobile Search (Clients)

  49. Selected References • C.-Y. Lin, W.-C. Peng, and Y.-C. Tseng, "Efficient In-Network Moving Object Tracking in Wireless Sensor Networks," IEEE Trans. on Mobile Computing, Vol. 5, No. 8, pp. 1044-1056, August 2006. • L.-Y. Wei and W.-C. Peng, "Clustering Data Streams in Optimization and Geography Domains," Proceedings of the 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2009), Bangkok, Thailand, April 27-30, 2009. • C.-H. Lo, W.-C. Peng, and M.-F. Chiang, "Ranking Web Pages from User Perspectives of Social Bookmarking Sites," Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence, Sydney, Australia, Dec. 9-12, 2008. • C.-H. Lo and W.-C. Peng, "Efficient Joint Clustering Algorithms in Optimization and Geography Domains," Proceedings of the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2008), Osaka, Japan, May 20-23, 2008. • C.-H. Lo, W.-C. Peng, C.-W. Chen, T.-Y. Lin, and C.-S. Lin, "CarWeb: A Traffic Data Collection Platform," Proceedings of the 9th International Conference on Mobile Data Management, Beijing, China, April 27-30, 2008. • M.-F. Chiang, W.-C. Peng, and C.-H. Lo, "Discovering Popular Co-Cited Communities in Blogspaces," Proceedings of the First IEEE International Workshop on Data Engineering for Blogs, Social Media, and Web 2.0 (in conjunction with the IEEE International Conference on Data Engineering), Cancun, Mexico, April 12, 2008. • S.-P. Kuo, B.-J. Wu, W.-C. Peng, and Y.-C. Tseng, "Cluster-Enhanced Techniques for Pattern-Matching Localization Systems," Proceedings of the 4th IEEE International Conference on Mobile Ad-hoc and Sensor Systems (MASS 2007), Pisa, Italy, Oct. 8-11, 2007. • Z.-X. Liao and W.-C. Peng, "Exploring Lattice Structures in Mining Multi-Domain Sequential Patterns," Proceedings of the Second International Conference on Scalable Information Systems (InfoScale), Suzhou, China, June 6-8, 2007.

  50. Who makes it come true?
