Building Topic/Trend Detection System based on Slow Intelligence

Chia-Chun Shih & Ting-Chun Peng, Institute for Information Industry, Taipei, Taiwan. Presented at the DMS’10 special session on Slow Intelligence Systems.

Presentation Transcript


  1. Building Topic/Trend Detection System based on Slow Intelligence. Chia-Chun Shih & Ting-Chun Peng, Institute for Information Industry, Taipei, Taiwan. Presented at the DMS’10 special session on Slow Intelligence Systems.

  2. Agenda • Introduction • Topic/Trend Detection System • Topic/Trend Detection System with Slow Intelligence • Conclusion

  3. Introduction

  4. Introduction • Social media is prevalent • Social media is a reflection of the real world • An experiment from HP Social Computing Lab shows that Twitter-rate time series can accurately predict box-office movie sales, with adjusted R² = 0.973 (amazing!!) • There is an emerging market for social media monitoring services, e.g., Nielsen Buzzmetrics, Radian6 • [Figure labels: Facebook Users, Blog Posts, Twitter Posts]

  5. Introduction (cont’d) • Topic Detection and Tracking (TDT) • Initiated by DARPA in 1996 • Goal: discover the topical structure in unsegmented streams of news reporting as it appears across multiple media • Tasks: • Topic Detection • Topic Tracking • First Story Detection • Story Segmentation • Link Detection

  6. Introduction (cont’d) • Slow Intelligence provides a software development framework that lets systems with insufficient computing resources gradually adapt to their environments in order to handle complexity • [Figure: Slow Intelligence System. Labels: Environment, Knowledge-based Controller, Problem, Solution, steps 1 to 4, Enumerator, Adaptor, Eliminator, Concentrator]
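
The component names on the slide above suggest an iterative search loop. The following is only a minimal sketch of such a loop, not the authors' framework: the function name `slow_intelligence_cycle`, its parameters, the numeric candidate representation, and the way each step is mapped to code (in particular, treating "adapt" as shrinking the search radius) are all assumptions made for illustration.

```python
import random
from typing import Callable, Optional


def slow_intelligence_cycle(
    seed: float,
    score: Callable[[float], float],
    rounds: int = 20,
    fanout: int = 8,
    keep: int = 3,
    step: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Iteratively improve a numeric candidate solution (higher score is better).

    Hypothetical illustration only; the slides do not specify an algorithm.
    """
    rng = rng or random.Random(0)
    best = seed
    for _ in range(rounds):
        # Enumerate: generate alternative solutions around the current best.
        candidates = [best] + [best + rng.uniform(-step, step) for _ in range(fanout)]
        # Eliminate: discard all but the `keep` highest-scoring candidates.
        survivors = sorted(candidates, key=score, reverse=True)[:keep]
        # Concentrate: commit to the single best survivor.
        new_best = survivors[0]
        # Adapt (assumed meaning): narrow the search when nothing improved.
        if score(new_best) <= score(best):
            step *= 0.5
        best = new_best
    return best


if __name__ == "__main__":
    # Toy problem: find x that maximizes -(x - 3)^2.
    print(slow_intelligence_cycle(0.0, lambda x: -(x - 3.0) ** 2, rounds=60))
```

Because the current best is always kept among the candidates, the score never gets worse from one round to the next; the loop trades speed for steady improvement under a small per-round budget.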

  7. Introduction (cont’d) • In this paper, we propose a design for an online topic/trend detection system for social media that takes advantage of Slow Intelligence. • Four complexities of designing online topic/trend detection systems are identified, along with corresponding Slow Intelligence solutions.

  8. Topic/Trend Detection System

  9. Topic/Trend Detection System • Objective • Detect current hot topics and predict future hot topics based on data collected from social media • Three components • Crawler & Extractor: collects data and extracts information from social media • Topic Extractor: detects hot topics from a set of text documents • Trend Detector: detects trends (future hot topics) based on currently available data • [Figure labels: Social Media, Crawler & Extractor, Topic Extractor, Trend Detector, Current Hot Topics, Future Hot Topics]

  10. Topic/Trend Detection System (cont’d) • Crawler & Extractor • The Information Extractor extracts articles and metadata (title, author, content, etc.) from semi-structured web content • [Figure: Crawler & Extractor. Labels: Social Media, User’s Keywords of Interest, Web Crawler, HTML documents, Information Extractor, Text documents, Web data DB, Topic Extractor]

  11. Topic/Trend Detection System (cont’d) • Topic Extractor • Topic Word Extraction: apply the TF-IDF scheme to generate the top-N topic words for each document • Topic Word Clustering: apply a clustering algorithm to cluster topic words into topic groups; the topic groups are treated as “topics” • Hot Topic Extraction: apply aging theory to find hot topics • [Figure: Topic Extractor. Labels: Web data DB, Topic Word Extraction, Topic Word Clustering, Current topics, Hot topic extraction, Current Hot topics]
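
The topic word extraction step above can be sketched as follows. This is a minimal illustration that assumes documents are already tokenized and uses one common TF-IDF variant (relative term frequency times the natural log of inverse document frequency); the slides do not say which variant or tokenizer the authors use, and the function name `top_n_topic_words` is hypothetical.

```python
import math
from collections import Counter
from typing import List


def top_n_topic_words(docs: List[List[str]], n: int = 3) -> List[List[str]]:
    """Return the n highest-scoring TF-IDF words for each tokenized document."""
    num_docs = len(docs)
    # Document frequency: the number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            word: (count / len(doc)) * math.log(num_docs / df[word])
            for word, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:n])
    return results


if __name__ == "__main__":
    docs = [
        ["movie", "box", "office", "movie"],
        ["movie", "twitter", "trend"],
        ["twitter", "blog", "trend", "trend"],
    ]
    for words in top_n_topic_words(docs, n=2):
        print(words)
```

Words that appear in every document get a score of zero under this variant, so they only surface when a document has fewer than n distinctive words.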

  12. Topic/Trend Detection System (cont’d) • Trend Detector • The Trend Estimation Algorithm is a black box for now; however, it will “find its way” once Slow Intelligence is involved in the system • [Figure: Trend Detector. Labels: Current topics, Trend Estimation Algorithms, Topic Trend (Future Hot Topics)]

  13. Topic/Trend Detection System with Slow Intelligence

  14. T/TD System with Slow Intelligence • Four complexities of designing online topic/trend detection systems • 1. It is infeasible to collect all web data with a limited amount of computing resources. The system needs to develop data collection strategies that concentrate limited resources on collecting important web data. (Component: Crawler & Extractor)
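
One way to concentrate a limited crawl budget, sketched under stated assumptions: each source is scored by how often its past pages turned out to be relevant, and a small share of the budget is kept for exploration. The function `allocate_crawl_budget`, the smoothing, and the `explore` parameter are hypothetical; the slides do not describe the authors' actual strategy.

```python
from typing import Dict


def allocate_crawl_budget(
    relevant: Dict[str, int],
    fetched: Dict[str, int],
    budget: int,
    explore: float = 0.1,
) -> Dict[str, int]:
    """Split `budget` page fetches across sources according to past hit rate.

    relevant[s] / fetched[s] is the observed share of useful pages from source s.
    """
    sources = sorted(fetched)
    # Smoothed hit rate, so sources with little history are not starved.
    rate = {s: (relevant.get(s, 0) + 1) / (fetched[s] + 2) for s in sources}
    total = sum(rate.values())
    uniform = 1 / len(sources)
    # Blend a uniform exploration share with the hit-rate-proportional share.
    share = {s: explore * uniform + (1 - explore) * rate[s] / total for s in sources}
    # Rounding down keeps the total within the budget.
    return {s: int(budget * share[s]) for s in sources}


if __name__ == "__main__":
    print(allocate_crawl_budget({"a": 80, "b": 10}, {"a": 100, "b": 100}, budget=1000))
```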

  15. T/TD System with Slow Intelligence (cont’d) • 2. Many computation methods are available for estimating trends. If parameter settings are also taken into account, there are too many combinations to choose from. Furthermore, the Internet is a changing environment, so the current best solution may not perform well in the future. The system needs to automatically (or at least quasi-automatically) find the best solution among many alternatives in a changing environment. (Component: Trend Detector)
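
Choosing among alternative trend estimators can be illustrated by backtesting each candidate on recent data and keeping the one with the lowest error; rerunning the selection periodically lets the choice follow a changing environment. This is a sketch under assumptions: the candidate forecasters (moving averages, linear extrapolation), the mean-absolute-error criterion, and all function names are illustrative and are not taken from the paper.

```python
from typing import Callable, Dict, Sequence

Forecaster = Callable[[Sequence[float]], float]


def moving_average(window: int) -> Forecaster:
    """Forecast the next value as the mean of the last `window` values."""
    def forecast(history: Sequence[float]) -> float:
        recent = history[-window:]
        return sum(recent) / len(recent)
    return forecast


def linear_extrapolation(history: Sequence[float]) -> float:
    """Forecast the next value by repeating the most recent one-step change."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])


def backtest_error(forecast: Forecaster, series: Sequence[float], warmup: int = 3) -> float:
    """Mean absolute one-step-ahead error; requires len(series) > warmup."""
    errors = [abs(forecast(series[:t]) - series[t]) for t in range(warmup, len(series))]
    return sum(errors) / len(errors)


def select_forecaster(candidates: Dict[str, Forecaster], series: Sequence[float]) -> str:
    """Return the name of the candidate with the lowest backtest error."""
    return min(candidates, key=lambda name: backtest_error(candidates[name], series))


if __name__ == "__main__":
    candidates = {
        "ma2": moving_average(2),
        "ma3": moving_average(3),
        "linear": linear_extrapolation,
    }
    print(select_forecaster(candidates, [1, 2, 3, 4, 5, 6, 7, 8]))
    print(select_forecaster(candidates, [5, 7, 5, 7, 5, 7, 5, 7]))
```

Parameter settings fit into the same scheme: each (method, parameter) pair is simply another named candidate.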

  16. T/TD System with Slow Intelligence (cont’d) • 3. The crawler needs to revisit websites at hourly or daily intervals to collect up-to-date data. Each site has a different amount of data to be updated and a different policy for restricting frequent access, neither of which is known beforehand. The system needs to find a feasible data collection schedule based on past experience. (Component: Crawler & Extractor)
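
Learning a revisit schedule from past experience could look like the sketch below, which adjusts one site's revisit interval after each crawl. The update rule, the thresholds, and the name `next_revisit_interval` are assumptions for illustration; only the hourly-to-daily range comes from the slide.

```python
def next_revisit_interval(
    interval_h: float,
    new_items: int,
    blocked: bool,
    min_h: float = 1.0,
    max_h: float = 24.0,
    target_items: int = 10,
) -> float:
    """Return the next revisit interval in hours for one site.

    interval_h: interval used for the visit that just finished.
    new_items:  number of new documents found on that visit.
    blocked:    True if the site refused or throttled the request.
    """
    if blocked:
        # The site restricts frequent access: back off.
        proposal = interval_h * 2
    elif new_items > target_items:
        # More new data than expected: visit more often.
        proposal = interval_h / 2
    elif new_items == 0:
        # Nothing new: visit less often.
        proposal = interval_h * 1.5
    else:
        proposal = interval_h
    # Keep the schedule between hourly and daily.
    return max(min_h, min(max_h, proposal))


if __name__ == "__main__":
    interval = 4.0
    for new_items, blocked in [(50, False), (30, False), (0, True), (3, False)]:
        interval = next_revisit_interval(interval, new_items, blocked)
        print(interval)
```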

  17. T/TD System with Slow Intelligence (cont’d) • 4. Any change in a web page may break an Extractor. When many websites are being monitored, the system needs an automatic repair mechanism for Extractors. The repair mechanism needs to detect Extractor errors, find alternatives, and choose the best alternative to fix the broken Extractor. (Component: Crawler & Extractor)
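
The detect, find alternatives, and choose-the-best steps above can be sketched as follows. The sketch assumes an extractor is a set of regular expressions, one per metadata field, and that "best" means the highest share of fields filled on sample pages; the names `apply_rule`, `coverage`, and `repair_extractor` are hypothetical, and a real system would likely use an HTML parser instead of regular expressions.

```python
import re
from typing import Dict, List

Rule = Dict[str, str]  # field name -> regex with exactly one capture group


def apply_rule(rule: Rule, html: str) -> Dict[str, str]:
    """Extract every field whose pattern matches a non-empty value."""
    extracted = {}
    for field, pattern in rule.items():
        match = re.search(pattern, html, re.DOTALL)
        if match and match.group(1).strip():
            extracted[field] = match.group(1).strip()
    return extracted


def coverage(rule: Rule, samples: List[str]) -> float:
    """Share of (page, field) pairs that the rule manages to fill."""
    if not rule or not samples:
        return 0.0
    filled = sum(len(apply_rule(rule, html)) for html in samples)
    return filled / (len(rule) * len(samples))


def repair_extractor(current: Rule, alternatives: List[Rule], samples: List[str],
                     threshold: float = 1.0) -> Rule:
    """Keep `current` if it still works; otherwise pick the best-covering rule."""
    # Detect: the extractor is considered broken when coverage drops below threshold.
    if coverage(current, samples) >= threshold:
        return current
    # Choose: compare the broken rule and all alternatives on the same samples.
    return max([current] + alternatives, key=lambda rule: coverage(rule, samples))


if __name__ == "__main__":
    old = {"title": r"<h1>(.*?)</h1>", "author": r'<span class="by">(.*?)</span>'}
    alt = {"title": r'<h2 class="title">(.*?)</h2>', "author": r'<div class="author">(.*?)</div>'}
    pages = ['<h2 class="title">Hot topic</h2><div class="author">Ann</div>']
    print(apply_rule(repair_extractor(old, [alt], pages), pages[0]))
```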

  18. T/TD System with Slow Intelligence (cont’d) • 1. SIS to help restrict the range of data collection • [Figure labels: Knowledge of data, Knowledge of algorithm]

  19. T/TD System with Slow Intelligence (cont’d) 2. SIS to help select and adapt trend detection algorithms

  20. T/TD System with Slow Intelligence (cont’d) 3. SIS to help schedule the Crawler

  21. T/TD System with Slow Intelligence (cont’d) 4. SIS to help adapt Extractors

  22. Conclusion

  23. Conclusion • An online trend detection system requires careful resource allocation and automatic algorithm adaptation to process huge volumes of heterogeneous data. • This research adopts Slow Intelligence, which provides a framework for systems with insufficient computing resources to gradually adapt to their environments, to respond to these challenges. • Four Slow Intelligence subsystems are proposed, each targeting one challenge in designing online topic/trend detection systems.

  24. If you have any questions, please e-mail us: chiachun@iii.org.tw (Chia-Chun Shih), markpeng@iii.org.tw (Ting-Chun Peng)
