
Application of SIS based TDT

Application of SIS based TDT. Yang Hu, University of Pittsburgh, Department of Computer Science. Outline: Introduction to SIS; Topic Detection and Tracking (TDT): Concept, Goals, Major Tasks, Methods; TDT based Power Efficiency Web Server: Motivation, Implementation; Conclusion.


Presentation Transcript


  1. Application of SIS based TDT • Yang Hu, University of Pittsburgh, Department of Computer Science

  2. Outline • Introduction to SIS • Topic Detection and Tracking (TDT): Concept, Goals, Major Tasks, Methods • TDT based Power Efficiency Web Server: Motivation, Implementation • Conclusion

  3. Introduction to SIS • A Slow Intelligence System (SIS) provides a software development framework that lets a general-purpose system with insufficient computing resources gradually improve its performance over time.

  4. Introduction to SIS (cont’d) • A Slow Intelligence System contains five stages: Enumeration, Propagation, Adaptation, Elimination, and Concentration.
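
The stages named on this slide can be pictured as one pass over a pool of candidate solutions. The following is only a minimal sketch under stated assumptions: the function `sis_cycle`, its parameters, and the scoring scheme are hypothetical and do not come from the presentation, and the Propagation stage (exchanging information with the environment) is left out for brevity.

```python
def sis_cycle(candidates, score, adapt, keep=2):
    """One hypothetical pass: enumerate, adapt, eliminate, concentrate.

    candidates: iterable of candidate solutions
    score:      function mapping a candidate to a number (higher is better)
    adapt:      function producing a modified candidate
    keep:       how many candidates survive elimination
    """
    enumerated = list(candidates)                 # Enumeration: list the options
    adapted = [adapt(c) for c in enumerated]      # Adaptation: adjust each option
    ranked = sorted(enumerated + adapted, key=score, reverse=True)
    survivors = ranked[:keep]                     # Elimination: drop weak options
    best = survivors[0]                           # Concentration: focus on the best
    return best, survivors


# Toy usage: look for a number close to 7.
best, survivors = sis_cycle([1, 5, 9], score=lambda x: -abs(x - 7), adapt=lambda x: x + 1)
print(best, survivors)
```

Calling `sis_cycle` repeatedly on the survivors is one way to get the gradual improvement over time that the slides describe.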

  5. Concept • What is TDT? A DARPA-sponsored initiative to investigate the state of the art in finding trends in a stream of broadcast news stories.

  6. Goals • To develop automatic techniques for finding topically related material in streams of data. This could be valuable in a wide variety of applications where efficient and timely information access is important (e.g., CNN or Yahoo News). • To enable computers to map out data automatically: finding story boundaries, determining which stories go with one another, and discovering when something new (unforeseen) has happened.

  7. Major Tasks • Story Segmentation: detect changes between topically cohesive sections • Topic Tracking: keep track of stories similar to a set of example stories • Topic Detection: build clusters of stories that discuss the same topic • First Story Detection: detect whether a story is the first story of a new, unknown topic • Link Detection: detect whether or not two stories are topically linked
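
Two of these tasks, Link Detection and First Story Detection, can be illustrated with a simple bag-of-words comparison. This is a hedged sketch, not the method used in the presentation: the cosine-similarity measure, the threshold of 0.3, and the function names are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ta[w] * tb[w] for w in ta)
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def is_linked(a, b, threshold=0.3):
    """Link Detection: treat two stories as linked if they are similar enough."""
    return cosine(a, b) >= threshold


def first_stories(stream, threshold=0.3):
    """First Story Detection: flag a story as new if it links to no earlier story."""
    seen, flags = [], []
    for story in stream:
        flags.append(not any(is_linked(story, old, threshold) for old in seen))
        seen.append(story)
    return flags


stream = [
    "earthquake hits coastal city",
    "coastal city earthquake damage reported",
    "election results announced today",
]
print(first_stories(stream))
```

In this toy stream the second story shares three words with the first and is treated as the same topic, while the third shares none and is flagged as a first story.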

  8. Methods • General Linear Abstraction of Seasonality (GLAS) • Henderson Filter (HF) • Lowess (LW) • Smoothing Splines (SS) • Kalman Filter (KF)

  9. GLAS • A package currently used at the Bank of England for seasonal adjustment and trend estimation. • The trend series is constructed using a moving average of the data with a triangular weighting pattern.
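
A moving average with triangular weights can be sketched as follows. This only illustrates the weighting idea; the window width, the function names, and the handling of the series ends are assumptions, and the actual weights used by GLAS are not given in the slides.

```python
def triangular_weights(half_width):
    """Weights that rise linearly to the centre and fall again, summing to 1."""
    raw = [half_width + 1 - abs(k) for k in range(-half_width, half_width + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def triangular_trend(series, half_width=2):
    """Centred weighted moving average; the ends of the series are dropped."""
    weights = triangular_weights(half_width)
    trend = []
    for i in range(half_width, len(series) - half_width):
        window = series[i - half_width : i + half_width + 1]
        trend.append(sum(w * x for w, x in zip(weights, window)))
    return trend


print(triangular_weights(1))               # [0.25, 0.5, 0.25]
print(triangular_trend([0, 4, 0, 4, 0], 1))
```

A wider window (larger `half_width`) averages over more observations and gives a smoother trend, at the cost of losing more points at each end.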

  10. HF • Used in the X11-ARIMA and X-12-ARIMA packages, which are also currently used at the Bank of England. • The rationale is the same as for GLAS, but with a different weighting pattern.
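
For comparison with the triangular pattern, the symmetric Henderson weights can be computed from their standard closed-form expression. The formula below is the textbook one and is not taken from the slides; the function name is hypothetical.

```python
def henderson_weights(terms):
    """Symmetric Henderson weights for an odd filter length (e.g. 5, 9, 13)."""
    if terms < 3 or terms % 2 == 0:
        raise ValueError("terms must be odd and >= 3")
    m = (terms - 1) // 2
    n = m + 2
    denom = 8 * n * (n**2 - 1) * (4 * n**2 - 1) * (4 * n**2 - 9) * (4 * n**2 - 25)
    weights = []
    for j in range(-m, m + 1):
        num = (315
               * ((n - 1) ** 2 - j**2)
               * (n**2 - j**2)
               * ((n + 1) ** 2 - j**2)
               * (3 * n**2 - 16 - 11 * j**2))
        weights.append(num / denom)
    return weights


print([round(w, 5) for w in henderson_weights(5)])
```

Unlike the triangular pattern, the outer Henderson weights are negative, which lets the filter follow curved trends more closely.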

  11. LW • Lowess identifies a fixed number of nearest neighbors to a given point, x0, and assigns each neighbor a weight based on its distance from that point. The value of the trend at x0 is then calculated from these weights. • The number of nearest neighbors used is the smoothing parameter. • The larger the number, the smoother the trend.
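
The procedure described above can be sketched for a single point x0. This is a simplified illustration, assuming tricube distance weights and a local straight-line fit (common choices for Lowess, but not stated in the slides); the function name and parameters are hypothetical, and the robustness iterations of full Lowess are omitted.

```python
def lowess_point(xs, ys, x0, k):
    """Estimate the trend at x0 from its k nearest neighbours."""
    # Pick the k points closest to x0.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    # Bandwidth: distance to the farthest of those neighbours.
    h = max(abs(xs[i] - x0) for i in nearest) or 1.0
    # Tricube weights: close neighbours count more, the farthest counts zero.
    w = [(1 - (abs(xs[i] - x0) / h) ** 3) ** 3 for i in nearest]
    sw = sum(w)
    if sw == 0:  # degenerate case: fall back to a plain average
        return sum(ys[i] for i in nearest) / len(nearest)
    # Weighted least-squares line through the neighbours, evaluated at x0.
    mx = sum(wi * xs[i] for wi, i in zip(w, nearest)) / sw
    my = sum(wi * ys[i] for wi, i in zip(w, nearest)) / sw
    sxx = sum(wi * (xs[i] - mx) ** 2 for wi, i in zip(w, nearest))
    sxy = sum(wi * (xs[i] - mx) * (ys[i] - my) for wi, i in zip(w, nearest))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


xs = list(range(10))
ys = [2 * x + 1 for x in xs]
print(lowess_point(xs, ys, x0=4, k=5))
```

Evaluating `lowess_point` at every x in the series gives the full trend; increasing `k` widens each neighbourhood and smooths the result.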

  12. SS • The smoothing spline is derived as the explicit solution to a functional minimization problem. • The smoothing parameter sets the trade-off between the smoothness of the curve (the second-derivative term in the integral) and the fidelity to the data (the residual sum of squares).
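
The minimization problem referred to here is usually written as a penalized least-squares criterion. The form below is the standard textbook version and is an assumption about what the slide showed; the symbols f, x_i, y_i, n, and λ are notation introduced here, not taken from the transcript.

```latex
\hat{f} = \arg\min_{f} \; \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2
          \;+\; \lambda \int \bigl( f''(x) \bigr)^2 \, dx
```

In this form the first term is the residual sum of squares and the integral penalizes curvature, so a larger λ gives a smoother curve and a smaller λ follows the data more closely.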

  13. KF • This approach employs the idea of structural time series modeling, where the unobserved trend component is assumed to follow a well-defined stochastic process. • General form for the trend component is given below.
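
As an illustration of the idea, the simplest structural model, a local level (random-walk) trend observed with noise, can be filtered in a few lines. This is a sketch under stated assumptions and not the model from the slides: the model choice, the function name, and the variance values `q` and `r` are all hypothetical.

```python
def local_level_filter(ys, q=0.1, r=1.0, mu0=0.0, p0=1e6):
    """Kalman filter for an assumed local level model.

    Model:  y_t  = mu_t + observation noise (variance r)
            mu_t = mu_{t-1} + trend noise   (variance q)
    Returns the filtered trend estimate at each time step.
    """
    mu, p = mu0, p0              # current trend estimate and its variance
    trend = []
    for y in ys:
        p = p + q                # predict: uncertainty grows by the trend noise
        k = p / (p + r)          # Kalman gain: how much to trust the new value
        mu = mu + k * (y - mu)   # update the estimate toward the observation
        p = (1 - k) * p          # uncertainty shrinks after the update
        trend.append(mu)
    return trend


print(local_level_filter([5.0, 5.2, 4.9, 5.1, 5.0])[-1])
```

The ratio of `q` to `r` plays the role of a smoothing parameter: a small ratio gives a slowly changing trend, while a large ratio lets the trend track the observations closely.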

  14. TDT based Power Efficiency Web Server

  15. Motivation • Server power consumption is rapidly becoming a hot topic in the IT industry. • Over the last decade, power has emerged as a critical design constraint in modern computer architecture. In many cases, system power consumption is increasing exponentially.

  16. Implementation • SIS Coordinator

  17. Implementation (cont’d) • Diagram: SIS based TDT, with components 1st KB, Enumerator, Eliminator, Concentrator, and 2nd KB.

  18. Implementation (cont’d)

  19. Conclusion • For most data centers, the cost of power has become a top budget item. In fact, in 2008, the average cost of power used by a server exceeded its purchase price (4). • Nationally, the EPA estimated data center power consumption to cost over $4.5 billion a year in 2006, projected to grow to $7.4 billion in 2011 (5). • One main reason is typically a lack of communication between the people who pay the power bill and the IT department that operates the servers.

  20. References • (1) Shih and Peng, “Building Topic/Trend Detection System based on Slow Intelligence.” • (2) Allan, J., Carbonell, J., Doddington, G., Yamron, J., and Yang, Y., “Topic detection and tracking pilot study: Final report.” • (3) Bianchi, M., Boyle, M., and Hollingsworth, D., “A comparison of methods for trend estimation.” • (4) Belady, Christian. 2007. “In the Data Center, Power and Cooling Costs More Than the IT Equipment it Supports.” Electronics Cooling, Vol. 23, No. 1, February 2007. • (5) U.S. Environmental Protection Agency. 2007. “EPA Report to Congress on Server and Data Center Energy Efficiency.”

  21. Q & A

  22. The End
