
Improving Prediction Accuracy of Matrix Factorization Based Network Coordinate Systems

Yang Chen1, Peng Sun2, Xiaoming Fu1, Tianyin Xu1,3 1Institute of Computer Science, University of Goettingen, Germany 2Department of Electronic Engineering, Tsinghua University, China


Presentation Transcript


  1. Improving Prediction Accuracy of Matrix Factorization Based Network Coordinate Systems Yang Chen1, Peng Sun2, Xiaoming Fu1, Tianyin Xu1,3 1Institute of Computer Science, University of Goettingen, Germany 2Department of Electronic Engineering, Tsinghua University, China 3State Key Lab. for Novel Software & Technology, Nanjing University, China yang.chen@cs.uni-goettingen.de

  2. Outline • Introduction and Related Work • Study on Prediction Accuracy for Short Links • System Design of Pancake • Performance Evaluation • Conclusion

  3. Background • Distance (RTT) estimation can be used to optimize large-scale distributed systems (P2P systems): • Server selection • Application-level multicast/anycast • Overlay routing • BitTorrent (P2P file sharing) • Problems with direct measurement: • Poor scalability: measurements are slow and incur high overhead

  4. Background (cont.) • Network Coordinate (NC) systems: lightweight and scalable Internet distance prediction • Only O(N) measurements are required to predict the distances among N hosts • Like a uniform, lightweight, highly scalable, real-time Internet map with an open API • B. Donnet, B. Gueye, M. A. Kaafar. A Survey on Network Coordinates Systems, Design, and Security. IEEE Communications Surveys and Tutorials, accepted for publication.
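To make the scaling argument concrete, the snippet below counts probes for full-mesh measurement versus an NC scheme in which each host probes only a fixed set of reference hosts. It is a back-of-the-envelope sketch: it counts directed host pairs, and the value of 32 reference hosts is borrowed from the evaluation setting on slide 21 rather than from any particular deployment.

```python
def full_mesh_measurements(n: int) -> int:
    """Probes needed if every host measures every other host (directed pairs)."""
    return n * (n - 1)


def nc_measurements(n: int, reference_hosts: int) -> int:
    """Probes needed if each host only measures a fixed set of reference hosts: O(N)."""
    return n * reference_hosts


if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        print(f"N={n:>9,}: full mesh {full_mesh_measurements(n):>16,}  NC {nc_measurements(n, 32):>12,}")
```

The full mesh grows quadratically with N, while the NC cost grows linearly, which is the gap the NC approach exploits.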

  5. Overview: NC System | NC Security | Accuracy & Triangle Inequality Violation (TIV) | Other Network Parameters | Deployment | Applications

  6. Euclidean distance model: the i-th row of the coordinate matrix denotes the NC of host i
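In the Euclidean model each host's NC is a point in a d-dimensional space (one row of an N×d coordinate matrix), and the predicted distance between two hosts is the Euclidean distance between their points. A minimal sketch, with toy coordinates chosen only for illustration:

```python
import math


def euclidean_prediction(coords: list[list[float]]) -> list[list[float]]:
    """Return ME, where ME[i][j] = ||x_i - x_j|| and row i of coords is host i's NC."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    me = euclidean_prediction([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
    print(me[0][1], me[0][2])  # 5.0 10.0
```

Predictions from this model are always symmetric and always satisfy the triangle inequality, which is why it cannot reproduce triangle inequality violations.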

  7. Triangle Inequality Violation (TIV) • Measured: M(A,C) + M(C,B) < M(A,B) • Euclidean prediction: ME(A,C) + ME(C,B) ≥ ME(A,B) • Any three hosts with a TIV cannot be embedded into a Euclidean space without error, because distances in a Euclidean space must obey the triangle inequality.
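TIVs can be located in a measured matrix by checking every ordered triple of hosts. The brute-force sketch below is O(N³) and meant only for small matrices; it reports each (a, b, c) for which the detour through c is shorter than the direct path. The tolerance `tol` is an illustrative parameter (not from the slides) for ignoring tiny violations caused by measurement noise.

```python
from itertools import permutations


def find_tivs(m: list[list[float]], tol: float = 0.0) -> list[tuple[int, int, int]]:
    """Return all (a, b, c) with m[a][c] + m[c][b] < m[a][b] - tol."""
    n = len(m)
    return [
        (a, b, c)
        for a, b, c in permutations(range(n), 3)
        if m[a][c] + m[c][b] < m[a][b] - tol
    ]


if __name__ == "__main__":
    # Hosts 0 = A, 1 = B, 2 = C: A->B is 100 ms, but A->C->B is only 30 + 40 = 70 ms.
    m = [[0, 100, 30], [100, 0, 40], [30, 40, 0]]
    print(find_tivs(m))  # [(0, 1, 2), (1, 0, 2)]
```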

  8. Matrix Factorization (MF) based NC systems • The Internet distance matrix has a low-rank nature [Tang et al., ACM IMC'03] • MF-based predictions are not forced to obey the triangle inequality, so a TIV such as ME(A,C) + ME(C,B) < ME(A,B) can be represented • M(i,j) represents the measured distance from host i to host j • ME(i,j) represents the predicted distance from host i to host j
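In MF-based NC systems such as IDES, host i holds an outgoing vector X_i and an incoming vector Y_i, and the predicted distance is their inner product, ME(i,j) = X_i · Y_j, so the whole prediction matrix is ME = X·Yᵀ. Because inner products are not metric distances, such predictions can be asymmetric and can violate the triangle inequality. The toy vectors below are invented purely to illustrate this; they are not taken from any data set.

```python
def mf_prediction(x: list[list[float]], y: list[list[float]]) -> list[list[float]]:
    """ME[i][j] = X_i . Y_j, i.e. ME = X @ Y^T (diagonal entries carry no meaning)."""
    return [[sum(a * b for a, b in zip(xi, yj)) for yj in y] for xi in x]


if __name__ == "__main__":
    # Hosts 0 = A, 1 = B, 2 = C with 2-dimensional vectors chosen by hand.
    x = [[10, 0], [1, 1], [4, 0]]  # outgoing vectors
    y = [[1, 1], [10, 0], [3, 0]]  # incoming vectors
    me = mf_prediction(x, y)
    print(me[0][1], me[0][2] + me[2][1])  # 100 70  -> TIV: A->C->B shorter than A->B
    print(me[0][1], me[1][0])             # 100 2   -> asymmetric prediction
```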

  9. Optimization Goal & Existing Systems • Overall Optimization Goal • Existing Systems • IDES (IMC’04, JSAC’06) • Phoenix (Networking’09) • DMF (Networking’10)
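The slide's formula for the overall optimization goal did not survive extraction. A common form of the goal for MF-based NC is to choose X and Y that minimize the squared error over measured pairs, Σ (M(i,j) - X_i·Y_j)², optionally plus an L2 penalty on the vectors. IDES, Phoenix and DMF each solve their own variant of this problem with their own update rules; the plain stochastic-gradient sketch below is none of them, and its learning rate, epoch count and regularization weight are illustrative assumptions.

```python
import random


def predict(x, y, i, j):
    """MF prediction ME(i, j) = X_i . Y_j."""
    return sum(a * b for a, b in zip(x[i], y[j]))


def squared_loss(m, observed, x, y):
    """Sum of (M(i,j) - X_i . Y_j)^2 over the measured pairs."""
    return sum((m[i][j] - predict(x, y, i, j)) ** 2 for i, j in observed)


def sgd_factorize(m, observed, dim=2, lr=0.01, reg=0.0, epochs=2000, seed=0):
    """Fit outgoing vectors X and incoming vectors Y by SGD on the per-pair loss
    (M(i,j) - X_i . Y_j)^2 + reg * (|X_i|^2 + |Y_j|^2); the factor 2 of the
    gradient is folded into lr."""
    rng = random.Random(seed)
    n = len(m)
    x = [[rng.random() for _ in range(dim)] for _ in range(n)]
    y = [[rng.random() for _ in range(dim)] for _ in range(n)]
    pairs = list(observed)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for i, j in pairs:
            err = m[i][j] - predict(x, y, i, j)
            for k in range(dim):
                xik, yjk = x[i][k], y[j][k]
                x[i][k] += lr * (err * yjk - reg * xik)
                y[j][k] += lr * (err * xik - reg * yjk)
    return x, y


if __name__ == "__main__":
    # Toy matrix: off-diagonal entries follow u_i * u_j with u = (1, 2, 3);
    # the diagonal is not measured, so only off-diagonal pairs are observed.
    m = [[0, 2, 3], [2, 0, 6], [3, 6, 0]]
    observed = [(i, j) for i in range(3) for j in range(3) if i != j]
    x, y = sgd_factorize(m, observed)
    print(round(squared_loss(m, observed, x, y), 6))
```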

  10. Metric • Relative Error (RE) • The RE of the distance between host i and host j is defined as RE(i,j) = |M(i,j) - ME(i,j)| / M(i,j), where a smaller RE indicates higher prediction accuracy. When the predicted distance equals the measured distance, RE is 0.
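The metric can be computed per link as below. This sketch assumes the error is normalized by the measured distance M(i,j); some NC papers divide by min(M(i,j), ME(i,j)) instead, so check which convention a given result uses.

```python
def relative_error(measured: float, predicted: float) -> float:
    """RE(i, j) = |M(i,j) - ME(i,j)| / M(i,j); 0 means a perfect prediction."""
    if measured <= 0:
        raise ValueError("measured distance must be positive")
    return abs(measured - predicted) / measured


if __name__ == "__main__":
    print(relative_error(100, 100))  # 0.0
    print(relative_error(40, 30))    # 0.25
```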

  11. Main Focus • Prediction accuracy is vital for NC systems • Breakthrough point • Relationship between link distance and relative error • MF-based NC: poor prediction accuracy for short links, i.e., links with distances of at most 50 ms • We observed a similar phenomenon while improving all three MF-based NC systems using our approach • Only the results with Phoenix are shown in this paper

  12. Relative Error of Short Links • Target: reduce the prediction error of short links without increasing the prediction error of other links • * Vivaldi (ACM SIGCOMM'04) is the most widely used NC system so far and is based on the Euclidean distance model • Figure: 90th Percentile RE (NPRE)
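The per-class comparison behind this slide can be sketched as follows: compute RE for every link, split links at the 50 ms threshold from slide 11, and take the 90th percentile (NPRE) of each group. The nearest-rank percentile definition and the (measured, predicted) input format are assumptions made for this illustration.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def npre_by_link_class(pairs, short_threshold_ms: float = 50.0) -> dict:
    """pairs: iterable of (measured_ms, predicted_ms).
    Returns the 90th percentile RE for short links (<= threshold) and for the rest."""
    short, other = [], []
    for measured, predicted in pairs:
        re = abs(measured - predicted) / measured
        (short if measured <= short_threshold_ms else other).append(re)
    return {
        "short": percentile(short, 90) if short else None,
        "other": percentile(other, 90) if other else None,
    }


if __name__ == "__main__":
    links = [(10, 20), (20, 22), (40, 40), (100, 110), (200, 180)]
    print(npre_by_link_class(links))
```

With the toy links above, the short links dominate the error, mirroring the pattern the slide describes for MF-based NC.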

  13. Local Phoenix vs Global Phoenix • Suppose an application is only interested in a subset of hosts, e.g., hosts in Germany • Which approach is more accurate?

  14. Decentralized Grouping • Group the N hosts into u clusters in a decentralized way • Step 1: u hosts are selected randomly from all N hosts as anchors, which guide the decentralized clustering. • Step 2: Each ordinary host measures its distance to every anchor and joins the cluster represented by the nearest anchor.
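The two grouping steps can be sketched as below. The `rtt(a, b)` callback is hypothetical and stands in for a real RTT measurement; treating each anchor as a member of its own cluster and breaking ties by anchor order are assumptions of this sketch, not details given on the slide.

```python
import random


def choose_anchors(hosts, u: int, seed=None) -> list:
    """Step 1: pick u distinct hosts uniformly at random as anchors."""
    return random.Random(seed).sample(list(hosts), u)


def assign_clusters(hosts, anchors, rtt) -> dict:
    """Step 2: every ordinary host joins the cluster of its nearest anchor.
    rtt(a, b) is a hypothetical callback returning the measured RTT in ms."""
    clusters = {a: [a] for a in anchors}  # each anchor represents its own cluster
    for h in hosts:
        if h in clusters:
            continue  # anchors are already placed
        nearest = min(anchors, key=lambda a: rtt(h, a))  # ties: first anchor wins
        clusters[nearest].append(h)
    return clusters


if __name__ == "__main__":
    # Toy 1-D "positions"; RTT is modelled as the distance between positions.
    pos = {"a": 0, "b": 1, "c": 10, "d": 11, "e": 2}
    rtt = lambda x, y: abs(pos[x] - pos[y])
    print(assign_clusters(list(pos), ["a", "c"], rtt))
```

Each ordinary host needs only u measurements, one per anchor, so the clustering itself needs no central coordinator.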

  15. Median Distances: Intra and Inter Clusters

  16. Local Phoenix vs Global Phoenix (cont.)

  17. System Design of Pancake • Prediction of intra-cluster links: local NC • Prediction of inter-cluster links: global NC
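The prediction rule on this slide amounts to a simple dispatch: if both hosts belong to the same cluster, use their local NCs, otherwise use their global NCs. The sketch assumes MF-style coordinates (outgoing and incoming vectors) at both levels; the `HostState` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    cluster: str              # ID of the cluster (nearest anchor) the host joined
    global_out: list[float]   # global NC, outgoing vector
    global_in: list[float]    # global NC, incoming vector
    local_out: list[float]    # local (intra-cluster) NC, outgoing vector
    local_in: list[float]     # local NC, incoming vector


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def predict(src: HostState, dst: HostState) -> float:
    """Intra-cluster links use the local NC; inter-cluster links use the global NC."""
    if src.cluster == dst.cluster:
        return dot(src.local_out, dst.local_in)
    return dot(src.global_out, dst.global_in)


if __name__ == "__main__":
    a = HostState("c1", [1, 0], [1, 0], [2, 0], [2, 0])
    b = HostState("c1", [3, 0], [3, 0], [5, 0], [5, 0])
    c = HostState("c2", [4, 0], [4, 0], [7, 0], [7, 0])
    print(predict(a, b), predict(a, c))  # 10 4
```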

  18. Algorithm • Every odd round • Every even round
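The details of the per-round steps were lost from this slide. What the deck does state (slide 22) is that in each update round a Pancake host updates either its global NC or its local NC, never both. The sketch below shows that alternation; which NC is refreshed in odd rounds and which in even rounds is an assumption here (odd: global, even: local), and the update callbacks are hypothetical placeholders.

```python
def nc_to_update(round_no: int) -> str:
    """NC refreshed in a 1-based update round (assumed: odd -> global, even -> local)."""
    if round_no < 1:
        raise ValueError("rounds are numbered from 1")
    return "global" if round_no % 2 == 1 else "local"


def run_rounds(rounds: int, update_global, update_local) -> None:
    """Call exactly one update per round, so each round costs a single NC update."""
    for r in range(1, rounds + 1):
        if nc_to_update(r) == "global":
            update_global(r)
        else:
            update_local(r)


if __name__ == "__main__":
    log = []
    run_rounds(4, lambda r: log.append(("global", r)), lambda r: log.append(("local", r)))
    print(log)
```

Performing one update per round keeps the per-round cost equal to a single-level system such as Phoenix.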

  19. Extra Measurement Overhead • For each ordinary host • It measures its RTTs to every anchor once per hour • Compared with the measurement overhead of the NC calculation, this is negligible • For the anchors • Anchors only need to reply to ICMP PINGs passively, which places a very light load on them • With one million ordinary hosts in the system, the load on each anchor is approximately 2700 PINGs per second
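The anchor load can be estimated with simple arithmetic: since every ordinary host measures every anchor once per interval, each anchor receives one RTT measurement per ordinary host per interval, independent of the number of anchors. The slide does not state how many PING probes make up one RTT measurement, so `probes_per_measurement` is a hypothetical parameter here.

```python
def anchor_ping_rate(
    ordinary_hosts: int,
    probes_per_measurement: int = 1,
    interval_s: float = 3600.0,
) -> float:
    """PINGs per second arriving at one anchor when every ordinary host
    measures its RTT to every anchor once per interval."""
    return ordinary_hosts * probes_per_measurement / interval_s


if __name__ == "__main__":
    for probes in (1, 5, 10):
        print(probes, round(anchor_ping_rate(1_000_000, probes), 1))
```

The resulting per-anchor rate scales linearly with the number of probes used per measurement, so the exact figure depends on that choice.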

  20. Performance Evaluation • Relative Error of Distance Prediction • Convergence Behavior of Pancake • Evaluation through Dynamic Data Set

  21. Relative Error • 90th Percentile RE • Dimension = 8, reference hosts = 32, number of anchors = 5

  22. Convergence Behavior of Pancake • Compared with Phoenix, Pancake converges faster and its stabilized prediction error is smaller • Fair Comparison • Phoenix: in each update round, a host's NC is updated once • Pancake: in each update round, a host updates either its global NC or its local NC, not both

  23. Evaluation through Dynamic Data Set • Aggregate data set (min or median) • In the dynamic data set, the elements of the data matrix are taken at different times during the simulation, reflecting the real-time RTT values in the matrices
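The difference between the two kinds of data set can be sketched as follows, assuming each host pair has a time series of RTT samples. An aggregate matrix collapses each series to its minimum or median; a dynamic view instead reads each element at the time step the simulation is at. The input layout and the equal-length-series assumption are illustrative choices, not the format of any real data set.

```python
import statistics


def aggregate_matrix(samples, how: str = "median"):
    """samples[i][j] is a list of RTTs from host i to host j over time.
    Returns a static matrix of per-pair minima or medians (None if no samples)."""
    reducers = {"min": min, "median": statistics.median}
    reducer = reducers[how]
    return [[reducer(cell) if cell else None for cell in row] for row in samples]


def snapshot_matrix(samples, t: int):
    """Dynamic view: the RTT of each pair at time step t (assumes equal-length series)."""
    return [[cell[t] if cell else None for cell in row] for row in samples]


if __name__ == "__main__":
    samples = [[[], [30, 10, 20]], [[12, 11, 50], []]]
    print(aggregate_matrix(samples, "min"))     # [[None, 10], [11, None]]
    print(aggregate_matrix(samples, "median"))  # [[None, 20], [12, None]]
    print(snapshot_matrix(samples, 2))          # [[None, 20], [50, None]]
```

The snapshot at step 2 exposes the 50 ms spike on the 1->0 link, which both aggregates hide.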

  24. Evaluation through Dynamic Data Set (cont.) • Average 90th Percentile RE

  25. Main Contributions • Intra-cluster MF-based NC • Forming clusters in a decentralized way based on locality • Employing MF-based NC algorithms such as Phoenix in local clusters achieves better prediction accuracy for intra-cluster links than relying on global NC algorithms alone. • Pancake System • A two-level NC system that can significantly improve the prediction accuracy of existing NC systems • Compatible with existing deployments • Negligible extra communication overhead for end users • Extensive Evaluation • Evaluation based on widely used real Internet data sets • Evaluation based on a dynamic data set that reflects RTT variations over time for all end-to-end links (the first work to consider RTT variations)

  26. Future Work • Theoretical study • Understand why the local NC can achieve better prediction accuracy • Better decentralized clustering algorithm • Form the clusters effectively and understand the relationship between cluster formation and prediction accuracy • Applications • Potential applications: download mirror selection, matchmaking in online gaming, server placement, etc.

  27. Backup Slides

  28. Triangle Inequality Violation (TIV) • D(A,C) + D(C,B) < D(A,B) • DE(A,C) + DE(C,B) ≥ DE(A,B) • Any three hosts with a TIV cannot be embedded into a Euclidean space without error, because distances in a Euclidean space must obey the triangle inequality.
