
Software Connector Classification and Selection for Data-Intensive Systems

Chris A. Mattmann, David Woollard, Nenad Medvidovic, Reza Mahjourian. 2nd Intl. Workshop on Incorporating COTS Software into Software Systems (IWICSS 2007).


Presentation Transcript


  1. Software Connector Classification and Selection for Data-Intensive Systems Chris A. Mattmann, David Woollard, Nenad Medvidovic, Reza Mahjourian 2nd Intl. Workshop on Incorporating COTS Software into Software Systems (IWICSS 2007)

  2. Agenda • Research Problem and Importance • Our Approach • Classification • Selection • Analysis • Evaluation • Precision, Recall, Accuracy Measurements • Related Work • Conclusion & Future Work

  3. Research Problem and Importance • Content repositories are growing rapidly in size • At the same time, we expect more immediate dissemination of this data • How do we distribute it… • In a performant manner? • Fulfilling system requirements?

  4. Software Architecture • The definition of a system in the form of its canonical building blocks • Software Components: the computational units in the system • Software Connectors: the communications and interactions between software components • Software Configurations: arrangements of components and connectors and the rules that guide their composition

  5. Data Distribution Systems • [Diagram: a Data Producer component sends data through a connector, marked "???", to four Data Consumer components] • Insight: Use Software Connectors to model data distribution technologies

  6. Data Movement Technologies • Wide array of available OTS “large-scale” connector technologies • GridFTP, Aspera software, HTTP/REST, RMI, CORBA, SOAP, XML-RPC, Bittorrent, JXTA, UFTP, FTP, SFTP, SCP, Siena, GLIDE/PRISM-MW, and more • Which one is the best one? • How do we compare them? • Given our current architecture? • Given our distribution scenarios & requirements?

  7. Research Question • What types of software connectors are best suited for delivering vast amounts of data to users in these hugely distributed data systems, satisfying their particular scenarios in a manner that is performant and scalable?

  8. Data Distribution Problem Space

  9. Broad variety of distribution connector families • P2P, Grid, Client/Server, and Event-based • Though each connector family varies slightly in some form or fashion • They all share 3 common atomic connector constituents • Data Access, Stream, Distributor • Adapted from Mehta et al.’s Connector Taxonomy

  10. Connector Tradeoff Space • Surveyed and classified the properties of 13 representative distribution connectors across all 4 distribution connector families • Client/Server • SOAP, RMI, CORBA, HTTP/REST, FTP, UFTP, SCP, Commercial UDP Technology • Peer to Peer • Bittorrent • Grid • GridFTP, bbFTP • Event-based • GLIDE, Siena

  11. Large Heterogeneity in Connector Properties

  12. How do experts make these decisions? • Performed survey of 33 “experts” • Experts defined to be • Practitioners in industry, building data-intensive systems • Researchers in data distribution • Admitted architects of data distribution technologies • General consensus? • They don’t know the how and the why about which connector(s) are appropriate • They rely on anecdotal evidence and “intuition” • 45% of respondents claimed to be uncomfortable being addressed as a data distribution expert.

  13. Our Approach: DISCO • Develop a software framework for: • Connector Classification • Build metadata profiles of connector technologies, describing their intrinsic properties (DCPs) • Connector Selection • Adaptable, extensible algorithm development framework for selecting the “right” connectors (and identifying wrong ones) • Connector Selection Analysis • Measurement of accuracy of results • Connector Performance Analysis

  14. DISCO in a Nutshell

  15. Building DCPs of all 13 connectors (Classification) • Rely on Mehta et al. metadata to describe data distribution connectors • Carefully select metadata to include/exclude
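A DCP can be pictured as a small metadata record per connector. The following is a minimal sketch, not DISCO's actual schema: the class name `ConnectorProfile`, its fields, and the helper functions are hypothetical. Only the connector names, the four families, and the three atomic constituents (Data Access, Stream, Distributor) come from the slides; the per-connector property values are left empty because the transcript does not list them.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConnectorProfile:
    """Hypothetical DCP record: one per surveyed connector."""
    name: str
    family: str
    # One metadata dictionary per atomic connector constituent.
    # Left empty here: the slides do not give the actual property values.
    data_access: Dict[str, str] = field(default_factory=dict)
    stream: Dict[str, str] = field(default_factory=dict)
    distributor: Dict[str, str] = field(default_factory=dict)


# Family membership as listed on the "Connector Tradeoff Space" slide.
FAMILIES: Dict[str, List[str]] = {
    "Client/Server": ["SOAP", "RMI", "CORBA", "HTTP/REST", "FTP", "UFTP",
                      "SCP", "Commercial UDP Technology"],
    "Peer to Peer": ["Bittorrent"],
    "Grid": ["GridFTP", "bbFTP"],
    "Event-based": ["GLIDE", "Siena"],
}


def build_profiles() -> List[ConnectorProfile]:
    """Create one (still unpopulated) profile per surveyed connector."""
    return [ConnectorProfile(name=name, family=family)
            for family, names in FAMILIES.items()
            for name in names]


def group_by_family(profiles: List[ConnectorProfile]) -> Dict[str, List[str]]:
    """Index connector names by family, preserving input order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for profile in profiles:
        grouped[profile.family].append(profile.name)
    return dict(grouped)
```

Calling `build_profiles()` yields 13 records, matching the number of connectors surveyed; filling in the three constituent dictionaries is where the classification work described on this slide would go.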

  16. Develop complementary selection algorithms

  17. Preliminary Evaluation • We developed 13 connector profiles • Based on literature, expert reviews, and our own development experience • 30 distribution scenarios • 24 score functions (white box) and Bayesian domain profiles with 100 conditional probabilities (black box) • [Diagram: Connector Profiles and Distribution Scenarios feed DISCO's Score and Bayesian algorithms; each result is clustered and compared against an Answer Key in a Precision-Recall Analysis]
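The white-box idea of combining score functions can be sketched as follows. This is an illustration under stated assumptions, not the 24 score functions used in the evaluation: the property names (`parallel_streams`, `encrypted`), the scenario keys (`total_volume_gb`, `needs_encryption`), the thresholds, and the placeholder connectors `connector_a` to `connector_c` are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Profile = Dict[str, Any]
Scenario = Dict[str, Any]
ScoreFunction = Callable[[Profile, Scenario], float]


def volume_score(profile: Profile, scenario: Scenario) -> float:
    """Hypothetical rule: reward parallel streams when the volume is large."""
    if scenario["total_volume_gb"] >= 100 and profile["parallel_streams"]:
        return 1.0
    return 0.0


def security_score(profile: Profile, scenario: Scenario) -> float:
    """Hypothetical rule: penalize unencrypted transfer when encryption is required."""
    if scenario["needs_encryption"] and not profile["encrypted"]:
        return -1.0
    return 0.0


def rank_connectors(profiles: Dict[str, Profile],
                    scenario: Scenario,
                    score_functions: List[ScoreFunction]) -> List[Tuple[str, float]]:
    """Sum every score function per connector, then sort best first (ties by name)."""
    totals = [(name, sum(fn(profile, scenario) for fn in score_functions))
              for name, profile in profiles.items()]
    return sorted(totals, key=lambda pair: (-pair[1], pair[0]))


# Placeholder profiles and scenario; the values are invented for illustration.
EXAMPLE_PROFILES: Dict[str, Profile] = {
    "connector_a": {"parallel_streams": True, "encrypted": True},
    "connector_b": {"parallel_streams": False, "encrypted": True},
    "connector_c": {"parallel_streams": True, "encrypted": False},
}
EXAMPLE_SCENARIO: Scenario = {"total_volume_gb": 500, "needs_encryption": True}

EXAMPLE_RANKING = rank_connectors(
    EXAMPLE_PROFILES, EXAMPLE_SCENARIO, [volume_score, security_score])
```

Each rule is readable on its own, which is what makes the approach "white box": a low rank can be traced back to the specific functions that penalized the connector.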

  18. Precision-Recall Results • Error Rate • Probability of incorrectly labeling a connector as appropriate for a scenario • Precision • The fraction of selected connectors appropriate for a scenario • Recall • Probability of detecting a connector as appropriate for a scenario
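The three measures can be computed for one scenario by comparing the connectors an algorithm selected against the answer key. The sketch below is an assumption-laden illustration: the function name and the example selection are hypothetical, and the error rate is read as the share of all candidate connectors that were wrongly labeled appropriate (false positives over the whole candidate set), which is only one possible reading of the slide's definition.

```python
from typing import Iterable, Tuple


def precision_recall_error(selected: Iterable[str],
                           appropriate: Iterable[str],
                           candidates: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, error_rate) for a single scenario.

    selected    -- connectors the algorithm labeled appropriate
    appropriate -- connectors the answer key marks appropriate
    candidates  -- every connector that was considered
    """
    selected_set = set(selected)
    appropriate_set = set(appropriate)
    candidate_set = set(candidates)

    true_positives = len(selected_set & appropriate_set)
    false_positives = len(selected_set - appropriate_set)

    # Precision: fraction of selected connectors that are appropriate.
    precision = true_positives / len(selected_set) if selected_set else 0.0
    # Recall: fraction of appropriate connectors that were detected.
    recall = true_positives / len(appropriate_set) if appropriate_set else 0.0
    # Error rate (assumed reading): wrongly selected connectors over all candidates.
    error_rate = false_positives / len(candidate_set) if candidate_set else 0.0
    return precision, recall, error_rate


# The 13 surveyed connectors; the selection and answer key below are invented.
ALL_CONNECTORS = ["SOAP", "RMI", "CORBA", "HTTP/REST", "FTP", "UFTP", "SCP",
                  "Commercial UDP Technology", "Bittorrent", "GridFTP",
                  "bbFTP", "GLIDE", "Siena"]

EXAMPLE_METRICS = precision_recall_error(
    selected=["GridFTP", "bbFTP", "FTP"],
    appropriate=["GridFTP", "bbFTP", "UFTP", "SCP"],
    candidates=ALL_CONNECTORS,
)
```

In this invented example two of the three selected connectors are in the answer key, two of the four appropriate connectors were found, and one of the 13 candidates was wrongly selected.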

  19. Related Work

  20. Conclusions & Future Work • Conclusions • Domain experts (gurus) rely on tacit knowledge and often cannot explain design rationale • DISCO provides a quantification of & framework for understanding an ad hoc process • Bayesian algorithm has a higher precision rate • Future Work • Explore the tradeoffs between white-box and black-box approaches • Investigate the role of architectural mismatch in connectors for data system architectures

  21. Thank You! Questions?
