The Top 10 Reasons Why Federated Can’t Succeed


Presentation Transcript


  1. The Top 10 Reasons Why Federated Can’t Succeed And Why it Will Anyway

  2. But First… • What is our purpose as a community? • Produce (wonderful) new ideas • Structure the field • Educate the workforce

  3. A Brief History of Federation • Multibase @1980 • Many attempts since • Functional • Relational • Object-oriented • Logic-based • XML • Still not solved (think of last night) • And never will be?

  4. Number 10: Robustness • Systems fail • Sources slow or unavailable • In a distributed system, more pieces • => more failures • Users don’t like failures
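
The "more pieces => more failures" point on the slide above can be made concrete with a small calculation. This is a minimal sketch, not something from the talk: it assumes every source fails independently and has the same availability, and the function name and the 0.99 figure are hypothetical values chosen for illustration.

```python
def all_sources_available(per_source_availability: float, n_sources: int) -> float:
    """Probability that every source a query touches is up at once.

    Assumes independent failures and identical availability per source,
    which is a simplification (real outages are often correlated).
    """
    return per_source_availability ** n_sources


# Illustrative only: 0.99 is an assumed per-source availability.
for n in (1, 5, 10, 20):
    print(n, round(all_sources_available(0.99, n), 4))
```

Under these assumptions, a query that needs ten sources at 99% availability each finds all of them up only about 90% of the time, which is why a federated query tends to fail more often than any single source does.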

  5. Number 9: Security • Different systems have different security mechanisms • Hard to create a single coherent view of permissions • Distributed systems are more vulnerable • More points of failure • Hard to make security guarantees • Data is often the corporate jewels • It must be protected

  6. Number 8: Updates • Recording change isn’t always an UPDATE • Application semantics must be accounted for • Application APIs must be reckoned with • ACIDity isn’t always achievable • Not all data sources display ACID properties • Varying degrees of support • Strong transaction semantics not always possible or appropriate • And always painful • Changes to multiple sources must be coordinated • Requirements for consistency vary
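
One common way to coordinate changes to multiple sources is a two-phase commit. The slide above does not prescribe a protocol, so the following is only a sketch under stated assumptions: `Source` is a hypothetical in-memory stand-in for a data source, and the `can_prepare` flag simulates a source that cannot offer transactional guarantees.

```python
class Source:
    """Hypothetical data source that can vote on and apply a pending change."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # False simulates a source that refuses
        self.state = "idle"
        self.pending = None
        self.committed = []

    def prepare(self, change):
        # Phase 1: the source promises it can apply the change, or refuses.
        if not self.can_prepare:
            self.state = "aborted"
            return False
        self.pending = change
        self.state = "prepared"
        return True

    def commit(self):
        # Phase 2 (success path): make the prepared change durable.
        self.committed.append(self.pending)
        self.pending = None
        self.state = "committed"

    def rollback(self):
        # Phase 2 (failure path): discard the prepared change.
        self.pending = None
        self.state = "aborted"


def two_phase_commit(sources, change):
    """Apply `change` to all sources or to none of them."""
    prepared = []
    for source in sources:
        if source.prepare(change):
            prepared.append(source)
        else:
            # One refusal aborts everything that was already prepared.
            for p in prepared:
                p.rollback()
            return False
    for source in sources:
        source.commit()
    return True
```

The sketch ignores coordinator crashes, timeouts, and sources that expose only an application API, which are exactly the cases the slide flags as painful.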

  7. Number 7: Configurability • Many architectures possible • Even with pre-existing sources, many choices • Little or no guidance on tradeoffs • Lots of code to install • Federation engine, data source clients • Often choices here • Lots of connections to define • Need tooling to support

  8. Number 6: Administration • Monitoring is hard • Not all sources have facilities to track events • Variety of mechanisms for different events, and different sources • Not always APIs • Tuning is difficult • Need to understand what must change • Need to take appropriate actions • Repairing is painful • Distributed debugging • Different vendors to deal with for fixes

  9. Number 5: Semantic heterogeneity • Hard to identify commonalities • Same terms, different meanings • Different terms, same meaning • Different structures representing different interpretations • Can’t integrate data effectively without them • Can’t make sensible queries

  10. Number 4: Insufficient Metadata • Need metadata to integrate, configure, administer and query • Every data source has different metadata • No uniform standard • Not always collected • Tools to examine and exploit it are missing

  11. Number 3: Performance (Data Movement) • Distributed queries involve moving data • Geographic distribution is common • WAN is slow • Large data volumes common • Large numbers of objects • Large objects • Caching isn’t a complete answer • Changes can be frequent and hard to track • Storage is not unlimited
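
A back-of-the-envelope estimate shows why data movement dominates on a slow WAN. This is a rough sketch, assuming transfer time is simply round-trip latency plus payload size divided by bandwidth; the function name and the sample figures are hypothetical, and protocol overhead, congestion, and compression are ignored.

```python
def transfer_seconds(n_bytes: int,
                     bandwidth_bits_per_s: float,
                     round_trip_s: float,
                     round_trips: int = 1) -> float:
    """Estimated time to ship a result set between sites.

    Simplified model: latency for each round trip plus the time to push
    the payload through the link at its nominal bandwidth.
    """
    latency = round_trips * round_trip_s
    payload = (n_bytes * 8) / bandwidth_bits_per_s
    return latency + payload


# Illustrative figures only: 1 GB over an assumed 100 Mbit/s link, 50 ms RTT.
print(transfer_seconds(10**9, 100e6, 0.05))
```

With these assumed numbers, shipping one gigabyte takes on the order of 80 seconds, so pushing filters and joins down to the source before moving data usually matters more than any local optimization.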

  12. Number 2: Performance (Complexity) • Decision-support applications do complex queries • Many choices for how to execute • Big differences in performance among choices • Need data from diverse sources • May not have enough power in source • Performance at sources may vary • Need expensive functions of data • Function may not be implemented everywhere • Flowing the data to the function is expensive

  13. Number 1: Performance (Pathlength) • Simple queries (OLTP-like) incur huge overheads • Processing and networking costs • Simple queries are common • Easier to write • Automatically produced • Workflows

  14. So Why Will Federated Succeed? • It has to • Integration one of the top IT issues • And it’s not going away • Alternatives are expensive and/or painful • Write it by hand • EAI/Workflow • Consolidation (warehouse, data marts…)

  15. So Why Will Federated Succeed? (2) • Simple scenarios exist • Don’t need OLTP, high security, great robustness, … for all applications • Customers know their data, or must learn anyway • Needs are so great, compromise is possible

  16. So Why Will Federated Succeed? (3) • Progress on technology being made • 20 years of distributed query processing • Plumbing in place • Commit protocols • Reliable messaging • Connectivity infrastructure • XML (basic community agreement) • XML data format • XML schema • Web services • We’re getting closer

  17. What would we do if it ever did work? • Retire • Integrate the web? • Data grids • Data Google • P2P database?

  18. For Discussion • Is research in this area warranted? • What are the most important research topics? • Did we miss any?
