
Efficient monitoring system for large-scale federated data storages Alexandre Beche


Presentation Transcript


  1. Efficient monitoring system for large-scale federated data storages Alexandre Beche <Alexandre.beche@cern.ch>

  2. Outline • Federated data storages • Efficient monitoring system • Server instrumentation • Monitoring dataflow • Data visualization • Beyond XRootD monitoring • HTTP/WebDAV federation monitoring • WLCG Transfers Dashboard • Data mining • Summary Alexandre Beche - ITTF

  3. Federated data storages

  4. Federated data storages • Aggregation of storage systems of any kind into a global namespace via a single access protocol

  5. Federated data storages • Aggregation of storage systems of any kind into a global namespace via a single access protocol • Federations provide read-only access to worldwide replicated data via virtual entry points (regional redirectors)

  6. Federated data storages • Aggregation of storage systems of any kind into a global namespace via a single access protocol • Federations provide read-only access to worldwide replicated data via virtual entry points (regional redirectors) • Technologies: XRootD (today) and HTTP/WebDAV (future)

  7. Federated data storages • Diagram: global redirector, federated namespace, redirector, data server, client

  8. Federated data storages • Diagram: global redirector, federated namespace, redirector, data server, client • A client wants to read a file

  9. Federated data storages • Diagram: global redirector, federated namespace, redirector, data server, client • Query the regional redirector to locate the file

  10. Federated data storages • Diagram: global redirector, federated namespace, redirector, data server, client
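
The lookup walked through in slides 7 to 10 can be sketched as a small tree search. This is a toy model under stated assumptions, not the XRootD redirection protocol: the `Redirector` class, its `attach` and `locate` methods, and the server and file names are all hypothetical. A regional redirector first looks at its own data servers and, if the file is not there, escalates to the global redirector, which asks the other regions.

```python
from typing import Dict, List, Optional, Set


class Redirector:
    """Toy redirector: knows its own data servers and its sub-redirectors."""

    def __init__(self, name: str, servers: Optional[Dict[str, Set[str]]] = None):
        self.name = name
        self.servers = servers or {}            # data server name -> paths it holds
        self.children: List["Redirector"] = []  # subordinate (regional) redirectors
        self.parent: Optional["Redirector"] = None

    def attach(self, child: "Redirector") -> None:
        """Register a subordinate redirector below this one."""
        child.parent = self
        self.children.append(child)

    def _search_down(self, path: str, skip: Optional["Redirector"] = None) -> Optional[str]:
        """Look for the file on local servers, then in every child except `skip`."""
        for server, paths in self.servers.items():
            if path in paths:
                return server
        for child in self.children:
            if child is not skip:
                found = child._search_down(path)
                if found is not None:
                    return found
        return None

    def locate(self, path: str, skip: Optional["Redirector"] = None) -> Optional[str]:
        """Return the data server holding `path`, escalating upwards if needed."""
        found = self._search_down(path, skip)
        if found is None and self.parent is not None:
            found = self.parent.locate(path, skip=self)
        return found


# Hypothetical federation with two regions under one global redirector.
global_redirector = Redirector("global")
europe = Redirector("eu", {"eu-disk-1": {"/store/a.root"}})
america = Redirector("us", {"us-disk-1": {"/store/b.root"}})
global_redirector.attach(europe)
global_redirector.attach(america)
```

A client talking to `europe` gets a local data server for `/store/a.root`, and is sent to the other region for `/store/b.root` because the request is escalated through the global redirector.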

  11. Who is using XRootD? • 4 LHC VOs: • ALICE: in use for many years • ATLAS: Federated ATLAS XRootD (FAX) • CMS: Any data, Anytime, Anywhere (AAA) • LHCb: uses the access protocol only • This talk focuses on AAA and FAX

  12. Large-scale

  13. Large-scale • Ongoing deployment (72%): • AAA: 1/1 T0, 3/7 T1, 39/51 T2 • FAX: 1/1 T0, 6/12 T1, 34/44 T2 • Constantly growing

  14. Large-scale • Ongoing deployment (72%): • AAA: 1/1 T0, 3/7 T1, 39/51 T2 • FAX: 1/1 T0, 6/12 T1, 34/44 T2 • Constantly growing • Data traffic: 1.25 GB/s
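
The 72% deployment figure can be checked from the per-tier counts on the slide. The sketch below assumes each `x/y` pair means "sites deployed / sites in that tier" and that the headline number combines both federations; the variable and function names are illustrative.

```python
# Per-tier counts copied from the slide: (deployed, total).
deployment = {
    "AAA": {"T0": (1, 1), "T1": (3, 7), "T2": (39, 51)},
    "FAX": {"T0": (1, 1), "T1": (6, 12), "T2": (34, 44)},
}


def coverage(tiers):
    """Sum deployed and total site counts over all tiers of one federation."""
    deployed = sum(done for done, _ in tiers.values())
    total = sum(count for _, count in tiers.values())
    return deployed, total


totals = {name: coverage(tiers) for name, tiers in deployment.items()}
all_deployed = sum(done for done, _ in totals.values())
all_sites = sum(count for _, count in totals.values())

for name, (done, count) in totals.items():
    print(f"{name}: {done}/{count} = {done / count:.1%}")
print(f"combined: {all_deployed}/{all_sites} = {all_deployed / all_sites:.1%}")
```

Under that reading AAA is at 43/59, FAX at 41/57, and the combined 84/116 rounds to the 72% quoted on the slide.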

  15. Efficient monitoring system

  16. Why monitoring? • Understand data flows to estimate data traffic • Provide information for efficient operations • Identify access patterns and propose data placement strategies

  17. Federated data storages: Monitoring layer • Diagram: global redirector, federated namespace, redirector, data server, client

  18. Federated data storages: Monitoring layer • Diagram: global redirector, federated namespace, data server, monitoring collectors

  19. Federated data storages: Monitoring layer • Diagram: global redirector, federated namespace, data server, monitoring collectors • Monitoring traffic: 300 kB/s

  20. Instrumentation on XRootD server: Monitoring streams

  21. Instrumentation on XRootD server: Monitoring streams • Detailed stream • Periodic summary data

  22. Instrumentation on XRootD server: Monitoring streams • Detailed stream • Periodic summary data: low event rate, aggregated XRootD traffic per site

  23. Instrumentation on XRootD server: Monitoring streams • Detailed stream: high event rate, binary format required, non-blocking protocol • Periodic summary data: low event rate, aggregated XRootD traffic per site

  24. Instrumentation on XRootD server: Monitoring streams • Detailed stream: high event rate, binary format required, non-blocking protocol • Map: server, user, and file names mapped to IDs • Trace: per-file I/O information • Periodic summary data: low event rate, aggregated XRootD traffic per site

  25. Instrumentation on XRootD server: Monitoring streams • Detailed stream: high event rate, binary format required, non-blocking protocol • Map: server, user, and file names mapped to IDs • Trace: per-file I/O information • Per-file information • GLED* collector • Periodic summary data: low event rate, aggregated XRootD traffic per site • * GLED developed by Matevz Tadel (UCSD)
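
The map/trace split of the detailed stream can be illustrated with a compact binary encoding: a map record announces a name once and assigns it a numeric ID, and later trace records carry only that ID. The record layouts below are invented for illustration and are not the actual XRootD monitoring wire format; `pack_map`, `pack_trace`, and `Decoder` are hypothetical names.

```python
import struct

# Hypothetical layouts, network byte order, no padding.
MAP_HEADER = struct.Struct("!cHI")  # record type, name length, dictionary ID
TRACE = struct.Struct("!cIQQ")      # record type, dictionary ID, offset, length


def pack_map(dict_id: int, name: str) -> bytes:
    """Encode a 'map' record that binds a file name to a numeric ID."""
    raw = name.encode("utf-8")
    return MAP_HEADER.pack(b"m", len(raw), dict_id) + raw


def pack_trace(dict_id: int, offset: int, length: int) -> bytes:
    """Encode a 'trace' record describing one read on an already mapped file."""
    return TRACE.pack(b"t", dict_id, offset, length)


class Decoder:
    """Collector side: remembers map records so trace records can be resolved."""

    def __init__(self):
        self.names = {}

    def feed(self, packet: bytes):
        kind = packet[:1]
        if kind == b"m":
            _, size, dict_id = MAP_HEADER.unpack_from(packet)
            start = MAP_HEADER.size
            self.names[dict_id] = packet[start:start + size].decode("utf-8")
            return None
        if kind == b"t":
            _, dict_id, offset, length = TRACE.unpack_from(packet)
            name = self.names.get(dict_id, "<unknown>")
            return {"file": name, "offset": offset, "length": length}
        raise ValueError(f"unknown record type: {kind!r}")


decoder = Decoder()
decoder.feed(pack_map(7, "/store/a.root"))
event = decoder.feed(pack_trace(7, 4096, 1024))
```

Each trace record here is 21 bytes regardless of how long the file name is, which is the kind of saving that makes a binary format attractive at a high event rate.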

  26. XRootD monitoring dataflow • Diagram (real time): Federation -> UDP -> GLED collector

  27. XRootD monitoring dataflow • Diagram (real time, asynchronous): Federation -> UDP -> GLED collector -> STOMP -> AMQ* (x5) -> STOMP -> consumer -> raw data • * AMQ operated by the CERN messaging team

  28. XRootD monitoring dataflow • Diagram (real time, asynchronous): Federation -> UDP -> GLED collector -> STOMP -> AMQ* (x5) -> STOMP -> consumer -> raw data -> stats (every 10 minutes) • * AMQ operated by the CERN messaging team

  29. XRootD monitoring dataflow • Diagram (real time, asynchronous): Federation -> UDP -> GLED collector -> STOMP -> AMQ* (x5) -> STOMP -> consumer -> raw data -> stats (every 10 minutes) -> Web API -> Dashboard UI and external applications • * AMQ operated by the CERN messaging team

  30. Transport layer • Based on messaging technology (ActiveMQ): • Producer separated from the consumer • Near-real-time or asynchronous consumption

  31. Transport layer • Based on messaging technology (ActiveMQ): • Producer separated from the consumer • Near-real-time or asynchronous consumption • Plot: average number of messages received per second (ATLAS, CMS), peak 400 msg/s
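
The decoupling of producer and consumer can be sketched with an in-memory queue standing in for the broker. This is only an illustration of the pattern: Python's `queue.Queue` replaces ActiveMQ, no STOMP is involved, and `publish` and `consume` are hypothetical helper names.

```python
import queue


def publish(broker, messages):
    """Producer side: hand messages to the broker without waiting for a consumer."""
    for message in messages:
        broker.put(message)


def consume(broker, max_messages):
    """Consumer side: drain up to `max_messages`, whenever the consumer is ready."""
    batch = []
    while len(batch) < max_messages:
        try:
            batch.append(broker.get_nowait())
        except queue.Empty:
            break
    return batch


broker = queue.Queue()                            # stand-in for the message broker
publish(broker, [{"seq": i} for i in range(5)])   # producer runs on its own schedule
first = consume(broker, 3)                        # a consumer reads a first batch
rest = consume(broker, 10)                        # and picks up the remainder later
```

The producer never blocks on the consumer, and a consumer that starts late still receives the buffered messages in order.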

  32. Insertion into the database • Load-balanced collector • Horizontal scaling • Diagram: AMQ -> consumers -> DB

  33. Insertion into the database • Load-balanced collector • Horizontal scaling • Diagram: AMQ -> consumer (Stompclt*) -> DB • * Developed by the CERN messaging team

  34. Insertion into the database • Load-balanced collector • Horizontal scaling • Diagram: AMQ -> consumer (Stompclt*, disk queue*) -> DB • * Developed by the CERN messaging team

  35. Insertion into the database • Load-balanced collector • Horizontal scaling • Diagram: AMQ -> consumer (Stompclt*, disk queue*, DB inserter) -> DB • DB inserter: • Customizable inserter • Data enhancement • Filtering • * Developed by the CERN messaging team

  36. Insertion into the database • Load-balanced collector • Horizontal scaling • Diagram: AMQ -> consumer (Simplevisor*, Stompclt*, disk queue*, DB inserter) -> DB • DB inserter: • Customizable inserter • Data enhancement • Filtering • * Developed by the CERN messaging team • Modular architecture • Common building blocks (EPEL) • Reliable
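
The consumer chain (disk queue, then an inserter that filters and enhances records before writing them to the database) can be sketched as follows. Everything here is a stand-in chosen for illustration: a directory of JSON files plays the disk queue, SQLite plays the database, and the `SITE_BY_DOMAIN` table, the message fields, and the function names are hypothetical rather than taken from the real tools.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

SITE_BY_DOMAIN = {"cern.ch": "CERN-PROD"}  # hypothetical topology lookup


def enqueue(spool: Path, seq: int, message: dict) -> None:
    """Disk queue stand-in: one JSON file per message, ordered by file name."""
    (spool / f"{seq:08d}.json").write_text(json.dumps(message))


def keep(message: dict) -> bool:
    """Filtering: drop records that carry no transferred data."""
    return message.get("read_bytes", 0) > 0


def enhance(message: dict) -> dict:
    """Data enhancement: attach a site name derived from the server's domain."""
    _, _, domain = message["server"].partition(".")
    return {**message, "site": SITE_BY_DOMAIN.get(domain, "UNKNOWN")}


def drain(spool: Path, db: sqlite3.Connection) -> int:
    """Inserter: read queued messages in order, insert the kept ones, delete files."""
    inserted = 0
    for path in sorted(spool.glob("*.json")):
        message = json.loads(path.read_text())
        if keep(message):
            row = enhance(message)
            db.execute(
                "INSERT INTO transfers VALUES (?, ?, ?)",
                (row["server"], row["site"], row["read_bytes"]),
            )
            inserted += 1
        path.unlink()
    db.commit()
    return inserted


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE transfers (server TEXT, site TEXT, read_bytes INTEGER)")
with tempfile.TemporaryDirectory() as tmp:
    spool = Path(tmp)
    enqueue(spool, 1, {"server": "srv1.cern.ch", "read_bytes": 1024})
    enqueue(spool, 2, {"server": "srv2.example.org", "read_bytes": 0})
    inserted = drain(spool, db)
    leftover = list(spool.glob("*.json"))
rows = db.execute("SELECT server, site, read_bytes FROM transfers").fetchall()
```

Because the queue lives on disk, a database outage only delays insertion: messages accumulate in the spool and are drained once the inserter runs again.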

  37. Database layer • Oracle 11g • FAX: 200 GB, ~800M records • AAA: 200 GB, ~800M records

  38. Database layer • Oracle 11g • FAX: 200 GB, ~800M records • AAA: 200 GB, ~800M records • Daily insert: 850 MB / 2M rows

  39. Database layer • Oracle 11g • FAX: 200 GB, ~800M records • AAA: 200 GB, ~800M records • Daily insert: 850 MB / 2M rows • Storage: • Raw, statistics, metadata • Tables partitioned daily, no global indexes

  40. Database layer • Oracle is also used as a compute engine • Aggregation of unordered events • PL/SQL

  41. Database layer • Oracle is also used as a compute engine • Aggregation of unordered events • PL/SQL • Aggregation • Stateless: full re-computation of touched bins each time • Compute stats from raw data in 10 min bins • Aggregate 10 min stats in daily bins

  42. Database layer • Oracle is also used as a compute engine • Aggregation of unordered events • PL/SQL • Aggregation • Stateless: full re-computation of touched bins each time • Compute stats from raw data in 10 min bins • Aggregate 10 min stats in daily bins • Real challenge to scale up
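
The stateless aggregation described above can be sketched in a few lines. The production system does this in PL/SQL inside Oracle; the Python below is only an illustration of the idea, with hypothetical names and events reduced to `(timestamp_seconds, bytes)` pairs. Whenever new events arrive, possibly late and out of order, every 10-minute bin they touch is recomputed from all raw data, and the affected daily bins are then recomputed from the 10-minute stats.

```python
BIN = 600     # 10-minute bins, in seconds
DAY = 86400   # daily bins, in seconds


def bin_start(timestamp: int, width: int) -> int:
    """Start of the bin of the given width that contains the timestamp."""
    return timestamp - timestamp % width


def recompute_bins(raw, stats10, new_events):
    """Store new raw events and fully recompute every 10-minute bin they touch."""
    raw.extend(new_events)
    touched = {bin_start(ts, BIN) for ts, _ in new_events}
    for start in touched:
        stats10[start] = sum(n for ts, n in raw if start <= ts < start + BIN)
    return touched


def recompute_days(stats10, daily, touched):
    """Recompute each daily bin that contains a touched 10-minute bin."""
    for day in {bin_start(start, DAY) for start in touched}:
        daily[day] = sum(v for start, v in stats10.items() if day <= start < day + DAY)


raw, stats10, daily = [], {}, {}
touched = recompute_bins(raw, stats10, [(650, 50), (30, 100)])  # unordered arrival
recompute_days(stats10, daily, touched)
first_daily = dict(daily)

touched = recompute_bins(raw, stats10, [(10, 5)])               # late event
recompute_days(stats10, daily, touched)
```

No running totals are kept between runs, so a late event simply triggers another full recomputation of its bin. The cost is that each recomputation rescans raw data, which is consistent with the scaling concern raised on the slide.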

  43. Aggregation methods • Diagram: transfers on a timeline from 2pm to 7pm

  44. Aggregation methods • Diagram: transfers on a timeline from 2pm to 7pm • Easy method

  45. Aggregation methods • Diagram: transfers on a timeline from 2pm to 7pm • Easy method

  46. Aggregation methods • Diagram: transfers on a timeline from 2pm to 7pm • Easy method

  47. Aggregation methods • Diagram: transfers on a timeline from 2pm to 7pm • Easy method • Adopted method • Both methods are equivalent for: • Small transfers or many transfers
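
The transcript does not preserve the diagrams that define the two methods, so the sketch below is an interpretation rather than the author's definition. It assumes the "easy" method books a transfer's whole volume into the time bin where the transfer ends, and that the adopted method spreads the volume over every bin the transfer overlaps, in proportion to the overlap (constant transfer rate assumed). Function names and the `(start, end, bytes)` tuples are hypothetical.

```python
HOUR = 3600  # hourly bins, matching the 2pm to 7pm timeline on the slide


def easy_method(transfers, width=HOUR):
    """Assumed 'easy' method: all bytes go to the bin containing the end time."""
    bins = {}
    for _start, end, nbytes in transfers:
        key = end - end % width
        bins[key] = bins.get(key, 0) + nbytes
    return bins


def spread_method(transfers, width=HOUR):
    """Assumed 'adopted' method: bytes are split across overlapped bins."""
    bins = {}
    for start, end, nbytes in transfers:
        duration = end - start
        if duration <= 0:
            # Instantaneous transfer: nothing to spread, use the bin of the start time.
            key = start - start % width
            bins[key] = bins.get(key, 0) + nbytes
            continue
        key = start - start % width
        while key < end:
            overlap = min(end, key + width) - max(start, key)
            bins[key] = bins.get(key, 0) + nbytes * overlap / duration
            key += width
    return bins


long_transfer = [(1800, 9000, 7200)]   # spans three hourly bins
short_transfer = [(100, 200, 50)]      # fits inside a single bin

easy_long = easy_method(long_transfer)
spread_long = spread_method(long_transfer)
```

Under these assumptions a short transfer lands in the same bin with either method, while a long one shows up as a single spike with the easy method and as a smoother profile with the spread method. That matches the slide's remark that the methods agree for small transfers; with many transfers the differences tend to average out.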

  48. Storage investigation • Alternatives to Oracle under investigation • Evaluate scaling options • In line with the IT strategy • Promising results with Elasticsearch • Waiting for the new aggregation system • Announced for the near future

  49. Web API: Interface to the DB • Diagram: stats -> Web API -> Dashboard UI and external applications • All database access goes through the API • Including our visualization tool • Executes an SQL query • Adds topology and applies some filtering • Returns data in JSON
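
The three steps of the API (run a query, add topology and filter, return JSON) can be sketched as one function. This is an illustration under assumptions, not the dashboard's actual API: SQLite stands in for Oracle, and the `stats` table, the `TOPOLOGY` mapping, and the `query_stats` function with its parameters are all hypothetical.

```python
import json
import sqlite3

# Hypothetical topology: which site and tier each server belongs to.
TOPOLOGY = {
    "srv1.cern.ch": {"site": "CERN-PROD", "tier": "T0"},
    "srv2.example.org": {"site": "EXAMPLE-T2", "tier": "T2"},
}


def query_stats(db, vo, tier=None):
    """Run the SQL query, attach topology, optionally filter by tier, return JSON."""
    rows = db.execute(
        "SELECT server, SUM(read_bytes) FROM stats "
        "WHERE vo = ? GROUP BY server ORDER BY server",
        (vo,),
    ).fetchall()
    result = []
    for server, total in rows:
        info = TOPOLOGY.get(server, {"site": "UNKNOWN", "tier": None})
        if tier is not None and info["tier"] != tier:
            continue
        result.append({"server": server, "read_bytes": total, **info})
    return json.dumps(result)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stats (vo TEXT, server TEXT, read_bytes INTEGER)")
db.executemany(
    "INSERT INTO stats VALUES (?, ?, ?)",
    [
        ("cms", "srv1.cern.ch", 100),
        ("cms", "srv1.cern.ch", 50),
        ("cms", "srv2.example.org", 70),
        ("atlas", "srv1.cern.ch", 999),
    ],
)
all_cms = json.loads(query_stats(db, "cms"))
t2_cms = json.loads(query_stats(db, "cms", tier="T2"))
```

Routing every client, including the dashboard itself, through one such layer keeps the SQL and the topology logic in a single place.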

  50. Visualization layer: Dashboard UI • Rich single-page web user interface • AJAX + JSON communication
