
Cooperative Computing for Data Intensive Science

Cooperative Computing for Data Intensive Science. Douglas Thain, University of Notre Dame. NSF Bridges to Engineering 2020 Conference, 12 March 2008. What is Cooperative Computing? By combining our computing and storage resources together, we can attack problems larger than we could alone.



Presentation Transcript


  1. Cooperative Computing for Data Intensive Science • Douglas Thain • University of Notre Dame • NSF Bridges to Engineering 2020 Conference • 12 March 2008

  2. What is Cooperative Computing? • By combining our computing and storage resources together, we can attack problems larger than we could alone. • I can use your computer when it is idle, and vice versa. (Most computers are idle about 90 percent of the day.) • Also known as… • Grid computing, distributed computing, metacomputing, volunteer computing, etc…

  3. Who Needs Coop Computing? • Many fields of study rely on simulation and data processing to conduct science. • Physics, chemistry, biology, engineering, finance, sociology, computer science. • More Computing == Better Results • NOT High Performance: Speed up one program. • High Throughput: Produce as many results as possible over the next day / week / year.

  4. Cooperative Computing Lab • We design and build distributed systems that help people to attack BIG problems. • Work directly with end users to make sure that our solutions affect the real world. • Operate a modest computing system as both a production service and a research testbed. • Currently about 500 CPUs and 300 disks. • CS Research challenges: scalability, robustness, usability, debugging, and performance. http://www.nd.edu/~ccl

  5. What Makes this Challenging? • The Programming Model • I want to process 10 TB of data on 100 machines, then distribute it across 20 disks, then view the best results on my workstation. • Fault Tolerance • Something is always broken! • Performance Robustness • There is always one slowpoke. • Debugging • My job runs correctly here but not there...!?

  6. An Example Collaboration: Biometrics Research and Distributed Systems

  7. A Common Pattern in Biometrics • Sample Workload: 4000 images, 256KB each, 1s per F, 185 CPU-days • Future Workload: 60000 images, 1MB each, 0.1s per F, 4166 CPU-days
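The CPU-day figures on this slide can be reproduced with a short calculation. The sketch below assumes that the workload compares every image against every other image (n × n calls to F); the slide does not state this explicitly, but the assumption matches both totals. The function name `all_pairs_cpu_days` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def all_pairs_cpu_days(num_images: int, seconds_per_comparison: float) -> float:
    """CPU-days needed to run F on every ordered pair of images.

    Assumption (not stated on the slide): the workload is n * n calls to F.
    """
    comparisons = num_images * num_images
    return comparisons * seconds_per_comparison / SECONDS_PER_DAY


# Sample workload: 4000 images at 1 s per call to F.
sample = all_pairs_cpu_days(4000, 1.0)
# Future workload: 60000 images at 0.1 s per call to F.
future = all_pairs_cpu_days(60000, 0.1)

print(f"sample: {sample:.1f} CPU-days")
print(f"future: {future:.1f} CPU-days")
```

Truncated to whole days, the two values agree with the 185 and 4166 CPU-days quoted on the slide.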

  8. Non-Expert User Using 500 CPUs • Try 1: Each F is a batch job. Failure: Dispatch latency >> F runtime. • Try 2: Each row is a batch job. Failure: Too many small ops on FS. • Try 3: Bundle all files into one package. Failure: Everyone loads 1GB at once. • Try 4: User gives up and attempts to solve an easier or smaller problem.
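Why Try 1 fails can be illustrated with a rough efficiency estimate. This is only a sketch: the 30-second dispatch latency is a hypothetical value chosen for illustration (the slides give no number), and the 1 s runtime per F is taken from the sample workload. The model covers dispatch latency only, so it says nothing about the file system load that defeats Try 2.

```python
def cpu_efficiency(calls_per_job: int, seconds_per_call: float,
                   dispatch_latency: float) -> float:
    """Fraction of a job's wall-clock time spent doing useful work in F."""
    useful = calls_per_job * seconds_per_call
    return useful / (useful + dispatch_latency)


# Hypothetical value for illustration; not taken from the presentation.
ASSUMED_DISPATCH_LATENCY = 30.0  # seconds

# Try 1: one call to F per batch job.
one_call = cpu_efficiency(1, 1.0, ASSUMED_DISPATCH_LATENCY)
# Try 2: one row (4000 calls to F) per batch job.
one_row = cpu_efficiency(4000, 1.0, ASSUMED_DISPATCH_LATENCY)

print(f"one F per job:   {one_call:.1%} of time is useful work")
print(f"one row per job: {one_row:.1%} of time is useful work")
```

Under these assumed numbers, grouping a whole row into one job amortizes the dispatch cost almost completely, which is why the remaining bottleneck in Try 2 is data access rather than scheduling.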

  9. All Pairs Production System • 500 CPUs, 40TB disk, 300 active storage units • Components: Web Portal, All-Pairs Engine • 1 - Upload F and S into web portal. • 2 - AllPairs(F,S) • 3 - O(log n) distribution by spanning tree. • 4 - Choose optimal partitioning and submit batch jobs. • 5 - Collect and assemble results. • 6 - Return result matrix to user.
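Two ideas from this slide can be sketched in a few lines. The first function is a single-machine reading of AllPairs(F,S): apply F to every pair of items in S and return the result matrix. The second counts the rounds needed to copy the data set to n storage nodes if every node that already holds a copy forwards it to one new node per round, which is one simple way to obtain O(log n) distribution. Both are illustrative assumptions; the slide does not specify the actual interface or the shape of the spanning tree, and the names `all_pairs` and `broadcast_rounds` are made up here.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def all_pairs(f: Callable[[T, T], R], s: Sequence[T]) -> List[List[R]]:
    """Return the matrix M where M[i][j] = f(s[i], s[j])."""
    return [[f(a, b) for b in s] for a in s]


def broadcast_rounds(num_targets: int) -> int:
    """Rounds needed for one source to reach num_targets other nodes.

    Assumes every node holding the data sends it to one new node per
    round, so the number of holders doubles each round.
    """
    holders = 1  # the source already has the data
    rounds = 0
    while holders < num_targets + 1:
        holders *= 2
        rounds += 1
    return rounds


# Toy stand-in for a biometric comparison function.
matrix = all_pairs(lambda a, b: abs(a - b), [1, 4, 6])
print(matrix)

# Rounds to reach 300 storage units under the doubling assumption.
print(broadcast_rounds(300))
```

In a production setting the rows or blocks of the matrix would be split across batch jobs rather than computed in one process; the sketch only shows the shape of the result.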

  10. Some Results on Real Workload

  11. Collaboration is Where the Interesting Problems Are! (Cooperative Computing Provides the Resources)

  12. What Makes a Collaboration Work? • Like a marriage? (old joke.) • First, a show of commitment: go after some low hanging fruit, and publish it. • A proposal for funding only succeeds if you have already started working together. • Need very concrete goals: your partner may not share your idea of an interesting tangent. • Students sometimes need a big push to leave their comfort zone and work together.

  13. For more information… • Douglas Thain • dthain@nd.edu • Cooperative Computing Lab • http://www.nd.edu/~ccl • Apply for Summer 2008 REU: • http://www.nd.edu/~ccl/reu Supported by NSF Grants CCF-0621434 and CNS-0643229.
