Research issues in cooperative computing


Research Issues in Cooperative Computing

Douglas Thain

http://www.cse.nd.edu/~ccl


Sharing is Hard!

  • Despite decades of research in distributed systems and operating systems, sharing computing resources is still very difficult.

  • Problems get worse as scale increases:

    • Office

    • Server Room

    • Distributed System

    • Computational Grid


Designers Go To Extremes:

  • Peer to Peer

  • Central Control

  • Cooperative Computing


How Do We Share Data?

  • P2P File Sharing (WWW, Napster)

  • Central Storage Archive (NFS, UDC, StorageTank)


Things I Can’t Do Today

  • Let members of my project team store and retrieve documents from this disk in my office.

    • (Where my boss defines “project team”.)

  • I must have 1 TB of space for one whole week, but it must be stored by someone I know.

    • (Where I give a list of trusted people.)

  • Allow a visitor in my office to use my machine.

    • (But I want her workspace isolated from mine.)

  • This bioinformatics repository can be written by my grad students, read by all ND faculty, and read by anyone approved by the NSF.

    • (Where each list comes from a different source.)
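
The last wish above, where each membership list comes from a different source, can be sketched as an access check that resolves every group through its own lookup. This is a minimal illustration only: the group names, the sample users, and the `make_acl` helper are hypothetical and do not come from the presentation.

```python
from typing import Callable, Dict, Set

# A membership source returns the current members of one group.
# The lambdas below are hypothetical stand-ins for an advisor's list,
# a campus directory, and an externally approved list.
GroupSource = Callable[[], Set[str]]


def make_acl(sources: Dict[str, GroupSource],
             rules: Dict[str, Set[str]]) -> Callable[[str, str], bool]:
    """Build a checker allowed(user, right) from per-group sources and rights."""
    def allowed(user: str, right: str) -> bool:
        for group, rights in rules.items():
            # Ask the group's own source at check time, so each list
            # can be maintained independently by whoever owns it.
            if right in rights and user in sources[group]():
                return True
        return False
    return allowed


sources = {
    "grad_students": lambda: {"alice", "bob"},
    "nd_faculty": lambda: {"carol"},
    "nsf_approved": lambda: {"dave"},
}
rules = {
    "grad_students": {"write"},
    "nd_faculty": {"read"},
    "nsf_approved": {"read"},
}
allowed = make_acl(sources, rules)
```

Here `allowed("alice", "write")` is true and `allowed("carol", "write")` is false, because the faculty list only grants reading.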


What is Cooperative Computing?

  • CC means putting owners in charge.

    • I control who uses my resources.

    • Need tools for expressing trust.

  • CC means respect for social structures.

    • Trust is rarely symmetric.

    • Hierarchy and centralization can be important.

    • Motivation is usually external to the system.

  • CC means ease of use.

    • Resource owners need simple and effective tools.

    • Resource users need to be insulated from failures.


Every User Should be a Super-User

(Diagram. Three identical columns list: Consumption, Allocation, Accounting, Quality of Service, Security, Debugging. A column labeled Super-User lists: Allocation, Accounting, Quality of Service, Security, Debugging.)


Vision of Cooperative Storage

  • Make it easy to deploy systems that:

    • Allow sharing of storage space.

    • Respect existing human structures.

    • Provide reasonable space/perf promises.

    • Work easily and transparently without root.

    • Make the non-ideal properties manageable:

      • Limited allocation. (select, renew, migrate)

      • Unreliable networks. (useful fallback modes)

      • Changing configuration. (auto. discovery/config)
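
The "select, renew, migrate" cycle for limited allocations could look roughly like the lease sketch below. It is an assumption-laden illustration, not the lab's implementation: `StoragePool`, `Lease`, the first-fit selection rule, and the caller-supplied clock are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    server: str
    size_gb: int
    expires_at: float  # seconds on whatever clock the caller passes in


class StoragePool:
    def __init__(self, free_gb: Dict[str, int]) -> None:
        self.free_gb = dict(free_gb)  # server name -> free space in GB

    def select(self, size_gb: int, duration_s: float, now: float) -> Optional[Lease]:
        """First-fit: reserve space on the first server with enough room."""
        for server, free in self.free_gb.items():
            if free >= size_gb:
                self.free_gb[server] -= size_gb
                return Lease(server, size_gb, now + duration_s)
        return None

    def renew(self, lease: Lease, duration_s: float, now: float) -> bool:
        """Extend a lease that has not yet expired."""
        if now > lease.expires_at:
            return False
        lease.expires_at = now + duration_s
        return True

    def migrate(self, lease: Lease) -> bool:
        """Move the reservation to a different server that has room."""
        for server, free in self.free_gb.items():
            if server != lease.server and free >= lease.size_gb:
                self.free_gb[server] -= lease.size_gb
                self.free_gb[lease.server] += lease.size_gb
                lease.server = server
                return True
        return False
```

The sketch only tracks reservations; copying the data during a migration and reclaiming space from expired leases are left out.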


(Diagram labels)

  • Components: storage catalog, access control server, storage server, basic filesystem, status updates.

  • Questions and requests: “Where can I find 100 GB for 24 hours?” “Is this a member of the CSE dept?” “Make reservation and access data.” “Who is here?” “Evict user!”

  • Resource Policy: “Members of the CSE dept can borrow 200 GB for one week.”
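
The question put to the storage catalog ("Where can I find 100 GB for 24 hours?") and the example resource policy ("Members of the CSE dept can borrow 200 GB for one week") suggest a simple matching step. The sketch below assumes that each server advertises its free space and policy to the catalog; the `ServerAd` fields, the sample servers, and the `is_member` lookup are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ServerAd:
    """What a storage server might report to the catalog (assumed fields)."""
    name: str
    free_gb: int
    group: str      # who may borrow space, per the owner's policy
    max_gb: int     # largest loan the policy allows
    max_hours: int  # longest loan the policy allows


def find_storage(catalog: List[ServerAd],
                 is_member: Callable[[str, str], bool],
                 user: str, need_gb: int, need_hours: int) -> List[str]:
    """Return names of servers whose space and policy satisfy the request."""
    return [ad.name for ad in catalog
            if need_gb <= min(ad.free_gb, ad.max_gb)
            and need_hours <= ad.max_hours
            and is_member(user, ad.group)]


# Sample data: 168 hours is one week, matching the policy quoted above.
catalog = [
    ServerAd("s1", free_gb=500, group="cse", max_gb=200, max_hours=168),
    ServerAd("s2", free_gb=50, group="cse", max_gb=200, max_hours=168),
    ServerAd("s3", free_gb=1000, group="physics", max_gb=500, max_hours=24),
]
members = {"cse": {"alice"}}


def is_member(user: str, group: str) -> bool:
    # Stand-in for asking an access control server about group membership.
    return user in members.get(group, set())
```

With this data, `find_storage(catalog, is_member, "alice", 100, 24)` yields only `s1`: `s2` is too full and `s3` belongs to a group alice is not in.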


Cooperative Storage Pool

(Diagram labels: dist. computation, dist. file system, and backup system; the Cooperative Storage Pool; six storage servers and six disks.)


Cooperative Computing is useful in the office… but it is badly needed on the Grid!


On the Grid

(Diagram labels: a Work Queue of jobs; three gatekeepers; Condor Batch System, Maui Scheduler, and PBS batch system; many CPUs.)


Grid Computing Experience

Ian Foster, et al. (102 authors), “The Grid2003 Production Grid: Principles and Practice,” IEEE HPDC 2004.

The Grid2003 Project has deployed a multi-virtual organization, application-driven grid laboratory that has sustained for several months the production-level services required by…

ATLAS, CMS, SDSS, LIGO…


Grid Computing Experience

The good news:

  • 27 sites with 2800 CPUs.

  • 40985 CPU-days provided over 6 months.

  • 10 applications with 1300 simultaneous jobs.

The bad news:

  • 40-70 percent utilization.

  • 30 percent of jobs would fail.

  • 90 percent of failures were local problems.

The lessons:

  • Most site failures were due to disk space.

  • Debugging most problems was impossible.


Coop Computing and the Grid

  • The Grid is a boundary case of CC.

    • Large scale, high performance.

    • Allocate resources to partially trusted visitors.

    • Everyone wants to exhaust resources.

  • Can CC scale from the office to the grid?

    • If it is easy for one person to deploy in an office… then it will be usable enough to work on the grid.


More Cooperative Computing

  • Nested Principals & Authentication

    • Simple question: How to allow a visitor?

  • Distributed Access Control

    • Can we find something more usable than PKI?

  • Storage Abstractions

    • Can we do better than files/directories?

  • Data-Intensive Grid Computing

    • How do I use storage and CPU together?

  • Distributed Debugging

    • Consider it a distributed query problem.


Cooperative Computing Credo:

Make computer structures

model social structures...

Not the other way around!


For more information…

The Cooperative Computing Lab

http://www.cse.nd.edu/~ccl

Prof. Douglas Thain

[email protected]


Two Related Problems

  • Users don’t have direct control.

    • I need 50 GB of storage for one week.

    • Allow my collaborators to use my space.

      (Usually considered administrative tasks.)

  • Users don’t have direct information.

    • Why was I denied this allocation?

    • What series of steps was used to run my job?

      (Usually considered implementation details.)
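
One way to address "Why was I denied this allocation?" is for every allocation decision to carry the list of checks that produced it. The following is a hypothetical sketch: the `Decision` type, the three checks, and the parameter names are illustrative assumptions, not part of any system named in the slides.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Decision:
    granted: bool
    trace: List[str] = field(default_factory=list)  # one line per check


def request_space(user: str, need_gb: int, free_gb: int,
                  trusted: Set[str], limit_gb: int) -> Decision:
    """Evaluate every check and record its outcome, pass or fail."""
    checks = [
        (user in trusted, f"{user} is on the owner's trusted list"),
        (need_gb <= limit_gb, f"requested {need_gb} GB <= {limit_gb} GB per-user limit"),
        (need_gb <= free_gb, f"requested {need_gb} GB <= {free_gb} GB free"),
    ]
    trace = [("ok: " if passed else "FAILED: ") + claim
             for passed, claim in checks]
    # The request is granted only if every check passed.
    return Decision(all(passed for passed, _ in checks), trace)
```

A denied request then returns its own explanation: the user can read `decision.trace` instead of asking an administrator.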


The Current Situation

(Diagram labels)

  • Commands: % cp, % emacs, % vi

  • Operations: GET, PUT, open, close, read, write

  • Software: chirp tool, parrot, libchirp (shown three times)

  • Authentication: hostname, kerberos, GSI

  • Services: catalog server (status updates), five storage servers, filesystem, simple ACL


Distributed Debugging

(Diagram labels: debugger, job, workload manager, auth gateway, kerberos, batch system, six cpus, license manager, archival host, three storage servers, and eight log files spread across the components.)


Distributed Debugging

  • Big challenges!

    • Language issues: storing and combining logs.

    • Ordering: How to reassemble events?

    • Completeness: Gaps, losses, detail.

    • Systems: Distributed data collection.

  • But, could be a big win:

    • “A crashes whenever X gets its creds from Y.”

    • “Please try again: I have turned up the detail on host B.”
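
As a small illustration of the ordering and correlation challenges listed above, the sketch below merges per-host logs into one timeline and collects the events that shortly precede each failure. It assumes that every log is already sorted and that timestamps from different hosts are comparable, which is exactly what the ordering challenge says cannot be taken for granted. The event format, host names, and messages are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

Event = Tuple[float, str, str]  # (timestamp, host, message)


def merge_logs(*logs: Iterable[Event]) -> List[Event]:
    """Merge per-host logs, each already sorted by timestamp, into one timeline."""
    return list(heapq.merge(*logs))


def events_before(timeline: List[Event], marker: str,
                  window_s: float) -> List[List[Event]]:
    """For each event whose message contains marker, list the earlier
    events that happened within window_s seconds of it."""
    found = []
    for i, (t, _host, msg) in enumerate(timeline):
        if marker in msg:
            found.append([e for e in timeline[:i] if t - e[0] <= window_s])
    return found


auth_log = [(1.0, "auth", "issued creds to job 7"),
            (9.0, "auth", "issued creds to job 8")]
batch_log = [(2.0, "batch", "job 7 started"),
             (3.5, "batch", "job 7 crashed")]
timeline = merge_logs(auth_log, batch_log)
```

Calling `events_before(timeline, "crashed", 3.0)` returns the credential issue and the job start that preceded the crash, the raw material for a statement like "A crashes whenever X gets its creds from Y."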


Grid Computing

  • The Vision: Make large-scale computing resources as reliable and as simple as the electric power grid or the water utility.

  • The Reality: Tie together existing computing clusters and archival storage around the country into systems that are (almost) usable by experts.



  • Storage Allocation

    • Give me 50 GB for 24 hours

    • Technical Problem: Building Allocation

  • Distributed Debugging

    • Correlation

    • Hypothesis Proposal

    • Reasoning

    • System Building

    • Adaptation


(Diagram: speech bubbles around a group of CPUs, disks, an auth server, and secure I/O.)

  • “I need ten more CPUs in order to finish my paper by Friday!”

  • “CSE grads can compute here, but only when I’m not.”

  • “May I use your CPUs?”

  • “Is this person a CSE grad?”

  • “My friends in Italy need to access this data.”

  • “I’m not root!”

  • “PBs of workstation storage! Can I use this as a cache?”

  • “If I can back up to you, you can back up to me.”


Cooperative Computing Credo

  • Put users in charge of their resources.

    • Share resources as they see fit.

    • Expose information for debugging.

  • Mode of operation:

    • Make tools that are foolproof enough for casual use by one or two people in the office.

    • If they really are foolproof, then they will also be suitable for deployment in large scale systems such as computational grids.

