
Data Replication CS 188 Distributed Systems February 3, 2015

This article explores the concept of data replication and caching in distributed systems, discussing the benefits, challenges, and differences between the two. It also addresses issues related to read-only replication and replication with writing, providing insights into varying replication factors and potential solutions.

rblaker

Presentation Transcript


  1. Data Replication CS 188 Distributed Systems February 3, 2015

  2. Some Other Possibilities • What if the machines sharing files are portable and not always connected? • What if the machines communicate across the Internet? • What if the load on some files is too heavy for a single machine?

  3. An Answer to These Questions • Replicate the data • Keep multiple copies of the data on different machines • Depending on details, make different copies available for different purposes

  4. How Does This Help? • What if the machines sharing files are portable and not always connected? • Put a replica of the data on the portable machine • What if the machines communicate across the Internet? • Avoid expensive cross-Internet traffic by having replicas on both sides • What if the load on some files is too heavy for a single machine? • Share the load among multiple replicas

  5. Other Replication Advantages • Reliability • If one machine fails, replicas of its data might be elsewhere • Flexibility • Easier to assign data workloads to storage resources

  6. The Replication Concept [Figure: several physical copies of one document, each beginning “When in the course of human events it becomes necessary for one people to . . .”] • There is a conceptual object (like a file) • We keep more than one physical copy of it • Maybe several • Each copy is meant to be a full representation of the object • So accessing any should be the same as accessing any other

  7. Replication and Caching • The two are obviously similar • Caching usually implies it’s temporary • Replication usually implies it’s permanent • Caching is usually for local use only • Replication is usually for more general use • These distinctions are not actually binary, though • Permanent isn’t always really permanent • Some caches service multiple machines

  8. There Are Some Differences • For example, invalidation on write is feasible for cached data • It isn’t feasible for replicated data • One can always throw away a cached copy of data (modulo local needs) • One can’t always throw away a replica • Especially the only one

  9. Replication and Reading • If the data is read-only, the replication problem is easy • IF . . . • The problems arise if the data is ever written • Life then becomes much more complicated

  10. Read-Only Replication • Merely ensure that all copies start off the same • They never change • Accessing any copy as good as any other • Still a problem of finding and choosing replicas to access

  11. Read-Only Data and Metadata • Usually we treat file metadata as part of the file • Maybe the data is read only • But is the metadata? • How about access permissions? • How about access time? • If metadata can be updated, you still have issues

  12. Choosing Read-Only Replicas • Mostly a performance question • Which one is “closest?” • Which one is “least loaded?” • Initial placement might make a big difference • And what if replicas can move?
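
The “closest” and “least loaded” questions can be folded into one cost function. The following is only a minimal sketch: the `Replica` fields, the `choose_replica` helper, and the `load_weight` trade-off are assumptions made for illustration, not something the slides prescribe.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    latency_ms: float  # hypothetical measure of how "close" the replica is
    load: float        # hypothetical utilization, 0.0 (idle) to 1.0 (saturated)


def choose_replica(replicas, load_weight=100.0):
    """Return the replica with the lowest combined latency-plus-load cost."""
    if not replicas:
        raise ValueError("no replicas available")
    return min(replicas, key=lambda r: r.latency_ms + load_weight * r.load)


replicas = [
    Replica("near-but-busy", latency_ms=5.0, load=0.9),
    Replica("far-but-idle", latency_ms=40.0, load=0.1),
]
best = choose_replica(replicas)
print(best.name)
```

With `load_weight=0` the choice reduces to "closest"; a large weight makes it "least loaded". Where to set the weight is a policy decision that depends on the workload.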

  13. Varying Read-Only Replication Factors • We can add or delete read-only replicas easily • Some issues regarding open files • When should we add a replica? • When should we delete a replica? • When should we move a replica to a different location?

  14. Replication and Writing • Life becomes complicated when you write replicated data • Physically the write occurs at one copy • Logically the write should be applied to all copies • Going from the physical reality to the logical goal is challenging

  15. Illustrating the Problem [Figure: a yellow replica and a blue replica; one text reads “When in the course of human events it becomes necessary for one people to . . .”, the other “Fourscore and seven years ago, our forefathers brought forth . . .”] • We write to the yellow replica • The yellow and blue replicas should be the same, but they aren’t • What do we do? • Problem solved! But . . .

  16. A Fly in the Ointment [Figure: two replicas holding different text, “When in the course of human events it becomes necessary for one people to . . .” and “Fourscore and seven years ago, our forefathers brought forth . . .”] • We’ve gotten ourselves into this state • What if the writer’s next access is to the other replica?

  17. A Worse Situation [Figure: two replicas holding different text, “When in the course of human events it becomes necessary for one people to . . .” and “Fourscore and seven years ago, our forefathers brought forth . . .”] • What if someone else reads the other copy?

  18. An Even Worse Situation [Figure: two replicas; the texts shown are “When in the course of human events it becomes necessary for one people to . . .”, “Ask not what your country can do for you, but what you can do for your country”, and “Fourscore and seven years ago, our forefathers brought forth . . .”] • What if someone else writes the other copy?

  19. These Situations Arose Before Distributed Computing • What if there are two processes on one machine? • What if they read a file and then both choose to write it? • Or one writes without the other’s knowledge? • Still problematic, but easier to solve

  20. Single Machine Solutions • Have only one copy of shared data • Replication advantages less on a single machine, anyway • Use locks to control access to shared data • Both solutions rely on a single piece of storage that both parties consult • So they don’t work on two machines
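
As a reminder of why the single-machine case is easier, here is a minimal sketch of two threads updating one shared value under one lock. Python's `threading.Lock` stands in for whatever locking primitive the system provides; the `shared` dictionary and `add` function are made up for this illustration.

```python
import threading

shared = {"count": 0}    # the single copy of the shared data
lock = threading.Lock()  # the single lock that both parties consult


def add(n):
    """Increment the shared counter n times, one locked update at a time."""
    for _ in range(n):
        with lock:  # only one thread can be inside this block at once
            shared["count"] += 1


threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["count"])
```

Both threads reach the same `lock` object in the same memory. That shared storage is what two separate machines lack.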

  21. Cross-Machine Locking • Why can’t I just share a lock between two machines? • A lock is really a piece of data • Saying who holds it • Either you store it on one machine or on both • Storing on just one leads to performance and reliability problems • Storing on both gets us back to our original problem • But now the shared data is the lock itself

  22. Primary Copy Options • Only allow writes to one replica • So no issue of conflicting writes to different replicas • Doesn’t solve the read/write concurrency problem • Issues if the primary copy fails • Or if its server is overloaded • Or if there are network partitions
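
A minimal in-memory sketch of the primary copy idea follows. The class names and the explicit `propagate` step are assumptions for illustration; a real system would push updates over a network and would have to handle failures and partitions.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class PrimaryCopyStore:
    """Accept writes only at the primary; push them to secondaries later."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)
        self.pending = []  # updates not yet pushed to the secondaries

    def write(self, replica, key, value):
        if replica is not self.primary:
            raise PermissionError("writes must go to the primary copy")
        self.primary.data[key] = value
        self.pending.append((key, value))

    def propagate(self):
        for key, value in self.pending:
            for secondary in self.secondaries:
                secondary.data[key] = value
        self.pending.clear()


primary, backup = Replica("primary"), Replica("backup")
store = PrimaryCopyStore(primary, [backup])
store.write(primary, "doc", "Fourscore and seven years ago")
stale = backup.read("doc")  # the update has not reached the backup yet
store.propagate()
fresh = backup.read("doc")
print(stale, "->", fresh)
```

Conflicting writes cannot occur, because only one replica accepts them. The stale read from the backup before `propagate` runs is the read/write concurrency problem that this scheme leaves unsolved.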

  23. A Diversion Into Clocks • Ultimately, these issues relate to the question of ordering events • What order do things happen in? • In a distributed system • One form of ordering used a lot in the real world is time • Can we use time to solve our problem?

  24. Time Services • One way to make things happen in order is to timestamp them • Read a clock and slap a time stamp on the event • As in normal life, things only happen in time order • Possible solution for ordering distributed events

  25. Time Services and Replication • Maybe we can slap a timestamp on every write • And maybe use timestamps to control reads • The timestamps of multiple writes control the order in which they occur • Doesn’t solve all the problems, but does solve some
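
One way to let timestamps control write order is a "newest timestamp wins" rule at each replica. The sketch below assumes that policy, which the slides do not name, and represents a timestamp as an `(hour, minute)` tuple purely for illustration.

```python
class TimestampedReplica:
    def __init__(self):
        self.value = None
        self.stamp = None  # timestamp of the write currently stored

    def apply(self, stamp, value):
        """Apply a write only if it is newer than the stored one.

        Returns True if the write was applied, False if it was ignored.
        """
        if self.stamp is None or stamp > self.stamp:
            self.value, self.stamp = value, stamp
            return True
        return False


writes = [((3, 15), "first"), ((3, 22), "second"), ((3, 27), "third")]

r1, r2 = TimestampedReplica(), TimestampedReplica()
for stamp, value in writes:            # arrives in timestamp order
    r1.apply(stamp, value)
for stamp, value in reversed(writes):  # arrives in the opposite order
    r2.apply(stamp, value)

print(r1.value, r2.value)
```

Both replicas end with the same value regardless of arrival order. Two writes with equal timestamps are not resolved by this rule, and intermediate reads can still see different values, which is part of why timestamps solve only some of the problems.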

  26. Using a Clock [Figure: Node 1, Node 2, and Node 3; each writer reads the clock and stamps its update (3:15, 3:22, 3:27) before sending it to B or C] • Now B can know the proper order of writes

  27. The Problem With Clocks • A clock is (ultimately) a physical resource • So it’s in exactly one place • We use messages to access remote places • And messages take varying amounts of time to get from one place to another • So, with a single clock, can’t guarantee proper ordering

  28. Solutions to Clock Problems • Physical clocks • Logical clocks

  29. Physical Clocks • Each node keeps its own local clock • Modern machines always have them, anyway • Stamp each synchronizable event with the local clock • Problem becomes keeping the clocks synchronized

  30. Globally Accessible Clocks • In the general case, this usually means GPS clocks • GPS satellites broadcast highly accurate clock signals • Over the entire Earth’s surface • Anyone with a GPS receiver that’s working can hear it

  31. Pros and Cons of Physical Clocks • Simplicity • Need constant access to clock • Transmission errors/delays damage synchronization • Requires strong knowledge of transmission delays • Never possible to reduce clock skew to zero

  32. Logical Clocks • Don’t try to keep track of passage of actual time • Use a logical mechanism to keep track of proper order of events • Essentially, assign artificial timestamps that maintain the causality required for the computation

  33. When Are Logical Clocks Useful? • When relative order of events is the issue • Rather than relationship to wall clock time • Often the case for operations of distributed applications • Not always when there is a relationship to the real world

  34. Lamport Clocks • Fundamental logical clock system • Each process Pi has a clock Ci • Each event is assigned a time at its processor • → is the happens-before relation • a → b means a happened before b • If a → b, C(a) < C(b)

  35. Implementing Lamport Clocks • Whenever an event occurs, increment the local clock • Assign new value to event • But how do we provide the correct global view? • Since processes live on different processors

  36. Handling Messages in Lamport Clocks • Processes communicate only via send and receive of messages • Which are events • If Pi sends to Pj, Ci(send) < Cj(receive) • Since send must happen-before receive • How do we force that?

  37. Rules for Lamport Clocks 1). If a → b within the same process, C(a) < C(b) 2). If a is a sending event in Pi and b is the corresponding receiving event in Pj, then C(a) < C(b) • Enforcing Rule 1 is easy, since it’s on the same processor

  38. Enforcing Rule 2 • Timestamp outgoing messages with time of send • Receiver j adds increment d to maximum of message timestamp and local clock • Cj = max(C(a), Cj) + d • C(b) = Cj • Ensures that receive event b gets a clock value after send event a
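
The two rules fit in a few lines of code. This is a minimal sketch: the `LamportClock` class and its method names are invented for illustration, and message delivery is simulated by passing the timestamp directly.

```python
class LamportClock:
    def __init__(self, d=1):
        self.time = 0
        self.d = d  # the increment d from Rule 2

    def local_event(self):
        """Rule 1: every local event advances the clock."""
        self.time += self.d
        return self.time

    def send(self):
        """Sending is an event; the returned value travels with the message."""
        return self.local_event()

    def receive(self, msg_time):
        """Rule 2: Cj = max(C(a), Cj) + d."""
        self.time = max(msg_time, self.time) + self.d
        return self.time


ci, cj = LamportClock(), LamportClock()
c_a = ci.local_event()       # an ordinary event on process i
c_send = ci.send()           # i sends a message to j
c_recv = cj.receive(c_send)  # j receives it
print(c_a, c_send, c_recv)
```

With both clocks starting at 0 and d = 1, this gives C(a) = 1, C(send) = 2, and C(receive) = 3, so the receive is stamped after the send even though j's own clock was still at 0.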

  39. Lamport Clocks Example 1 [Figure: timelines for processes i and j; event a and a send on i, the matching receive on j] • C(a) = 1, C(send) = 2, C(receive) = 3 • C(a) < C(send) • C(send) < C(receive)

  40. Properties of Lamport Clocks • Happens-before is transitive • If a → b and b → c, then a → c • If a → b, then C(a) < C(b) • But the converse is not true • C(a) < C(b) does not imply a → b • How can that happen?

  41. Lamport Clock Example 2 [Figure: timelines for processes i and j; events a and b on i, event d on j] • C(a) = 1, C(b) = 2, C(d) = 1 • C(a) < C(b) • C(d) < C(b) ????!!!!????!!!!

  42. The Sad Truth About Distributed Systems Concurrency • Abandon all hope ye who enter here • You’ve got to forget your godlike view • In the absence of a physical clock, • YOU CAN’T ORDER ALL EVENTS PROPERLY!!!!!!!! • But perhaps you don’t believe that . . .

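  43. Lamport Clock Example 3 [Figure: timelines for processes i and j; events a and b on i, event d on j, occurring in a different real-time order than in Example 2] • C(a) = 1, C(b) = 2, C(d) = 1 • But the “order” of events was different than before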

  44. Why Do We Have This Problem? • Not really because we aren’t keeping a physical clock • It’s because we aren’t communicating enough to derive the order • If each process sent the other a message after each local event, our examples would have proper ordering
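
The effect of communication on the derived order can be checked by hand. The sketch below assumes the scenario of Example 2 with d occurring after b in real time, an increment of 1, and a single synchronization message from i to j after b. The helper names `tick` and `receive` are made up for illustration.

```python
def tick(clock, d=1):
    """Advance a clock for a local (or send) event."""
    return clock + d


def receive(clock, msg_time, d=1):
    """Advance a clock on message receipt: max(message time, local) + d."""
    return max(msg_time, clock) + d


# Without any messages: each process counts only its own events.
ci = cj = 0
ci = tick(ci); c_a = ci
ci = tick(ci); c_b = ci
cj = tick(cj); c_d = cj
unsynced = (c_a, c_b, c_d)

# With one synchronization message from i to j after event b.
ci = cj = 0
ci = tick(ci); c_a = ci
ci = tick(ci); c_b = ci
ci = tick(ci); c_send = ci
cj = receive(cj, c_send)  # j's clock jumps past the send time
cj = tick(cj); c_d = cj
synced = (c_a, c_b, c_d)

print(unsynced, synced)
```

Without the message, d gets the same stamp as a and a smaller stamp than b, even though it happened last. With the message, C(a) < C(b) < C(d), which matches the real order.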

  45. Obtaining the Proper Order for Example 2 [Figure: timelines for processes i and j with events a, b, and d and a Synchronize message (send and receive); clock values 1, 2, 3 on i and 4, 5 on j] • C(a) < C(b), C(b) < C(d)

  46. And For Example 3 [Figure: timelines for processes i and j with events a, b, and d and a Synchronize message (send and receive); clock values 1, 2 on j and 3, 4, 5 on i] • C(d) < C(a), C(a) < C(b)

  47. But There’s a Problem • What if we have true concurrency? • What if an event occurs while a synchronization message is in transit?

  48. Lamport Clocks Example 4 [Figure: timelines for processes i and j with events a and d and a Synchronize message being sent] • C(d) = C(a) • Because of concurrency, you can’t win

  49. Lamport Clocks and Partial Orders • Basic Lamport clocks only give a partial order • They don’t order events with equal times • Easy to provide a full order • Number all processes • Concatenate process number to clock

  50. In Our Examples, • Say process i is numbered 1 and process j is numbered 2 • In example 1, no equal times • In example 2, • C(a) = 1,1 • C(b) = 2,1 • C(d) = 1,2 • So C(a) is ordered before C(d)
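
The tie-breaking rule amounts to comparing (clock, process number) pairs. A minimal sketch using the Example 2 values, with the events held in a plain dictionary for illustration:

```python
# (clock value, process number) for each event in Example 2
events = {
    "a": (1, 1),
    "b": (2, 1),
    "d": (1, 2),
}

# Tuples compare element by element, so the process number
# only matters when the clock values are equal.
ordered = sorted(events, key=lambda name: events[name])
print(ordered)  # ['a', 'd', 'b']
```

The resulting order is total and consistent across all nodes. For events with equal clock values it is arbitrary rather than a reflection of what really happened first.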
