Introduction to distributed system pagerank algorithm l.jpg
This presentation is the property of its rightful owner.
Sponsored Links
1 / 90

Introduction to Distributed System & PageRank Algorithm PowerPoint PPT Presentation


  • 287 Views
  • Uploaded on
  • Presentation posted in: General

Introduction to Distributed System & PageRank Algorithm. http://net.pku.edu.cn/~course/cs402/2009/ 彭波 [email protected] 北京大学信息科学技术学院 7/7/2009. 大纲. 作业回顾 分布式系统基础 PageRank 算法的 MapReduce 实现 课程项目. Review of Lecture 1 . SaaS PaaS Utility Computing. “ Data Center is a Computer ”

Download Presentation

Introduction to Distributed System & PageRank Algorithm

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Introduction to distributed system pagerank algorithm l.jpg

Introduction to Distributed System&PageRank Algorithm

http://net.pku.edu.cn/~course/cs402/2009/

彭波

[email protected]

北京大学信息科学技术学院

7/7/2009


Slide2 l.jpg

大纲

  • 作业回顾

  • 分布式系统基础

  • PageRank算法的MapReduce实现

  • 课程项目


Review of lecture 1 l.jpg

Review of Lecture 1

SaaS

PaaS

Utility Computing

“Data Center is a Computer”

Parallelism everywhere

Massive Scalable Reliable

Resource Management

Data Management

Programming Model & Tools


Divide and conquer l.jpg

Divide and Conquer

“Work”

Partition

w1

w2

w3

“worker”

“worker”

“worker”

r1

r2

r3

Combine

“Result”


What s mapreduce l.jpg

What’s Mapreduce

  • Parallel/Distributed Computing Programming Model

shuffle

output

Input split


Codelab1 l.jpg

CodeLab1

  • 1.开始eclipse无法运行,主要是因为没有添加本地jar文件

  • 2.在0.13.0中找不到input目录,也没有/user目录,并且也无法上传,有同学说重启后就能看到/user目录,我没有证实。我是在命令行下添加input目录,就会自动创建/user目录。后来的0.17.0虚拟机中我已经把Input目录放好了,就没有这个问题。

  • 3.对Java不熟悉。特别是要根据0.13.0虚拟机修改的时候很多人都反应有困难,看起来大部分人都没有什么Java基础。


Hw1 exercises l.jpg

HW1 Exercises

  • Have you ever encountered a Heisenbug? How did you isolate and fix it?

  • For the different failure typeslisted above, consider what makes each one difficult for a programmer trying to guard against it. What kinds of processing can be added to a program to deal with these failures?

  • Explain why each of the 8 fallaciesis actually a fallacy.

  • Contrast TCP and UDP. Under what circumstances would you choose one over the other?


Exercises l.jpg

Exercises

  • What's the difference between caching and data replication?

  • What are stubs in an RPC implementation?

  • What are some of the error conditions we need to guard against in a distributed environment that we do not need to worry about in a local programming environment?

  • Why are pointers (references) not usually passed as parameters to a Remote Procedure Call?


Exercises10 l.jpg

Exercises

  • Here is an interesting problem called partial connectivity that can occur in a distributed environment. Let's say A and B are systems that need to talk to each other. C is a master that also talks to A and B individually. The communications between A and B fail. C can tell that A and B are both healthy. C tells A to send something to B and waits for this to occur. C has no way of knowing that A cannot talk to B, and thus waits and waits and waits. What diagnostics can you add in your code to deal with this situation?


Exercises11 l.jpg

Exercises

  • This is the Byzantine Generals problem: Two generals are on hills either side of a valley. They each have an army of 1000 soldiers. In the woods in the valley is an enemy army of 1500 men. If each general attacks alone, his army will lose. If they attack together, they will win. They wish to send messengers through the valley to coordinate when to attack. However, the messengers may get lost or caught in the woods (or brainwashed into delivering different messages). How can they devise a scheme by which they either attack with high probability, or not at all?


Introduction to distributed system l.jpg

Introduction to Distributed System

Parallelization & Synchronization


Parallelization idea l.jpg

Parallelization Idea

  • 如果任务可以被cleanly split into n units,并行is very “easy”.


Parallelization idea 2 l.jpg

Parallelization Idea (2)


Parallelization idea 3 l.jpg

Parallelization Idea (3)


Parallelization idea 4 l.jpg

Parallelization Idea (4)


Parallelization pitfalls l.jpg

Parallelization Pitfalls

But this model is too simple!

  • 怎样分配任务:

    • How do we assign work units to worker threads?

  • 执行单元和任务数不匹配

    • What if we have more work units than threads?

  • 怎样把结果聚合

    • How do we aggregate the results at the end?

  • 怎样知道任务都完成了

    • How do we know all the workers have finished?

  • 如果任务不能分割为完全独立的子任务

    • What if the work cannot be divided into completely separate tasks?

What is the common theme of all of these problems?


Parallelization pitfalls 2 l.jpg

Parallelization Pitfalls (2)

  • 这些问题的关键都在于:multiple threads must communicate with one another, or access a shared resource.

  • Golden rule: Any memory that can be used by multiple threads must have an associated synchronization system!


Slide20 l.jpg

Process synchronization refers to the coordination of simultaneous threads or processes to complete a task in order to get correct runtime order and avoid unexpected race conditions.


And if you thought i was joking l.jpg

And if you thought I was joking…


What is wrong with this l.jpg

What is Wrong With This?

Thread 1:

void foo() {

x++;

y = x;

}

Thread 2:

void bar() {

y++;

x++;

}

If the initial state is y = 0, x = 6, what happens after these threads finish running?


Multithreaded unpredictability l.jpg

Multithreaded = Unpredictability

  • When we run a multithreaded program, we don’t know what order threads run in, nor do we know when they will interrupt one another.

  • Many things that look like “one step” operations actually take several steps under the hood:

Thread 1:

void foo() {

eax = mem[x];

inc eax;

mem[x] = eax;

ebx = mem[x];

mem[y] = ebx;

}

Thread 2:

void bar() {

eax = mem[y];

inc eax;

mem[y] = eax;

eax = mem[x];

inc eax;

mem[x] = eax;

}


Multithreaded unpredictability24 l.jpg

Multithreaded = Unpredictability

This applies to more than just integers:

  • Pulling work units from a queue

  • Reporting work back to master unit

  • Telling another thread that it can begin the “next phase” of processing

    … All require synchronization!


Synchronization primitives l.jpg

Lock

Semaphore

Barriers

Synchronization Primitives

Thread 2:

void bar() {

sem.lock();

y++;

x++;

sem.unlock();

}

Thread 1:

void foo() {

sem.lock();

x++;

y = x;

sem.unlock();

}


Too much synchronization deadlock l.jpg

Too Much Synchronization? Deadlock

Thread A:

semaphore1.lock();

semaphore2.lock();

/* use data guarded by

semaphores */

semaphore1.unlock();

semaphore2.unlock();

Thread B:

semaphore2.lock();

semaphore1.lock();

/* use data guarded by

semaphores */

semaphore1.unlock();

semaphore2.unlock();

(Image: RPI CSCI.4210 Operating Systems notes)


And if you thought i was joking27 l.jpg

And if you thought I was joking…


The moral be careful l.jpg

The Moral: Be Careful!

  • Synchronization is hard

    • Need to consider all possible shared state

    • Must keep locks organized and use them consistentlyand correctly

  • Knowing there are bugs may be tricky; fixing them can be even worse!

  • Keeping shared state to a minimum reduces total system complexity


Introduction to distributed system29 l.jpg

Introduction to Distributed System

Fundamentals of Networking


Slide30 l.jpg

TCP

ROUTER

IP

FTP

UDP

HTTP

Gateway

Protocol

PORT

SOCKET

SWITCH

Firewall


What makes this work l.jpg

What makes this work?

  • Underneath the socket layer are several more protocols

  • Most important are TCP and IP (which are used hand-in-hand so often, they’re often spoken of as one protocol: TCP/IP)


Why is this necessary l.jpg

Why is This Necessary?

  • Not actually tube-like “underneath the hood”

  • Unlike phone system (circuit switched), the packet switched Internet uses many routes at once


Networking issues l.jpg

Networking Issues

  • If a party to a socket disconnects, how much data did they receive?

  • Did they crash? Or did a machine in the middle?

  • Can someone in the middle intercept/modify our data?

  • Traffic congestion makes switch/router topology important for efficient throughput


Introduction to distributed system34 l.jpg

Introduction to Distributed System

Distributed Systems


Outline l.jpg

Outline

  • Models of computation

  • Connecting distributed modules

  • Failure & reliability


Models of computation l.jpg

Models of Computation

Instructions

Single (SI)

Multiple (MI)

Single (SD)

Data

Multiple (MD)

Flynn’s Taxonomy


Slide37 l.jpg

SISD

Processor

D

D

D

D

D

D

D

Instructions


Slide38 l.jpg

SIMD

Processor

D0

D0

D0

D0

D0

D0

D0

D1

D1

D1

D1

D1

D1

D1

D2

D2

D2

D2

D2

D2

D2

D3

D3

D3

D3

D3

D3

D3

D4

D4

D4

D4

D4

D4

D4

Dn

Dn

Dn

Dn

Dn

Dn

Dn

Instructions


Slide39 l.jpg

MIMD

Processor

Processor

D

D

D

D

D

D

D

D

D

D

D

D

D

D

Instructions

Instructions


Slide40 l.jpg

Memory(Instructions and Data)

Instructions

Data

Processor

Interface to external world


Slide41 l.jpg

Interface to external world

Interface to external world

Processor

Processor

Instructions

Data

Data

Instructions

Memory(Instructions and Data)

Instructions

Data

Data

Instructions

Processor

Processor

Interface to external world

Interface to external world


Slide42 l.jpg

Memory(Instructions and Data)

Memory(Instructions and Data)

Instructions

Data

Data

Instructions

Processor

Processor

Interface to external world

Interface to external world

Network

Interface to external world

Interface to external world

Processor

Processor

Instructions

Data

Data

Instructions

Memory(Instructions and Data)

Memory(Instructions and Data)


Slide43 l.jpg

Memory(Instructions and Data)

Memory(Instructions and Data)

Instructions

Data

Data

Instructions

Instructions

Data

Data

Instructions

Processor

Processor

Processor

Processor

Interface to external world

Interface to external world

Network

Interface to external world

Interface to external world

Processor

Processor

Processor

Processor

Instructions

Data

Data

Instructions

Instructions

Data

Data

Instructions

Memory(Instructions and Data)

Memory(Instructions and Data)


Outline44 l.jpg

Outline

  • Models of computation

  • Connecting distributed modules

  • Failure & reliability


System organization l.jpg

System Organization

  • Having one big memory would make it a huge bottleneck

    • Eliminates all of the parallelism


Cta memory is distributed l.jpg

CTA: Memory is Distributed


Interconnect networks l.jpg

Interconnect Networks

  • Bottleneck in the CTA is transferring values from one local memory to another

  • Interconnect network design very important; several options are available

  • Design constraint: How to minimize interconnect network usage?


A brief history 1985 95 l.jpg

A Brief History…1985-95

  • “Massively parallel architectures” start rising in prominence

  • Message Passing Interface (MPI) and other libraries developed

  • Bandwidth was a big problem

    • For external interconnect networks in particular


A brief history 1995 today l.jpg

A Brief History…1995-Today

  • Cluster/grid architecture increasingly dominant

  • Special node machines eschewed in favor of COTS technologies

  • Web-wide cluster software

  • Companies like Google take this to the extreme (10,000 node clusters)


More about interconnects l.jpg

More About Interconnects

  • Several types of interconnect possible

    • Bus

    • Crossbar

    • Torus

    • Tree


Interconnect bus l.jpg

Interconnect Bus

  • Simplest possible layout

  • Not realistically practical

    • Too much contention

    • Little better than “one big memory”


Crossbar l.jpg

Crossbar

  • All processors have “input” and “output” lines

  • Crossbar connects any input to any output

  • Allows for very low contention, but lots of wires, complexity

    • Will not scale to many nodes


Toroidal networks l.jpg

Toroidal networks

  • Nodes are connected to their logical neighbors

  • Node-node transfer may include intermediaries

  • Reasonable trade-off for space/scalability


Slide54 l.jpg

Tree

  • Switch nodes transfer data “up” or “down” the tree

  • Hierarchical design keeps “short” transfers fast, incremental cost to longer transfers

  • Aggregate bandwidth demands often very large at top

  • Most natural layout for most cluster networks today


Outline55 l.jpg

Outline

  • Models of computation

  • Connecting distributed modules

  • Failure & reliability


Slide56 l.jpg

“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable”

-- Leslie Lamport


Reliability demands l.jpg

Reliability Demands

  • Support partial failure

    • Total system must support graceful decline in application performance rather than a full halt

  • Data Recoverability

    • If components fail, their workload must be picked up by still-functioning units

  • Individual Recoverability

    • Nodes that fail and restart must be able to rejoin the group activity without a full group restart


Reliability demands58 l.jpg

Reliability Demands

  • Consistency

    • Concurrent operations or partial internal failures should not cause externally visible nondeterminism

  • Scalability

    • Adding increased load to a system should not cause outright failure, but a graceful decline

    • Increasing resources should support a proportional increase in load capacity

  • Security

    • The entire system should be impervious to unauthorized access

    • Requires considering many more attack vectors than single-machine systems


Slide59 l.jpg

Ken Arnold, CORBA designer:

“Failure is the defining difference between distributed and local programming”


Component failure l.jpg

Component Failure

  • Individual nodes simply stop


Data failure l.jpg

Data Failure

  • Packets omitted by overtaxed router

  • Or dropped by full receive-buffer in kernel

  • Corrupt data retrieved from disk or net


Network failure l.jpg

Network Failure

  • External & internal links can die

    • Some can be routed around in ring or mesh topology

    • Star topology may cause individual nodes to appear to halt

    • Tree topology may cause “split”

    • Messages may be sent multiple times or not at all or in corrupted form…


Timing failure l.jpg

Timing Failure

  • Temporal properties may be violated

    • Lack of “heartbeat” message may be interpreted as component halt

    • Clock skew between nodes may confuse version-aware data readers


Byzantine failure l.jpg

Byzantine Failure

  • Difficult-to-reason-about circumstances arise

    • Commands sent to foreign node are not confirmed: What can we reason about the state of the system?


Malicious failure l.jpg

Malicious Failure

  • Malicious (or maybe naïve) operator injects invalid or harmful commands into system


Correlated failures l.jpg

Correlated Failures

  • Multiple CPUs/hard drives from same manufacturer lot may fail together

  • Power outage at one data center may cause demand overload at failover center


Preparing for failure l.jpg

Preparing for Failure

  • Distributed systems must be robust to these failure conditions

  • But there are lots of pitfalls…


The eight design fallacies l.jpg

The Eight Design Fallacies

  • The network is reliable.

  • Latency is zero.

  • Bandwidth is infinite.

  • The network is secure.

  • Topology doesn't change.

  • There is one administrator.

  • Transport cost is zero.

  • The network is homogeneous.

-- Peter Deutsch and James Gosling, Sun Microsystems


Dealing with component failure l.jpg

Dealing With Component Failure

  • Use heartbeats to monitor component availability

  • “Buddy” or “Parent” node is aware of desired computation and can restart it elsewhere if needed

  • Individual storage nodes should not be the sole owner of data

    • Pitfall: How do you keep replicas consistent?


Dealing with data failure l.jpg

Dealing With Data Failure

  • Data should be check-summed and verified at several points

    • Never trust another machine to do your data validation!

  • Sequence identifiers can be used to ensure commands, packets are not lost


Dealing with network failure l.jpg

Dealing With Network Failure

  • Have well-defined split policy

    • Networks should routinely self-discover topology

    • Well-defined arbitration/leader election protocols determine authoritative components

      • Inactive components should gracefully clean up and wait for network rejoin


Dealing with other failures l.jpg

Dealing With Other Failures

  • Individual application-specific problems can be difficult to envision

  • Make as few assumptions about foreign machines as possible

  • Design for security at each step


Conclusions l.jpg

Conclusions

  • Parallel systems evolved toward current distributed systems usage

  • Hard to avoid failure

    • Determine what is reasonable to plan for

    • Keep protocols as simple as possible

    • Be mindful of common pitfalls


Pagerank l.jpg

PageRank算法


Background l.jpg

Background

  • 1995,10M 网页;1995/12/15发布 altavista.digital.com时已经索引了16M 网页

  • 1997,25M queries/day,$50M revenue

  • But …

Digital Equipment Corporation

How to calculate the importance of a Web page?


Social network l.jpg

社会网络(social network)

  • 任何一种用于建立个体之间联系的自然现象、社会活动或技术机制都可能形成一张网

    • “朋友关系”(对称,无向图)

    • “知晓关系”(不对称,有向图)

    • “文献引用关系”(不对称,有向图)

    • co-author关系(对称,无向图,成块“clique”)

    • 通电话,通信

    • 病毒传染(生物、计算机)

    • 网页链接关系(不对称,有向图)

      • 还可以考虑不同的“尺度”:网站之间,城市之间,省份之间,国家之间,…


Slide77 l.jpg

研究这些“关系图”有什么意义?

  • 一阶指标(“入度”)

    • 知晓关系:社会知名度

    • 引用关系:认可程度

  • “高阶指标”

    • 和一个著名人物“共同发表”论文的“距离”:越短似乎显得越“有荣誉”(例如,Erdos number,)

  • 仅仅是“结构”就可以带来丰富的“语义”

    • 例如省份之间的链接数差别可能有有意义的解释

Paul Erdös


Reputation prestige importance l.jpg

Reputation, Prestige, Importance, …

  • “入度”

    • (2,4)〉(1,3)

  • More than “入度”

    • 认识甲的人可能和认识乙的人一样多,但认识乙的人都是些“重要人物”,于是通常应该认为乙比甲重要

    • 不仅是人,论文也是一样,被重要的文章引用的文章可能就比较重要些

    • 谁重要一些?


Random walks over the web l.jpg

Random Walks Over the Web

  • Model:

    • User starts at a random Web page

    • User randomly clicks on links, surfing from page to page

  • What’s the amount of time that will be spent on any given page?

  • This is PageRank


Pagerank defined l.jpg

Given page x with in-bound links t1…tn, where

C(t) is the out-degree of t

 is probability of random jump

N is the total number of nodes in the graph

PageRank: Defined

t1

X

t2

tn


Example l.jpg

example

  • Converge to:p = (0.165, 0.321, 0.230, 0.284)


Computing pagerank l.jpg

Computing PageRank

Properties of PageRank

Can be computed iteratively

Effects at each iteration is local

Sketch of algorithm:

Start with seed PRi values

Each page distributes PRi “credit” to all pages it links to

Each target page adds up “credit” from multiple in-bound links to compute PRi+1

Iterate until values converge


Pagerank in mapreduce l.jpg

PageRank in MapReduce

Map: distribute PageRank “credit” to link targets

Reduce: gather up PageRank “credit” from multiple sources to compute new PageRank value

Iterate until

convergence

...


Pagerank issues l.jpg

PageRank: Issues

Is PageRank guaranteed to converge? How quickly?

What is the “correct” value of , and how sensitive is the algorithm to it?

What about dangling links?

How do you know when to stop?


Homework l.jpg

Homework

  • Lab 3 - PageRank over Wikipedia Corpus

  • Hw3 – Read GFS[2]

    • [Ghemawat, et al.,2003] S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google file system," SIGOPS Oper. Syst. Rev., vol. 37, pp. 29-43, 2003. (pdf)


Slide86 l.jpg

Q&A


Summary l.jpg

Summary


Failure types l.jpg

Failure Types

  • Halting failures: A component simply stops. There is no way to detect the failure except by timeout: it either stops sending "I'm alive" (heartbeat) messages or fails to respond to requests. Your computer freezing is a halting failure.

  • Fail-stop: A halting failure with some kind of notification to other components. A network file server telling its clients it is about to go down is a fail-stop.

  • Omission failures: Failure to send/receive messages primarily due to lack of buffering space, which causes a message to be discarded with no notification to either the sender or receiver. This can happen when routers become overloaded.

  • Network failures: A network link breaks.

  • Network partition failure: A network fragments into two or more disjoint sub-networks within which messages can be sent, but between which messages are lost. This can occur due to a network failure.

  • Timing failures: A temporal property of the system is violated. For example, clocks on different computers which are used to coordinate processes are not synchronized; when a message is delayed longer than a threshold period, etc.

  • Byzantine failures: This captures several types of faulty behaviors including data corruption or loss, failures caused by malicious programs, etc.


8 fallacies l.jpg

8 fallacies

  • The network is reliable.

  • Latency is zero.

  • Bandwidth is infinite.

  • The network is secure.

  • Topology doesn't change.

  • There is one administrator.

  • Transport cost is zero.

  • The network is homogeneous.   


Error conditions l.jpg

Error Conditions

  • Network data loss resulting in retransmit: Often, a system tries to achieve 'at most once' transmission tries. In the worst case, if duplicate transmissions occur, we try to minimize any damage done by the data being received multiple time.

  • Server process crashes during RPC operation: If a server process crashes before it completes its task, the system usually recovers correctly because the client will initiate a retry request once the server has recovered. If the server crashes completing the task but before the RPC reply is sent, duplicate requests sometimes result due to client retries.

  • Client process crashes before receiving response: Client is restarted. Server discards response da


  • Login