
2.3 Design Principles of Computer Clusters - PowerPoint PPT Presentation



PowerPoint Slideshow about '2.3 Design Principles of Computer Clusters' - amir-clark

Presentation Transcript
2.3 Design Principles of Computer Clusters

2.3.1 Single-System Image Features

  • A single-system image (SSI) is the illusion of a single system, single control, symmetry, and transparency.

    • Single system: users view the entire cluster as one system that has multiple processors.

    • Single control: Logically, an end user or system user obtains services from one place through a single interface.

    • Symmetry: All cluster services and functionalities are symmetric to all nodes and all users, except those protected by access rights.

    • Location transparency: The user is not aware of the whereabouts of the physical device that eventually provides a service.

  • Cluster nodes

    • home node

    • local node

    • remote nodes

  • The illusion of an SSI can be obtained at several layers: application software layer, hardware or kernel layer, middleware layer.
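The slide lists the three node roles without defining them. The minimal Python sketch below follows one common reading, which is an assumption here rather than something the slide states: a process's home node is the node where it was created, its local node is the node where it currently resides, and every other node is remote. `Process` and `node_roles` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Process:
    pid: int
    home_node: str     # assumed: node where the process was created
    current_node: str  # assumed: node where the process currently resides


def node_roles(proc: Process, node: str) -> set:
    """Return the role(s) `node` plays for `proc`: home, local, or remote."""
    roles = set()
    if node == proc.home_node:
        roles.add("home")
    if node == proc.current_node:
        roles.add("local")
    if not roles:  # neither home nor local, so it is a remote node
        roles.add("remote")
    return roles


# A process created on n0 that has since migrated to n2.
p = Process(pid=42, home_node="n0", current_node="n2")
for n in ("n0", "n1", "n2"):
    print(n, sorted(node_roles(p, n)))
```

Because the roles are defined per process, one physical node can be home for one process and remote for another; an SSI layer keeps this bookkeeping out of the user's view.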

Ch. 2-2 Computer Clusters

  • Single Entry Point

    • The single entry point enables users to log in to a cluster as one virtual host.

    • The system transparently distributes the user's login and connection requests to different physical hosts to balance the load.

    • Realizing a Single Entry Point in a Cluster of Computers

      • Fig. 2.13

    • Single File Hierarchy

      • From the viewpoint of any process, files can reside in three types of locations in a cluster, as shown in Fig. 2.14.

      • Stable storage has two requirements: it must be persistent and fault-tolerant.

      • Stable storage (global files) could be implemented as one large, centralized RAID disk, or it could be distributed across the local disks of the cluster nodes.

    • Single I/O Space over Distributed RAID for I/O-Centric Clusters

      • Fig. 2.16
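The single entry point can be sketched as a virtual host that hands each login to the least-loaded physical host. This is a minimal Python illustration, not the mechanism of Fig. 2.13: the class name `SingleEntryPoint`, the host names, and the use of active session counts as the load metric are all assumptions.

```python
class SingleEntryPoint:
    """Hypothetical virtual host that hides the physical login hosts behind one name."""

    def __init__(self, virtual_host, physical_hosts):
        self.virtual_host = virtual_host
        self.load = {h: 0 for h in physical_hosts}  # active sessions per host (assumed metric)
        self.placement = {}                          # user -> physical host

    def login(self, user):
        # Least-loaded host wins; ties broken by host name so the choice is deterministic.
        host = min(self.load, key=lambda h: (self.load[h], h))
        self.load[host] += 1
        self.placement[user] = host
        return host

    def logout(self, user):
        host = self.placement.pop(user)
        self.load[host] -= 1


entry = SingleEntryPoint("cluster.example.org", ["n1", "n2", "n3"])
for user in ("alice", "bob", "carol"):
    print(user, "->", entry.login(user))  # spread across n1, n2, n3
entry.logout("bob")
print("dave ->", entry.login("dave"))     # reuses the host bob freed
```

Real clusters may instead rely on DNS round-robin or a dedicated load balancer; the point is only that users address one virtual name while placement happens behind it.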


  • RAID
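The slides do not say which RAID level backs stable storage. As an illustration, assuming a parity-based scheme such as RAID 5, where each stripe carries one XOR parity block, the following Python sketch shows how the block lost with one failed disk (or one failed node, in a distributed RAID) is rebuilt from the survivors. `xor_blocks` and `reconstruct` are hypothetical helpers.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (used both to compute parity and to rebuild)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing data block of a stripe from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# One stripe spread over three data disks (or nodes) plus a parity block.
stripe = [b"node", b"disk", b"raid"]
parity = xor_blocks(stripe)

# The disk holding stripe[1] fails; its contents are recovered from the others.
recovered = reconstruct([stripe[0], stripe[2]], parity)
print(recovered)
```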

2.3.2 High Availability through Redundancy

  • When designing robust, highly available systems, three terms are often used together: reliability, availability, and serviceability (RAS).

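    • Reliability: measures how long a system can operate without a breakdown.

    • Availability: the percentage of time the system is available to the user.

    • Serviceability: how easy it is to service the system, including maintenance, repair, and upgrades.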


    • Availability and Failure Rate

      • Availability = MTTF / (MTTF + MTTR)

      • MTTF (mean time to failure)

      • MTTR (mean time to repair)

    • Planned vs. Unplanned Failures

    • Transient vs. Permanent Failures

    • Partial vs. Total Failures

      • Single points of failure in an SMP and in clusters of computers, Fig. 2.19.

    • Redundancy Techniques

      • Table 2.5 Availability of Computer System Types

    • Isolated Redundancy

      • When a component (the primary component) fails, the service it provided is taken over by another component (the backup component).

      • The primary and the backup components should be isolated from each other.

      • Benefits

        • There is no single point of failure.

        • A failed component can be repaired while the rest of the system is operating.

        • The primary and backup components can test and debug each other.
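The availability formula and the isolated-redundancy idea combine into a short worked example. The Python sketch below computes availability from MTTF and MTTR, then estimates the availability of a primary/backup pair under two simplifying assumptions not stated on the slides: the components fail independently, and failover is instantaneous, so the service is down only when every copy is down at once. The MTTF/MTTR figures are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24


def availability(mttf_hours, mttr_hours):
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def redundant_availability(a, copies=2):
    """Availability of `copies` isolated replicas, assuming independent failures and
    instantaneous failover: the service is down only if every replica is down."""
    return 1 - (1 - a) ** copies


def downtime_hours_per_year(a):
    return (1 - a) * HOURS_PER_YEAR


single = availability(mttf_hours=990, mttr_hours=10)  # illustrative figures only
pair = redundant_availability(single, copies=2)
for label, a in (("single", single), ("primary+backup", pair)):
    print(f"{label}: {a:.4%} available, {downtime_hours_per_year(a):.2f} h/year down")
```

With MTTF = 990 h and MTTR = 10 h, a single component is available 99% of the time, about 87.6 hours of downtime per year; under the stated assumptions an isolated backup raises this to about 99.99%, under one hour per year.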


    • N-Version Programming to Enhance Software Reliability

      • The software is implemented by N isolated teams who may not even know the others exist.

      • Different teams are asked to implement the software using different algorithms, programming languages, environment tools, and even platforms.

      • In a fault-tolerant system, the N versions all run simultaneously and their results are constantly compared. If the results differ, the system is notified that a fault has occurred.
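A minimal Python sketch of the comparison step in N-version programming. Three independently written integer square root routines stand in for the N versions (N = 3 here); every version runs on the same input, and any disagreement triggers a fault notification through a callback. The versions run one after another for simplicity, whereas a real system would run them simultaneously, ideally on isolated hardware. Returning the majority value is a common extension assumed here, not something the slide specifies, and all function names are hypothetical.

```python
import math
from collections import Counter


# Three independently written versions of the same specification: floor(sqrt(n)).
def isqrt_library(n):
    return math.isqrt(n)


def isqrt_newton(n):
    """Integer Newton iteration."""
    if n < 2:
        return n
    x, y = n, (n + 1) // 2
    while y < x:
        x, y = y, (y + n // y) // 2
    return x


def isqrt_bisect(n):
    """Binary search for the largest r with r * r <= n."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def run_n_versions(versions, arg, notify_fault):
    """Run every version on `arg`, compare results, and report any disagreement.
    Returns the majority result, or None if no strict majority exists."""
    results = [version(arg) for version in versions]  # sequential here; concurrent in practice
    if len(set(results)) > 1:
        notify_fault(arg, results)
    value, votes = Counter(results).most_common(1)[0]
    return value if votes > len(versions) // 2 else None


faults = []
versions = [isqrt_library, isqrt_newton, isqrt_bisect]
print(run_n_versions(versions, 10, lambda arg, res: faults.append((arg, res))))
```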

2.3.3 Fault-Tolerant Cluster Configurations

    • Three ascending levels of availability

      • Hot standby server clusters

      • Active-takeover clusters

      • Failover clusters

        • System failover must provide several functions: failure diagnosis, failure notification, and failure recovery.

    • Recovery Scheme

      • Backward recovery

        • Checkpoint

        • Rollback
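Backward recovery with checkpoints can be sketched as a loop that saves its state every `interval` steps and, on a failure, restores the last checkpoint and re-executes from there. In this minimal Python illustration a deep copy in memory stands in for stable storage, failures are injected at given step numbers and treated as transient (they do not recur after the rollback), and `run_with_backward_recovery` and `add_index` are hypothetical names.

```python
import copy


def run_with_backward_recovery(state, steps, step_fn, interval, fail_at=()):
    """Run step_fn for `steps` steps, checkpointing every `interval` steps.
    When a (simulated) failure hits step i, roll back to the last checkpoint and
    re-execute from there. Returns (final_state, number_of_rollbacks)."""
    checkpoint = (0, copy.deepcopy(state))  # (next step, saved state); stands in for stable storage
    pending_failures = set(fail_at)         # injected faults, assumed transient
    rollbacks = 0
    i = 0
    while i < steps:
        if i in pending_failures:
            pending_failures.discard(i)     # a transient fault does not recur on re-execution
            i, state = checkpoint[0], copy.deepcopy(checkpoint[1])  # rollback
            rollbacks += 1
            continue
        state = step_fn(state, i)
        i += 1
        if i % interval == 0:
            checkpoint = (i, copy.deepcopy(state))  # take a checkpoint
    return state, rollbacks


def add_index(state, i):
    """Example step: accumulate the step index."""
    state["total"] += i
    return state


final, n_rollbacks = run_with_backward_recovery(
    {"total": 0}, steps=10, step_fn=add_index, interval=4, fail_at={6}
)
print(final, n_rollbacks)
```

The checkpoint interval is a trade-off: frequent checkpoints cost time during normal operation, while infrequent ones mean more work is redone after each rollback.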


2.4 Cluster Job and Resource Management

2.4.1 Cluster Job Scheduling Methods

    • Cluster jobs may be scheduled to run at a specific time (calendar scheduling) or when a particular event happens (event scheduling).

    • Table 2.6 Job Scheduling Issues and Schemes for Cluster Nodes

    • Space Sharing

      • Multiple jobs can run simultaneously on disjoint partitions of nodes.

      • At most one process is assigned to a node at a time.

      • Job Scheduling by Tiling over Cluster Nodes, Fig. 2.22

    • Time Sharing

      • Independent scheduling (local scheduling)

      • Gang scheduling

        • The gang scheduling scheme schedules all processes of a parallel job together.

        • When one process is active, all processes are active.

      • Competition with foreign jobs
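Space sharing and gang scheduling can be illustrated together with an Ousterhout-style matrix, a standard textbook representation assumed here rather than taken from Fig. 2.22: each column is a node, each row is a time slot, and all processes of a job are placed in the same row. Within a row, jobs occupy disjoint nodes (space sharing); cycling through the rows time-shares the cluster, and because a job lives in a single row, all of its processes are active together (gang scheduling). The first-fit placement policy and the function names are assumptions.

```python
def build_gang_matrix(jobs, num_nodes):
    """Place jobs into an Ousterhout-style matrix (rows = time slots, columns = nodes).
    jobs: list of (job_id, num_processes). All processes of a job go into the same row,
    one process per node, using first fit over existing rows."""
    rows = []
    for job_id, size in jobs:
        if size > num_nodes:
            raise ValueError(f"job {job_id} needs {size} nodes; only {num_nodes} exist")
        for row in rows:
            free = [node for node, owner in enumerate(row) if owner is None]
            if len(free) >= size:
                break  # first row with enough free nodes
        else:
            row = [None] * num_nodes  # no row fits: open a new time slot
            rows.append(row)
            free = list(range(num_nodes))
        for node in free[:size]:
            row[node] = job_id
    return rows


def running_jobs(rows, time_slot):
    """Jobs active in a time slot; slots cycle round-robin through the rows."""
    row = rows[time_slot % len(rows)]
    return {owner for owner in row if owner is not None}


matrix = build_gang_matrix([("A", 3), ("B", 2), ("C", 4), ("D", 1)], num_nodes=4)
for t in range(4):
    print(f"slot {t}: {sorted(running_jobs(matrix, t))}")
```

For jobs A (3 processes), B (2), C (4), and D (1) on 4 nodes, first fit produces three rows: A and D share the first slot, B runs alone in the second (leaving two nodes idle), and C fills the third.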


2.4.2 Cluster Job Management Systems

    • A Job Management System (JMS) should have three parts:

      • user server

      • job scheduler

      • resource manager: allocates and monitors resources, enforces scheduling policies, and collects accounting information

    • JMS Administration

    • Cluster Job Types

    • Characteristics of a Cluster Workload

      • Workload characteristics based on experience with the NAS benchmarks; see p. 108.

    • Migration Schemes

      • Node availability

      • Migration overhead

      • Recruitment threshold

        • The recruitment threshold is the amount of time a workstation stays unused before the cluster considers it an idle node.
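The recruitment threshold translates directly into a predicate over each workstation's unused time. A minimal Python sketch, assuming each workstation reports a timestamp of its owner's last activity; the `Workstation` record, `idle_nodes`, and the 300-second threshold in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workstation:
    name: str
    last_owner_activity: float  # assumed: timestamp (seconds) of the owner's last input


def idle_nodes(workstations, now, recruitment_threshold):
    """Names of workstations unused for at least `recruitment_threshold` seconds."""
    return [
        ws.name
        for ws in workstations
        if now - ws.last_owner_activity >= recruitment_threshold
    ]


# With a hypothetical 300 s threshold, ws1 (400 s unused) and ws3 (exactly 300 s unused)
# are recruited as idle nodes; ws2 (100 s unused) is not.
pool = [
    Workstation("ws1", 600.0),
    Workstation("ws2", 900.0),
    Workstation("ws3", 700.0),
]
recruited = idle_nodes(pool, now=1000.0, recruitment_threshold=300.0)
print(recruited)
```

A lower threshold recruits nodes sooner but raises the chance that the owner returns and the guest job must be moved away, which ties back to the migration overhead listed above.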

2.4.3 Load Sharing Facility for Cluster Computing
