
VIRTUALISATION OF HADOOP CLUSTERS

This presentation explores the virtualisation of Hadoop clusters using the Xen hypervisor. It outlines the challenges, implementation steps and advantages of virtualising Hadoop, and suggests enhancements for load balancing, snapshot creation and intelligent monitoring.





Presentation Transcript


  1. VIRTUALISATION OF HADOOP CLUSTERS Dr G Sudha Sadasivam, Assistant Professor, Department of CSE, PSGCT

  2. Introduction • A physical machine can host a number of smaller virtual machines (VMs), each running a separate operating system instance. • Challenges • Partitioning of a machine • Concurrent execution of multiple operating systems • Isolation of virtual machines from one another • Support for heterogeneous applications • Low performance overhead • Xen is a virtual machine monitor (hypervisor) for x86 that supports concurrent execution of multiple guest operating systems. Layers: hypervisor, kernel and user-space applications

  3. Objective • Automate the creation and deletion of a virtual cluster for hosting Hadoop using Xen. • A large physical cluster can be simulated on a few physical machines. Steps • Input the user configuration by editing configuration files. • Generate the user-specified number of VMs running Hadoop. • Users can manage the Hadoop file system. • Users can submit jobs for each physical machine.
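The first two steps (edit a configuration file, then generate the requested number of VMs) can be sketched as follows. This is a minimal illustration, not the author's tool: the INI-style `[cluster]` section and its keys `num_vms`, `memory_mb`, `name_prefix` and `base_ip` are hypothetical, chosen only to show how a user-edited file could be expanded into one definition per guest.

```python
import configparser
import ipaddress
from dataclasses import dataclass


@dataclass
class VMSpec:
    """One guest VM to be created for the virtual Hadoop cluster."""
    name: str
    memory_mb: int
    ip: str


def plan_virtual_cluster(config_text: str) -> list[VMSpec]:
    """Expand a user-edited [cluster] section into one VMSpec per guest."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["cluster"]
    count = section.getint("num_vms")
    memory = section.getint("memory_mb")
    prefix = section.get("name_prefix")
    first_ip = ipaddress.IPv4Address(section.get("base_ip"))
    # Guests get consecutive names and consecutive IP addresses.
    return [
        VMSpec(name=f"{prefix}{i + 1}", memory_mb=memory, ip=str(first_ip + i))
        for i in range(count)
    ]


# Hypothetical configuration file that a user would edit (assumed format).
SAMPLE_CONFIG = """
[cluster]
num_vms = 3
memory_mb = 512
name_prefix = hadoop-slave
base_ip = 192.168.1.10
"""

if __name__ == "__main__":
    for spec in plan_virtual_cluster(SAMPLE_CONFIG):
        print(spec)
```

Keeping the planning step separate from VM creation means the plan can be inspected before anything is started on the host.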

  4. Need for virtualisation • Ability to recover quickly from software problems by saving a copy of the guest image. • High availability by relocating guests when a server machine is inoperable. • Dynamic load balancing by migrating guests between server machines. • Consolidation of many services on one physical machine, each administered independently in its own VM. • Use of the abundant computational power of the physical machine, minimising cost. • Switching between applications on different operating systems using the hypervisor.
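Dynamic load balancing by migrating guests needs a rule for which guest to move where. The sketch below uses a simple greedy heuristic of our own choosing (not taken from the slides): repeatedly move the lightest guest from the busiest host to the idlest host while doing so narrows the gap between them. Host names, guest names and load figures are hypothetical; `xl migrate <domain> <host>` is the Xen toolstack's live-migration command, run on the source host.

```python
def plan_migrations(placement: dict[str, dict[str, float]],
                    max_moves: int = 10) -> list[tuple[str, str, str]]:
    """Greedily move guests from the busiest host to the idlest host.

    placement maps host name -> {guest name: load}. Returns a list of
    (guest, source host, destination host) moves; placement is not modified.
    """
    hosts = {host: dict(guests) for host, guests in placement.items()}
    moves = []
    for _ in range(max_moves):
        load = {host: sum(guests.values()) for host, guests in hosts.items()}
        src = max(load, key=load.get)
        dst = min(load, key=load.get)
        if not hosts[src]:
            break
        # Try the lightest guest on the busiest host.
        guest = min(hosts[src], key=hosts[src].get)
        # Stop once a move would no longer narrow the gap between the two hosts.
        if load[dst] + hosts[src][guest] >= load[src]:
            break
        hosts[dst][guest] = hosts[src].pop(guest)
        moves.append((guest, src, dst))
    return moves


if __name__ == "__main__":
    placement = {
        "host1": {"vm1": 40, "vm2": 30, "vm3": 20},
        "host2": {"vm4": 10},
    }
    for guest, src, dst in plan_migrations(placement):
        # Printed only; each line is the live-migration command to run on src.
        print(f"on {src}: xl migrate {guest} {dst}")
```

For the sample placement (loads 90 and 10) the plan moves vm3 and vm2 to host2 and then vm4 back to host1, ending at 50 and 50. A real balancer would also weigh migration cost, which this sketch ignores.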

  5. HADOOP CLUSTER CONFIGURATION • The host node is configured as the master (NameNode, NN) and also acts as a slave (DataNode, DN). • Each guest node is configured as a slave (DN).

  6. • The master is the host OS, which acts as JobTracker and NameNode. • Each slave is a guest OS, which acts as TaskTracker and DataNode.
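The role layout above maps onto a handful of Hadoop configuration files. The sketch below assumes a Hadoop 1.x installation (JobTracker/TaskTracker era), where `fs.default.name` in `core-site.xml` names the NameNode, `mapred.job.tracker` in `mapred-site.xml` names the JobTracker, and the `slaves` file lists the DataNode/TaskTracker hosts. Ports 9000 and 9001 and the host names are assumptions; the host appears in `slaves` because the host node also acts as a DataNode.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties: dict[str, str]) -> str:
    """Render a Hadoop *-site.xml <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def cluster_config(master: str, guests: list[str]) -> dict[str, str]:
    """Map file name -> contents for a host-as-master virtual cluster."""
    return {
        # The host OS runs the NameNode and the JobTracker (ports assumed).
        "core-site.xml": hadoop_site_xml({"fs.default.name": f"hdfs://{master}:9000"}),
        "mapred-site.xml": hadoop_site_xml({"mapred.job.tracker": f"{master}:9001"}),
        # Slaves: the host itself plus every guest VM.
        "slaves": "\n".join([master, *guests]) + "\n",
    }


if __name__ == "__main__":
    for name, text in cluster_config("hostos", ["guest1", "guest2"]).items():
        print(f"--- {name}\n{text}")
```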

  7. Steps in implementing • Installation of the Xen kernel • Creation of a guest OS • Configuration of the guest OS • Installation of the Java Development Kit • Extraction and configuration of the Hadoop cluster • Creation of an OS image for new guest machines • Creation and removal of other virtual machines by copying the OS images
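The last step, creating further VMs by copying a prepared OS image, could look roughly like this. It is a sketch under stated assumptions: paravirtualised guests booted with `pygrub`, a file-backed disk written in xl's keyword disk syntax, and a host bridge named `xenbr0`; image paths, guest names and sizes are hypothetical. Each generated `.cfg` would then be started with `xl create <file>`.

```python
import shutil
import tempfile
from pathlib import Path


def xen_guest_config(name: str, image: Path, memory_mb: int = 512,
                     vcpus: int = 1, bridge: str = "xenbr0") -> str:
    """Render a minimal xl domain configuration for a paravirtualised guest."""
    lines = [
        f'name = "{name}"',
        f"memory = {memory_mb}",
        f"vcpus = {vcpus}",
        'bootloader = "pygrub"',  # boot the kernel stored inside the image
        # File-backed root disk; target must come last in the keyword form.
        f"disk = ['format=raw, vdev=xvda, access=rw, target={image}']",
        f"vif = ['bridge={bridge}']",  # one NIC attached to the host bridge
    ]
    return "\n".join(lines) + "\n"


def clone_guest(base_image: Path, image_dir: Path, name: str) -> tuple[Path, Path]:
    """Copy the prepared OS image for a new guest and write its config file."""
    image_dir.mkdir(parents=True, exist_ok=True)
    image = image_dir / f"{name}.img"
    shutil.copyfile(base_image, image)
    config = image_dir / f"{name}.cfg"
    config.write_text(xen_guest_config(name, image))
    return image, config


if __name__ == "__main__":
    # Demonstrate in a throwaway directory instead of real Xen image paths.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "base.img"
        base.write_bytes(b"placeholder disk image")
        _, cfg = clone_guest(base, Path(tmp) / "guests", "guest1")
        print(cfg.read_text())
```

Copying a whole image per guest is simple but costs disk space and time; copy-on-write images would be an alternative design, not shown here.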

  8. Automated Creation of a Hadoop Virtual Cluster • An XML file holds the configuration details of each new VM
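The slide does not show the XML schema, so the element and attribute names below (`<cluster>`, `<vm name memory vcpus image>`) are hypothetical. The sketch only illustrates reading such a file with the standard-library ElementTree parser and applying defaults for omitted fields.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class GuestDefinition:
    """Configuration details of one new VM, as read from the XML file."""
    name: str
    memory_mb: int
    vcpus: int
    image: str


def read_cluster_xml(xml_text: str) -> list[GuestDefinition]:
    """Parse <vm> elements; memory and vcpus fall back to assumed defaults."""
    root = ET.fromstring(xml_text.strip())
    guests = []
    for vm in root.findall("vm"):
        guests.append(GuestDefinition(
            name=vm.get("name"),
            memory_mb=int(vm.get("memory", "512")),
            vcpus=int(vm.get("vcpus", "1")),
            image=vm.get("image"),
        ))
    return guests


# Hypothetical cluster description (schema assumed for illustration).
SAMPLE_XML = """
<cluster>
  <vm name="guest1" memory="1024" vcpus="2" image="/var/lib/xen/images/guest1.img"/>
  <vm name="guest2" memory="512" image="/var/lib/xen/images/guest2.img"/>
</cluster>
"""

if __name__ == "__main__":
    for guest in read_cluster_xml(SAMPLE_XML):
        print(guest)
```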

  9. Automated Shutdown of a Hadoop Virtual Cluster
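An automated shutdown mainly has to get the order right: stop Hadoop before powering off the guests that run its DataNodes and TaskTrackers. The sketch assumes a Hadoop 1.x layout, where `bin/stop-all.sh` on the master stops the cluster daemons, and uses `xl shutdown -w` to wait for each guest to stop. The `/opt/hadoop` path and guest names are hypothetical, and the default is a dry run that only prints the commands.

```python
import subprocess


def shutdown_plan(hadoop_home: str, guests: list[str]) -> list[list[str]]:
    """Commands to stop Hadoop, then shut down each guest VM in turn."""
    # Stop the Hadoop daemons first so no DataNode/TaskTracker is cut off mid-job.
    plan = [[f"{hadoop_home}/bin/stop-all.sh"]]
    # Then shut down every guest; -w waits until the guest has actually stopped.
    plan += [["xl", "shutdown", "-w", guest] for guest in guests]
    # The host (Dom0, which is also the master) is deliberately left running.
    return plan


def run_plan(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them in order and stop on the first failure."""
    for argv in plan:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_plan(shutdown_plan("/opt/hadoop", ["guest1", "guest2"]))
```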

  10. Advantages of automated virtualisation in Hadoop • Isolating the DataNode from the load that other processes place on the machine makes the DataNode more responsive and reliable. • Having multiple virtual machines on each physical machine lowers the granularity of scheduling units, making it possible to run multiple TaskTrackers on the same machine and improve the overall utilisation of the whole cluster. • A snapshot of a virtual cluster makes it possible to reactivate the same cluster later and resume work from the snapshot (rollback).
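A snapshot-and-rollback cycle for the guests could be driven with the Xen toolstack's `xl save -c` (write a checkpoint and keep the domain running) and `xl restore`. This is a sketch with hypothetical guest names and checkpoint directory. Two caveats apply: `xl save` captures a guest's memory and device state but not its disk image, so a full rollback of HDFS data would also need copies of the disk images; and the guests are saved one after another, so the result is not a single cluster-wide consistent snapshot.

```python
def snapshot_commands(guests: list[str], checkpoint_dir: str) -> list[list[str]]:
    """Save each guest's state to a checkpoint file while keeping it running."""
    # -c leaves the domain running after the checkpoint has been written.
    return [["xl", "save", "-c", g, f"{checkpoint_dir}/{g}.chk"] for g in guests]


def rollback_commands(guests: list[str], checkpoint_dir: str) -> list[list[str]]:
    """Discard each running guest, then restore it from its checkpoint file."""
    plan = []
    for g in guests:
        # The running instance must go first, or the restored domain's name clashes.
        plan.append(["xl", "destroy", g])
        plan.append(["xl", "restore", f"{checkpoint_dir}/{g}.chk"])
    return plan


if __name__ == "__main__":
    for argv in snapshot_commands(["guest1", "guest2"], "/srv/checkpoints"):
        print(" ".join(argv))
```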

  11. Enhancements • A graphical console for monitoring and managing the virtual cluster. • Creation and migration of virtual machines for load balancing. • Snapshots of virtual machines for checkpointing. • An intelligent monitoring system that detects the failure of a virtual machine in the cluster and restarts that virtual machine, increasing reliability.
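The proposed monitoring system could start from one check: compare the guests that should exist with what `xl list` reports, and restart any that are missing, crashed or shut down. The sketch below assumes the usual `xl list` columns (Name, ID, Mem, VCPUs, State, Time(s)) with `c` marking a crashed domain and `s` a shut-down one; the sample output, guest names and `/etc/xen` config directory are illustrative. Only `poll_xl_list` touches a real system.

```python
import subprocess


def classify_guests(expected: list[str], xl_list_output: str) -> dict[str, str]:
    """Return {guest: reason} for expected guests that are not healthy."""
    listed = {}
    for line in xl_list_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 5:
            listed[fields[0]] = fields[4]  # name -> state flags, e.g. "-b----"
    failed = {}
    for guest in expected:
        state = listed.get(guest)
        if state is None:
            failed[guest] = "missing"
        elif "c" in state:
            failed[guest] = "crashed"
        elif "s" in state:
            failed[guest] = "shutdown"
    return failed


def restart_commands(failed: dict[str, str], config_dir: str) -> list[list[str]]:
    """Destroy any stale domain still listed, then recreate it from its config."""
    plan = []
    for guest, reason in failed.items():
        if reason != "missing":
            plan.append(["xl", "destroy", guest])
        plan.append(["xl", "create", f"{config_dir}/{guest}.cfg"])
    return plan


def poll_xl_list() -> str:
    """Capture the current `xl list` output (requires a Xen host)."""
    return subprocess.run(["xl", "list"], capture_output=True, text=True,
                          check=True).stdout


# Illustrative output; guest2 is crashed and guest3 is absent.
SAMPLE_XL_LIST = """Name  ID   Mem VCPUs State  Time(s)
Domain-0  0  1024  2  r-----  100.0
guest1    1   512  1  -b----   10.0
guest2    2   512  1  ----c-    5.0
"""

if __name__ == "__main__":
    failed = classify_guests(["guest1", "guest2", "guest3"], SAMPLE_XL_LIST)
    for argv in restart_commands(failed, "/etc/xen"):
        print(" ".join(argv))
```

Running this check periodically (for example from a loop or a cron job) and executing the returned commands would give the restart behaviour the slide describes; detecting a hung but still "running" guest would need an application-level probe as well.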

  12. Performance of Physical vs Virtual clusters

  13. Master as a Physical Node • 7 nodes • Data nodes: 6 virtual nodes • Name node: 1 physical node

  14. Master as a Virtual Node • 7 nodes • Data nodes: 1 physical node + 5 virtual nodes • Name node: 1 virtual node

  15. Performance with varying number of Virtual nodes
