Container Practices at IHEP

Wei Zheng, Computer Center, IHEP, CAS. 2019-4-3. Contents: IHEP Introduction; Container Practices; Container Orchestration with Kubernetes; Container Security; Next Work; Summary.

Presentation Transcript


  1. Container Practices at IHEP. Wei Zheng, Computer Center, IHEP, CAS. 2019-4-3

  2. Contents • IHEP Introduction • Container Practices • Container Orchestration with Kubernetes • Container Security • Next Work • Summary ISGC 2019

  3. Introduction to IHEP Computing Platform • IHEP: Institute of High Energy Physics, Chinese Academy of Sciences, the largest fundamental research center in China • IHEP Computing Center: provides network and computing services to HEP experiments: BEPCII/BESIII, CEPC, LHAASO, CSNS, JUNO, DYB

  4. IHEP Local Cluster • Computing • HTC computing • ~10,000 CPU cores • Job slot utilization: >85%, 11.7 million jobs (2018.12-2019.3) • HPC computing • 125 worker nodes: 2,808 CPU cores • 10 GPU worker nodes, 0.5 PFLOPS • 80 GPU cards: NVIDIA Tesla V100 NVLink, 32 GB

  5. IHEP Local Cluster • Login nodes • 30+ login nodes shared by all users • More than 200 active users • Storage • Lustre: 10 instances, 10 PB in total, 67% used • EOS: 2 instances, 3.5 PB capacity

  6. Remote site clusters • Remote sites of BESIII distributed computing • Small-scale clusters • IHEP staff are responsible for unified operation and maintenance

  8. Motivation • More software and services are released in containerized versions • Preserve the runtime environments of old physics software • Improve resource utilization by scheduling container jobs to run on remote sites or in the cloud • Automate deployment and scaling of services through container orchestration • …

  9. Container performance - disk I/O • Disk performance of the Lustre junofs file system inside a container • IOzone benchmark: container I/O performance loss is less than 2%

  10. Container performance - job running • JUNO data processing on bare metal, in a VM, and in Docker • Compared with a VM, a Docker container makes better use of resources: Docker performance loss (PL) is 0.5%~2%, VM PL is 3%~5%
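
The slides quote performance loss (PL) figures without stating the formula. A minimal sketch, assuming PL means the relative increase in wall-clock time over the bare-metal run; the function name and the timing values below are hypothetical and are not measurements from the talk:

```python
def performance_loss(baseline_seconds: float, measured_seconds: float) -> float:
    """Return the relative slowdown versus the baseline, in percent.

    Assumed definition: PL = (t_measured - t_baseline) / t_baseline * 100.
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (measured_seconds - baseline_seconds) / baseline_seconds * 100.0


# Hypothetical wall-clock times in seconds for the same job (illustration only).
bare_metal_time = 1000.0
docker_time = 1015.0
vm_time = 1040.0

docker_pl = performance_loss(bare_metal_time, docker_time)
vm_pl = performance_loss(bare_metal_time, vm_time)
print(f"Docker PL: {docker_pl:.1f}%, VM PL: {vm_pl:.1f}%")
```

The same function applies to throughput benchmarks such as IOzone if the arguments are swapped so that a lower throughput yields a positive loss.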

  11. Container images create/build • Docker image • Create an image with the mkimage.sh script • Build an image from a custom Dockerfile • Download from Docker Hub/Docker Cloud • Use the officially provided image • Singularity image • Download from shub/Docker Hub • Build from custom Singularity recipe files • Each image is built as both a writable and a read-only image

  12. Container images create/build

      Image script:

          [root@bws0780 ~]# bash mkimage.sh -p yum -g "Base" SL75

      Dockerfile:

          FROM slc65-base
          MAINTAINER zhengwei
          # Install build tools and development libraries in a single layer
          RUN yum install -y make gcc-c++ gcc binutils \
              && yum install -y libX11-devel libXpm-devel libXft-devel libXext-devel \
              && yum install -y mesa-libGL-devel ftgl-devel mysql-devel \
              && yum install -y fftw-devel graphviz-devel \
              && yum install -y avahi-compat-libdns_sd-devel python-devel \
              && yum install -y libxml2-devel gsl-static gsl-devel \
              && yum install -y qt-devel && yum clean all
          CMD /bin/bash

      Singularity recipe file:

          Singularity> cat Singularity-SL55Base
          BootStrap: yum
          OSVersion: 5.5
          MirrorURL: http://mirror.ihep.ac.cn/slc/slc55/x86_64/SL
          UpdateURL: http://mirror.ihep.ac.cn/slc/slc55/x86_64/updates/RPMS
          Include: yum
          %setup…………

  13. Container image types • Operating system: SL7.X, SL6.X, SL5.5 • Login/worker node: LoginNode SL75/65/69/55, WorkerNode SL65/69/55/58 • Physics software: BES, JUNO, LHAASO, … • Services: MySQL, MonitorAgent, Apache, Grafana, …

  14. Container image storage • Docker images • IHEP private Docker registry • AFS/CVMFS • Singularity images • AFS/CVMFS

          [root@mirror SL55]# ll -th
          -rwxr-xr-x 1 root root 8.2G Feb 28 15:30 WorkNode55-writable-20190227.img
          -rwxr-xr-x 1 root root 2.0G Feb 27 09:39 WorkNode55-onlyread-20190227.img
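
The listing on this slide suggests a file naming scheme of the form name-mode-YYYYMMDD.img. A minimal sketch of how a tool might pick the newest image of a given kind, assuming that scheme holds for all images; the pattern, the function name, and the third file name in the example are hypothetical:

```python
import re
from typing import Iterable, Optional

# Assumed naming scheme, inferred from the two file names on the slide:
#   <name>-<writable|onlyread>-<YYYYMMDD>.img
IMAGE_PATTERN = re.compile(
    r"^(?P<name>.+)-(?P<mode>writable|onlyread)-(?P<date>\d{8})\.img$"
)


def latest_image(filenames: Iterable[str], name: str, mode: str) -> Optional[str]:
    """Return the newest matching image file name, or None if there is none."""
    candidates = []
    for filename in filenames:
        match = IMAGE_PATTERN.match(filename)
        if match and match["name"] == name and match["mode"] == mode:
            # YYYYMMDD strings sort chronologically when compared as text.
            candidates.append((match["date"], filename))
    return max(candidates)[1] if candidates else None


files = [
    "WorkNode55-writable-20190227.img",
    "WorkNode55-onlyread-20190227.img",
    "WorkNode55-onlyread-20190101.img",  # hypothetical older build
]
print(latest_image(files, "WorkNode55", "onlyread"))
```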

  15. Hep_container tool • Hep_container • A container tool developed for IHEP computing platform users • Based on Singularity; Docker support is planned • Meets users' various container requirements • The location and type of container images are transparent to the user • Container images are easy to update • Automatically mounts directories according to the user's group name • Lustre/EOS/AFS/CVMFS • besfs/AFS/CVMFS/… • Supports the IHEP and PKU sites
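
The slides do not show how Hep_container is implemented. The group-based mounting idea can be sketched as follows, assuming the tool wraps `singularity exec` with one `--bind` option per directory; the group names, paths, image path, and function name are all hypothetical:

```python
import shlex
from typing import List

# Hypothetical mapping from user group to directories to bind into the
# container. The slide names the file systems but not the actual paths.
GROUP_MOUNTS = {
    "juno": ["/junofs", "/afs", "/cvmfs"],
    "bes": ["/besfs", "/afs", "/cvmfs"],
}
DEFAULT_MOUNTS = ["/afs", "/cvmfs"]


def build_singularity_command(group: str, image: str, command: List[str]) -> List[str]:
    """Build a `singularity exec` argument list with group-specific binds."""
    argv = ["singularity", "exec"]
    for path in GROUP_MOUNTS.get(group, DEFAULT_MOUNTS):
        argv += ["--bind", path]  # bind the host path at the same location
    argv.append(image)
    argv += command
    return argv


# Hypothetical image path; the command is only printed, not executed.
argv = build_singularity_command("bes", "/images/WorkNode65.img", ["ls", "/besfs"])
print(shlex.join(argv))
```

Returning an argument list rather than a shell string keeps quoting out of the picture if the result is later passed to `subprocess.run`.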

  16. Container with job scheduler • HTCondor Docker universe • A docker universe job runs a Docker container from an image • HTCondor manages the running container on an execute host like any other HTCondor job

          universe = docker
          docker_image = Juno-worknode65
          executable = /bin/cat
          arguments = /etc/hosts
          should_transfer_files = YES
          when_to_transfer_output = ON_EXIT
          output = out.$(Process)
          error = err.$(Process)
          log = log.$(Process)
          request_memory = 100M
          queue 1
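
When many similar container jobs are needed, the submit description above can be generated rather than written by hand. A minimal sketch that renders the same keys as plain text; the function name is hypothetical and the values are copied from the slide:

```python
from typing import Dict


def render_submit_description(settings: Dict[str, str], queue_count: int = 1) -> str:
    """Render key/value settings as an HTCondor submit description."""
    lines = [f"{key} = {value}" for key, value in settings.items()]
    lines.append(f"queue {queue_count}")
    return "\n".join(lines) + "\n"


docker_job = {
    "universe": "docker",
    "docker_image": "Juno-worknode65",
    "executable": "/bin/cat",
    "arguments": "/etc/hosts",
    "should_transfer_files": "YES",
    "when_to_transfer_output": "ON_EXIT",
    "output": "out.$(Process)",
    "error": "err.$(Process)",
    "log": "log.$(Process)",
    "request_memory": "100M",
}

print(render_submit_description(docker_job))
```

The rendered text can be saved to a file and passed to `condor_submit`.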

  18. Building a container login farm with Kubernetes • Login farm for LHAASO experiment computing • Login nodes are managed by OpenStack + Kubernetes • Kubernetes starts SL7 login-node containers on the OpenStack VM platform • Automatic dynamic expansion • kube-proxy for load balancing • The clustered login farm is more stable and highly available
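
The slide does not say which mechanism drives the automatic expansion. One common option is the Kubernetes Horizontal Pod Autoscaler, whose documented core rule is desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). A simplified sketch of that rule, assuming it applies here; it omits the autoscaler's tolerance band and stabilization logic, and the bounds and metric values are hypothetical:

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Apply the basic autoscaling ratio rule and clamp to the allowed range."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical example: 3 login containers at 90% average CPU, target 60%.
print(desired_replicas(3, 90.0, 60.0))
```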

  19. Login farm of LHAASO Kubernetes • OpenStack Queens (RDO) • Docker: v18.06 • Kubernetes nodes: 1 master and 1 HA, 3 worker nodes, v1.12.0 • Host OS: CentOS 7.6 • VM instance OS: Scientific Linux release 7.6 • Container OS: Scientific Linux release 7.5

  20. Dashboard of LHAASO Kubernetes

  22. Container security practice • Image security • Build our own base OS images • Pull only official images that are Docker Certified or come from a certified publisher • Offer read-only Singularity images to users • Scan images to detect and block containers with known vulnerabilities or malicious packages • Host security • Run containers as non-root users • Least privilege • Run with --privileged=true or --cap-add only when needed
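
The host security rules above lend themselves to a simple pre-flight check. A minimal sketch that scans a `docker run` argument list for the options named on the slide; it is not a tool from the talk, the function name and image name are hypothetical, and it does not distinguish Docker options from arguments of the containerized command:

```python
from typing import List


def audit_docker_run(argv: List[str]) -> List[str]:
    """Return least-privilege findings for a `docker run` argument list."""
    findings = []
    user = None
    for index, arg in enumerate(argv):
        if arg in ("--privileged", "--privileged=true"):
            findings.append("privileged mode requested")
        elif arg == "--cap-add" and index + 1 < len(argv):
            findings.append(f"extra capability: {argv[index + 1]}")
        elif arg.startswith("--cap-add="):
            findings.append(f"extra capability: {arg.split('=', 1)[1]}")
        elif arg in ("--user", "-u") and index + 1 < len(argv):
            user = argv[index + 1]
        elif arg.startswith("--user="):
            user = arg.split("=", 1)[1]
    if user is None:
        findings.append("no --user option: the image default (often root) applies")
    elif user.split(":", 1)[0] in ("root", "0"):
        findings.append("runs as root")
    return findings


# "sl7-login" is a hypothetical image name.
risky = ["docker", "run", "--privileged=true", "--cap-add", "SYS_ADMIN", "sl7-login"]
for finding in audit_docker_run(risky):
    print(finding)
```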

  23. Container security practice • Kubernetes security • Use namespaces to establish security boundaries • Update Kubernetes from 1.11.0 to the stable 1.12.0 to address the privilege escalation vulnerability CVE-2018-1002105 • Update Docker from v17.03 to v18.06 to address CVE-2019-5736

          [root@lhmtk8s01 ~]# kubectl get namespace
          NAME            STATUS   AGE
          default         Active   102d
          ingress-nginx   Active   7d23h
          kube-login      Active   90d
          kube-public     Active   102d
          kube-system     Active   102d

  25. Next Work • The cluster OS will be upgraded from SL6 to SL7 this year; SL6 and SL5 jobs will then run only in containers • BESIII software containers are being built and will be used on Tianhe-2, a fast national supercomputer in Guangzhou • Automatically schedule jobs to remote sites or the cloud to use their idle resources • More remote sites, such as USTC, SDU, and BUAA, will join • A Jupyter notebook platform will be offered

  27. Summary • Comparison tests show that the performance loss of containers is very small and lower than that of VMs • Support for a variety of customized container images for IHEP experiments • The Hep_container tool gives users a unified container portal that meets their customization needs and the needs of multiple sites • The LHAASO login nodes are containerized with Kubernetes, with simple load balancing and scaling • More jobs will run in containers

  28. Thank you for your attention!
