Blue Gene Bring Up



Presentation Transcript


  1. Blue Gene Bring Up

  2. Linux on Service Node • SuSE SLES 10 • A RAID array is recommended, typically either RAID1 or RAID5 depending on the number of disks available. • Either 1 or 2 volume groups depending on the disk configuration (rootvg and datavg).

  3. Linux on Service Node • Partitions • / - 1 GB - rootlv • /usr - 3 GB - usrlv • /var - 2 GB - varlv • /opt - 10 GB - optlv • /tmp - 10 GB - tmplv • swap - 4 GB - swaplv • /dbhome - 20 GB - dbhomelv • /bgsys - 10 GB - bgsyslv
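As a rough sizing aid, the partition list on this slide can be written down as data and totalled. The sketch below is illustrative only: the `lvcreate` lines assume every logical volume lives in a single volume group named `rootvg`, which the slides do not state (a two-group layout would place some volumes in `datavg`).

```python
# Logical volumes from the partition slide: (mount point, size in GB, LV name).
LAYOUT = [
    ("/", 1, "rootlv"),
    ("/usr", 3, "usrlv"),
    ("/var", 2, "varlv"),
    ("/opt", 10, "optlv"),
    ("/tmp", 10, "tmplv"),
    ("swap", 4, "swaplv"),
    ("/dbhome", 20, "dbhomelv"),
    ("/bgsys", 10, "bgsyslv"),
]


def total_gb(layout):
    """Return the combined size of all logical volumes in GB."""
    return sum(size for _, size, _ in layout)


def lvcreate_commands(layout, vg="rootvg"):
    """Build one lvcreate command string per logical volume.

    ASSUMPTION: all volumes go into the single volume group `vg`;
    the slides do not say which volume lives in which group.
    """
    return [f"lvcreate -L {size}G -n {lv} {vg}" for _, size, lv in layout]


print(f"Total space required: {total_gb(LAYOUT)} GB")
for command in lvcreate_commands(LAYOUT):
    print(command)
```

The commands are only printed, never executed, so the output can be reviewed against the actual disk configuration first.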

  4. Linux on Service Node • RPMs • cpp, gcc, libgcc, gcc-c++, gcc-64bit, glibc-devel, libgcc-64bit, bison, texinfo, flex, termcap, termcap-64bit, gcc-fortran, gmp, gmp-64bit, gmp-devel, gmp-devel-64bit, ncurses-devel, ncurses-devel-64bit • vacpp.rte-8.0.1-2.ppc64.rpm xlsmp.rte-1.6.1-3.ppc64.rpm xlsmp.msg.rte-1.6.1-3.ppc64.rpm • bgp_os, bgp_base, bgptoolchain • Interfaces • Functional network • Service network • Public network

  5. Linux on Service Node • Groups • db2rasdb • db2iadm1 • db2fadm1 • db2asgrp • bgpuser • bgpdeveloper • bgpadmin • bgpservice • Users • bgpsysdb • bgpdb2c • bgpadmin • mpirun • NFS • I/O nodes mount /bgsys to finish their boot process, so /bgsys is exported on the functional network via NFS

  6. Front End Node • Groups • bgpadmin • bgpservice • bgpdeveloper • bgpuser • Users • mpirun • Profile • /etc/profile.d/bgp.sh

  7. Group Roles (set using bguser.pl)

  8. DB2 Structure

  9. DB2 - Why use a Database? • Need a software representation of the hardware • A machine of such large scale requires a persistent means of storing errors (RAS events), job history, block definitions, environmental readings, etc. • Operational state of the machine can be obtained without touching the hardware

  10. Other Benefits of a Database • Setting values in the database can trigger actions in other components • Can simplify the design by having policy stored in the database itself via procedures, triggers, and constraints instead of the code • Information can be obtained using existing tools or SQL

  11. DB2 • Product Description • Restricted license • Enterprise Server Edition (ESE) • Client • Database Location • /dbhome/bgpsysdb • Instances • bgpsysdb (server) • bgpdb2c (client)

  12. DB2 concepts • Schema: The collection of database objects such as tables, views, indexes, and triggers that define the database. • Table: A named data object that consists of a specific number of columns and some unordered rows. • View: A logical table that consists of data that a query generates.

  13. DB2 Naming Guidelines for BG/P • Tables always start with TBGP, such as TBGPNodeCard or TBGPLinkCard • Names are NOT case sensitive in SQL • For each table, there is a view that has the more user-friendly columns, such as location, and omits the VPD • These views are named without the T, such as BGPNodeCard • Where some information is omitted from the view, there is also an extra view for diags, such as BGPNodeCardAll • If the view needs no derived or omitted columns, an alias is created instead • e.g. BGPClockCard • The net effect is that, almost all the time, using the “BGP” name will show you what you want • If a history is being kept, _history is added to the end of the table name
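The naming rules on this slide are mechanical enough to sketch as a small helper. The function names below are hypothetical and only derive candidate names; not every table has all of them (the "All" view exists only where the plain view omits information, and a history table only where history is kept).

```python
def related_names(table):
    """Derive the conventional companion names for a BG/P table name.

    The result lists candidates implied by the naming guidelines, not
    objects guaranteed to exist in the database.
    """
    if not table.upper().startswith("TBGP"):
        raise ValueError(f"not a BG/P table name: {table!r}")
    view = table[1:]  # drop the leading T
    return {
        "table": table,
        "view": view,                    # user-friendly view or alias
        "diag_view": view + "All",       # extra view for diags, if any
        "history": table + "_history",   # history table, if one is kept
    }


def same_name(a, b):
    """Compare two SQL names the way the slide describes: ignoring case."""
    return a.upper() == b.upper()


names = related_names("TBGPNodeCard")
print(names["view"], names["diag_view"], names["history"])
print(same_name("TBGPmidplane", "TBGPMidplane"))
```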

  14. BG/P Tables • TBGPBlock, TBGPBPBlockMap, TBGPSmallBlock, TBGPLinkBlockMap, TBGPProductType, TBGPMachine, TBGPMachineSubnet, TBGPMidplane, TBGPNodeCard, TBGPNode, TBGPServiceCard, TBGPLinkCard, TBGPClockCard, TBGPBulkPowerSupply, TBGPSwitch, TBGPCable, TBGPClockCable, TBGPLinkChip, TBGPICON, TBGPFanModule, TBGPJob, TBGPEthGateway, TBGPEGWMachineMap, TBGPPortBlockMap, TBGPBlockUsers, TBGPMidplaneSubnet, TBGPNodeSubnet, TBGPServiceAction, TBGPUserPrefs • TBGPReplacement_history, TBGPMachine_history, TBGPMidplane_history, TBGPNodeCard_history, TBGPNode_history, TBGPServiceCard_history, TBGPLinkCard_history, TBGPClockCard_history, TBGPLinkChip_history, TBGPIcon_history, TBGPFanModule_history, TBGPJob_history • TBGPServiceCardEnvironment, TBGPFanEnvironment, TBGPClockCardEnvironment, TBGPBULKPOWEREnvironment, TBGPNodeCardPOWEREnvironment, TBGPLinkCardPOWEREnvironment, TBGPSrvcCardPOWEREnvironment, TBGPLinkChipEnvironment, TBGPLinkCardEnvironment, TBGPNodeEnvironment, TBGPNodeCardEnvironment • TBGPEventLog, TBGPERRCodes, TBGPDiagRuns, TBGPDiagBlocks, TBGPDiagResults, TBGPDiagTests

  15. BG/P Views • BGPMidplane, BGPMidplaneAll, BGPNodeCard, BGPNodeCardAll, BGPNode, BGPNodeAll, BGPServiceCard, BGPServiceCardAll, BGPLinkCard, BGPLinkCardAll, BGPClockCardAll, BGPBulkPowerSupplyAll, BGPLinkChip, BGPLinkChipAll, BGPFanModule, BGPFanModuleAll, BGPLink, BGPClockCardEnvironment, BGPDiagTests, BGPNodeCardCount, BGPLinkCardCount, BGPServiceCardCount, BGPNodeCount, BGPBasePartition, BGPBPBlockStatus, BGPSwitchLinks, BGPLinkBlockStatus, BGPSwitchPort, BGPPortBlockStatus, BGPBlockSize

  16. Database setup • Database Populate: A Perl script that populates the database with the expected configuration for the Blue Gene system. • InstallServiceAction: Verifies that the predefined structure matches the actual configuration. • VerifyCables: Confirms that the torus network cabling is correct. • VerifyIpAddresses: Confirms that the I/O card IP addresses are correct.

  17. DB2/SQL examples • List all tables/views: list tables • Describe table/view: describe table TBGPmidplane • Extracting data: select * from TBGPmidplane • More complex: select a.position, count(isionode), a.status, a.seqid from tbgpnodecard a left outer join bgpnode b on b.midplanepos = a.midplanepos and b.nodecardpos = a.position and b.isionode = 'T' and b.status <> 'M' where a.midplanepos = 'R00-M0' group by a.position, a.status, a.seqid order by 1
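The "more complex" query keeps its I/O node filters in the ON clause of the left outer join, so a node card with no qualifying I/O nodes still appears with a count of 0. The sketch below shows that behaviour with Python's built-in sqlite3 module standing in for DB2. The tables are toy stand-ins that carry only the columns named in the query, and every row (positions N00 and N01, status 'A', the seqid values) is invented for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for DB2 (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbgpnodecard (midplanepos TEXT, position TEXT, status TEXT, seqid INTEGER);
CREATE TABLE bgpnode (midplanepos TEXT, nodecardpos TEXT, isionode TEXT, status TEXT);
-- Made-up sample rows: two node cards on midplane R00-M0.
INSERT INTO tbgpnodecard VALUES ('R00-M0', 'N00', 'A', 1);
INSERT INTO tbgpnodecard VALUES ('R00-M0', 'N01', 'A', 2);
INSERT INTO bgpnode VALUES ('R00-M0', 'N00', 'T', 'A');
INSERT INTO bgpnode VALUES ('R00-M0', 'N00', 'T', 'A');
INSERT INTO bgpnode VALUES ('R00-M0', 'N00', 'F', 'A');
INSERT INTO bgpnode VALUES ('R00-M0', 'N01', 'F', 'A');
INSERT INTO bgpnode VALUES ('R00-M0', 'N01', 'T', 'M');
""")

# The query from the slide: filters on b sit in the ON clause.
ON_CLAUSE_QUERY = """
SELECT a.position, COUNT(isionode), a.status, a.seqid
FROM tbgpnodecard a
LEFT OUTER JOIN bgpnode b
  ON b.midplanepos = a.midplanepos
 AND b.nodecardpos = a.position
 AND b.isionode = 'T'
 AND b.status <> 'M'
WHERE a.midplanepos = 'R00-M0'
GROUP BY a.position, a.status, a.seqid
ORDER BY 1
"""

# Variant for contrast: the same filters moved into the WHERE clause.
WHERE_CLAUSE_QUERY = """
SELECT a.position, COUNT(isionode)
FROM tbgpnodecard a
LEFT OUTER JOIN bgpnode b
  ON b.midplanepos = a.midplanepos
 AND b.nodecardpos = a.position
WHERE a.midplanepos = 'R00-M0'
  AND b.isionode = 'T'
  AND b.status <> 'M'
GROUP BY a.position
ORDER BY 1
"""

on_rows = conn.execute(ON_CLAUSE_QUERY).fetchall()
where_rows = conn.execute(WHERE_CLAUSE_QUERY).fetchall()
print(on_rows)     # N01 is listed with a count of 0
print(where_rows)  # N01 is missing entirely
```

With the filters in the WHERE clause, node card N01 disappears from the result, which would hide exactly the cards an administrator might want to find.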

  18. Exercise • Log on to the service node as bgpadmin • db2 connect to bgdb0 user bgpsysdb • List tables in the database • List the serial numbers of the nodecards • List only the compute cards

  19. BGP RPMs • RPMs • bgp_os • bgpbase • bgptoolchain • Directory tree • /bgsys • /bgsys/drivers/ppcfloor - symbolic link to current driver software • /bgsys/drivers/ppcfloor/bin - binaries • /bgsys/drivers/ppcfloor/bareMetal - service action scripts

  20. Site Specific Configuration • Templates are located in /bgsys/local/etc • rc scripts • UIDs and GIDs • profiles • /etc/profile.d/bgp.sh

  21. Shutdown • Run a service action on the clock cards in each rack, in this order: tertiary, secondary, primary clock cards • ‘bgpmaster stop’ • Stop DB2 • Power down rack(s) • Shut down FEN • Shut down service node

  22. Startup • Service node • Front end node • Power up racks • ‘bgpmaster start’ • End service actions on clock cards (primary, secondary, tertiary) • Verify all hardware is seen
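The shutdown and startup slides are ordered checklists, and the clock cards are handled in opposite orders in each. A minimal sketch of the two sequences as data follows; only `bgpmaster stop` and `bgpmaster start` are commands taken from the slides, the remaining entries are manual steps, and the variable and function names are hypothetical. Nothing is executed.

```python
# Clock card order for each procedure, as given on the slides.
SHUTDOWN_CLOCK_ORDER = ["tertiary", "secondary", "primary"]
STARTUP_CLOCK_ORDER = ["primary", "secondary", "tertiary"]

SHUTDOWN_STEPS = [
    "Run a service action on the clock cards in each rack: "
    + ", ".join(SHUTDOWN_CLOCK_ORDER),
    "bgpmaster stop",
    "Stop DB2",
    "Power down rack(s)",
    "Shut down FEN",
    "Shut down service node",
]

STARTUP_STEPS = [
    "Start service node",
    "Start front end node",
    "Power up racks",
    "bgpmaster start",
    "End service actions on clock cards: " + ", ".join(STARTUP_CLOCK_ORDER),
    "Verify all hardware is seen",
]


def checklist(title, steps):
    """Format a procedure as a numbered, printable checklist."""
    lines = [title]
    lines += [f"  {number}. {step}" for number, step in enumerate(steps, start=1)]
    return "\n".join(lines)


print(checklist("Shutdown", SHUTDOWN_STEPS))
print(checklist("Startup", STARTUP_STEPS))
```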

  23. Unexpected Power Outage • Power off all systems • Power up and boot service node • Power up and boot FEN • Power up rack(s) • ‘bgpmaster start’ • Run install service action

  24. Exercise • Shut down and start up the system • Verify all is well
