1 / 19

MySQL and GRID

MySQL and GRID. Gabriele Carcassi STAR Collaboration 6 May 2002 - Proposal. Why?. STAR uses MySQL to keep track of data files (file catalog) There are already many projects concerning file catalogs MAGDA already uses MySQL to store file catalog information

tien
Download Presentation

MySQL and GRID

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. MySQL and GRID Gabriele Carcassi STAR Collaboration 6 May 2002 - Proposal

  2. Why? • STAR uses MySQL to keep track of data files (file catalog) • There are already many projects concerning file catalogs • MAGDA already uses MySQL to store file catalog information • We will not concentrate on this aspect

  3. Why? • STAR reconstruction jobs need access to both data files and databases DB Calibration constants Detector geometry Detector readings Job Raw data file Reconstructed data file

  4. Database replication • DB is being replicated through MySQL replication • Master/slave approach Slave DB Master DB Slave DB Slave DB write access read access update forwarded

  5. GRID and DB replicas • Database replication is essential because: • Increase data availability • Allows locations to run jobs independently (in case of network congestion) • Correction to the DB are available to all locations • In order to successfully execute jobs in a GRID environment, we need database replication to be somehow integrated

  6. Steps • Aid DB administrator to manage complexity of database replication • Tools to install a mirror using GRID technology • Database catalogs and integration with file catalogs • Integrate GRID authentication

  7. 1. Manage complexity • The more server you have, the more complicated is to manage the system • Build a GUI that helps the database administrator (DBA) to have a general picture

  8. 1. Manage complexity • Keep track of all the servers, and the relationship between each other. • Help manage more sophisticated network topologies, for example: Less traffic over WAN BNL LBL Master Master mirror Slaves BNL slaves The arrows shows the direction of the updates NB: This is an example, not STAR topology.

  9. 1. Manage complexity • Configurations • Aid the comparison between server characteristics (i.e. OS version, MySQL version) • Aid the comparison of the settings of the different database servers • User management • Create and delete users on a group of servers

  10. 1. Manage complexity • Consistency checks • Compare row counts of different replicas • Compare master log pointer with slaves • Check slave connections to the master • Evaluate the replication • Monitor CPU and network activity of the servers to help decide if the current number of servers is sufficient

  11. 2. Creating a new mirror • MySQL keeps the slave synchronized, but you have to manually copy the db files during slave initialization • When the DB is already in place, you might have to copy each file by itself, since the total might exceed 2GB • GRID technologies can be used to transfer the first copy of the database from site to site

  12. 2. Creating a new mirror • New mirror • Steps for mirror creation • Install MySQL (manual?) • Copy database files (through GRIDFtp) • Configure master and slave • Creating/deleting a database on an existing mirror

  13. 2. Creating a new mirror • Ease of use • Integration with the previous GUI • Hide as much details as possible and encourage good configuration policies (i.e. create a user with suitable permissions to be used by the slave connection)

  14. 3. Catalogs • Jobs need to know to which database server to connect (for now, this is done by XML files) • In a GRID environment, jobs will contact a file catalog to determine the location of the files to be used • The scheme for the database catalog shouldn’t be different, and it should be as connected as possible to the file catalog

  15. 3. Catalogs • Database catalog • A job should query the catalog to know to which database server to connect • If possible, file catalogs could be used directly. For example, instead of a physical file location, the catalog could return the parameters for the database connection (i.e. “mydb.star.bnl.gov:3301”) • If no connection can be established, the job might ask the catalog for an alternative server

  16. 3. Catalogs • Administration • The DBA should be able to establish the policies that decide to which server a given job should connect (i.e. IP address, user groups) • Integration with the GUI • The GUI would use the catalog to keep track of the different servers • The GUI would help the DBA to assign policies, and check that the catalog information is not corrupted

  17. 4. Authentication • GRID authentication should be integrated to the whole scheme • Connection between database servers • Connection to the catalog • Connection job-database server

  18. 4. Authentication • MySQL 4.0 will integrate SSL for both user connections and between server connections • MySQL can be instructed to accept only certificate authentication • MySQL 4.0 is still in alpha and SSL is not GSI, but: • Some issues should be already addressed • A patch to enable GSI on SSH is available and might provide further insights

  19. 4. Authentication • Integration with the GUI • Just one authentication at the beginning of the application • Same authentication used to authenticate to the servers • User management

More Related