430 likes | 600 Views
Large Databases in Industry. Wendy Moncur Department of Computing Science, University of Aberdeen. Large Databases in Industry. Database design & management in a major bank Case study 6000-table Personnel database. My background.
E N D
Large Databases in Industry Wendy Moncur Department of Computing Science, University of Aberdeen
Large Databases in Industry • Database design & management in a major bank • Case study • 6000-table Personnel database
My background • DataBase Administrator (DBA) at one of UK’s largest banks. • Designed databases for high performance & availability. • Platform: DB2 & SQL • Largest database: 6000 tables
DBA Salaries • DBA Average Minimum Salary £41,896 • DBA Average Maximum Salary £47,147 • Source: http://www.itjobswatch.co.uk (2008)
What does a DBA do? • Database design & creation • Quality assurance of SQL • Database optimisation • Performance management • Database administration • Security
Database design & creation • Process of fitting a database design to clients’ requirements. • Database design achieved in 3 phases: • Conceptual – model data independent of all physical considerations • Logical – refine and map conceptual model onto relational model (or some other database model such as object-oriented ) • Physical – map logical model onto a specific DBMS
Quality assurance of SQL • Review application code written by developers • Understand application • Use EXPLAIN to check individual SQL statements • May need to change application or indexes • Are indexes used? • Is the run time acceptable? • Batch • Online
Database optimisation • Improve indexing • Delete redundant indexes • Check order of columns in multi-column indexes matches application needs: • e.g. – Personnel table with index on Surname, FirstName versus FirstName, Surname • Confirm whether indexes should be Ascending or Descending • Verify clustering key is appropriate
Performance management • Clients will specify: • Database implementation date • Online transaction times in milliseconds • Batch process run times • Recoverability of data
Database administration • Once the database is ‘live’: • Backup and recovery strategy • How far back? • How many transactions lost? – depends on business data held • Reorganisation strategy • Trade off between availability & performance • Implementation of changes on database • Application implementation – stability tests
Security • Control access to data in test & production • Even test data may be sensitive • Sample data from live database • LEB: “Baroness Gardner of Parkes” • Coutts is the bank for the Queen …. • Only DBAs should have access to delete or modify the database…. • Use views to control users’ & developers’ information access
Other DBA Deliverables • Documentation for: • Requirements specification • As defined by clients, developers, managers, contractors • Design decisions – in case of problems/ upgrades • Application design reviews and tests • Handover to Production
Career structure • Graduate • Trainee DBA • DBA • May be split into production or development • Production - £££ for being on call • Development – less stress! • Consultant/ Team leader
Case study: the monster database • 6000+ tables • 18000+ indexes
Part1: Challenges • “One size fits all” • External supplier • 6000+ tables • 18000+ indexes • 1 tablespace • Short timescale
Challenges: “one size fits all”? • One size does not fit all. • Performance of SQL statements dependent on: • Database design • Index design • The DATA
Challenges: “one size fits all”? • Every company has different requirements. • Customers demand high performance... and control the budget. • Service Level Agreements (SLAs) dictate … • Minimum transaction speed • Number of concurrent users • Number of remote locations • Daily system availability • Database must be tailored to achieve site-specific SLAs.
Challenges: external supplier • Software package & database from external supplier. • Cannot change this.
Challenges: 6,000+ tables • Cannot change tables: no denormalisation allowed. • Supplied program code demands these tables exist. • Cannot change supplied program code unless essential.
Challenges: 18,000+ indexes • Can change indexes: • Unique indexes • Clustering indexes • Secondary indexes
Unique index • Defines what makes a row unique. • Components of the index cannotbe changed. • Order of componentscanbe changed.
Unique index E.g. – for Table “EMPLOYEE” Unique index =DateOfBirth, Firstname, Surname. Most queries ask for data where only Surname, Firstnameare known. SELECT Surname, Firstname, DateOfBirth From Employee Where Surname= “Jenkins” AndFirstname= “Malcolm” ; Recommendation: Change order of unique index to Surname, Firstname, DateOfBirth.
Clustering indexes • Defines the physical order in which rows of data should be stored. • Components of the index can be changed. • Order of components can be changed.
Clustering indexes E.g. – Table “EMPLOYEE” Clustering index = DateOfBirth Yet most queries order by EmploymentStartDate SELECT EmploymentStartDate, Surname, Firstname From Employee Where Surname = “Jenkins” And Firstname = “Malcolm” ; Order by EmploymentStartDate; Recommendation: Change clustering index to use EmploymentStartDate.
Secondary indexes • Not unique. • Do not dictate how the data is to be held. • Created to improve performance of queries and updates. • Increases cost of insert and update, as must be created and maintained along with the table. Recommendation: Drop superfluous secondary indexes.
Challenge: Short timescale • At least 4 test environments: • 96,000 objects! ((6,000 tables + 18,000 indexes) * 4 environments) • 3 months Vanilla Unit test System test • Pre-live
Tools • Use tools to… • Check performance of each SQL statement • Manage change process
Check performance • “EXPLAIN” • Evaluates route to data for every SQL statement. • Identifies what indexes are used • Doesn’t identify redundant indexes • Doesn’t identify indexes that need to be changed.
Manage change process • Rigorous control needed • Achieved through… • Consistent naming standards • Detailed record of every change • Consistent route through environments, no short cuts • DBA tools
Can’t change: “One size fits all” External supplier 6000+ tables Can change: 18000+ indexes 1 tablespace Short timescale Part1: Recap of challenges
Part2: The Production Database • Does it perform? • Can the right people use it? • If disaster strikes, can the data be recovered?
Does the database perform? • Database performance monitored against Service Level Agreements (SLAs). • Regular health checks carried out: • Data stored in sequence? • Enough space? • If sub-standard performance, further database design work done.
Can the right people access the data? PERSONNEL database
Can the right people access the data? PERSONNEL database Personnel team Query & update data at individual or regional level
Can the right people access the data? DBA Backup/ restore data Reorganise data Change database definitions Update statistics on data PERSONNEL database Personnel team Query & update data at individual or regional level
Can the right people access the data? DBA Backup/ restore data Reorganise data Change database definitions Update statistics on data Chief executive Employee statistics PERSONNEL database Personnel team Query & update data at individual or regional level
Can the right people access the data? DBA Backup/ restore data Reorganise data Change database definitions Update statistics on data Chief executive Employee statistics PERSONNEL database Staff member Their own data Personnel team Query & update data at individual or regional level
Can the right people use the database? • Different people, different information needs. • Sensitive data – salary, health, discipline… • Solution • VIEWS • Transaction Management
If disaster strikes, can the data be recovered? • Robust backup & recovery strategies for: • Hardware failure • Software failure
Part2: Recap of Production Database issues • Database must perform to acceptable level. • Only the right people should have access to any data item. • No matter what, the data must be recoverable.
Summary • MSc learning relevant to real world • Everything is bigger out there! • Grounding in basic understanding lets you handle complex challenges