Sarah Sproehnle - PowerPoint PPT Presentation
Presentation Transcript
Sarah Sproehnle
  • Cloudera, Inc
  • sarah@cloudera.com
Outline
  • Identifying slow queries
  • Indexing techniques
  • Proper schema design
  • EXPLAIN
  • Rewriting subqueries
  • Common mistakes
Identifying slow queries
  • Use the slow query log!

--log-slow-queries

--long-query-time

--log-queries-not-using-indexes

          • 5.1.21+ allows microsecond granularity
          • 5.1: SET GLOBAL slow_query_log = 1;
          • Do not use log_output = table
          • mysqldumpslow
          • Monitor SHOW PROCESSLIST
          • mytop, innotop, Enterprise Monitor (query analyzer is great, but adds latency; use selectively)
          • Ask around
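The switches above can be sketched as a runtime session (MySQL 5.1+; the file path and the 0.5-second threshold are illustrative):

```sql
-- Enable the slow query log at runtime (5.1+); in 5.0 use the
-- --log-slow-queries / --long-query-time server options instead.
SET GLOBAL slow_query_log = 1;
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';  -- illustrative path
SET GLOBAL long_query_time = 0.5;       -- fractional seconds work in 5.1.21+
SET GLOBAL log_queries_not_using_indexes = 1;
```

Aggregate the resulting log with mysqldumpslow to find the worst offenders by total time or count.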
Indexing
  • Most indexes are b-trees
  • A B-tree index is great for:
    WHERE col = x
    WHERE col > x
    WHERE col IS NULL
    WHERE col LIKE 'foo%'
    ORDER BY col [DESC] LIMIT n
  • Not useful for:
    WHERE function(col) = x
    WHERE col LIKE '%foo'
    ORDER BY col -- without a LIMIT
  • Covering indexes:
    Query: SELECT a FROM t WHERE b = 'foo' ORDER BY c;
    Index: KEY(b, c, a)
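A minimal sketch of the covering-index idea (the table and column names are illustrative): every column the query touches lives in the index, so MySQL can answer it without reading the row itself.

```sql
CREATE TABLE t (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a  INT NOT NULL,
  b  VARCHAR(20) NOT NULL,
  c  INT NOT NULL,
  KEY b_c_a (b, c, a)          -- covers the query below
);

-- Filter on b, sort on c, return a: all served from the index,
-- so EXPLAIN shows "Using index" in the Extra column.
SELECT a FROM t WHERE b = 'foo' ORDER BY c;
```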
Indexing continued
  • Creating the index can be painful (consider InnoDB plugin or replication switchover)
  • The order of columns in a composite index matters! KEY(a, b) will be used for:
    WHERE a = x
    WHERE a = x AND b = x
    WHERE a = x ORDER BY b
    SELECT b .. WHERE a = x
    but not for:
    WHERE b = x
  • Hash indexes for Memory, NDB Cluster and InnoDB (adaptive) Fast and compact, but only useful for equality lookups
  • For InnoDB, do not append the PK to a secondary index
  • Watch out for duplicate indexes (mk-duplicate-key-checker)
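The leftmost-prefix rule above can be checked directly with EXPLAIN (the table, data, and index names here are illustrative):

```sql
CREATE TABLE orders (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  customer_id INT NOT NULL,
  created_at  DATE NOT NULL,
  KEY idx_cust_date (customer_id, created_at)
);

-- These can use idx_cust_date (the leftmost column is constrained):
EXPLAIN SELECT * FROM orders WHERE customer_id = 7;
EXPLAIN SELECT * FROM orders WHERE customer_id = 7 AND created_at > '2010-01-01';
EXPLAIN SELECT * FROM orders WHERE customer_id = 7 ORDER BY created_at;

-- This cannot (the leftmost column is missing from the WHERE clause):
EXPLAIN SELECT * FROM orders WHERE created_at > '2010-01-01';
```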
Schema design
  • Normalize or denormalize?
  • Choose NOT NULL if possible
  • Choose good primary keys (keep them small!) They are often used as foreign keys; InnoDB uses the primary key as a row id
  • Keep your data small. Use the right data types:
    SMALLINT vs. INT vs. BIGINT
    CHAR vs. VARCHAR
    TIMESTAMP (4 bytes) vs. DATETIME (8 bytes)
    Store IP addresses as INT UNSIGNED (INET_ATON())
    Use PROCEDURE ANALYSE()
  • Index a prefix of a string column: KEY(col(5))
  • Use RANGE partitioning, but be careful which functions you use
  • Use replication to split read/writes
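The IP-address tip above in practice (the table name is illustrative): a dotted-quad string costs up to 15 characters, while INET_ATON() packs it into a 4-byte integer that also supports range scans.

```sql
CREATE TABLE visits (
  ip INT UNSIGNED NOT NULL
);

INSERT INTO visits VALUES (INET_ATON('192.168.1.10'));

SELECT INET_NTOA(ip) FROM visits;             -- back to dotted-quad form

SELECT * FROM visits                          -- range scans work on the
WHERE ip BETWEEN INET_ATON('192.168.1.0')     -- integer representation
             AND INET_ATON('192.168.1.255');
```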
EXPLAIN
  • EXPLAIN SELECT… unfortunately doesn’t work on UPDATE/DELETE
  • Important that you use EXPLAIN on your true data set
  • Useful for verifying:
    Is an appropriate index being used?
    What order are the tables joined in? This is critical given MySQL's nested-loop join algorithm.
    Is a temporary table required? ("Using temporary")
    Is a covering index used, or are seeks to the row(s) needed? ("Using index")
  • Example: a query that looks innocuous:
    SELECT .. WHERE idx IN (42, 101, 1024);
    EXPLAIN can reveal interesting results!
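Since EXPLAIN only accepts SELECT in these versions, a common workaround for writes is to EXPLAIN a SELECT that uses the same WHERE clause (table and columns are illustrative):

```sql
-- Cannot be EXPLAINed directly:
UPDATE t SET a = 0 WHERE b = 'foo' AND c > 10;

-- But this shows the same index choice and row estimate:
EXPLAIN SELECT * FROM t WHERE b = 'foo' AND c > 10;
```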
Rewriting subqueries
  • mysql> EXPLAIN SELECT name FROM Country
           WHERE code IN (SELECT countrycode FROM City WHERE population > 9000000)\G

*************************** 1. row ***************************
  select_type: PRIMARY
        table: Country
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 239
        Extra: Using where
*************************** 2. row ***************************
  select_type: DEPENDENT SUBQUERY   <-- this query should not be correlated!
        table: City
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4079
        Extra: Using where

Rewriting subqueries
  • Instead of “WHERE col IN(SELECT..)”, write a JOIN

SELECT DISTINCT Country.name
FROM Country JOIN City ON code = countrycode
WHERE City.population > 9000000;

  • In general:

SELECT col FROM t1
WHERE id_col IN (SELECT id_col2 FROM t2 WHERE condition);

can be rewritten as follows:

SELECT DISTINCT col FROM t1, t2
WHERE t1.id_col = t2.id_col2 AND condition;

Fixed in 5.4!

Common mistakes/General advice
  • Need a random row? Do not use ORDER BY RAND() LIMIT 1! Consider generating a random number and doing a lookup by the auto_increment column.
  • Avoid hints (STRAIGHT_JOIN, FORCE INDEX)
  • Use hints when necessary, especially pre 5.1
  • Don’t set sort_buffer_size extremely large
  • Query cache: 256MB max. Be careful when benchmarking (use SQL_NO_CACHE)
  • memcached
  • Move long running queries (e.g., reporting queries) to a slave
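The random-row advice above can be sketched as a single query (assuming a mostly gap-free auto_increment primary key named id; where ids have gaps, the >= comparison slightly favors rows that follow a gap):

```sql
-- ORDER BY RAND() sorts the whole table; this does one index seek instead.
SELECT t.*
FROM t
JOIN (SELECT FLOOR(1 + RAND() * (SELECT MAX(id) FROM t)) AS rid) AS r
  ON t.id >= r.rid
ORDER BY t.id
LIMIT 1;
```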
More general advice
  • Consider other storage engines
  • Materialize data into a Memory or MyISAM table. Keep it fresh with triggers or an event. Use --init-file to repopulate Memory tables at startup.
  • LIMIT for paging is not always suitable: app servers generally do not scale with large result sets, but evaluating LIMIT n,m over and over is painful
  • For batch processing, consider Hadoop:
    Can handle very large datasets
    Sqoop moves data from MySQL to Hadoop (and back)
    Built-in reliability and scalability
    Don't think in MapReduce? Use Hive
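The materialize-with-an-event idea can be sketched as follows (table, column, and event names are illustrative; requires the event scheduler, available in 5.1+, with event_scheduler = ON):

```sql
CREATE TABLE daily_sales_summary (
  day   DATE PRIMARY KEY,
  total DECIMAL(12,2) NOT NULL
) ENGINE = MyISAM;

-- Refresh the summary hourly so reporting queries never touch the big table.
CREATE EVENT refresh_daily_sales
  ON SCHEDULE EVERY 1 HOUR
DO
  REPLACE INTO daily_sales_summary
  SELECT DATE(created_at), SUM(amount)
  FROM orders
  GROUP BY DATE(created_at);
```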
Thanks for attending!
  • Sarah Sproehnle
  • Cloudera, Inc
  • sarah@cloudera.com