
Who am I?

Writing Storage Engines. Brian Aker, Director of Architecture, MySQL AB. Montreal PHP Conference, March 2005.


Presentation Transcript


  1. Writing Storage Engines • Brian Aker • Director of Architecture • Montreal PHP Conference • March 2005 • MySQL AB

  2. Who am I? • Brian Aker • Director of Architecture, MySQL AB • Author of mod_layout, the Apache streaming services mod_mp3, Slash (Slashdot’s CMS), and a lot of other things on Freshmeat... • http://mysql.com/ • http://krow.net/

  3. MySQL • 5 million installations • 200 employees, 20+ countries • Most North American Developers live in Seattle • We even have an office in Seattle • No, developers never go there

  4. MySQL Server • High Performance RDBMS • SQL-Based, aiming to be SQL-99 compliant • Stable • Scalable • Embedded in hardware (including JMX) • Extremely high load applications • Master/Slave Replication • Easy to use • Modular • “Storage Engines” • Many features can be disabled at runtime and/or compile time to conserve resources

  5. Client Library Support • libmysql C library (think OCI) • JDBC – Type IV JDBC Driver • ODBC • Perl DBI (DBD::mysql) • PHP (built in) • ADO.Net, OleDB, Ruby, Erlang, Eiffel, Smalltalk, etc. provided by third parties

  6. Goals • Overview of MySQL Architecture • Understanding of Storage Engine Architecture • Knowledge of required methods • Starting points for coding • sql/ is for the kernel • mysys/ is the portable runtime • mysql-test/ is for your test cases

  7. What does it take? • All code is written in simplified C++ • An example storage engine • Your Ideas

  8. Server’s Kernel • Parser • Optimizer • Storage Engine: MyISAM, InnoDB, NDB, HEAP, Merge

  9. What is a Storage Engine? • “Data formats on Disk” • Examples • InnoDB • MyISAM • BDB • Cluster • HEAP • CSV • Yours!

  10. Example Table

      CREATE TABLE foo (
        a int,
        b char(4),
        c varchar(9),
        d blob)
      ENGINE = MYISAM;

  11. Rows • Rows are made up of fields • INT: 4 • CHAR: 4 • VAR: L + 9 • BLOB: L + P • NULL: NULL

  12. Fields • Rows are made up of fields C H A R NULL
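As a rough illustration of the row layout on slide 11, the sketch below adds up the width of one row of the example table `foo`. The `ColumnKind` enum, the `Column` struct, and both helper functions are hypothetical names. Reading "L" as a length prefix and "P" as a pointer to the blob data is an interpretation of the slide, and the one-byte VARCHAR prefix and two-byte BLOB length are assumed sizes, not the server's actual record format code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical column kinds, mirroring the example table on slide 10.
enum class ColumnKind { Int, Char, VarChar, Blob };

struct Column {
  ColumnKind kind;
  std::size_t declared_length;  // e.g. 4 for char(4), 9 for varchar(9)
};

// Width of one field inside the row buffer (assumed sizes, see the lead-in).
std::size_t column_width(const Column &col) {
  switch (col.kind) {
    case ColumnKind::Int:     return 4;                        // fixed 4 bytes
    case ColumnKind::Char:    return col.declared_length;      // padded to full length
    case ColumnKind::VarChar: return 1 + col.declared_length;  // L + data
    case ColumnKind::Blob:    return 2 + sizeof(void *);       // L + P
  }
  return 0;
}

// Total row width: NULL bitmap (one bit per column, rounded up to whole
// bytes) plus the width of every field.
std::size_t row_width(const std::vector<Column> &cols) {
  std::size_t total = (cols.size() + 7) / 8;
  for (const Column &col : cols) total += column_width(col);
  return total;
}
```

For `foo`, this gives one NULL byte, 4 bytes for `a`, 4 for `b`, 10 for `c`, and 2 plus the pointer size for `d`.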

  13. What do I need to do to add one? • Subclass Field in field.h • Implement a few methods: • Storage: store(string), store(long long), store(double) • Retrieve: val_real(), val_int(), val_str() • Other: field_cast_type(), result_type(), cmp(), sort_string(), max_length()

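  14. Field Store Example

      int Field_ipaddrv4::store(const char *from, uint length, CHARSET_INFO *cs)
      {
        /* Parse into unsigned ints first: %u writes a full unsigned int,
           which would overrun the four bytes that ptr points to. */
        unsigned int octet[4];
        int count;
        count= sscanf(from, "%u.%u.%u.%u",
                      &octet[0], &octet[1], &octet[2], &octet[3]);
        if (count != 4)
        {
          bzero(ptr, 4);
          return -1;
        }
        for (int i= 0; i < 4; i++)
          ptr[i]= (char) octet[i];
        return 0;
      }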

  15. Field val Example

      String *Field_ipaddrv4::val_str(String *val_buffer __attribute__((unused)),
                                      String *val_ptr)
      {
        int count;
        /* buffer is assumed to be a member of at least 16 bytes:
           "255.255.255.255" plus the terminating NUL. */
        count= snprintf(buffer, 16, "%u.%u.%u.%u",
                        (uchar) ptr[0], (uchar) ptr[1],
                        (uchar) ptr[2], (uchar) ptr[3]);
        val_ptr->set((const char*) buffer, count, &my_charset_latin1);
        return val_ptr;
      }
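The store/retrieve pair on slides 14 and 15 can be tried outside the server. The following is a minimal standalone sketch, assuming a hypothetical `Ipv4Storage` struct in place of the four bytes behind `ptr` and two free functions (`store_ipv4`, `ipv4_to_string`) in place of the `Field` methods. It is not the MySQL `Field` API, and the range and trailing-character checks are additions for illustration.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the 4 bytes that `ptr` points to in a Field.
struct Ipv4Storage {
  unsigned char bytes[4];
};

// Parse dotted-quad text into 4 bytes; returns 0 on success, -1 on bad input.
int store_ipv4(const std::string &from, Ipv4Storage &out) {
  unsigned int a, b, c, d;
  char trailing;  // only filled if junk follows the fourth octet
  int count = std::sscanf(from.c_str(), "%u.%u.%u.%u%c", &a, &b, &c, &d, &trailing);
  if (count != 4 || a > 255 || b > 255 || c > 255 || d > 255) {
    out = Ipv4Storage{};  // zero the field, as the slide does with bzero()
    return -1;
  }
  out.bytes[0] = static_cast<unsigned char>(a);
  out.bytes[1] = static_cast<unsigned char>(b);
  out.bytes[2] = static_cast<unsigned char>(c);
  out.bytes[3] = static_cast<unsigned char>(d);
  return 0;
}

// Format the 4 stored bytes back into dotted-quad text.
std::string ipv4_to_string(const Ipv4Storage &in) {
  char buffer[16];  // "255.255.255.255" plus the terminating NUL
  std::snprintf(buffer, sizeof(buffer), "%u.%u.%u.%u",
                static_cast<unsigned>(in.bytes[0]), static_cast<unsigned>(in.bytes[1]),
                static_cast<unsigned>(in.bytes[2]), static_cast<unsigned>(in.bytes[3]));
  return buffer;
}
```

Storing "192.168.0.1" and reading it back should yield the same text, while malformed input leaves the field zeroed.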

  16. Break Down of Storage Engine Methods • Table Control • Optimizer • SQL Modifiers • SQL Reads

  17. Table Control • ::create() • ::open() • ::close() • ::delete_table()

  18. ha_example::create()

      int ha_example::create(const char *name, TABLE *table_arg,
                             HA_CREATE_INFO *create_info)
      {
        DBUG_ENTER("ha_example::create");
        /* This is not implemented, but we want someone to be able to see
           that it works. */
        DBUG_RETURN(0);
      }

  19. ha_example::open()

      int ha_example::open(const char *name, int mode, uint test_if_locked)
      {
        DBUG_ENTER("ha_example::open");
        if (!(share = get_share(name, table)))
          DBUG_RETURN(1);
        thr_lock_data_init(&share->lock, &lock, NULL);
        DBUG_RETURN(0);
      }

  20. ha_example::close()

      int ha_example::close(void)
      {
        DBUG_ENTER("ha_example::close");
        DBUG_RETURN(free_share(share));
      }

  21. ha_example::delete_table()

      int ha_example::delete_table(const char *name)
      {
        DBUG_ENTER("ha_example::delete_table");
        /* This is not implemented, but we want someone to be able to see
           that it works. */
        DBUG_RETURN(0);
      }
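The `open()` and `close()` slides rely on `get_share()` and `free_share()`, whose bodies are not shown in the deck. One plausible reading is a reference-counted registry of per-table state, sketched below. `TableShare`, `ShareRegistry`, and `open_tables()` are hypothetical names. This version is single-threaded, so a real engine would need locking that the sketch leaves out.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical per-table state shared by every open handler on that table.
struct TableShare {
  std::string name;
  int use_count = 0;
};

// Hypothetical registry of open tables, keyed by table name.
class ShareRegistry {
 public:
  // Return the share for `name`, creating it on the first open.
  TableShare *get_share(const std::string &name) {
    auto it = shares_.find(name);
    if (it == shares_.end()) {
      it = shares_.emplace(name, std::make_unique<TableShare>()).first;
      it->second->name = name;
    }
    ++it->second->use_count;
    return it->second.get();
  }

  // Drop one reference and destroy the share on the last close.
  // Returns 0 so it can be used as the result of close().
  int free_share(TableShare *share) {
    if (--share->use_count == 0) {
      const std::string name = share->name;  // copy: erase destroys the share
      shares_.erase(name);
    }
    return 0;
  }

  std::size_t open_tables() const { return shares_.size(); }

 private:
  std::map<std::string, std::unique_ptr<TableShare>> shares_;
};
```

Two opens of the same table name return the same share, and the share disappears only after both handlers have closed it.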

  22. Optimizer • ::info() • ::records_in_range()

  23. info()

      void ha_heap::info(uint flag)
      {
        records = info.records;
        deleted = info.deleted;
        errkey = info.errkey;
        mean_rec_length= info.reclength;
        data_file_length= info.data_length;
        index_file_length= info.index_length;
        max_data_file_length= info.max_records * info.reclength;
        delete_length= info.deleted * info.reclength;
        if (flag & HA_STATUS_AUTO)
          auto_increment_value= info.auto_increment;
      }

  24. records_in_range()

      ha_rows ha_example::records_in_range(uint inx, key_range *min_key,
                                           key_range *max_key)
      {
        DBUG_ENTER("ha_example::records_in_range");
        DBUG_RETURN(10);  // low number to force index usage
      }
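The example engine returns a constant from `records_in_range()`. An engine with a real index would return an estimate instead. The sketch below shows one simple way to do that, assuming the index is a sorted `std::vector<int>` of keys. The free function and its signature are illustrative only and do not match the handler method.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Count the keys in the inclusive range [min_key, max_key] of a sorted index.
// Both bounds are found with binary search, so the cost is O(log n).
std::size_t records_in_range(const std::vector<int> &sorted_keys,
                             int min_key, int max_key) {
  if (min_key > max_key) return 0;
  auto first = std::lower_bound(sorted_keys.begin(), sorted_keys.end(), min_key);
  auto last = std::upper_bound(sorted_keys.begin(), sorted_keys.end(), max_key);
  return static_cast<std::size_t>(last - first);
}
```

The optimizer only needs an approximation to choose between an index read and a full scan, so an engine may prefer a cheaper, less exact estimate than this count.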

  25. SQL Modifiers • delete_row() • write_row() • update_row()

  26. ha_example::delete_row()

      int ha_example::delete_row(const byte * buf)
      {
        DBUG_ENTER("ha_example::delete_row");
        DBUG_RETURN(HA_ERR_WRONG_COMMAND);
      }

  27. ha_archive::write_row()

      int ha_archive::write_row(byte * buf)
      {
        char *pos;
        z_off_t written;
        DBUG_ENTER("ha_archive::write_row");
        statistic_increment(ha_write_count, &LOCK_status);
        if (table->timestamp_default_now)
          update_timestamp(buf + table->timestamp_default_now - 1);
        /* Append the row to the compressed archive file. */
        written= gzwrite(share->archive_write, buf, table->reclength);
        DBUG_RETURN(0);
      }

  28. ha_tina::update_row()

      int ha_tina::update_row(const byte * old_data, byte * new_data)
      {
        int size;
        DBUG_ENTER("ha_tina::update_row");
        size= encode_quote(new_data);
        if (chain_append())
          DBUG_RETURN(-1);
        if (my_write(share->data_file, buffer.ptr(), size, MYF(MY_WME | MY_NABP)))
          DBUG_RETURN(-1);
        DBUG_RETURN(0);
      }
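The ha_tina code on slide 28 encodes the new row image as quoted text and writes it to the data file. The deck does not show `encode_quote()`, so the following is only a sketch of what such an encoder might look like. `encode_row` is a hypothetical name, and the escaping rules (backslash before quotes and backslashes, `\n` and `\r` for line breaks) are assumptions, not the confirmed on-disk format of the CSV engine.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Encode one row as a CSV line: every field is wrapped in double quotes,
// fields are separated by commas, and the line ends with a newline.
std::string encode_row(const std::vector<std::string> &fields) {
  std::string line;
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (i > 0) line += ',';
    line += '"';
    for (char ch : fields[i]) {
      switch (ch) {
        case '"':  line += "\\\""; break;  // escape embedded quotes
        case '\\': line += "\\\\"; break;  // escape the escape character
        case '\n': line += "\\n";  break;  // keep one row per line
        case '\r': line += "\\r";  break;
        default:   line += ch;
      }
    }
    line += '"';
  }
  line += '\n';
  return line;
}
```

Escaping line breaks keeps each row on a single physical line, which lets a scan find row boundaries by looking for the next newline.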

  29. SQL Reads • Scan Reads • rnd_init(), rnd_next(), position(), rnd_pos() • Index Reads • index_read(), index_next(), index_prev(), index_first(), index_last()

  30. ha_tina::rnd_init()

      int ha_tina::rnd_init(bool scan)
      {
        DBUG_ENTER("ha_tina::rnd_init");
        current_position= next_position= 0;
        records= 0;
        chain_ptr= chain;
        if (scan)
          (void) madvise(share->mapped_file, share->file_stat.st_size,
                         MADV_SEQUENTIAL);
        DBUG_RETURN(0);
      }

  31. ha_tina::rnd_next()

      int ha_tina::rnd_next(byte *buf)
      {
        DBUG_ENTER("ha_tina::rnd_next");
        current_position= next_position;
        if (!share->mapped_file)
          DBUG_RETURN(HA_ERR_END_OF_FILE);
        if (HA_ERR_END_OF_FILE == find_current_row(buf))
          DBUG_RETURN(HA_ERR_END_OF_FILE);
        records++;
        DBUG_RETURN(0);
      }

  32. ha_tina::position()

      void ha_tina::position(const byte *record)
      {
        DBUG_ENTER("ha_tina::position");
        ha_store_ptr(ref, ref_length, current_position);
        DBUG_VOID_RETURN;
      }

  33. ha_tina::rnd_pos()

      int ha_tina::rnd_pos(byte * buf, byte *pos)
      {
        DBUG_ENTER("ha_tina::rnd_pos");
        current_position= ha_get_ptr(pos, ref_length);
        DBUG_RETURN(find_current_row(buf));
      }
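The four scan methods on slides 30 to 33 work together: `rnd_init()` rewinds, `rnd_next()` reads and advances, `position()` records where the current row starts, and `rnd_pos()` jumps back to a recorded position. The toy cursor below illustrates that contract over an in-memory buffer of newline-terminated rows. `ScanCursor` and `kEndOfFile` are hypothetical names, and the signatures are simplified rather than copied from handler.h.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical error code standing in for HA_ERR_END_OF_FILE.
constexpr int kEndOfFile = -1;

// Toy scan cursor over a buffer of newline-terminated rows.
class ScanCursor {
 public:
  explicit ScanCursor(std::string data) : data_(std::move(data)) {}

  // Rewind to the start of the data before a scan.
  int rnd_init() {
    current_position_ = next_position_ = 0;
    return 0;
  }

  // Read the next row into `row` and advance past it.
  int rnd_next(std::string &row) {
    current_position_ = next_position_;
    return find_current_row(row);
  }

  // Offset of the most recently read row, to be saved by the caller.
  std::size_t position() const { return current_position_; }

  // Re-read the row that starts at a previously saved offset.
  int rnd_pos(std::string &row, std::size_t pos) {
    current_position_ = pos;
    return find_current_row(row);
  }

 private:
  // Copy the row starting at current_position_ and remember where the
  // following row begins.
  int find_current_row(std::string &row) {
    if (current_position_ >= data_.size()) return kEndOfFile;
    std::size_t end = data_.find('\n', current_position_);
    if (end == std::string::npos) end = data_.size();
    row = data_.substr(current_position_, end - current_position_);
    next_position_ = end + 1;
    return 0;
  }

  std::string data_;
  std::size_t current_position_ = 0;
  std::size_t next_position_ = 0;
};
```

The saved position is what lets the server come back to a row later, for example when it sorts row references first and fetches the rows afterwards.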

  34. ha_example::index_read()

      int ha_example::index_read(byte * buf, const byte * key,
                                 uint key_len __attribute__((unused)),
                                 enum ha_rkey_function find_flag
                                 __attribute__((unused)))
      {
        DBUG_ENTER("ha_example::index_read");
        DBUG_RETURN(HA_ERR_WRONG_COMMAND);
      }

  35. ha_example::index_next()

      /* Used to read forward through the index. */
      int ha_example::index_next(byte * buf)
      {
        DBUG_ENTER("ha_example::index_next");
        DBUG_RETURN(HA_ERR_WRONG_COMMAND);
      }
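The example engine has no indexes, so its index methods simply return an error. To show what an implemented version might do, here is a toy cursor over a `std::map` used as an ordered index. `IndexCursor`, `kKeyNotFound`, and `kEndOfIndex` are hypothetical names, the lookup always behaves like a "key or next" search, and the signatures are simplified rather than taken from handler.h.

```cpp
#include <map>
#include <string>

// Hypothetical result codes for this sketch.
constexpr int kKeyNotFound = 1;
constexpr int kEndOfIndex = 2;

// Toy index cursor; the map must outlive the cursor.
class IndexCursor {
 public:
  explicit IndexCursor(const std::map<int, std::string> &index)
      : index_(index), it_(index_.end()) {}

  // Position on the first entry whose key is >= `key` and return its row.
  int index_read(int key, std::string &row) {
    it_ = index_.lower_bound(key);
    return current(row, kKeyNotFound);
  }

  // Position on the smallest key.
  int index_first(std::string &row) {
    it_ = index_.begin();
    return current(row, kEndOfIndex);
  }

  // Step forward through the index in key order.
  int index_next(std::string &row) {
    if (it_ != index_.end()) ++it_;
    return current(row, kEndOfIndex);
  }

 private:
  // Copy out the row under the cursor, or report `miss_code` past the end.
  int current(std::string &row, int miss_code) const {
    if (it_ == index_.end()) return miss_code;
    row = it_->second;
    return 0;
  }

  const std::map<int, std::string> &index_;
  std::map<int, std::string>::const_iterator it_;
};
```

A range query then becomes one `index_read()` on the lower bound followed by `index_next()` calls until the key leaves the range or the index ends.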

  36. Table Scan

      ha_example::store_lock
      ha_example::external_lock
      ha_example::info
      ha_example::rnd_init
      ha_example::extra            Cache record in HA_rrnd()
      ha_example::rnd_next
      ha_example::rnd_next
      ha_example::rnd_next
      ha_example::extra            End caching of records (def)
      ha_example::external_lock
      ha_example::extra            Reset database to after open

  37. That is All? • Transaction methods • Bulk load methods • Defrag methods • Lot more (read handler.h)

  38. Autoconf • Autoconf files in the top-level source directory • acconfig.h • acinclude.m4 • config.in

  39. Additional Files • Basic server files modified under sql/ • sql/Makefile.am • sql/handler.h • sql/mysql_priv.h • sql/handler.cc • sql/mysqld.cc • sql/set_var.cc

  40. Test Cases • Test cases created under mysql-test • mysql-test/include/have_mmap.inc • mysql-test/t/mmap.test • mysql-test/r/mmap.result

  41. Other Thoughts • What are your goals? • Read only? • Durable? • Network?

  42. More Information • sql/ha_example.[h|cc] • Look at Documents on mysql.com • lists.mysql.com • MySQL Support Contracts
