
HANA Persistence

HANA Persistence. Shadow Pages. Shadow paging is a copy-on-write technique that avoids in-place updates. When a page is modified, a shadow page is allocated.

torie

Presentation Transcript


  1. HANA Persistence: Shadow Pages

  2. Shadow paging
     • Shadow paging is a copy-on-write technique that avoids in-place updates.
     • When a page is modified, a shadow page is allocated.
     • Because an old and a new version of a page may exist on disk, a mapping from logical page number to physical location is necessary (the converter).
     • Why shadow paging? It provides atomicity and durability at page level (two of the ACID* properties that make DB transactions reliable).
     • The transition from one valid state to the next is done by savepointing.
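
The copy-on-write idea on this slide can be illustrated with a minimal Python sketch. It is not HANA code: the class `ShadowPagedStore`, its methods, and the in-memory dictionaries standing in for the disk and the converter are all hypothetical names chosen for illustration.

```python
class ShadowPagedStore:
    """Toy copy-on-write page store (hypothetical, not HANA's API)."""

    def __init__(self):
        self.physical = {}       # physical page number -> page content ("disk")
        self.converter = {}      # logical page number -> physical page number
        self.next_physical = 0

    def _allocate(self, content):
        # Always hand out a fresh physical page; nothing is overwritten.
        ppn = self.next_physical
        self.next_physical += 1
        self.physical[ppn] = content
        return ppn

    def write(self, logical, content):
        """Write a shadow page and remap the logical page to it."""
        old = self.converter.get(logical)
        self.converter[logical] = self._allocate(content)
        return old  # the old physical page stays intact until it is freed

    def read(self, logical):
        return self.physical[self.converter[logical]]


store = ShadowPagedStore()
store.write(4711, "v1")
old_ppn = store.write(4711, "v2")
print(store.read(4711))          # current version of logical page 4711
print(store.physical[old_ppn])   # previous version is still available
```

Because the old physical page is untouched, switching between the old and the new state only requires switching the mapping.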

  3. Converter (1)
     • Maps logical page numbers to physical page numbers (volume index + offset).
     • Is implemented as a tree.
     • Inner nodes just point to their children.
     • Leaf nodes contain the mapping information.
     • The restart page points to the converter root.
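
A tree-shaped converter of this kind can be sketched as a small two-level radix tree. The fan-out, the class names (`Inner`, `Leaf`, `Converter`) and the two-level depth are assumptions made for illustration; the slides do not state the real node layout.

```python
FANOUT = 4  # assumed and deliberately tiny; real converter pages hold far more


class Leaf:
    def __init__(self):
        # slot -> (volume_index, offset), or None if the page is unmapped
        self.entries = [None] * FANOUT


class Inner:
    def __init__(self):
        # slot -> Leaf; inner nodes only point to their children
        self.children = [None] * FANOUT


class Converter:
    def __init__(self):
        self.root = Inner()  # the restart page would point to this root

    def _split(self, lpn):
        if not 0 <= lpn < FANOUT * FANOUT:
            raise ValueError("logical page number out of range for this toy tree")
        return divmod(lpn, FANOUT)  # (slot in root, slot in leaf)

    def set(self, lpn, volume_index, offset):
        hi, lo = self._split(lpn)
        if self.root.children[hi] is None:
            self.root.children[hi] = Leaf()
        self.root.children[hi].entries[lo] = (volume_index, offset)

    def lookup(self, lpn):
        hi, lo = self._split(lpn)
        leaf = self.root.children[hi]
        return None if leaf is None else leaf.entries[lo]


conv = Converter()
conv.set(9, volume_index=0, offset=8192)
print(conv.lookup(9))   # (0, 8192)
print(conv.lookup(3))   # None, page 3 was never mapped
```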

  4. Converter (2)
     [Figure: data volumes, with the labels Anchor Page, Restart Page, Converter Index Page, Converter Leaf Page, and Data Page.]

  5. Shadow paging (1): Initial state
     [Figure: timeline. Resource container with converter page (version 1) and pages A, R, C, 4711, …, 4712. Data volume: A(1), R(1), C(1), 4711(1), 4712(1). Undo log. Redo log.]

  6. Shadow paging (2): Modify data page (update page 4711)
     - Update page content
     - Clear mapping in converter page
     - Write redo and undo log entries
     [Figure: timeline. Resource container with converter page (version 1) and pages A, *R, *C, *4711, …, 4712. Data volume: A(1), R(1), C(1), 4711(1), 4712(1). Undo log: log entry. Redo log: log entry.]

  7. Shadow paging (3a): Savepoint, flush modified data pages (savepoint version 2)
     - Assign new physical page number for modified data pages
     - Flush modified data pages
     [Figure: timeline. Resource container with converter page (version 2) and pages A, *R, *C, 4711, …, 4712. Data volume: A(1), R(1), C(1), 4711(1), 4712(1), 4711(2). Undo log: log entry. Redo log: log entry.]

  8. Shadow paging (3b): Savepoint, flush modified converter pages (savepoint version 2)
     - Flush modified converter pages
     [Figure: timeline. Resource container with converter page (version 2) and pages A, *R, C, 4711, …, 4712. Data volume: A(1), R(1), C(1), C(2), 4711(1), 4712(1), 4711(2). Undo log: log entry. Redo log: log entry.]

  9. Shadow paging (3c): Savepoint, write restart page (savepoint version 2)
     - Write restart page with information about converter root and current log position
     [Figure: timeline. Resource container with converter page (version 2) and pages A, R, C, 4711, …, 4712. Data volume: A(1), R(1), R(2), C(1), C(2), 4711(1), 4712(1), 4711(2). Undo log: log entry. Redo log: log entry.]

  10. Shadow paging (3d): Savepoint, write anchor page (savepoint version 2)
      - Update anchor page and write it to disk atomically
      [Figure: timeline. Resource container with converter page (version 2) and pages A, R, C, 4711, …, 4712. Data volume: A(2), R(1), R(2), C(1), C(2), 4711(1), 4712(1), 4711(2). Undo log: log entry. Redo log: log entry.]

  11. Shadow paging (3e): Savepoint, free shadow pages (savepoint version 2)
      - Free pages from last savepoint cycle
      [Figure: timeline. Resource container with converter page (version 2) and pages A, R, C, 4711, …, 4712. Data volume: A(2), R(2), C(2), 4712(1), 4711(2). Undo log: log entry. Redo log: log entry.]
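
The savepoint steps (3a) to (3e) can be walked through with a small simulation. This is a sketch under stated assumptions, not the HANA implementation: `Disk`, `savepoint`, `read_from_disk` and the `crash_before_anchor` flag are hypothetical, and the anchor is modelled as a single field whose assignment stands in for the atomic anchor-page write.

```python
import copy


class Disk:
    """Toy data volume: numbered blocks plus one atomically written anchor."""

    def __init__(self):
        self.blocks = {}     # physical location -> content
        self.anchor = None   # location of the currently valid restart page
        self.next_loc = 0

    def write_new(self, content):
        loc = self.next_loc
        self.next_loc += 1
        self.blocks[loc] = copy.deepcopy(content)
        return loc


def savepoint(disk, converter, dirty, log_position, crash_before_anchor=False):
    """converter: in-memory map logical page -> location; dirty: page -> content."""
    old_anchor = disk.anchor
    replaced = []
    # (3a) assign new physical locations and flush modified data pages
    for logical, content in dirty.items():
        if logical in converter:
            replaced.append(converter[logical])
        converter[logical] = disk.write_new(content)
    # (3b) flush the modified converter
    conv_loc = disk.write_new(converter)
    # (3c) write a restart page with converter root and log position
    restart_loc = disk.write_new({"converter": conv_loc, "log_position": log_position})
    if crash_before_anchor:
        return  # simulated crash: the anchor still names the old restart page
    # (3d) switch the anchor atomically
    disk.anchor = restart_loc
    # (3e) free the pages of the previous savepoint cycle
    if old_anchor is not None:
        replaced.append(disk.blocks.pop(old_anchor)["converter"])
    for loc in replaced:
        disk.blocks.pop(loc, None)


def read_from_disk(disk, logical):
    restart = disk.blocks[disk.anchor]
    converter = disk.blocks[restart["converter"]]
    return disk.blocks[converter[logical]]


disk, conv = Disk(), {}
savepoint(disk, conv, {4711: "a1", 4712: "b1"}, log_position=10)
# A savepoint that crashes before the anchor write leaves the old state valid.
savepoint(disk, dict(conv), {4711: "a2"}, log_position=20, crash_before_anchor=True)
after_crash = read_from_disk(disk, 4711)
savepoint(disk, conv, {4711: "a2"}, log_position=20)
print(after_crash, read_from_disk(disk, 4711), read_from_disk(disk, 4712))
```

The point of the ordering is that everything written before step (3d) is invisible to a restart, so the anchor write alone decides which savepoint version is valid.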

  12. Shadow paging (4): Commit/Rollback (savepoint version 2)
      - Commit: delete the undo log entry
      - Rollback: apply the undo log entry to restore the previous version and delete it afterwards
      [Figure: timeline. Resource container with converter page (version 2) and pages A, R, C, 4711, …, 4712. Data volume: A(2), R(2), C(2), 4712(1), 4711(2). Undo log: log entry. Redo log: log entry.]

  13. Shadow paging (5a): Restart after emergency shutdown, crash before commit/rollback (savepoint version 2)
      - Restart with latest converter version
      - Apply undo log (written before savepoint)
      [Figure: timeline. Resource container with converter page (version 2) and pages A, R, C, 4711, …, 4712. Data volume: A(2), R(2), C(2), 4712(1), 4711(2). Undo log: log entry. Redo log: log entry.]

  14. Shadow paging (5b): Restart after emergency shutdown, crash after commit/rollback (savepoint version 2)
      - Restart with latest converter version
      - Apply redo log (written after savepoint) for committed transactions
      - Apply undo log (written before savepoint) for rolled-back transactions
      [Figure: timeline with a commit/rollback after the savepoint. Resource container with converter page (version 2) and pages A, R, C, 4711, …, 4712. Data volume: A(2), R(2), C(2), 4712(1), 4711(2). Undo log: log entry. Redo log: log entry.]
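
The two restart cases (5a and 5b) can be sketched as one recovery routine. The record formats, the function name `recover`, and the idea of deriving before-images during replay are assumptions for illustration only; the slides do not describe the log layout.

```python
def recover(savepoint_pages, undo_at_savepoint, redo_log):
    """Toy restart: start from the savepoint, replay redo, undo open transactions.

    savepoint_pages:   page -> content as of the last savepoint
                       (may contain changes of transactions still open then)
    undo_at_savepoint: list of (txn, page, before_image) written before the savepoint
    redo_log:          records written after the savepoint:
                       ("write", txn, page, after_image), ("commit", txn),
                       ("rollback", txn)
    """
    pages = dict(savepoint_pages)
    undo = {}  # txn -> list of (page, before_image), oldest first
    for txn, page, before in undo_at_savepoint:
        undo.setdefault(txn, []).append((page, before))

    def roll_back(txn):
        # Apply before-images newest first, then forget the transaction.
        for page, before in reversed(undo.pop(txn, [])):
            pages[page] = before

    for record in redo_log:
        kind, txn = record[0], record[1]
        if kind == "write":
            page, after = record[2], record[3]
            undo.setdefault(txn, []).append((page, pages.get(page)))
            pages[page] = after
        elif kind == "commit":
            undo.pop(txn, None)   # commit: the undo entries are deleted
        elif kind == "rollback":
            roll_back(txn)

    for txn in list(undo):        # transactions still open at the crash
        roll_back(txn)
    return pages


at_savepoint = {4711: "a2 (uncommitted by T1)", 4712: "b1"}
undo_entries = [("T1", 4711, "a1")]

# (5a) crash before commit/rollback: T1 is undone.
case_5a = recover(at_savepoint, undo_entries, [])
# (5b) crash after commit: T1 stays, the later open transaction T2 is undone.
case_5b = recover(at_savepoint, undo_entries,
                  [("commit", "T1"), ("write", "T2", 4712, "b2")])
print(case_5a)
print(case_5b)
```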

  15. Savepoint Phases
      • Write changed pages in parallel (up to 3 times)
      • Acquire lock to prevent modification of pages
      • Determine log position
      • Remember open transactions
      • Copy modified pages and trigger write
      • Increase savepoint version
      • Release lock
      • Wait for I/O requests to finish
      • Write anchor page

  16. Delta Persistency

  17. L2 Delta Persistency: Data Structures Overview (Container Based Persistency)
      [Figure: a table container holds an MVCC object, a main fragment, and a delta fragment, each with a PersDesc. The delta fragment holds Δ column fragments C1, C2, C3, each with a PersColDesc, a dictionary (DICT), and data (DATA). Page chains, each with a PersDesc: MVCC page chain, data page chain, dictionary page chain.]

  18. L2 Delta Persistency: PAX Format Data Pages (Container Based Persistency)
      Page layout:
      • Generic LP header
      • DataPage fixed header: # of columns, # of rows, first row position, etc.
      • (n) column info blocks: bit-size encoding, data type, offset within the page. One block per column.
      • Row IDs: materialized RowID (can be optimized when all RowIDs are contiguous)
      • Value ID blocks (col 0 vids, col 1 vids, col 2 vids, col 3 vids, value ID blocks for columns 4…n-1, col (n) vids): blocks of n-bit packed value IDs per column. Each block contains the same number of rows, but the blocks differ in size because of differences in encoding.
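
The n-bit packing of value IDs can be sketched as follows. The little-endian bit order and the function names `pack_value_ids` and `unpack_value_ids` are assumptions for illustration; the slide does not specify the on-page bit layout.

```python
def pack_value_ids(value_ids, bits):
    """Pack value IDs into a byte block using `bits` bits per ID."""
    acc = 0
    for i, vid in enumerate(value_ids):
        if vid < 0 or vid >> bits:
            raise ValueError(f"value ID {vid} does not fit into {bits} bits")
        acc |= vid << (i * bits)
    nbytes = (len(value_ids) * bits + 7) // 8  # round up to whole bytes
    return acc.to_bytes(nbytes, "little")


def unpack_value_ids(block, bits, count):
    """Inverse of pack_value_ids for a block holding `count` IDs."""
    acc = int.from_bytes(block, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]


rows = [5, 0, 7, 3, 1]
narrow = pack_value_ids(rows, bits=3)    # a column with at most 8 distinct values
wide = pack_value_ids(rows, bits=10)     # same rows in a column needing 10 bits
print(len(narrow), len(wide))            # same row count, different block sizes
print(unpack_value_ids(narrow, 3, len(rows)))
```

This mirrors the remark on the slide that every block covers the same number of rows while the block sizes differ with the per-column encoding.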

  19. L2 Delta Persistency: Column Data Array (Container Based Persistency)
      • Column data array delta persistency is implemented using PAX pages
        - The data page uses the PAX (Partition Attributes Across) format
        - Keeps complete rows (i.e., the same number of values for each column) on a given page
        - Physical data placement is grouped by columns
      • PAX format chosen to optimize storage for a large number of small ERP tables
        - Single page per table (for a small number of rows)
      • The in-memory contiguous column data array (aka index vector) survives
        - Key decision to preserve OLAP performance
        - Keeps AttributeEngine access methods simple: no need for access methods over PAX pages

  20. L2 Delta Persistency: Column Data Array (Container Based Persistency)
      • Data page population
        - Asynchronous population of data pages from the in-memory data vector
        - Flushing of a data page is not tied to transaction commit
        - Flushing is synchronized with the database savepoint
        - Data pages are evicted as soon as they are full (low memory utilization)
      • N-bit encoding rollover
        - The affected in-memory column data array is re-encoded
        - Re-encoding of the affected and subsequent data pages
        - Lazy copy to data page: nothing to re-encode on the page if the data has not been copied yet
      • Delta store loading
        - In-memory column data arrays are populated from data pages
        - Data pages are evicted afterwards (except for the last page)
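
An n-bit encoding rollover can be illustrated with a toy in-memory vector that widens its encoding when a value ID no longer fits. `NBitVector` is a hypothetical class; it models only the in-memory re-encoding, not the re-encoding of data pages or the lazy copy described above.

```python
class NBitVector:
    """Toy n-bit packed vector that re-encodes itself on rollover."""

    def __init__(self, bits=1):
        self.bits = bits    # current width of one value ID
        self.count = 0
        self.acc = 0        # all packed value IDs, held in one big integer
        self.rollovers = 0

    def get(self, i):
        if not 0 <= i < self.count:
            raise IndexError(i)
        return (self.acc >> (i * self.bits)) & ((1 << self.bits) - 1)

    def _reencode(self, new_bits):
        # Decode with the old width first, then repack with the new width.
        values = [self.get(i) for i in range(self.count)]
        self.bits = new_bits
        self.acc = 0
        for i, v in enumerate(values):
            self.acc |= v << (i * new_bits)
        self.rollovers += 1

    def append(self, vid):
        needed = max(1, vid.bit_length())
        if needed > self.bits:
            self._reencode(needed)   # rollover: existing entries are widened
        self.acc |= vid << (self.count * self.bits)
        self.count += 1


vec = NBitVector()
for vid in [0, 1, 1, 2, 3, 4]:
    vec.append(vid)
print(vec.bits, vec.rollovers)
print([vec.get(i) for i in range(vec.count)])
```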

  21. L2 Delta Persistency: Data Page Directory and Page Chain (Container Based Persistency)
      • The compact page directory (array) is always in memory.
      • Fully populated pages are flushed to disk and are not resident (most of the chain).
      • Pages are resident until all rows they represent have been inserted (usually just the tail of the chain).

  22. L2 Delta Persistency: Dictionary (Container Based Persistency)
      Two types of dictionaries:
      • Value in Array (ViA)
        - Small fixed-size types (value length <= 16 bytes)
        - The value is stored directly in the dictionary's value array
        - The value array (transient) is a vector<T>
      • Pointer in Array (PiA)
        - Strings (both VARCHAR and CHAR; fixed size is not leveraged)
        - The value is stored in physical blocks, not compressed
        - A pointer to the value is stored in the dictionary's value array
        - The pointer points to a string block: the first 1, 2 or 4 bytes of the value are the length, and the first 2 bits of the length indicate whether it takes 1, 2 or 4 bytes
        - The value array (transient) is a vector<char *>
      [Figure: a ViA value array indexed by ValueID and holding values (v); a PiA value array indexed by ValueID and holding pointers (p) to values (v) in a block.]
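
The variable-width length prefix of a PiA string block could look like the sketch below. The exact bit layout is not given on the slide, so this is an assumed encoding: the top two bits of the first byte act as a tag (0, 1, 2 for a 1-, 2- or 4-byte length field), the remaining bits hold the length in big-endian order, and `encode_value` and `decode_value` are hypothetical helper names.

```python
def encode_value(data: bytes) -> bytes:
    """Prefix `data` with a 1-, 2- or 4-byte length field (assumed layout)."""
    n = len(data)
    if n < 1 << 6:
        header = n.to_bytes(1, "big")                 # tag 00, 6-bit length
    elif n < 1 << 14:
        header = ((1 << 14) | n).to_bytes(2, "big")   # tag 01, 14-bit length
    elif n < 1 << 30:
        header = ((2 << 30) | n).to_bytes(4, "big")   # tag 10, 30-bit length
    else:
        raise ValueError("value too long for this toy encoding")
    return header + data


def decode_value(block: bytes, pos: int = 0):
    """Return (value, position of the next value) for the value at `pos`."""
    tag = block[pos] >> 6
    width = {0: 1, 1: 2, 2: 4}[tag]   # tag 3 is unused in this sketch
    raw = int.from_bytes(block[pos:pos + width], "big")
    n = raw & ((1 << (8 * width - 2)) - 1)   # strip the two tag bits
    start = pos + width
    return block[start:start + n], start + n


block = encode_value(b"EUR") + encode_value(b"x" * 100)
first, nxt = decode_value(block)
second, end = decode_value(block, nxt)
print(first, nxt, len(second), end)
```

Short strings pay only one byte of overhead in this layout, which fits the later remark about optimizing for small strings.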

  23. L2 Delta Persistency: Dictionary (Container Based Persistency)
      • Dictionary persistency is implemented using dictionary pages
        - Pages are subdivided into blocks
        - A block stores a chunk of dictionary values for a single column
      • Value ordering
        - Implicit, for the ViA dictionary
        - Explicit via logical value pointer, for the PiA dictionary

  24. L2 Delta Persistency: ViA Dictionary (Container Based Persistency)
      • Value ID based on implicit ordering
      • Value copy maintained in the value vector
      • No or little fragmentation
      • ViA pages evicted when full
      [Figure: a ViA page with a page header (PGH) followed by blocks, each with a block header (BH), for columns C1, C5, C2, …, C1, C4, with some fragmentation; transient dictionaries for C2, C1, C5.]

  25. L2 Delta Persistency: PiA Dictionary (Container Based Persistency)
      • PiA dictionary values are stored on PiA pages.
        - Values of different columns are interleaved in a single block per page
        - Value placement in PiA pages is unordered
      • Explicit value ordering is achieved using logical pointers
        - Logical pointers are implicitly ordered in a block
        - Blocks containing logical pointers use ViA pages (which are evicted as they become full)

  26. L2 Delta Persistency: PiA Dictionary (Container Based Persistency)
      • ERP dictionary data distribution: the PiA dictionary storage design considers the ERP dictionary data distribution
      • Optimizations for small strings (<= 7 bytes)

  27. L2 Delta Persistency: DML Runtime (Container Based Persistency)
      [Figure: write operations (INSERT/UPDATE/DELETE) on the delta fragment. Columns C1, C2, …, Cn each have a dictionary store, dictionary index, inverted index, and data array. A data log record is written as a log entry to the redo logs; undo entries are written as undo log entries; the data volume holds the persisted pages.]

  28. L2 Delta Persistency: Recovery (Container Based Persistency)
      • Delta store recovery happens as part of the DB recovery
        - At restart, the delta fragment's persistent state is reverted to what it was at the last savepoint (via converter table switch)
        - Any pages (data, dictionary, and MVCC) written after the savepoint are lost
        - Replay of redo log records recovers the DB state (and the table's delta store as well) up to the last committed transaction
        - Transactions still open after recovery are closed and their UNDO is executed
      • AE (AttributeEngine) is not fully available during recovery
        - Recovery must be self-contained in the UT (Unified Table) layer
        - Replay of the redo log needs a fully instantiated dictionary
        - The column data array needs to be instantiated as well
        - This is why the backing arrays for the dictionary value vector and the column data array are moved to the UT layer
      [Figure: delta fragment with MVCC, dictionary, and data; data volume; redo logs with log entries; undo log entries.]

  29. L2 Delta Persistency: Recovery (Container Based Persistency)
      • When the first redo log record for a table is hit, the delta fragment is instantiated from the on-disk image of the delta store
      [Figure: DB recovery. The data object with Δ column fragments C1, C2, …, Cn (each with DICT and IV) and in-memory MVCC info is built from the delta fragment's data, dictionary, and MVCC pages in the data volume; redo logs with log entries; undo log entries.]

  30. L2 Delta Persistency: Recovery (Container Based Persistency)
      • After the delta fragment is instantiated, log records for the table can be replayed over UT
      • Incomplete transactions are rolled back using undo files
      [Figure: DB recovery. Redo log records are replayed onto the data object with Δ column fragments C1, C2, …, Cn (each with DICT and IV) and in-memory MVCC info; dirty pages of the delta fragment (MVCC, dictionary, data) are written out when full; data volume; redo logs with log entries; undo log entries.]

  31. L2 Delta: Lock-less Structures
      Legacy implementation:
      • DMLs execute concurrently, but are blocked at the data structure access level
      • Data-structure-level locking hurts OLTP performance
      • A write to an index vector locks the entire vector (unlike a classical page-based scheme, which requires only the affected page to be locked)
      L2 Delta uses lock-less structures:
      • Lock-less versioned vectors (column data array, dictionary value vector, etc.)
      • Lock-less B-tree and/or lock-less hash map for the dictionary index
      • No locking of structures, even for writes
      Most coarse locks from AE are already removed:
      • AE has several other locks
      • Most of them will be removed, but some may survive
      • Existing code has silent assumptions
      [Figure: attribute C1 with inverted index, dictionary value vector, index vector, and dictionary index.]

  32. Unified Table Container: Control Structure Versioning Example (Simplified) (Lock-less Structures)
      Operations:
      • Add Page 3
      • Start reader
      • Add Page 4
      • Add Page 5
      • Clone vector
      • Clone table
      • Set vector
      • Set anchor
      • Link page
      • Read data
      • Add Page 6
      • End reader
      • Old version dropped
      [Figure: transient part: a reader and an anchor pointing to the meta tables Table and Table' (with Ref# counters), which reference Δ Page Vector and Δ Page Vector'. Persistent part: Page 1 to Page 6.]
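
The versioning idea in this example (readers keep an old version alive through a reference count while writers publish a new one through the anchor) can be sketched in a few lines. This is a single-threaded illustration only, not a lock-free implementation: `Version`, `PageVectorAnchor` and their methods are hypothetical, and the sketch clones the page vector on every added page, whereas the slide appears to clone only at certain steps.

```python
class Version:
    """Immutable snapshot of the page vector with a reference count."""

    def __init__(self, pages):
        self.pages = tuple(pages)
        self.refs = 1          # the anchor's own reference
        self.dropped = False


class PageVectorAnchor:
    def __init__(self, pages=()):
        self.current = Version(pages)

    def _release(self, version):
        version.refs -= 1
        if version.refs == 0:
            version.dropped = True   # no anchor and no reader uses it anymore

    def start_reader(self):
        version = self.current
        version.refs += 1            # pin the version the reader sees
        return version

    def end_reader(self, version):
        self._release(version)

    def add_page(self, page):
        old = self.current
        # Clone the vector, link the new page, then set the anchor.
        self.current = Version(old.pages + (page,))
        self._release(old)


anchor = PageVectorAnchor(["Page 1", "Page 2"])
anchor.add_page("Page 3")
reader = anchor.start_reader()
anchor.add_page("Page 4")
anchor.add_page("Page 5")
seen_by_reader = reader.pages            # still the three-page version
anchor.add_page("Page 6")
dropped_while_reading = reader.dropped   # the pinned version is kept alive
anchor.end_reader(reader)
print(seen_by_reader, dropped_while_reading, reader.dropped)
print(anchor.current.pages)
```

In a real lock-less structure the reference counts and the anchor switch would need atomic operations; the sketch only shows the lifetime rule for old versions.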
