Mark s rasmussen improve dk
Download
1 / 98

Knowing the Internals, Who Needs SQL Server Anyway? - PowerPoint PPT Presentation


  • 135 Views
  • Uploaded on
  • Presentation posted in: General

Mark S. Rasmussen – improve.dk. Knowing the Internals, Who Needs SQL Server Anyway?. Whoami. Tech Lead @ iPaper Developer/DBA/Sysadmin/Project manager/* Comp.Sci @ Aarhus University Blogging at improve.dk @improvedk Author of the OrcaMDF project. Disclaimer.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha

Download Presentation

Knowing the Internals, Who Needs SQL Server Anyway?

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Mark s rasmussen improve dk

Mark S. Rasmussen – improve.dk

Knowing the Internals, Who Needs SQL Server Anyway?


Whoami
Whoami

  • Tech Lead @ iPaper

  • Developer/DBA/Sysadmin/Project manager/*

  • Comp.Sci @ Aarhus University

  • Blogging at improve.dk

  • @improvedk

  • Author of the OrcaMDF project


Disclaimer
Disclaimer

  • Level 650 – meant to inspire, not teach!

  • Based on 2008 R2

  • I have no idea...

  • Most of what I say is incorrect


Background
Background

  • Presentation at Miracle Open World

  • Formally started OrcaMDF


Old school querying
Old School Querying

  • privatestaticvoid oldschool()

  • {

    • using (varconn=newSqlConnection("Data Source=.;Initial Catalog=QFD;…")

    • {

      • conn.Open();

      • varcmd=newSqlCommand("SELECT * FROM Persons", conn);

      • var reader = cmd.ExecuteReader();

      • while(reader.Read())

        • Console.WriteLine(reader["ID"] +": "+ reader["Name"] +" ("+ reader["A

    • }

  • }


Orcamdf querying
OrcaMDF Querying

  • using (varmdf=newMdfFile(mdfPath))

  • {

    • var scanner =new DataScanner(mdf);

    • var rows = scanner.ScanTable("Persons");

    • EntityPrinter.Print(rows);

  • }

  • using (varmdf=newMdfFile(mdfPath))

  • {

    • var scanner =new DataScanner(mdf);

    • var rows = scanner.ScanTable("Persons")

    • .Where(x => x.Field<short>(”Age”) < 40);

    • EntityPrinter.Print(rows);

  • }


Pages
Pages

The foundation of SQL Server storage


What is a page
What Is a Page?

  • 8192 bytes

  • Everything storedas pages


Undocumented dbcc commands flags
Undocumented DBCC Commands & Flags

  • DBCC IND

  • DBCC PAGE

  • DBCC TRACEON (3604)

  • Documented in 6.5 & 7.0

  • Unofficially documented


Page header
Page Header

  • Absolutely no documentation

  • Absolutely necessary for parsing


Reverse engineering the header
Reverse Engineering the Header

  • DEMO

  • OrcaMDF: PageHeader


Slot array
Slot Array

  • Points to beginningof records in body

  • Defines logicalorder of records


Records
Records

  • Data records

    • Stores table data

  • Index records

    • Stores nonclustered index data, as well as non-leaf level clustered index data

  • Stored in the FixedVar format





Null bitmap
NULL Bitmap

  • Bitmap tracking whether columns are NULL

  • CEIL(#Cols / 8) bytes

  • Always present on data pages, except when it’s not

  • Only trust defined bits – rest may be garbage



Example record
Example Record

CREATE TABLE RecordTest

(

A int,

B int,

C char(5),

D varchar(10),

E varchar(20)

)

INSERT INTO RecordTest VALUES (25, 38, 'ABCD', 'Mark', 'Denmark')


When is data present
When Is Data Present?

  • Fixed length data always present

    • Even if null

      • Though not necessarily tail columns!

  • Variable length data only present when not null

  • Adding nullable columns is a metadata op

    • Denali default value columns is metadata too!


Data types
Data Types

How are data types stored within a record?


Classifying data types
Classifying Data Types

  • Fixed length data types

    • bit, char, int, decimal, date, datetime, float, etc.

  • Variable length data types

    • (n)varchar, varbinary, varchar(MAX), text, etc.

  • sql_variant

    • Please just stay away from it


Variable length data types
Variable Length Data Types

  • SLOBs

    • varchar(x), nvarchar(x), varbinary(x)

  • LOBs

    • text, ntext, image, varchar(MAX), nvarchar(MAX), varbinary(MAX), xml

  • vardecimal


In row n varchar x storage
In-row (n)varchar(x) Storage

  • CREATE TABLE VarcharTest

  • (

  • A varchar(4)

  • )

  • INSERT INTO VarcharTest VALUES ('Mark')


  • Complex columns
    Complex Columns

    • DEMO

    • Identified using the sign bit

    • 0b1001001110010101 = 37.781

    • 0b0001001110010101 = 5.013

    • Use cases

      • Row-overflow/LOB pointers

      • Sparse vectors

      • Back pointers


    Off row slob storage
    Off-row SLOB Storage

    • Varchar, nvarchar, varbinary

    • DEMO


    Off row slob storage1
    Off-row SLOB Storage

    • Column data moved to new page, pointer left behind



    Off row slob storage3
    Off-row SLOB Storage

    B = [BLOB Inline Root] Slot 0 Column 2 Offset 0x11a1 Length (physical) 24

    Level = 0Unused = 0UpdateSeq = 1

    TimeStamp = 1298595840Link 0Size = 4500 RowId = (1:21:0)

    • timestamp =BitConverter.ToInt64(data, 8) <<16;


    Blob fragment record
    BLOB_FRAGMENT Record

    Blob row at: Page (1:21) Slot 0 Length: 4514 Type: 3 (DATA)

    Blob Id:469368832

    • Stored on shared (obj-level) TextMix pages


    Off row slob storage4
    Off-row SLOB Storage

    • Allways stored in-row if < 24 bytes

    • 24 byte [BLOB Inline Root] pointer

    • Data stored in BLOB_FRAGMENT on TextMix page

    • Timestamp == Blob ID

    • Performance prediction is tough


    Max lob storage
    (MAX) LOB Storage

    • varchar(MAX), nvarchar(MAX), varbinary(MAX)

    • The LOB that wanted to be a SLOB

    • Three scenarios

      • [BLOB Inline Data]

      • [BLOB Inline Root]

      • [Textpointer]

    • DEMO


    Blob inline data
    ”BLOB Inline Data”

    • Used when data fits in record

    • Not an official LOB structure

    Slot 0 Column 1 Offset 0x0 Length 0 Length (physical) 0

    A = [NULL]

    B = [BLOB Inline Data] Slot 0 Column 2 Offset 0x1393 Length 4 Length (physical) 4

    B = 0x41424344


    Blob inline root
    BLOB Inline Root

    • Can reference up to 5 pages – data, roots, trees, etc.

    • 12 byte header

    • Array of 12 byte references

    • Only used by SLOBs & (MAX) LOBs

    • Also not a LOB structure (by my definition)


    Lob structure records
    LOB Structure Records

    • Wrapped in a single-column ”meta” record


    Knowing the internals who needs sql server anyway
    DATA

    Blob row at: Page (1:176) Slot 0 Length: 8054 Type: 3 (DATA)

    Blob Id:1210253312

    • Type 3

    • Where data is actually stored

    • Size always > 64 bytes (SMALL_ROOT)


    Knowing the internals who needs sql server anyway
    DATA

    • How much data can we store in a DATA record?

      • 8096 – Page body size

      • 8080 (8094)– Theoretical max

      • 8040 (8054)- Reality


    Internal
    INTERNAL

    Blob row at: Page (1:55) Slot 0 Length: 324 Type: 2 (INTERNAL)

    Blob Id: 1210253312 Level: 0 MaxLinks: 501 CurLinks: 19

    Child 0 at Page (1:176) Slot 0 Size: 8040 Offset: 8040

    Child 1 at Page (1:177) Slot 0 Size: 8040 Offset: 16080

    • Type 2

    • CurLinks = number of references

    • MaxLinks = ?

    • Level = tree level

    • Size = computed


    Internal1
    INTERNAL

    Total record size = 20 + X * 16



    The tree grows
    The Tree Grows

    • In theory (INTERNAL)...

    • In reality... 500




    Two levels is all it takes
    Two Levels Is All It Takes

    • 8,040,000,000 bytes / 7,48 GB

    • (MAX) limit is 231-1

    • Many permutations


    Large value types out of row
    Large Value Types Out of Row

    • sp_tableoption ‘MyTable’, ‘Option’, ‘ON/OFF’

    • Even more permutations

    • “text in row” 24-7000, default 256


    Textpointer
    Textpointer

    • Used for classic LOB types & MAX LOB types with ’large value types out of row’ ON

      • text, ntext, image

    • Complex column


    Classic lob structures
    Classic Lob Structures

    • You thought (MAX) was complex?

    • Textpointer =


    Small root
    SMALL_ROOT

    • Type = 0

    • Used when data <= 64 bytes

    • Min size = 84

    • Data > length = garbage


    Large root yukon
    LARGE_ROOT_YUKON

    • Type = 5

    • Min size = 84

    • Part of LOB tree




    Lob storage overview
    LOB Storage Overview

    • Extreme impact on small data

    • The more data, the less of a diff

    • Performance differences

      • http://sqlblog.com/blogs/paul_white/archive/2011/02/23/Advanced-TSQL-Tuning-Why-Internals-Knowledge-Matters.aspx





    Lob summary
    LOB Summary

    • < 8000 => (MAX) = (X)

    • > 8000 => Tree is built

    • Text/ntext/image horribly inefficient

      • Lots of legacy details


    Indices heaps
    Indices & Heaps

    How pages are organized


    Clustered index vs heap
    Clustered Index vs Heap

    • Defines how data is *physically* stored

    • Clustered index

      • Guarantees physical order of data

      • Row identified by ”clustered key”

    • Heap

      • Data stored whereever SQL Server wants to

      • Row identified by ”RID”


    B tree scanning
    B+-tree Scanning


    Heaps
    Heaps

    • Relies on IAM pages

    • Leaf pages not linked

      • Except...


    Page extent allocation
    Page & Extent Allocation

    Extents, pages & objects


    Extents
    Extents

    • All pages allocated as part of an extent

    • Mixed extents

    • Uniform extents

    • First 8 = mixed, rest uniform


    Gam pages
    GAM Pages

    • Global Allocation Map

    • 1 = Free, 0 = Allocated

    • Bitmap tracks 63,904 extents, almost 4GB

    • Present every 511,232 pages

    • GAM interval

    • 2 / 511232, every 511232 pages


    Sgam pages
    SGAM Pages

    • Shared Global Allocation Map

    • 1 = Mixed & > 0 free pages

    • 0 = Either uniform or mixed w/no free pages

    • Structure identical to GAM

    • 3 / 511233, every 511232 pages


    Iam pages
    IAM Pages

    • Index Allocation Map

    • 1 = Uniformly allocated to IAM chain / allocation unit

    • 0 = Not owned by IAM chain / AU

    • No fixed positioning!

    • Tracks a GAM interval

    • Structure (almost) identical to GAM




    Pfs pages
    PFS Pages

    • Page Free Space

    • Bytemap

    • 1 / 8088, every 8088 pages

    • PFS interval

    • Only tracks fullness where necessary




    The boot page
    The Boot Page

    • Page 9 in primary data file

    • DBCC PAGE == DBCC DBINFO

    • Lots of interesting info

      • Physical version

      • Log rebuild count

      • Last OK CHECKDB

      • Last LOG backup

      • Name + ID

      • FirstSysIndexes


    System views base tables
    System Views & Base Tables

    The source of our parsing metadata


    Needed metadata for parsing
    Needed Metadata for Parsing

    • Schema

      • sys.tables + sys.columns / sys.indexes + sys.index_columns

    • Indexes

      • Root page

    • Heaps

      • IAM chain root

    • DEMO



    Dmvs are useless
    DMVs Are Useless!

    • Just views, no physical storage

    • Chicken or the egg

    • How about we take a look at those views?

    • DEMO


    Base tables
    Base Tables

    • The basis for DMV data

    • Can only be queried through the DAC

      • Here be dragons!

    • Confusing column names

    • Utilizes internal functions


    The holy grail of metadata
    The Holy Grail of Metadata

    • SELECT * FROM sys.sysschobjs

    • sysschobjs

    • syscolpars

    • sysrowsets

    • sysallocunits

    • DEMO


    Follow the rabbit
    Follow the Rabbit

    • Boot page points to sysallocunits (FirstSysIndexes)

    • Constant partition ID leads us to sysrowsets

    • Constant object ID leads us to sysschobjs

    • Using the above we can find syscolpars

    • DEMO


    Openrowset table rscprop
    OPENROWSET(TABLE RSCPROP)

    • SELECT

    • CASEc.maxinrowlen

    • WHEN0 THENp.length 

    • ELSEc.maxinrowlen

    • ENDASmax_inrow_length,

    • p.xtype ASsystem_type_id, 

    • p.length ASmax_length, 

    • p.prec ASPRECISION, 

    • p.scale ASscale,

    • FROM

    • sys.sysrscols c –-sys.system_internals_partition_columns

    • OUTERAPPLY

    • OPENROWSET(TABLERSCPROP, c.ti) p



    Openrowset table rscprop2
    OPENROWSET(TABLE RSCPROP)

    • CREATETABLETITest

    • (

    • A binary(50),

    • B char(10),

    • C datetime2(5),

    • D decimal(12, 5),

    • E float,

    • F int,

    • G numeric(11, 4),

    • H nvarchar(50),

    • I nvarchar(max),

    • J time(3),

    • K tinyint,

    • L varbinary(max),

    • M varchar(75),

    • N text

    • )


    Openrowset table rscprop3
    OPENROWSET(TABLE RSCPROP)

    • SELECT

    • t.name,

    • r.ti,

    • p.scale,

    • p.precision,

    • p.max_length,

    • p.system_type_id,

    • p.max_inrow_length

    • FROM

    • sys.system_internals_partition_columns p

    • INNERJOIN

    • sys.sysrscols r ON

    • r.rscolid = p.partition_column_id AND

    • r.rsid = p.partition_id

    • INNERJOIN

    • sys.types t ON

    • t.system_type_id = p.system_type_id AND

    • t.user_type_id = p.system_type_id

    • WHERE

    • partition_id = 72057594040614912



    Openrowset table rscprop5
    OPENROWSET(TABLE RSCPROP)

    • 12973 = 0x000032AD

    • 173 = 0xAD == 12973 & 0xFF

    • 50= 0x32 == (12973 & 0xFFFF00) >> 8


    Openrowset table rscprop6
    OPENROWSET(TABLE RSCPROP)

    • 1322= 0x0000052A

    • 42= 0x2A == 1322 & 0xFF

    • 5= 0x5 == (1322 & 0xFF00) >> 8

    • 25= 20 + scale


    Openrowset table rscprop7
    OPENROWSET(TABLE RSCPROP)

    • 330858= 0x00050C6A

    • 106= 0x6A == 330858 && 0xFF

    • 12= 0x0C == (330858 && 0xFF00) >> 8

    • 5= 0x05 == (330858 && 0xFF0000) >> 16


    Creating a type aware ti parser
    Creating a Type Aware TI Parser

    [Test]

    publicvoidDecimal()

    {

    var parser = newSysrscolTIParser(330858);

    Assert.AreEqual(5, parser.Scale);

    Assert.AreEqual(12, parser.Precision);

    Assert.AreEqual(9, parser.MaxLength);

    Assert.AreEqual(106, parser.TypeID);

    Assert.AreEqual(9, parser.MaxInrowLength);

    parser = newSysrscolTIParser(396138);

    Assert.AreEqual(6, parser.Scale);

    Assert.AreEqual(11, parser.Precision);

    Assert.AreEqual(9, parser.MaxLength);

    Assert.AreEqual(106, parser.TypeID);

    Assert.AreEqual(9, parser.MaxInrowLength);

    }


    Data recovery
    Data Recovery

    When everything elsefails


    Please don t rely on this
    Please Don’t Rely on This

    • You should always have backups available

    • Make sure to test your backups

    • Run regular consistency checks

    • This is a last resort measure


    What kind of problems to expect
    What Kind of Problems to Expect

    • Torn pages

    • Corrupt pages

    • Bad metadata

    • Accidental deletes & truncations

    • How does OrcaMDF differ?


    Torn pages
    Torn Pages

    • One page = 16 disk sectors

    • First and last sector most important

    • No header

      • Identify object from IAM, linked list

    • No slot array

      • Slot count in header

      • Identify record formats in body


    Corrupt pages
    Corrupt Pages

    • Checksum doesn’t match content

    • Could be minor issue, probably major

    • Treat like torn page


    Bad metadata
    Bad Metadata

    • SQL Server bugs

    • Corrupt/torn pages

    • Scan pages and identify object in header

    • Scan pages and look for IAM chain

    • Deduce schema

      • App

      • Docs

      • Record format


    Accidental deletes truncations
    Accidental Deletes & Truncations

    • Accidental delete

      • Records may be ghosted

      • Records removed from slot array

      • STOP!

    • Accidental truncation

      • Pages deallocated, physically intact

      • Scan pages, linked list


    Watch out for instant initialization
    Watch Out For Instant Initialization

    • Garbage may be mistaken for data

    • Was page allocated?

      • Look for clues in salvagable allocation structures


    Orcamdf future plans
    OrcaMDF Future Plans

    • CorruptMdf class

    • Verifying tornbits / checksum

    • Utility methods

      • Scan for pages belonging to object

      • Scan for IAM pages

    • Best-effort parsing of pages


    Questions
    Questions

    Now’s the chance


    Thank you
    Thank you!

    Blog:improve.dk

    Twitter:@improvedk

    Email:mark@improve.dk


    ad
  • Login