mark s rasmussen improve dk n.
Download
Skip this Video
Download Presentation
Knowing the Internals, Who Needs SQL Server Anyway?

Loading in 2 Seconds...

play fullscreen
1 / 98

Knowing the Internals, Who Needs SQL Server Anyway? - PowerPoint PPT Presentation


  • 138 Views
  • Uploaded on

Mark S. Rasmussen – improve.dk. Knowing the Internals, Who Needs SQL Server Anyway?. Whoami. Tech Lead @ iPaper Developer/DBA/Sysadmin/Project manager/* Comp.Sci @ Aarhus University Blogging at improve.dk @improvedk Author of the OrcaMDF project. Disclaimer.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Knowing the Internals, Who Needs SQL Server Anyway?' - ursula-moreno


Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
whoami
Whoami
  • Tech Lead @ iPaper
  • Developer/DBA/Sysadmin/Project manager/*
  • Comp.Sci @ Aarhus University
  • Blogging at improve.dk
  • @improvedk
  • Author of the OrcaMDF project
disclaimer
Disclaimer
  • Level 650 – meant to inspire, not teach!
  • Based on 2008 R2
  • I have no idea...
  • Most of what I say is incorrect
background
Background
  • Presentation at Miracle Open World
  • Formally started OrcaMDF
old school querying
Old School Querying
  • privatestaticvoid oldschool()
  • {
    • using (varconn=newSqlConnection("Data Source=.;Initial Catalog=QFD;…")
    • {
      • conn.Open();
      • varcmd=newSqlCommand("SELECT * FROM Persons", conn);
      • var reader = cmd.ExecuteReader();
      • while(reader.Read())
        • Console.WriteLine(reader["ID"] +": "+ reader["Name"] +" ("+ reader["A
    • }
  • }
orcamdf querying
OrcaMDF Querying
  • using (varmdf=newMdfFile(mdfPath))
  • {
    • var scanner =new DataScanner(mdf);
    • var rows = scanner.ScanTable("Persons");
    • EntityPrinter.Print(rows);
  • }
  • using (varmdf=newMdfFile(mdfPath))
  • {
    • var scanner =new DataScanner(mdf);
    • var rows = scanner.ScanTable("Persons")
    • .Where(x => x.Field<short>(”Age”) < 40);
    • EntityPrinter.Print(rows);
  • }
pages
Pages

The foundation of SQL Server storage

what is a page
What Is a Page?
  • 8192 bytes
  • Everything storedas pages
undocumented dbcc commands flags
Undocumented DBCC Commands & Flags
  • DBCC IND
  • DBCC PAGE
  • DBCC TRACEON (3604)
  • Documented in 6.5 & 7.0
  • Unofficially documented
page header
Page Header
  • Absolutely no documentation
  • Absolutely necessary for parsing
reverse engineering the header
Reverse Engineering the Header
  • DEMO
  • OrcaMDF: PageHeader
slot array
Slot Array
  • Points to beginningof records in body
  • Defines logicalorder of records
records
Records
  • Data records
    • Stores table data
  • Index records
    • Stores nonclustered index data, as well as non-leaf level clustered index data
  • Stored in the FixedVar format
null bitmap
NULL Bitmap
  • Bitmap tracking whether columns are NULL
  • CEIL(#Cols / 8) bytes
  • Always present on data pages, except when it’s not
  • Only trust defined bits – rest may be garbage
example record
Example Record

CREATE TABLE RecordTest

(

A int,

B int,

C char(5),

D varchar(10),

E varchar(20)

)

INSERT INTO RecordTest VALUES (25, 38, 'ABCD', 'Mark', 'Denmark')

when is data present
When Is Data Present?
  • Fixed length data always present
    • Even if null
      • Though not necessarily tail columns!
  • Variable length data only present when not null
  • Adding nullable columns is a metadata op
    • Denali default value columns is metadata too!
data types
Data Types

How are data types stored within a record?

classifying data types
Classifying Data Types
  • Fixed length data types
    • bit, char, int, decimal, date, datetime, float, etc.
  • Variable length data types
    • (n)varchar, varbinary, varchar(MAX), text, etc.
  • sql_variant
    • Please just stay away from it
variable length data types
Variable Length Data Types
  • SLOBs
    • varchar(x), nvarchar(x), varbinary(x)
  • LOBs
    • text, ntext, image, varchar(MAX), nvarchar(MAX), varbinary(MAX), xml
  • vardecimal
in row n varchar x storage
In-row (n)varchar(x) Storage
        • CREATE TABLE VarcharTest
        • (
        • A varchar(4)
        • )
  • INSERT INTO VarcharTest VALUES ('Mark')
complex columns
Complex Columns
  • DEMO
  • Identified using the sign bit
  • 0b1001001110010101 = 37.781
  • 0b0001001110010101 = 5.013
  • Use cases
    • Row-overflow/LOB pointers
    • Sparse vectors
    • Back pointers
off row slob storage
Off-row SLOB Storage
  • Varchar, nvarchar, varbinary
  • DEMO
off row slob storage1
Off-row SLOB Storage
  • Column data moved to new page, pointer left behind
off row slob storage3
Off-row SLOB Storage

B = [BLOB Inline Root] Slot 0 Column 2 Offset 0x11a1 Length (physical) 24

Level = 0 Unused = 0 UpdateSeq = 1

TimeStamp = 1298595840 Link 0 Size = 4500 RowId = (1:21:0)

  • timestamp =BitConverter.ToInt64(data, 8) <<16;
blob fragment record
BLOB_FRAGMENT Record

Blob row at: Page (1:21) Slot 0 Length: 4514 Type: 3 (DATA)

Blob Id:469368832

  • Stored on shared (obj-level) TextMix pages
off row slob storage4
Off-row SLOB Storage
  • Allways stored in-row if < 24 bytes
  • 24 byte [BLOB Inline Root] pointer
  • Data stored in BLOB_FRAGMENT on TextMix page
  • Timestamp == Blob ID
  • Performance prediction is tough
max lob storage
(MAX) LOB Storage
  • varchar(MAX), nvarchar(MAX), varbinary(MAX)
  • The LOB that wanted to be a SLOB
  • Three scenarios
    • [BLOB Inline Data]
    • [BLOB Inline Root]
    • [Textpointer]
  • DEMO
blob inline data
”BLOB Inline Data”
  • Used when data fits in record
  • Not an official LOB structure

Slot 0 Column 1 Offset 0x0 Length 0 Length (physical) 0

A = [NULL]

B = [BLOB Inline Data] Slot 0 Column 2 Offset 0x1393 Length 4 Length (physical) 4

B = 0x41424344

blob inline root
BLOB Inline Root
  • Can reference up to 5 pages – data, roots, trees, etc.
  • 12 byte header
  • Array of 12 byte references
  • Only used by SLOBs & (MAX) LOBs
  • Also not a LOB structure (by my definition)
lob structure records
LOB Structure Records
  • Wrapped in a single-column ”meta” record
slide36
DATA

Blob row at: Page (1:176) Slot 0 Length: 8054 Type: 3 (DATA)

Blob Id:1210253312

  • Type 3
  • Where data is actually stored
  • Size always > 64 bytes (SMALL_ROOT)
slide37
DATA
  • How much data can we store in a DATA record?
    • 8096 – Page body size
    • 8080 (8094)– Theoretical max
    • 8040 (8054)- Reality
internal
INTERNAL

Blob row at: Page (1:55) Slot 0 Length: 324 Type: 2 (INTERNAL)

Blob Id: 1210253312 Level: 0 MaxLinks: 501 CurLinks: 19

Child 0 at Page (1:176) Slot 0 Size: 8040 Offset: 8040

Child 1 at Page (1:177) Slot 0 Size: 8040 Offset: 16080

  • Type 2
  • CurLinks = number of references
  • MaxLinks = ?
  • Level = tree level
  • Size = computed
internal1
INTERNAL

Total record size = 20 + X * 16

the tree grows
The Tree Grows
  • In theory (INTERNAL)...
  • In reality... 500
two levels is all it takes
Two Levels Is All It Takes
  • 8,040,000,000 bytes / 7,48 GB
  • (MAX) limit is 231-1
  • Many permutations
large value types out of row
Large Value Types Out of Row
  • sp_tableoption ‘MyTable’, ‘Option’, ‘ON/OFF’
  • Even more permutations
  • “text in row” 24-7000, default 256
textpointer
Textpointer
  • Used for classic LOB types & MAX LOB types with ’large value types out of row’ ON
    • text, ntext, image
  • Complex column
classic lob structures
Classic Lob Structures
  • You thought (MAX) was complex?
  • Textpointer =
small root
SMALL_ROOT
  • Type = 0
  • Used when data <= 64 bytes
  • Min size = 84
  • Data > length = garbage
large root yukon
LARGE_ROOT_YUKON
  • Type = 5
  • Min size = 84
  • Part of LOB tree
lob storage overview
LOB Storage Overview
  • Extreme impact on small data
  • The more data, the less of a diff
  • Performance differences
    • http://sqlblog.com/blogs/paul_white/archive/2011/02/23/Advanced-TSQL-Tuning-Why-Internals-Knowledge-Matters.aspx
lob summary
LOB Summary
  • < 8000 => (MAX) = (X)
  • > 8000 => Tree is built
  • Text/ntext/image horribly inefficient
    • Lots of legacy details
indices heaps
Indices & Heaps

How pages are organized

clustered index vs heap
Clustered Index vs Heap
  • Defines how data is *physically* stored
  • Clustered index
    • Guarantees physical order of data
    • Row identified by ”clustered key”
  • Heap
    • Data stored whereever SQL Server wants to
    • Row identified by ”RID”
heaps
Heaps
  • Relies on IAM pages
  • Leaf pages not linked
    • Except...
page extent allocation
Page & Extent Allocation

Extents, pages & objects

extents
Extents
  • All pages allocated as part of an extent
  • Mixed extents
  • Uniform extents
  • First 8 = mixed, rest uniform
gam pages
GAM Pages
  • Global Allocation Map
  • 1 = Free, 0 = Allocated
  • Bitmap tracks 63,904 extents, almost 4GB
  • Present every 511,232 pages
  • GAM interval
  • 2 / 511232, every 511232 pages
sgam pages
SGAM Pages
  • Shared Global Allocation Map
  • 1 = Mixed & > 0 free pages
  • 0 = Either uniform or mixed w/no free pages
  • Structure identical to GAM
  • 3 / 511233, every 511232 pages
iam pages
IAM Pages
  • Index Allocation Map
  • 1 = Uniformly allocated to IAM chain / allocation unit
  • 0 = Not owned by IAM chain / AU
  • No fixed positioning!
  • Tracks a GAM interval
  • Structure (almost) identical to GAM
pfs pages
PFS Pages
  • Page Free Space
  • Bytemap
  • 1 / 8088, every 8088 pages
  • PFS interval
  • Only tracks fullness where necessary
the boot page
The Boot Page
  • Page 9 in primary data file
  • DBCC PAGE == DBCC DBINFO
  • Lots of interesting info
    • Physical version
    • Log rebuild count
    • Last OK CHECKDB
    • Last LOG backup
    • Name + ID
    • FirstSysIndexes
system views base tables
System Views & Base Tables

The source of our parsing metadata

needed metadata for parsing
Needed Metadata for Parsing
  • Schema
    • sys.tables + sys.columns / sys.indexes + sys.index_columns
  • Indexes
    • Root page
  • Heaps
    • IAM chain root
  • DEMO
dmvs are useless
DMVs Are Useless!
  • Just views, no physical storage
  • Chicken or the egg
  • How about we take a look at those views?
  • DEMO
base tables
Base Tables
  • The basis for DMV data
  • Can only be queried through the DAC
    • Here be dragons!
  • Confusing column names
  • Utilizes internal functions
the holy grail of metadata
The Holy Grail of Metadata
  • SELECT * FROM sys.sysschobjs
  • sysschobjs
  • syscolpars
  • sysrowsets
  • sysallocunits
  • DEMO
follow the rabbit
Follow the Rabbit
  • Boot page points to sysallocunits (FirstSysIndexes)
  • Constant partition ID leads us to sysrowsets
  • Constant object ID leads us to sysschobjs
  • Using the above we can find syscolpars
  • DEMO
openrowset table rscprop
OPENROWSET(TABLE RSCPROP)
  • SELECT
  • CASEc.maxinrowlen
  • WHEN0 THENp.length 
  • ELSEc.maxinrowlen
  • ENDASmax_inrow_length,
  • p.xtype ASsystem_type_id, 
  • p.length ASmax_length, 
  • p.prec ASPRECISION, 
  • p.scale ASscale,
  • FROM
  • sys.sysrscols c –-sys.system_internals_partition_columns
  • OUTERAPPLY
  • OPENROWSET(TABLERSCPROP, c.ti) p
openrowset table rscprop2
OPENROWSET(TABLE RSCPROP)
  • CREATETABLETITest
  • (
  • A binary(50),
  • B char(10),
  • C datetime2(5),
  • D decimal(12, 5),
  • E float,
  • F int,
  • G numeric(11, 4),
  • H nvarchar(50),
  • I nvarchar(max),
  • J time(3),
  • K tinyint,
  • L varbinary(max),
  • M varchar(75),
  • N text
  • )
openrowset table rscprop3
OPENROWSET(TABLE RSCPROP)
  • SELECT
  • t.name,
  • r.ti,
  • p.scale,
  • p.precision,
  • p.max_length,
  • p.system_type_id,
  • p.max_inrow_length
  • FROM
  • sys.system_internals_partition_columns p
  • INNERJOIN
  • sys.sysrscols r ON
  • r.rscolid = p.partition_column_id AND
  • r.rsid = p.partition_id
  • INNERJOIN
  • sys.types t ON
  • t.system_type_id = p.system_type_id AND
  • t.user_type_id = p.system_type_id
  • WHERE
  • partition_id = 72057594040614912
openrowset table rscprop5
OPENROWSET(TABLE RSCPROP)
  • 12973 = 0x000032AD
  • 173 = 0xAD == 12973 & 0xFF
  • 50 = 0x32 == (12973 & 0xFFFF00) >> 8
openrowset table rscprop6
OPENROWSET(TABLE RSCPROP)
  • 1322 = 0x0000052A
  • 42 = 0x2A == 1322 & 0xFF
  • 5 = 0x5 == (1322 & 0xFF00) >> 8
  • 25 = 20 + scale
openrowset table rscprop7
OPENROWSET(TABLE RSCPROP)
  • 330858 = 0x00050C6A
  • 106 = 0x6A == 330858 && 0xFF
  • 12 = 0x0C == (330858 && 0xFF00) >> 8
  • 5 = 0x05 == (330858 && 0xFF0000) >> 16
creating a type aware ti parser
Creating a Type Aware TI Parser

[Test]

publicvoidDecimal()

{

var parser = newSysrscolTIParser(330858);

Assert.AreEqual(5, parser.Scale);

Assert.AreEqual(12, parser.Precision);

Assert.AreEqual(9, parser.MaxLength);

Assert.AreEqual(106, parser.TypeID);

Assert.AreEqual(9, parser.MaxInrowLength);

parser = newSysrscolTIParser(396138);

Assert.AreEqual(6, parser.Scale);

Assert.AreEqual(11, parser.Precision);

Assert.AreEqual(9, parser.MaxLength);

Assert.AreEqual(106, parser.TypeID);

Assert.AreEqual(9, parser.MaxInrowLength);

}

data recovery
Data Recovery

When everything elsefails

please don t rely on this
Please Don’t Rely on This
  • You should always have backups available
  • Make sure to test your backups
  • Run regular consistency checks
  • This is a last resort measure
what kind of problems to expect
What Kind of Problems to Expect
  • Torn pages
  • Corrupt pages
  • Bad metadata
  • Accidental deletes & truncations
  • How does OrcaMDF differ?
torn pages
Torn Pages
  • One page = 16 disk sectors
  • First and last sector most important
  • No header
    • Identify object from IAM, linked list
  • No slot array
    • Slot count in header
    • Identify record formats in body
corrupt pages
Corrupt Pages
  • Checksum doesn’t match content
  • Could be minor issue, probably major
  • Treat like torn page
bad metadata
Bad Metadata
  • SQL Server bugs
  • Corrupt/torn pages
  • Scan pages and identify object in header
  • Scan pages and look for IAM chain
  • Deduce schema
    • App
    • Docs
    • Record format
accidental deletes truncations
Accidental Deletes & Truncations
  • Accidental delete
    • Records may be ghosted
    • Records removed from slot array
    • STOP!
  • Accidental truncation
    • Pages deallocated, physically intact
    • Scan pages, linked list
watch out for instant initialization
Watch Out For Instant Initialization
  • Garbage may be mistaken for data
  • Was page allocated?
    • Look for clues in salvagable allocation structures
orcamdf future plans
OrcaMDF Future Plans
  • CorruptMdf class
  • Verifying tornbits / checksum
  • Utility methods
    • Scan for pages belonging to object
    • Scan for IAM pages
  • Best-effort parsing of pages
questions
Questions

Now’s the chance

thank you

Thank you!

Blog: improve.dk

Twitter: @improvedk

Email: mark@improve.dk

ad