tame your unruly data with constraints n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Tame Your Unruly Data… With Constraints! PowerPoint Presentation
Download Presentation
Tame Your Unruly Data… With Constraints!

Loading in 2 Seconds...

play fullscreen
1 / 35

Tame Your Unruly Data… With Constraints! - PowerPoint PPT Presentation


  • 121 Views
  • Uploaded on

Tame Your Unruly Data… With Constraints!. Rob Volk. Speaker Bio / Contact. SQL Server DBA/Developer since 1998 / 6.5 Started with dBase III & Clipper And of course MS Access Wireless telecom, utility billing, credit cards SQLTeam moderator & contributor

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Tame Your Unruly Data… With Constraints!' - patia


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
speaker bio contact
Speaker Bio / Contact
  • SQL Server DBA/Developer since 1998 / 6.5
  • Started with dBase III & Clipper
    • And of course MS Access
  • Wireless telecom, utility billing, credit cards
  • SQLTeam moderator & contributor
  • Blog: http://weblogs.sqlteam.com/robv/
  • Twitter: @sql_r
  • Email: sql_r@outlook.com
  • I do deranged things in SQL Server, so you don’t have to
what are constraints
What are Constraints?

constraint |kənˈstrānt|

noun

a limitation or restriction : the availability of water is the main constraint on food production | time constraints make it impossible to do everything.

  • Constraints maintain the integrity of your database
    • Domain – what kind of data types/attributes
    • Entity – what kind of entities/objects/rows
    • Relational – control relations between entities
  • Constraints don’t allow good data, they prevent “bad” data
      • But be mindful of what your constraints allow
  • Let me tell you a story... (SalesOrderDetails)
why don t we use constraints
Why Don’t We Use Constraints?
  • Developer ignorance
    • Not necessarily stupid, just didn’t know
  • Cross-database integrity not supported
  • Multi-platform support (or lack thereof)
  • Data cleansing as a career (e.g. Data Quality Services)
  • “The app does all the checking, we don’t need them”
    • Yeah, right. We’ll get to that later
what aren t constraints
What aren’t Constraints?
  • Identity columns
  • Application Logic
  • Triggers
  • Security
  • Development environment
  • Execution restrictions
  • Server Policies (PBM etc.)
  • XML Schema Collections (these are a good thing though)

Still important, just not managed using Constraints

Still important, just not managed using Constraints

examples
Examples
  • February 31
  • Incomplete phone numbers
    • Still invalid if padded or trimmed
  • Duplicate rows*
  • Sales to non-existent customer accounts
    • Did they get the merchandise? Did they pay us? WHO ARE THESE PEOPLE?
but really how bad could it be
But really, how bad could it be?
  • Ariane Flight 501 (16 bit integer overflow)
  • Mars Climate Orbiter (metric vs. US)
  • USS Yorktown (1997, divide by zero)
  • F-22 (crashes, Intl. Date Line) & F-35 (entire fleet grounded)
  • Qantas Flight 72 (didn’t crash, nose dived twice, injuries)
  • Patriot Missile (clock drift, missed target, soldiers died)
  • Therac-25 (deaths)
    • Not data but process, lack of safeguards at every level
  • More: http://goo.gl/4Akqm
domain constraints
Domain Constraints
  • Domain – fancy name for type or attribute (column in SQL)
    • Data type - what kind of data you’re storing (number, date, etc.)
    • Nullability- whether it’s required or not
    • Default
    • Check - what values are valid and invalid
  • Demo (discovery script)
domain constraints types
Domain Constraints - Types
  • First constraint: Data type
    • Choose the right type
    • VARCHAR(max) is a bad sign
    • Float or real -> Bad!
    • Money -> Better (but still problematic)
    • Dates (use proper date/time types)
  • Good data type = storage and performance efficiency
    • But don’t be TOO efficient

Ariane 501

NVARCHAR(max) is overkill for a phone #

Ariane 501

domain constraints not null
Domain Constraints – NOT NULL
  • Limit or eliminate nullable columns
  • Nulls are not comparable, even to themselves
    • Not applicable vs. unknown
      • Which one does Null represent?
  • Nulls cannot be typed properly (or at all)
  • Read Chris Date
  • In SQL Server, nulls have storage overhead
domain constraints default
Domain Constraints - Default
  • Use this default when value is unknown or unsupplied
  • N/A vs. Unknown can be addressed with defined values
  • Can be useful in cascading foreign keys
    • SET DEFAULT supported in SQL Server 2005+
domain constraints check
Domain Constraints - CHECK
  • Restrict values further than what data type allows
    • Dates
    • Numeric/Money
    • Character
  • Read Alex Kuznetsov
entity constraints
Entity Constraints
  • Entity – fancy name for row or record (tuple – even fancier)
    • Primary Key - value(s) that identify the entity
    • Unique - value(s) that must be unique
    • Table-level check - values that are valid or invalid
  • One or more domains/attributes/columns
  • Demo (discovery script)
domain constraints external
Domain Constraints - External
  • Very often ignored, not all values are valid (special meaning)
  • Doesn’t matter if external system can’t provide invalid value
  • Read Joe Celko
  • Read Karen López (@DataChick)

Data, Measurements and Standards in SQL

domain constraints internal
Domain Constraints - Internal
  • Very often ignored (and rarely documented)
    • $10,000 utility bills for single (unoccupied) apartment
    • Percentage values (tax rate, discount)
      • See AdventureWorks Sales tables (smallmoney)
    • App could handle, but then must also fix if wrong
    • Date range overlaps (Alex Kuznetsov)
entity constraints primary key
Entity Constraints – Primary Key
  • Uniquely identifies the row/record/tuple/entity
  • Must provide a value (NOT NULL by definition)
  • Logical construct, NOT physical
  • Surrogate vs. natural key
  • Celko: if you don’t have a key, you don’t have a table
  • Demo (discovery script)
entity constraints unique
Entity Constraints - Unique
  • Candidate keys (natural keys)
    • Phone number is unique, but not good candidate for PK
    • Screen Actors Guild (billed/credited name)
    • SSN, Credit card # (unique but can’t be used for ID)
  • Multiple columns are unique
  • Constraint vs. index (logical vs. physical)
entity constraints check
Entity Constraints - CHECK
  • Multi-column conditions
    • Pregnancy status (N/A for males or young girls)
      • Unless you're British...http://goo.gl/msSeP
  • ANSI/ISO supports subqueries
  • Single table only
  • e.g. Force date- or time-only values in datetime columns (< SQL 2008)
  • Demo (discovery script)
relational constraints
Relational Constraints
  • Foreign keys
    • Requires data in one table to reference another table’s data
    • Referenced table (parent)
    • Referencing table (child)
    • Can be one OR MORE columns
  • Referential actions on UPDATE/DELETE in referenced
  • Demo (discovery script)
demos
Demos
  • AdventureWorks
    • Good (Birth and Hire dates)
    • Could be better (Salesperson commission pct.)
    • SalesOrderDetail
performance
Performance
  • UNIQUE/PRIMARY keys can improve performance
    • Demo
    • Can also save space in index structures
  • CHECK and Foreign Key
    • Eliminate data access if WHERE clause violates CHECK constraint
      • Must be trusted (next slides)
trusted constraints
Trusted Constraints
  • ALTER TABLE…CHECK/NOCHECK CONSTRAINT
  • CHECK WITH CHECK…
  • Performance improvement
    • Demo
    • Partitioned views
untrusted constraints
Untrusted Constraints
  • Constraint could be enabled but data could be invalid
  • Performance hit
    • Demo
  • How to detect:
    • sys.check_constraints.is_not_trusted = 1
    • sys.foreign_keys.is_not_trusted= 1
  • Primary key and Unique are always trusted
considerations
Considerations
  • Be wary of “meets our current needs”
  • Be wary of “will also meet all possible future needs”
    • Aaron Bertrand “Bad Habits to Kick” blog
  • If there is a standard defined for your industry, use it (Celko – ISO)
    • Sanity check, make sure you covered everything
    • If you must deviate from the standard, document why
  • Multi-platform support
    • Doomed
    • Least-feature support = baseline
      • Constraints (if any) go into app logic
  • Use DBCC CHECKCONSTRAINTS! (thanks to Neil Hambly for reminding me!)
    • PASS Data Architecture Virtual Chapter! June 21 2012: http://dataarch.sqlpass.org/
domain support
DOMAIN Support
  • ANSI/ISO
  • Combines data type, nullability, default and checks in one object
  • Not available in SQL Server 
    • Please vote on Connect: http://goo.gl/HbRo2
  • PostgreSQL support! Example:

CREATE DOMAINus_postal_codeAS varchar(10)

NOT NULL

DEFAULT('00000')

CHECK(

VALUE ~ '^\\d{5}$'

ORVALUE ~ '^\\d{5}-\\d{4}$'

);

  • Can be simulated with user-defined types and rules…
rules
Rules
  • Deprecated vendor extension (Celko gripe)
    • Since 7.0, still supported in SQL 2012
    • Not part of formal definition of object
  • Must reference data value and have condition on it
    • No “anonymous” rules like with constraints
  • Only one rule can bind to an object
    • But that rule can have multiple conditions
    • And rule can bind to multiple objects
  • Which is too bad, because…
unusual constraint use
Unusual Constraint Use
  • Security
    • Prevent sa or sysadmin updates/inserts
  • Execution
    • Ad-hoc updates/inserts
    • Cursors
    • Connection settings like ANSI_PADDING
    • Time-of-day
    • Production vs. Dev/QA/UAT environments (thanks Argenis!)
  • See blog post: http://goo.gl/p23NL
  • Unusual data restrictions
can you have too many
Can you have too many?
  • IMHO, no 
  • What about performance?
  • Layers?
    • App logic + stored procedure + triggers + constraints + rules
      • Don’t assume a single layer can do all the work
  • What’s the cost of fixing bad data?
    • And will it actually get fixed?
  • Importing/migrating bad data or design
    • Cruft from old systems
    • ETL – transform bad data, but don’t constrain it in destination

Therac-25

Never time to do it right,

Always time to do it over

-Ron Soukup, SQL Server PM

Ariane 501

slide29
But…
  • IF:
    • Customers don’t care
    • Boss doesn’t care
    • Devs/$$$ Contractor say app will cover
  • Make them personally fix any bad data
    • Yes, that includes customers
    • Have contractor refund fee AND fix bad data at no cost 
  • If they don’t agree or can’t be convinced:
    • Add constraintsanyway
    • Unit testing

An ounce of prevention is worth a metric ton of cure

but but
But…but…
  • Now I’m just having fun…
  • Devs say Application does everything…
    • And DB constraints will make sure it works
    • Then why use SQL Server? Write your own storage engine. Or use NoSQL. The data will be right…eventually.
    • In 5 years we’ll rewrite your app in Ocaml, or Scala, or Blub
  • Moral: you’re using SQL Server anyway, use what it offers
    • If app actually works correctly, constraints won’t interfere
    • Even if you switch RDBMS, USE CONSTRAINTS!

Credit: Andy Leonard & Buck Woody

and if you think i m strict
And if you think I’m strict…
  • SQLite testing procedure: http://goo.gl/GfvIH
    • 1177x as much test code as program code
  • Space Shuttle Software Development: http://goo.gl/1k2kt
    • 11 versions * 400K+ lines each = 17 errors total
  • F-35 (JSF) Software coding standard: http://goo.gl/4rPxu
    • 141 pages on C++ programming guidelines
    • Which is really ironic…
  • Mars Spacecraft (Curiosity, Opportunity/Spirit, Phoenix)
    • Extremely limited resources (20 MHz/10 MB RAM)
conclusion
Conclusion
  • Types of constraints
    • Domain, Entity, Relational
    • Why you need them
  • Process, procedure and environmental
    • Apps and triggers can’t reliably protect or enforce
    • New apps/processes must duplicate all logic
  • Assume layers/components/processes will fail or change
    • Protect data and DB integrity with constraints
    • Constraints document integrity rules
  • Use them! And be sure to check on them (DBCC)
references
References
  • Books Online!
  • MSDN
  • Books
    • Beginning SQL Server 2008 for Developers: From Novice to Professional by Robin Dewson (Apress)
    • Pro SQL Server 2008 Relational Database Design and Implementationby Louis Davidson, with Kevin Kline, Scott Klein, Kurt Windisch(Apress)
    • A Developer’s Guide to Data Modeling for SQL Server, by Eric Johnson and Joshua Jones (Addison-Wesley)
    • Defensive Database Programming by Alex Kuznetsov (Red Gate Books, PDF link below)
    • SQL Antipatterns: Avoiding the Pitfalls of Database Programming by Bill Karwin (Pragmatic Bookshelf)
    • SQL and Relational Theory: How to Write Accurate SQL Code by C. J. Date(O’Reilly)
  • Blogs/Sites
    • http://weblogs.sqlteam.com/
    • http://sqlblog.com/ (lots of great people)
    • http://sqlskills.com/ (Paul Randal, Kimberly Tripp, Jonathan Kehayias, Bob Beuchemin, Erin Stellato, Joe Sack, Glenn Berry)
    • http://tsql.solidq.com/ (Itzik Ben-Gan)
    • http://www.simple-talk.com/sql/database-administration/constraints-and-the-test-driven-database/
    • http://www.simple-talk.com/books/sql-books/defensive-database-programming/
    • http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/04/28/using-rowversion-to-enforce-business-rules.aspx
    • http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/03/08/storing-intervals-of-time-with-no-overlaps.aspx
    • http://sqlblog.com/blogs/hugo_kornelis/archive/2007/03/29/can-you-trust-your-constraints.aspx
    • http://goo.gl/xCwdK (Datachix - Trusted Constraints)
  • More Books
    • Microsoft SQL Server 2008 Internals by Kalen Delaney, Paul S. Randal, Kimberly L. Tripp, Conor Cunningham, Adam Machanic (MSPress)
    • SQL Server MVP Deep Dives by SQL Server MVPs for War Child International (Manning)
    • SQL Server MVP Deep Dives 2 by SQL Server MVPs for Operation Smile (Manning)
    • Microsoft SQL Server 2008: T-SQL Fundamentalsby Itzik Ben-Gan (MSPress)
    • SQL for Smartiesby Joe Celko (Morgan Kauffman)
coming up

Coming up…

  • #SQLBITS
thanks
Thanks!

SQLBitsSponsors, Organizers & Volunteers!