1 / 35

Tame Your Unruly Data… With Constraints!

Tame Your Unruly Data… With Constraints!. Rob Volk. Speaker Bio / Contact. SQL Server DBA/Developer since 1998 / 6.5 Started with dBase III & Clipper And of course MS Access Wireless telecom, utility billing, credit cards SQLTeam moderator & contributor

ryans
Download Presentation

Tame Your Unruly Data… With Constraints!

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Tame Your Unruly Data…With Constraints! Rob Volk

  2. Speaker Bio / Contact • SQL Server DBA/Developer since 1998 / 6.5 • Started with dBase III & Clipper • And of course MS Access • Wireless telecom, utility billing, credit cards • SQLTeam moderator & contributor • Blog: http://weblogs.sqlteam.com/robv/ • Twitter: @sql_r • Email: sql_r@outlook.com • I do deranged things in SQL Server, so you don’t have to

  3. What are Constraints? constraint |kənˈstrānt| noun a limitation or restriction : the availability of water is the main constraint on food production | time constraints make it impossible to do everything. • Constraints maintain the integrity of your database • Domain – what kind of data types/attributes • Entity – what kind of entities/objects/rows • Relational – control relations between entities • Constraints don’t allow good data, they prevent “bad” data • But be mindful of what your constraints allow • Let me tell you a story... (SalesOrderDetails)

  4. Why Don’t We Use Constraints? • Developer ignorance • Not necessarily stupid, just didn’t know • Cross-database integrity not supported • Multi-platform support (or lack thereof) • Data cleansing as a career (e.g. Data Quality Services) • “The app does all the checking, we don’t need them” • Yeah, right. We’ll get to that later

  5. What aren’t Constraints? • Identity columns • Application Logic • Triggers • Security • Development environment • Execution restrictions • Server Policies (PBM etc.) • XML Schema Collections (these are a good thing though) Still important, just not managed using Constraints Still important, just not managed using Constraints

  6. Examples • February 31 • Incomplete phone numbers • Still invalid if padded or trimmed • Duplicate rows* • Sales to non-existent customer accounts • Did they get the merchandise? Did they pay us? WHO ARE THESE PEOPLE?

  7. But really, how bad could it be? • Ariane Flight 501 (16 bit integer overflow) • Mars Climate Orbiter (metric vs. US) • USS Yorktown (1997, divide by zero) • F-22 (crashes, Intl. Date Line) & F-35 (entire fleet grounded) • Qantas Flight 72 (didn’t crash, nose dived twice, injuries) • Patriot Missile (clock drift, missed target, soldiers died) • Therac-25 (deaths) • Not data but process, lack of safeguards at every level • More: http://goo.gl/4Akqm

  8. Domain Constraints • Domain – fancy name for type or attribute (column in SQL) • Data type - what kind of data you’re storing (number, date, etc.) • Nullability- whether it’s required or not • Default • Check - what values are valid and invalid • Demo (discovery script)

  9. Domain Constraints - Types • First constraint: Data type • Choose the right type • VARCHAR(max) is a bad sign • Float or real -> Bad! • Money -> Better (but still problematic) • Dates (use proper date/time types) • Good data type = storage and performance efficiency • But don’t be TOO efficient Ariane 501 NVARCHAR(max) is overkill for a phone # Ariane 501

  10. Domain Constraints – NOT NULL • Limit or eliminate nullable columns • Nulls are not comparable, even to themselves • Not applicable vs. unknown • Which one does Null represent? • Nulls cannot be typed properly (or at all) • Read Chris Date • In SQL Server, nulls have storage overhead

  11. Domain Constraints - Default • Use this default when value is unknown or unsupplied • N/A vs. Unknown can be addressed with defined values • Can be useful in cascading foreign keys • SET DEFAULT supported in SQL Server 2005+

  12. Domain Constraints - CHECK • Restrict values further than what data type allows • Dates • Numeric/Money • Character • Read Alex Kuznetsov

  13. Entity Constraints • Entity – fancy name for row or record (tuple – even fancier) • Primary Key - value(s) that identify the entity • Unique - value(s) that must be unique • Table-level check - values that are valid or invalid • One or more domains/attributes/columns • Demo (discovery script)

  14. Domain Constraints - External • Very often ignored, not all values are valid (special meaning) • Doesn’t matter if external system can’t provide invalid value • Read Joe Celko • Read Karen López (@DataChick) Data, Measurements and Standards in SQL

  15. Domain Constraints - Internal • Very often ignored (and rarely documented) • $10,000 utility bills for single (unoccupied) apartment • Percentage values (tax rate, discount) • See AdventureWorks Sales tables (smallmoney) • App could handle, but then must also fix if wrong • Date range overlaps (Alex Kuznetsov)

  16. Entity Constraints – Primary Key • Uniquely identifies the row/record/tuple/entity • Must provide a value (NOT NULL by definition) • Logical construct, NOT physical • Surrogate vs. natural key • Celko: if you don’t have a key, you don’t have a table • Demo (discovery script)

  17. Entity Constraints - Unique • Candidate keys (natural keys) • Phone number is unique, but not good candidate for PK • Screen Actors Guild (billed/credited name) • SSN, Credit card # (unique but can’t be used for ID) • Multiple columns are unique • Constraint vs. index (logical vs. physical)

  18. Entity Constraints - CHECK • Multi-column conditions • Pregnancy status (N/A for males or young girls) • Unless you're British...http://goo.gl/msSeP • ANSI/ISO supports subqueries • Single table only • e.g. Force date- or time-only values in datetime columns (< SQL 2008) • Demo (discovery script)

  19. Relational Constraints • Foreign keys • Requires data in one table to reference another table’s data • Referenced table (parent) • Referencing table (child) • Can be one OR MORE columns • Referential actions on UPDATE/DELETE in referenced • Demo (discovery script)

  20. Demos • AdventureWorks • Good (Birth and Hire dates) • Could be better (Salesperson commission pct.) • SalesOrderDetail

  21. Performance • UNIQUE/PRIMARY keys can improve performance • Demo • Can also save space in index structures • CHECK and Foreign Key • Eliminate data access if WHERE clause violates CHECK constraint • Must be trusted (next slides)

  22. Trusted Constraints • ALTER TABLE…CHECK/NOCHECK CONSTRAINT • CHECK WITH CHECK… • Performance improvement • Demo • Partitioned views

  23. Untrusted Constraints • Constraint could be enabled but data could be invalid • Performance hit • Demo • How to detect: • sys.check_constraints.is_not_trusted = 1 • sys.foreign_keys.is_not_trusted= 1 • Primary key and Unique are always trusted

  24. Considerations • Be wary of “meets our current needs” • Be wary of “will also meet all possible future needs” • Aaron Bertrand “Bad Habits to Kick” blog • If there is a standard defined for your industry, use it (Celko – ISO) • Sanity check, make sure you covered everything • If you must deviate from the standard, document why • Multi-platform support • Doomed • Least-feature support = baseline • Constraints (if any) go into app logic • Use DBCC CHECKCONSTRAINTS! (thanks to Neil Hambly for reminding me!) • PASS Data Architecture Virtual Chapter! June 21 2012: http://dataarch.sqlpass.org/

  25. DOMAIN Support • ANSI/ISO • Combines data type, nullability, default and checks in one object • Not available in SQL Server  • Please vote on Connect: http://goo.gl/HbRo2 • PostgreSQL support! Example: CREATE DOMAINus_postal_codeAS varchar(10) NOT NULL DEFAULT('00000') CHECK( VALUE ~ '^\\d{5}$' ORVALUE ~ '^\\d{5}-\\d{4}$' ); • Can be simulated with user-defined types and rules…

  26. Rules • Deprecated vendor extension (Celko gripe) • Since 7.0, still supported in SQL 2012 • Not part of formal definition of object • Must reference data value and have condition on it • No “anonymous” rules like with constraints • Only one rule can bind to an object • But that rule can have multiple conditions • And rule can bind to multiple objects • Which is too bad, because…

  27. Unusual Constraint Use • Security • Prevent sa or sysadmin updates/inserts • Execution • Ad-hoc updates/inserts • Cursors • Connection settings like ANSI_PADDING • Time-of-day • Production vs. Dev/QA/UAT environments (thanks Argenis!) • See blog post: http://goo.gl/p23NL • Unusual data restrictions

  28. Can you have too many? • IMHO, no  • What about performance? • Layers? • App logic + stored procedure + triggers + constraints + rules • Don’t assume a single layer can do all the work • What’s the cost of fixing bad data? • And will it actually get fixed? • Importing/migrating bad data or design • Cruft from old systems • ETL – transform bad data, but don’t constrain it in destination Therac-25 Never time to do it right, Always time to do it over -Ron Soukup, SQL Server PM Ariane 501

  29. But… • IF: • Customers don’t care • Boss doesn’t care • Devs/$$$ Contractor say app will cover • Make them personally fix any bad data • Yes, that includes customers • Have contractor refund fee AND fix bad data at no cost  • If they don’t agree or can’t be convinced: • Add constraintsanyway • Unit testing An ounce of prevention is worth a metric ton of cure

  30. But…but… • Now I’m just having fun… • Devs say Application does everything… • And DB constraints will make sure it works • Then why use SQL Server? Write your own storage engine. Or use NoSQL. The data will be right…eventually. • In 5 years we’ll rewrite your app in Ocaml, or Scala, or Blub • Moral: you’re using SQL Server anyway, use what it offers • If app actually works correctly, constraints won’t interfere • Even if you switch RDBMS, USE CONSTRAINTS! Credit: Andy Leonard & Buck Woody

  31. And if you think I’m strict… • SQLite testing procedure: http://goo.gl/GfvIH • 1177x as much test code as program code • Space Shuttle Software Development: http://goo.gl/1k2kt • 11 versions * 400K+ lines each = 17 errors total • F-35 (JSF) Software coding standard: http://goo.gl/4rPxu • 141 pages on C++ programming guidelines • Which is really ironic… • Mars Spacecraft (Curiosity, Opportunity/Spirit, Phoenix) • Extremely limited resources (20 MHz/10 MB RAM)

  32. Conclusion • Types of constraints • Domain, Entity, Relational • Why you need them • Process, procedure and environmental • Apps and triggers can’t reliably protect or enforce • New apps/processes must duplicate all logic • Assume layers/components/processes will fail or change • Protect data and DB integrity with constraints • Constraints document integrity rules • Use them! And be sure to check on them (DBCC)

  33. References • Books Online! • MSDN • Books • Beginning SQL Server 2008 for Developers: From Novice to Professional by Robin Dewson (Apress) • Pro SQL Server 2008 Relational Database Design and Implementationby Louis Davidson, with Kevin Kline, Scott Klein, Kurt Windisch(Apress) • A Developer’s Guide to Data Modeling for SQL Server, by Eric Johnson and Joshua Jones (Addison-Wesley) • Defensive Database Programming by Alex Kuznetsov (Red Gate Books, PDF link below) • SQL Antipatterns: Avoiding the Pitfalls of Database Programming by Bill Karwin (Pragmatic Bookshelf) • SQL and Relational Theory: How to Write Accurate SQL Code by C. J. Date(O’Reilly) • Blogs/Sites • http://weblogs.sqlteam.com/ • http://sqlblog.com/ (lots of great people) • http://sqlskills.com/ (Paul Randal, Kimberly Tripp, Jonathan Kehayias, Bob Beuchemin, Erin Stellato, Joe Sack, Glenn Berry) • http://tsql.solidq.com/ (Itzik Ben-Gan) • http://www.simple-talk.com/sql/database-administration/constraints-and-the-test-driven-database/ • http://www.simple-talk.com/books/sql-books/defensive-database-programming/ • http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/04/28/using-rowversion-to-enforce-business-rules.aspx • http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/03/08/storing-intervals-of-time-with-no-overlaps.aspx • http://sqlblog.com/blogs/hugo_kornelis/archive/2007/03/29/can-you-trust-your-constraints.aspx • http://goo.gl/xCwdK (Datachix - Trusted Constraints) • More Books • Microsoft SQL Server 2008 Internals by Kalen Delaney, Paul S. Randal, Kimberly L. Tripp, Conor Cunningham, Adam Machanic (MSPress) • SQL Server MVP Deep Dives by SQL Server MVPs for War Child International (Manning) • SQL Server MVP Deep Dives 2 by SQL Server MVPs for Operation Smile (Manning) • Microsoft SQL Server 2008: T-SQL Fundamentalsby Itzik Ben-Gan (MSPress) • SQL for Smartiesby Joe Celko (Morgan Kauffman)

  34. Coming up… • #SQLBITS

  35. Thanks! SQLBitsSponsors, Organizers & Volunteers!

More Related