1 / 56

Practical Query Governance & Data Security

Practical Query Governance & Data Security. UPDATED!. TCC 2013 . akrinsky@tableausoftware.com. Agenda. Terms Security filtering Managing big data Drill-down design patterns Limiter design patterns. Terms. Data Governance Data Security Query Governance. “Data Governance”.

hanne
Download Presentation

Practical Query Governance & Data Security

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Practical Query Governance & Data Security UPDATED! TCC 2013 akrinsky@tableausoftware.com

  2. Agenda Terms Security filtering Managing big data Drill-down design patterns Limiter design patterns

  3. Terms Data Governance Data Security Query Governance

  4. “Data Governance” • “The discipline embodies a convergence of data quality, data management, data policies, business process management, and risk management surrounding the handling of data in an organization.” • Wikipedia

  5. In-scope • Data Security • Authentication • Data encryption • Row-level security • Query Governance

  6. Data Security • Data encryption • Use the database: Tableau will not decrypt • Tableau provides 2 methods for authenticating into server • AD Security • Local security • v8 introduced Datasource filters • Immutable • Table row-level filters on saved datasources • Security enforced through SQL where clauses: must be modeled

  7. Datasource Filters Like common filters, but applied to the datasourceobject directly. Useful session security expressions can bind rules to user identities. Saved with the datasource Can be published with the datasource.

  8. Session security expressions • USERNAME() • Username of logged in user, otherwise Windows username. Authors may impersonate other server users. • ISMEMBEROF(<string literal>) • FULLNAME() • USERDOMAIN() • Tableau server domain or windows domain • ISFULLNAME(< string literal>) • ISUSERNAME(< string literal>)

  9. Use data server to “bake” security into datasources

  10. Data Server • Control for IT: • Users cannot override calculations • Users cannot edit joins or connection information • Users cannot write SQL • Users cannot alter or republish data sources • Authentication can be fixed or prompted • Flexibility for users • Can access remotely over HTTP • Can write new calculations • Can blend desktop data • No need to download data/extracts • No drivers to install • Can leverage power of Tableau Server

  11. Group/Role vs. User Filtering Superstore Tables 1:0 or many Restrict to regions user manages: [Users].[Manager]=Username() Tableau wants to see a STRING literal here (not a dynamic variable) THIS WILL NOT WORK: IsMemberOf ([Users].[Region])

  12. Dynamic group filtering does NOT work Does not work! Works!

  13. Adding users to groups on Tableau server

  14. User IsMemberOf() for role based permissions • ISMEMBEROF(“<ROLE>”) • Use group membership for capabilities/ROLE based security: • ISMEMBEROF(“HR”) • ISMEMBEROF(“FINANCE”) • ISMEMBEROF(“ADMIN”)

  15. User username() for dynamic security <NAMEFIELD>=USERNAME() for user or group-level dynamic security

  16. User security design pattern Inner joins Fact 1:many 1:many Table 1 Table N USERNAME USERNAME USERNAME

  17. Blend possible, but not recommended • Filtering on secondary dimensions supported in v8. • But… • Defined for view: can be easily defeated or misapplied by author. • Can explode memory footprint if security table is too large. Fact blend Security Table (in another datasource) USERNAME USERNAME

  18. View-level filter test

  19. Database enforced join

  20. Make immutable with Datasource filters

  21. Works same as local filter but tamper-proof

  22. Impersonation different for published datasource Note that when publishing the datasource, Desktop user impersonation will stop working. To test different users, edit datasource connection.

  23. Big Data: what can possibly go wrong? The data Innocent Tableau user! Data source just dragand….

  24. When too much data becomes a problem • Reports take too long to render • Report interactions are too sluggish • Tableau (Server often) is unstable • Memory issues may be latent. • You may not notice a problem until reports stop processing (spinners on one or more panes of a dashboard) • Long-running queries are outraging the DBA

  25. What is a lot of marks? Tableau renders images fast. 1 million marks is fast, fluid, natural. 5 million? 1 trillion? Can you make sense of 1 million marks???

  26. What is a lot of data? • Report complexity, query working set size, and cache settings, affect memory footprint. • Render time is not always a good indicator of memory utilization (ie. Table calcs are fast). • v8.0 VizQL processes are 32-bit process (2GB or 4GB per process). • 8.1 with 64-bit (16 TB) practically erases limits. • As a rule of thumb: • 100 mb is a lot of data for a view to process. • If caching is enabled, 10 mb per view adds up quickly.

  27. Sizing query results Function of aggregation, data types and factorial of dimensions and rows. Database profiling tools can often tell you exactly how much data is being transferred/requested. Check size/cardinality of dimensions using “describe field” -> “load”

  28. Specifics: MSQL “Client Statistics” Example: 1 million row fact table query = 80 mb

  29. Cardinality of dimensions (high = lots)

  30. How to blow up a VizQL process? • You’re rarely going to see it on million row datasets but as results get larger, it’s possible… • Drag-drop high-cardinality dimension in DW • ie. SKU, names or IDs • (Inadvertently) tell Tableau to NOT aggregate huge queries. • Blending per se, will not constrain the size of the secondary query. • Table calculations are applied AFTER data is returned. • A “top N” filter on “index()” requires all rows to be retrieved FIRST.

  31. Caution: your report may run fine in Desktop! Yes, memory-hogging reports may “just work” on your laptop. Desktop = dedicated server, per user. Desktop processing is easier: no HTML to render. Server does not always release memory (especially if “refresh less often” is enabled)

  32. Use common sense Can you make sense of 1 million points? How many rows does your report require to render 1,000 marks? If queries require 10x as many rows, or more, you may have a report design issue. If your data set has 1 million rows or less, it probably doesn’t matter. If it’s more, consider defensive maneuvers.

  33. Tableau Warnings: Informed Consent

  34. Take precautions • Defensive data source definition • Be mindful of high-cardinality dimensions • Avoid index() filters (post query) • Avoid blends (post query) • Or keep secondary data sources small • By default, show no/few records • Use data source filters/parameters • Summary tables • Consider extracts • Use summary-detail design pattern • In-database query governance

  35. Summary Tables • Summarize data you will query often using aggregate functions (SUM, AVG, MIN, MAX) and GROUP BY on dimensions. • Create summary tables in the database (materialized views make this easy) • Extracts are a great option (up to 1 billion rows) • Remember to use the “aggregate” option on creation. • You are generally “safe” with extracts (memory-backed file architecture)

  36. Summary Extracts

  37. Summary-Detail Design Pattern

  38. Summary-Detail filter action Make sure to “Exclude all values” when not selected

  39. When page loads, no detail shown.

  40. Database Top N • Syntax varies by database • MySQL, Postres, Vertica, Pivotal, Netezza syntax • SELECT column_name(s)FROM table_nameLIMIT number • Oracle syntax • SELECT column_name(s)FROM table_nameWHERE ROWNUM <= number; • SQL Server / MS Access / Excel syntax • SELECT TOP numbercolumn_name(s)FROM table_name;

  41. Excel example SELECT TOP 1000

  42. Top N in Tableau • Simple T/F on # number of rows • Does not work! • Top N on a dimension • Works! • But not immutable • Sets & T/F calculated field • Works! • Can bind to data source filter (immutable) • Using index()<top N • Works! • But does not filter.

  43. But remember… order of evaluation # = order of evaluation. Are we filtering on a pre-limited subset??? 5. Where you would want a LIMIT function (if other filters) Where you can add one easily 4. Filter Shelf (no context) Top N on dim 3. Context Filters 2. Datasource Filters Top N with sets 1. Custom SQL or DB view Top N or LIMIT in SQL

  44. What about 100% Custom SQL filters But to get accurate results, you can’t use these filters AFTER limit has been applied. Ok if logical. Less Ok if arbitrary. 4. Filter Shelf (no context) 3. Context Filters 2. Datasource Filters 1. Custom SQL or DB view

  45. So… Top N as a governor: pick 2

  46. A note about context filters… • By default, context filters write results to temp tables • With big data… writing temp tables can be slow (depending on database) • On ODBC and some databases, can you disable temp file generation through datasource XML. <customization name='CAP_CREATE_TEMP_TABLES' value='no' /> <customization name='CAP_SELECT_INTO' value='no' />

  47. So… We can use Top N for governance, but then we won’t necessarily get the right results. But if we choose methods that get the right results, either defeatable by the author (not bound to the datasource/immutable), or actually do not actually limit rows (table calcs) Can you constrain your query non-arbitrarily?

  48. Key insights so far… • Be thoughtful in exposing and authoring with large dimensions without summarization • Consider non-arbitrary (ie. Top N) filters/parameters to constrain dimensions to a sensible domain • Datasource parameters (new in 8.0) • Custom SQL, Views • Stored procedures (new in 8.1) • SAP BW variables • Leave run-away query governance to the database itself – if your database supports it. • Then, detect when the database is truncating results and indicate if necessary.

  49. In-database query governance # = order of evaluation 5. In-database query governor for Tableau user Truncates but does not introduce inaccuracies 4. Filter Shelf (no context) 3. Context Filters 2. Datasource Filters 1. Custom SQL or DB view

  50. In-database query governors • Elephant in the room: • Tableau does not have a mid-tier query governor to wrapper the final SQL w/limit • And… most but not all query governors are resource not row-constrained • Bullet-proof, but functionality does not exist for all databases – I think. • Set initial SQL coming to more databases in 8.2 (supported in Teradata today) • Work with your DBA to setup resource limits • I welcome your feedback: akrinsky@tableausoftware.com

More Related