1 / 12

Custom SQL in Tableau

Custom SQL in Tableau. Ashley Ohmann June 20, 2013. What is Custom SQL? What can I do with it? Join conditions Unions and Self Joins Ranks Derived Tables. Agenda. Tableau queries each data source using SQL that’s specific to the data type

astrid
Download Presentation

Custom SQL in Tableau

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Custom SQL in Tableau Ashley Ohmann June 20, 2013

  2. What is Custom SQL? • What can I do with it? • Join conditions • Unions and Self Joins • Ranks • Derived Tables Agenda

  3. Tableau queries each data source using SQL that’s specific to the data type • The differences querying different data source types are in • Syntax • Functions • Field length limitations • Tableau allows the SQL used to query a data source to be customized (relatively easily) in order to manipulate the joins, filters, and field lengths and types to produce a more accurate output Tableau and SQL

  4. Every data connection, whether it’s to one or multiple tables, has custom SQL underpinning it. • The syntax between different database types varies—for instance, in most flat file and server-based data sources, table and field names are identified by brackets. • With Hive, the bracket has been replaced by ` • The functions in a query must be functions that the database can understand, which is a particular issue with SQL queries that aggregate • For the most part, all databases use some version of SQL, but there are minor differences • For instance, flat Microsoft-based sources, like Access, Excel, and flat files, use the conventions of Microsoft’s JET driver • Another example: the ISNULL function, which is used to replace nulls in SQL Server data sources and is a corollary to IFNULL in Tableau, is not read by Teradata—one would use COALESCE instead. Basics of Custom SQL

  5. Join conditions tell the database how the fields in different tables are related to each other • A “table” is a queried set of data, not necessarily all of the rows and columns in a pre-defined table in a database • Custom SQL is a good tool for using joins to transform a data source • Inner join: keeps records common to both tables; this join type usually has the least number of null values, provided that the join condition is the right one • Left join: keeps all records from the first table and the matching records from the second table • Right join: keeps all records from the second table and the matching records from the first table • Outer/full join: keeps all records from both tables • Self join: refers to the same table in the inner and outer query. A union can be a self join. • Union: stacks tables on top of each other. Basics: Join Conditions

  6. Unions are particularly useful for transforming a data source, either because it was not set-up properly, or because one would like to create a visualization that needs for it to be set-up in a certain way • The Region Sales example data source presents the current and previous year’s sales by Division and Region for two different battery types. • The format is similar to that of a pivot table, and the problem with analyzing it is that the same measure is duplicated according to a qualitative attribute. • In order to analyze this data source and to create metrics based on the measures, it needs to be restructured: • We need to add a field that identifies the battery type as an attribute • We need to put the sales for this year and last year, respective, in one column Unions

  7. Connect to the Union Data.xls data source, and select the Region Sales sheet. • Click the “Custom SQL” option under Step 2 of the data connection • Click the expansion button to see the whole Custom SQL window • Re-order the fields so that Division comes first, then Region • Highlight and copy all of the text, then underneath it, type UNION on a new row, then create another two carriage returns, and paste the code • In the outer query, create a new field that identifies the battery type as an attribute • Alias the battery type’s sales this year with a generic field name, and do the same for last year’s sales. Then, delete the fields that reference the other battery type • In the bottom query, paste the field that identifies the battery type attribute, but change the value to “Energizer” • Alias the sales fields for that battery type so that they have the same names as the sales fields in the outer query • Make sure that the commas are in the right places • Click OK, and then give the data source a good name Create a Union

  8. Duplicate the data source, and call it “Union with Calculated Fields” • We’ll use the existing join conditions to add in some other calculations • Custom SQL can be used to apply conditions to different groups of data based on an attribute of the group • It might be useful to add a quota, and in this case, the quote will be the previous year’s sales multiplied by 1.15 for Energizer and 1.10 for Duracell • Edit the Custom SQL to multiple the previous year’s sales by the factors listed above, and call the new field [Quota] • This data is useful for making a nice little bullet chart. Add a calculated field in Tableau identifying whether or not the sales exceeded the quota, and put it on the color shelf. Add Calculated Fields

  9. One of the shortcomings of Tableau is that it doesn’t have a real ranking function • RAWSQL can be used to rank, but for a large data set with ranking conditions on several fields, it’s best to edit the SQL and include a ranking function • It doesn’t look like an Excel data connection supports this, so one would need to pre-rank the data in Excel • For SQL Server or Oracle data connections, RANK and DENSE_RANK functions are appropriate • Syntax: RANK ( ) OVER ( [ partition_by_clause ] order_by_clause) • Looks like: RANK() OVER (PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS Rank • I have added the sample data to my SQL Server and written a ranking query—see the next slide. Ranking

  10. SELECT [Region Sales].[Division] AS [Division], [Region Sales].[Region] AS [Region], 'E' as [Battery Type], [Region Sales].[Energizer] as [TY Sales], DENSE_RANK() OVER (ORDER BY [Region Sales].[Energizer] DESC) AS TY_Rank FROM [dbo].[Region Sales] [Region Sales] Union SELECT [Region Sales].[Division] AS [Division], [Region Sales].[Region] AS [Region], 'D' as [Battery Type], [Region Sales].[Duracell] AS [TY Sales], DENSE_RANK() OVER (ORDER BY [Region Sales].[Duracell] DESC) AS Rank FROM [dbo].[Region Sales] [Region Sales]

  11. Derived tables in a query are very useful for transforming data as well as aggregating at different dimension levels • We’ll use a different version of our previous data set to create a derived table • The YOY Sales sheet in our data source shows, for each Division, Region, Battery Type, and Year, the Sales. • We want to have this year’s sales and last year’s sales as separate columns, though. • Unfortunately, we can’t use a derived table with an Excel data connection, so the example on the next slide is for code from a SQL Server. Derived Tables

  12. SELECT [TY].[Battery Type] AS [Battery Type], [TY].[Division] AS [Division], [TY].[Region] AS [Region], [TY].[Sales] AS [TY Sales], [LY].[LY Sales] AS [LY Sales] FROM [YOY Sales] [TY] Left Join (SELECT [YOY Sales].[Battery Type] AS [Battery Type], [YOY Sales].[Division] AS [Division], [YOY Sales].[Region] AS [Region], [YOY Sales].[Sales] AS [LY Sales] FROM [YOY Sales] where [YOY Sales].[Year] = 2012) as [LY] on [TY].[Battery Type] = [LY].[Battery Type] and [TY].[Division] = [LY].[Division] and [LY].[Region] = [LY].[Region] where [TY].[Year] = 2013 Derived Table Code

More Related