1 / 16

Module 1: Quick Notes on SQL & Getting Started With This Course

Module 1: Quick Notes on SQL & Getting Started With This Course. 1.1 Our Data. Our Data. BigEcomm – a fictitious online retailer. As you’ll see in the browser walkthrough, we have 4 tables. These are all joined through customer_id. Our Tables. Customer

rchavez
Download Presentation

Module 1: Quick Notes on SQL & Getting Started With This Course

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Module 1: Quick Notes on SQL & Getting Started With This Course

  2. 1.1 Our Data

  3. Our Data BigEcomm – a fictitious online retailer. As you’ll see in the browser walkthrough, we have 4 tables. These are all joined through customer_id.

  4. Our Tables Customer Contains many features about the customer. How much they paid in commission in August 2018, location, social media usage, ever visited the company blog, etc. Webscrapingdata Page load time, word count, form count, language on their website, has a shopping cart, etc. Salescall Dates that calls with sales were opened and closed, type of call resolution, and call type. Billedservices Currency of the billed service, how much was paid, status, etc. And much more….

  5. 1.2 How You’ll Be Accessing Data

  6. SQLite Browser The SQL browser with data already loaded can be found here: https://kristenkehrer.github.io/datamovesme-sqlcourse/ You’ll write all SQL queries for this course in this browser, and it’ll return your results. You can then use the results of those queries to answer the quiz questions throughout the course. Let’s do a walkthrough of the browser……..

  7. 1.3 What is Dirty Data?

  8. Type of Dirty Data Incomplete Data NULLs. Incorrect Data US States shouldn’t include anything other than the 50 states. Page load time shouldn’t include entries that are not numeric. Inconsistent Data If data is redundant (same information in multiple tables) it is possible that someone could have 2 different addresses in the same database. This is a typical use case in healthcare data. Inaccurate Data If you had rates > 1, you would know that the underlying calculation is causing incorrect data. Because rates are between [0,1]. Business Rule Violation Someone can’t covert before they’ve ever visited the website. If a purchase date was before a first website visit, this would be a violation of the business rules. Duplicate Data In our data, the customer table should be unique (one row per customer). We’ll see this isn’t the case. We’ll cover these scenarios in this course….

  9. 1.4 Relevant SQL Background

  10. What is SQL Structured Query Language – This is a language for relational databases. Data is persistent – data sits on a server that is local or remote. The data is in 2 dimensional tables (rows and columns) and they’re logically connected. NoSQL is not a relational database and is typically used for very large datasets sometimes stored in JSON format.

  11. More on SQL SQL is not case sensitive Queries end in semicolons There is much more to learn about SQL than is covered in this course, but we’re looking to make you an effective data scientist. The aim is not to become a DBA (database admin).

  12. Data Types Numeric (int, float) String (varchar) Dates (datetime)

  13. STAR Schema Can make queries for modeling purposes quite long. Picture taken from Wikipedia.org

  14. Database Permissions Typically as a data scientist we are not working with the production tables. You can create new tables, delete tables in a “sandbox”. Set your queries up to a “scheduler” to get your data regularly refreshed in the database or work with IT or BI to productionalize your table.

  15. SQL Servers MySQL Oracle DB2 Postgres Microsoft SQL Server SQLite – what we’re using in this course.

  16. This is the end of Module 1 Congrats! Now let’s have a quick quiz and then dive in to some SQL.

More Related