
Big Data in Excel


mahdis

Presentation Transcript


  1. Big Data in Excel

  2. “Big Data is defined as anything that doesn’t fit in a spreadsheet.” –Anon.

  3. Map-Reduce • Map: Run one calculation over an entire column • Filter: Select which rows to keep • Reduce: Combine an entire, filtered column into a result

  4. Map =LTV * 2

  5. Map-Reduce • Map: Run one calculation over an entire column • Filter: Select which rows to keep • Reduce: Combine an entire, filtered column into a result

  6. Filter =IF(LTV > 0, LTV, 0)

  7. Map-Reduce • Map: Run one calculation over an entire column • Filter: Select which rows to keep • Reduce: Combine an entire, filtered column into a result

  8. Reduce =AVERAGE(LTV)
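The three formulas on slides 4, 6, and 8 can be mirrored outside Excel. The following is a minimal Python sketch, not the presenter's code: the `ltv` values are made up, and LTV is assumed here to be a numeric column such as customer lifetime value. It also shows that the slide's `IF` formula replaces non-positive values with 0 rather than removing rows, which changes the average.

```python
from statistics import mean

# Hypothetical LTV column; the values are invented for illustration.
ltv = [120.0, -15.0, 80.0, 0.0, 45.0]

# Map: run one calculation over the entire column (=LTV * 2).
doubled = [value * 2 for value in ltv]

# Filter as written on the slide (=IF(LTV > 0, LTV, 0)):
# non-positive values become 0, so every row is still present.
clamped = [value if value > 0 else 0 for value in ltv]

# Filter in the stricter sense: drop the rows that fail the test.
positive_only = [value for value in ltv if value > 0]

# Reduce: combine a column into a single result (=AVERAGE(LTV)).
average_clamped = mean(clamped)
average_positive = mean(positive_only)
```

The two averages differ because the clamped column still counts the zeroed rows in its denominator.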

  9. 100,000 rows • Data still fits in Excel, but Excel will start to slow down • Best practices: • Start with a sample • Operate on columns (Map, Filter) • Aggregate Results (Reduce) • Alternative: Pivot Tables
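The "start with a sample" practice can be sketched in Python as follows. This is an illustration under assumptions: the `(customer_id, ltv)` row layout, the helper name `sample_rows`, and the sample size of 1,000 are all hypothetical and do not come from the slides.

```python
import random

def sample_rows(rows, k, seed=0):
    """Return k rows chosen uniformly at random, or all rows if there are at most k."""
    if len(rows) <= k:
        return list(rows)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(rows, k)

# Hypothetical data set: 100,000 rows of (customer_id, ltv).
rows = [(customer_id, float(customer_id % 500)) for customer_id in range(100_000)]

# Develop the calculation on a small sample first.
sample = sample_rows(rows, k=1_000)

# Operate on the column (Map, Filter), then aggregate (Reduce).
ltv_column = [ltv for _, ltv in sample]
positive = [ltv for ltv in ltv_column if ltv > 0]
sample_average = sum(positive) / len(positive)
```

Once the calculation looks right on the sample, the same column operations can be applied to the full data set.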

  10. Map, Filter

  11. Reduce

  12. 100,000 rows • Data still fits in Excel, but Excel will start to slow down • Best practices: • Start with a sample • Operate on columns (Map, Filter) • Aggregate Results (Reduce) • Alternative: Pivot Tables

  13. Pivot Table
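A pivot table is essentially a grouped Reduce: rows are bucketed by one or more labels, and each bucket is aggregated. A rough Python sketch of that idea follows; the `region` and `plan` labels, the sample values, and the helper name `pivot_average` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (region, plan, ltv).
rows = [
    ("East", "Basic", 100.0),
    ("East", "Pro", 300.0),
    ("West", "Basic", 50.0),
    ("East", "Basic", 200.0),
    ("West", "Pro", 400.0),
]

def pivot_average(rows):
    """Average the value for each (row_label, column_label) pair."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row_label, column_label, value in rows:
        totals[(row_label, column_label)] += value
        counts[(row_label, column_label)] += 1
    return {key: totals[key] / counts[key] for key in totals}

pivot = pivot_average(rows)
```

Each key of `pivot` corresponds to one cell of the pivot table, for example `("East", "Basic")`.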

  14. 10,000,000 rows • Data no longer fits in Excel (but it’s still not too big) • For viewing: store data outside of Excel • SQL database + Power Query • Filter data, run calculations inside spreadsheet • For modeling: Run calculations outside of Excel • Python

  15. 10,000,000 rows • Data no longer fits in Excel (but it’s still not too big) • For viewing: store data outside of Excel • SQL database + Power Query • Filter data, run calculations inside spreadsheet • For modeling: Run calculations outside of Excel • Python

  16. Power Query
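The pattern behind "SQL database + Power Query" is to filter where the data lives and pull only the matching rows into the spreadsheet. The sketch below illustrates that pattern in Python rather than Power Query itself: an in-memory SQLite database stands in for the external database, and the `customers` table, its columns, and the 10,000 generated rows are assumptions made for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for the external SQL database.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT, ltv REAL)"
)
connection.executemany(
    "INSERT INTO customers (id, region, ltv) VALUES (?, ?, ?)",
    [(i, "East" if i % 2 == 0 else "West", float(i % 100)) for i in range(10_000)],
)

# Filter inside the database so only the rows of interest are transferred.
query = "SELECT id, ltv FROM customers WHERE region = ? AND ltv > ? ORDER BY id"
filtered = connection.execute(query, ("East", 50)).fetchall()
connection.close()

# Run the calculation on the much smaller filtered set,
# as the spreadsheet would after loading the query result.
average_ltv = sum(ltv for _, ltv in filtered) / len(filtered)
```

The query parameters are passed separately from the SQL text, which avoids building the statement by string concatenation.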

  17. 10,000,000 rows • Data no longer fits in Excel (but it’s still not too big) • For viewing: store data outside of Excel • SQL database + Power Query • Filter data, run calculations inside spreadsheet • For modeling: Run calculations outside of Excel • Python

  18. Python
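One way to run the calculation outside Excel is to stream the file row by row so that only a running total is held in memory. This is a minimal sketch under assumptions: the CSV layout with an `ltv` column and the function name `average_positive_ltv` are hypothetical, and a small in-memory string stands in for a large file.

```python
import csv
import io

def average_positive_ltv(lines):
    """Stream CSV rows once, keeping only a running sum and count in memory."""
    total = 0.0
    count = 0
    for row in csv.DictReader(lines):
        ltv = float(row["ltv"])  # Map: parse the column value
        if ltv > 0:              # Filter: keep positive values only
            total += ltv         # Reduce: fold into a running total
            count += 1
    return total / count if count else 0.0

# A small in-memory CSV stands in for a large file on disk;
# with a real file, pass open(path, newline="") instead.
data = io.StringIO("customer_id,ltv\n1,120.0\n2,-15.0\n3,80.0\n4,0.0\n5,45.0\n")
result = average_positive_ltv(data)
```

Because the function never builds a list of rows, its memory use stays roughly constant regardless of file length.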

  19. 1,000,000,000 rows • Power Pivot • Can work with as much data as will fit in RAM (~1B rows) • Power BI • Visualize huge data sets

  20. 1,000,000,000 rows • Power Pivot • Can work with as much data as will fit in RAM (~1B rows) • Power BI • Visualize huge data sets

  21. Power Pivot

  22. 1,000,000,000 rows • Power Pivot • Can work with as much data as will fit in RAM (~1B rows) • Power BI • Visualize huge data sets

  23. Power BI

  24. Power BI

  25. 100,000,000,000 rows • For viewing: store data in Hadoop • Azure + Power Query • For modeling: Rook • Explore and analyze Hadoop data directly from Excel

  26. 100,000,000,000 rows • For viewing: store data in Hadoop • Azure + Power Query • For modeling: Rook • Explore and analyze Hadoop data directly from Excel
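At this scale the column is split across many machines, so the Reduce step has to work on partial results. The sketch below shows the general idea in plain Python; it is not Hadoop or Rook code, and the partitions, values, and function names are hypothetical. Each partition returns a `(sum, count)` pair, because averaging the per-partition averages would weight small partitions too heavily.

```python
from functools import reduce

def map_partition(partition):
    """Per-partition step: keep positive values and return (sum, count)."""
    kept = [value for value in partition if value > 0]
    return (sum(kept), len(kept))

def combine(left, right):
    """Merge two partial (sum, count) results into one."""
    return (left[0] + right[0], left[1] + right[1])

# Hypothetical partitions, as if the column were split across machines.
partitions = [[120.0, -15.0, 80.0], [0.0, 45.0], [10.0, 20.0, 30.0, 40.0]]

# Map and Filter run independently on each partition.
partials = [map_partition(partition) for partition in partitions]

# Reduce merges the partial results in any order.
total, count = reduce(combine, partials, (0.0, 0))
overall_average = total / count
```

Since `combine` is associative, the partial results can be merged in any grouping, which is what lets the work be spread over many machines.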

  27. Azure + Power Query

  28. Azure + Power Query

  29. Azure + Power Query

  30. 100,000,000,000 rows • For viewing: store data in Hadoop • Azure + Power Query • For modeling: Rook • Explore and analyze Hadoop data directly from Excel

  31. Rook

  32. Rook

  33. Rook

  34. Rook • Explore your company’s entire datastore in Excel • We’ll support Hadoop, SQL, MongoDB, etc. • Good from one million rows to one trillion+ rows • Manipulate the entire data set with Excel formulas • Get results directly in your spreadsheet

  35. We’re looking for beta users. • Interested? Contact us at info@datanitro.com. • Thanks for listening!
