This Edureka Python Pandas tutorial (Python Tutorial Blog: https://goo.gl/wd28Zr) will help you learn the basics of Pandas. It also includes a use case in which we analyse data on the percentage of unemployed youth in every country between 2010 and 2014. The tutorial covers the following topics:
1. What is Data Analysis?
2. What is Pandas?
3. Pandas Operations
4. Use-case
Data Analysis With Python EDUREKA PYTHON CERTIFICATION TRAINING www.edureka.co/python
Agenda
- Python Applications
- Data Life-cycle
- Python For Data Analysis
- What is Pandas? (NumPy, SciPy)
- Pandas Operations
- Python for Statistics
- Python for Hadoop
Python Applications
- Web Development
- Web Scraping
- Testing
- Data Analysis
Data Life-Cycle
[Figure: data life-cycle diagram showing the stages Data Warehousing, Data Analysis, and Data Visualization]
What is Data Analysis?
Example: starting from data on unemployed youth across the globe from 2010 to 2014, analysis yields the percentage increase in unemployed youth in Afghanistan between 2010 and 2011.
What is Pandas?
Data Analysis Using Python
Pandas is a software library written for the Python programming language for data manipulation and analysis. Pandas is well suited for many different kinds of data:
- Tabular data with heterogeneously typed columns
- Ordered and unordered time series data
- Arbitrary matrix data with row and column labels
- Any other form of observational or statistical data set
The data need not be labeled at all to be placed into a pandas data structure.
Libraries also named on the slide: NumPy, SciPy, and Matplotlib
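The point about labels can be illustrated with a small sketch. It assumes pandas is installed and borrows the interest-rate and GDP figures from the slicing slide; the variable names are only illustrative.

```python
import pandas as pd

# Raw values with no labels at all (figures taken from the slicing slide).
values = [[2, 50], [3, 55], [2, 65], [2, 55]]

# Without labels, pandas assigns default integer row and column labels.
unlabeled = pd.DataFrame(values)

# The same data with explicit row labels (years) and column headers.
labeled = pd.DataFrame(
    values,
    index=[2001, 2002, 2003, 2004],
    columns=["Int rate", "US GDP Thousands"],
)

print(unlabeled)
print(labeled)
```

Both objects hold the same numbers; only the labels used to address them differ.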
Pandas Operations
- Slicing the DataFrame
- Joining and Merging
- Changing the Index
- Concatenation
- Changing the column headers
- Data conversion
Slicing

Original DataFrame:
Index  Int rate  US GDP Thousands
2001   2         50
2002   3         55
2003   2         65
2004   2         55

Slicing the starting 2 rows:
Index  Int rate  US GDP Thousands
2001   2         50
2002   3         55

Slicing the last 2 rows:
Index  Int rate  US GDP Thousands
2003   2         65
2004   2         55
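A minimal sketch of the slicing shown on this slide, assuming pandas is installed. The slide does not say which calls were used; `head` and `tail` are one common option, and the variable names are illustrative.

```python
import pandas as pd

# Rebuild the slide's table: years as the index, two data columns.
df = pd.DataFrame(
    {"Int rate": [2, 3, 2, 2], "US GDP Thousands": [50, 55, 65, 55]},
    index=[2001, 2002, 2003, 2004],
)

first_two = df.head(2)  # the starting 2 rows (2001, 2002)
last_two = df.tail(2)   # the last 2 rows (2003, 2004)

print(first_two)
print(last_two)
```

Positional slicing with `df.iloc[:2]` and `df.iloc[-2:]` selects the same rows.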
Merging

DataFrame 1:
Index  HPI  Int rate  US GDP Thousands
2001   80   2         50
2002   85   3         55
2003   88   2         65
2004   85   2         55

DataFrame 2:
Index  HPI  Int rate  US GDP Thousands
2005   80   2         50
2006   85   3         55
2007   88   2         65
2008   85   2         55

Merged result:
Index  HPI  Int rate  US GDP Thousands x  US GDP Thousands y
0      80   2         50                  50
1      85   3         55                  55
2      88   2         65                  65
3      85   2         55                  55
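The merged table on this slide is consistent with merging on the two shared columns HPI and Int rate. The sketch below makes that assumption (the slide does not state the merge keys) and requires pandas; note that pandas names the overlapping column with `_x` and `_y` suffixes.

```python
import pandas as pd

cols = {
    "HPI": [80, 85, 88, 85],
    "Int rate": [2, 3, 2, 2],
    "US GDP Thousands": [50, 55, 65, 55],
}
df1 = pd.DataFrame(cols, index=[2001, 2002, 2003, 2004])
df2 = pd.DataFrame(cols, index=[2005, 2006, 2007, 2008])

# Assumed keys: HPI and Int rate. The remaining shared column is kept
# twice, with the default suffixes _x (left) and _y (right).
merged = pd.merge(df1, df2, on=["HPI", "Int rate"])

print(merged)
```

Because `merge` matches on column values rather than on the index, the year labels are dropped and the result gets a fresh 0 to 3 index, as on the slide.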
Joining

DataFrame 1:
Index  Int rate  US GDP Thousands
2001   2         50
2002   3         55
2003   2         65
2004   2         55

DataFrame 2:
Index  Low tier HPI  Unemployment
2001   50            7
2003   52            8
2004   50            9
2005   43            6

Joined result:
Index  Int rate  US GDP Thousands  Low tier HPI  Unemployment
2001   2.0       50.0              50.0          7.0
2002   3.0       55.0              NaN           NaN
2003   2.0       65.0              52.0          8.0
2004   2.0       55.0              50.0          9.0
2005   NaN       NaN               43.0          6.0
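The joined table keeps every year from both inputs and fills the gaps with NaN, which matches an outer join on the index. A minimal sketch under that assumption, requiring pandas (the slide does not show the call itself):

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Int rate": [2, 3, 2, 2], "US GDP Thousands": [50, 55, 65, 55]},
    index=[2001, 2002, 2003, 2004],
)
df2 = pd.DataFrame(
    {"Low tier HPI": [50, 52, 50, 43], "Unemployment": [7, 8, 9, 6]},
    index=[2001, 2003, 2004, 2005],
)

# join aligns rows on the index; how="outer" keeps years found in either frame.
joined = df1.join(df2, how="outer")

print(joined)
```

Years missing from one input (2002 in the second table, 2005 in the first) show up as NaN, which is also why the integer columns become floats.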
Changing the Index and Column Headers

Changing the Index

Before:
Index  Int rate  US GDP Thousands
2001   2         50
2002   3         55
2003   2         65
2004   2         55

After (the Int rate values become the index):
Index  US GDP Thousands
2      50
3      55
2      65
2      55

Changing the column headers

Before:
Index  US GDP Thousands
2001   50
2002   55
2003   65
2004   55

After:
Index  GDP
2001   50
2002   55
2003   65
2004   55
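Both changes on this slide can be sketched with `set_index` and `rename`, assuming pandas is installed. These are common choices rather than calls confirmed by the slide, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Int rate": [2, 3, 2, 2], "US GDP Thousands": [50, 55, 65, 55]},
    index=[2001, 2002, 2003, 2004],
)

# Changing the index: the "Int rate" column replaces the year labels.
reindexed = df.set_index("Int rate")

# Changing the column headers: rename "US GDP Thousands" to "GDP".
gdp_only = df[["US GDP Thousands"]]
renamed = gdp_only.rename(columns={"US GDP Thousands": "GDP"})

print(reindexed)
print(renamed)
```

Both calls return new DataFrames and leave `df` unchanged.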
Concatenation
[Figure: two student records, each with the fields Student Name, Age, Sex, Phone number, and E-mail, are concatenated into a single "Student Data" table]
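Concatenation of records like these can be sketched with `pd.concat`, assuming pandas is installed. The slide gives only field names, so the student values below are hypothetical placeholders and only three of the five fields are used.

```python
import pandas as pd

# Hypothetical placeholder records; the slide provides field names only.
student_a = pd.DataFrame({"Student Name": ["Student A"], "Age": [20], "Sex": ["F"]})
student_b = pd.DataFrame({"Student Name": ["Student B"], "Age": [21], "Sex": ["M"]})

# Stack the two records row-wise into one "Student Data" table.
# ignore_index=True renumbers the rows 0, 1 instead of repeating 0, 0.
student_data = pd.concat([student_a, student_b], ignore_index=True)

print(student_data)
```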
Data Munging
Use-Case
Example: Youth Unemployment Data

Problem Statement: Find the change in the percentage of unemployed youth for every country from 2010 to 2011.
Sample result: there is an approximately 3.1% increase in unemployed youth in ‘Arab World’.

Dataset columns:
Column 1 – Country Name
Column 2 – Country Code
Column 3 – 2010
Column 4 – 2011
Column 5 – 2012
Column 6 – 2013
Column 7 – 2014
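A minimal sketch of the problem statement, assuming pandas is installed. The dataset itself is not reproduced in these slides, so the rows below are hypothetical placeholders that follow the column layout above, with the 2012 to 2014 columns omitted. The sketch also assumes that "change" means the relative change from the 2010 value; the slides do not say whether a relative change or a percentage-point difference is intended.

```python
import pandas as pd

# Hypothetical placeholder rows, not real unemployment figures.
df = pd.DataFrame(
    {
        "Country Name": ["Country A", "Country B"],
        "Country Code": ["AAA", "BBB"],
        "2010": [10.0, 40.0],
        "2011": [12.5, 30.0],
    }
)

# Assumed definition: relative change from 2010 to 2011, in percent.
df["change_pct"] = (df["2011"] - df["2010"]) / df["2010"] * 100

print(df[["Country Name", "change_pct"]])
```

For a percentage-point difference instead, `df["2011"] - df["2010"]` alone would be the quantity of interest.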
Python For Statistics

Mean and Mode:
    from statistics import mean
    from statistics import mode

    print(mean([1,1,1,1,3,4,4,4,5,2]))  # 2.6
    print(mode([1,1,1,1,3,4,4,4,5,2]))  # 1

Median:
    from statistics import median

    print(median([1,1,1,1,3,4,4,4,5,2]))  # 2.5

The slide also names High Median, Low Median, and Variance, without showing code for them.
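The slide labels High Median, Low Median, and Variance but shows no code for them. A short sketch using the matching functions from Python's standard `statistics` module, on the same list as the slide:

```python
from statistics import median_high, median_low, variance

data = [1, 1, 1, 1, 3, 4, 4, 4, 5, 2]

# With an even number of values, the two middle values after sorting are 2 and 3.
low = median_low(data)    # the smaller of the two middle values
high = median_high(data)  # the larger of the two middle values

# Sample variance (divides by n - 1).
var = variance(data)

print(low, high, var)
```

The plain `median` averages the two middle values, while `median_low` and `median_high` always return a value that occurs in the data.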
Python For Hadoop: Pydoop
Pydoop is a Python interface to Hadoop that allows you to write MapReduce applications and interact with HDFS in pure Python.
Summary
- What Is Pandas
- What Is Data Analysis
- Python Applications
- Python For Statistics And Python For Hadoop
- Pandas Operations
- Data Analysis Use-Case