Python For Data Analysis | Python Pandas Tutorial | Learn Python | Python Training | Edureka

Data Analysis With Python EDUREKA PYTHON CERTIFICATION TRAINING www.edureka.co/python

Agenda

- Python Applications
- Data Life-cycle
- Python For Data Analysis
- What is Pandas? (NumPy, SciPy)
- Pandas Operations
- Python for Statistics
- Python for Hadoop

Python Applications

- Web Development
- Web Scraping
- Testing
- Data Analysis

Data Life-Cycle

[Figure: stages of the data life-cycle, including Data Warehousing, Data Analysis and Data Visualization]

What is Data Analysis?

[Figure: data of unemployed youth across the globe from 2010-2014]
[Figure: percentage increase in unemployed youth in Afghanistan between 2010-2011]

What is Pandas?

Data Analysis Using Python

Pandas is a software library written for the Python programming language for data manipulation and analysis. Pandas is well suited for many different kinds of data:

- Tabular data with heterogeneously-typed columns.
- Ordered and unordered time series data.
- Arbitrary matrix data with row and column labels.
- Any other form of observational / statistical data sets. The data need not be labeled at all to be placed into a pandas data structure.

Pandas is built on NumPy and is commonly used together with SciPy and Matplotlib.
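As a small illustration of the first kind of data listed above, the sketch below builds a table whose columns have different types. The column names and values are made up for the example and are not from the slides.

```python
import pandas as pd

# Hypothetical table with heterogeneously typed columns (text, int, float).
df = pd.DataFrame(
    {
        "country": ["A", "B", "C"],
        "year": [2010, 2011, 2012],
        "rate": [10.5, 11.0, 9.8],
    }
)

# Each column keeps its own dtype.
print(df.dtypes)

# Row labels are optional: with no explicit index, pandas numbers the rows 0..n-1.
print(list(df.index))
```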

Pandas Operations

- Slicing the DataFrame
- Joining and Merging
- Changing the Index
- Concatenation
- Changing the column headers
- Data conversion

Slicing

Original DataFrame:

Index  Int rate  US GDP Thousands
2001   2         50
2002   3         55
2003   2         65
2004   2         55

Slicing the starting 2 rows:

Index  Int rate  US GDP Thousands
2001   2         50
2002   3         55

Slicing the last 2 rows:

Index  Int rate  US GDP Thousands
2003   2         65
2004   2         55
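One way to reproduce the two slices from the slide is sketched below. The slides do not show the code, so the use of head() and tail() here is an assumption; positional slicing with iloc gives the same rows.

```python
import pandas as pd

# Table from the slide: years as the index, two data columns.
df = pd.DataFrame(
    {"Int rate": [2, 3, 2, 2], "US GDP Thousands": [50, 55, 65, 55]},
    index=[2001, 2002, 2003, 2004],
)

first_two = df.head(2)  # rows 2001 and 2002
last_two = df.tail(2)   # rows 2003 and 2004

# Equivalent positional slices: df.iloc[:2] and df.iloc[-2:]
print(first_two)
print(last_two)
```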

Merging

First DataFrame:

Index  HPI  Int rate  US GDP Thousands
2001   80   2         50
2002   85   3         55
2003   88   2         65
2004   85   2         55

Second DataFrame:

Index  HPI  Int rate  US GDP Thousands
2005   80   2         50
2006   85   3         55
2007   88   2         65
2008   85   2         55

Merged result:

Index  HPI  Int rate  US GDP Thousands x  US GDP Thousands y
0      80   2         50                  50
1      85   3         55                  55
2      88   2         65                  65
3      85   2         55                  55
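The merged table on the slide keeps HPI and Int rate once and splits US GDP Thousands into an x and a y column, which is what pd.merge produces when both HPI and Int rate are used as keys. The slides do not show the call, so the choice of keys below is an assumption inferred from that output.

```python
import pandas as pd

columns = {
    "HPI": [80, 85, 88, 85],
    "Int rate": [2, 3, 2, 2],
    "US GDP Thousands": [50, 55, 65, 55],
}
df1 = pd.DataFrame(columns, index=[2001, 2002, 2003, 2004])
df2 = pd.DataFrame(columns, index=[2005, 2006, 2007, 2008])

# Merge on the shared key columns. The original indexes are discarded and the
# non-key column that exists in both frames gets the default _x / _y suffixes.
merged = pd.merge(df1, df2, on=["HPI", "Int rate"])
print(merged)
```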

Joining

First DataFrame:

Index  Int rate  US GDP Thousands
2001   2         50
2002   3         55
2003   2         65
2004   2         55

Second DataFrame:

Index  Low tier HPI  Unemployment
2001   50            7
2003   52            8
2004   50            9
2005   43            6

Joined result (rows missing from one DataFrame are filled with NaN):

Index  Int rate  US GDP Thousands  Low tier HPI  Unemployment
2001   2.0       50.0              50.0          7.0
2002   3.0       55.0              NaN           NaN
2003   2.0       65.0              52.0          8.0
2004   2.0       55.0              50.0          9.0
2005   NaN       NaN               43.0          6.0
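A join aligns rows on the index rather than on column values. The result on the slide covers every year from either table, which corresponds to an outer join. The sketch below assumes DataFrame.join with how="outer", since the call itself is not shown, and uses the input values as listed.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Int rate": [2, 3, 2, 2], "US GDP Thousands": [50, 55, 65, 55]},
    index=[2001, 2002, 2003, 2004],
)
df2 = pd.DataFrame(
    {"Low tier HPI": [50, 52, 50, 43], "Unemployment": [7, 8, 9, 6]},
    index=[2001, 2003, 2004, 2005],
)

# Outer join on the index: keep every year found in either frame and
# fill the cells that have no matching row with NaN.
joined = df1.join(df2, how="outer")
print(joined)
```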

Changing the Index and Column Headers

Original DataFrame:

Index  Int rate  US GDP Thousands
2001   2         50
2002   3         55
2003   2         65
2004   2         55

Changing the Index (the Int rate values become the index):

Index  US GDP Thousands
2      50
3      55
2      65
2      55

Changing the column headers (US GDP Thousands is renamed to GDP):

Before:

Index  US GDP Thousands
2001   50
2002   55
2003   65
2004   55

After:

Index  GDP
2001   50
2002   55
2003   65
2004   55
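Both changes shown on the slide can be sketched with set_index and rename. These are standard pandas methods, but the slides do not show which calls were used, so treat this as one possible way to get the same tables.

```python
import pandas as pd

df = pd.DataFrame(
    {"Int rate": [2, 3, 2, 2], "US GDP Thousands": [50, 55, 65, 55]},
    index=[2001, 2002, 2003, 2004],
)

# Changing the index: the Int rate column replaces the year labels.
reindexed = df.set_index("Int rate")

# Changing the column headers: keep the GDP column and give it a shorter name.
renamed = df[["US GDP Thousands"]].rename(columns={"US GDP Thousands": "GDP"})

print(reindexed)
print(renamed)
```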

Concatenation

[Figure: two student records, each with the fields Name, Age, Sex, Phone number and E-mail, are concatenated into a single Student Data set]
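The idea can be sketched with pd.concat, which stacks DataFrames that share the same columns. Only the field names come from the slide; the two records below are hypothetical placeholders.

```python
import pandas as pd

fields = ["Name", "Age", "Sex", "Phone number", "E-mail"]

# Hypothetical placeholder records, one row each.
student_a = pd.DataFrame(
    [["Student A", 20, "F", "000-0001", "a@example.com"]], columns=fields
)
student_b = pd.DataFrame(
    [["Student B", 21, "M", "000-0002", "b@example.com"]], columns=fields
)

# Stack the rows; ignore_index=True renumbers the combined rows 0..n-1.
student_data = pd.concat([student_a, student_b], ignore_index=True)
print(student_data)
```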

Data Munging

Use-Case

Example: Youth Unemployment Data

Problem Statement: Find the change in percentage of unemployed youth for every country from 2010-2011.

Result: There is approx. 3.1% increase in unemployed youth in 'Arab World'.

Columns in the data set:

Column 1: Country Name
Column 2: Country Code
Column 3: 2010
Column 4: 2011
Column 5: 2012
Column 6: 2013
Column 7: 2014
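A minimal sketch of the calculation, assuming "change in percentage" means the difference between the 2011 and 2010 values for each country. The slides do not show the data file or the code, so the rows below are hypothetical and only the column layout follows the slide.

```python
import pandas as pd

# Hypothetical rows; the real figures from the data set are not reproduced here.
df = pd.DataFrame(
    {
        "Country Name": ["Country A", "Country B"],
        "Country Code": ["AAA", "BBB"],
        "2010": [20.0, 10.0],
        "2011": [25.0, 9.0],
    }
).set_index("Country Code")

# Change between the two years, in percentage points, for every country.
df["change"] = df["2011"] - df["2010"]
print(df[["Country Name", "change"]])
```

If a relative change is wanted instead, the difference can be divided by the 2010 column and multiplied by 100.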

Python For Statistics

Measures covered: Mean, Mode, Median (including Low Median and High Median) and Variance.

Mean and Mode

    from statistics import mean
    from statistics import mode

    print(mean([1,1,1,1,3,4,4,4,5,2]))  # 2.6
    print(mode([1,1,1,1,3,4,4,4,5,2]))  # 1

Median

    from statistics import median

    print(median([1,1,1,1,3,4,4,4,5,2]))  # 2.5
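The slide also names Low Median, High Median and Variance without showing code for them. A short sketch using the same standard-library statistics module and the same list follows; picking these particular functions for those labels is an assumption.

```python
from statistics import median_high, median_low, variance

data = [1, 1, 1, 1, 3, 4, 4, 4, 5, 2]

# With an even number of values there are two middle values (2 and 3 here).
low = median_low(data)    # the smaller middle value
high = median_high(data)  # the larger middle value

# Sample variance (divides by n - 1); statistics.pvariance gives the population variance.
var = variance(data)

print(low, high, var)
```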

Python For Hadoop: Pydoop

Pydoop is a Python interface to Hadoop that allows you to write MapReduce applications and interact with HDFS in pure Python.

Recap of topics covered:

- Python Applications
- What Is Data Analysis
- What Is Pandas
- Pandas Operations
- Data Analysis Use-Case
- Python For Statistics And Python For Hadoop
