Handling large datasets is a major problem in statistical analysis because of the risk of inferring invalid results. Recently, however, with the advent of new computational strategies, researchers have been able to handle big data more easily, particularly with Hadoop and MapReduce techniques. There is a lot of scope for data scientists to handle big data through machine learning and deep learning techniques. Statswork offers statistical services as per the requirements of the customers. When you order statistical services at Statswork, we promise you the following: always on time, outstanding customer support, and high-quality subject matter experts.

Contact Us:
Website: www.statswork.com
Email: info@statswork.com
United Kingdom: 44-1143520021
India: 91-4448137070
WhatsApp: 91-8754446690
Research paper: Analysis and Prediction of Income and Economic Hierarchy on Census Data using Data Analytics
TAGS - Machine Learning, Data Analytics, Statistical Analysis, Statistical Data Analysis, Data Interpretations, Cluster Analysis
SERVICES - Research Planning | Data Collection | Semantic Annotation | Business Analytics | BioStatistics | Econometrics
Copyright © 2019 Statswork. All rights reserved
INTRODUCTION
The greatest difficulty in the machine learning field is the availability of clean, high-quality datasets. Demographic data plays a major role in the economic growth of the nation. It helps in determining the income growth of the people, how many live in urban and rural areas, and how educated the people of the nation are.
Example: DATA ANALYTICS IN PREDICTING THE INCOME AND ECONOMIC HIERARCHY ON CENSUS DATA
- The example considers the data analytics method used to predict income and economic hierarchy on census data obtained from Kaggle (Sharath et al., 2016).
- The dataset covers 3.5 million U.S. households and includes their education, work, the transportation they use, internet usage, etc.
- Before the data are analysed, they must be normalized; this is the main prerequisite for statistical analysis.
- Hadoop is used as the first stage for the large dataset, and Pig MapReduce is adopted for normalizing the dataset.
- Later, the statistical analysis is performed and the results are interpreted.
Aim of the study
- Gender distribution against occupation
- Relationship between education and salary
- Economic hierarchy and prediction of classes
- Plotting theoretical versus actual values for Benford’s Law
- Mean and median of income using a heatmap
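The Benford’s Law objective can be illustrated with a short sketch. Benford’s Law gives the theoretical probability of a leading digit d as log10(1 + 1/d); the observed side is simply the share of values starting with each digit. The function names and the sample values below are illustrative assumptions, not taken from the study.

```python
import math
from collections import Counter


def benford_expected():
    """Theoretical first-digit probabilities P(d) = log10(1 + 1/d), d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a non-zero number."""
    # Scientific notation puts the first significant digit first, e.g. '3.5e+04'.
    # 16 decimals avoid a value such as 9.9999999 being rounded up to '1.0e+01'.
    return int(f"{abs(x):.16e}"[0])


def observed_first_digit_freq(values):
    """Share of non-zero values whose leading digit is d, for d = 1..9."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Made-up values, for illustration only (not census data).
sample = [1, 10, 19, 2, 300]
expected = benford_expected()
observed = observed_first_digit_freq(sample)
for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```

Plotting `expected` against `observed` for an income column would give the theoretical-versus-actual comparison named in the aims.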
Fig. 1: Step-by-step procedure. Box labels: Load huge dataset; Hadoop; Pig for MapReduce; Normalized data; Data mining / statistical analysis; Graphical representation and interpretation; Final processed data.
Fig. 2: The importance of normalization before proceeding with the statistical data analysis. Panels: (a) Before normalization; (b) After normalization. Occupation categories: Personal Care, Building Cleaning, Farming/Fishing, Food. Axis range: 0 to 70. Bar labels: 75%, 70%, 68%, 56%.
RESULTS & DISCUSSIONS
- The percentage of men in the farming and fishing industry increases by 3.8% after normalization, and the percentage of women in that field also increases.
- This clearly demonstrates the need for, and the importance of, normalizing the census data.
- Normalization is used to reduce the execution time and to improve the efficiency of the results. Two levels of normalization are applied for this purpose. Before that, the few blank entries in the dataset must be handled to avoid invalid results.
  - First level: the actual data are used without any modifications.
  - Second level: the actual data are taken as input and then modified with suitable mathematical methods.
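The slides do not say which mathematical method the second level of normalization uses or how the blank entries are handled. As a minimal sketch, assuming that blank entries are dropped and that the modification is min-max scaling (both are assumptions, as are the field name and the sample values), the two steps might look like this:

```python
from typing import Optional


def drop_blank_rows(rows: list[dict[str, Optional[float]]], column: str) -> list[dict[str, float]]:
    """Keep only rows whose value in `column` is present (not None)."""
    return [row for row in rows if row.get(column) is not None]


def min_max_normalize(values: list[float]) -> list[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical records; the field name "income" is illustrative only.
records = [
    {"income": 20000.0},
    {"income": None},  # blank entry
    {"income": 60000.0},
    {"income": 100000.0},
]

clean = drop_blank_rows(records, "income")
scaled = min_max_normalize([row["income"] for row in clean])
print(scaled)  # [0.0, 0.5, 1.0]
```

In the study itself this step runs in Pig on Hadoop; the Python version only shows the arithmetic on a toy list.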
Data Interpretations
Fig. 3: The first objective of the study, i.e., the percentage of gender distribution against occupation.
- The percentage of men in the sales field is higher than in the other fields.
- The transportation field has the next-highest percentage of men after sales.
- In addition, the percentage of women is distributed almost equally across all the occupational fields.
- Further, to achieve the second objective, a boxplot is used to identify the relationship between education and salary.
- This helps in understanding the income growth under different levels of education.
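The gender percentages discussed above can be computed with a simple count. The sketch below assumes the percentage is taken within each occupation (the slides do not state the denominator), and the sample pairs are made up for illustration.

```python
from collections import Counter, defaultdict


def gender_share_by_occupation(people):
    """Return {occupation: {gender: percentage}} from (occupation, gender) pairs."""
    counts = defaultdict(Counter)
    for occupation, gender in people:
        counts[occupation][gender] += 1
    shares = {}
    for occupation, by_gender in counts.items():
        total = sum(by_gender.values())
        shares[occupation] = {g: 100.0 * n / total for g, n in by_gender.items()}
    return shares


# Made-up (occupation, gender) pairs, not census records.
sample = [
    ("Sales", "M"), ("Sales", "M"), ("Sales", "M"), ("Sales", "F"),
    ("Transportation", "M"), ("Transportation", "F"),
]
shares = gender_share_by_occupation(sample)
print(shares["Sales"])  # {'M': 75.0, 'F': 25.0}
```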
- Usually, a more educated person earns a higher salary. In this graph, however, professional degree holders earn more than doctorate degree holders, which is quite unusual.
- Likewise, one can compare the median and quartiles of each field in the boxplot to better understand the relationship between level of education and annual salary.
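The median and quartiles that a boxplot draws can also be computed directly. This is a minimal sketch using Python's `statistics.quantiles`; the group labels and salary figures are placeholders, not values from the study.

```python
import statistics
from collections import defaultdict


def salary_quartiles_by_education(records):
    """Return {education: (q1, median, q3)} from (education, salary) pairs."""
    groups = defaultdict(list)
    for education, salary in records:
        groups[education].append(salary)
    summary = {}
    for education, salaries in groups.items():
        # n=4 yields the three quartile cut points; "inclusive" treats the
        # data as the whole population, so min and max are attainable values.
        q1, median, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
        summary[education] = (q1, median, q3)
    return summary


# Placeholder groups and salaries (in thousands), for illustration only.
records = [("level_a", s) for s in (30, 40, 50, 60, 70)]
records += [("level_b", s) for s in (50, 60, 70, 80, 90)]

summary = salary_quartiles_by_education(records)
print(summary["level_a"])  # (40.0, 50.0, 60.0)
print(summary["level_b"])  # (60.0, 70.0, 80.0)
```

Comparing the tuples across education levels gives the same reading as comparing the boxes in the plot.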
K-MEANS CLUSTERING
- Cluster analysis methods are useful tools for analysing large, high-dimensional datasets.
- Among them, k-means clustering is the most versatile technique for getting valid results.
- Accordingly, to obtain the economic hierarchy, k-means clustering is applied to the income variable: the distance between each data value and a set of clusters is measured using the centroid clustering method, and the clusters are then plotted against the classes.
- Although the clustering technique is widely used in the literature, the problem of choosing the number of clusters still persists.
- Furthermore, Benford’s law is discussed for plotting the actual versus the theoretical values.
- Finally, the mean and median of the income across the states are depicted using heatmaps. In addition, the reduction in the running time of the analysis obtained by including the levels of normalization is also discussed and tabulated.
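As a rough illustration of the centroid-based step described above, the following sketch runs Lloyd's k-means algorithm on a single income variable. The number of clusters (three), the starting centroids and the income values are assumptions chosen for the example; the slides do not report the values used in the study.

```python
def kmeans_1d(values, centroids, iterations=100):
    """Lloyd's algorithm on scalar data; returns (centroids, labels)."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centroids.append(sum(members) / len(members) if members else old)
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


# Made-up incomes (in thousands) and hand-picked starting centroids.
incomes = [10, 12, 14, 50, 52, 54, 100, 104]
centroids, labels = kmeans_1d(incomes, centroids=[10, 50, 100])
print(centroids)  # [12.0, 52.0, 102.0]
print(labels)     # [0, 0, 0, 1, 1, 1, 2, 2]
```

The resulting labels can be read as low, middle and high income classes. Choosing the number of clusters remains a separate question, as noted above.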
Summary
- Handling large datasets is a major problem in statistical analysis because of the risk of inferring invalid results.
- Recently, with the advent of new computational strategies, researchers have been able to handle big data more easily, particularly with Hadoop and MapReduce techniques.
- There is a lot of scope for data scientists to handle big data through machine learning and deep learning techniques.
- The development of the nation is analysed using the census data collected during certain periods.
- This helps in understanding the growth of the population, the wealth of the nation, and the improvements needed for the welfare of the nation and its people.
Work With Us
Freelancer Consultant
Guest Blog Editor (hr@workfoster.com)

Contact Us
UK: +44-1143520021
INDIA: +91-4448137070
info@statswork.com