Name: Gul rukh khan

Presentation Transcript


  1. Name: Gul Rukh Khan • Data Mining • Data Analysis with SPSS • Checking whether data are fit for research or not (i.e. whether they are hooked data) • Regression Analysis on Data. Hooked data: manipulated data

  2. Data mining: huge amounts of data but too little information. There is a need to extract useful information from the data and to interpret it, i.e. to discover business intelligence from mountains of accumulated data. • DATA EXPLOSION • Automated data-collection tools and mature database technology have led to tremendous amounts of data stored in databases, data warehouses and other information repositories • We are drowning in data, but starving for knowledge! Data mining definition: • Extracting, or "mining", knowledge (patterns) from large amounts of data • Extraction of implicit, previously unknown, unexpected and potentially very useful information from data. Issues in data mining: • Huge volume and complexity of data • Data ownership • Privacy and security

  3. SPSS: Statistical Package for the Social Sciences. • 2 sources of data: • Primary • Secondary • 3 types of data: • CROSS-SECTIONAL DATA (observations of many individuals): observations of many different individuals (subjects, objects) at a given time, each observation belonging to a different individual • TIME SERIES DATA (a sequence of data points): a time series is a sequence of data points, typically measured at successive points in time spaced at uniform time intervals (a time series database is one designed specifically for such data) • PANEL OR POOLED DATA (cross-section + time series data) • Variables: • Qualitative, categorical or dummy variables (e.g. gender) • Quantitative or numeric variables • Nominal (names, no ranking, in any order), e.g. banks (HBL, ABL, UBL, etc.); religion: Islam, Hindu, Christian, etc. • Ordinal (categories in order), e.g. often, very often, daily basis; agree, slightly agree, extremely agree

  4. Basic analysis in SPSS: • If you have a qualitative (categorical) variable, use FREQUENCIES • If you have quantitative (numeric) data, use DESCRIPTIVES: • Standard deviation • Variance • Minimum value • Maximum value, etc. • Frequencies: • Display the data • One-way data • Two-way data (cross tab)
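To make the two procedures concrete, here is a minimal Python sketch (not SPSS itself) of what a frequency table and a descriptives table contain. The data and the function names `frequencies` and `descriptives` are hypothetical; standard deviation and variance use the sample (n - 1) formulas, which is assumed to match what SPSS reports.

```python
import statistics
from collections import Counter

# Hypothetical sample: one categorical variable and one numeric variable.
gender = ["Male", "Female", "Female", "Male", "Female"]
salary = [30000, 42000, 38000, 51000, 45000]

def frequencies(values):
    """Frequency table for a categorical variable: (count, percent) per category."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, 100.0 * count / n) for category, count in counts.items()}

def descriptives(values):
    """Summary statistics for a numeric variable (sample SD and variance)."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": statistics.mean(values),
        "Std. Deviation": statistics.stdev(values),
        "Variance": statistics.variance(values),
    }

print(frequencies(gender))   # categorical variable -> frequencies
print(descriptives(salary))  # numeric variable -> descriptives
```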

  5. Tests required to check your data for randomness and normality • Two types of tests: • Parametric and non-parametric • Parametric tests: • Based on assumptions about the population • Concern unknown population values (parameters), e.g. the population mean, standard deviation, etc. • Non-parametric tests: • Make no assumption about the population distribution; suitable for qualitative (categorical or ordinal) data • Assumptions for parametric tests: • Data should be random • Data should follow a normal distribution

  6. Tests for randomness and normality • To check whether the data are random or not: RUNS TEST

  7. Interpretation: if Asymp. Sig. (2-tailed) = 0.913, then 0.913 > 0.05, so the data are random. In this case the data are fit for research. If Asymp. Sig. (2-tailed) = 0.013, then 0.013 < 0.05, so the data are not random. In this case the data are not fit for research.
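The runs test itself can be sketched in a few lines of Python. This is a minimal illustration, not SPSS's implementation: it assumes the sample median as the cut point (values at or above it form one group), uses the large-sample normal approximation without a continuity correction, and applies the 0.05 rule from slide 7. The function name `runs_test` and the sample data are hypothetical.

```python
import math
import statistics

def runs_test(data, alpha=0.05):
    """Runs test for randomness about the median (normal approximation).

    Returns (number_of_runs, z, two_tailed_p, verdict).
    """
    cut = statistics.median(data)
    # Label each value: True if at or above the cut point, False if below.
    above = [x >= cut for x in data]
    n1 = sum(above)           # values at or above the median
    n2 = len(above) - n1      # values below the median
    n = n1 + n2
    # A run is a maximal stretch of consecutive identical labels.
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    # Expected number of runs and its variance if the order is random.
    mean_runs = 2 * n1 * n2 / n + 1
    var_runs = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean_runs) / math.sqrt(var_runs)
    # Two-tailed p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    verdict = "random" if p_value > alpha else "not random"
    return runs, z, p_value, verdict

print(runs_test([3, 7, 1, 8, 6, 2, 4, 5]))  # irregular order: 6 runs, "random"
print(runs_test([1, 2, 3, 4, 5, 6, 7, 8]))  # steady trend: 2 runs, "not random"
```

Too few runs (a trend) and too many runs (strict alternation) both push the p-value down, which is why the test is two-tailed.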

  8. Test to check whether the data are normal or not: the Kolmogorov-Smirnov test (named after two Russian mathematicians, Kolmogorov and Smirnov)

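  9. Interpretation of data normality: read Asymp. Sig. (2-tailed) in the same way as for the runs test; if it is greater than 0.05, normality is not rejected. The SPSS footnote "Test distribution is Normal" states which distribution the data were compared with. In this case, the distribution is normal.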
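As a rough illustration of what the Kolmogorov-Smirnov test measures, the Python sketch below computes the largest gap D between the sample's empirical distribution and a normal curve with the sample's own mean and standard deviation, then an asymptotic p-value. Because the mean and SD are estimated from the same data, this p-value is conservative (too large); the Lilliefors correction addresses that and is not implemented here, so results may differ from SPSS output. The function `ks_normality` and the sample data are hypothetical.

```python
import math
import statistics

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def ks_normality(data, alpha=0.05):
    """One-sample K-S test against a normal fitted to the data. Returns (D, p, verdict)."""
    xs = sorted(data)
    n = len(xs)
    mu, sigma = statistics.mean(xs), statistics.stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # Gap between fitted CDF and empirical CDF just after ((i+1)/n) and before (i/n) x.
        d = max(d, (i + 1) / n - f, f - i / n)
    t = math.sqrt(n) * d
    if t < 0.27:
        p = 1.0  # tail probability is essentially 1 here; the series converges slowly near 0
    else:
        # Asymptotic Kolmogorov tail: P(K > t) = 2 * sum_k (-1)^(k-1) * exp(-2 k^2 t^2)
        p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * t * t) for k in range(1, 101))
        p = min(max(p, 0.0), 1.0)
    verdict = "normal" if p > alpha else "not normal"
    return d, p, verdict

print(ks_normality([2.1, 2.5, 2.8, 3.0, 3.1, 3.2, 3.4, 3.7, 4.0, 4.4]))
print(ks_normality([2 ** k for k in range(20)]))  # strongly right-skewed data
```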

  10. What is regression? Introduction: regression is based on prediction, i.e. how one variable can be predicted from (regressed on) another. It measures the relationship between variables. • Regression analysis is a very valuable tool for a manager • Regression can be used to: • Understand the relationship between variables • Predict the value of one variable based on another variable • Simple linear regression models have only two variables • Multiple regression models have more variables

  11. [Diagram: Dependent variable = Independent variable + Independent variable] What is regression? Introduction • The variable to be predicted is called the dependent variable (also called the response variable) • The value of this variable depends on the value of the independent variable (explanatory or predictor variable)

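  12. Example: how to draw the regression line (n = 6, X̄ = 3.5, Ȳ = 4.0)

| X | Y | X - X̄ | Y - Ȳ | (X - X̄)(Y - Ȳ) | (X - X̄)² | Ŷ | Ŷ - Ȳ | (Ŷ - Ȳ)² | (Y - Ȳ)² |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | -2.5 | -1 | 2.5 | 6.25 | 3.57143 | -0.42857 | 0.183672 | 1 |
| 2 | 4 | -1.5 | 0 | 0 | 2.25 | 3.74286 | -0.25714 | 0.066121 | 0 |
| 3 | 3 | -0.5 | -1 | 0.5 | 0.25 | 3.91429 | -0.08571 | 0.007346 | 1 |
| 4 | 6 | 0.5 | 2 | 1 | 0.25 | 4.08572 | 0.08572 | 0.007348 | 4 |
| 5 | 5 | 1.5 | 1 | 1.5 | 2.25 | 4.25715 | 0.25715 | 0.066126 | 1 |
| 6 | 3 | 2.5 | -1 | -2.5 | 6.25 | 4.42858 | 0.42858 | 0.183681 | 1 |
| Sum | | | | 3 | 17.5 | | | 0.514294 | 8 |

Slope:
$$\beta_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{3}{17.5} = 0.17143$$
Y-intercept:
$$\beta_0 = \bar{Y} - \beta_1 \bar{X} = 4 - 0.17143 \times 3.5 = 3.4$$
Regression line:
$$\hat{Y} = \beta_0 + \beta_1 X, \quad X = 1:\ \hat{Y} = 3.4 + 0.17143 \times 1 = 3.57143, \quad X = 6:\ \hat{Y} = 3.4 + 0.17143 \times 6 = 4.42858$$
Goodness of fit:
$$R^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = \frac{0.514294}{8} \approx 0.0643 \quad (0.0642857 \text{ from unrounded values})$$
Standard error of the slope:
$$SE(\beta_1) = \sqrt{\frac{\sum (Y - \bar{Y})^2 - \sum (\hat{Y} - \bar{Y})^2}{(n - 2)\sum (X - \bar{X})^2}} = \sqrt{\frac{8 - 0.514286}{4 \times 17.5}} = 0.327015$$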
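The slide 12 arithmetic can be checked with a short Python sketch of ordinary least squares for one predictor. The X and Y values are an assumption: they are recovered from the deviation columns of the example (X̄ = 3.5, Ȳ = 4.0). The function name `simple_ols` is hypothetical.

```python
import math

# Data recovered from the worked example (assumption: X = 1..6, Y from Y - Ybar).
X = [1, 2, 3, 4, 5, 6]
Y = [3, 4, 3, 6, 5, 3]

def simple_ols(x, y):
    """Least-squares line y = b0 + b1*x. Returns (b0, b1, r_squared, se_b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # sum of cross-products
    sxx = sum((xi - x_bar) ** 2 for xi in x)                        # sum of squared X deviations
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)        # total variation in Y
    ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)      # variation explained by the line
    ss_res = ss_total - ss_reg                           # unexplained (residual) variation
    r_squared = ss_reg / ss_total
    se_b1 = math.sqrt(ss_res / (n - 2)) / math.sqrt(sxx)
    return b0, b1, r_squared, se_b1

print(simple_ols(X, Y))
```

With these inputs the function gives b1 = 3/17.5 ≈ 0.17143, b0 = 3.4, R² ≈ 0.0642857 and a slope standard error of about 0.327015, matching the hand calculation.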

  13. Thank you very much, sir, and dear colleagues, for giving your valuable time to my little presentation.

  14. SPSS stands for "Statistical Package for the Social Sciences". SPSS windows: • Data Editor (opens by default) • Output Viewer • Syntax window • Script window • Qualitative or categorical variables: • e.g. gender: male and female • Quantitative variables: • e.g. current salary, beginning salary, etc.

  15. If quantitative (numeric) data • If qualitative (categorical) data

  16. One-way data • Two-way data (cross tab)
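A one-way table is just the frequency count of a single categorical variable; a two-way (cross) table counts cases for every combination of two categorical variables. A minimal Python sketch of the two-way case, with a hypothetical `crosstab` helper and hypothetical variables and data:

```python
from collections import Counter

# Hypothetical data for five cases.
gender = ["Male", "Female", "Female", "Male", "Female"]
job_category = ["Clerical", "Manager", "Clerical", "Manager", "Clerical"]

def crosstab(rows, cols):
    """Two-way table: number of cases for each (row category, column category) pair."""
    counts = Counter(zip(rows, cols))
    return {
        r: {c: counts[(r, c)] for c in sorted(set(cols))}
        for r in sorted(set(rows))
    }

print(crosstab(gender, job_category))
```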

  17. Data sorting: • Data  sort-case • Select Field • Then select Ascending or Descending • OK • (OUTPUT view will appear) Note: First Save your Original File, otherwise all data will be changed accordingly.

  18. Data transformation: • Transform → Compute Variable • Target Variable: enter a name for the new variable • Enter the expression, e.g. target variable Total Marks = Marks + 20, or SQRT(marks)
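Compute Variable evaluates an expression for every case and stores the result in a new target variable. A minimal Python analogue, using a hypothetical `compute_variable` helper and hypothetical `total_marks` and `sqrt_marks` variable names:

```python
import math

def compute_variable(cases, target, expression):
    """Add variable `target` to every case, computed from that case's own values."""
    for case in cases:
        case[target] = expression(case)

# Hypothetical cases with a single "marks" variable.
cases = [{"marks": 45}, {"marks": 64}, {"marks": 81}]
compute_variable(cases, "total_marks", lambda c: c["marks"] + 20)       # Marks + 20
compute_variable(cases, "sqrt_marks", lambda c: math.sqrt(c["marks"]))  # SQRT(marks)
print(cases)
```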

  19. Analysis Output
