1 / 27

Data Mining&Business Planning of Engineering/Research Projects

Data Mining&Business Planning of Engineering/Research Projects. Practice 8 Dr. Gá bor Pauler , Associate Professor Faculty of Sciences, University of Pécs Tel:30/9015-488 E-mail: pauler@ t-online.hu. Content of the Practice. Checking Home Assignment 7: Calculating direct unit costs

naiya
Download Presentation

Data Mining&Business Planning of Engineering/Research Projects

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Mining&Business Planning of Engineering/Research Projects Practice 8 Dr. Gábor Pauler, Associate Professor Faculty of Sciences, University of Pécs Tel:30/9015-488 E-mail:pauler@t-online.hu

  2. Content of the Practice • Checking Home Assignment 7: Calculating direct unit costs • Crosstabulation • Problems of segment profiling at nominal variables • Definition of Crosstabulation • Probability Distribution Map • Definition of Residuals • Residual Map • Measuring Strenght of Associative Relations • Evaluation of Crosstabs, Multi-Dimensional Crosstabs • Software application: PaulerSoft MultiCrosstab • Purpose, maximal capacity • Input Interface • Copying Database From SPSS • Copying Data Dictionary from SPSS • Dimension Settings • Filter Settings, Export • Output Interface • Refresh Pivot Diagram • Refresh Diagram Colors and –Sections • Observe Associtation Tests • Formatting Excel Diagrams • Chart Area, Basic Font, Plot Area • Legend, Axises, 3DView • Drawing into Excel Diagrams • Pasting Excel Diagrams into your presentation: HTML, Metafile, Bitmap • Home Assignment 8: Market Segmentation • References

  3. Problems of segment profiling at nominal variables • We already learnt that depending on Scale type(Skálatípus) of a variable, different type of operations are allowed on them (#:Count/compare vales, ↓↑:Sort values, ±:Add/subtract, /:Divide): • Nominal (Nominális):#,↓↑,± ,/ Eg. Names or ID numbers:not quantity! • Ordinal (Ordinális): #,↓↑,± ,/ Eg. Education levels: unequal stages • Interval (Intervallum):#,↓↑,± ,/ Eg. Time: equally paced but no 0 point • Ratio (Arányskála): #,↓↑,± ,/ Eg. Milk,pints it has absolute 0 point • When we calculate group averages of profiling variables to create segment profiles, very strictly this can be done only for ratio scaled-variables • But if we code ordinal variables in strictly ascending order, Eg.: 46. Residence: 1OVillage, 2OTownship, 3OTown, 4OCity it still does make sense to compute average of them to get some rough estimation in segment profile: (Eg. 3.2 means living mostly in town) • However, if a variable is clearly nominal, Eg.: 46. What is your occupation: (Please check one:) 1Student 2Help hand 3Unskilled worker 4Skilled worker 5Entrepeneur6Highly qualified job 7Creative, Artist, etc8Other we cannot set up any meaningful sorting among its values, so computing their average does not make sense at all (Eg. 3.2 here means nothing!) • Although, it can be very important for us, whether there is a connection between segment membersip variable values (Eg. 1:Dreamer, 2:Rational, 3:Tuning) and occupation, when we are Targeting (Céloz) our selected segment with customized 4P marketing mix. So we need alternative method in fine profiling which is capable to examine relationship of even nominal variables (segment membership is also a nominal variable!)

  4. Definition of Crosstabulation (Kontingencia Tábla) • Crosstabulation (Kontingencia Tábla): • Combines all values of 2 or more Discrete valued (Diszkrét értékű) (Eg.Gender:1:Male, 2:Female) or originally continous, but pre-Categorized (Kategorizált) (Eg:Icome catego-ry:1:Low <$24000, 2:Medium $24000-$36000, 3:High >$36000) variables • In Cartesian Product (Descartes-szorzat): combining all values with all values (Eg. Male&Low, Female&Low, Male&Medium…) forming a table/matrix with Rows(Sor), Columns(Oszlop) and Cells(Cella) • Containing Observed Joint Relative Frequencies (Együttes relatív bekövetkezési gyako-riság) of value-combinations (Eg. 11.7% Male+Medium from 100% of respondents) • This was a stupid boring statistical definition, but let me give you an example, which con-cerns everybody: Eg.: Whether men or Women have higher IQ in general? Averages of IQ are indecisive: both genders have the average IQ around 90-100 with insignificant difference • Relative frequency of Men49% and Women51% of population(men die earlier because of womens’ hysteria) • IQ is originally a continous variable with many possible values, but lets categorize it: • High(IQ>=110): 15% of population • Medium(110>IQ>=70): 65% of population • Low(70>IQ): 20% of population • Lets collect data from 10000 respondents and prepare crosstabulation of variables Gender and IQ Category: • Green cells are Observed Joint Relative Frequencies (Együttes relatív bekövetkezési gyakoriság) • Summing them up in rows/columns we get Marginal Relative Frequencies (Perem relatív gyakoriságok) • Summing marginal relative frequencies either row or column, we get Grand Total (Főtotál)=100%. • If we have lot of observations, all relative frequencies can be considered as Probabilities of Occourence (Előfordulási Valószínűség)

  5. Probability Distribution Map (Valószínűség Eloszlási Térkép) Frequency distribution map Female Relative frequency 40.00% 35.00% Gender 30.00% 25.00% 20.00% 15.00% 10.00% 5.00% 0.00% Male High Medium Low IQ 0.00%-5.00% 5.00%-10.00% 10.00%-15.00% 15.00%-20.00% 20.00%-25.00% 25.00%-30.00% 30.00%-35.00% 35.00%-40.00% • Probability Distribution Map (Valószínűség Eloszlási Térkép) is a crosstab plotted on a 3D area chart, where axises X and Y contain values of 2 discrete/category variables, and axis Z shows their joint probabilities with Gradient Colouring (Színfokozati skála): • We can see that Probability Distribution Map is still not very informative: at both genders we can see that there are more people with medium IQ, and there is only a very slight difference in the shape of distribution between men and women • The reason is that partial distributions of the 2 variables (Eg.: Gender: there are slightly more women, IQ category: there are much more medium ability people than extreme) highly influence the joint picture, covering the possible relation among variables • How their effect can be eliminated from the picture?

  6. Definition of Residuals (Reziduumok) – = × • If discrete variables A with values ak k=1..c and B with values bh h=1..l are Independent (Független) from each other, then probabilities of combinations of their values p(ai bh) areequal the multiplication of partial probabilities of respective values p(ai) , p(bh): p(ak bh) = p(ak) × p(bh), k=1..c, h=1..l (8.14) • Eg.: If Gender and IQ Category are independent variables, then probability of high IQ women should be equal multiplication of probability of being woman and probablity of being high IQ: 51%(Woman)×15%(High IQ) = 7.65%(Woman&High IQ) • Thus at all cells, we can compute Expected Probabilities of Independence (Függetlenség esetén érvényes várható valószínűség)p(ak) × p(bh) • If the variables are not totally independent, they will be different from Joint probabilities of occourence (Tényleges bekövetkezési valószínűség) p(ak bh).Their differences cell by cell are called Residuals(Reziduum): r(ak bh) = p(ak bh) - p(ak) × p(bh), k=1..c, h=1..l(8.15) • Positive residual means there are more observations in the cell than there should be in case of independence, so corresponding values forming the cell „attract” each other  variables are associated • Negative residual means there are less observations inthe cell than it should be, so values „distract” each other  variables are associated • Around 0 residuals mean that variables and values are independent from each other • Sum of all cells resiuduals are always 0, even at stron association (positive negative residuals always balance each other) • Eg. We know that if Gender and IQ Category are independent, there should be 7.65% of high IQ women, but in the reality there are only 5.52%, so the residual is 5.52% - 7.65% = -2.13%

  7. Residual Map (Reziduális Térkép) Residual map Female Residual 4.00% 3.00% Gender 2.00% 1.00% 0.00% -1.00% -2.00% -3.00% -4.00% Male High Medium Low IQ -4.00%--3.00% -3.00%--2.00% -2.00%--1.00% -1.00%-0.00% 0.00%-1.00% 1.00%-2.00% 2.00%-3.00% 3.00%-4.00% • Residual Map (Reziduális Térkép) is a residuals table plotted on a 3D area chart, where axises X and Y contain values of 2 discrete/category variables, and axis Z shows ±residuals with Gradient Colouring (Színfokozati skála): • Observing cells with high positive residuals we can see the tendency: relatively, there are more very clever and very dumb men, while there are relatively more women with medium IQ than it should be in case of independence, so variables Gender and IQ category are associated • These results can highly influence forming 4P marketing mix: • There is no sense to sell something for the „average man”: you can either sell protein-rich „food supplement” for „macho body builders” or laptop with tetra-core supercooled processor for „yuppies” • There is some sense to sell something for the „average woman”: celeb magazines are enthusiastically read by both cleaning maids and profes- soresses, this is something genetically encoded in all women… • But before investing lot of money in customized 4P, we should know how reliable the association is. How to test and quantify it?

  8. Measuring Strenght of Associative Relations 1 • Associative Relation (Asszociációs Kapcsolat) of 2 or more discrete/cetegorized variables is when certain combination of their values tend to appear frequently (Eg. alcoholism& stuttering hands). While other combinations almost never appear together (Eg. alcoholism& excellent performance) • Association does not examine which variable can be Reason (Ok) (Eg. alcoholism) or Consequence (Okozat) (Eg. stuttering hands), it is a symmetric measure based on the following terms: • Relative residuals (Relatív rezidumok): they describe how much % are the residuals of expected probabilities of independence. This expresses the relative size of the residual compared to the importance of the given cell: r(ak bh) / (p(ak) × p(bh)) = r’(ak bh), k=1..c, h=1..l(8.16) When we want to sum them for all cells, it is still a problem that ±relative residuals would largely balance each other, which is a problem bcause both signs show association. Therefore we take the square of residuals in (8.16) resulting: • Khi-Square (Khi-Négyzet) values: r(ak bh)2 / (p(ak) × p(bh)) × m = KhiSq(ak bh), k=1..c, h=1..l (8.17) where: m:number of observations • They are all positive, so they can be added up, • Bigger residuals will be even more expressed because of squaring. / = 2 ×100= /

  9. Measuring Strenght of Associative Relations 2 • Summing squares fits better treating independent residual values of cells, just like we sum variances instead of standard deviations: SumKhiSq = SkShKhiSq(ak bh)Eg.: = 227.27% (8.18) • Degrees of freedom (Szabadságfok): However, as sum of all residuals is by definition always 0, not all residual values of cells are independent from each other. For discrete variables, number of independent residuals are 1 less than number of values: Dfk = c-1, Dfh = l-1 Eg.: = 2-1, 3-1 (8.19) For the whole table: Df = Dfk × DfhEg.: = (2-1)×(3-1) = 2 (8.20) We also observe the minimal DF: MinDf = Min(Dfk,Dfh) Eg.: = 1 (8.21) • Khi-Square Test (Khi-Négyzet Próba): shows the p probability that there is an associative relationship between values of the discrete/categorized variables. Computing in Excel: p = 1 – Khi.Distribution(SumKhiSq, Df) Eg.: = 68% (8.22) • Cramer V value (Cramer V érték): shows strenght of the association in [0,1]interval (0:no assiciation, 1:perfect, 1:1relation between values of two variables): Cramer V = (SumKhiSq / m / MinDf)0.5Eg.: = 15%, weak (8.23) We divide by m and by MinDf (this is how many independent residual values of cells we can have at perfect association) to normalize results in [0,1]. Square root is necessary to counterbalance squaring made earlier • Average number of observation by cell(Cellánkénti átlagos megfigyelésszám): it should be at least 5 for valid results: 5 ≤ m / (c × l) Eg.: = 100/6 (8.24)

  10. Evaluation of Crosstabs, Multi-Dimensional Crosstabs •  Crosstabs can work with variables measured on nominal scale, and it is Distribution-Free (Eloszlásfüggetlen): it does not assume presence of any specific probalbility distribution. While most other statistical methods (Eg. Averages, ANOVA) tied to interval/ratio scale variables and Normal Distribution (Normális eloszlás) •  It can detect Non-monotonic (Nem monoton) relationships (Eg. kids have low income, adults have higher, and seniors have again lower income) •  It can use continous variables only if they are pre-categorized, but method of categorization influences results of association •  All computations of crosstab can be easily extended to more than 2 dicrete/categorized variables, and Multi-dimensional (Többdimenziós) crosstabs/residual maps can be very powerful data mining tools (Eg. Detec-ting the relationship among: • Sales-categories× • Sales gap-categories (how much we could still sell the guy)× • Profit-categories) •  But number of crosstab cells increases expo- nentially with number of observations: Eg.if you have 10 variables each with 5 categories, there are 5×5×5×5×5×5×5×5×5×5 = 510 = 9,765,625 cells, and we need 5 times more observations, which is unrealistic.So lack of observations usu- ally prevents multi-dimensional analysis, thus simple statitistical tools (Eg. SPSS) do not sup- port it – except our own software:

  11. Content of the Practice • Checking Home Assignment 7: Calculating direct unit costs • Crosstabulation • Problems of segment profiling at nominal variables • Definition of Crosstabulation • Probability Distribution Map • Definition of Residuals • Residual Map • Measuring Strenght of Associative Relations • Evaluation of Crosstabs, Multi-Dimensional Crosstabs • Software application: PaulerSoft MultiCrosstab • Purpose, maximal capacity • Input Interface • Copying Database From SPSS • Copying Data Dictionary from SPSS • Dimension Settings • Filter Settings, Export • Output Interface • Refresh Pivot Diagram • Refresh Diagram Colors and –Sections • Observe Associtation Tests • Formatting Excel Diagrams • Chart Area, Basic Font, Plot Area • Legend, Axises, 3DView • Drawing into Excel Diagrams • Pasting Excel Diagrams into your presentation: HTML, Metafile, Bitmap • Home Assignment 8: Market Segmentation • References

  12. PaulerSoft MultiCrosstab: Purpose, Maximal capacity • PaulerSoft MultiCrosstab is an MS Excel+MS Access-based crosstabulator and residual mapping utility for SPSS capable of treating max. 256 database variables, 24 labeled values per variable, 65530 observations, 3 observation filters, 1 observation weight variable, and max. 6 dimensions, which can be: • Discrete numeric or text variables • Continous numeric or date variables: they will be auto-categorzied by given number of categories • Sets of binary variables of multi-response questions • Installation: the system consists of 3 files should be placed in C:\MultiCrosstab directory on your PC, otherwise they will not work • MultiCrossTabInput.xls:Input interface • MultiCrossTabProcess.mdb:Calculations in database (you don’t have to open it) • MultiCrossTabOutput.xls:Output interface • Copying your database from SPSS into MultiCrossTabInput.xls: • Open your SPSS database created earlier (Eg. CarSculpturersData.sav) in SPSS with File|Open|Database and page it to Data view • Select all variables with Shift+Click their variable names header and copy them to clipboard with Ctrl+C (this can take 30-40 secs) • Open MultiCrossTabInput.xls in Excel with File|Open (Fájl|Megnyitás) and page it to Worksheet DataBase|AdatBazis, where click on top left green cell (you cannot copy it in purple cells of first row, they will be auto-filled with variable names later) • Copy data Edit|Paste special|Text (Szerkesztés|Irányított beillesztés|Szöveg)

  13. PaulerSoft MultiCrosstab: Copying Database from SPSS • SPSS uses "." as missing values, while Excel uses "" (empty cell), thus cannot understand them. Therefore, in Excel press Ctrl+H for Replace text (Keresés és csere) panel: • What to search (Keresett szöveg) = "." • Replace for (Csere erre) = "" • Press Special (Egyebek) button • Check Full cell only (Ha ez teljes cella) • Press Replace all(Összes cseréje)button • Shift to Worksheet DataDict|AdatSzotar.Copy name of variables from SPSS|Variable view|Name column with Ctrl+C and paste into Ran-ge VarName with Edit|Paste special|Text (Szerkesztés|Irányított beillesztés|Szöveg): • At variables of single response questions, variable names will not be transferred to MultiCrosstab, just value labels • At binary variable sets of multiple response questions, variable names will be used as value labels on crosstab • Copy label of variables from SPSS|Variable view|Label column with Ctrl+C and paste into Range VarDesc with Edit|Paste special|Text(Szerkesztés|Irányított beillesztés|Szöveg) • Copy labels of variable values from SPSS|Variable view|Values column with Ctrl+C and paste into Range VarValLab with Edit|Paste special|Text(Szerkesztés|Irányí-tott beillesztés| Szöveg): • At continous numeric, date or text variablesleave it empty • At binary variable sets you can either fill with 0:No, 1:Yes or leave empty

  14. PaulerSoft MultiCrosstab: Categorizing continous variables 1 • We already mentioned that crosstab can handle only discrete variables, so continous numeric or date variables should be Categorized (Kategorizál). At these variables (Eg. CarBasPr:Base price(without extras), HUF): • Do not write any value labels into Range VarValLab. Value labels are counted in Range VarValNum. If it is 0, the system recognizes that you consider the current variable continous • In Range VarCatNum it offers to form 3 categories by default, but you can manually override that with values 2..9: •  Higher number of categories can give more exact picture •  But – because of at least 5 observation in average cells requirement – it multiplies number of necessary observations • Categorized variable will be auto-value labeled asVarName:VarValndex/VarCatNum(Eg.CarBasPr:1/3 means 1stof 3 cats of variable CarBasPr) • How categories of continous variables are created? • Man-on-the-street would categorize by Equal Width Intervals(Egyenlő Szé-lességű Intervallumok):Eg. if a sample of students can reach 0..200 po-ints on exam one would categorize them 0..49,50..99,100..149,150..200

  15. PaulerSoft MultiCrosstab: Categorizing continous variables 2 • However this can be totally misleading if Probability Distribution (Valószínűsé-gi Eloszlás) of variable is very Uneven (Egyenletlen) or Multimodal (Multimo-dális):has multiple peaks. In some categories there can be only very few observations (Eg. 50..99 only 5) • Therefore, the system will use Equal Frequency Intervals (Azonos Gyakori-ságú Intervallumok), which may have very unequal width: sample is sorted in ascending order and cut up in equal sized groups by given number of catego-ries in Range VarCatNum. This method fits much better to crossta-bulation. In Excel, function =PercentileRank(ActualValue, AllValuesOfVariable) (=Százalékrang(AktuálisÉrték; VáltozóMindenÉrtéke)) gives the relative position of an actual value in the ascending order of all variable values of observations as a [0,1] number (Eg. if it is 0.45 then 45% of values are less or equal than current value. If we have 4 categories, the breaks are at 1/4 2/4 and 3/4, so 0.45 will fall in 2nd category)

  16. PaulerSoft MultiCrosstab: Dimension Settings • After completing data dictionary settings, page to Worksheet Dimensions|Dimenziok. This will control Dimensions(Dimenzió):which discrete/categorized variables appear on crosstab: • Select Discrete or Continous Numeric variable containing weight of observations at row Weight|Súlyin Range DimVar. If it is left empty, all observations will get weigth of 1 • At WgtProc, you can set how to process observation weights at multi-response questions: • Multiply: the more responses the respondent checked, the more important he/she becomes, as its weight is multiplied at all responses. (Eg. this fits to a situation where somebody can be a buyer of multiple product categories) • Pool: weight is distributed evenly among all responses, so their number will not influen-ce importance of respondent (Eg. If somebody checks topics of media interest) • Select variables of dimensionsat rows Dimension1..6: • If dimension is a variable of single response question: select it in Range DimVar: • It can be Text variable: value labels transferred to crosstab will be text values itself • It can be a Discrete Numeric variable: you should already give value labels for it at Worksheet DataDict|AdatSzotar • It can be a Continous Numeric or Date variable: it is categorized and auto-value-labeled by its settings of Range VarCatNum at Worksheet DataDict|AdatSzotar • If dimension is a set of binary variables of a multi-response question: • You can select first and last variable of a Consecutive(Egymás után következő)set of Numeric variables in Ranges DimFirstBin and DimLastBin • All variable values ≤ 0 are considered as Uncheck/No answers and not processed • All variable values >0 are treated as Check/Yes answer.Values differing from 1are considered as weight of the given response alternative, and will modify weighting: Total weight = Observation weight × Response alternative weight (8.16) • Response alternatives are auto-labeled by their binary variable name Click Click Click Click Click Click

  17. PaulerSoft MultiCrosstab: Filter Settings, Export • Filter settings: If no filter is given, all observations are selected automatically. If we would like to run the analysis on a subset of observations: • We can select max. 3 filter variables of any type (Numeric/Date/Text) in Range FiltVar, (Eg. Age category) connected with AND/OR logical operators selected in Range FiltOper • We can select type of filters (=, <>, ≥, ≤, >,<, between, outside) in Range FiltRelat • =, <>, ≥, ≤, >,< require one filter value can be given in Range FiltVal1. Its type should match to the given Numeric/Date/Text filter variable, otherwise we will get error message. If filter variable is a discrete variable with given value labels, the system tries to indentify value label of the filter value (Eg. 2:Age19-21) in Range FiltVal1Lab • between, outside require two filter values can be given in Ranges FiltVal1 and FiltVal2. They should form valid Numeric/Date/Alphabetic text range (the second should be bigger than first) otherwise we will get error message. • Running the export: Press Export button. • The system computes the export data necessary for the output interface (it can take couple of minutes) and saves MultiCrossTabInput.xls. • Then, please close it with File|Close (Fájl|Bezárás) menu, otherwise output interface will not work! Click Click Click Click

  18. PaulerSoft MultiCrosstab: Output Interface: Refresh Pivot Table • MultiCrossTabOutput.xls is opened in Excel with File| Open(Fájl|Megnyitás)menu • Page to Worksheet Cross-Tab|KontTab • With View|Tools|Pivottable (Nézet|Eszközök|Kimuta-tás)menu,open PivotTools • Click anywhere on diagram and Press ( ) button, wait until data are read • Click dimension values drop downs ( ) and uncheck old values from earlier runs (it tends to remember them) • Pull dimension boxes ( ) to X or Y axis to get reques-ted chart layout (Eg.Gender, Segment on X, City on Y) • Press( )to update co-lor grades of residuals (this should be repeated after any change, because it forgets) • See crostabulation tests: • Average observations/cell shown in Range ObsPerCell should be ≥ 5, otherwise: • More observation needed • Select less dimensions • Or select dimensions with less discrete values • Or select multi response dimension with more checked alternatives Click Pull Click Pull Click Click Click Click • See number of observations in ObsNum • See number of cells in Range CellNum • See probability of existing association relationship in Range KhiSqProb (Eg.: 99.72%). See strenght of association in Range CramerV (Eg.: 41.12% medium) • Pullgray rectangles( ) to cover meaningless parts of the diagram: it is a mosaic made of many partial residual maps, but Excel cannot delimit them and draws as single continous diagram (Eg. on axis X: 1st is Dreamer&Female, 2nd is Dre-amer&Male, 3rd is Rational&Femaledimensional break!) Click Click Click Click Click Click Click

  19. PaulerSoft MultiCrosstab: Output Interface: Residual Map Tuning City Rational Town Township Dreamer Female Village Male • Why is it important to correctly delimit partial diagrams? • Because they take place on different „layers” above each other, as they belong to different values of Dimension3, 4, 5, 6 we cannot directly plot • Technically it would be possible to make multi layered 3D area chart from them, but for most people it is getting un-understandable mess, even using transparent coloring • So, the best you can do as a Data Miner, training your brain „assembling” even 5-6 dimensional pictures from partial diagrams by head. The more dimensions you can handle, the more deep insight you can get working of the market (Eg. on residual map you can see that your target segment – Tuning Guys – are not really from villages as you thought from simple segment profiling. They are quite well associated with townships, while Dreamers are with medium towns. Now you can set up your 4P accordingly) • If you need more rectangles for delimiting partial diagrams you can copy them with Ctrl+C, Ctrl+V

  20. Content of the Practice • Checking Home Assignment 7: Calculating direct unit costs • Crosstabulation • Problems of segment profiling at nominal variables • Definition of Crosstabulation • Probability Distribution Map • Definition of Residuals • Residual Map • Measuring Strenght of Associative Relations • Evaluation of Crosstabs, Multi-Dimensional Crosstabs • Software application: PaulerSoft MultiCrosstab • Purpose, maximal capacity • Input Interface • Copying Database From SPSS • Copying Data Dictionary from SPSS • Dimension Settings • Filter Settings, Export • Output Interface • Refresh Pivot Diagram • Refresh Diagram Colors and –Sections • Observe Associtation Tests • Formatting Excel Diagrams • Chart Area, Basic Font, Plot Area • Legend, Axises, 3DView • Drawing into Excel Diagrams • Pasting Excel Diagrams into your presentation: HTML, Metafile, Bitmap • Home Assignment 8: Market Segmentation • References

  21. Formatting Excel Diagrams: Chart Area, Basic Font, Plot Area • Handling charts: As default chart format of Excel is insane, therefore we have to format charts correctly in Excel before copying them into presentation as picture metafile: • Format Chart area with doubleclicking on it. This is the basic container of the chart influencing its general formatting: • Background color should be set white, or transparent, which allows slide background to show up – it is more fancy • Set basic font type of chart: Arial Narrow, 16p, Bold, Indigo color • Inactivate auto resize font: otherwise, if you try to enlarge chart to make it more visible, Excel will enlarge font size automatically even more, squeeezing plot and making it just smaller and smaller… • Format Plot area with doubleclicking on it. This is an INVISIBLE frame embedded in chart area containing the plot. Excel by de-fault leaves huge margin between chart- and plot areas squeezing plot too little • You should set it maximal size and transparent background. Its background can be set to any picture also, which is useful if you want to make a map background for an XY scatterplot • Activating chart legenda will automatically squeeze plot area and plot. So afterwards you should stretch plot area again (even over legenda!!!) Click Click Click Click

  22. Formatting Excel Diagrams: Legend, Axises, 3DView Click Click • Format Legenda (Jelmagyarázat) with double-clicking on it. Always turn on legenda first, even if you do not want to see it permanently, because this is the only place you can set gradient color ramp scheme for 3D area charts (eg. blues greens reds yellows white). By default, Excel uses random color ramps, which has less plastic 3D effect. After formatting colors you can hide legenda with Chart menu|Settings|Legenda|Don’t show • Format Chart floor area with doubleclicking on it, set to transparent • Format Gridlines with doubleclicking on them, set for all axis, 1pic wide, Light grey, solid • Format Axises with doubleclicking on them: • On Value axis: set axis Scale|Min, Max values (by default they may be too wide and chart is squeezed by this).Set Scale|Large unit around 10% of Max-Min difference to get about 10 color ramps on 3D area charts. Inactivate show symbols between tickmarks (Szimbólumok az osztásjelek közt) • On Category axis: reverse order of categories (Kategóriák fordított sorrendben) to comply with their order in data sorce table • Format 3D settings at Chart menu|3D settings(Diagramm menü|3D beállítások): • Rotate 3D charts by pulling corners of their box, set perspective (Távlat) to 10% • Adjust relative chart height (Magasság) to 50-75%. Making chart relatively taller will show differences of plotted values better, but columns or hills will cover each other more intensively. Reducing height will ease covering problem but makes hard to see differences Click Click Click Click Click Click Click Click Pull

  23. Formatting Excel Diagrams: Drawing into Excel Diagrams Click • Re-formatting charts at presentation: There are some features of charts we cannot set in Excel, but it is possible to do in Metafile:Eg. at complex 3D area charts,it would be great to create semi-transparent function surfaces partially covering eachother, but it cannot be done in Excel 2003. • How to do: Copy 3D area chart trough clipboard as metafile • Convert metafile into PPT drawing with View|Toolbars|Drawing|Drawing me- nu|Ungroup(Nézet|Eszköztárak|Raj-zoló |Rajzoló menü|Csoportbontás), repeat it as long as it can be done • Delete unnecessary chart background, axis, axis text, etc. elements • Select all remaining elements, format them Doubleclicking on selection, set their color, border, and transparency • Group elements together again • But a difficult drawing containing 1000s of elements can eat up lot of resources and freeze presentation • Therefore, cut metafile to clipboard with Edit|Cut (Szerkesztés|Kivágás) • Paste as GIF picture with Edit|Paste special|GIF (Szerkesztés|Irányított beillesztés|GIF). It keeps transparency, and reduces resource consumption, but it can be edited only as image anymore Click Click Click Click Pull Click Click Click Click Click Click Szerkesztés Click Kivágás Click Szerkesztés Click Click Click

  24. Pasting Excel Diagrams into your presentation: HTML Click • Handling clipboard: When you pas-te copied objects from clipboard, you can change their format with Edit|Paste special (Szerkesz-tés|Irányított beillesztés) menu. When you insert Excel charts or tables into your pre-sentation, it is extremly useful: NEVER paste Excel charts and tables into a presentation in their original format: the whole - sometimes pretty large - source Excel file is embedded invisibly into the presentation, as many times as you pasted any part of it. This will result in freezing huge presentation files. • Instead of that Excel tables should be pasted as HTML format: •  This preserves table-like func-tionality, you can add/delete rows/columns, merge cells, format cell background/border color/font etc. with PowerPoint’s View|Toolbars|Table (Nézet| Eszköztárak|Táblázat) tools •  However, it does not preserve row/column width, font sizes set in Excel. It applies cell margins unnecessary. You have to set them again with Doubleclick on table frame Textbox(Szöveg-doboz) panel|Margins(Margók) Másolás Click Click Shift +Pull Click Click Click Click Click Click Click

  25. Pasting Excel Diagrams into your presentation: Metafile, Bitmap •  Another silly feature that pasting HTML will automatically realign textboxes and title on the slide, so always past all HTML tables first then align textboxes on slide • Picture metafile format, you can use this if there are problems with HTML table: •  95% of formattings of Excel table are preserved •  You cannot edit it as a table anymore, however metafiles can be converted into PPT drawing objects with View|Toolbars|Drawing| Dra-wing menu|Ungroup (Nézet|Eszköz-tárak|Rajzoló|Rajzoló menü|Cso-portbontás), and you can do limited editing of celltext,font types,colors,etc. • Bitmap format, if nothing else works: •  This shows table as it appeared on screen in Excel •  You can edit it very limited with any graphic software •  Converting to bitmap eats more space than metafiles containing only 40-50 graphic objects, however can save space if metafile contains several thousand of objects (eg. complex, large tables) Excel charts, or selected parts of an Excel worksheet containing both cells and charts should be pasted first in Picture metafile, or as Bitmap in case of format problem Click Click Click

  26. Home Assignment 8: Market Segmentation • Set up 3 to 9 market segments from your survey using Ward Hierarchic Clustering in SPSS on the following grouping variables (0.5point): • I: What Product features(Jellemző) are important. Alternatively, if it does not work very well, you can use: • S: Their Socio-demographic (Szocio-demográfiai) features: age, gender, education, occupation, living area, income • Create segment profiling with PaulerSoft MarketSegmentator using the grouping- and further profiling variables below: • P: How well our product performs in important features • C: How well competing products perform in important features Add the radar plot, pie chart, and series of bubble charts combining triples most important group differentiating variables to your report! Name segments, and explain each naming in one short paragraph! Prepare their socio-demo photos! (0.5point) • Select tartgeted market segment(s) and give a reasoning in your report why did you select that (0.5point) • Prepare fine profiling of your tartget segment(s) with the help of 2 or 3 dimensional crostabulation of your Segment membership variable (Eg. 1:Dreamer, 2:Rational, 3:Tuning) and discrete or categorized continous variables from questions group: • B: Where, when customers usually buy the product category and how much they usually spend? • N: How customers will buy your planned product: • NF: From which source they would finance buying the product, in which proportion? • M: On which medium customers can be reached, at which topic, at which time? • S: What is customers socio-demographic background (Eg. Housing, Industry, Occupation)? of your survey with the help of PaulerSoft MultiCrosstab (0.5point) • Copy residual maps, Khi-square tests and Cramer V values into your report (0.5point) • Add one short paragraph of explanantion of resullts can be seen on maps (0.5point)

  27. References • Crosstabulation: • http://www.statsoft.com/textbook/basic-statistics/ • http://www.custominsight.com/articles/crosstab-sample.asp • http://www.csse.monash.edu.au/~smarkham/resources/crosstab.htm • Khi-square test: • http://www2.lv.psu.edu/jxm57/irp/chisquar.html • http://en.wikipedia.org/wiki/Pearson's_chi-square_test • http://www.graphpad.com/quickcalcs/chisquared1.cfm • Cramer V Association Measure: • http://changingminds.org/explanations/research/analysis/cramers_v.htm • http://faculty.vassar.edu/lowry/newcs.html • http://www.people.vcu.edu/~pdattalo/702SuppRead/MeasAssoc/NominalAssoc.html • Excel Charts manual: • http://peltiertech.com/Excel/Charts/format.html • http://www.mrexcel.com/archive/Chart/

More Related