Joe F. Hair, Jr. Founder & Senior Scholar, DBA Program

PLS-SEM: Introduction (Part 1)

SEM Model: Predicting the Birth Weight of Guinea Pigs
• X & Y = different outcomes
• B, C & D = common causes
• A & E = independent causes
Source: Sewall Wright, Correlation and Causation, Journal of Agricultural Research, Vol. XX, No. 7, 1921.

MV Analysis - The Past?!
The greatest interest in any factor solution centers on the correlations between the original variables and the factors. The matrix of such test-factor correlations is called the factor structure, and it is the primary interpretative device in principal components analysis. In the factor structure the element rjk gives the correlation of the jth test with the kth factor. Assuming that the content of the observation variables is well known, the correlations in the kth column of the structure help in interpreting, and perhaps naming, the kth factor. Also, the coefficients in the jth row give the best view of the factor composition of the jth test.
Another set of coefficients of interest in factor analysis is the weights that compound predicted observations z from factor scores f. These regression coefficients for the multiple regression of each element of the observation vector z on the factor f are called factor loadings and the matrix A that contains them as its rows is . . . . .
Source: Cooley, William W., and Paul R. Lohnes, Multivariate Data Analysis, John Wiley & Sons, Inc., New York, 1971, page 106.

Structural Equations Modeling
• What comes to mind?
• CB-SEM
• LISREL
• AMOS
• ?
• PLS-SEM

• CB-SEM (Covariance-based SEM) – objective is to reproduce the theoretical covariance matrix, without focusing on explained variance.
• PLS-SEM (Partial Least Squares SEM) – objective is to maximize the explained variance of the endogenous latent constructs (dependent variables).

CB-SEM Model HBAT, MDA database

Covariance Matrix = HBAT 3-Construct model

CB-SEM – evaluation focuses on goodness of fit = minimization of the difference between the observed covariance matrix and the estimated covariance matrix.
• Research objective: testing and confirmation where prior theory is strong.
• Assumes normality of data distribution, homoscedasticity, large sample size, etc.
• Only reliable and valid variance is useful for testing causal relationships.
• A “full information approach,” which means small changes in model specification can result in substantial changes in model fit.
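
The idea of fit as "difference between observed and estimated covariance matrices" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two 3x3 matrices are made-up numbers (not the HBAT data), and the unweighted least squares (ULS) discrepancy is used because it is the simplest to read; CB-SEM programs more commonly minimize a maximum likelihood discrepancy.

```python
def residual_matrix(observed, implied):
    """Element-wise difference between observed and model-implied covariances."""
    return [[o - m for o, m in zip(o_row, m_row)]
            for o_row, m_row in zip(observed, implied)]


def uls_discrepancy(observed, implied):
    """Unweighted least squares discrepancy: half the sum of squared residuals."""
    resid = residual_matrix(observed, implied)
    return 0.5 * sum(r * r for row in resid for r in row)


# Hypothetical observed covariance matrix for three indicators.
observed = [
    [1.00, 0.50, 0.40],
    [0.50, 1.00, 0.30],
    [0.40, 0.30, 1.00],
]
# Hypothetical covariance matrix implied by an estimated model.
implied = [
    [1.00, 0.45, 0.40],
    [0.45, 1.00, 0.36],
    [0.40, 0.36, 1.00],
]

# The closer this value is to zero, the better the model reproduces the data.
fit_value = uls_discrepancy(observed, implied)
print(round(fit_value, 4))
```

A model that reproduced the observed matrix exactly would give a discrepancy of zero; estimation searches for the parameter values that make this value as small as possible.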

PLS-SEM – objective is to maximize the explained variance of the endogenous latent constructs (dependent variables).
• Research objective: theory development and prediction.
• Normality of data distribution not assumed.
• Can be used with fewer indicator variables (1 or 2) per construct.
• Models can include a larger number of indicator variables (CB-SEM difficult with 50+ items).
• Preferred alternative with formative constructs.
• Assumes all measured variance (including error) is useful for explanation/prediction of causal relationships.
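
"Explained variance" here is the familiar R² of the endogenous construct. The sketch below computes R² for a single-predictor regression on hypothetical construct scores. It does not implement the iterative PLS algorithm; it only shows the quantity that PLS-SEM tries to make large. The function name and the scores are assumptions for illustration.

```python
from statistics import mean


def r_squared(x, y):
    """Share of variance in y explained by a simple linear regression on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual and total sums of squares of the endogenous scores.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical scores for an exogenous and an endogenous construct.
exogenous_scores = [1, 2, 3, 4]
endogenous_scores = [1, 3, 2, 4]

r2 = r_squared(exogenous_scores, endogenous_scores)
print(round(r2, 2))
```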

PLS Path Model • Latent Construct

Multivariate Methods

Should SEM Be Used?
Considerations:
• The Variate
• Multivariate Measurement
• Measurement Scales
• Coding
• Data Distribution

Variate = a linear combination of several variables, often referred to as the fundamental building block of multivariate analysis.
Variate value = x1w1 + x2w2 + . . . + xkwk
Figure: Data Matrix
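
As a small worked example of the variate formula, the sketch below applies one set of weights to each row of a data matrix. The data values and weights are hypothetical; in practice the weights are estimated by the multivariate technique rather than chosen by hand.

```python
def variate_value(x, w):
    """Return x1*w1 + x2*w2 + ... + xk*wk for one observation."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(xi * wi for xi, wi in zip(x, w))


# Hypothetical data matrix: rows are respondents, columns are variables x1..x3.
data_matrix = [
    [5, 3, 4],
    [2, 4, 1],
    [4, 4, 5],
]
# Hypothetical weights w1..w3.
weights = [0.5, 0.25, 0.25]

# One variate value per respondent (row of the data matrix).
variate_values = [variate_value(row, weights) for row in data_matrix]
print(variate_values)  # [4.25, 2.25, 4.25]
```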

Multiple Regression Model
Variate = x1 + x2 + . . . + xk + e

Multivariate Measurement
Measurement = the process of assigning numbers to a variable/construct based on a set of rules that are used to assign the numbers to the variable in a way that accurately represents the variable.
When variables are difficult to measure, one approach is to measure them indirectly with proxy variables. If the concept is restaurant satisfaction, for example, then the several proxy variables that could be used to measure this might be:
• The taste of the food was excellent.
• The speed of service met my expectations.
• The wait staff was very knowledgeable about the menu items.
• The background music in the restaurant was pleasant.
• The meal was a good value compared to the price.
Multivariate measurement involves using several variables to indirectly measure a concept, as in the restaurant satisfaction example above. It also enables researchers to account for the error in data.

Data Characteristics – PLS-SEM

Model Characteristics – PLS-SEM

Algorithm Properties – PLS-SEM

Model Evaluation Issues – PLS-SEM

Rules of Thumb: PLS-SEM or CB-SEM?
• Use PLS-SEM when:
  • The goal is predicting key target constructs or identifying key “driver” constructs.
  • Formative constructs are included in the structural model (they are easy to use in PLS-SEM). Note that formative measures can also be used with CB-SEM, but doing so requires construct specification modifications (e.g., the construct must include both formative and reflective indicators to meet identification requirements).
  • The structural model is complex (many constructs and many indicators).
  • The sample size is small, and/or the data are non-normally distributed or exhibit heteroskedasticity.
  • The plan is to use latent variable scores in subsequent analyses.

Rules of Thumb: PLS-SEM or CB-SEM?
• Use CB-SEM when:
  • The goal is theory testing, theory confirmation, or the comparison of alternative theories.
  • Error terms require additional specification, such as covariation.
  • The structural model has non-recursive relationships.
  • The research requires a global goodness of fit criterion.

Systematic Process for applying PLS-SEM

Should You Use SEM?
• Journal reviewers rate SEM papers more favorably on key manuscript attributes . . .

                      Mean Score
Attribute             SEM    No SEM    p-value
Topic Relevance       4.2    3.8       .182
Research Methods      3.5    2.7       .006
Data Analysis         3.5    2.8       .025
Conceptualization     3.1    2.5       .018
Writing Quality       3.9    3.0       .006
Contribution          3.1    2.8       .328

Note: scores based on 5-point scale, with 5 = more favorable.
Source: Babin, Hair & Boles, Publishing Research in Marketing Journals Using Structural Equation Modeling, Journal of Marketing Theory and Practice, Vol. 16, No. 4, 2008, pp. 281-288.

PLS-SEM Stages 1, 2 & 3: Design Issues
• Scale Measures
  • Scale selection/design
  • Reflective vs. Formative
• Common Method Variance
  • Harman's Single Factor Test
  • Common Latent Factor
  • Marker Construct
• Missing data, outliers, etc.

Scale Design
• Revise/Update
  • Established scales – how old?
  • Double barreled; negatively worded
• Number of Scale Points
  • More scale points = greater variability
• Single Item Scales

Single Item Scales ?

Reflective (Scale) Versus Formative (Index) Operationalization of Constructs
A central research question in social science research, particularly marketing and MIS, focuses on the operationalization of complex constructs: Are indicators causing or being caused by the latent variable/construct measured by them?
• Reflective: changes in the latent variable directly cause changes in the assigned indicators.
• Formative: changes in one or more of the indicators cause changes in the latent variable.
Figure: two constructs, each with Indicator 1, Indicator 2 and Indicator 3

Example: Reflective vs. Formative World View
Construct: Drunkenness
Reflective indicators:
• Can’t walk a straight line
• Smells of alcohol
• Slurred speech

Example: Reflective vs. Formative World View
Construct: Drunkenness
Formative indicators:
• Consumption of beer
• Consumption of wine
• Consumption of hard liquor

Basic Difference Between Reflective and Formative Measurement Approaches
“Whereas reflective indicators are essentially interchangeable (and therefore the removal of an item does not change the essential nature of the underlying construct), with formative indicators ‘omitting an indicator is omitting a part of the construct’.” (Diamantopoulos/Winklhofer, 2001, p. 271)
• The formative measurement approach generally minimizes the overlap between complementary indicators within the construct domain.
• The reflective measurement approach focuses on maximizing the overlap between interchangeable indicators within the construct domain.
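
One rough way to read "overlap" in data terms is the correlation between indicators: interchangeable (reflective) indicators are expected to correlate highly, while complementary (formative) indicators need not correlate at all. The sketch below computes a Pearson correlation on made-up scores for the drunkenness example. Treating overlap as correlation is an assumption made here for illustration, and the respondent scores are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation between two equally long lists of scores."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical ratings of five people on two reflective indicators.
slurred_speech = [1, 2, 3, 4, 5]
smells_of_alcohol = [2, 2, 3, 5, 5]

# Hypothetical drinks consumed by five people (two formative indicators).
beer = [0, 3, 1, 4, 0]
wine = [2, 0, 3, 0, 1]

r_reflective = pearson(slurred_speech, smells_of_alcohol)
r_formative = pearson(beer, wine)
print(round(r_reflective, 2), round(r_formative, 2))
```

In this made-up data the reflective pair moves together, while the formative pair does not: someone who drinks beer need not also drink wine, yet either one contributes to the construct.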

Exercise: Satisfaction in Hotels as Formative and Reflective Operationalized Construct
Construct: Satisfaction with Hotels
Candidate indicators:
• The rooms' furnishings are good
• The hotel’s recreation offerings are good
• The rooms are clean
• Taking everything into account, I am satisfied with this hotel
• I appreciate this hotel
• The hotel's personnel are friendly
• The hotel is low-priced
• The rooms are quiet
• I am looking forward to staying overnight in this hotel
• I am comfortable with this hotel
• The hotel’s service is good
• The hotel’s cuisine is good

Formative Constructs – Two Types
• Composite (formative) constructs – indicators completely determine the “latent” construct. They share similarities because they define a composite variable but may or may not have conceptual unity. In assessing validity, indicators are not interchangeable and should not be eliminated, because removing an indicator will likely change the nature of the latent construct.
• Causal constructs – indicators have conceptual unity in that all variables should correspond to the definition of the concept. In assessing validity some of the indicators may be interchangeable, and also can be eliminated.
Source: Bollen, K.A. (2011), Evaluating Effect, Composite, and Causal Indicators in Structural Equations Models, MIS Quarterly, Vol. 35, No. 2, pp. 359-372.

PLS-SEM Example

Types of Measurement Models PLS-SEM Example

Indicators for SEM Model Constructs

Data Matrix for Indicator Variables

Getting Started with the SmartPLS Software
The next slide shows the graphical interface for the SmartPLS software, with the simple model already drawn. We describe in the following slides how to set up this model using the SmartPLS software program.
Before you draw your model, you need data that serve as the basis for running the model. The data we will use to run our example PLS model can be downloaded either as comma separated values (.csv) or text (.txt) data files at the following URL: http://www.smartpls.de/cr/. When you get to the website, scroll down to the Corporate Reputation Example, where it says “Click on the following links to download files.” SmartPLS can use both data file formats (i.e., .csv or .txt). Follow the onscreen instructions to save one of these two files on your hard drive: click on Save Target As… to save the data to a folder on your hard drive, and then Close.
Now go to the folder where you previously downloaded and saved the SmartPLS software on your computer. Click on the file that runs SmartPLS and then on the Run tab to start the software. You are now ready to create a new SmartPLS project.
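
Before assigning a data file to a project, it can help to confirm what the file contains. The sketch below, which is independent of SmartPLS, reads delimited text with a header row and reports the indicator names and the number of cases. The sample text, the indicator names and the comma delimiter are assumptions for illustration; the downloaded file may use different names or another separator.

```python
import csv
import io


def summarize_indicator_data(text, delimiter=","):
    """Return (indicator names, number of cases) for delimited text with a header row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    # Count non-empty data rows; each row is one case (respondent).
    n_cases = sum(1 for row in reader if row)
    return header, n_cases


# Hypothetical stand-in for the contents of a downloaded data file.
sample = "ind_1,ind_2,ind_3\n4,5,3\n2,2,1\n5,4,4\n"

names, n = summarize_indicator_data(sample)
print(names, n)

# For a real file (path is an assumption):
# with open("data.csv", newline="") as f:
#     names, n = summarize_indicator_data(f.read())
```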

SmartPLS Graphical Interface

Example with Names and Data Assigned

Brief Instructions: Using SmartPLS
• Load SmartPLS software – click on the file that runs SmartPLS.
• Create your new project – assign name and data.
• Double-click to get Menu Bar.
• Draw model – see options below:
  • Insertion mode
  • Selection mode
  • Connection mode
• Save model.
• Click on calculate icon and select PLS algorithm on the Pull-Down menu. Now accept the default options by clicking Finish.

To create a new project, click on File → New → Create New Project. The screen below will appear. Type a name in the window. Click Next.

You now need to assign a data file to the project, in our case, data.csv (or whatever name you gave to the data you downloaded). To do so, click on the dots tab (…) at the right side of the window, find and highlight your data folder, and click Open to select your data. Once you have specified the data file, click on Finish.

SmartPLS Software Options
Find your new project in the window, expand the list of projects to get project details (see below), and click on the .splsm file for your project.

Double click on your new model to get the menu bar to appear at the top of the screen.
Menu bar icons:
• Selection mode
• Draw constructs
• Draw structural paths

Initial Structural Model – No Indicator Variables

Structural Model with Names and Paths

Name Constructs, Align Indicators, Etc. . . .
• Start calculation
• Rename Construct
• Hide used indicators
• Show measurement model
• Change reflective to formative

How to Run SmartPLS Software

Default Settings for Example – Click Finish to run
• Trade-off in missing value treatment: case wise replacement (which drops cases with missing values) can greatly reduce the number of cases, whereas sample mean imputation reduces the variables’ variance.
• The preferred approach to dealing with missing data is a combination of sub-group and nearest-neighbor imputation, or EM imputation using SPSS.
• Always use the path weighting scheme.
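
The missing value trade-off can be seen on a tiny made-up data set. The sketch below assumes that "case wise replacement" means dropping every case that has at least one missing value, and that mean imputation fills each gap with the mean of the observed values in that column. The data and function names are hypothetical, and this is not how SmartPLS is implemented internally.

```python
from statistics import mean, variance


def casewise_deletion(rows):
    """Keep only cases (rows) that have no missing values."""
    return [row for row in rows if None not in row]


def mean_imputation(column):
    """Replace missing values in one column with the mean of its observed values."""
    observed = [v for v in column if v is not None]
    col_mean = mean(observed)
    return [col_mean if v is None else v for v in column]


# Hypothetical data: five cases, two indicators, None marks a missing value.
rows = [[4, 5], [None, 3], [2, None], [5, 4], [3, 2]]

complete_cases = casewise_deletion(rows)
print(len(rows), "cases reduced to", len(complete_cases))

# First indicator: compare variance before and after mean imputation.
first_indicator = [row[0] for row in rows]
observed_only = [v for v in first_indicator if v is not None]
imputed = mean_imputation(first_indicator)
print(round(variance(observed_only), 3), round(variance(imputed), 3))
```

Here two of the five cases are lost under casewise deletion, while mean imputation keeps all five cases but shrinks the sample variance of the first indicator because the filled-in value sits exactly at the mean.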
