


  1. AN EMPIRICAL EXAMINATION OF THE IMPACT OF DATA MODEL DENORMALIZATION ON THE MODEL VALIDATION RESULTS • A Dissertation Submitted to the Faculty in partial fulfillment of the requirements for the Doctoral degree in Information Systems

  2. The Problem • According to Shanks and Darke (1999), the first audience of a data model is the user community, which has to validate the model and explain why its entities interact in the way depicted. Another audience is the system developers, who use the data models to create detailed systems based on the models that the users have validated. • In general, as a model becomes more useful to the system designers, it also becomes more complex. This gives the business users more areas to validate, while the increased complexity makes it harder for them to understand the model and correctly identify its errors and design implications. Model validation is therefore a precarious effort, in which the readability of the model must be weighed against the benefits to the system designers (Shanks & Darke, 1999).

  3. Purpose of the Study • This study was designed to empirically test the concepts of abstraction proposed by Chan, Wei and Siau (1993), to see whether the denormalization of data, as done in a dimensional data model, actually increases the business validators' ability to correctly identify modeling errors, omissions and inconsistencies. • The purpose of the study was to add significant scientific base research to the field of data modeling by testing the underlying assumption that semantic technical value is traded away for readability when the abstraction level is increased through denormalization.

  4. Research Questions • While this study created a substantial amount of base research material that can be leveraged in future studies on this subject, it focused on three core questions: • Does denormalization of logical data models benefit the non-technical validator of a data model when validating attribute completeness? • Are relationship errors, as expressed in optionality and cardinality, easier to detect in denormalized models than in normalized logical models? • Are attribute duplication errors better detected in a denormalized data model?

  5. Importance of the Study • A substantial amount of research exists around the purposes, standards, semantic layer and usability of data models. However, there appears to be a significant gap in the empirical base research on which modeling constructs are better suited for communicating with people who have limited exposure to the technical development effort. In fact, as recent publications have demonstrated, there are strongly conflicting views on the core value of data models to their audience. Each of the current modeling techniques focuses on one part of the audience without any empirical support to substantiate its claims of superiority. • This study centered on empirically exploring the question, “can validators leverage their business knowledge in the validation effort and thereby correctly identify modeling constructs without having to see them expressed in a notational form?”, as suggested by the concept of cognitive inference found in the psychology literature.

  6. Important Assumptions • One core assumption of this research was the premise that there is value in having the business community involved in system development beyond the requirements-gathering effort, based on the notion that better involvement leads to better systems. However, some authors have argued that most research in the user-involvement field has been seriously flawed. An extensive review of research by Ives and Olson (1984) shows little support for user involvement being required for successful information system implementations. Ives and Olson argue that the economic benefits of user involvement are hard to determine, since intangible costs and benefits are hard to define and measure in economic equivalents, and since some decision support systems are used in unstructured environments where the associated value is impossible to determine. • This contradicts the view that direct influence over the process leads to better results, as proposed by Robey and Farrow (1982), and the later suggestion that increased responsibility of the participants increases the quality of the systems (Barki & Hartwick, 1994). These authors seem to support the notion that business involvement in the review phase is influential to the success of a system development effort, although the notion is only weakly supported in the empirical and academic literature. • As an extension of these observations, other research argues that the role given to users may be a stronger indicator of benefits than the time spent with them. This research suggests that participation and involvement are two different concepts and that involvement drives different behavior (Barki & Hartwick, 1994). It therefore suggests that the key to involvement in the validation process is the responsibility given to a participant.

  7. Research Design • The study was conducted in the form of a multiple-choice questionnaire. The subjects were provided a printed paper copy of either a dimensional or a relational model of the sales business event and were asked to validate the model by responding to the questionnaire. • The questionnaire was identical regardless of the model presented and focused on correctly identifying the attribute omissions, errors and duplications proposed in the research questions. The models used the same notation and were made using the same tool, but they followed the standards of their respective modeling techniques and therefore contained different numbers of entities and relationships. • Both models were otherwise equal in terms of the number of omissions, errors and duplications, as well as in print format and physical appearance.

  8. Research Design - Selection of Subjects • The subjects for this study were two hundred bachelor-level students engaged in non-information-technology studies. The student population of over 19,000 represents the user community that will soon be in the corporate and private business world and may be engaged in validating the data models prepared by information systems professionals. The selection was a convenience sample of voluntary participants at the University. • The subjects were randomly assigned to two equally sized groups that were given the same verbal orientation, at the same time, from a written script that was also provided to them in printed form. To ensure equal treatment of both groups and the internal validity of the experiment, no questions were allowed, nor was any dialogue conducted with the subjects beyond the standard instructions.

  9. Research Design - Selection of Subjects • The research groups consisted of 206 individuals. The participants were randomly assigned to validate either a dimensional data model (DDM group) or a relational model (REL group). Of the participants, 102 were assigned to the DDM group and 104 to the REL group. To ensure that no participant had the advantage of prior experience, the first two questions of the questionnaire were used to eliminate participants who were not members of the population, defined as people who have little or no knowledge of modeling semantics. • In the 102-member DDM group, 4 members answered that they had reviewed data models in the past and were very familiar with data modeling notations, 4 more answered that they had not reviewed models in the past but were very familiar with the notations, and 5 stated that they had reviewed models in the past but were not very familiar with the notations. All 13 of these members were considered not to be part of the population and were consequently removed from the analysis, leaving a sample size of 89.

  10. Research Design - Selection of Subjects • In the 104-member REL group, 3 members answered that they had reviewed data models in the past and were very familiar with data modeling notations, 6 more answered that they had not reviewed models in the past but were very familiar with the notations, and 10 stated that they had reviewed models in the past but were not very familiar with the notations. All 19 of these members were considered not to be part of the population and were consequently removed from the analysis, leaving a sample size of 85. • In total, 32 of the 206 participants were rejected (15.5%); 12.7% of the DDM group and 18.3% of the REL group were rejected.
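
The rejection figures above follow directly from the screening counts reported on these two slides; the following minimal Python sketch reproduces the arithmetic using only those reported numbers:

# A minimal sketch reproducing the screening arithmetic reported above.
ddm_total, rel_total = 102, 104
ddm_rejected = 4 + 4 + 5      # reviewed + familiar, familiar only, reviewed only
rel_rejected = 3 + 6 + 10

ddm_sample = ddm_total - ddm_rejected    # 89 remaining in the DDM group
rel_sample = rel_total - rel_rejected    # 85 remaining in the REL group

total_rejected = ddm_rejected + rel_rejected                            # 32
print(f"DDM rejection rate: {ddm_rejected / ddm_total:.1%}")            # ~12.7%
print(f"REL rejection rate: {rel_rejected / rel_total:.1%}")            # ~18.3%
print(f"Overall rejection rate: {total_rejected / (ddm_total + rel_total):.1%}")  # ~15.5%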

  11. Research Design - Instrumentation. • The primary instrumentation consisted of two data models that correctly depict the sales event across the dimensions, or entities, as expressed by the notational techniques of Chen (1977) for relational modeling and Kimball (1996) for dimensional modeling. Each of the models used in this study was manipulated to address the research questions of correctly identifying attribute omissions, duplications and relationships. The core instrumentation questions may be summarized as:

  12. Research Design - Instrumentation. • The dimensional model with identified errors:

  13. Research Design - Instrumentation. • The relational model with identified errors:

  14. Research Design - Instrumentation. • The Survey questionnaire with explanations:

  15. Research Design - Instrumentation. • The instruction sheet given to all participants:

  16. Data Collection Procedure

  17. Data Processing Parametric statistics such as the z-test, chi-square test and F statistics rest on some fundamental assumptions: a random sample, independent observations, a normal (or very nearly normal) distribution, homogeneous variance of the means, and interval- or ratio-scale data (Slavin, 1992). Since the results of this survey are based on a true/false questionnaire and the scores are expected to have a binomial distribution, the use of parametric statistics is not appropriate. Therefore the non-parametric chi-square statistic was used to examine whether there were significant differences in the responses of the DDM and the REL groups. The chi-square statistic was calculated as the sum over all cells of (observed - expected)² / expected, with degrees of freedom calculated as d.f. = (# cols - 1) * (# rows - 1).
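
To make the computation concrete, the following Python sketch implements the chi-square statistic and degrees-of-freedom formula above for a two-by-two table of correct and incorrect responses per group; the observed counts shown are taken from the group totals reported later in this presentation and may not match the exact score categories used in the dissertation:

# A minimal sketch of the chi-square computation described above.
def chi_square(observed):
    """observed: list of rows, each a list of category counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Rows: DDM group, REL group; columns: correct, incorrect responses.
observed = [[540, 172],
            [370, 310]]
statistic, dof = chi_square(observed)
print(f"chi-square = {statistic:.2f}, d.f. = {dof}")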

  18. Data Processing There is a significant school of thought around the reliability measure of the instrument. Kuder and Richardson devised a procedure for estimating the reliability of a test in 1937 (Mervis & Spagnolo, 1995). It has become one of the standards for estimating reliability for a single administration of a single form, such as this study. Kuder-Richardson measures inter-item consistency. It is very similar to performing a split-half reliability analysis (Guilford & Fruchter, 1978) on all of the combinations of items resulting from each possible splitting of the test. Since each item on the instrument presented has only one correct and one incorrect answer, the KR20 coefficient provided by Kuder and Richardson is algebraically equivalent to, and a special case of, Cronbach's alpha (Cronbach, 1970). The rationale for the calculation of KR20 is first to secure the mean inter-correlation of the number of items (k) in the test, and then to consider this the reliability coefficient for the typical item in the test (Mervis & Spagnolo, 1995). The formula used for KR20 was KR20 = (k / (k - 1)) * (1 - Σ(p_i * q_i) / s²), where p_i is the proportion of participants answering item i correctly, q_i = 1 - p_i, and s² is the variance of the total scores.
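
As an illustration, a minimal Python sketch of the KR20 calculation follows; the small item-response matrix is hypothetical and is not the study's data:

# A minimal sketch of the KR20 computation; 1 = correct answer, 0 = incorrect.
def kr20(responses):
    """responses: one list of 0/1 item scores per participant."""
    k = len(responses[0])                     # number of items
    n = len(responses)                        # number of participants
    totals = [sum(r) for r in responses]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n   # total-score variance

    pq_sum = 0.0
    for item in range(k):
        p = sum(r[item] for r in responses) / n   # proportion correct on item
        pq_sum += p * (1 - p)

    return (k / (k - 1)) * (1 - pq_sum / variance)

responses = [          # hypothetical 0/1 answers for six participants
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
print(f"KR20 = {kr20(responses):.4f}")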

  19. Data Processing The KR20 requires substantially more computation than the later KR21 formula. The KR21 requires only the test mean (M), the variance (s²) and the number of items on the test (k), and it assumes that all items are of approximately equal difficulty. With M the assessment mean, k the number of items in the assessment, and s² the variance, the formula is KR21 = (k / (k - 1)) * (1 - (M * (k - M)) / (k * s²)). While the KR21 formula simplifies the calculations, it usually yields a lower estimate of reliability (Mervis & Spagnolo, 1995). The differences between KR20 and KR21 are, however, smaller on questionnaires where all questions are of similar difficulty. Since both the REL group and the DDM group were given a questionnaire that spans attribute completeness, entity relationships in the form of cardinality and optionality, and attribute duplication, the difficulty level of the survey is not assumed to be homogeneous, and consequently KR20 was used to measure the reliability of the instrument for both groups.
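
For comparison, the KR21 shortcut can be expressed in a few lines of Python; the mean and variance below are hypothetical placeholders rather than the study's figures:

# A minimal sketch of the KR21 shortcut formula described above.
def kr21(k, mean, variance):
    """k: number of items, mean: test mean M, variance: test score variance s^2."""
    return (k / (k - 1)) * (1 - (mean * (k - mean)) / (k * variance))

print(f"KR21 = {kr21(k=8, mean=6.0, variance=3.0):.4f}")   # hypothetical inputs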

  20. Data Processing To review whether there was any correlation between the groups (that is, whether the groups performed increasingly well, or poorly, on certain types of questions), a regression analysis was performed on the mean scores of the groups relative to the eight questions. The correlation was measured as the adjusted r-squared of this regression. The adjusted r-squared, representing the correlation of the mean scores of each group on the eight questions, was expected to be low (Appendix 11). Since there is no indication that the effort, or magnitude of effort, required to respond correctly to the questions is substantially equal across the notational forms being validated, there is little rationale for expecting the performance to be strongly correlated as measured by the mean scores of the groups. Future in-depth research might establish a measure for the magnitude of effort involved in validating the models. This is beyond the scope of this research, where the focus is on determining whether a difference in notational form has an impact on the results as measured by attribute completeness, entity relationship optionality, entity relationship cardinality and attribute duplication.
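
A minimal Python sketch of this regression follows; the eight per-question mean scores for each group are hypothetical placeholders, not the study's observed values:

# A minimal sketch of the per-question regression and adjusted r-squared.
def adjusted_r_squared(x, y):
    """Simple linear regression of y on x with one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return 1 - (1 - r_squared) * (n - 1) / (n - 2)   # adjust for one predictor

ddm_means = [0.83, 0.71, 0.78, 0.69, 0.74, 0.80, 0.76, 0.83]   # hypothetical
rel_means = [0.55, 0.60, 0.48, 0.52, 0.58, 0.50, 0.61, 0.40]   # hypothetical
print(f"adjusted r-squared = {adjusted_r_squared(ddm_means, rel_means):.4f}")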

  21. Some High-level findings • The DDM group had a maximum possible total score of 712, of which the group scored 540 points. This constitutes an accuracy of 75.8% in the overall validation of the dimensional data model. The average score for each individual survey was 6.07, which means that each participant, on average, was able to answer a little over six of the eight questions correctly when validating the dimensional data model presented to them. • The REL group had a maximum possible total score of 680, of which the group scored 370 points. This constitutes an accuracy of only 54.41% in the overall validation of the relational data model. The average score for each individual survey was 4.35, which means that each participant, on average, was able to answer slightly over four of the eight questions correctly when validating the relational data model presented to them. • This finding was rather surprising. Overall, the REL group performed only slightly better than one would expect from random guessing on a true/false questionnaire. However, as presented later, there are indications that the REL notational form is actually misleading in areas where a single construct, such as the address construct, is modeled as multiple entities. Specifically, on question eight, which asked the participants to identify the presence of the state attribute, only 40.0% of the REL group answered correctly, while 83.2% of the DDM group did.
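
The accuracy and per-survey averages above follow from the raw totals; a quick arithmetic check in Python:

# A quick arithmetic check of the group accuracy figures reported above.
ddm_score, ddm_max, ddm_n = 540, 712, 89
rel_score, rel_max, rel_n = 370, 680, 85

print(f"DDM accuracy: {ddm_score / ddm_max:.1%}")          # ~75.8%
print(f"DDM mean per survey: {ddm_score / ddm_n:.2f}")     # ~6.07 of 8
print(f"REL accuracy: {rel_score / rel_max:.2%}")          # ~54.41%
print(f"REL mean per survey: {rel_score / rel_n:.2f}")     # ~4.35 of 8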

  22. Some High-level findings • The distribution of correct responses on all eight validation questions. Notice the skewness to the right in the DDM group and the more normal distribution of observations in the REL group.

  23. Some High-level findings • Since the critical value is lower than the calculated value for chi-square, we must reject the hypothesis that the modeling form (DDM/REL) is not significantly related to the validation performance.
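
The decision rule applied here, and again in the three research-question tests that follow, can be stated compactly; in this sketch the calculated value is a placeholder, and a 2x2 classification (d.f. = 1, alpha = 0.05) is assumed, which may differ from the score categories actually used in the dissertation:

# A minimal sketch of the chi-square decision rule described above.
CRITICAL_CHI2 = 3.841          # standard table value for d.f. = 1, alpha = 0.05

calculated_chi2 = 25.0         # placeholder, not the study's calculated value
if calculated_chi2 > CRITICAL_CHI2:
    print("Reject the null hypothesis: modeling form and validation "
          "performance are not independent.")
else:
    print("Fail to reject the null hypothesis.")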

  24. Research Question One: Attribute Completeness Findings • To analyze research question 1, “Does denormalization of logical data models benefit the non-technical validator of a data model when validating attribute completeness?”, a chi-square test of the categories of the scores from the REL and the DDM groups on questions seven and eight was performed. Since the critical value is lower than the calculated value for chi-square, we must reject the hypothesis that the modeling form is not significantly related to attribute completeness validation performance.

  25. Research Question Two: Relationship Validation Findings • To analyze research question 2, “Are relationship errors, as expressed in optionality and cardinality, easier to detect in models that are denormalized as compared with normalized logical models?”, another chi-square test of the categories of the scores from the REL and the DDM groups on questions three, four, nine and ten was performed. Since the critical value is lower than the calculated value for chi-square, we must reject the hypothesis that the modeling form (DDM/REL) is not significantly related to relationship validation performance.

  26. Research Question Three: Attribute Duplication Validation Findings • To analyze research question 3, “Are attribute duplication errors better detected in a denormalized data model?”, another chi-square test of the categories of the scores from the REL and the DDM groups on questions five and six was performed. Since the critical value is lower than the calculated value for chi-square, we must reject the hypothesis that the modeling form (DDM/REL) is not significantly related to attribute duplication validation performance.

  27. Internal reliability • The KR20 score is sometimes referred to as the “reliability coefficient” of the instrument and is meant to measure how well the instrument correlates internally with the scores it produces. In general, the higher the score, the better the internal reliability. • A score of 0.6152 or 0.6288 must be considered moderate. However, according to Mervis and Spagnolo (1995, p. 5), “A high reliability coefficient is no guarantee that the assessment is well-suited to the outcome. It does tell you if the items in the assessment are strongly or weakly related with regard to student performance. If all the items are variations of the same skill or knowledge base, the reliability estimate for internal consistency should be high. If multiple outcomes are measured in one assessment, the reliability estimate may be lower”. • Since the instrument measures three distinct areas, where ability in one area does not necessarily translate to equal performance in the others, the domains of knowledge or skills assessed are more diverse, and a student who knows the content of one outcome may not be as proficient relative to another outcome. As a result, a moderate KR20 score might be expected. A higher score might be obtained by narrowing the research to a single research question and creating an instrument targeting only that area.

  28. Correlation • After an exhaustive statistical analysis of the 174 questionnaires from bachelor-level students, the non-parametric statistics were compiled, a regression analysis of the overall performance was conducted, and the statistical findings were reviewed to determine what answers to the research questions could be stated, or inferred, from the results. To review whether there was any correlation between the groups (that is, whether the groups performed increasingly well, or poorly, on certain types of questions), a regression analysis was performed on the mean scores of the groups relative to the eight questions. The correlation, measured as the adjusted r-square, was found to be slightly negative, with a magnitude of -0.1988. An analysis of variance (ANOVA) was also performed, with a calculated F of 0.0049 and a critical F of 0.9470 at a 95% confidence level, with 1 degree of freedom between the groups and 14 degrees of freedom within the groups. The variance was therefore found to be significantly different with respect to the performance of the groups on the questions.

  29. Conclusions • A major challenge to the overall usefulness of modeling forms has been the lack of empirical research to support the fundamental assumptions being made. While some notational forms attempt to be as expressive as possible, others rely on the base knowledge of the audience. • The establishment of the psychology school of thought around cognitive inference has demonstrated that not every item needs to be explicitly expressed in order for the uninitiated participant to understand, nor does every instance that can occur have to be expressed. • As an example, once the base cognitive skill has been established, a child understands that a hand placed under a table does not cease to exist. Cognitive inference allows the child to expand this knowledge to understand that the hand still exists when placed under a brown table, a red table, a circular table or a rectangular table. The child may never have seen a hand being placed under grandma’s pink triangular table, but cognitive inference allows the child to extend its knowledge without being exposed to all possible instances of the event. Unfortunately, this skill is often ignored in the modeling field, where every instance is often required to be expressed, at the cost of the readability of the model for the audience.

  30. Conclusions • In general, as a model becomes more useful to the system designers, it also becomes more complex. This makes it harder for the users to understand the model and correctly identify its errors and design implications. Model validation is therefore a precarious effort, in which the readability of the model must be weighed against the benefits to the system designers. The research presented in this paper is a first step toward re-examining the recent trend of increased expressiveness in modeling forms. • The core issue of normalization is the dual audience the models have. While normalization is appropriate for the technical system developers, there is increased support in the field for a more denormalized approach to modeling when interacting with the business community. This implies that the semantic richness of a more normalized model has to be suppressed to the metadata layer in the form of business rules. • This does not mean that normalization is inappropriate for the technical audience, who have to accurately build systems with the correct data relationships and therefore need to understand those relationships. However, it does imply that normalized modeling formats and conventions are not appropriate when interacting with the business community.

  31. Conclusions • The first research question was stated as “Does denormalization of logical data models benefit the non-technical validator of a data model when validating attribute completeness?” The research provided clear indications that the validators of the denormalized model benefited from the denormalization, as they performed 65% better than their counterparts who validated the more normalized model. • The second research question was “Are relationship errors, as expressed in optionality and cardinality, easier to detect in models that are denormalized as compared with normalized logical models?” Again, the results of this experiment provide strong indications that this is true: the group validating the denormalized model performed 31% better than the group asked to validate the more normalized model.

  32. Conclusions • The final research question concerned validation performance regarding attribute duplication and was stated as “Are attribute duplication errors better detected in a denormalized data model?” This also appears to be true, as the group validating the denormalized model performed 35% better than the group asked to validate the more normalized model. • The overall result across the three research questions was that errors in attribute completeness, attribute duplication, relationship optionality and relationship cardinality were detected 39.4% better by the group asked to validate the denormalized model. • As a result, the overall finding of this study is the empirical establishment of a strong positive impact of model denormalization on the validation performance of non-technical model reviewers.
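
The overall 39.4% figure is consistent with the group accuracies reported earlier; a quick arithmetic check in Python (the dissertation's own rounding may differ slightly):

# A quick check of the overall relative-improvement figure.
ddm_accuracy = 540 / 712            # ~75.8% for the denormalized (DDM) group
rel_accuracy = 370 / 680            # ~54.4% for the normalized (REL) group
improvement = (ddm_accuracy - rel_accuracy) / rel_accuracy
print(f"overall relative improvement: {improvement:.1%}")   # ~39.4%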

  33. Conclusions • A likely explanation for this difference is the volume of constructs in the normalized model relative to the denormalized one. As an example, the denormalized model contained only five entities and four relationships, while the normalized model had twelve entities and thirteen relationships. • This proliferation of entities and relationships is a direct result of the normalization effort as proposed by Codd (1972 & 1974), Fagin (1977), Chen (1977), Ullman (1988) and Dinech (1997). This is not to say that normalization is a general cause of system development problems. Normalization is in fact an essential step for the technical system developer and a very important tool for assuring the accuracy of the system being built. • However, the research presented in this paper indicates that the expressiveness of the normalized model may be at such a level that the non-technical validators are unable to perform the validation effort independently with any relevant degree of accuracy.

  34. An Excerpt of the Recommendations • In the area of model validation, more research is needed. The implication of the findings presented here is that the driving forces and the magnitude of cognitive inference need to be established. While the findings in this paper provide evidence of a link between validation performance and normalization, there are significant gaps in how to measure the magnitude of that relationship at various levels of normalization, as well as gaps in the empirical evidence supporting the rationale for how cognitive inference works with visual models. • The first gap must be addressed by examining the steps associated with the validation effort as it relates to the various levels of normalization. The validation steps, once established, could be correlated with the overall performance of a validation group, and a measure of the magnitude relative to the objects and notational form could be established. This magnitude could then be used to score a model based on its level of complexity and the difficulty of validating it. • This “readability statistic” could also be extended to other notational forms and modeling conventions used in object, function and process models, and could be used to score new modeling conventions and any proposed modeling form. In this way, the field of modeling may start to mature and base new developments on the fundamental science behind modeling, instead of opinions.

  35. Questions and Answers
