ABSTRACT:
Real-world problem solving can involve complex dimensions, especially when tackling broad business problems or organizational challenges. To do justice to the complexity of the issues involved, advanced modeling techniques need to be used. The use of complex modeling techniques brings with it many associated issues. Chief among these are the effectiveness of the model and the modeler, as well as an organization’s control over the use of the model to solve the problem at hand. These issues affect an organization’s ability to learn the advanced modeling procedures and to apply them consistently and correctly to develop implementable and highly effective solutions to problems and challenges. This paper suggests a structured framework to assess the quality of a knowledge management model and provides an illustration using a Partial Least Squares (PLS) Regression model used in a parallel research effort focused on knowledge management (Srinivasan and Horowitz, 2004). This paper also discusses the efficacy of hypotheses-based model construction, the trade-offs modelers need to make, and the importance of the skills of the modelers, based on observations from a simulated experiment.
Introduction
Given that most modern-day companies are struggling with hard-to-quantify problems like the ones described above, corporations must rely heavily on their modelers’ skills. Therefore, it is vital to have a structured framework to assess modeling quality in order to place trust in a business modeler’s work as a basis for investing large sums of money. To address the need for model validation in knowledge management problems, this paper proposes and demonstrates the use of simulation as a powerful tool for assessing the quality of an analytical knowledge management model. In particular, the paper demonstrates the use of simulation in assessing the quality of a model constructed using a technique called Root Causes Analysis (RCA) Modeling, generically known as Structural Equation Modeling (Barclay et al., 1995). The RCA modeling technique refers specifically to the methodology applied to a broader set of business challenges solved as part of a larger research effort (Srinivasan and Horowitz, 2004), where root causes were hypothesized for creating potential management responses to key business challenges, and the relationships between the root causes and the challenge being addressed were modeled using a structural equation model.
This paper draws upon the results of a controlled simulation experiment that was performed to evaluate an RCA model constructed in the larger research effort (Srinivasan and Horowitz, 2004). In particular, different types of sensitivity analyses were conducted to demonstrate the robustness of the model.
The first section of the paper provides a high level description of an RCA model and a brief explanation of the Partial Least Squares (PLS) Regression technique that is used to analyze the model. In the second section, the underlying RCA model that is evaluated using a simulation experiment is described briefly. This is followed by the model quality assessment methodology that involves four different sensitivity analyses that are described in turn in the subsequent sections. Next, a discussion on the efficacy of hypotheses-based model construction and the importance of the skills of the modelers is provided. Finally, the key conclusions from this paper are synthesized.
RCA Model And The PLS Technique
RCA models for knowledge management are used to identify the potential relationship between management actions and desired effects. RCA models are similar to conceptual models (Barclay et al., 1995) that consist of:
§ Problem-specific hypotheses or CONSTRUCTS (also known as entities, that are items of direct interest but not directly measurable within the organization)
§ MEASURES that are manifestations of the hypothesis constructs (also known as attributes that can be more easily observed in the participating organization). These measures are related to the constructs either in a formative sense (measures define the construct) or in a reflective sense (construct defines the measures)
§ CAUSAL PATHS linking the constructs (also known as relationships or influence paths).
Figure 1 is a simple illustration of an RCA model that identifies two root causes for the “Employee Productivity” problem.

Figure 1 Simple RCA Model
In this illustration, we see critical root causes listed in oval blocks with the connecting arrowheads (causal paths) pointing in the direction of causality. The measures for these root causes are shown within rectangles, adjacent to each root cause.
PLS, a structural causal modeling technique, is used to rank order, by importance, the root causes in an RCA model. The PLS methodology is used to build a linear model, Y = f(X) + Error Term, where, in this application, Y = Business Challenge and X = Root Causes. Further, each of the root causes is itself linearly regressed on its measures, to arrive at strong measurement equations for the root causes. Barclay et al. (1995), Fornell and Larcker (1981), Hulland (1999) and Bontis and Fitz-enz (2002) detail the analytics behind this statistical technique. The data collected in order to carry out an analysis is a set of employee responses to questions regarding root cause candidates. The data required to populate the RCA model was captured using a survey that contains Likert-type scale questions (1 = Strongly disagree to 4 = Strongly agree).
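As a hedged sketch, the structural and measurement relationships described above can be written in generic notation. The symbols (the coefficients \beta_j, weights w_{jk}, loadings \lambda_{jk}, and the error terms) are illustrative assumptions introduced here, not the exact formulation of the cited references.

```latex
% Structural model: business challenge Y regressed on J root causes X_j
Y = \beta_0 + \sum_{j=1}^{J} \beta_j X_j + \varepsilon

% Formative measurement: the measures m_{jk} define root cause X_j
X_j = \sum_{k=1}^{K_j} w_{jk}\, m_{jk} + \delta_j

% Reflective measurement: root cause X_j defines each of its measures m_{jk}
m_{jk} = \lambda_{jk} X_j + \epsilon_{jk}
```

Ranking the root causes by importance then amounts to comparing how much each X_j contributes to explaining Y.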
RCA Model Used For Simulation Experiment
As a first step, a base RCA model was identified for use in the simulation study. This RCA model was created for a large technology professional service organization whose management team defined a key business challenge of increasing the amount of innovation in the technical solutions developed for clients. Figure 2 illustrates the RCA model that is used in this study. As shown in the model, 12 constructs (Srinivasan & Horowitz, 2004) were used to explain the variance in the final dependent construct, “Innovation Throughput”, which will be referred to as MAIN from now on.

Figure 2 RCA Model
Model Quality Assessment Methods
Three research questions arise in the context of assessing the quality of our RCA model. They are as follows:
1. How does a management team know that the model is good?
2. How will modeling quality vary with different model formulations?
3. How will modeling quality vary with different levels of response quality?
The answer to the first question is derived by looking at the final R2 value of the dependent construct (R2 is the coefficient of determination, which reflects the ‘goodness of fit’, or explanatory power, of the RCA model). A higher R2 indicates that more of the significant explanatory variables have been accounted for in the model, and hence a better model. R2 alone (which is a PLS output) is therefore a reasonable metric for assessing model quality, and no further analysis is necessary.
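For reference, the coefficient of determination can be sketched in a few lines. This is a minimal illustration of the standard definition, R2 = 1 - SS_res / SS_tot, and not the LVPLS implementation; the function name and the sample scores are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    # Total variation of the observed values around their mean
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Variation left unexplained by the model's predictions
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values have no variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical construct scores, for illustration only
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(round(r_squared(observed, predicted), 3))
```

A value close to 1 means the predictions account for almost all of the variation in the dependent construct.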
The third question is very difficult to answer without conducting controlled experiments multiple times with actual responders. Such an experiment would require responders to take multiple surveys (ranging from very accurate to totally random responses). Further analysis was not conducted, given the difficulty of having the respondents (who are employees of a corporate organization, in our case) answer multiple surveys over a period of time.
Finally, moving on to the second research question, answers are derived by studying variations in R2. Specifically, the following sensitivity analyses were performed:
§ Sensitivity of overall R2 to the exclusion of specific constructs that take away from structural model completeness
§ Sensitivity of overall R2 to the exclusion of specific measures / questions that take away from the completeness of the measurement model
§ Sensitivity of R2 to the structural model formulation as a method for measuring the efficacy of the structural model
§ Sensitivity of R2 to the measurement model specification in order to measure the efficacy of the specification
Sensitivity Of Overall R2 To The Exclusion Of Constructs
In this sensitivity analysis, the variation of R2 with the number of constructs in the model was studied. A priori, for each construct, an assumption was made about the percentage of respondents who agree that the given construct directly influences MAIN. Table 1 summarizes these a priori assumptions and the abbreviations that will be used henceforth for all the constructs, along with whether their measures are reflective or formative. Among those who agree, a percentage of respondents who Strongly Agree (SA) was also decided. The SA percentages are given in brackets. The establishment of the various percentages served as a case study for determining the relative importance of root causes.
| Name of Construct | Abbreviation | % who agree (% who SA) |
|---|---|---|
| 1. Customer emphasis (Reflective) | CEMP | 20% (10% SA) |
| 2. Managerial enthusiasm and participation (Reflective) | MEMP | 55% (30% SA) |
| 3. Innovation skillsets (Formative) | ISK | 40% (20% SA) |
| 4. Innovation enablers (Formative) | IEN | 45% (25% SA) |
| 5. Focused innovation (Formative) | IFO | 85% (45% SA) |
| 6. Knowledge creation for reuse (Formative) | KCR | 90% (45% SA) |
| 7. Knowledge capture (Formative) | KCAP | 65% (35% SA) |
| 8. Knowledge sharing (Formative) | KSH | 80% (40% SA) |
| 9. Resource Allocation (Formative) | RALL | 30% (15% SA) |
| 10. Team/ individual incentives (Reflective) | TEAM | 70% (35% SA) |
| 11. Organizational incentives (Reflective) | ORG | 10% (4% SA) |
| 12. External Forces (Reflective) | EXT | 5% (2% SA) |
| 13. Innovation Throughput, the final dependent construct (Reflective) | MAIN | 75% (40% SA) |
Table 1 Our a priori assumptions for ABC Company
The input data set for the Latent Variable Partial Least Squares (LVPLS) software was created according to these assumptions. The overall R2 value for MAIN was found to be 0.85. Table 2 shows the R2 values for all the constructs.
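The way such an input data set might be generated from the Table 1 percentages can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: it assumes that the SA share is part of the agree share, that disagreeing respondents split evenly between answers 1 and 2, and that each construct is represented by a single question. It also draws each construct independently, so it does not reproduce the correlations between constructs that a real input data set would need. The names ASSUMPTIONS, likert_probabilities and simulate_responses are hypothetical.

```python
import random

# (agree share, strongly-agree share) per construct, copied from Table 1
ASSUMPTIONS = {
    "CEMP": (0.20, 0.10), "MEMP": (0.55, 0.30), "ISK": (0.40, 0.20),
    "IEN": (0.45, 0.25), "IFO": (0.85, 0.45), "KCR": (0.90, 0.45),
    "KCAP": (0.65, 0.35), "KSH": (0.80, 0.40), "RALL": (0.30, 0.15),
    "TEAM": (0.70, 0.35), "ORG": (0.10, 0.04), "EXT": (0.05, 0.02),
    "MAIN": (0.75, 0.40),
}


def likert_probabilities(agree, strongly_agree):
    """Probabilities of answers 1..4 on the 4-point scale.

    Assumption: the disagree share is split evenly between 1 and 2.
    """
    disagree = 1.0 - agree
    return [disagree / 2, disagree / 2, agree - strongly_agree, strongly_agree]


def simulate_responses(n_respondents, seed=0):
    """Draw one independent Likert answer per respondent for each construct."""
    rng = random.Random(seed)
    data = {}
    for name, (agree, strongly_agree) in ASSUMPTIONS.items():
        weights = likert_probabilities(agree, strongly_agree)
        data[name] = rng.choices([1, 2, 3, 4], weights=weights, k=n_respondents)
    return data


responses = simulate_responses(1000)
share_agree = sum(1 for r in responses["KCR"] if r >= 3) / 1000
print(f"KCR simulated agree share: {share_agree:.2f}")
```

With a large enough number of simulated respondents, the share answering 3 or 4 for each construct should approach the agree percentage assumed in Table 1.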
| R2 Values | MAIN | CEMP | MEMP | ISK | IEN | IFO | KCR | KCAP | KSH | RALL | EXT | ORG | TEAM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MAIN | 0 | 0 | 0.02 | 0 | 0 | 0 | 0.01 | 0 | 0.41 | 0 | 0 | 0.00 | 0.41 |
| CEMP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| MEMP | 0 | 0.11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.05 | 0 |
| ISK | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.59 | 0 | 0 | 0.11 |
| IEN | 0 | 0 | 0.60 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| IFO | 0 | 0 | 0.23 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| KCR | 0 | 0 | 0 | 0.10 | 0.08 | 0.40 | 0 | 0 | 0 | 0 | 0 | 0 | 0.09 |
| KCAP | 0 | 0 | 0 | 0 | 0.16 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0.55 |
| KSH | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.10 | 0 | 0 | 0 | 0 | 0.41 |
| RALL | 0 | 0 | 0.32 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| EXT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ORG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.43 | 0 | 0 |
| TEAM | 0 | 0 | 0.48 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Table 2 R2 contributions of all constructs
Using these R2 contributions, the direct and indirect (hence total) effects of each construct on MAIN were computed. Figure 3 illustrates how to compute the direct and indirect effects of a construct. The percentages shown are the R2 contributions.

Using the method illustrated above, the total effects of every construct were computed; they are shown as a pie chart in Figure 4.

Figure 4 Total effects distribution
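One possible way to compute direct and total effects from the Table 2 contributions is sketched below. It rests on an assumption that is not confirmed by the text: that the effect along an indirect path is the product of the R2 contributions on that path, and that the total effect is the sum over all paths ending at MAIN. The dictionary CONTRIB holds the nonzero entries of Table 2 (row = dependent construct, column = contributing construct); the function names are hypothetical.

```python
# CONTRIB[target][source] = R2 contribution of source to target (Table 2)
CONTRIB = {
    "MAIN": {"MEMP": 0.02, "KCR": 0.01, "KSH": 0.41, "TEAM": 0.41},
    "MEMP": {"CEMP": 0.11, "ORG": 0.05},
    "ISK": {"RALL": 0.59, "TEAM": 0.11},
    "IEN": {"MEMP": 0.60},
    "IFO": {"MEMP": 0.23},
    "KCR": {"ISK": 0.10, "IEN": 0.08, "IFO": 0.40, "TEAM": 0.09},
    "KCAP": {"IEN": 0.16, "KCR": 0.01, "TEAM": 0.55},
    "KSH": {"KCAP": 0.10, "TEAM": 0.41},
    "RALL": {"MEMP": 0.32},
    "ORG": {"EXT": 0.43},
    "TEAM": {"MEMP": 0.48},
}


def direct_effect(source, target="MAIN"):
    """Contribution on the single link source -> target (0 if there is none)."""
    return CONTRIB.get(target, {}).get(source, 0.0)


def total_effect(source, target="MAIN"):
    """Sum over all paths source -> ... -> target of the product of contributions.

    Assumption: effects multiply along a path. The recursion terminates
    because the causal paths in Table 2 contain no cycles.
    """
    total = 0.0
    for predictor, weight in CONTRIB.get(target, {}).items():
        if predictor == source:
            total += weight  # the direct link itself
        # paths that reach the target through this predictor
        total += weight * total_effect(source, predictor)
    return total


for name in ("TEAM", "MEMP", "KSH", "EXT"):
    print(name, round(direct_effect(name), 3), round(total_effect(name), 3))
```

Under this assumption, a construct with no direct link to MAIN, such as EXT, can still have a small nonzero total effect through the constructs it influences.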
It is clear from Figure 4 that the major contributions to R2 come from three constructs: TEAM, MEMP, and KSH. The remaining contributions are only minor. Omitting even one of these major constructs would reduce the value of R2 drastically, as observed in Figure 5, where the R2 values for MAIN are derived by adding one construct at a time in two different sequences (until all 12 constructs are included):
1. Minimum Total effect constructs first until all constructs are accounted for
2. Maximum Total effect constructs first until all constructs are accounted for

Figure 5 Variation of R2 values to # of constructs
For a given number of constructs, the minimum and maximum R2 values obtainable can be read from the graph. The graph also shows that steep increases in R2 are seen only when the most significant constructs are added. Therefore, if the most significant constructs are included, one can get a very high R2 (81%) using only three constructs in the model. On the other hand, if the three most significant constructs are not included, one will end up with only 65% R2 despite using as many as 9 constructs. However, since the most significant constructs cannot be identified a priori, the only way to benefit is to include more constructs in the model, which gives a better chance of including the most significant ones. Thus, our theoretical approach of including 12 conceptually hypothesized constructs in the model gave us a better chance of including the most significant ones.
Sensitivity Of Overall R2 To The Exclusion Of Measures/ Questions
In this analysis, the variation of R2 with the exclusion of both measures and questions was studied. The most important measures were sequentially removed from constructs in descending order of construct significance. For measures with more than one question, the variation of R2 with a reduction in the number of questions was studied (questions were taken out from the most significant measures only). This analysis allows the efficacy of the measurement model to be studied and gives a sense of the trade-off between asking more questions and potentially losing respondents due to the size of the questionnaire. Figure 6 shows the number of measures associated with every construct and the corresponding number of questions for each measure.

Figure 6 Chart showing number of measures and questions for each construct
Every construct had a minimum of two measures and a maximum of three measures. The number of questions associated with every measure ranged from two to three, thereby making a total of four to six questions per construct.
The most important measures for each construct were identified using the measurement model weights obtained from one of the LVPLS software outputs. Figure 7 depicts these measurement model weights, which help identify the most important measures for each construct.

Figure 7 Chart showing measurement model weights
Based on the data from Figure 7, measures could be eliminated one by one from the constructs in two different sequences: most significant to the least significant and vice-versa. Figure 8 plots the variation of R2 vs number of questions asked, for both of these sequences.


Figure 8 Variations of R2 value with number of questions
One may ask as few as 34 questions or as many as 56 questions and achieve an R2 between 60% and 64% (using a sequence based on first removing the most important measures from the most significant constructs). However, if one asks the ‘right’ questions (corresponding to the most important measures in the significant constructs), one can jump from 64% to 84% R2 (using a sequence based on first removing the most important measures in the most significant constructs) or even get 80% R2 with as few as 42 questions (using a sequence based on first removing the most important measures in the least significant constructs). However, since it is not possible to identify the most significant construct or the most important measure a priori, the only way to benefit is to include ‘logical’ constructs and measures (questions) in the model, which provides a better chance of including the most significant and important ones. Hence, an ‘intuitive’ approach to defining the measures, agreed to by the organization for whom the analysis is being conducted, was adopted. This approach is relied upon to provide a substantial likelihood of including the most important ones.
In connection with the study of the variation of R2 with a reduction in the number of questions, a discussion is provided on why only two to three questions were used to capture a measure and why each construct had only two to three measures. The essence of most measures was captured using one question for each of the social and knowledge characters[1], making a total of two questions for these measures. In some cases, measures were independent of the characters, but required other unique traits to be brought out through their own customized questions. With three such questions, the essence of these measures could be captured. Going beyond three questions might have led to a marginal increase in R2, but would have resulted in a number of questions that might reduce the survey response rate (and hence impact statistical validity). On the other hand, a minimum of two questions per measure was used to maximize R2, as demonstrated in Figure 9.

Figure 9 Chart showing R