Obtaining the same ANOVA results in R as in SPSS - the difficulties with Type II and Type III sums of squares
I calculated the ANOVA results for my recent experiment with R. In brief, I assumed that women perform worse than men in a simulation game (microworld) when under stereotype threat. My students who assisted in the experiments used SPSS for their calculations. I realized that they obtained different results than I did, with the same model on the same data set. As I was new to R, my initial calculation, an analysis of covariance (ANCOVA) with the dependent variable microworld performance (MWP), the treatment factors gender and stereotype threat, and the covariate reasoning ability, looked like this:
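As a minimal sketch of such a call (not the data or output from the experiment): the data frame `d` below is simulated, with deliberately unequal cell sizes, and only the variable names MWP, GENDER, STTHREAT and reasonz are borrowed from the SPSS syntax. The effect sizes in the simulation are arbitrary assumptions.

```r
# Simulated stand-in data; cell sizes and effects are assumptions for illustration.
set.seed(42)
cells <- c(18, 12, 10, 20)  # deliberately unequal cell sizes
d <- data.frame(
  GENDER   = factor(rep(c("f", "f", "m", "m"), times = cells)),
  STTHREAT = factor(rep(c("no", "yes", "no", "yes"), times = cells))
)
d$reasonz <- rnorm(nrow(d))
d$MWP <- 0.5 * d$reasonz + 0.4 * (d$GENDER == "m") -
  0.8 * (d$GENDER == "f" & d$STTHREAT == "yes") + rnorm(nrow(d))

# ANCOVA with covariate, both factors and their interaction.
# summary() on an aov fit reports sequential (Type I) sums of squares.
fit <- aov(MWP ~ reasonz + GENDER * STTHREAT, data = d)
summary(fit)
```

Each row of this table is adjusted only for the terms listed above it, which is why the order of the terms in the formula matters.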
I see two significant main effects of the treatment factors, a significant effect of the covariate, and a significant interaction effect. However, Quick-R tells me this:
"WARNING: R provides Type I sequential SS, not the default Type III marginal SS reported by SAS and SPSS. In a nonorthogonal design with more than one term on the right hand side of the equation order will matter (i.e., A+B and B+A will produce different results)! We will need use the drop1( ) function to produce the familiar Type III results."
I do not want order to matter and adjust my calculation accordingly:
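A rough sketch of this step on simulated data (the data frame `d` and its effect sizes are assumptions, not the experiment's data). It fits the same model with the terms in two different orders, compares the sequential sums of squares, and then calls drop1() with the scope `~ .` so that every term is dropped in turn. Without that scope, drop1() respects marginality and drops only the interaction.

```r
# Simulated stand-in data; cell sizes and effects are assumptions for illustration.
set.seed(42)
cells <- c(18, 12, 10, 20)
d <- data.frame(
  GENDER   = factor(rep(c("f", "f", "m", "m"), times = cells)),
  STTHREAT = factor(rep(c("no", "yes", "no", "yes"), times = cells))
)
d$reasonz <- rnorm(nrow(d))
d$MWP <- 0.5 * d$reasonz + 0.4 * (d$GENDER == "m") -
  0.8 * (d$GENDER == "f" & d$STTHREAT == "yes") + rnorm(nrow(d))

# Same model, terms entered in a different order.
fit_a <- aov(MWP ~ reasonz + GENDER + STTHREAT + GENDER:STTHREAT, data = d)
fit_b <- aov(MWP ~ STTHREAT + GENDER + reasonz + GENDER:STTHREAT, data = d)

main <- c("reasonz", "GENDER", "STTHREAT")

# Sequential (Type I) sums of squares depend on the order of entry.
type1_a <- anova(fit_a)[main, "Sum Sq"]
type1_b <- anova(fit_b)[main, "Sum Sq"]
rbind(type1_a, type1_b)

# drop1() with scope "~ ." removes each term from the full model in turn,
# so the result does not depend on the order of entry.
drop_a <- drop1(fit_a, ~ ., test = "F")
drop_b <- drop1(fit_b, ~ ., test = "F")
drop_a
```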
What a difference: The main effect of the participants' gender on their microworld performance does not reach statistical significance. However, that is still not what SPSS produces:
UNIANOVA MWP BY GENDER STTHREAT WITH reasonz
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/CRITERIA=ALPHA(0.05)
/DESIGN=reasonz GENDER STTHREAT GENDER*STTHREAT.
In SPSS, the main effect of gender is still significant. I dug a little deeper and found another line I needed to add to the R command in order to get exactly the same result:
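A sketch of what that extra line changes, again on simulated data (the data frame `d` is an assumption and no SPSS output is reproduced here). The same drop1() call is run once on a fit with treatment contrasts and once after setting options(contrasts=c("contr.sum", "contr.poly")).

```r
# Simulated stand-in data; cell sizes and effects are assumptions for illustration.
set.seed(42)
cells <- c(18, 12, 10, 20)
d <- data.frame(
  GENDER   = factor(rep(c("f", "f", "m", "m"), times = cells)),
  STTHREAT = factor(rep(c("no", "yes", "no", "yes"), times = cells))
)
d$reasonz <- rnorm(nrow(d))
d$MWP <- 0.5 * d$reasonz + 0.4 * (d$GENDER == "m") -
  0.8 * (d$GENDER == "f" & d$STTHREAT == "yes") + rnorm(nrow(d))

# Fit with treatment (dummy) contrasts, stated explicitly here.
fit_trt <- aov(MWP ~ reasonz + GENDER * STTHREAT, data = d,
               contrasts = list(GENDER = "contr.treatment",
                                STTHREAT = "contr.treatment"))
naive <- drop1(fit_trt, ~ ., test = "F")

# Switch unordered factors to zero-sum contrasts, refit, then restore.
old <- options(contrasts = c("contr.sum", "contr.poly"))
fit_sum <- aov(MWP ~ reasonz + GENDER * STTHREAT, data = d)
type3 <- drop1(fit_sum, ~ ., test = "F")
options(old)

# The covariate and interaction rows agree; the main-effect rows do not.
cbind(treatment = naive[-1, "Sum of Sq"], sum = type3[-1, "Sum of Sq"])
```

With treatment contrasts, dropping GENDER while keeping the interaction tests the gender difference at the reference level of STTHREAT only. With zero-sum contrasts it tests the gender difference averaged over both stereotype threat conditions, which is the Type III main effect.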
As you can see, these results are identical. But why all these differences? What does options(contrasts=c("contr.sum", "contr.poly")) actually do and what the heck are Type-III sums of squares? I surely did not learn about these things at my university. I thus did a little reading.
It turns out that the decision about which type of sums of squares to use is based on the question whether it is reasonable to report main effects in the presence of an interaction. Let's review the hypothesis of the experiment: It assumes that women exhibit a decrease in microworld performance under stereotype threat. This is an interaction hypothesis. An error bar plot (lines representing 1 SE) reveals that this is the case:
The plot indicates a significant interaction between gender and stereotype threat. The main effect of stereotype threat is obtained by averaging the performance scores of all participants (both male and female) within each of the two stereotype threat conditions. Because of the interaction, this leads to a low average score under the stereotype threat condition: the female participants score so extremely low under stereotype threat that they account for the lower average. Thus, it makes no sense to look at the main effect of stereotype threat if an interaction of stereotype threat * gender is present.
Looking for a main effect of stereotype threat in the presence of a significant interaction violates the marginality principle, under which a term is tested on the assumption that all terms to which it is marginal are zero. Lower-order terms are marginal to higher-order terms, i.e. the main effects of two factors A and B are marginal to the interaction effect A*B. Thus, in this case, inspecting and reporting the main effects of gender and stereotype threat presupposes that the interaction of stereotype threat and gender is zero. That is not the case, and the above example illustrates that - under the given hypothesis - it is useless to report the main effect of stereotype threat.
Now, the problem with Type-III sums of squares (also referred to as marginal sums of squares) is that they are "obtained by fitting each effect after all the other terms in the model, i.e. the Sums of Squares for each effect corrected for the other terms in the model. The marginal (Type III) Sums of Squares do not depend upon the order in which effects are specified in the model" (source). In the case with stereotype threat, that clearly doesn't make any sense: Reporting the Type III sum of squares (as SPSS does by default) for the main effect of stereotype threat means doing so while correcting for the interaction. But it is precisely this interaction that caused the main effect in the first place! Thus, Type-III sums of squares violate the principle of marginality and do not make any sense in the stereotype threat case. What is more, Type-III sums of squares do "... NOT sum to the Sums of Squares for the model corrected for the mean". I wonder whether this renders the usual way of calculating a factor's effect size eta-square, dividing the SS of the factor by the total SS, useless, too?
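The quoted claim about the sums can be checked numerically. The following is a sketch on simulated, unbalanced data (the data frame `d` is an assumption, not the experiment's data): it compares the summed Type I and Type III sums of squares with the sum of squares explained by the whole model.

```r
# Simulated stand-in data; cell sizes and effects are assumptions for illustration.
set.seed(42)
cells <- c(18, 12, 10, 20)
d <- data.frame(
  GENDER   = factor(rep(c("f", "f", "m", "m"), times = cells)),
  STTHREAT = factor(rep(c("no", "yes", "no", "yes"), times = cells))
)
d$reasonz <- rnorm(nrow(d))
d$MWP <- 0.5 * d$reasonz + 0.4 * (d$GENDER == "m") -
  0.8 * (d$GENDER == "f" & d$STTHREAT == "yes") + rnorm(nrow(d))

# Zero-sum contrasts are set per factor here instead of via options().
fit <- lm(MWP ~ reasonz + GENDER * STTHREAT, data = d,
          contrasts = list(GENDER = "contr.sum", STTHREAT = "contr.sum"))
terms_all <- c("reasonz", "GENDER", "STTHREAT", "GENDER:STTHREAT")

ss_type1 <- anova(fit)[terms_all, "Sum Sq"]                    # sequential
ss_type3 <- drop1(fit, ~ ., test = "F")[terms_all, "Sum of Sq"]  # marginal

# Model SS corrected for the mean: total SS minus residual SS.
ss_model <- sum((d$MWP - mean(d$MWP))^2) - deviance(fit)

c(type1 = sum(ss_type1), type3 = sum(ss_type3), model = ss_model)
```

In an unbalanced design like this one, only the sequential sums of squares add up to the model sum of squares. Per-term ratios of Type III SS to total SS therefore no longer partition the explained variance.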
Anyway, coming back to the ominous contrasts=c("contr.sum", "contr.poly"): In order to obtain the correction for the rest of the factors in the model that Type-III SSs deliver, R needs to know how to balance the factors in the calculation of the SSs. Therefore, it requires a contrast matrix with zero-sum columns (see here). The R help for the options() command (?options) tells us:
"contrasts: the default contrasts used in model fitting such as with aov or lm. A character vector of length two, the first giving the function to be used with unordered factors and the second the function to be used with ordered factors. By default the elements are named c("unordered", "ordered"), but the names are unused."
As the treatment factors gender and stereotype threat are unordered factors, R will use contr.sum to construct a contrast matrix of the appropriate order (i.e., 2), because contrasts=c("contr.sum", "contr.poly") was specified. contr.sum(2) produces
  [,1]
1    1
2   -1
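For comparison, here is a small sketch that puts the zero-sum coding next to R's default treatment coding and shows how each one enters a model matrix. The two-level factor `g` is a made-up example.

```r
# Zero-sum coding: the single column sums to zero over the two levels.
cs <- contr.sum(2)
# Treatment (dummy) coding, R's default for unordered factors: 0 for the
# reference level, 1 for the other level, so the column does not sum to zero.
ct <- contr.treatment(2)

colSums(cs)
colSums(ct)

# How the two codings show up in a model matrix for a hypothetical factor g.
g <- factor(c("f", "m"))
mm_sum <- model.matrix(~ g, contrasts.arg = list(g = "contr.sum"))
mm_trt <- model.matrix(~ g, contrasts.arg = list(g = "contr.treatment"))
mm_sum
mm_trt
```

Under zero-sum coding the intercept corresponds to the unweighted mean of the level means, whereas under treatment coding it corresponds to the reference level. This is why main effects tested in the presence of an interaction mean different things under the two codings.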
My first attempt at Type-III SSs in R above produced nonsense and differed from SPSS because this wasn't specified.
Without going into too much detail here (basically because I haven't yet understood everything myself), there is an alternative to the sequence-dependent Type-I SSs and the marginality-violating Type-III SSs: Type II sums of squares preserve the marginality principle. This is how to get them, and this example illustrates that they are different from Type-III SSs and that they are - at least in this case - order independent:
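One way to sketch the idea in base R, on simulated data (the data frame `d` is an assumption): each Type II sum of squares is the drop in residual sum of squares when a term is added to a model that already contains every other term except those that contain it. The Anova() function in the car package with type = 2 is a commonly used shortcut for this; the manual version below only uses lm().

```r
# Simulated stand-in data; cell sizes and effects are assumptions for illustration.
set.seed(42)
cells <- c(18, 12, 10, 20)
d <- data.frame(
  GENDER   = factor(rep(c("f", "f", "m", "m"), times = cells)),
  STTHREAT = factor(rep(c("no", "yes", "no", "yes"), times = cells))
)
d$reasonz <- rnorm(nrow(d))
d$MWP <- 0.5 * d$reasonz + 0.4 * (d$GENDER == "m") -
  0.8 * (d$GENDER == "f" & d$STTHREAT == "yes") + rnorm(nrow(d))

# Residual sum of squares of a model given by formula f.
rss <- function(f) deviance(lm(f, data = d))

# Main effects are adjusted for the other main effects and the covariate,
# but not for the interaction that contains them. The covariate and the
# interaction are adjusted for everything else.
ss_type2 <- c(
  reasonz  = rss(MWP ~ GENDER * STTHREAT) -
             rss(MWP ~ reasonz + GENDER * STTHREAT),
  GENDER   = rss(MWP ~ reasonz + STTHREAT) -
             rss(MWP ~ reasonz + STTHREAT + GENDER),
  STTHREAT = rss(MWP ~ reasonz + GENDER) -
             rss(MWP ~ reasonz + GENDER + STTHREAT),
  "GENDER:STTHREAT" = rss(MWP ~ reasonz + GENDER + STTHREAT) -
                      rss(MWP ~ reasonz + GENDER * STTHREAT)
)
ss_type2
```

Because every entry is defined by a comparison of two nested models, the order in which terms are written in the formulas plays no role.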
SPSS can do the same by specifying /METHOD=SSTYPE(2) in the UNIANOVA syntax.
The remaining problem in the present case is the main effect of gender. It does make sense to investigate the effect of gender in the presence of the interaction with stereotype threat, because it could be that women are generally poorer complex problem solvers than men and perform especially poorly under stereotype threat on top of the general difference. In fact, the error bar plot above indicates that this is the case. This leaves me with one main effect that cannot be interpreted (stereotype threat) and another one that can be interpreted. Which SSs should I use? I am a bit lost.