Analyzing Teaching Evaluations
PNAIRP Annual Meeting,
Teaching evaluations are an important part of faculty assessment at most institutions. This report discusses one method of analyzing teaching evaluations: using regression analysis to examine the variance in the items measuring the overall teaching evaluation, and to determine what proportion of this variance is explained by individual variables. The analysis was commissioned by a faculty committee reviewing our current evaluation instrument, and initially focused on the effects of nonteaching variables such as gender, tenure status, and class size. It was conducted using the full set of teaching evaluations from the previous 8 semesters. The individual teaching evaluation items are the unit of analysis, for a total N of 35,769.
Table 1: Teaching evaluation items

Question                                                          N       Mean  St. Dev.
Q1: course as a whole                                             35,711  4.87  0.95
Q2: course organization                                           35,714  4.88  0.95
Q3: clarification of student responsibilities                     35,697  4.91  0.96
Q4: value of course in increasing analytical/interpretive skills  35,565  4.76  1.08
Q5: course stimulated intellectual curiosity                      35,608  4.77  1.14
Q6: value of the assigned work                                    35,404  4.66  1.01
Q7: instructor’s ability to present the subject matter clearly    35,667  5.00  1.04
Q8: instructor’s answers to students’ questions                   35,632  4.99  1.01
Q9: evaluation methods (tests, papers, etc.)                      35,469  4.66  1.03
Q10: value of instructor’s comments on tests, papers, etc.        35,203  4.69  1.08
Q11: availability of extra help when needed                       34,718  5.07  0.96
Q12: overall teaching ability                                     35,646  5.10  0.99

Scale: 6=excellent, 5=very good, 4=good, 3=fair, 2=poor, 1=very poor
Regression analysis allows us to isolate the effect of each variable on teaching evaluations. This analysis focuses on the two main questions on the evaluations: Q1 (the course as a whole) and Q12 (overall teaching ability). Two different models are used to predict Q1 and Q12. The first uses only course level, enrollment, gender, tenure status, years at Whitman, and academic division[1]. The second also includes the other items (Q2–Q11) from the teaching evaluations[2].
Table 2 shows standardized coefficients for the regression analysis, along with the R² value for each model. Standardized coefficients allow us to compare the relative size of the effect of each independent variable on the dependent variable, in comparison to the other variables in the model.

Table 2: Standardized regression coefficients

                                  Dependent Variable
                                  Q1: Course as a Whole   Q12: Overall Teaching Ability
Variable                          Model 1     Model 2     Model 1     Model 2
Level                             .089        .002        .062        .021
Enrollment                        .003        .003        .004        .000
Female                            .027        .012        .021        .018
Tenured Faculty                   .123        .023        .125        .026
Years at Whitman                  .072        .020        .063        .004
Div 1: Social Science             .104        .026        .101        .014
Div 3: Science                    .136        .034        .077        .010
Q2: Course Organization                       .157                    .076
Q3: Clarification of Resp.                    .029                    .013
Q4: Increase analytical skills                .159                    .079
Q5: Stim intellectual curiosity               .278                    .085
Q6: Value of assigned work                    .066                    .003
Q7: Present sub matter clearly                .165                    .400
Q8: Answers to student questions              .053                    .190
Q9: Evaluation methods                        .114                    .059
Q10: Value of written comments                .028                    .081
Q11: Availability of extra help               .002                    .092
R²                                0.035       0.713       0.026       0.781

* When predicting Q1, all variables are significant at p<.05, except enrollment (both models), Q8, and Q10.
** When predicting Q12, all variables are significant at p<.05, except enrollment (both models) and years at Whitman (Model 2).
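As a reminder of what a standardized coefficient is, the following minimal Python sketch rescales a raw regression slope by sd(x)/sd(y) for the single-predictor case. The data are made up for illustration (six hypothetical courses), and the function names are not from the original analysis. In a multiple regression, the same rescaling is applied to each partial slope.

```python
from statistics import mean, stdev


def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardized_beta(x, y):
    """Standardized coefficient: the raw slope rescaled by sd(x) / sd(y)."""
    return slope(x, y) * stdev(x) / stdev(y)


# Made-up data: enrollment and mean rating for six hypothetical courses.
enrollment = [8, 12, 15, 22, 30, 45]
rating = [5.4, 5.1, 5.2, 4.9, 4.8, 4.5]

raw = slope(enrollment, rating)
beta = standardized_beta(enrollment, rating)

# Re-expressing enrollment in tens of students changes the raw slope
# by a factor of 10 but leaves the standardized coefficient unchanged.
enrollment_tens = [e / 10 for e in enrollment]
raw_rescaled = slope(enrollment_tens, rating)
beta_rescaled = standardized_beta(enrollment_tens, rating)

print(f"raw slope: {raw:.4f}, rescaled raw slope: {raw_rescaled:.4f}")
print(f"beta: {beta:.4f}, rescaled beta: {beta_rescaled:.4f}")
```

Because the standardized coefficient does not depend on the units of the predictor, variables measured on very different scales (a 0/1 gender indicator, a head count, a 1 to 6 rating) can be compared in the same column.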
When looking at Table 2, it is important to note the small R² values for Model 1. When predicting Q1, R² = .036, and when predicting Q12, R² = .026. This indicates that by themselves, course level, enrollment, gender, tenure status, years at Whitman, and academic division account for less than 4% of the overall variance of Q1 and less than 3% of the overall variance of Q12. By contrast, when we add in the other items from the course evaluations (Q2–Q11), the model explains 71.3% and 78.1% of the variance for Q1 and Q12, respectively. While the following section will show that a very small part of this added variance is due to the interplay between nonteaching and teaching factors, we can argue that the “negative effects” of not being tenured or being female are more than overcome by, for example, presenting the subject matter clearly.
Thus, from this regression analysis, we can conclude that while factors such as class level, gender, and tenure status have an effect on teaching evaluations, overall these factors account for a very small proportion of the variance of the two most important questions on the teaching evaluations. To provide a few examples, the instructor’s ability to present the subject matter clearly, answer students’ questions, provide helpful comments on tests and papers, and enhance students’ interpretive or analytical skills were all much more important than the “nonteaching” factors.
This raises an interesting question: what proportion of the variance of the overall teaching item is explained by each variable? The technique used for this analysis rests on two premises. First, the standardized coefficients reported in Table 2 represent the strength of association between each independent variable and the dependent variable, relative to the other variables in the model. Second, the independent variables together account for a portion of the variance of the dependent variable, and this portion is represented by the R² value. Thus, to isolate what each independent variable contributes to the overall variance of the teaching ability variable (Q12), the following steps were taken, summarized in Table 3:


Table 3: Proportion of Q12 variance attributed to each variable (Model 2)

                                  Absolute Value      Proportion of R²        Proportion DV Variance
Variable                Beta      Abs Value (Step 1)  Abs.Val./Sum (Step 3)   Step 3 × R² (Step 4)
Course Level            .021      .021                .018                    .014
Enrollment              .000      .000                .000                    .000
Female                  .018      .018                .015                    .012
Tenure Track Faculty    .026      .026                .022                    .017
Years at Whitman        .004      .004                .003                    .003
Div. 1                  .014      .014                .012                    .009
Div. 3                  .010      .010                .009                    .007
Q2                      .076      .076                .065                    .051
Q3                      .013      .013                .011                    .009
Q4                      .079      .079                .067                    .053
Q5                      .085      .085                .073                    .057
Q6                      .003      .003                .003                    .002
Q7                      .400      .400                .342                    .267
Q8                      .190      .190                .162                    .127
Q9                      .059      .059                .050                    .039
Q10                     .081      .081                .069                    .054
Q11                     .092      .092                .079                    .061
                        Step 2:   SUM = 1.171         SUM = 1.00              SUM = .781
Step 3 rescales the absolute values of the standardized coefficients (Betas) so that they sum to 1, which shows the share of the R² value each one accounts for. Step 4 then multiplies these shares by R² to express them as proportions of variance explained. It is important to note that the combined variance of the nonteaching factors in the rightmost column is about .062. This suggests that in Model 2 nonteaching factors actually account for about 6% of the variance, more than the 2.6% suggested by Model 1 in Table 2.
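The four steps can be sketched in a few lines of Python. This is an illustration of the arithmetic only, using the Q12 Model 2 coefficients as printed in Table 3; the function and variable names are assumptions made for this sketch, not part of the original analysis.

```python
# Standardized coefficients (Betas) for Q12, Model 2, as printed in Table 3.
betas = {
    "Course Level": 0.021, "Enrollment": 0.000, "Female": 0.018,
    "Tenure Track Faculty": 0.026, "Years at Whitman": 0.004,
    "Div. 1": 0.014, "Div. 3": 0.010,
    "Q2": 0.076, "Q3": 0.013, "Q4": 0.079, "Q5": 0.085, "Q6": 0.003,
    "Q7": 0.400, "Q8": 0.190, "Q9": 0.059, "Q10": 0.081, "Q11": 0.092,
}
R_SQUARED = 0.781  # R² for Q12, Model 2 (Table 2)


def decompose(betas, r_squared):
    """Apportion R² across variables in proportion to |Beta|."""
    # Step 1: absolute value of each standardized coefficient.
    abs_vals = {name: abs(b) for name, b in betas.items()}
    # Step 2: sum of the absolute values.
    total = sum(abs_vals.values())
    # Step 3: each variable's share of the sum (shares add up to 1).
    share_of_r2 = {name: v / total for name, v in abs_vals.items()}
    # Step 4: scale the shares by R² to get a proportion of DV variance.
    share_of_variance = {name: s * r_squared for name, s in share_of_r2.items()}
    return total, share_of_r2, share_of_variance


total, share_of_r2, share_of_variance = decompose(betas, R_SQUARED)

print(f"Sum of absolute Betas: {total:.3f}")
for name in betas:
    print(f"{name:22s} {share_of_r2[name]:.3f} {share_of_variance[name]:.3f}")
```

By construction, the Step 4 values add up to R², so the method distributes the explained variance across the predictors rather than estimating each contribution separately.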
To make these findings a bit more meaningful, items Q2–Q11 (excluding Q7) were placed into 3 different scales. These scales capture the underlying concepts of course organization and materials (Q2, Q3, Q6, and Q9), instructor dealings with students (Q8, Q10, and Q11), and mental stimulation (Q4 and Q5). Because of its large impact on the overall teaching variable, Q7 (instructor’s ability to present the subject matter clearly) was given its own category. The proportions of variance explained in Table 3 were then used to create Figure 1. The “Unknown Factors” category represents the “leftovers” from the regression model R²: 1 − .781 ≈ 22%.
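The grouping described above amounts to summing the rightmost column of Table 3 within each category. The Python sketch below does this with the rounded values printed in the table, so the totals may differ slightly from those plotted in Figure 1. The dictionary names and category labels are assumptions made for this illustration.

```python
# Proportion of Q12 variance per variable (rightmost column of Table 3).
variance_share = {
    "Course Level": 0.014, "Enrollment": 0.000, "Female": 0.012,
    "Tenure Track Faculty": 0.017, "Years at Whitman": 0.003,
    "Div. 1": 0.009, "Div. 3": 0.007,
    "Q2": 0.051, "Q3": 0.009, "Q4": 0.053, "Q5": 0.057, "Q6": 0.002,
    "Q7": 0.267, "Q8": 0.127, "Q9": 0.039, "Q10": 0.054, "Q11": 0.061,
}
R_SQUARED = 0.781  # R² for Q12, Model 2 (Table 2)

# Category membership as described in the text.
groups = {
    "Nonteaching factors": [
        "Course Level", "Enrollment", "Female", "Tenure Track Faculty",
        "Years at Whitman", "Div. 1", "Div. 3",
    ],
    "Course organization and materials": ["Q2", "Q3", "Q6", "Q9"],
    "Instructor dealings with students": ["Q8", "Q10", "Q11"],
    "Mental stimulation": ["Q4", "Q5"],
    "Presents subject matter clearly": ["Q7"],
}

# Sum the per-variable proportions within each category.
totals = {
    label: sum(variance_share[v] for v in members)
    for label, members in groups.items()
}
# Whatever the model does not explain is the "Unknown Factors" slice.
totals["Unknown factors"] = 1 - R_SQUARED

for label, value in totals.items():
    print(f"{label:36s} {value:.3f}")
```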
This process was then repeated for Q1: Evaluation of the Course as a Whole. Results are found in Figure 2. Again, nonteaching factors such as gender account for a very small proportion of the variance (7%), while teaching ability, mental stimulation, and course organization account for over 60% of the variance.
Summary
While small mean differences exist between different groups of faculty, in the broader picture these gender or rank differences do not add up to much when we evaluate their impact on the overall course and teaching evaluations (Q1 and Q12). Rather, when evaluating the course as a whole, or evaluating the overall teaching ability of an instructor, other factors such as the clear presentation of the subject matter and mental stimulation are much more important.
One key component missing from this analysis is information about the student completing the evaluation. Whitman’s evaluation form does not ask for any information about the students. This is positive in that it helps students feel that they have more anonymity in filling out the evaluation. However, it is possible that students find courses within their major more mentally stimulating, or that students expecting a good grade in a course will rate it higher than students expecting a bad grade. Perhaps first-year students are more generous in their evaluations than seniors, or women less critical than men. Until this information appears on the evaluation form, the extent to which student demographics make up the “unknown factors” will remain unknown.
[1]
[2] The variable Q1 (course as a whole) was not included in the analysis when predicting Q12 (overall teaching ability) and vice versa.