Analyzing Teaching Evaluations

Neal Christopherson, Whitman College

PNAIRP Annual Meeting, October 12-14, 2005

           

            Teaching evaluations are an important part of faculty assessment for most institutions.  This report discusses one method of analyzing teaching evaluations, using regression analysis to examine the variance in items measuring the overall teaching evaluation, and to determine what proportion of this variance is explained by individual variables.  This analysis was commissioned by a faculty committee evaluating our current evaluation instrument, and was initially focused on looking at the effects of non-teaching variables such as gender, tenure status, and class size.  The analysis was conducted using the full set of teaching evaluations from the previous 8 semesters. The individual teaching evaluation items are the unit of analysis, for a total N of 35,769.

            Whitman College uses a standardized teaching evaluation form consisting of 12 questions, each asking the student to rate the course or professor on a particular aspect as excellent, very good, good, fair, poor, or very poor.  For purposes of this analysis, these responses have been given numerical values from 1 (very poor) to 6 (excellent).  Descriptive statistics for each of these 12 items can be found in Table 1.  Mean scores are, for the most part, close to 5, or “very good.”
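
As a minimal sketch of this coding step, the Python below maps the six response labels to the values 1 through 6 and computes N, mean, and standard deviation for one item.  The sample responses are hypothetical, and the choices to skip blank responses and to use the sample standard deviation are assumptions; the report does not specify either.

```python
from statistics import mean, stdev

# Numeric coding described in the text: 1 (very poor) through 6 (excellent)
RATING_VALUES = {
    "very poor": 1, "poor": 2, "fair": 3,
    "good": 4, "very good": 5, "excellent": 6,
}

def describe(responses):
    """Return (N, mean, sample SD) for rating labels, skipping blanks (None)."""
    scores = [RATING_VALUES[r] for r in responses if r is not None]
    return len(scores), mean(scores), stdev(scores)

# Hypothetical responses to a single item; not actual Whitman data
sample = ["excellent", "very good", "very good", None, "good", "excellent"]
n, avg, sd = describe(sample)
```

Skipping blanks item by item would be consistent with the slightly different N reported for each question in Table 1.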

 

Table 1: Descriptive Statistics

Question                                                                 N    Mean    St. Dev.
Q1: course as a whole                                               35,711    4.87    0.95
Q2: course organization                                             35,714    4.88    0.95
Q3: clarification of student responsibilities                       35,697    4.91    0.96
Q4: value of course in increasing analytical/interpretive skills    35,565    4.76    1.08
Q5: course stimulated intellectual curiosity                        35,608    4.77    1.14
Q6: value of the assigned work                                      35,404    4.66    1.01
Q7: instructor’s ability to present the subject matter clearly      35,667    5.00    1.04
Q8: instructor’s answers to students’ questions                     35,632    4.99    1.01
Q9: evaluation methods (tests, papers, etc.)                        35,469    4.66    1.03
Q10: value of instructor’s comments on tests, papers, etc.          35,203    4.69    1.08
Q11: availability of extra help when needed                         34,718    5.07    0.96
Q12: overall teaching ability                                       35,646    5.10    0.99

Scale: 6=excellent, 5=very good, 4=good, 3=fair, 2=poor, 1=very poor

 

Regression Analysis

            Regression analysis allows us to isolate the effect of each variable on teaching evaluations while holding the other variables constant.  This analysis focuses on the two main questions on the evaluations: Q1 (the course as a whole) and Q12 (overall teaching ability).  Two models are used to explain the variance of Q1 and Q12.  The first uses only course level, enrollment, gender, tenure status, years at Whitman, and academic division[1].  The second also includes the other items (Q2-Q11) from the teaching evaluations[2].

            Table 2 shows standardized coefficients for the regression analysis, along with the R² value for each model.  Because standardized coefficients are expressed in standard deviation units, they allow us to compare the size of each independent variable’s effect on the dependent variable with that of the other variables in the model.
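
To illustrate what a standardized coefficient is, the sketch below fits an ordinary least squares model after converting every variable to z-scores, so that each coefficient is expressed in standard deviation units.  It assumes NumPy is available; the data are synthetic, and the function name standardized_betas is hypothetical rather than part of the original analysis.

```python
import numpy as np

def standardized_betas(X, y):
    """Return standardized OLS coefficients and R-squared for predictors X and outcome y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # z-score every variable so all share mean 0 and standard deviation 1
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    # No intercept is needed once every variable is centered
    betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    residuals = yz - Xz @ betas
    r_squared = 1.0 - residuals.var() / yz.var()
    return betas, r_squared

# Synthetic illustration only; these are not Whitman data
rng = np.random.default_rng(0)
n = 500
clarity = rng.integers(1, 7, size=n).astype(float)      # a 1-6 rating item
enrollment = rng.integers(5, 60, size=n).astype(float)  # class size
overall = 0.8 * clarity + 0.001 * enrollment + rng.normal(0, 0.5, size=n)

betas, r2 = standardized_betas(np.column_stack([clarity, enrollment]), overall)
```

Although enrollment is measured on a much wider scale than the rating item, its standardized coefficient comes out far smaller here, which is the kind of comparison Table 2 is meant to support.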

 

Table 2: Standardized Regression Coefficients

 

                                    Dependent Variable
                                    Q1: Course as a Whole     Q12: Overall Teaching Ability
Variable                            Model 1      Model 2      Model 1      Model 2
Level                                .089         .002         .062        -.021
Enrollment                           .003         .003         .004         .000
Female                              -.027        -.012        -.021        -.018
Tenured Faculty                      .123         .023         .125         .026
Years at Whitman                    -.072        -.020        -.063        -.004
Div 1: Social Science               -.104        -.026        -.101        -.014
Div 3: Science                      -.136        -.034        -.077         .010
Q2: Course Organization                           .157                      .076
Q3: Clarification of Resp.                        .029                      .013
Q4: Increase analytical skills                    .159                      .079
Q5: Stim intellectual curiosity                   .278                      .085
Q6: Value of assigned work                        .066                      .003
Q7: Present sub matter clearly                    .165                      .400
Q8: Answers to student questions                  .053                      .190
Q9: Evaluation methods                            .114                      .059
Q10: Value of written comments                    .028                      .081
Q11: Availability of extra help                  -.002                      .092
R-squared                           0.035        0.713        0.026        0.781

* When predicting Q1, all variables are significant at p<.05, except enrollment (both models), Q8, and Q10.

** When predicting Q12, all variables are significant at p<.05, except enrollment (both models) and years at Whitman (model 2).

  

When looking at Table 2, it is important to note the small R² values for Model 1.  When predicting Q1, R² = .035, and when predicting Q12, R² = .026.  This indicates that by themselves, course level, enrollment, gender, tenure status, years at Whitman, and academic division account for less than 4% of the overall variance of Q1 and less than 3% of the overall variance of Q12.  By contrast, when we add in the other items from the course evaluations (Q2-Q11), our model explains 71.3% and 78.1% of the variance for Q1 and Q12, respectively.  While the following section will show that a very small part of this added variance is due to the interplay between non-teaching and teaching factors, we can argue that the “negative effects” of not being tenured or being female are more than overcome by, for example, presenting the subject matter clearly.

            Thus, with this regression analysis, we can conclude that while factors such as class level, gender, and tenure status have an effect on teaching evaluations, overall these factors account for a very small proportion of the variance of the two most important questions on the teaching evaluations.  To provide a few examples, the instructor’s ability to present the subject matter clearly, answer students’ questions, provide helpful comments on tests and papers, and enhance students’ interpretive or analytical skills were all much more important than the “non-teaching” factors.

 

Further Analysis of Teaching Factors

            From here an interesting question can be raised: what proportion of the variance of the overall teaching item is explained by each variable?  The technique used for this analysis rests on two premises.  First, the standardized coefficients reported in Table 2 represent the strength of association between each independent variable and the dependent variable relative to the other variables in the model.  Second, the independent variables together account for a portion of the variance of the dependent variable, and this portion is given by the R² value.  Thus, to isolate what each independent variable contributes to the overall variance of the teaching ability variable (Q12), the following steps were taken, summarized in Table 3:

  1. The absolute value of each standardized coefficient was computed;
  2. The sum of the absolute values was computed;
  3. Each absolute value was divided by the sum in step 2;
  4. Each value in step 3 was multiplied by the R² value (.781).
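
The four steps can be sketched in Python as follows, using the Model 2 coefficients for Q12 from Table 2.  This simply reproduces the arithmetic described above; the variable names are illustrative.

```python
# Model 2 standardized coefficients for Q12, copied from Table 2
betas = {
    "Level": -0.021, "Enrollment": 0.000, "Female": -0.018,
    "Tenured Faculty": 0.026, "Years at Whitman": -0.004,
    "Div 1": -0.014, "Div 3": 0.010,
    "Q2": 0.076, "Q3": 0.013, "Q4": 0.079, "Q5": 0.085, "Q6": 0.003,
    "Q7": 0.400, "Q8": 0.190, "Q9": 0.059, "Q10": 0.081, "Q11": 0.092,
}
r_squared = 0.781  # Model 2 R-squared for Q12

# Step 1: absolute value of each standardized coefficient
abs_vals = {name: abs(beta) for name, beta in betas.items()}
# Step 2: sum of the absolute values
total = sum(abs_vals.values())
# Step 3: each absolute value as a share of that sum
share_of_r2 = {name: value / total for name, value in abs_vals.items()}
# Step 4: each share scaled by R-squared
share_of_variance = {name: share * r_squared for name, share in share_of_r2.items()}

for name in betas:
    print(f"{name:18s} {abs_vals[name]:.3f} {share_of_r2[name]:.3f} {share_of_variance[name]:.3f}")
```

By construction, the step 3 values sum to 1 and the step 4 values sum to the R² value.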

 

Table 3: Proportion of Variance Explained for Overall Evaluation of Teaching

 
 

                                  Absolute Value    Proportion of R²    Proportion DV Variance
Variable                Beta      Abs Value         Abs. Val./Sum       Step 3 × R²
                                  (Step 1)          (Step 3)            (Step 4)
Course Level            -.021      .021              .018                .014
Enrollment               .000      .000              .000                .000
Female                  -.018      .018              .015                .012
Tenure Track Faculty     .026      .026              .022                .017
Years at Whitman        -.004      .004              .003                .003
Div. 1                  -.014      .014              .012                .009
Div. 3                   .010      .010              .009                .007
Q2                       .076      .076              .065                .051
Q3                       .013      .013              .011                .009
Q4                       .079      .079              .067                .053
Q5                       .085      .085              .073                .057
Q6                       .003      .003              .003                .002
Q7                       .400      .400              .342                .267
Q8                       .190      .190              .162                .127
Q9                       .059      .059              .050                .039
Q10                      .081      .081              .069                .054
Q11                      .092      .092              .079                .061
Sum (Step 2)                      1.171             1.00                 .781


Step 3 rescales the absolute standardized coefficients (Betas) so that they sum to 1, showing the share of the R² value that each accounts for.  Step 4 then rescales these shares so that they sum to R², showing the proportion of variance explained.  Note that the combined variance of the non-teaching factors in the rightmost column is about .062.  This suggests that in Model 2 non-teaching factors actually account for about 6% of the variance, more than the 2.6% suggested by Model 1 in Table 2.

To make these findings a bit more meaningful, items Q2-Q11 (excluding Q7) were grouped into 3 scales.  These scales capture the underlying concepts of course organization and materials (Q2, Q3, Q6, and Q9), instructor dealings with students (Q8, Q10, and Q11), and mental stimulation (Q4 and Q5).  Because of its large impact on the overall teaching variable, Q7 (instructor’s ability to present the subject matter clearly) was given its own category.  The proportions of variance explained in Table 3 were then used to create Figure 1.  The “Unknown Factors” category represents the “leftovers” from the regression model R²: 1 - .781 = .219, or about 22%.
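
One way to carry out this aggregation is sketched below.  It sums the step 4 values from Table 3 within each category.  The grouping follows the text, but the category labels are illustrative and the resulting totals are computed here rather than read from Figure 1.

```python
# Rightmost column of Table 3: proportion of Q12 variance (step 4)
variance_share = {
    "Course Level": 0.014, "Enrollment": 0.000, "Female": 0.012,
    "Tenure": 0.017, "Years at Whitman": 0.003, "Div. 1": 0.009, "Div. 3": 0.007,
    "Q2": 0.051, "Q3": 0.009, "Q4": 0.053, "Q5": 0.057, "Q6": 0.002,
    "Q7": 0.267, "Q8": 0.127, "Q9": 0.039, "Q10": 0.054, "Q11": 0.061,
}

# Category membership as described in the text
groups = {
    "Non-teaching factors": ["Course Level", "Enrollment", "Female", "Tenure",
                             "Years at Whitman", "Div. 1", "Div. 3"],
    "Course organization and materials": ["Q2", "Q3", "Q6", "Q9"],
    "Instructor dealings with students": ["Q8", "Q10", "Q11"],
    "Mental stimulation": ["Q4", "Q5"],
    "Clear presentation (Q7)": ["Q7"],
}

grouped = {
    label: sum(variance_share[name] for name in members)
    for label, members in groups.items()
}
# Variance the model leaves unexplained: 1 minus the Model 2 R-squared
grouped["Unknown factors"] = 1.0 - 0.781

for label, share in grouped.items():
    print(f"{label:36s} {share:.3f}")
```

Because the Table 3 values are rounded to three decimals, the category totals may sum to slightly more or less than 1.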

 

 

            This process was then repeated for Q1: Evaluation of the Course as a Whole.  Results are found in Figure 2.  Again, non-teaching factors such as gender account for a very small proportion of the variance (7%), while teaching ability, mental stimulation, and course organization account for over 60% of the variance.

 

Summary

While small mean differences exist between different groups of faculty, in the broader picture these gender or rank differences do not add up to much when we evaluate their impact on the overall course and teaching evaluations (Q1 and Q12).  Rather, when evaluating the course as a whole, or evaluating the overall teaching ability of an instructor, other factors such as the clear presentation of the subject matter and mental stimulation are much more important.

            One key component missing from this analysis is information about the student completing the evaluation.  Whitman’s evaluation form does not ask for any information about the students.  This is positive in that it helps students feel more anonymous when filling out the evaluation.  However, it is possible that students find courses within their major more mentally stimulating, or that students expecting a good grade in a course will evaluate it higher than students expecting a bad grade.  Perhaps first-year students are more generous in their evaluations than seniors, or women less critical than men.  Until this information appears on the evaluation form, the extent to which student demographics make up the “unknown factors” will remain unknown.



[1] Whitman College has 3 academic divisions: Social Science, Humanities, and Science.  Variables for academic division were coded 1 or 0.  For this analysis, variables for Division 1 (social sciences) and Division 3 (science and math) are included, while Division 2 (humanities) is the omitted reference category.  The coefficients for the Division 1 and Division 3 variables are thus in comparison to Division 2.

[2] The variable Q1 (course as a whole) was not included in the analysis when predicting Q12 (overall teaching ability) and vice-versa.