Pilot Study of New Student Ratings of Instruction Instrument:

Analysis of Scores for Evidence of Validity

Report Prepared for
Student Ratings Committee
Georgia Southern University

Bryan W. Griffin
Department of Curriculum, Foundations, and Research
Georgia Southern University
Statesboro, GA 30460
January 24, 2001


Analysis of Scores for Evidence of Validity

Abstract. If scores from the new instrument are valid, then how these scores behave in various conditions should be predictable. Several hypotheses about the behavior of these scores were formulated and assessed with data derived from the new student ratings instrument. Each of the hypotheses was supported, to some extent, by the data. These results provide some evidence for the validity of scores from the new instrument, but this evidence is limited since no external criteria (e.g., valid measures of achievement or alternative measures of teaching effectiveness) were used.

The purpose of this report is to present an analysis of scores obtained from the new student ratings of instruction instrument. Without external criteria for judging the validity of these scores, such as alternative measures of teaching effectiveness or standardized measures of student achievement, judgments about the instrument must be based on data derived from the instrument itself. While the analyses presented below provide some insight into the behavior of scores from this instrument, better evidence for validity can be obtained only through the use of external criteria.

The data used below were collected at the end of the fall 2000 semester. Members of the Student Ratings Committee asked a limited number of instructors from each college (between 2 and 4) to participate in the pilot study. In total, 24 courses were surveyed, yielding 641 student responses.

The analyses that follow are based on the premise that valid scores should behave in a predictable manner. Predictions are derived from prior research on student ratings, theory, and logic. If most of the hypotheses presented below are supported by the data, then at least partial evidence for the validity of scores from the new ratings instrument will be established.

1. Distinction between Course/Student and Instructor/Instruction Items

The two primary areas assessed by the new instrument are Course/Student and Instructor/Instruction. In the first section, students are asked to rate the degree to which they participated in learning and the degree to which the course was challenging and stimulating. In the Instructor/Instruction section, students rate various characteristics of the instructor and instruction. One reason for including the Course/Student items was to allow students to express views about the course and thus prevent possible confounding of course and instruction assessments. The important point here is that Course/Student refers to course-specific or student-specific characteristics, and these characteristics should be distinguishable from Instructor/Instruction characteristics.

Hypothesis: If students can distinguish between Course/Student characteristics (items 1 through 7) and Instruction/Instructor characteristics (items 8 through 18), then scores from these items should form two distinct clusters.

To test this hypothesis, scores from each of the 7 Course/Student items (1 through 7) and from each of the 11 Instruction/Instructor items (8 through 18) were analyzed with factor analysis. If these two components of the instrument produce distinct scores, then at least two factors should emerge from the data.

Analysis

Several different approaches were used to extract factors, including principal axis factoring and maximum likelihood. Principal components analysis was also used. Factors were rotated in two ways, varimax and oblique. Regardless of the method employed, only two clear factors emerged, with eigenvalues of 7.6 and 3.5. The next largest eigenvalue was 0.96, which represents a distinct drop, so two factors appear appropriate.
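The retention decision described above can be reproduced as a short sketch. It assumes that the reported values are the initial eigenvalues of the 18-by-18 item correlation matrix, so that all 18 eigenvalues sum to 18; only the three largest are given in the report, and the names below are illustrative rather than taken from any package used for this analysis.

```python
# Sketch only: assumes the reported values are initial eigenvalues of the
# 18 x 18 item correlation matrix, whose eigenvalues sum to 18.
N_ITEMS = 18  # items 1 through 18 entered the factor analysis

# Only the three largest eigenvalues are reported; the remaining 15 are all
# below 0.96, so they cannot change the count computed below.
REPORTED_EIGENVALUES = [7.6, 3.5, 0.96]


def kaiser_count(eigenvalues, cutoff=1.0):
    """Number of factors retained under the eigenvalue-greater-than-1 rule."""
    return sum(1 for ev in eigenvalues if ev > cutoff)


def share_of_variance(eigenvalues, k, n_items):
    """Proportion of total standardized variance carried by the k largest eigenvalues."""
    return sum(sorted(eigenvalues, reverse=True)[:k]) / n_items


n_factors = kaiser_count(REPORTED_EIGENVALUES)
share = share_of_variance(REPORTED_EIGENVALUES, n_factors, N_ITEMS)
print(f"Factors retained (eigenvalue > 1): {n_factors}")
print(f"Variance accounted for by {n_factors} factors: {share:.1%}")
```

Under that assumption, two factors are retained and together they carry about 62% of the total item variance.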

Results from the various extraction and rotation methods were nearly identical, so the results of principal axis factoring with varimax rotation are presented in Table 1. As noted above, 7 items were designed to assess Course/Student aspects and 11 to assess Instruction/Instructor. Of the 7 Course/Student items, 5 formed a clean second factor: items 1 (student effort), 3 (intellectual challenge), 4 (seek outside help), 5 (course difficulty), and 6 (course workload). The only two Course/Student items that did not group with these were 2 (how much did you learn) and 7 (overall rating of course). Note that item 2, while loading on the Instruction/Instructor factor, had the weakest loading of all items on that factor. All 11 Instructor/Instruction items grouped together to form factor 1, and all displayed relatively strong loadings, the weakest being greater than .65.

It is curious that item 7 (overall rating of the course) grouped with the Instructor/Instruction items. There are several possible explanations. First, item 7 was the only Course/Student item that used the same Likert step descriptors as the Instructor/Instruction items (very poor to very good) rather than the relative descriptors used for the other six Course/Student items (much less to much more). Second, the other Course/Student items required students to make relative assessments in comparison to other courses, whereas item 7 asked for an absolute assessment (very poor to very good). Third, it is possible that students cannot distinguish the information requested by this item from Instruction/Instructor characteristics. This third explanation seems to hold less merit than the first two.

Table 1. Structure Matrix of Course/Student and Instruction/Instructor Data with Varimax Rotation and Principal Axis Factoring.

Item        Factor 1   Factor 2
V18             .884
V12             .838
V11             .788
V15             .780
V8              .747
V16             .740
V9              .734
V7    CS        .733
V13             .728
V10             .712
V17             .682
V14             .666
V2    CS        .611
V5    CS                   .863
V3    CS                   .777
V1    CS                   .728
V4    CS                   .632
V6    CS                   .582

Note. Loadings less than .30 are not reported. Course/Student items are marked with CS.

In summary, it appears that most of the Course/Student items are perceived by students to be distinguishable from the Instruction/Instructor ratings. This suggests that the instrument provides scores that form two relatively distinct constructs.
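The grouping summarized above can be tallied directly from the loadings in Table 1. The hypothetical helper below assigns each item to the factor on which its reported loading (at least .30) appears and lists which Course/Student and Instructor/Instruction items fall on each factor; the data structure and names are illustrative only.

```python
# Loadings from Table 1: (item number, is Course/Student item, factor, loading).
# Each item has exactly one reported loading because loadings < .30 are omitted.
TABLE_1 = [
    (18, False, 1, .884), (12, False, 1, .838), (11, False, 1, .788),
    (15, False, 1, .780), (8, False, 1, .747), (16, False, 1, .740),
    (9, False, 1, .734), (7, True, 1, .733), (13, False, 1, .728),
    (10, False, 1, .712), (17, False, 1, .682), (14, False, 1, .666),
    (2, True, 1, .611), (5, True, 2, .863), (3, True, 2, .777),
    (1, True, 2, .728), (4, True, 2, .632), (6, True, 2, .582),
]


def items_on(factor, course_student):
    """Sorted item numbers of one item type that load on the given factor."""
    return sorted(item for item, cs, f, _ in TABLE_1
                  if f == factor and cs == course_student)


# Smallest loading among the Instructor/Instruction items
weakest = min(loading for _, cs, _, loading in TABLE_1 if not cs)

print("Course/Student items on factor 2:", items_on(2, True))
print("Course/Student items on factor 1:", items_on(1, True))
print("Instructor/Instruction items on factor 1:", items_on(1, False))
print("Weakest Instructor/Instruction loading:", weakest)
```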

 

2. Overall Rating of Instructor and Ratings of Instruction

Administrative decisions regarding instruction are often made on the basis of a single score (Abrami, d’Apollonia, & Rosenfield, 1997; McKeachie, 1997a). This score can be determined in a number of ways, such as summing scores from multiple items, deriving a factor score, or using a single overall teaching effectiveness item. If the overall instructor rating item (item 18) is to be used as a summary measure of teaching effectiveness, it must be shown that scores from this item correspond well with multiple aspects of teaching.

Hypothesis: If the overall rating of instruction item (item 18) provides an adequate representation of teaching effectiveness as perceived by students, then scores from this item should correspond well to the other teaching items (items 8 through 17).

To address this hypothesis, results from two analyses were considered. First, simple correlations among items 8 through 18 are presented in Table 2. Second, regression results, with item 18 treated as the criterion and items 8 through 17 as the predictors, are presented in Table 3.

Table 2. Correlations among Instructor/Instruction Items (n = 634)

Item  Description                         18    8    9   10   11   12   13   14   15   16   17
18    Overall Instructor Rating          ---
8     Stressed Points                    .63  ---
9     Instructor’s Preparation           .61  .53  ---
10    Encouraged Participation           .63  .55  .51  ---
11    Organization of Material           .66  .55  .71  .53  ---
12    Clarity of Presentation            .73  .63  .65  .60  .72  ---
13    Graded Activities                  .63  .58  .46  .47  .55  .64  ---
14    Instructor Impartial               .59  .45  .51  .45  .56  .50  .52  ---
15    Instructor’s Helpfulness           .74  .53  .51  .64  .57  .60  .59  .64  ---
16    Focused on Objectives              .62  .52  .57  .50  .64  .61  .53  .49  .58  ---
17    Instructor’s Interest in Content   .61  .49  .57  .54  .53  .53  .43  .49  .54  .50  ---

Note. All correlations statistically significant at .01 level.
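With n = 634, the note to Table 2 can be checked by finding the smallest correlation that is significant at the .01 level (two-tailed). From the usual test for a correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), solving for r gives r = t / sqrt(t^2 + n - 2). The sketch below makes one simplifying assumption: it uses the standard normal quantile in place of the t quantile, which is very close when df = 632.

```python
from math import sqrt
from statistics import NormalDist


def critical_r(n, alpha=0.01):
    """Smallest |r| significant (two-tailed) for n pairs of scores.

    Approximation: the t quantile with df = n - 2 is replaced by the
    standard normal quantile, which is adequate when df is in the hundreds.
    """
    df = n - 2
    t_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return t_crit / sqrt(t_crit ** 2 + df)


r_min = critical_r(634)
SMALLEST_TABLE_2_R = 0.43  # items 13 and 17
print(f"Critical r (n = 634, alpha = .01): {r_min:.3f}")
print("All Table 2 correlations exceed it:", SMALLEST_TABLE_2_R > r_min)
```

The critical value is roughly .10, so every correlation in Table 2 clears it by a wide margin.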

The correlations in Table 2 show strong relationships between the overall instructor rating and each of the other Instructor/Instruction items, ranging from .59 (weakest) to .74 (strongest). None of the other items shows a pattern of correlations with the remaining items as strong as that of item 18.
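The statement that no other item matches item 18's pattern can be made concrete by averaging each item's correlations with the other ten items in Table 2. The sketch below does this with the lower triangle of Table 2 typed in by hand; the summary statistic (mean off-diagonal correlation) and the variable names are illustrative choices, not part of the original analysis.

```python
# Item order used for both rows and columns of Table 2
ITEMS = [18, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]

# Lower triangle of Table 2: row i lists correlations of ITEMS[i] with ITEMS[0..i-1]
LOWER = [
    [],
    [.63],
    [.61, .53],
    [.63, .55, .51],
    [.66, .55, .71, .53],
    [.73, .63, .65, .60, .72],
    [.63, .58, .46, .47, .55, .64],
    [.59, .45, .51, .45, .56, .50, .52],
    [.74, .53, .51, .64, .57, .60, .59, .64],
    [.62, .52, .57, .50, .64, .61, .53, .49, .58],
    [.61, .49, .57, .54, .53, .53, .43, .49, .54, .50],
]


def full_matrix(lower):
    """Expand a lower-triangular list of rows into a full symmetric matrix."""
    k = len(lower)
    m = [[1.0] * k for _ in range(k)]
    for i, row in enumerate(lower):
        for j, r in enumerate(row):
            m[i][j] = m[j][i] = r
    return m


def mean_r_with_others(m):
    """Average off-diagonal correlation for each row of a correlation matrix."""
    k = len(m)
    return [sum(m[i][j] for j in range(k) if j != i) / (k - 1) for i in range(k)]


R = full_matrix(LOWER)
means = dict(zip(ITEMS, mean_r_with_others(R)))
for item, mean_r in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
    print(f"item {item:>2}: mean r = {mean_r:.3f}")
```

By this summary, item 18 has the highest mean correlation with the other items (.645), followed by clarity of presentation (item 12, about .62).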

Results of the regression (Table 3, below) show that the Instructor/Instruction items (8 through 17) account for about 73% of the variance in overall rating (multiple R = .86). The best predictors of overall rating appear to be (a) instructor’s helpfulness (item 15), (b) clarity of presentation (item 12), (c) instructor’s interest in content of course (item 17), and (d) degree to which important points were stressed (item 8).

Table 3. Regression of Overall Rating on Instructor/Instructional Items

Item           B  Std. Error    Beta        t   p-value  R² Change
8           .11*        .035    .092    3.103      .002       .004
9            .05        .037    .046    1.393      .164       .001
10           .06        .032    .060    2.000      .046       .001
11           .03        .037    .030     .849      .396       .001
12          .23*        .035    .242    6.595      .000       .020
13           .08        .031    .074    2.438      .015       .003
14           .05        .031    .043    1.468      .143       .001
15          .31*        .034    .306    9.035      .000       .035
16           .06        .031    .058    1.938      .053       .001
17          .15*        .037    .114    4.015      .000       .007
Intercept  -.37*        .136           -2.740      .006

Note. R = .86, R² = .73, adjusted R² = .73, SEE = 0.54, n = 634. * p < .01.
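Several summary values in Table 3 can be cross-checked against one another. The sketch below recomputes adjusted R² from R² = .73 with 634 cases and 10 predictors, squares the reported R, and recomputes the t ratio for item 15 as B divided by its standard error. Because R, B, and the standard errors are rounded in the table, small discrepancies are expected (for example, .86 squared is about .74; an R of .855 would give .73).

```python
N_CASES = 634      # from the note to Table 3
N_PREDICTORS = 10  # items 8 through 17


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a regression with n cases and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def t_ratio(b, se):
    """t statistic for a single regression coefficient."""
    return b / se


r_squared_from_r = 0.86 ** 2  # approximate, since R itself is rounded
adj = adjusted_r2(0.73, N_CASES, N_PREDICTORS)
t_item15 = t_ratio(0.31, 0.034)  # B and standard error for item 15

print(f"R = .86 gives R^2 of about {r_squared_from_r:.2f}")
print(f"Adjusted R^2 = {adj:.2f}")
print(f"Item 15: t = {t_item15:.2f} (Table 3 reports 9.035; B is rounded)")
```

With 634 cases and only 10 predictors, the adjustment barely changes R², which is why the table reports the same value (.73) for both.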

 

3. Prior Course Subject Interest and Overall Ratings of Instructor

Two items on the instrument assess students’ level of interest in the subject matter of the course prior to enrolling (item 21) and at the end of the course (item 22). Some researchers of student ratings argue that an important outcome of effective instruction is increased student interest in the content covered in a course (Abrami, d’Apollonia, & Rosenfield, 1997). If this is the case, then prior subject interest should show a much weaker relationship with ratings of the instructor than post-course subject interest does.

Hypothesis: The relationship between prior subject interest (item 21) and overall rating of the instructor (item 18) should be weaker than the relationship between post subject interest (item 22) and overall rating of the instructor.

Correlations among items 18, 21, and 22 are presented in Table 4. Figure 1 also plots the mean of item 18 at each level of items 21 and 22.

Table 4. Correlations Among Items 18, 21, and 22

Items               18      21      22
18 Overall Rating   ---
21 Prior Interest   .09*    ---
22 Post Interest    .54*    .49*    ---

* p < .05

As both Table 4 and Figure 1 show, interest in the subject matter after taking the course has a much stronger, positive relationship with instructor rating than does prior subject interest. This could be interpreted to mean that better instructors are able to generate more interest in course content among students than are weaker instructors.

 

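Figure 1. Mean Overall Instructor Rating (Item 18) by Level of Prior Interest (Item 21) and Post Interest (Item 22)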
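The report compares these correlations descriptively. One standard way to test whether r(18,22) = .54 is reliably larger than r(18,21) = .09 is Williams's t for two dependent correlations that share a variable (here item 18), which also uses r(21,22) = .49. This test was not part of the original analysis, and Table 4 does not report its sample size, so n = 634 (the n used in Tables 2 and 3) is a labeled assumption in the sketch below.

```python
from math import sqrt


def williams_t(r_jk, r_jh, r_kh, n):
    """Williams's t for H0: rho_jk == rho_jh, where j is the shared variable.

    r_kh is the correlation between the two non-shared variables; df = n - 3.
    """
    # Determinant of the 3 x 3 correlation matrix of j, k, and h
    det_r = 1 - r_jk ** 2 - r_jh ** 2 - r_kh ** 2 + 2 * r_jk * r_jh * r_kh
    r_bar = (r_jk + r_jh) / 2
    denom = 2 * (n - 1) / (n - 3) * det_r + r_bar ** 2 * (1 - r_kh) ** 3
    return (r_jk - r_jh) * sqrt((n - 1) * (1 + r_kh) / denom)


# j = item 18 (overall rating), k = item 22 (post interest), h = item 21 (prior interest)
N_ASSUMED = 634  # hypothetical: Table 4 does not report its n
t_williams = williams_t(r_jk=0.54, r_jh=0.09, r_kh=0.49, n=N_ASSUMED)
print(f"Williams's t = {t_williams:.2f} on {N_ASSUMED - 3} df")
```

With these inputs the statistic is roughly 13.6, far beyond conventional critical values, which agrees with the descriptive comparison; a somewhat different n would change the value but not that conclusion.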

 

4. Instructor Reputation and Student Ratings of Instruction

Research has shown that the reputation of an instructor can positively or negatively influence judgments students make about that instructor (Brady, 1994; Feldman & Prohaska, 1979; Kelley, 1950; McClelland, 1970; Perry, Abrami, Leventhal, & Check, 1979; Perry, Niemi, & Jones, 1974; Widmeyer & Loy, 1988). It is unclear how, or whether, the reputation of a course influences student ratings of the instructor.

Hypothesis: (a) Overall ratings of the instructor should be lowest for students claiming to have heard negative information about the instructor prior to enrolling in the course, and highest for students claiming to have heard positive information about the instructor prior to enrolling in the course. (b) Overall ratings of the instructor should either be unrelated to reputation of the course, or if a pattern does emerge, the relationship should be weaker than the relationship between instructor reputation and ratings of the instructor.

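One-way ANOVAs were conducted with overall instructor rating (item 18) as the dependent variable and, in turn, instructor reputation (item 19) and course reputation (item 20) as the grouping factor. Results are presented in Tables 5 and 6.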

Table 5. Mean levels of Overall Instructor Rating by Categories of Instructor Reputation

Categories of Reputation    Mean     SD      N
Mostly Negative             3.16   1.17     32
Mixed or Neutral            3.57   1.04     88
Mostly Positive             4.33    .81    139
Heard Nothing               4.09   1.04    376

Note. ANOVA F = 18.97, df = 3, 631, MSE = 1.004, p < .01, Eta² = .083.

Table 6. Mean levels of Overall Instructor Rating by Categories of Course Reputation

Categories of Reputation    Mean     SD      N
Mostly Negative             3.69   1.08     95
Mixed or Neutral            3.85   1.01    210
Mostly Positive             4.28    .98    130
Heard Nothing               4.20   1.03    200

Note. ANOVA F = 10.06, df = 3, 631, MSE = 1.05, p < .01, Eta² = .046.

Scores from the overall instructor rating item (item 18) form a clear pattern among the means for both instructor reputation (item 19) and course reputation (item 20). In general, the lowest mean rating occurs for the "mostly negative" group and the highest for the "mostly positive" group, for both instructor and course reputation. Overall rating of the instructor appears to be related to both types of reputation, but the association is stronger for instructor reputation than for course reputation, as indicated by Eta² (.083 vs. .046) and by the difference between the "mostly negative" and "mostly positive" group means (4.33 - 3.16 = 1.17 vs. 4.28 - 3.69 = 0.59). This pattern is consistent with the hypothesis.
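The ANOVA summaries in Tables 5 and 6 can be approximately reproduced from the group means, standard deviations, and sizes alone, and Eta² can also be recovered from each reported F as Eta² = (F x df_between) / (F x df_between + df_within). The sketch below assumes the tabled SDs are ordinary sample standard deviations; because the means and SDs are rounded to two decimals, the recomputed F values come out close to, but not exactly equal to, 18.97 and 10.06. The `Group` type and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Group:
    label: str
    mean: float
    sd: float
    n: int


def anova_from_summary(groups):
    """One-way ANOVA from group means, SDs, and sizes.

    Returns (F, df_between, df_within, MSE, eta_squared)."""
    n_total = sum(g.n for g in groups)
    grand_mean = sum(g.n * g.mean for g in groups) / n_total
    ss_between = sum(g.n * (g.mean - grand_mean) ** 2 for g in groups)
    ss_within = sum((g.n - 1) * g.sd ** 2 for g in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    mse = ss_within / df_within
    f_stat = (ss_between / df_between) / mse
    eta_sq = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, mse, eta_sq


def eta_sq_from_f(f_stat, df_between, df_within):
    """Eta squared recovered from a reported F ratio and its degrees of freedom."""
    return f_stat * df_between / (f_stat * df_between + df_within)


TABLE_5 = [Group("Mostly Negative", 3.16, 1.17, 32), Group("Mixed or Neutral", 3.57, 1.04, 88),
           Group("Mostly Positive", 4.33, 0.81, 139), Group("Heard Nothing", 4.09, 1.04, 376)]
TABLE_6 = [Group("Mostly Negative", 3.69, 1.08, 95), Group("Mixed or Neutral", 3.85, 1.01, 210),
           Group("Mostly Positive", 4.28, 0.98, 130), Group("Heard Nothing", 4.20, 1.03, 200)]

results = {name: anova_from_summary(groups)
           for name, groups in (("instructor reputation", TABLE_5),
                                ("course reputation", TABLE_6))}
for name, (f_stat, df_b, df_w, mse, eta_sq) in results.items():
    print(f"{name}: F({df_b}, {df_w}) = {f_stat:.2f}, MSE = {mse:.3f}, eta^2 = {eta_sq:.3f}")

print(f"eta^2 from reported F = 18.97: {eta_sq_from_f(18.97, 3, 631):.3f}")
print(f"eta^2 from reported F = 10.06: {eta_sq_from_f(10.06, 3, 631):.3f}")
```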

 

5. Perceived Amount Learned and Ratings of Instruction

Multi-section validity studies are considered by many to be one of the best designs for evaluating the validity of student ratings (Abrami, d’Apollonia, & Rosenfield, 1997; Marsh & Dunkin, 1997). Student learning is the criterion by which ratings of instruction are evaluated in these designs. One should expect that better instructors tend to produce students with more knowledge, and many instructors consider learning or amount learned to be the "true" measure of good teaching.

While external, valid measures of student achievement are not available for the current analysis, the instrument includes one item asking students to assess their perceived level of learning in the course (item 2). This item serves as a proxy for a more objective measure of achievement.

Hypothesis: A positive correlation should be expected between perceived amount learned (item 2) and each item for instructor/instruction.

Correlations for amount learned (item 2) and all instructor/instruction items were calculated and are presented in Table 7.

Table 7. Correlations among Perceived Amount Learned (Item 2) and Instructor/Instruction Items (n = 634)

Item  Description                          2    8    9   10   11   12   13   14   15   16   17   18
2     Perceived Amount Learned           ---
8     Stressed Points                    .51  ---
9     Instructor’s Preparation           .37  .53  ---
10    Encouraged Participation           .41  .55  .51  ---
11    Organization of Material           .40  .55  .71  .53  ---
12    Clarity of Presentation            .48  .63  .65  .60  .72  ---
13    Graded Activities                  .44  .58  .46  .47  .55  .64  ---
14    Instructor Impartial               .32  .45  .51  .45  .56  .50  .52  ---
15    Instructor’s Helpfulness           .41  .53  .51  .64  .57  .60  .59  .64  ---
16    Focused on Objectives              .44  .52  .57  .50  .64  .61  .53  .49  .58  ---
17    Instructor’s Interest in Content   .39  .49  .57  .54  .53  .53  .43  .49  .54  .50  ---
18    Overall Instructor Rating          .53  .63  .61  .63  .66  .73  .63  .59  .74  .62  .61  ---

Note. All correlations statistically significant at the .01 level.

All correlations are positive and statistically significant at the .01 level. The five items with the strongest correlations with perceived amount learned, in rank order, are:

  1. Overall instructor rating (r = .53)
  2. Important points stressed in class (r = .51)
  3. Clarity of presentation of course material (r = .48)
  4. Graded activities covered content taught (r = .44)
  5. Class stayed focused on course objectives (r = .44)

As the above list shows, overall instructor rating demonstrates the strongest correlation with perceived amount learned.
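The ranking above is the first column of Table 7 sorted from strongest to weakest. A minimal sketch, with the correlations typed in from Table 7 and illustrative variable names:

```python
# Correlations of each Instructor/Instruction item with perceived amount
# learned (item 2), copied from the first column of Table 7.
R_WITH_ITEM_2 = {8: .51, 9: .37, 10: .41, 11: .40, 12: .48, 13: .44,
                 14: .32, 15: .41, 16: .44, 17: .39, 18: .53}

# Strongest first. sorted() is stable even with reverse=True, so tied items
# (13 and 16, both .44) keep their item-number order.
ranked = sorted(R_WITH_ITEM_2, key=lambda item: R_WITH_ITEM_2[item], reverse=True)
all_positive = all(r > 0 for r in R_WITH_ITEM_2.values())

print("Top five items:", ranked[:5])
print("All correlations positive:", all_positive)
```

Because items 13 and 16 are tied at .44, their order in the list is a matter of convention rather than a substantive difference.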


Conclusion

Results of the analyses provided here show that scores obtained from the Instructor/Instruction items behave in a manner consistent with expectations. While these analyses are not the best indicators of validity for these scores, the results nevertheless suggest that ratings provided by the revised instrument behave much like ratings from other instruments that have been assessed and documented more thoroughly in the literature on student ratings of instruction. In sum, these preliminary results indicate some degree of validity for scores derived from the Instructor/Instruction items.


References

Abrami, P. C., d'Apollonia, S., & Rosenfield, S. (1997). The dimensionality of student ratings of instruction: What we know and what we do not. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 321-367). New York: Agathon.

Feldman, R. S., & Prohaska, T. (1979). The student as Pygmalion: Effect of student expectation on the teacher. Journal of Educational Psychology, 71, 485-493.

Kelley, H. H. (1950). The warm-cold variable in first impressions of persons. Journal of Personality, 18, 431-439.

Marsh, H. W., & Dunkin, M. J. (1997). Students' evaluations of university teaching: A multidimensional perspective. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 264-320). New York: Agathon.

Marsh, H. W., & Overall, J. U. (1979). Validity of students' evaluations of teaching: A

McClelland, J. N. (1970). The effect of student evaluations of college instruction upon subsequent evaluations. California Journal of Educational Research, 21, 88-95.

McKeachie, W. J. (1997a). Good teaching makes a difference–and we know what it is. In R. P. Perry & J. C. Smart (Eds.), Effective Teaching in Higher Education: Research and Practice (pp. 396-408). New York: Agathon.

McKeachie, W. J. (1997b). Student ratings: The validity of use. American Psychologist, 52, 1219-1225.

Perry, R. P., Abrami, P. C., Leventhal, L., & Check, J. (1979). Instructor reputation: An expectancy relationship involving student ratings and achievement. Journal of Educational Psychology, 71, 776-787.

Perry, R. P., Niemi, R. P., & Jones, K. (1974). Effect of prior teaching evaluations and lecture presentation on ratings of teaching performance. Journal of Educational Psychology, 66, 851-856.

Widmeyer, W. N., & Loy, J. W. (1988). When you're hot, you're hot! Warm-cold effects in first impressions of persons and teaching effectiveness. Journal of Educational Psychology, 80, 118-121.


Last updated 1/24/01.