Faculty Center for Excellence in Teaching

Guidelines for using "Student Input to Teaching Evaluation" (SITE)

 

1997 SITE Committee Report

Provided by the Faculty Center for Excellence in Teaching

Table of Contents

Additional Procedures as of 11/07/02

 


 

Preface to SITE booklet

Since this document was created in 1997 by a university committee (see end of report), some changes in SITE procedure have been introduced.

In addition, the variety of types of courses (e.g., web-based) has increased with concomitant adjustments in evaluation procedures. We decided to provide the original 1997 document with changes indicated and a superscript keyed to information about the changes (see footnotes). Changes may include deletions (indicated by strike-outs), or insertions (indicated by brackets).

Descriptions of additional procedural issues (not part of the committee report) for particular types of classes are found on the last few pages of this booklet.

- Center for Teaching and Learning (renamed as Faculty Center for Excellence in Teaching)


Description


The Student Input to Teaching Evaluation (SITE) questionnaire is given to obtain general information on student perceptions of faculty teaching. The department head is responsible for ensuring that a disinterested third party administers the questionnaire to students. The student rating sheets are forwarded to the Office of Institutional Research and are machine scored.
Faculty members will receive summary results relating to university and departmental core items, individually selected items2, and transcripts of any student comments. (Comments are being transcribed at the request of various student representatives to past committees on evaluation.) Department heads will receive only summary results relating to university and departmental core items and student comments. Deans, the Vice-President for Academic Affairs, and the President may request appraisal materials from the department heads. Individual faculty will have access to a record describing such a request.


Use of the SITE

The SITE questionnaire serves two purposes. The first is as an administrative tool to assist in annual or periodic personnel actions. For this purpose faculty may be compared to one another. (The following section provides general information on the most scientifically and legally appropriate way to accomplish this.)
The second purpose is for personal improvement in the classroom. Under this circumstance, the individual faculty member may select or generate an additional set of items pertinent to personal goals in the classroom.2


Development of the SITE

The SITE is the product of periodic review by faculty committees with student representatives. The original source of questions was a pool of items from Purdue. The pool was originally selected in 1977 because it was the most flexible instrument available. The system was reviewed again in the 1980s and again selected. At that time the tool was to be used for personal development, not administrative evaluation. Another committee recommended it again a few years later, finding no other system to be any better. The latest committee review [1997] was done in response to the New Level document recommendations. Of seventeen benchmark and Kentucky schools contacted in 1996, the vast majority had designed their own tool. A check of the Mental Measurements Yearbook, a review source for tests, revealed no comparable instrument with established validity and reliability. The instrument will be evaluated periodically and modified if needed to improve its psychometric qualities.


Components of the SITE

The SITE may routinely include up to four [two]2 sets of items:
1. A required common group of items is included on all questionnaires (see Fig. 1). These were selected by a representative faculty committee.
2. Each department may generate a set of items for its faculty.
3. There are optional items that a faculty member may choose to include. A list of these items is available from the department head. Faculty may want to keep a list of the identifying numbers for their favorite choices to make preparing the request form easier from year to year. The request form is sent to faculty early in the fall semester. 2
4. Faculty may also create their own items. To do so, select items 193, 194, and 195 from the master list. On the day of the evaluation provide students with three questions numbered 1, 2 and 3. 2

Figure 1: Items Included on All SITE Questionnaires

My instructor displays a clear understanding of course topics.
My instructor displays interest in teaching this class.
My instructor is well prepared for class.
Performance measures (exams, assignments, etc.) are well-constructed.
My instructor is actively helpful.
Overall, my instructor is effective.
[My instructor treats me fairly with regard to race, age, sex, religion, national origin, disability, and sexual orientation.]1


Interpretation

The appropriate evaluation of teaching should include multiple measures, in addition to student ratings, in order to effectively measure the complex act of teaching and to avoid biases inherent in any single measure. For example, additional measures could be samples of syllabi or exams, self or peer observations, knowledge of the field, course decisions, long term outcomes, etc. Students as a group, however, experience the instructor most directly for the greatest amount of time and have important information to contribute to the complex task of evaluating teaching.

According to reliability data collected at Western, student ratings tend to be highly consistent over time (Budget & Management Information Office [Office of Institutional Research]).3 However, the standard error of measurement of the instrument is such that small differences between individuals are meaningless. Ratings are useful for identifying the general cluster of teachers who are perceived by students as effective or ineffective, but finer distinctions are not appropriate. For example, there is no meaningful difference between a 4.23 and a 3.94 based upon variability displayed on prior assessments at WKU. Variables such as student motivation for the class and the amount of variability in the student responses should be considered. It is inappropriate and inaccurate to rank order faculty by student rating and assign dollar amounts by fine gradations in order. The psychometric qualities of the scale do not support that use. The 1995 New Level committee on student ratings recommended that the administration receive appropriate and ongoing training in the interpretation of teaching evaluation measures to aid them in appropriately evaluating faculty. [For an explanation of the mathematical terms used on a SITE report, see section on “Understanding Terms.”]

The number of raters can alter the interpretation of ratings. Cashin (1995) recommended that when fewer than 10 students respond to an item, any interpretation be made with particular caution. For personnel decisions, Cashin recommended using ratings from 5 courses, including 2 or more courses (with at least 15 students responding in each) from each type of term, over at least 2 years.
(Cashin, William E. Student Ratings of Teaching: The Research Revisited. IDEA Paper No. 32, Center for Faculty Evaluation & Development, Kansas State University, September 1995.)


Understanding Terms

Figure 2 below contains an example of the summary statistics associated with the ratings on a particular item. When faced with a group of numbers from a measure, one of the most reasonable first steps is to try to describe the collective characteristics of the numbers. One of the first questions you might ask is, “What is a typical number?” But what does “typical” mean? It could be the average number, more precisely known as the mean (the third column from the end). The mean is obtained by summing all the numbers and dividing by the total number of numbers. The mean has several advantages (e.g., every number is included in the calculation) but it has a significant disadvantage. The mean is very sensitive to extreme scores. One very atypical score can substantially alter the mean. For the numbers 1,5,5,5,5,5,5,5 the mean is 4.5.
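As a rough illustration of how one extreme score moves the mean, the short Python sketch below recomputes the example above with and without the atypical rating of 1. It uses only the standard library; the variable names are ours and are not part of any SITE report.

```python
from statistics import mean

# Eight ratings from the example above: one atypical 1 and seven 5s.
ratings = [1, 5, 5, 5, 5, 5, 5, 5]

mean_with_extreme = mean(ratings)          # sum of 36 divided by 8 ratings
mean_without_extreme = mean(ratings[1:])   # the seven 5s only

print(mean_with_extreme)     # 4.5
print(mean_without_extreme)  # 5
```

A single low rating out of eight is enough to pull the mean down by half a point.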

Figure 2. Example of Item Summary Statistics
  SA      A       N       D       SD                Lower    Mean      Upper    Std.
  (5)     (4)     (3)     (2)     (1)     Median    Bound    of Item   Bound    Dev.
  ---     ---     ---     ---     ---     ------    -----    -------   -----    ----
  20      3       1       0       0       4.90      4.63     4.79      4.95     0.51
  83.3%   12.5%   4.2%    0.0%    0.0%
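The percentages and the item mean in Figure 2 can be recovered from the response counts. The sketch below is illustrative only: it assumes the scoring of SA=5 through SD=1 shown in the figure header, and the names `counts`, `percentages`, and `item_mean` are hypothetical.

```python
# Response counts from Figure 2, keyed by the score for each response option.
counts = {5: 20, 4: 3, 3: 1, 2: 0, 1: 0}

total = sum(counts.values())  # 24 students responded

# Share of respondents choosing each option, as a percentage.
percentages = {score: round(100 * n / total, 1) for score, n in counts.items()}

# Mean of the item: each score weighted by how many students chose it.
item_mean = sum(score * n for score, n in counts.items()) / total

print(percentages)           # {5: 83.3, 4: 12.5, 3: 4.2, 2: 0.0, 1: 0.0}
print(round(item_mean, 2))   # 4.79
```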


The median is another way to describe what is typical. It is not as sensitive to extremes. To calculate the median you find the point at which half the scores fall above and half fall below. For example, for the set 1,2,4,5 the median is 3 (even though 3 does not appear in the set). Two numbers fall above the 3 and two fall below. When there are duplicate cases (for example, 20 “5s”) the computations become more complex. If you are interested in further information there are a number of persons on campus who work with statistics who could provide assistance.
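The simple case above, and one common way of handling duplicate ratings, can be sketched in Python. The booklet does not say which method the SITE report uses when many students give the same rating; the grouped-data interpolation in the standard library's `statistics.median_grouped` is shown here as an assumption, though with the Figure 2 counts it does give the 4.90 reported there.

```python
from statistics import median, median_grouped

# Simple case from the text: two scores fall below 3 and two fall above.
simple = median([1, 2, 4, 5])

# Figure 2 ratings expanded from counts: twenty 5s, three 4s, one 3.
ratings = [5] * 20 + [4] * 3 + [3] * 1

# Treats each rating k as covering the interval k-0.5 to k+0.5 and
# interpolates within the interval that contains the middle response.
grouped = median_grouped(ratings, interval=1)

print(simple)             # 3.0
print(round(grouped, 2))  # 4.9
```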


Under ideal circumstances, the mean and median are identical. When you have more numbers at one extreme or the other, then the measures of central tendency will be different. When the mean is lower than the median, a few atypically low numbers might be pulling the mean out of balance. When the mean is higher, then a few atypically high ratings are having an effect.

The second question you might ask about your numbers is “How variable are the numbers?” You may have two sets of numbers with identical means and medians (e.g., 6,6,6 and 1,6,11) but the first set is very uniform and the second is quite variable. By itself, neither the mean nor median fully describes these sets of numbers. To describe variability a measure known as the standard deviation (std.dev.) is used. It is essentially the average (standard) distance (deviation) that numbers are from the mean. In the first set of 3 numbers, the distance is small and the standard deviation will be small. In the second set, the sd will be large. Another way to think of the sd is how far from the mean the scores are spread. A small number indicates they are close together, a large number indicates scores are spread out and away from the mean.
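A minimal Python sketch of the comparison above follows. It uses the sample standard deviation (dividing by n - 1), which is an assumption on our part; the booklet does not state which formula the SITE report uses, although the sample formula does reproduce the 0.51 shown in Figure 2.

```python
from statistics import mean, stdev

uniform = [6, 6, 6]
variable = [1, 6, 11]

# Both sets have the same mean, but very different spread.
print(mean(uniform), mean(variable))   # 6 6
print(stdev(uniform))                  # 0.0
print(stdev(variable))                 # 5.0

# Figure 2 ratings: twenty 5s, three 4s, one 3.
figure2 = [5] * 20 + [4] * 3 + [3]
print(round(stdev(figure2), 2))        # 0.51
```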

For interpreting student ratings, the implication of a small sd is that the raters agreed: there was not much variation from one rater to the next. A large sd implies that the raters disagreed; perhaps the instructor appeals strongly to one type of student and that same style is aversive to another type of student.

Two other terms are used: Lower Bound and Upper Bound. These are the ends (range) of the confidence interval. The confidence interval is centered on the mean. Every measurement has error in it. These boundaries are a way of expressing how much error is in a particular measure. A wide range from lower to upper bound indicates more error; a narrower range means less. The bounds are calculated using the Standard Error of Measurement (SEM), which is based on the reliability estimate (an index of consistency in the questionnaire) and the standard deviation for the item. In combination, these terms estimate the upper and lower limits of the confidence interval. The practical implication of a confidence interval is that, for different faculty members within the same department, ratings that fall within the same confidence interval are statistically equivalent. If the range were 4.2 to 4.5 and Terry received a 4.2 mean rating while Leslie received a 4.5, they should be treated as equivalent. As a rough indicator, a difference between any two ratings of less than .3 (4.5 - 4.2) is not a “real” difference based upon data collected at Western (Budget & Management Information Office [Office of Institutional Research]).3 The difference between instructors is likely to evaporate when assessed again. Use the range calculated for your unit.
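The booklet does not give the formula behind the bounds. One common textbook definition is SEM = sd * sqrt(1 - reliability), with bounds at the mean plus or minus some multiple of the SEM. The sketch below assumes that definition; the reliability of 0.90 and the multiplier of 1 are hypothetical values chosen for illustration, not figures supplied by the Office of Institutional Research, and the function names are ours.

```python
import math

def confidence_bounds(mean, sd, reliability, multiplier=1.0):
    """Return (lower, upper) bounds around an item mean.

    Assumes SEM = sd * sqrt(1 - reliability); this is a textbook
    formula, not necessarily the one used for SITE reports.
    """
    sem = sd * math.sqrt(1.0 - reliability)
    return mean - multiplier * sem, mean + multiplier * sem

def within_bounds(rating, bounds):
    """True if a rating falls inside the (lower, upper) interval."""
    lower, upper = bounds
    return lower <= rating <= upper

# Mean and sd from Figure 2; the reliability is a hypothetical value.
bounds = confidence_bounds(mean=4.79, sd=0.51, reliability=0.90)
print(tuple(round(b, 2) for b in bounds))   # (4.63, 4.95)

# A colleague's 4.70 falls inside the interval, so the two ratings
# would be treated as equivalent under the guidance above.
print(within_bounds(4.70, bounds))          # True
```

Under this formula a larger sd or a lower reliability widens the interval, which matches the description of the bounds above.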

One other caution: simply subtracting each faculty member's mean rating from another faculty member's rating is not justifiable, statistically. Large differences will occur by chance and are not necessarily meaningful. For best use in evaluation, a teaching rating should be considered as a gross measure over time and situations for an individual person in light of other information about performance. Simple comparisons are not justified.

You are likely to hear two other terms with regard to ratings: reliability and validity. Reliability refers to the consistency of measurement. There are several types of reliability. The confidence interval is one way of estimating reliability of measurement. Validity asks whether the rating is actually measuring what it claims to measure. This is a much more difficult question to answer. A student rating is only one piece of information about the complex act of teaching which is why a number of sources need to be used to evaluate teaching effectiveness.


Example of a Self-Report Form:

Individual faculty may wish to use a report form similar to the following to provide additional information to administration in the interpretation of student ratings in any particular semester. Individuals may also find the form useful for personal development. This form, which may be copied, is based on one in Fink, L., “Improving the Evaluation of College Teaching,” in Wadsworth, E. (Ed.), A Handbook for New Practitioners, The Professional & Organizational Development (POD) Network in Higher Education, 1988. It has been further adapted here for our needs.


Faculty Report on Teaching


Professor_____________________ Term____________________
Course ______________________ Enrollment _______________

Factors
1. The quality of the students in this course this semester was:
(Circle One) Excellent Good Average Fair Poor
Comments:

2. With what level of motivation did students begin the term (e.g., was this a required course?)


3. What is your honest assessment of your own effectiveness as a teacher in this course? Were there any personal or professional situations that significantly affected your performance?


4. Were there any other factors (positive or negative) that affected either the effectiveness of the course or your performance as a teacher (e.g., new textbook, new objectives, etc.)?

General
A. My general assessment of my teaching in this course, compared to other courses I have taught is:
(Circle one) Excellent Good Average Fair Poor
Comments:

B. The grade distribution for this course was:
A__ B__ C__ D__ F__ FN__ P__ X__ W__ AU__ NG__

 


Thank You:

Thanks are expressed to the following individuals who have contributed editing, oversight, or review of this document at various stages in its development:
John Bruni
Barbara Burch
Cathy Carey
Robert Cobb
Patricia Daniel
Carol Graham
Wayne Hoffman
Marvin Leavy
M.B. Lucas
Wayne Mason
Carl Martray
Joe Millichap
John Petersen
Betsy Shoenfelt
Larry Snyder
Jeffrey Yan, student representative
—Sally Kuhlenschmidt


Additional Procedures as Administered as of 11/07/02.

Face-to-Face Classes
The Office of Institutional Research (OIR) manages the distribution and scoring of SITE forms. If fewer than 5 students are enrolled in a class roughly two months before the end of the term (when the course list is officially closed), OIR does not collect SITE forms. Exceptions, mostly ITV sections, are made at the request of the department head. Original forms and statistical databases are retained for only 60 days following delivery of the reports in case of questions or requests. The data are then destroyed.

 


Extended Campus Instructors
SITE evaluations are officially sent to the department head, not to the Extended Campus director. There may be sharing of information with the Extended Campus director under certain circumstances. --Department Head Retreat Summer 2002.


ITV Classrooms
The Bowling Green sections follow the same procedures as other on-campus classes. For off-campus sections, SITE forms (identical to the on-campus SITE) are distributed through the Extended Campus offices.


Correspondence Classes

Starting in Summer 2002, the Correspondence Office administers an internal evaluation consisting of 4 questions about the instructor, 5 about course content, and 3 about staff and office materials, plus a place for additional comments. The proctor gives it to students during the final examination. The pertinent part of the information will be forwarded to instructors. To view a copy of the questionnaire, ask the Correspondence Office (4158).


Totally web-delivered classes


In July 2000, the ReachU committee approved the web-delivered course evaluation form and process developed by the Student Rating Instrument subcommittee (Bob Berkhofer, Allan Heaps, Sally Kuhlenschmidt, Leroy Metze, John Stallard, Linda Todd (Chair), Carol Wilson). Academic Technology devised an electronic submission process that was piloted in Fall 2000.

Academic Technology offers online student ratings instruments to all students in all courses designated as web courses on Topnet. Faculty members go to: http://atech.wku.edu/sri/instructor/. Faculty will receive an emailed notice during the semester detailing how they can add personalized questions to the standard instruments. The link to each instrument is emailed to the students, who can click on the link and respond. Students can write in their comments on the web form, just as they do for the SITE in face-to-face classes. After grades have been turned in, faculty are again emailed directions on how to access the instrument results.

Just as for the SITE, the information is not maintained beyond the 60-day period after e-mail notification of availability to faculty and department heads. Faculty and department heads will be expected to maintain copies printed from the web site.

The items on the web version are:

  • Overall, my instructor is effective.
  • The instructor challenged students to learn.
  • The instructor clearly outlined the time frame to complete course activities/tasks/assignments.
  • The instructor encouraged students to be actively engaged with course materials/activities.
  • The instructor provided feedback within time frame specified in course materials.
  • The instructor treated me fairly without regard to race, age, gender, religion, or disability.
  • The instructor was readily available for interaction with students.

Additional Caution

There is no data at this time to support validity of inferences made from direct comparison of student ratings across the different assessment tools used in face-to-face versus web-delivered courses. It would be questionable at best to rank order faculty by comparing student ratings on these different instruments. As should be true across the variety of face-to-face instructional activities, evaluation of web-delivered instruction will entail professional judgement and consideration of factors such as the nature of the students and the particular challenges of the various teaching tasks for that course. It is recommended that multiple sources of information be considered such as instructor responsiveness to students, the nature of exams, and the clarity of learning objectives.


Footnotes


1. This item was recommended by the Ethnic Relations Task Force Plan of Action submitted to President Ransdell in December 1998. The Task Force composed the new item and it was added to the SITE in Fall of 1999 based on their recommendation (made May 14, 1999). The current (Fall 2002) iteration of this committee is the University Diversity Advisory Committee, which is appointed by President Ransdell. (Information provided by John Hardin, Co-Chair of the UDA committee.)

2. These changes relate to a request by the Office of Institutional Research with the concurrence of the Council of Academic Deans. As of Fall 2000 the decision was to drop optional items because of a change in the hardware (the end of the IBM mainframes) and in the capacity to manage the cafeteria items, and because use of the optional items was low and decreasing over the years, particularly as departments adopted departmental core sets. In the last year, fewer than 2% of sections were using them. (Information provided by Jay Sloan of the Office of Institutional Research.) Many faculty create their own forms and handle the personal development process separately from the SITE.

 

 
