
Updated April, 2006
Guidelines ... for using
Student Input to Teaching Evaluation (SITE)
1997 SITE Committee Report
Provided by the Faculty Center for Excellence in Teaching
Table of Contents
Additional Procedures as of 11/07/02
Since this document was created in 1997 by a university committee (see end of report), some changes in SITE procedure have been introduced.
In addition, the variety of types of courses (e.g., web-based) has increased with concomitant adjustments in evaluation procedures. We decided to provide the original 1997 document with changes indicated and a superscript keyed to information about the changes (see footnotes). Changes may include deletions (indicated by strike-outs), or insertions (indicated by brackets).
Descriptions of additional procedural issues (not part of the committee report) for particular types of classes are found on the last few pages of this booklet.
- Center for Teaching and Learning
(renamed as Faculty Center for Excellence in Teaching)
| FaCET Homepage | WKU Homepage | Top of Page | Booklet Index |
The Student Input to Teaching Evaluation (SITE) questionnaire is given for the
purpose of obtaining general information on student perceptions of faculty teaching.
The department head is responsible for assuring a process whereby a disinterested
third party will administer the questionnaire to students. The student rating
sheets are forwarded to the Office of Institutional Research and are machine scored.
Faculty members will receive summary results relating to university and departmental
core items, individually selected items (see footnote 2), and transcripts of any student
comments. (Comments are being transcribed at the request of various student
representatives to past committees on evaluation.) Department heads will receive
only summary results relating to university and departmental core items and
student comments. Deans, the Vice-President for Academic Affairs and the President
may request appraisal materials from the department heads. Individual faculty
will have access to a record describing such a request.
The SITE questionnaire serves
two purposes. The first is as an administrative tool to assist in annual or
periodic personnel actions. For this purpose faculty may be compared to one
another. (The following section provides general information on the most scientifically
and legally appropriate way to accomplish this.)
The second purpose is for personal improvement in the classroom. For this
purpose, the individual faculty member may select or generate an additional
set of items pertinent to personal goals in the classroom (see footnote 2).
The SITE is the product of periodic review by faculty committees with student representatives. The original source of questions was a pool of items from Purdue. The pool was originally selected in 1977 because it was the most flexible instrument available. The system was reviewed again in the 1980s and again selected. At that time the tool was to be used for personal development, not administrative evaluation. Another committee recommended it again a few years later, finding no other system to be any better. The latest committee review [1997] was done in response to the New Level document recommendations. Of seventeen benchmark and Kentucky schools contacted in 1996, the vast majority had designed their own tool. A check of the Mental Measurements Yearbook, a review source for tests, revealed no comparable instrument with established validity and reliability. The instrument will be evaluated periodically and modified if needed to improve its psychometric qualities.
The SITE may routinely include
up to four [two] sets of items (see footnote 2):
1. A required common group of items is included on all questionnaires (see Fig.
1). These were selected by a representative faculty committee.
2. Each department may generate a set of items for its faculty.
3. There are optional items that a faculty member may choose to include.
A list of these items is available from the department head. Faculty may want
to keep a list of the identifying numbers for their favorite choices to make
preparing the request form easier from year to year. The request form is sent
to faculty early in the fall semester. (See footnote 2.)
4. Faculty may also create their own items. To do so, select items 193, 194,
and 195 from the master list. On the day of the evaluation, provide students
with three questions numbered 1, 2, and 3. (See footnote 2.)
My instructor displays a clear understanding of course topics.
The appropriate evaluation of teaching should include multiple measures, in addition to student ratings, in order to effectively measure the complex act of teaching and to avoid biases inherent in any single measure. For example, additional measures could be samples of syllabi or exams, self or peer observations, knowledge of the field, course decisions, long term outcomes, etc. Students as a group, however, experience the instructor most directly for the greatest amount of time and have important information to contribute to the complex task of evaluating teaching.
According to reliability
data collected at Western, student ratings tend to be highly consistent over
time (Office of Institutional Research, formerly the Budget & Management
Information Office; see footnote 3). However, the standard error of measurement of the instrument is such
that small differences between individuals are meaningless. Ratings are useful
for identifying the general cluster of teachers who are perceived by students
as effective or ineffective, but finer distinctions are not appropriate. For
example, there is no meaningful difference between a 4.23 and a 3.94 based upon
variability displayed on prior assessments at WKU. Variables such as student
motivation for the class and the amount of variability in the student responses
should be considered. It is inappropriate and inaccurate to rank order faculty
by student rating and assign dollar amounts by fine gradations in order. The
psychometric qualities of the scale do not support that use. The 1995 New Level
committee on student ratings recommended that the administration receive appropriate
and ongoing training in the interpretation of teaching evaluation measures to
aid them in appropriately evaluating faculty. [For an explanation of the mathematical
terms used on a SITE report, see section on “Understanding Terms.”]
The number of raters can
alter the interpretation of ratings. Cashin (1995) recommended that when fewer than
10 students respond to an item, any interpretation be made with particular
caution. For use in personnel decisions, Cashin recommended that ratings
be used from 5 courses, including 2 or more courses (with at least 15 students
responding in each) from each type of term, over at least 2 years.
(Cashin, William E., Student Ratings of Teaching: The Research Revisited. IDEA
Paper No. 32, a publication of the Center for Faculty Evaluation & Development,
Kansas State University, September 1995.)
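The thresholds in Cashin's recommendation can be written down as a simple screening check. The sketch below is one possible reading, not an official procedure: the `CourseRating` record, the term labels, and the interpretation of "each type of term" as "every term type that appears in the record" are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Thresholds taken from the Cashin (1995) recommendation quoted above.
MIN_ITEM_RESPONDENTS = 10    # below this, interpret an item with particular caution
MIN_COURSE_RESPONDENTS = 15  # a course counts toward personnel use only at or above this
MIN_COURSES = 5
MIN_COURSES_PER_TERM_TYPE = 2
MIN_YEARS = 2


@dataclass
class CourseRating:
    """Hypothetical record for one rated course section."""
    year: int         # calendar or academic year of the rating
    term: str         # hypothetical label such as "fall" or "spring"
    respondents: int  # number of students who responded


def needs_caution(respondents: int) -> bool:
    """True when too few students responded for confident interpretation."""
    return respondents < MIN_ITEM_RESPONDENTS


def enough_for_personnel_use(ratings: list[CourseRating]) -> bool:
    """Assumed reading: at least 5 usable courses, at least 2 per term type
    that appears in the record, spread over at least 2 different years."""
    usable = [r for r in ratings if r.respondents >= MIN_COURSE_RESPONDENTS]
    per_term = Counter(r.term for r in usable)
    years = {r.year for r in usable}
    return (
        len(usable) >= MIN_COURSES
        and all(n >= MIN_COURSES_PER_TERM_TYPE for n in per_term.values())
        and len(years) >= MIN_YEARS
    )


record = [
    CourseRating(2004, "fall", 20),
    CourseRating(2005, "spring", 16),
    CourseRating(2005, "fall", 18),
    CourseRating(2005, "fall", 15),
    CourseRating(2006, "spring", 25),
]
print(needs_caution(9))                  # True
print(enough_for_personnel_use(record))  # True
```

A check like this only screens for sample size. It says nothing about how the ratings themselves should be weighed against other evidence of teaching.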
Figure 2 below contains an example of the summary statistics associated with the ratings on a particular item. When faced with a group of numbers from a measure, one of the most reasonable first steps is to try to describe the collective characteristics of the numbers. One of the first questions you might ask is, “What is a typical number?” But what does “typical” mean? It could be the average number, more precisely known as the mean (the third column from the end). The mean is obtained by summing all the numbers and dividing by the total number of numbers. The mean has several advantages (e.g., every number is included in the calculation) but it has a significant disadvantage. The mean is very sensitive to extreme scores. One very atypical score can substantially alter the mean. For the numbers 1, 5, 5, 5, 5, 5, 5, 5 the mean is 4.5, even though seven of the eight numbers are 5.
Figure 2. Example of Item Summary Statistics

            SA      A      N      D     SD             Lower    Mean of   Upper    Std.
           (5)    (4)    (3)    (2)    (1)   Median    Bound     Item     Bound    Dev.
           ---    ---    ---    ---    ---   ------    -----    -------   -----    ----
Count       20      3      1      0      0    4.90     4.63      4.79     4.95     0.51
Percent   83.3%  12.5%   4.2%   0.0%   0.0%
The median is another way to describe what is typical. It is not as sensitive
to extremes. To calculate the median you find the point at which half the scores
fall above and half fall below. For example, for the set 1,2,4,5 the median
is 3 (even though 3 does not appear in the set). Two numbers fall above the
3 and two fall below. When there are duplicate cases (for example, 20 “5s”)
the computations become more complex. If you are interested in further information
there are a number of persons on campus who work with statistics who could provide
assistance.
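For readers who want to check the arithmetic, the following sketch recomputes the mean and median examples above with Python's standard `statistics` module. The interpolated ("grouped") median is an assumption about how the Figure 2 median was produced. It is used here only because it happens to reproduce the 4.90 shown in the figure from the rating counts 20, 3, 1, 0, 0.

```python
from statistics import mean, median, median_grouped

# One atypical score pulls the mean down; the median is unaffected.
skewed = [1, 5, 5, 5, 5, 5, 5, 5]
print(mean(skewed))    # 4.5
print(median(skewed))  # 5.0

# Half the scores fall above 3 and half below, although 3 is not in the set.
print(median([1, 2, 4, 5]))  # 3.0

# Rating counts from Figure 2: score -> number of students.
figure2_counts = {5: 20, 4: 3, 3: 1, 2: 0, 1: 0}
ratings = [score for score, n in figure2_counts.items() for _ in range(n)]

figure2_mean = round(mean(ratings), 2)

# With many duplicate ratings, a grouped median treats each rating k as the
# interval k - 0.5 to k + 0.5 and interpolates within the middle interval.
# (Assumed method; it matches the figure but is not confirmed by the report.)
figure2_median = round(median_grouped(ratings, interval=1), 2)

print(figure2_mean, figure2_median)  # 4.79 4.9
```

The mean falling below the median here reflects the single rating of 3 and the three ratings of 4 pulling the average down from the dominant rating of 5.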
Under ideal circumstances, the mean and median are identical. When you have
more numbers at one extreme or the other, then the measures of central tendency
will be different. When the mean is lower than the median, a few atypically
low numbers might be pulling the mean out of balance. When the mean is higher,
then a few atypically high ratings are having an effect.
The second question you might ask
about your numbers is “How variable are the numbers?” You may have two sets
of numbers with identical means and medians (e.g., 6,6,6 and 1,6,11) but the
first set is very uniform and the second is quite variable. By itself, neither
the mean nor median fully describes these sets of numbers. To describe variability
a measure known as the standard deviation (std.dev.) is used. It is essentially
the average (standard) distance (deviation) that numbers are from the mean.
In the first set of 3 numbers, the distance is small and the standard deviation
will be small. In the second set, the sd will be large. Another way to think
of the sd is how far from the mean the scores are spread. A small number indicates
they are close together, a large number indicates scores are spread out and
away from the mean.
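The two small sets above, and the counts in Figure 2, can be checked with a few lines of Python. This is a sketch using the sample standard deviation (dividing by n - 1). That choice is an assumption, made because it reproduces the 0.51 shown in Figure 2, whereas the population formula gives 0.50.

```python
from statistics import mean, median, stdev

uniform = [6, 6, 6]
spread = [1, 6, 11]

# Identical mean and median, very different variability.
print(mean(uniform), median(uniform), stdev(uniform))  # 6 6 0.0
print(mean(spread), median(spread), stdev(spread))     # 6 6 5.0

# Figure 2 ratings: twenty 5s, three 4s, one 3.
figure2_ratings = [5] * 20 + [4] * 3 + [3]
figure2_sd = round(stdev(figure2_ratings), 2)
print(figure2_sd)  # 0.51
```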
For interpreting student ratings,
the implication of a small sd is that the raters agreed: there was not much
variation from one rater to the next. A large sd implies that the raters disagreed; perhaps
the instructor appeals strongly to one type of student and that same style is
aversive to another type of student.
Two other terms are used: Lower
Bound and Upper Bound. These are the ends (range) of the confidence interval.
The confidence interval is centered on the mean. Every measurement has error
in it. These boundaries are a way of expressing how much error is in a particular
measure. A wide range from lower to upper bound indicates more error; a narrower
range means less. The bounds are calculated using the Standard Error of Measurement
(SEM) which is based on the reliability estimate (an index of consistency in
the questionnaire) and the standard deviation for the item. In combination,
these terms estimate the upper and lower limits of the confidence interval.
The practical implication of a confidence interval is that, for different faculty
members within the same department, ratings that overlap within the confidence
interval are statistically equivalent. If the range was 4.2 to 4.5 and Terry
received a 4.2 mean rating while Leslie received a 4.5, they should be treated
as equivalent. As a rough indicator, a difference between any two ratings of
less than .3 (4.5 - 4.2) is not a “real” difference based upon data collected
at Western (Office of Institutional Research, formerly the Budget & Management
Information Office; see footnote 3). The difference between instructors is likely to evaporate when
assessed again. Use the range calculated for your unit.
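The relationship between the standard deviation, the reliability estimate, and the bounds can be illustrated with the usual textbook formula for the Standard Error of Measurement, SEM = sd × sqrt(1 − reliability). The report does not state the reliability value or the interval width actually used, so both are assumptions in the sketch below: a reliability of 0.90 and bounds of one SEM on either side of the mean were chosen only because they reproduce the Figure 2 bounds of 4.63 and 4.95.

```python
from math import sqrt


def standard_error_of_measurement(std_dev: float, reliability: float) -> float:
    """Textbook SEM: std_dev * sqrt(1 - reliability), reliability in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return std_dev * sqrt(1.0 - reliability)


def confidence_bounds(mean_rating, std_dev, reliability, sem_multiplier=1.0):
    """Lower and upper bounds centered on the mean.
    sem_multiplier is a hypothetical parameter: how many SEMs wide each side is."""
    half_width = sem_multiplier * standard_error_of_measurement(std_dev, reliability)
    return mean_rating - half_width, mean_rating + half_width


def treated_as_equivalent(rating_a, rating_b, lower, upper):
    """Two ratings that both fall inside the same interval are treated as equivalent."""
    return lower <= rating_a <= upper and lower <= rating_b <= upper


ASSUMED_RELIABILITY = 0.90  # illustrative value, not taken from the report

lower, upper = confidence_bounds(4.79, 0.51, ASSUMED_RELIABILITY)
print(round(lower, 2), round(upper, 2))  # 4.63 4.95

# Terry (4.2) and Leslie (4.5) with a unit range of 4.2 to 4.5.
print(treated_as_equivalent(4.2, 4.5, 4.2, 4.5))  # True
```

Lower reliability or a larger standard deviation widens the interval, which is why the range calculated for your own unit should be used rather than the illustrative numbers here.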
One other caution: simply subtracting
each faculty member’s mean rating from another faculty member’s rating is not
justifiable, statistically. Large differences will occur by chance and are not
necessarily meaningful. For best use in evaluation, a teaching rating should
be considered as a gross measure over time and situations for an individual
person in light of other information about performance. Simple comparisons are
not justified.
You are likely to hear two other terms with regard to ratings: reliability and validity. Reliability refers to the consistency of measurement. There are several types of reliability. The confidence interval is one way of estimating reliability of measurement. Validity asks whether the rating is actually measuring what it claims to measure. This is a much more difficult question to answer. A student rating is only one piece of information about the complex act of teaching which is why a number of sources need to be used to evaluate teaching effectiveness.
Individual faculty may wish to use a report form similar to the following to provide additional information to administration in the interpretation of student ratings in any particular semester. Individuals may also find the form useful for personal development. This form, which may be copied, is based on one in Fink, L. “Improving the Evaluation of College Teaching,” from Wadsworth, E., (Ed.), A Handbook for New Practitioners. The Professional & Organizational Development (POD) Network in Higher Education, 1988, and has been further altered here for our needs.
Professor_____________________ Term____________________
Course ______________________ Enrollment _______________
Factors
1. The quality of the students in this course this semester was:
(Circle One) Excellent Good Average Fair Poor
Comments:
2. With what level of motivation did students begin the term (e.g., was this a required course?)
3. What is your honest assessment of your own effectiveness as a teacher in
this course? Were there any personal or professional situations that significantly
affected your performance?
4. Were there any other factors (positive or negative) that affected either
the effectiveness of the course or your performance as a teacher (e.g., new
textbook, new objectives, etc.)?
General
A. My general assessment of my teaching in this course, compared to other courses
I have taught is:
(Circle one) Excellent Good Average Fair Poor
Comments:
B. The grade distribution for this
course was:
A__ B__ C__ D__ F__ FN__ P__ X__ W__ AU__ NG__
Thanks are expressed to
the following individuals who have contributed editing, oversight, or review
of this document at various stages in its development:
John Bruni
Barbara Burch
Cathy Carey
Robert Cobb
Patricia Daniel
Carol Graham
Wayne Hoffman
Marvin Leavy
M.B. Lucas
Wayne Mason
Carl Martray
Joe Millichap
John Petersen
Betsy Shoenfelt
Larry Snyder
Jeffrey Yan, student representative
—Sally Kuhlenschmidt
Face-to-Face Classes
The Office of Institutional Research (OIR) manages the distribution and scoring of
SITE forms. If a class has fewer than 5 students enrolled roughly two months
before the end of the term (when the course list is officially closed), then
OIR does not collect SITE forms. There are exceptions based on the request of the
department head. These are mostly ITV sections. Original forms and statistical
databases are retained for only 60 days following delivery of the reports in
case of questions or requests. The data are then destroyed.
Starting in Summer 2002, the Correspondence Office administers an internal evaluation consisting of 4 questions about the instructor, 5 about course content, and 3 about staff and office materials, plus a place for additional comments. The proctor gives it to students during the final examination. The pertinent part of the information will be forwarded to instructors. To view a copy of the questionnaire, ask the Correspondence Office (4158).
Totally web-delivered classes
In July 2000, the ReachU committee
approved the web-delivered course evaluation form and process developed by the
Student Rating Instrument subcommittee (Bob Berkhofer, Allan Heaps, Sally Kuhlenschmidt,
Leroy Metze, John Stallard, Linda Todd [Chair], Carol Wilson). Academic Technology
devised an electronic submission process that was piloted in Fall 2000.
Academic Technology offers online student ratings instruments to all students in all courses designated as web courses on Topnet. Faculty members go to: http://atech.wku.edu/sri/instructor/. Faculty will receive an emailed notice during the semester detailing how they can add personalized questions to the standard instruments. The link to each instrument is emailed to the students, who can click on the link and respond. Students can write in their comments on the web form, just as they do for the SITE in face-to-face classes. After grades have been turned in, faculty are again emailed directions on how to access the instrument results.
Just as for the SITE, the information is not maintained beyond the 60-day period after e-mail notification of availability to faculty and department heads. Faculty and department heads will be expected to maintain copies printed from the web site.
The items on the web version are:
Additional Caution
There is no data at this time to
support validity of inferences made from direct comparison of student ratings
across the different assessment tools used in face-to-face versus web-delivered
courses. It would be questionable at best to rank order faculty by comparing
student ratings on these different instruments. As should be true across the
variety of face-to-face instructional activities, evaluation of web-delivered
instruction will entail professional judgement and consideration of factors
such as the nature of the students and the particular challenges of the various
teaching tasks for that course. It is recommended that multiple sources of information
be considered such as instructor responsiveness to students, the nature of exams,
and the clarity of learning objectives.
1. This item was recommended by the
Ethnic Relations Task Force Plan of Action submitted to President Ransdell in
December 1998. The Task Force composed the new item and it was added to the
SITE in Fall of 1999 based on their recommendation (made May 14, 1999). The
current (Fall 2002) iteration of this committee is the University Diversity Advisory
Committee and is appointed by President Ransdell. (Information provided by John
Hardin, Co-Chair of the UDA committee.)
2. These changes relate to a request by the Office of Institutional Research with the concurrence of the Council of Academic Deans. As of Fall 2000 the decision was to drop optional items, both because of a change in the hardware (the end of the IBM mainframes) and in the capacity to manage the cafeteria items, and because use of the optional items was low and had been decreasing over the years, particularly as departments adopted departmental core sets. In the last year, fewer than 2% of sections were using them. (Information provided by Jay Sloan of the Office of Institutional Research.) Many faculty create their own forms and handle the personal development process separately from the SITE.
3. Name change of unit.
This website is in compliance with Section 508 and W3C Priority-I guidelines. If you find it to be inaccessible, please contact Webmaster. E-Mail facet@wku.edu -- Phone (270) 745-6508 -- Fax (270) 745-6145. Write to the Center for Teaching & Learning, 1 Big Red Way, Bowling Green, KY 42101-3576 Last Modified September 29, 2006. All Contents Copyright © 2000, Site created July 1996 Western Kentucky University |