COURSE INFORMATION
POLICIES OUTLINE
& CALENDAR FAQ
ASSIGNMENTS
(1), (2),
(3)
/ EXAMPLES & TIPS / MINI-ASSIGNMENTS
(ungraded)
MINI-ASSIGNMENTS (see course outline)
Grades (Final Grades Now Posted)
TESTS
Spring 2005 Test 1 / Spring 2005 Test 2 /
Fall 2004 Test 1
/ Fall
2004 Test 2/ Spring 2004
Final
RELATED WEBSITES
Java Applets/
Coin
Flipping Page / American
Statistical
Association/ Journal
of Statistics Education/
Dr. Brian Goff/414 Grise Hall
Phone (270)745-3855/brian.goff@wku.edu
Last Modified: January 4, 2005
Western Kentucky University
CONTACT INFORMATION
Dr. Brian Goff Grise 414/745-3855
Email: brian.goff@wku.edu / Office Hours for Spring 2005: T 10-11:30;
WTh 8:30-10:30; MTW 2-4
(Appointments & drop-ins welcome at other times)
RESOURCES
Hyperstats (online textbook
by David Lane) Go to Atomic
Dog Publishing and use Course ID (1516323605010)
to register & purchase. (Online access = $24.95; Online +
paperback = $45.00). Excel (available on WKUnet) & SPSS
Website: www.wku.edu/~goffbl/e206.htm
OBJECTIVES FOR STUDENTS
To gain a basic and practical understanding of how to collect and
analyze
data with emphasis on business applications
GRADING POLICIES
Tests
=
67% (3 -- including the final exam)
Assignments = 33% (3)
*****Assignment Grade Capped at 20% above Test Average*******
Mini-Assignments (Highly recommended but not graded)
Total = 100% (Also see classroom policies)
A>=90.0%; B=80.0-89.9%; C=70.0-79.9%; D=60.0-69.9%; F below 60.0%
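The weights, cap, and letter cutoffs above can be checked with a short Python sketch. It is only an illustration of the arithmetic, not an official grade calculator. It assumes the three tests count equally, the three assignments count equally, and "20% above" means 20 percentage points above the test average; the function names are made up for this example.

```python
def course_grade(test_scores, assignment_scores):
    """Weighted course total (0-100) from test and assignment percentages."""
    test_avg = sum(test_scores) / len(test_scores)
    assignment_avg = sum(assignment_scores) / len(assignment_scores)
    # Assumption: the cap is 20 percentage points above the test average.
    capped_assignment_avg = min(assignment_avg, test_avg + 20.0)
    return 0.67 * test_avg + 0.33 * capped_assignment_avg


def letter_grade(total):
    """Letter grade using the cutoffs listed in the grading policies."""
    if total >= 90.0:
        return "A"
    if total >= 80.0:
        return "B"
    if total >= 70.0:
        return "C"
    if total >= 60.0:
        return "D"
    return "F"


# Hypothetical student: test average 75, assignment average 100 (capped at 95).
total = course_grade([70, 75, 80], [100, 100, 100])
print(round(total, 1), letter_grade(total))
```

In this hypothetical case the assignment average of 100 is counted as 95 because the test average is 75.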
Tests: You will need a scantron (in good condition), a pencil, and a calculator for each exam. NO early or makeup tests. If a test is missed due to activities officially sponsored by WKU, an illness, or other special circumstances, I will increase the weight on the final exam to compensate for the missed exam. Discuss these special situations with me in advance if possible. Missing a test without a sufficient reason will result in a zero for that test. If weather (etc.) postpones a test or assignment deadline date, the test or deadline will move to the next class meeting. Material on tests will reflect class lectures, mini-assignments, and assignments. Past tests are provided as aids but not as complete guides to current tests.
Assignments: We will have (3) out-of-class assignments
involving application of methods and use of computers. To receive full
credit, work must adhere precisely to
the
instructions and must be turned in by the stated deadline.
Assignments
may be turned in early. Assignment deadlines are 5 minutes after class
begins on the stated date. A late assignment will have 10% deducted for
each business day that it is late (1 day begins immediately after
I collect the assignments). The computer output will count for
20%,
and the written answers will count for 80% of the grade. After a
preliminary grade is established, I will review the report for writing
and appearance with 0%-10% deducted from the preliminary
grade. ******Semester assignment averages are capped at 20%
above your semester test average ***********
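As a rough illustration of how a single assignment score comes together, here is a small Python sketch. It assumes all deductions are percentage points taken from a 0-100 score and that the late penalty is applied after the writing review; the function and argument names are invented for this example.

```python
def assignment_score(output_pct, written_pct, writing_deduction=0.0,
                     business_days_late=0):
    """Score (0-100) for one assignment under the stated weights."""
    if not 0.0 <= writing_deduction <= 10.0:
        raise ValueError("writing/appearance deduction must be between 0 and 10")
    # Computer output counts 20%, written answers count 80%.
    preliminary = 0.20 * output_pct + 0.80 * written_pct
    # Assumption: deductions are percentage points, 10 per business day late.
    score = preliminary - writing_deduction - 10.0 * business_days_late
    return max(score, 0.0)


# Hypothetical report: full-credit output, 85% on written answers,
# 5 points off for appearance, turned in one business day late.
print(assignment_score(100, 85, writing_deduction=5, business_days_late=1))
```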
Mini-Assignments: These are highly recommended but ungraded short assignments, usually done using the computer (Excel, SPSS). The practice gained on mini-assignments will be useful both on graded assignments and on tests.
MISCELLANEOUS & CLASSROOM POLICIES
Last day to drop with a "W" or change to audit is listed in Course
Bulletin. If you have an ADA covered disability requiring special
consideration,
please register with the ADA Compliance Office, and then see me.
Classroom Policies: Orderly
behavior
and respect for others who are speaking (including me) are expected. No
food or drink permitted. If late, enter with a minimum of
disturbance
and be seated in the nearest seat. Behavior that is inappropriate or
distracting
to other students or me is not permitted. Individuals involved
in
incidents that significantly violate these policies will receive a
warning
and then will be notified of a letter grade reduction per subsequent
incident.
Weeks 1-2: Learning to Measure Reality
Data & variables (HS 1-0, 1-3, 1-4, 1-5 +
Excel/SPSS on transforming variables -- add notes to HS)
Parameters & statistics; Notation and
terminology
(HS 1-0, 1-1, 1-2)
Data collection methods: sampling methods, surveys, resampling and
more (Sampling
Methods incl. subsections)
Data entry and manipulation with Excel & SPSS (Mini-Assignment
1)
Weeks 3-4: Describing Characteristics of a
Single Variable
Graphical descriptions of data ( HS 2-0, 2-1,
2-6b, 2-6c, 2-6d + Mini-Assignment 2 Histogram
Applet)
Numerical descriptions of data center (HS 2-2
all subsections except 2-2d)
Numerical measures of variability (HS 2-3,
2-4)
Numerical measures of skew & kurtosis (HS
2-5 Descriptive
Statistics Applet)
Some Applications: Statistical Process Control (Supplement
1)
Assignment 1
Deadline Wednesday, February 2 (beginning of class)
Week 5: Test 1, Wednesday, February 9
Weeks 6-8: Measuring Relationships between
Variables
Scatterplots (HS 3-0, 3-1)
Correlation analysis (HS 3-2, 3-3, 3-4 + Correlation
Applet)
Regression analysis (HS 15-0, 15-2 + Regression
Applet) Mini-Assignment 3
Contingency tables & qualitative variables (HS 16-3a, 16-3b) Mini-Assignment
4
Weeks 8-10: Probability Primer
Basic probability concepts (HS 4-0, 4-1, 4-3,
4-4, 4-5 + Intro
to Probability & Basic
Axioms)
Conditional probabilities (HS 4-2 + Let's
Make a Deal PP; Let's
Make a Deal Applet)
Probability distributions (See Wikipedia
and particular PDs)
Normal, Binomial, t-distribution (4-6, 5-0, 5-2
+ Excel Mini-Assignment) Binomial
Applet; Normal
Distribution Applet
Expected Value (Definition--
first 3 paragraphs)
Mini-Assignment 5
Law of Large Numbers (the tendency for the
observed
frequency of an outcome to approach its expected value as the
number
of trials of the experiment increases -- See Coin
Flipping Applet)
Assignment 2 Deadline
Wednesday, March 9 (beginning of class)
Week 11: Test 2, Wednesday, March 16
Week 12: Spring Break, Week of March 20
Weeks 13-15: Evaluating Sampling Errors &
Propositions
Parameters & Statistics one more time (HS 1-0, 1-1, 1-2)
Sampling Distributions & Central Limit Theorem (HS 6-0,6-1, 6-2,
6-4 + Sampling
Distribution Applet)
Estimating sampling error -- standard errors (HS 6-3)
Mini-Assignment 6
Confidence intervals in brief (HS 7-0, 7-1, 8-0, 8-1, 8-2, 8-10)
Evaluating hypotheses with p-values (probability values) (HS 9-0, 9-1,
9-2, 9-3, 9-4)
Weeks 15-17: Additional Business
Applications
+ Statistical "Literacy"
A few more examples of uses of statistics in business
Geometric mean (HS 2-2a);
Regression applications -- Stock market model (in-class); Cost analysis
(in-class); Demand analysis (in-class);
Defining probability problems correctly (Birthday
Problem Applet )
Translating conceptual ideas to operational measures & pitfalls
of not doing so (in-class)
Evaluating Forecasters, Palm Readers, & Such (in-class); Looking
out for Junk Science American Council on Science & Health
Assignment
3 Deadline
Wednesday, April 20 (beginning of class)
Week 17: Review
& Prep for Final Exam
Week
18: Final Exam -- 8 AM, Thursday, May 5
One of the objectives of the
assignments is to develop and encourage skills in the clear, accurate
reporting of data-oriented reports. The writing in and appearance
of your reports matters. Here are some tips:
Questions:
3a. Briefly describe the data set used in the assignment and
measurement
of specific variables.
b. Based on your output, what would "typical" kilowatts used per day
be? Is the data set symmetric and what are outlying values?
c. Do winter months have higher gas usage than other months? Conduct a
test where the null hypothesis is that winter month gas usage is the
same
as other months.
PRETTY GOOD ANSWERS
3. a. The data set consisted of 5 variables related to the monthly
electric and gas usage of residential utility customers from July 1990
to June 1998. The data was obtained from an SPSS data file on
WKUnet
(m:spss/utility). The variables included were a month identifier,
average
kilowatt hours used per day, gas thermal units used, average daily
temperature,
and number of days in the month.
b. Based on the histogram and descriptive statistics, typical kilowatt hours used per day ranged from about 30 to 40 per month. The mean was about 34 and the median was 37 with a standard deviation of 3. The data were skewed to the right with a few large outliers well beyond 50 kw hours per day.
c. The means for winter month gas usage (December, January, February) were about 30 percent higher than for other months. Based on a low p-value (0.005), a hypothesis that winter months and other months have the same mean could be rejected with strong confidence.
POOR ANSWERS
3. a. It had some variables about utility customers.
b. 34. A few big outliers. s.d. was 3.
c. winter had higher don't know what p-value means
Why are these poor answers?
Information
provided is far too little. No detail or sources about the
variables
used are provided in (a). Incomplete sentences used in (b) and
(c).
Pronoun (it) used without antecedent in (a). Abbreviation
used
for standard deviation in (b) without first defining it
somewhere.
Improper capitalization and punctuation in (c). Use of
contractions
such as "don't" in (c) is not appropriate in formal reports.
Empirical Rule: a summary of variation in outcomes for data that have a roughly bell-shaped (normal) distribution and a means to identify outliers. The Empirical Rule states: i) about 68% of the data will be within +/- 1 standard deviation of the mean, ii) about 95% of the data will be within +/- 2 standard deviations of the mean, and iii) about 99.7% of the data will be within +/- 3 standard deviations of the mean.
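The Empirical Rule percentages come straight from the normal distribution. A minimal Python sketch (Python is not part of the course software; it is used here only to show the calculation) computes the share of a normal distribution that lies within k standard deviations of the mean. The function name is made up for this example.

```python
from statistics import NormalDist


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within +/- {k} standard deviations: {share_within(k):.1%}")
```

The three results round to roughly 68.3%, 95.4%, and 99.7%.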
Chebyshev's Rule: a summary of variation in outcomes for data regardless of their shape and a means to identify outliers. The rule states: i) at least 75% of the data will be within +/- 2 standard deviations of the mean, ii) at least 88.9% of the data will be within +/- 3 standard deviations of the mean, and iii) at least 93.75% of the data will be within +/- 4 standard deviations of the mean.
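The percentages in this rule follow from the bound 1 - 1/k^2 for k standard deviations. The Python sketch below computes the bound and checks it against a small, deliberately skewed data set. The data values are invented for illustration, and the check uses the population standard deviation.

```python
from statistics import mean, pstdev


def chebyshev_bound(k):
    """Minimum share of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1.0 - 1.0 / k ** 2


def observed_share(values, k):
    """Actual share of the values within k standard deviations of the mean."""
    center, spread = mean(values), pstdev(values)
    inside = [v for v in values if abs(v - center) <= k * spread]
    return len(inside) / len(values)


skewed = [1, 1, 1, 2, 2, 3, 4, 5, 8, 40]  # hypothetical, strongly right-skewed
for k in (2, 3, 4):
    print(k, chebyshev_bound(k), observed_share(skewed, k))
```

Even with one very large value in the data, the observed share never falls below the bound.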
Standardized Units (z-units, z-values, z-scores): the name given to a variable (X) which has been converted so that the mean is zero (0) and the standard deviation is one (1). This conversion is done by the formula Zi = (Xi - Mean)/Std. Deviation, where i refers to each individual item in the data set. This conversion eliminates whatever units were used to measure X, and it allows each data point to be easily evaluated in terms of how much it differs from the mean.
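The conversion Zi = (Xi - Mean)/Std. Deviation can be sketched in a few lines of Python. The prices are hypothetical, and the sample standard deviation is used on the assumption that it matches the standard deviation Excel and SPSS report in their summary output.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardized values: (x - mean) / standard deviation for each x."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation
    return [(x - center) / spread for x in values]


prices = [500, 450, 475, 520, 430]  # hypothetical prices in dollars
for price, z in zip(prices, z_scores(prices)):
    print(price, round(z, 2))
```

After the conversion the values have a mean of zero and a standard deviation of one, whatever units the prices were in.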
Statistical Process Control (SPC): The phrase and acronym applied to systematic methods of analyzing repetitive production processes using charts that track variation of the process, usually with the goal of monitoring and improving quality. Chapter 18 provides more details.
Process: The forces (inputs) working together to generate the outcomes of a variable; in manufacturing or provision of services these include equipment, tools, materials, people, and "environmental" influences such as weather or other events that determine the characteristics of a good or service being produced; the same kinds of forces generate outcomes in personal settings also.
Process Variation: changes in the process that lead to quantitative or qualitative differences in the characteristics of a part, a good, or service being produced
Common Causes of (Process) Variation: sources of variation which are inherent (built-in) to the process design as it is currently configured; usual or normal process variation; sources of variation which have the potential to influence all process observations; variation only eliminated through redesign or improvement of design of the process; random variation
Assignable Causes of (Process) Variation: special or specific sources of variation which are not built into the design of the process; variation which does not influence all observations; variation which can be eliminated without altering the basic design
Control Charts: Graphs which record sample measurements of a
process -- usually a repetitive process. X-Bar Charts are the simplest
and monitor variation in sample means of the process. The chart is used
to judge whether the variation in samples is likely due to assignable variation (high
variability or patterns) or merely to common (expected) variation.
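A simplified Python sketch of the X-Bar chart idea follows. It sets control limits at 3 standard errors around the average of the baseline sample means, then flags a new sample whose mean falls outside them. This is an assumption-laden illustration: textbook charts usually use tabulated constants rather than the plain average of sample standard deviations used here, and the measurements are invented.

```python
from math import sqrt
from statistics import mean, stdev


def xbar_limits(baseline_samples):
    """Return (lower, center, upper) limits from equal-sized baseline samples."""
    n = len(baseline_samples[0])
    sample_means = [mean(s) for s in baseline_samples]
    center = mean(sample_means)
    # Simplification: average within-sample standard deviation, no correction factor.
    avg_sd = mean([stdev(s) for s in baseline_samples])
    std_error = avg_sd / sqrt(n)
    return center - 3 * std_error, center, center + 3 * std_error


def signals_assignable_cause(sample, limits):
    """True if the sample mean falls outside the control limits."""
    lower, _, upper = limits
    return not (lower <= mean(sample) <= upper)


baseline = [  # hypothetical measurements, four items per sample
    [10.1, 9.9, 10.0, 10.2],
    [9.8, 10.0, 10.1, 9.9],
    [10.0, 10.2, 9.9, 10.1],
    [10.0, 9.9, 10.1, 10.0],
]
limits = xbar_limits(baseline)
new_sample = [11.0, 11.2, 10.9, 11.1]
print(limits, signals_assignable_cause(new_sample, limits))
```

A sample mean inside the limits is treated as common variation; one outside the limits is a signal to look for an assignable cause.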
1. Select a variable that you would like to investigate. It
might
be an item for sale such as gasoline prices at different locations or
the
price for an article of clothing from different online sources. If you
are interested in sports you might use home run data or player point
values
in fantasy football. Whatever the variable, it needs to be a
standardized
item (89 octane gas, Levi's 560 jeans, batting average, ...).
Collect
at least 30 observations for the variable. Record the value for
each
observation as well as an identifier for it. (See below).
You
may collect the data from online sources or you may use locations
around
town, but you may not use personal surveys. You must document the
source(s) of
your
data. If it is an online source, provide the
URL.
2. i) Enter the values for the variable into SPSS along with another variable to identify the source of the price. Remember to use labels for each variable. For example, your data spreadsheet might look like
ID
Price (ID and Price are the names of your variables;
the data are entered in the columns below)
SonyCC 500
JVCBB 450
....
....
ii) Create a statistical summary of your variable along with a
histogram
and boxplot.
iii) Edit your output. Place the title, TABLE 1: STATISTICS
ON (whatever your variable describes) at the top of your statistics
table.
Place the title FIGURE 1: HISTOGRAM ON (your variable) at the top
of your histogram. Place the title FIGURE 2: BOXPLOT ON
(your
variable) at the top of your boxplot.
iv) Print both your output and the data spreadsheet.
3. On a separate sheet of paper, print or type answers to these
questions
in complete sentences. Explain
your
answers by making specific use of the pertinent statistics and graphs
that
you generate.
i) Describe the variable that you collected including the units of
measure, your method of sampling, and the source (you do not have to
provide
a detailed list of multiple sources -- just summarize).
ii) Using statistics that describe the center and variability of your
data, explain what would be typical and unusual prices.
iii) Using the skewness and kurtosis statistics as well as the
histogram,
explain other characteristics of your price data.
iv) Describe the information provided by the boxplot.
v) Compute the standardized values for the first two observations in
your data set. Show your calculations.
Deadline = Wednesday, February 2,
(beginning
of class)
Remember: Place answers on a separate sheet at the back.
Staple or clip all sheets together. Your output should include
the
specified results -- no more and no less with answers typed or printed
very neatly. Make sure your name is clearly printed or typed on a
cover sheet.
1. Collect data on two quantitative variables that you think would
be related to each other in some way. You should have, at least,
30 observations for each variable. These variables must be from a
documented source (no personal surveys).
Enter values for the two variables that you have collected into Excel,
making sure to provide names (labels) for each variable.
2. i) Create a linear regression analysis that includes the regression
output tables (remember to check "labels"), the table of predicted and
residual values (check "residual values"), and the graph showing the
fitted
regression line along with the actual scatterplot (check "line-fit
plot").
(Refer to Mini-assignment #3 or Excel Help for assistance). Move
the graph so that it appears below the predicted/residual table (just
click
on the graph and keep holding down the mouse button as you move the
mouse).
ii) Edit your output in the following way. Make sure
to format the results so they fit within the columns properly.
Provide
a Title for the graph such as FIGURE 1: Scatterplot and Regression Line
for (your variables). Change the regression table title
to
TABLE 1: Regression Results for (your variables).
iii) Print your output. (Before printing, change the Page Setup
under "File" to "Landscape")
3. Answer the following questions on a separate sheet attached
to the back of your output.
i) Describe the objective of your study, the variables in your
data set (what they are; what units are used), and specifically
document
their source.
ii) Neatly write out the regression results in equation form.
iii) Explain the meaning of the coefficients in this equation
and describe, in numerical terms, how well this equation accounts for
different
values of your dependent variable.
iv) Show how the predicted and residual values are computed for the
observation in your data set (use the actual numbers). Then,
briefly
explain the meaning of these computations.
v) Explain the information presented on the graph (your Figure
1).
Deadline = Wednesday, March 9 (beginning
of
class; remember to use a staple or clip; print neatly or
type
your answers. Where responses are in words, use complete
sentences.)
1. Develop a proposition between two variables (measured in identical units) that you can test with data. The claim must be stated as an equality or inequality between the means or proportions of the two variables. These variables must be obtained from an observable source such as a local store, online, SPSS files, ... . Do not use surveys of people (students, friends, ...). Collect at least 20 observations for each variable.
Example: I could collect 20 fiction book prices and 20 non-fiction
book
prices from Amazon.com and test the
proposition:
Average Price (in $) of Fiction Books = Average Price (in $) of
Non-fiction
Books;
(Note: This is the same thing as Avg. Price Fiction - Avg. Price
Non-fiction
= 0; The test must be between two variables that
are measured the same way such as in $, miles, gallons, lbs.,
....) Examples
of other variables would be financial variables on companies, points or
performance measures for sports teams or players, prices on products,
...
2. i) Enter your variables into two columns (with variable labels
at
the top) in an Excel spreadsheet.
ii) Obtain descriptive statistics for each of your variables using
Tools/Data Analysis/Descriptive Statistics. Select "Summary
Statistics,"
"Confidence Level for the Mean," and "Labels" and click "OK." In the
output
worksheet, make sure to use Format/Column/Autofit to adjust column
widths
and Format/Cells to adjust the number of decimal places to four or
fewer.
iii) Select Tools/Data Analysis/t-test: two-sample assuming unequal
variances and put your two variables in the input ranges, select
"Labels,"
and select a value to use "hypothesized mean difference." (You select
this
value to fit the proposition that you are testing. You should
choose
a value that makes the test "interesting." For example, selecting
0 for the hypothesized difference between 93 octane and 89 octane gas
is
so obviously wrong as to be of little interest. A value of 5 cents
might
be better.)
Click "OK."
iv) In the output worksheet, use Format/Columns/Autofit to make the
column width fit your table, and then print this output. Also
print
your data sheet.
3. Print or type answers to the following questions in complete
sentences
on a separate sheet attached to your output with a paper clip or
staple:
i) Explain the objective of your study. Describe your variables,
their units and the source(s) from which you obtained them.
ii) Clearly write out the proposition that you are testing (null
hypothesis) as an equation. Briefly explain your sampling
method.
What might be some possible sources of non-sampling error?
iii) Using the descriptive statistics that you generated in 2(ii),
briefly explain some of the important characteristics of your two
variables,
including the estimates of the sampling errors for the sample means
(sample
proportions).
iv) Write out an equation showing how the (approximate) 95% confidence
intervals are computed (use the actual numbers from your output -- your
numbers may vary slightly from those reported by Excel because of
rounding).
Write out an equation showing how the (approximate) 99% confidence
intervals
are computed. (Again, use actual figures from your output).
v) What is the actual difference in the sample means (or
proportions)
for your two variables? Is this difference large enough to be
considered (statistically) significantly different from the proposition
that you are testing? (Explain -- be specific!). If the means are
not significantly different, how might your study be redesigned to
increase
the likelihood of finding a significant difference?
Deadline = Wednesday, April 20
(beginning
of class)
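For anyone who wants to see what the Excel t-test tool computes, here is a minimal Python sketch of the unequal-variance (Welch) t-statistic and its approximate degrees of freedom. It does not compute the p-value, which needs the t-distribution that Excel supplies. The book prices are hypothetical and the function name is made up for this example.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y, hypothesized_diff=0.0):
    """t-statistic and approximate degrees of freedom, unequal variances."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    std_error = sqrt(vx + vy)
    t_stat = (mean(x) - mean(y) - hypothesized_diff) / std_error
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t_stat, df


fiction = [12.99, 14.50, 9.99, 15.00, 11.25, 13.75]      # hypothetical prices ($)
nonfiction = [18.00, 21.50, 16.95, 24.00, 19.99, 17.50]  # hypothetical prices ($)
t_stat, df = welch_t(fiction, nonfiction, hypothesized_diff=0.0)
print(round(t_stat, 2), round(df, 1))
```

The t-statistic is the observed difference minus the hypothesized difference, divided by the standard error of the difference, so a larger sample (smaller standard error) makes a given difference easier to detect.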
Objective: To gain familiarity with generating and interpreting
univariate
statistical measures & graphics using SPSS
1. Access SPSS (For help, see SPSS
Help)
2. Retrieve the file Employee data.sav (m:/spss10/employee
data.sav -- this should be the default directory of
SPSS).
It provides data on 475 employees of a particular company. If you
have trouble finding the data, just use employeedata.xls
and open it in SPSS -- remember to tell SPSS that it is an Excel file.
(Note: you can see descriptions of the variables by pointing
at the variable labels in the top row or by clicking on "Variable View"
at the bottom of the spreadsheet).
2. Use Transform>Compute to create a new variable (Totexp) that
is the sum of Jobtime and Prevexp.
3. Generate a histogram, boxplot, and stem-and-leaf plot for
Current Salary (Salary) and Total Experience (Totexp) by clicking
Analyze>Descriptive
Statistics> Explore. Place Salary and Totexp in the "Dependent
List"
box. Under the "Display" selections, choose "Plots". Next,
click on the "Plots" button (between Statistics and Options) and select
"Histogram" along with Boxplot and stem-and-leaf which should already
be
selected.
4. In the output window, change the title from Explore
to Graphs for Salary and Experience and print your output.
(If you have trouble, come by my office).
5. Answer the following questions:
a. What are the characteristics of the
histograms
for salary and experience (center, spread, skewed which way)?
b. For salary, what do the stem and leaf
numbers
under "frequency," "stem," and "leaf" mean?
c. For experience, what does the red box in
the boxplot indicate? What do the black lines above and below
the
red box mean? What are the values outside of the black lines?
Mini-Assignment 2
Objective: To gain practice
manipulating
data and generating descriptive statistics in Excel
1. Access Excel. (Either on your own PC or by logging on
to WKUnet and double-clicking the Excel icon on the standard student
network
desktop.)
2. Retrieve the file by clicking on this link: s&p1980a.xls
The file contains monthly values for the Standard & Poor’s 500
Index
from 1980 through March 2002.
****** Note that you can open s&p1980a.xls in Excel by
clicking on the link. If the data analysis features of Excel will
not work, save the file to a disk, close Excel, reopen Excel from
the desktop icon, and then open the file in Excel. ************
3. Create a new variable, labeled PCT, that will be percentage
changes in the monthly S&P 500 Index.
(To do this, type PCT in Cell D1. Click the cursor on Cell D3
and type the following formula to compute the percent change in the
S&P
500 Index -- include the = sign and parentheses
= ((c3 - c2)/c2)*100
Now, hit enter. Go back and highlight Cell D3 again, right click
the mouse, and click "Copy". Click on Cell D4 but keep holding
down
the left mouse button and drag the cursor until Column D is highlighted
all the way down to the bottom of the data. Release the left mouse
button
and right click the mouse and select "Paste." You should now have
a column of numbers showing monthly percentage changes.)
3.a. Compute descriptive measures for PCT. ( Plug in the appropriate
column into the "Input Range," click "Labels", and click "Summary
Statistics.")
b. Now, go back to the data worksheet and calculate standardized values
for PCT. To do this with Excel, you will need the mean and
standard
deviation from the Summary Statistics output. Then, on the
original
worksheet, type a label (such as std. values) at the top of an empty
column.
Highlight the cell in row 3 under the title and type the formula
= (D3 - mean)/standard deviation where the mean and standard deviation
are
the numbers from your Summary Statistics output.
c. Print the output (after you resize the output columns to fit
properly and reformat the output column to present only 1 decimal
place).
4. Using the original data on the worksheet and the first 5
observations,
calculate the mean, median, standard deviation, and standardized value
(for the first observation). How closely do your calculations
match
those generated by Excel for all observations (allowing for rounding
differences)?
5. What do each of the descriptive statistics mean?
6. Try to draw a histogram (roughly) based on the descriptive
statistics.
If you have trouble, save your file on a floppy disk or CD and see
me.
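The percent-change formula from step 3, = ((c3 - c2)/c2)*100, and the summary measures can also be sketched in Python for comparison with the Excel results. The index values below are invented and are not taken from s&p1980a.xls.

```python
from statistics import mean, median, stdev


def percent_changes(index_values):
    """Percent change from each value to the next: (current - previous) / previous * 100."""
    pairs = zip(index_values, index_values[1:])
    return [(current - previous) / previous * 100 for previous, current in pairs]


index_values = [100.0, 110.0, 99.0, 103.0, 108.0, 105.0]  # hypothetical monthly values
pct = percent_changes(index_values)
print([round(p, 2) for p in pct])
print(round(mean(pct), 2), round(median(pct), 2), round(stdev(pct), 2))
```

As in the Excel worksheet, the first month has no percent change because there is no earlier value to compare it with.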
1. Go to the Correlation Applet and practice matching the correct correlation coefficient with the appropriate scatterplot. Also, try taking a listed correlation coefficient and then drawing a scatterplot that would reflect the correlation coefficient. See if your scatterplot matches the one provided by the applet for the coefficient.
2. Open the file airfares.xls into
Excel. The file includes several variables regarding roundtrip
air
fares to several major cities based on a sample of 21 cities collected
on a particular day for morning weekday departures with Saturday night
stay-over. The variables in the file are City, Fare (in $),
Distance
(to destination city in miles), Direct SWA (=1 if Southwest Airlines
flies
the route directly), and Fare per Mile (Fare divided by Distance).
2. Generate a scatterplot between Fare and Distance. Click on
the chart icon button, select XY Scatter for chart type, and then use
the
chart wizard to generate a scatterplot that has distance on the x-axis
and fare on the y-axis. Print this scatterplot. Then
a) Draw a regression line in by hand that would
best fit through the middle of the data;
b) Write out the equation (Fare = intercept +
slope*Distance)
that would go with your regression line -- this does not have to be
exact.
3. Go back to the data worksheet and click Tools>Data
Analysis>Regression.
Use the pop-up window to generate a regression analysis with Fare as
the
dependent variable (y-variable) and Distance as the independent
variable
(x-variable). Remember to click the "Labels" box and also click
the
"Residuals" box.
a) Write out the regression output in equation
form.
Does this equation match yours from (2b)?
b) Calculate the first residual by hand. Use
the Excel-generated residuals to check your calculation (allowing for
rounding
differences)
c) What does a 1 mile increase in distance
predict for Fare? What about a 100 mile increase? What would Fare
be if distance were 0? Why is this really only a "hypothetical"
value?
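The regression steps above (fit the line, then compute predicted and residual values) can be sketched in Python as a check on hand calculations. The distances and fares are hypothetical and are not the values in airfares.xls; the function name is made up for this example.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


distance = [200, 500, 800, 1200, 1500]  # hypothetical miles
fare = [125, 175, 245, 315, 390]        # hypothetical dollars

intercept, slope = fit_line(distance, fare)
predicted = [intercept + slope * d for d in distance]
residuals = [actual - fitted for actual, fitted in zip(fare, predicted)]

print(f"Fare = {intercept:.2f} + {slope:.4f} * Distance")
print("first residual:", round(residuals[0], 2))
```

The slope is the predicted change in Fare for a 1 mile increase in Distance, and each residual is the actual Fare minus the predicted Fare.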
1. Open the data file Accounts.xls.
(It is an Excel data file). Variables are defined in the file.
2. Generate a crosstabulation with Account Supervisor as the
row variable and Account Irregularity as the Column Variable.
To do so, you must create a Pivot Table, put these two variables in
their
proper place, and then place the "COUNT" variable in the
body
of the table (refer to in-class instructions). Put the new table
on a new worksheet.
4. Print your Excel crosstab and the original data sheet.
5. Compute expected frequencies for each cell in the table and
then compute the Chi-Square statistic for the table.
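The expected frequency for a cell is (row total x column total) / grand total, and the Chi-Square statistic sums (observed - expected)^2 / expected over all cells. A minimal Python sketch, using a made-up 2 x 2 table rather than the counts from Accounts.xls:

```python
def expected_counts(table):
    """Expected cell counts if the row and column variables were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


observed = [[10, 20],   # hypothetical counts: rows are supervisors,
            [30, 40]]   # columns are irregularity (yes / no)
print(expected_counts(observed))
print(round(chi_square(observed), 4))
```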
1. Open Excel and compute probabilities for the following
situations:
a. Suppose that accounting errors due to erroneous data entry
have a 1% (0.01) probability on any given entry. Also,
suppose
that data entry errors are a binomial variable (error or no
error).
Compute the likelihood of zero errors given 150 entries.
(In Excel, highlight cell A1, click the fx
icon on the tool bar; select “Function Category” = statistical; “Function
name”
= BINOMDIST; fill in the following values in the blanks: Number_s
= 0, Trials = 150, Probability_s = 0.01, Cumulative = True.)
b. Suppose that the average amount of time that customers are kept
on
hold by AOL customer support is 8 minutes with a standard deviation of
1.5 minutes. If wait time is normally distributed, calculate the
probability of waiting less than 10 minutes.
(In Excel, highlight cell A3, click the fx
icon on the tool bar; select “Function Category” = statistical; “Function
name”
= NORMDIST; fill in the following values in the blanks: X = 10,
Mean
= 8, Standard_Dev = 1.5, Cumulative = True.)
Print the Excel Spreadsheet.
2. Solve the following problems:
a. If the probability of customers A and B returning an item
is 20% (0.20) for each one individually, what is the probability of
both
customers returning their items (assuming their actions are independent
of each other)?
b. Given the same information as in 2a, what is the probability
of one of the two people returning their item?
c. Suppose that return - no return is a binomial variable,
with a probability of 0.10 of return for any 1 person. What is
the
probability that either 0 or 1 people out of 6 people will return their
items? (Try to use Excel to compute this answer).
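The two Excel calculations in part 1 have simple Python counterparts, sketched below as a cross-check on the spreadsheet. The helper name binom_cdf is made up for this example; it plays the role of BINOMDIST with Cumulative = True, and NormalDist(...).cdf plays the role of NORMDIST with Cumulative = True.

```python
from math import comb
from statistics import NormalDist


def binom_cdf(successes, trials, prob):
    """P(X <= successes) for a binomial variable, summed term by term."""
    return sum(
        comb(trials, i) * prob ** i * (1 - prob) ** (trials - i)
        for i in range(successes + 1)
    )


# 1a: probability of zero data entry errors in 150 entries.
p_zero_errors = binom_cdf(0, 150, 0.01)

# 1b: probability of waiting less than 10 minutes (mean 8, std. deviation 1.5).
p_short_wait = NormalDist(mu=8, sigma=1.5).cdf(10)

print(round(p_zero_errors, 4), round(p_short_wait, 4))
```

The same binom_cdf helper can be pointed at other trial counts and probabilities to check answers worked out by hand.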
1. Open the file airfares.xls into Excel. The file includes several variables regarding roundtrip air fares to several major cities based on a sample of 21 cities collected on a particular day for morning weekday departures with Saturday night stay-over. The variables in the file are City, Fare (in $), Distance (to destination city in miles), Direct SWA (=1 if Southwest Airlines flies the route directly), and Fare per Mile (Fare divided by Distance).
2. i) Create descriptive statistics including a 95% confidence
interval
for Fare per mile by selecting Tools/Data Analysis/Descriptives
and
put the column for Fare per Mile in the “input range.”
Also, select Labels, Descriptive Statistics, and Confidence Interval
for Mean (95%). Click OK. Format the output using
Format/Column/Autofit.
ii) Repeat the prior step except change the value in the window of
“confidence interval for mean” from 95% to 99%.
iii) Repeat the prior step except include only Fare per Mile
for the first 10 cities.
3. Create a test of the proposition (hypothesis) that the Fare per Mile for the first ten cities equals the Fare per Mile for the next eleven cities. Use Tools>Data Analysis>t-test: assuming unequal variances. Put the address of the first 10 observations for Fare per Mile in the Variable 1 range and the address of the next 11 observations in the Variable 2 range. Make the "hypothesized mean difference" equal to zero.
4. Print your output and try to do the following:
i) For the output from (2i), circle the standard deviation and standard
error of the mean. Draw a line from each out to the margin, and
neatly
write a brief explanation of each in the margin. Also, show how the
standard
error of the mean is computed.
ii) Using a simple number line illustration, show the width of
the 95% confidence interval for the mean versus the width of the 99%
confidence
interval for the mean.
iii) Provide equations showing how the 95% and 99% confidence intervals
are computed. Why is the 99% interval wider?
iv) What happened to the width of the interval when only 10
cities
are used -- why?
v) From (3), explain the meaning of the
"two-tailed
p-value"; what does it say about the hypothesis?
vi) Why is there a t-statistic reported along with the p-value?
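The standard error and confidence interval calculations asked about above can be sketched in Python. This is an approximation that uses normal (z) multipliers of about 1.96 and 2.58; Excel's tool works from the t-distribution, so expect its intervals to be somewhat wider for small samples. The fare-per-mile values are hypothetical, not the ones in airfares.xls.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def standard_error(values):
    """Standard error of the mean: standard deviation / sqrt(sample size)."""
    return stdev(values) / sqrt(len(values))


def approx_confidence_interval(values, level=0.95):
    """Mean +/- z * standard error, using a normal (z) multiplier."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    margin = z * standard_error(values)
    return mean(values) - margin, mean(values) + margin


fare_per_mile = [0.21, 0.35, 0.18, 0.42, 0.27, 0.30, 0.25, 0.38, 0.22, 0.33]  # hypothetical
ci95 = approx_confidence_interval(fare_per_mile, 0.95)
ci99 = approx_confidence_interval(fare_per_mile, 0.99)
print([round(v, 3) for v in ci95], [round(v, 3) for v in ci99])
```

The 99% interval uses a larger multiplier than the 95% interval, and a smaller sample raises the standard error, so both changes widen the interval.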
3a. The data we collected was borrowed from the WKUNet SPSS file Cars.sav. We used the first thirty units of horsepower measured of the 406 entered data. The data consisted of two variables: the qualitative number assigned to each car and the quantitative measure of each car's horsepower.
b. According to our data (Table 1), the typical horsepower or approximate center is about 160. The mean of horsepower is 143.4 with the median of the data being 150 horsepower. This tells us that the data is somewhat symmetrical because the two numbers are close together. The standard deviation or how different on average the data is from the mean is approximately 48. Maximum values are 225 horsepower and minimum values are 46.
c. The descriptives chart produced by SPSS (Table 1) gives a skewness of .053 and kurtosis of -.789. Skewness measures how symmetric the data is; if it is between -1 and 1, the data is nearly symmetric. Since this data falls within that range, the data for the cars' horsepower is nearly symmetric. The kurtosis measures the outliers of the given data. If the kurtosis is near zero, the graph has normal extreme values. The horsepower measured, in our assignment, has normal extreme values. Because of the large "clump" of data shown in the histogram (Figure 1), the horsepower data is unimodal.
d. The first point on the box plot (Figure 2), or center line, is at 143.4. The bottom line of the box, or first quartile, is defined at 90; the third quartile, or top line of the box, is at 170. The boxplot shows the interquartile range, which is the third quartile minus the first quartile, or 80.
e. Standardized values can be computed by the formula
(variable
value - mean)/standard deviation. For the first observation, the
formula gives
(130-143.4)/48 = -0.28.
For the second observation, the formula gives
(165-143.4)/48 = 0.45.
Accessing SPSS on WKUnet
After you turn on the computer, the log-in window will appear.
Fill in the blanks for your WKU Email address and your WKU Email
Password.
BEFORE PROCEEDING, click the “Novell Account”
button
in the lower right hand corner where it says, "The default
Novell
account is Student. To Change click Novell Account." After
you click this, two additional blanks will appear for Novell Account
and
Novell Password. Change the Novell Account from Student to
SPSS.
Leave the Novell Password blank. When the main desktop appears,
double
click the SPSS folder in the left window. Then click the SPSS 10
icon
in the right window (not the SPSS production icon). The SPSS menu
and data spreadsheet should appear.
3.a. Create a table showing the average pct change in the
S&P
500 Index for each year by using Data>Pivot Table and Pivot Chart
Report.
When Pivot Table Wizard opens you will see
“Step 1 of 3” Window: make sure “Microsoft Excel list or
database”
is checked and click “Next.”
“Step 2 of 3” Window: click the Range icon, highlight all
three columns of data (A, B, C), and then click “Next.”
“Step 3 of 3” Window: click “Layout”, drag the “Year” button
into the area that says “Row”, and then click and drag the “s&p500”
value into the area that says “Data”. Double click on the button
that now says “Count of s&p 500.” In the new window, click
“Options,”
then choose Summarize by “Average”, select Show data as “% of”, and
choose
Base Item as “1980.” Select OK, OK, select “New Worksheet” and
click
“Finish.”
b. Once the pivot table is created, create a chart based on it
by clicking on the chart icon in the pivot table dialog box next to the
pivot table. Print the chart and then print the Worksheet
containing
your new table (File>Print or click printer icon). If you run
into
trouble that you cannot solve, save your file to a floppy disk, and
come
see me.