STT 422 EXAM 1
Due Monday Apr 08, 11:59 pm, 25% of Final Grade = 80 points
There are 8 questions in total.
Each question has subparts.
Problem 1
Consider the data set bank wage.csv. Using R or otherwise, answer the following questions:
- (2 points) Plot wages versus LOS and circle the outlier with the highest value of wage. (Drop
this observation for the remaining parts.)
- (1 point) Find the least squares regression line for the regression of wages on LOS.
- (4 points) Give the significance test for the slope of LOS. (Clearly state the hypotheses,
test statistic, p-value, and conclusion.)
- (3 points) Give a 95% prediction interval at LOS=55.
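Note: the slope test in this problem rests on the standard simple-regression formulas. The following is a minimal sketch in Python (standard library only), assuming made-up (x, y) values that are not taken from bank wage.csv; the function name `simple_regression` is hypothetical. In R, `lm()` followed by `summary()` reports the same quantities.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y on x.

    Returns (intercept, slope, standard error of slope, t statistic).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares and regression standard error on n - 2 df
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    se_slope = s / math.sqrt(sxx)
    t_stat = slope / se_slope  # tests H0: slope = 0
    return intercept, slope, se_slope, t_stat

# Hypothetical toy data, NOT the exam data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope, se_slope, t_stat = simple_regression(x, y)
print(f"y-hat = {intercept:.3f} + {slope:.3f} x, t = {t_stat:.3f} on {len(x) - 2} df")
```

The t statistic is compared with a t distribution on n - 2 degrees of freedom to obtain the p-value; the standard library has no t distribution, so that step is left to a table or to R.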
Problem 2
Consider the data set student gpa.csv. Consider a regression model for predicting GPA using IQ,
gender and self-concept. Using R or otherwise, answer the following questions:
- (4 points) Give the F-statistic for testing
H0: β_IQ = β_gender = β_self-concept = 0
Also provide the degrees of freedom for this F-statistic.
- (4 points) Run correlation tests to check if GPA is correlated to
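Note: both statistics in this problem can be written in closed form. The sketch below (Python, standard library only) assumes the usual formulas F = (R^2 / p) / ((1 - R^2) / (n - p - 1)) for the overall test with p predictors and t = r * sqrt(n - 2) / sqrt(1 - r^2) for a correlation test. The function names and all numeric inputs are hypothetical and are not taken from student gpa.csv.

```python
import math

def overall_f(r_squared, n, p):
    """Overall F statistic for H0: all p slopes are zero, from R^2."""
    df_model = p
    df_error = n - p - 1
    f_stat = (r_squared / df_model) / ((1 - r_squared) / df_error)
    return f_stat, df_model, df_error

def correlation_t(r, n):
    """t statistic and degrees of freedom for H0: population correlation = 0."""
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t_stat, n - 2

# Hypothetical inputs, NOT results from the exam data
f_stat, df1, df2 = overall_f(r_squared=0.5, n=24, p=3)
t_stat, df_t = correlation_t(r=0.6, n=18)
print(f"F = {f_stat:.3f} on ({df1}, {df2}) df; t = {t_stat:.3f} on {df_t} df")
```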
Problem 3
Consider the data set biomarkers.csv. Consider a regression model for predicting VO+ using OC,
TRAP and VO-. Using R or otherwise, answer the following questions:
- (2 points) Give the statistical model for this regression, including all assumptions.
- (2 points) Give the multiple regression equation to predict VO+ from OC, TRAP and VO-.
- (4 points) Make a table with t-statistics and p-values for all the explanatory variables. Which
is the least significant variable among OC, TRAP and VO-?
- (4 points) Consider the full model and the one without the least significant variable. Give the
ANOVA table to compare these two models.
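Note: comparing a full model with a nested reduced model uses the partial F statistic F = ((SSE_reduced - SSE_full) / q) / (SSE_full / df_error_full), where q is the number of dropped terms. The sketch below (Python, standard library only) assumes that formula; `partial_f` is a hypothetical helper and the sums of squares are invented, not computed from biomarkers.csv. In R, `anova(reduced_fit, full_fit)` produces the corresponding table.

```python
def partial_f(sse_reduced, sse_full, q, df_error_full):
    """Partial F statistic for dropping q terms from the full model.

    df_error_full is n - p - 1 for the full model with p predictors.
    """
    f_stat = ((sse_reduced - sse_full) / q) / (sse_full / df_error_full)
    return f_stat, q, df_error_full

# Hypothetical sums of squares, NOT results from the exam data
f_stat, df1, df2 = partial_f(sse_reduced=120.0, sse_full=100.0, q=1, df_error_full=25)
print(f"F = {f_stat:.3f} on ({df1}, {df2}) df")
```

When a single variable is dropped (q = 1), this F statistic equals the square of that variable's t statistic in the full model.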
Problem 4
Do people from different cultures experience emotions differently? Here is a summary of the data:
Are the means the same across the different cultures?
- (2 points) Should you use a pooled standard deviation? If yes, what is its value?
- (4 points) Construct an ANOVA table for this problem.
- (2 points) State the hypotheses for this problem.
- (2 points) Provide the p-value for the hypothesis test in part 3.
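Note: a one-way ANOVA table can be built from group sizes, means, and standard deviations alone. The sketch below (Python, standard library only) assumes the usual formulas SSG = sum of n_i (mean_i - grand mean)^2 and SSE = sum of (n_i - 1) s_i^2, with the pooled standard deviation equal to sqrt(MSE). The function name and the summary values are hypothetical; they are not the summary data for this problem.

```python
import math

def anova_from_summary(ns, means, sds):
    """One-way ANOVA quantities from per-group n, mean, and standard deviation."""
    k = len(ns)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ssg = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))  # between groups
    sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))             # within groups
    dfg, dfe = k - 1, n_total - k
    msg, mse = ssg / dfg, sse / dfe
    return {
        "DFG": dfg, "DFE": dfe,
        "SSG": ssg, "SSE": sse,
        "MSG": msg, "MSE": mse,
        "F": msg / mse,
        "pooled_sd": math.sqrt(mse),
    }

# Hypothetical summary statistics, NOT the data for this problem
table = anova_from_summary(ns=[10, 10, 10], means=[4.0, 5.0, 6.0], sds=[1.0, 1.2, 0.9])
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

A common rule of thumb is that pooling is reasonable when the largest group standard deviation is less than twice the smallest; in the toy values above the ratio is 1.2 / 0.9.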
Problem 5
Consider the data set price promotion.csv. Using R or otherwise, answer the following questions.
- (2 points) Construct a contrast that compares the average of promotions 1 and 7 to the
average of promotions 3 and 5.
- (3 points) Give a 95% confidence interval for the contrast in part 1.
- (4 points) Use the Bonferroni or another multiple-comparisons procedure to compare different
price promotion groups.
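Note: a contrast is a linear combination of group means whose coefficients sum to zero. Its standard error is s_p * sqrt(sum of a_i^2 / n_i), where s_p is the pooled standard deviation. The sketch below (Python, standard library only) assumes those formulas. The group means, group sizes, pooled standard deviation, and the critical value `t_star` are all hypothetical placeholders, not values from price promotion.csv; the critical value must be looked up for N - k degrees of freedom.

```python
import math

def contrast_interval(coefs, means, ns, pooled_sd, t_star):
    """Estimate, standard error, and confidence interval for a contrast."""
    assert abs(sum(coefs)) < 1e-12, "contrast coefficients must sum to zero"
    estimate = sum(a * m for a, m in zip(coefs, means))
    se = pooled_sd * math.sqrt(sum(a ** 2 / n for a, n in zip(coefs, ns)))
    margin = t_star * se
    return estimate, se, (estimate - margin, estimate + margin)

# Groups in the order: promotions 1, 3, 5, 7.
# Average of (1, 7) minus average of (3, 5).
coefs = [0.5, -0.5, -0.5, 0.5]
means = [4.0, 4.5, 4.2, 3.7]   # hypothetical group means
ns = [40, 40, 40, 40]          # hypothetical group sizes
estimate, se, ci = contrast_interval(coefs, means, ns, pooled_sd=0.8, t_star=2.0)
print(f"estimate = {estimate:.3f}, SE = {se:.4f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

For Bonferroni comparisons, the same standard-error formula applies to each pairwise contrast, with the critical value taken at level alpha divided by the number of comparisons.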
Problem 6
Consider the data set intervene program.csv. Using R or otherwise, answer the following questions.
- (3 points) Plot the means. Do you think there is an interaction between Group and Time?
- (2 points) Give an estimate for the main effect of group 1.
- (4 points) Construct the two-way ANOVA model for this problem with group and time as the
- (2 points) Can you conclude that there is a main effect of time?
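Note: in a balanced two-way layout, the estimated main effect of a factor level is that level's mean minus the grand mean. The sketch below (Python, standard library only) computes cell means and one main effect from long-format records. The level names "pre" and "post" and all response values are hypothetical; they are not taken from intervene program.csv, and the balanced design is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (group, time, response) records, NOT the exam data
records = [
    (1, "pre", 10.0), (1, "pre", 12.0), (1, "post", 14.0), (1, "post", 16.0),
    (2, "pre", 9.0), (2, "pre", 11.0), (2, "post", 10.0), (2, "post", 12.0),
]

# Collect responses by (group, time) cell and average within each cell
cells = defaultdict(list)
for group, time, value in records:
    cells[(group, time)].append(value)
cell_means = {key: mean(values) for key, values in cells.items()}

# Main effect of group 1 = mean of group 1 minus the grand mean
grand_mean = mean(value for _, _, value in records)
group1_mean = mean(value for group, _, value in records if group == 1)
main_effect_group1 = group1_mean - grand_mean

print(cell_means)
print(f"main effect of group 1 = {main_effect_group1:.3f}")
```

Plotting the cell means against time, one line per group, gives the means plot; roughly parallel lines suggest little interaction.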
Problem 7
Consider the data set plants1.csv. Using R or otherwise, answer the following questions.
- (4 points) Find the means for each species-by-water combination. Plot these means versus
water for the four species, connecting the means for each species by lines.
- (2 points) Give the interaction effect between species level 1 and water level 6.
- (4 points) Give the two-way analysis of variance with species and water as factors.
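Note: with equal cell sizes, the estimated interaction effect for a cell is the cell mean minus its row mean, minus its column mean, plus the grand mean. The sketch below (Python, standard library only) assumes a balanced design and uses unweighted means. The level names and cell means are hypothetical and are not taken from plants1.csv.

```python
def interaction_effect(cell_means, row, col):
    """Interaction effect for one cell of a balanced two-way table of cell means."""
    rows = sorted({r for r, _ in cell_means})
    cols = sorted({c for _, c in cell_means})
    grand = sum(cell_means.values()) / len(cell_means)
    row_mean = sum(cell_means[(row, c)] for c in cols) / len(cols)
    col_mean = sum(cell_means[(r, col)] for r in rows) / len(rows)
    return cell_means[(row, col)] - row_mean - col_mean + grand

# Hypothetical (species, water) cell means, NOT the exam data
cell_means = {
    ("A", "low"): 10.0, ("A", "high"): 20.0,
    ("B", "low"): 12.0, ("B", "high"): 14.0,
}
effect = interaction_effect(cell_means, "A", "high")
print(f"interaction effect for (A, high) = {effect:.3f}")
```

The interaction effects sum to zero across each row and each column, which is a quick check on the arithmetic.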
Problem 8
A study of 170 franchise firms classified each firm as to whether it was successful or not. Attached is
- (2 points) What proportion of exclusive territory firms are successful?
- (2 points) Find the log odds for the answer in part 1.
- (6 points) Let x = 1 for exclusive territories and x = 0 for other territories. Using R or
(a) (3 points) The fitted logistic regression model.
(b) (3 points) Odds ratio for exclusive territory versus no exclusive territory.
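Note: with a single 0/1 predictor, the maximum-likelihood logistic fit reproduces the observed log odds in each group, so the coefficients can be read off a 2x2 table: the intercept is the log odds at x = 0 and the slope is the difference in log odds. The sketch below (Python, standard library only) illustrates this. The counts are hypothetical and are not the counts from the franchise study; in R, `glm(..., family = binomial)` gives the same coefficients.

```python
import math

def log_odds(successes, total):
    """Log odds of success from a count of successes out of total."""
    p = successes / total
    return math.log(p / (1 - p))

# Hypothetical counts, NOT the data from the study
excl_success, excl_total = 80, 100     # x = 1 (exclusive territory)
other_success, other_total = 30, 60    # x = 0 (no exclusive territory)

b0 = log_odds(other_success, other_total)        # intercept: log odds at x = 0
b1 = log_odds(excl_success, excl_total) - b0     # slope: difference in log odds
odds_ratio = math.exp(b1)

print(f"log(odds) = {b0:.4f} + {b1:.4f} x, odds ratio = {odds_ratio:.3f}")
```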