STT 422 EXAM 1

Due Monday, Apr 08, 11:59 pm. Worth 25% of the final grade (80 points).

There are 8 questions in total, each with several subparts.


Problem 1

Consider the data set bank wage.csv. Using R or otherwise, answer the following questions:

- (2 points) Plot wages versus LOS and circle the outlier with the highest value of wage. (Drop this observation for the remaining parts.)
- (1 point) Find the least-squares regression line for the regression of wages on LOS.
- (4 points) Give the significance test for the slope of LOS. (Clearly state the hypotheses, test statistic, p-value, and conclusion.)
- (3 points) Give a 95% prediction interval at LOS = 55.
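The quantities in Problem 1 follow from the standard simple-linear-regression formulas. The sketch below is a minimal pure-Python illustration of those formulas. It does not assume anything about the column names in bank wage.csv, which are not shown here. The critical value `t_star` for the prediction interval is an argument supplied by the caller, because the Python standard library has no t quantile function. In R, `lm()`, `summary()` and `predict(..., interval = "prediction")` cover the same steps.

```python
import math

def simple_regression(x, y, x0, t_star):
    """Least-squares fit of y on x, slope t-test, and a prediction interval at x0.

    t_star is the t critical value with n - 2 degrees of freedom
    (e.g. the 0.975 quantile for a 95% interval); it is supplied by the caller.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                       # least-squares slope
    b0 = y_bar - b1 * x_bar              # least-squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))         # residual standard error
    se_b1 = s / math.sqrt(sxx)           # standard error of the slope
    t_slope = b1 / se_b1                 # statistic for H0: beta1 = 0, df = n - 2
    y_hat = b0 + b1 * x0                 # predicted response at x0
    # SE for a single new observation (prediction, not mean response)
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    interval = (y_hat - t_star * se_pred, y_hat + t_star * se_pred)
    return {"b0": b0, "b1": b1, "se_b1": se_b1, "t": t_slope,
            "df": n - 2, "y_hat": y_hat, "pred_interval": interval}
```

For illustrative values x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5] (not the exam data), the fitted line is 2.2 + 0.6x.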
Problem 2

Consider the data set student gpa.csv and a regression model for predicting GPA from IQ, gender, and self-concept. Using R or otherwise, answer the following questions:

- (4 points) Give the F-statistic for testing

  $H_0: \beta_{\text{IQ}} = \beta_{\text{gender}} = \beta_{\text{selfconcept}} = 0$

  Also provide the degrees of freedom for this F-statistic.
- (4 points) Run correlation tests to check whether GPA is correlated with:

  (a) IQ

  (b) GENDER

Problem 3

Consider the data set biomarkers.csv and a regression model for predicting VO+ from OC, TRAP, and VO-. Using R or otherwise, answer the following questions:

- (2 points) Give the statistical model for this regression, including all assumptions.
- (2 points) Give the multiple regression line to predict VO+ from OC, TRAP, and VO-.
- (4 points) Make a table with the t-statistics and p-values for all the explanatory variables. Which is the least significant variable among OC, TRAP, and VO-?
- (4 points) Consider the full model and the model without the least significant variable. Give the ANOVA table comparing these two models.
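The model comparison in the last part of Problem 3 is the extra-sum-of-squares F-test for nested models. The sketch below computes that statistic. It assumes you already have the residual sum of squares (SSE) and residual degrees of freedom for each fit. In R, `anova(reduced_fit, full_fit)` reports the same quantities together with the p-value; `reduced_fit` and `full_fit` are hypothetical names for the two `lm()` objects.

```python
def nested_f_test(sse_reduced, df_reduced, sse_full, df_full):
    """Extra-sum-of-squares F statistic for comparing nested linear models.

    sse_* are residual sums of squares and df_* are residual degrees of
    freedom (n minus the number of estimated coefficients) for each model.
    Returns (F, numerator df, denominator df).
    """
    if df_reduced <= df_full:
        raise ValueError("the reduced model must have more residual df than the full model")
    df_num = df_reduced - df_full                     # number of coefficients dropped
    f_stat = ((sse_reduced - sse_full) / df_num) / (sse_full / df_full)
    return f_stat, df_num, df_full
```

When exactly one variable is dropped, this F equals the square of that variable's t-statistic in the full model. If the reduced model is intercept-only, its SSE is the total sum of squares and its residual df is n - 1. The same function then gives the overall F-statistic for "all slopes are zero", as in Problem 2, with p and n - p - 1 degrees of freedom.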

Problem 4

Do people from different cultures experience emotions differently? Here is a summary of the data:

Are the means the same across different cultures?

- (2 points) Should you use a pooled standard deviation? If yes, what is its value?
- (4 points) Construct an ANOVA table for this problem.
- (2 points) State the hypotheses for this problem.
- (2 points) Provide the p-value for the hypothesis test in part 3.
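Problem 4 gives summary statistics instead of raw data, so the one-way ANOVA table has to be built from each group's sample size, mean, and standard deviation. The sketch below assumes the summary lists n, mean, and SD for each culture; the table itself is not reproduced here. The pooling check uses the common rule of thumb that the largest SD should be less than twice the smallest. The p-value is left out because the Python standard library has no F distribution. In R, `pf(F, df1, df2, lower.tail = FALSE)` gives it.

```python
import math

def anova_from_summary(ns, means, sds):
    """One-way ANOVA quantities from per-group sample sizes, means, and SDs."""
    k = len(ns)
    n_total = sum(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / n_total      # weighted grand mean
    ssg = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))   # between-group SS
    sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))         # within-group SS
    dfg, dfe = k - 1, n_total - k
    msg, mse = ssg / dfg, sse / dfe
    return {
        "df": (dfg, dfe),
        "ss": (ssg, sse),
        "ms": (msg, mse),
        "F": msg / mse,
        "pooled_sd": math.sqrt(mse),               # pooled SD is sqrt(MSE)
        "pooling_ok": max(sds) < 2 * min(sds),     # rule-of-thumb check
    }
```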
Problem 5

Consider the data set price promotion.csv. Using R or otherwise, answer the following questions.

- (2 points) Construct a contrast that compares the average of promotions 1 and 7 with the average of promotions 3 and 5.
- (3 points) Give a 95% confidence interval for the contrast in part 1.
- (4 points) Use the Bonferroni method or another multiple-comparisons procedure to compare the different price promotion groups.

Problem 6

Consider the data set intervene program.csv. Using R or otherwise, answer the following questions.

- (3 points) Plot the means. Do you think there is an interaction between Group and Time?
- (2 points) Give an estimate of the main effect of group 1.
- (4 points) Construct the two-way ANOVA model for this problem with Group and Time as the factors.
- (2 points) Can you accept the hypothesis that there is a main effect of Time?
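The effects form of the two-way model writes each cell mean as overall mean + group effect + time effect + interaction. In a balanced design, the estimated main effect of a group is then its marginal mean minus the grand mean. Roughly parallel profiles in the means plot suggest little interaction. The sketch below assumes a hypothetical `cell_means[i][j]` layout (group i, time j) and a balanced design. R's default treatment contrasts in `lm()`/`aov()` report differences from a baseline level instead, so the coefficients R prints may not match these effects directly.

```python
from statistics import mean

def main_effects(cell_means):
    """Main-effect estimates from a table of cell means (balanced design assumed).

    cell_means[i][j] is the mean response for group i at time j.
    Returns (group_effects, time_effects): marginal means minus the grand mean.
    """
    grand = mean(m for row in cell_means for m in row)
    group_effects = [mean(row) - grand for row in cell_means]
    time_effects = [mean(col) - grand for col in zip(*cell_means)]
    return group_effects, time_effects
```

For the illustrative means [[1, 3], [5, 7]] (not the exam data), the group effects are -2 and 2 and the time effects are -1 and 1. Both group profiles rise by 2 from time 1 to time 2, so that toy table shows no interaction.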
Problem 7

Consider the data set plants1.csv. Using R or otherwise, answer the following questions.

- (4 points) Find the means for each species-by-water combination. Plot these means versus water for the four species, connecting the means for each species with lines.
- (2 points) Give the interaction effect between species level 1 and water level 6.
- (4 points) Give the two-way analysis of variance with species and water as factors.
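The estimated interaction effect for a species-water cell is the cell mean minus its species (row) mean, minus its water (column) mean, plus the grand mean. The sketch below assumes the data are available as hypothetical `(species, water, response)` tuples. It also assumes the design is complete and balanced, so marginal means are plain averages of cell means.

```python
from collections import defaultdict
from statistics import mean

def cell_means(records):
    """Mean response for each (species, water) cell, from (species, water, y) tuples."""
    groups = defaultdict(list)
    for species, water, y in records:
        groups[(species, water)].append(y)
    return {cell: mean(ys) for cell, ys in groups.items()}

def interaction_effect(means, species, water):
    """Estimated interaction: cell mean - species mean - water mean + grand mean.

    Assumes every (species, water) cell is present with equal sample sizes,
    so marginal and grand means are plain averages of cell means.
    """
    species_levels = {s for s, _ in means}
    water_levels = {w for _, w in means}
    species_mean = mean(means[(species, w)] for w in water_levels)
    water_mean = mean(means[(s, water)] for s in species_levels)
    grand_mean = mean(means.values())
    return means[(species, water)] - species_mean - water_mean + grand_mean
```

The `cell_means` dictionary also holds the values to plot against water, one line per species, for the first part.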
Problem 8

A study of 170 franchise firms classified each firm as successful or not. The data are attached.

- (2 points) What proportion of exclusive-territory firms are successful?
- (2 points) Find the log odds for the answer in part 1.
- (6 points) Let x = 1 for exclusive territories and x = 0 for other territories. Using R or otherwise, find:

  (a) (3 points) The fitted logistic regression model.

  (b) (3 points) The odds ratio for exclusive territory versus no exclusive territory.
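With a single 0/1 predictor, the maximum-likelihood logistic fit reproduces the two observed proportions exactly, provided neither proportion is 0 or 1. The answers to Problem 8 can therefore be read off the 2×2 table. b0 is the log odds of success when x = 0, b1 is the difference in log odds between the two groups, and exp(b1) is the odds ratio. The sketch below takes hypothetical count arguments. In R, `glm(success ~ x, family = binomial)` fits the same model from the raw data, where `success` and `x` are hypothetical variable names.

```python
import math

def log_odds(p):
    """Log odds (logit) of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))

def binary_logistic_fit(successes_x1, total_x1, successes_x0, total_x0):
    """Closed-form logistic regression of success on a single 0/1 predictor.

    Returns (b0, b1, odds_ratio) for the model log(p / (1 - p)) = b0 + b1 * x.
    """
    p1 = successes_x1 / total_x1       # success proportion when x = 1
    p0 = successes_x0 / total_x0       # success proportion when x = 0
    b0 = log_odds(p0)                  # intercept: log odds at x = 0
    b1 = log_odds(p1) - b0             # slope: difference in log odds
    return b0, b1, math.exp(b1)        # exp(slope) is the odds ratio
```

For example, 30 successes out of 40 with x = 1 and 10 out of 20 with x = 0 (illustrative counts only) give b0 = 0, b1 = log 3 ≈ 1.099, and an odds ratio of 3.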

