Survey Sampling

Statistics 4234/5234 — Fall 2018

Take-home final exam

The following problems are due in Room 203 Mathematics between 7:10pm and 8:00pm on Tuesday, December 18. You can also submit your paper to the course mailbox in Room 904 SSW, any time before 7:00pm on Tuesday, December 18.

You are not to discuss these problems with anyone other than the instructor, nor consult any published or on-line reference other than Sampling: Design and Analysis, by Sharon L. Lohr.

Please refer to the Homework requirements section of the Course Information document posted at the beginning of the course. A portion of your score on the final exam will be based on presentation; any paper that fails to comply with items 2 through 8 of those requirements will not earn the presentation points.

- Consider the population of size N = 50 given in the file FinalPop.csv, in the “Data” folder on Courseworks. Verify that the population mean and variance are $\bar{y}_U = 14.882$ and $S^2 = 0.977457$, respectively.

(a) Find the design-based bias and MSE of the sample mean for an SRS of size n = 10, that is, find $E(\bar{y} - \bar{y}_U)$ and $E[(\bar{y} - \bar{y}_U)^2]$.

(b) Consider a stratified random sample of 2 units drawn from each of the 5 strata, defined as units 1–10, 11–20, 21–30, 31–40, and 41–50. Find the design-based bias and MSE of $\bar{y}_{str} = \frac{1}{5} \sum_{h=1}^{5} \bar{y}_h$; that is, find $E(\bar{y}_{str} - \bar{y}_U)$ and $E[(\bar{y}_{str} - \bar{y}_U)^2]$.
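For orientation on parts (a) and (b), the following R sketch evaluates the standard design-based variance formulas on a simulated toy population. The vector y, the seed, and all variable names are hypothetical and are not the FinalPop.csv data. Both estimators are design-unbiased in this setting (the strata have equal size), so each MSE equals the corresponding variance.

```r
# Toy population (hypothetical; NOT the values in FinalPop.csv).
set.seed(4234)
y <- rnorm(50, mean = 15, sd = 1)
N <- length(y)

# SRS of size n: V(ybar) = (1 - n/N) * S^2 / n, where S^2 uses divisor N - 1.
n <- 10
mse_srs <- (1 - n / N) * var(y) / n

# Stratified sample: 5 strata of 10 consecutive units, 2 drawn per stratum.
stratum <- rep(1:5, each = 10)
N_h  <- as.vector(table(stratum))            # stratum sizes (all equal to 10)
n_h  <- rep(2, 5)                            # stratum sample sizes
S2_h <- as.vector(tapply(y, stratum, var))   # within-stratum variances

# V(ybar_str) = sum over strata of (N_h/N)^2 * (1 - n_h/N_h) * S_h^2 / n_h.
mse_str <- sum((N_h / N)^2 * (1 - n_h / N_h) * S2_h / n_h)

c(srs = mse_srs, stratified = mse_str)
```

Replacing y with the actual population values would give the quantities asked for, provided the units are kept in file order so that the strata match the definition in part (b).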

Now suppose that the population $Y_1, Y_2, \ldots, Y_N$ is itself a random sample of size N = 50 from a normally distributed superpopulation with mean $\mu = 15$ and variance $\sigma^2 = 1$.

(c) Find the model-based bias and MSE of the estimator in part (a), that is, find $E_M(\bar{y} - \bar{y}_U)$ and $E_M[(\bar{y} - \bar{y}_U)^2]$.

(d) Find the model-based bias and MSE of the estimator in part (b), that is, find $E_M(\bar{y}_{str} - \bar{y}_U)$ and $E_M[(\bar{y}_{str} - \bar{y}_U)^2]$.

- The data for this problem are in the SDaA package. Enter

library(SDaA)

Data <- counties[,c(2,3,5,17)]

and Data will contain total population and number of veterans for a random sample of the 3141 counties in the United States. The total population at the time of this data set was 255,077,536.

(a) Using ratio estimation, find an approximate 95% confidence interval for the total number of veterans in the United States. Report your answer in millions of veterans, rounded to the nearest 10,000.

(b) Using regression estimation, find an approximate 95% confidence interval for the total number of veterans in the United States. Report your answer in millions of veterans, rounded to the nearest 10,000.
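As a reminder of the mechanics behind part (a), here is a minimal R sketch of the usual ratio estimator of a total under simple random sampling, with a linearization standard error. The function name ratio_total_ci, the toy vectors, and the use of a normal quantile are all assumptions made for illustration; none of the numbers come from the counties data.

```r
# Ratio estimate of a population total with an approximate confidence interval.
# x, y : sample values of the auxiliary and study variables
# N    : number of units in the population
# t_x  : known population total of the auxiliary variable
ratio_total_ci <- function(x, y, N, t_x, level = 0.95) {
  n <- length(y)
  B_hat <- mean(y) / mean(x)          # estimated ratio
  t_hat <- B_hat * t_x                # ratio estimate of the total of y
  e <- y - B_hat * x                  # residuals about the ratio line
  # SE = N * (xbar_U / xbar) * sqrt((1 - n/N) * s_e^2 / n), and N * xbar_U = t_x.
  se <- (t_x / mean(x)) * sqrt((1 - n / N) * var(e) / n)
  z <- qnorm(1 - (1 - level) / 2)
  c(estimate = t_hat, se = se, lower = t_hat - z * se, upper = t_hat + z * se)
}

# Hypothetical toy data (NOT from the SDaA counties sample).
x <- c(12, 30, 45, 8, 60, 25)
y <- c(1.1, 2.8, 4.6, 0.7, 5.9, 2.4)
res <- ratio_total_ci(x, y, N = 40, t_x = 1200)
res
```

With so few sampled units a t quantile would be the more cautious choice; the normal quantile is used here only to keep the sketch short.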

(c) Assume the population values are themselves a random sample from a superpopulation in which

$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

where $E(\varepsilon_i) = 0$ and $V(\varepsilon_i) = \sigma^2$. Find the model-based standard error of the estimate you computed for part (b). How does it compare to the design-based SE?

(d) Assume the population values are themselves a random sample from a superpopulation in which

$Y_i = \beta_1 x_i + \varepsilon_i$

where $E(\varepsilon_i) = 0$ and $V(\varepsilon_i) = \sigma^2 x_i$. Find the model-based standard error of the estimate you computed for part (a). How does it compare to the design-based SE?

- A fisherman is interested in N, the number of fish in a certain pond. He catches 100 fish, tags them, and throws them back. A few days later, he returns and catches 80 fish, of which 18 are tagged.

(a) Find the maximum likelihood estimate of N, along with its standard error.

(b) Explain in plain English what your answer to part (a) means, particularly the standard error. The fisherman does not know or care about maximum likelihood theory, or sampling distributions, or any such things; he just wants to know how many fish are in the pond. Help him.
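The arithmetic behind a capture-recapture estimate can be sketched in R as follows. The counts are those given in the problem; the estimator n1 * n2 / m is the usual approximate maximum likelihood estimate, and the delta-method standard error shown is an assumed approximation that may differ slightly from the formula in the textbook. The variable names are hypothetical.

```r
# Counts taken from the problem statement.
n1 <- 100  # fish tagged on the first visit
n2 <- 80   # fish caught on the second visit
m  <- 18   # tagged fish among the second catch

# Estimate of pond size: N_hat = n1 * n2 / m (here 8000 / 18, about 444.4).
N_hat <- n1 * n2 / m

# Delta-method standard error (an assumed approximation):
# V(N_hat) is approximately n1^2 * n2 * (n2 - m) / m^3.
se_N_hat <- sqrt(n1^2 * n2 * (n2 - m) / m^3)

c(N_hat = N_hat, se = se_N_hat)
```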


(c) Find an approximate 90% confidence interval for N by inverting the acceptance region of a level 0.10 Pearson’s chi-square test for independence between the binary variables “In first day’s catch” and “In second day’s catch”.

- The file statepop.csv, available in the “Data” folder on Courseworks, lists the 1992 population for the 50 states plus the District of Columbia. The file counties.csv contains the number of counties for a sample of size 12 drawn with replacement, with probabilities proportional to population.

(a) Estimate the total number of counties in the United States, and find the standard error of your estimate.
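For sampling with replacement with unequal probabilities, the standard approach is the Hansen-Hurwitz estimator: each draw gives its own estimate of the total, and these are averaged. The R sketch below illustrates this on made-up numbers; the vectors psi and t_i and every value in them are hypothetical and are not taken from statepop.csv or counties.csv.

```r
# Hypothetical draws (NOT the statepop.csv / counties.csv values).
# psi : selection probability of the unit chosen on each draw
# t_i : total observed for that unit (a unit drawn twice appears twice)
psi <- c(0.10, 0.10, 0.05, 0.02, 0.20, 0.20)
t_i <- c(58, 58, 30, 14, 120, 120)
n   <- length(t_i)

u <- t_i / psi                # each draw's own estimate of the population total
t_hat <- mean(u)              # Hansen-Hurwitz estimate of the total
se_t_hat <- sd(u) / sqrt(n)   # standard error: sqrt(s_u^2 / n)

c(t_hat = t_hat, se = se_t_hat)
```

Under probability-proportional-to-population sampling, psi for a state would be its population divided by the total population of all 51 units.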

(b) With California being sampled three times and New Jersey twice, there were nine distinct states in the sample. Writing your estimate in part (a) as $\hat{t} = \sum_{i \in R} w_i Q_i t_i$ with R = {CA, CO, CT, MA, MO, NJ, TN, VA, WI} and $t_i$ = number of counties in state i, find $w_i Q_i$ for each of those nine states. What is $\sum_{i \in R} w_i Q_i$ for this sample? What is its expected value over repeated random sampling?

- Investigators selected a random sample of 200 teenagers from a population of 2000 for a survey of screen time on smartphones and other handheld devices; the overall response rate was 75%. A follow-up sample was taken of 10 of the 50 nonrespondents, with responses obtained from all 10.

In the data file ScreenTime.csv, the variable Group takes the value 1 for respondents, 2 for nonrespondents included in the follow-up survey, and 3 for nonrespondents not in the follow-up sample; the variable Minutes gives that individual’s average daily screen time in minutes.

Give an approximate 95% confidence interval for the average screen time per day among these 2000 teenagers.
