In this article, we will explore two tests for comparing two groups of dependent (i.e. paired) quantitative data: the Wilcoxon signed rank test and the paired Student's t-test. The key difference between them is that the Wilcoxon test is nonparametric, whereas the t-test is parametric. In the following, we will explore the consequences of this difference.
Let's consider the sleep data set. It contrasts the effects of two soporific drugs (i.e. hypnotics) by providing the change in hours of sleep after taking each drug, compared with a baseline:
- _extra_ represents the increase or decrease (positive or negative values) in sleep compared to the baseline measurement
- _group_ indicates the drug
- _ID_ indicates the patient ID

For clarity, I will rename _group_ to _drug_:
##    extra group ID
## 1    0.7     1  1
## 2   -1.6     1  2
## 3   -0.2     1  3
## 4   -1.2     1  4
## 5   -0.1     1  5
## 6    3.4     1  6
## 7    3.7     1  7
## 8    0.8     1  8
## 9    0.0     1  9
## 10   2.0     1 10
## 11   1.9     2  1
## 12   0.8     2  2
## 13   1.1     2  3
## 14   0.1     2  4
## 15  -0.1     2  5
## 16   4.4     2  6
## 17   5.5     2  7
## 18   1.6     2  8
## 19   4.6     2  9
## 20   3.4     2 10
Note that the sleep data set contains two measurements for each patient. It is therefore well suited to demonstrating paired tests such as the ones we are dealing with.
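As a minimal sketch, the data can be loaded and the column renamed as follows. It assumes the `sleep` data set that ships with R's `datasets` package; the name `sleep.df` is only an illustrative choice and does not come from the original analysis.

```r
# Copy the built-in data set so that the original object stays untouched
sleep.df <- datasets::sleep

# Rename the 'group' column to 'drug' for clarity
names(sleep.df)[names(sleep.df) == "group"] <- "drug"

# Every patient ID should appear exactly twice (once per drug)
print(table(sleep.df$ID))
print(head(sleep.df))
```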
What are we testing?
Suppose we work in a pharmaceutical company and these are the data we have just obtained from a clinical trial. Now we have to decide which of the two drugs to bring to market. A reasonable way to choose is to identify the drug that performs better.
To get an intuition for the effectiveness of the two drugs, let's plot the corresponding values:
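The original figure is not reproduced here. A comparable plot could be drawn roughly as follows; this is a sketch using base R graphics, which is an assumption, since the original figure may have been produced with a different plotting library.

```r
# Box plot of the change in sleep for each drug (built-in 'sleep' data set)
boxplot(extra ~ group, data = sleep,
        xlab = "Drug", ylab = "Change in sleep (hours)")

# Median change in sleep per drug, to read alongside the plot
medians <- tapply(sleep$extra, sleep$group, median)
print(medians)
```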
The figure shows that the median increase in sleep time of drug 1 is close to 0, while the median increase of drug 2 is close to 2 hours. Therefore, according to these data, it seems that drug 2 is more effective than drug 1. However, we still need to determine whether our findings are statistically significant.
The null hypothesis is that there is no difference in additional sleep time between the two drugs. Since we want to know whether drug 2 is better than drug 1, we do not need a two-tailed test (which would test whether either drug outperforms the other) but a one-tailed test. The alternative hypothesis is therefore that drug 2 is better than drug 1.
Wilcoxon signed rank test
Because the test statistic is based on ranks rather than on the measurements themselves, the Wilcoxon signed rank test can be thought of as testing for a shift in the median between the two groups.
To perform the test in R, we can use the wilcox.test function. However, we must explicitly set the paired argument to indicate that we are dealing with matched observations. To specify a one-tailed test, we set the alternative argument to "greater". In this way, the alternative being tested is that drug 2 is associated with a greater increase in sleep duration than drug 1.
# x: extra sleep with drug 2, y: extra sleep with drug 1 (same patient order)
res <- wilcox.test(x, y, paired = TRUE, alternative = "greater")
Before we get the results, we should investigate the two warnings generated by performing the test.
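To make the call reproducible, here is a self-contained sketch that builds the two vectors from the built-in `sleep` data set and collects any warnings raised by the test. The definitions of `x` and `y` are an assumption based on the differences shown in this article (drug 2 minus drug 1), and the exact wording and number of warnings may vary between R versions.

```r
# Extra sleep per patient: x = drug 2, y = drug 1 (rows are ordered by patient ID)
x <- sleep$extra[sleep$group == 2]
y <- sleep$extra[sleep$group == 1]

# Run the one-tailed paired test and record warnings instead of printing them
msgs <- character(0)
res <- withCallingHandlers(
  wilcox.test(x, y, paired = TRUE, alternative = "greater"),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)         # warnings raised by the test, if any
print(res$p.value)  # p-value of the test
```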
Warning 1: ties
The first warning occurs because the test ranks the differences between the pairs of extra values. If two pairs have the same difference, they are tied in the ranking. We can verify this by computing the differences between the pairs:
x - y
## [1] 1.2 2.4 1.3 1.3 0.0 1.0 1.8 0.8 4.6 1.4
We find that the third and fourth pairs have the same difference of 1.3. Why are ties a problem? Tied values are assigned the average of the ranks they span. Therefore, if there are many ties, the test statistic becomes less expressive and the Wilcoxon test less appropriate. Since we have only a single tie here, this is not a problem.
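The averaging of tied ranks can be illustrated with a short sketch. It assumes `x` and `y` are taken from the built-in `sleep` data set as drug 2 and drug 1, and it uses `rank`, whose default tie handling assigns the average rank.

```r
# Paired differences (drug 2 minus drug 1)
x <- sleep$extra[sleep$group == 2]
y <- sleep$extra[sleep$group == 1]
d <- x - y

# Rank the absolute non-zero differences; tied values share the average rank
nonzero <- d[d != 0]
r <- rank(abs(nonzero))
print(data.frame(diff = nonzero, rank = r))
```

The two differences of 1.3 would occupy ranks 4 and 5, so each of them receives the rank 4.5.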
Warning 2: zeros
The second warning relates to pairs with a difference of 0. In the sleep data set, this is the case for the fifth patient (see above). Why are zeros a problem? Remember that under the null hypothesis the differences are centered around 0. However, an observed difference of exactly 0 gives us no information for rejecting the null hypothesis. Such pairs are therefore discarded when the test statistic is computed. If this happens for many pairs, the statistical power of the test drops considerably. Again, this is not a problem here because there is only a single zero.
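The effect of discarding zeros can be sketched by computing the signed rank statistic by hand: the sum of the ranks of the positive differences, which wilcox.test reports as V. The vectors `x` and `y` are again assumed to be drug 2 and drug 1 from the built-in `sleep` data set.

```r
x <- sleep$extra[sleep$group == 2]
y <- sleep$extra[sleep$group == 1]
d <- x - y

# Pairs with a difference of exactly zero carry no information and are dropped
n.zero <- sum(d == 0)
d.used <- d[d != 0]

# Signed rank statistic: sum of the ranks of the positive differences
V <- sum(rank(abs(d.used))[d.used > 0])

print(n.zero)          # number of discarded pairs
print(length(d.used))  # effective sample size
print(V)
```

Here every remaining difference is positive, so the statistic takes its largest possible value for nine pairs.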
The main result of the test is its p-value, which can be obtained as follows:
res$p.value
## [1] 0.004545349
Since the p-value is below the 5% significance level, we can reject the null hypothesis. We are therefore inclined to accept the alternative hypothesis that drug 2 is better than drug 1.
Paired Student's t-test
The paired Student's t-test is a parametric test for two groups of paired quantitative measurements. Here, parametric means that the t-test assumes that the mean difference between the samples is normally distributed. The test determines whether the mean difference between the measurements of the two groups, $\bar{X}_D$, is greater than $\mu_D$, where $\mu_D$ is usually set to 0 to find out whether there is any difference at all.
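In R, we can use the t.test function with paired = TRUE to perform a paired t-test. Note that, by default, t.test assumes unequal population variances for two-sample tests; in that case the test is also called Welch's t-test. To obtain the original t-test assuming equal population variances, we can set the var.equal argument to TRUE. This setting only matters for unpaired two-sample tests, because the paired test works on the differences within each pair. Here, we will simply use the default settings.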
t.result <- t.test(x, y, paired = TRUE, alternative = "greater")
print(t.result$p.value)
## [1] 0.001416
As before, the p-value is less than 0.05. We are therefore inclined to accept the alternative hypothesis: drug 2 leads to a greater average increase in sleep time than drug 1.
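To see what the paired t-test computes, the following sketch reproduces the test statistic by hand from the paired differences and compares it with the output of t.test. It assumes `x` and `y` are the drug 2 and drug 1 values from the built-in `sleep` data set; the variable names are illustrative.

```r
x <- sleep$extra[sleep$group == 2]
y <- sleep$extra[sleep$group == 1]
d <- x - y
n <- length(d)

# t statistic: mean difference divided by its standard error (mu_D = 0)
t.stat <- mean(d) / (sd(d) / sqrt(n))

# One-tailed p-value from the t distribution with n - 1 degrees of freedom
p.one.sided <- pt(t.stat, df = n - 1, lower.tail = FALSE)

# The same quantities as reported by t.test
t.result <- t.test(x, y, paired = TRUE, alternative = "greater")
print(c(by.hand = t.stat, t.test = unname(t.result$statistic)))
print(c(by.hand = p.one.sided, t.test = t.result$p.value))
```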
Checking the assumptions of Student's t-test
The t-test requires that the sample mean is normally distributed. According to the central limit theorem, the sample mean approaches a normal distribution when the sample is large enough. With enough samples, the assumption of the t-test can therefore be met even for measurements that are not normally distributed. Since the sleep data contain only 10 paired measurements, there is reason for concern. We should therefore check whether the differences between the measurements are normally distributed in order to verify that the t-test is valid.
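library(ggplot2)
diff.df <- data.frame(diff = x - y)  # paired differences (assumed definition)
ggplot(diff.df, aes(x = diff)) + geom_histogram()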
Looking at the histogram, the data seem to be fairly uniformly rather than normally distributed. For a closer look, we use a Q-Q plot to compare the differences with the values expected under a normal distribution.
The Q-Q plot shows that the differences fit the normal model quite well, except for a heavy tail. We can therefore conclude that the assumptions of the t-test are sufficiently met. However, we are still uncertain whether the t-test is the most appropriate choice for these data.
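A Q-Q plot of the differences could be produced along the following lines. This is a sketch using base R's `qqnorm` and `qqline`, which is an assumption about tooling; the original plot may have been created differently.

```r
x <- sleep$extra[sleep$group == 2]
y <- sleep$extra[sleep$group == 1]
d <- x - y

# Sample quantiles of the differences against theoretical normal quantiles
qq <- qqnorm(d, main = "Normal Q-Q plot of paired differences")

# Reference line through the first and third quartiles
qqline(d)
```

Points that stray from the reference line at either end indicate tails that are heavier or lighter than those of a normal distribution.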
Summary: Wilcoxon signed rank test and paired Student t-test
In this analysis, both the Wilcoxon signed rank test and the paired Student's t-test led to rejection of the null hypothesis. But in general, which test is more appropriate? The answer depends on several criteria:
- Hypothesis: Student's t-test compares means, while the Wilcoxon test is based on the ranks of the data. For example, if you are analyzing data with many outliers, such as personal wealth (where a few billionaires strongly influence the result), the Wilcoxon test may be more appropriate.
- Interpretation: Although confidence intervals can also be computed for the Wilcoxon test, it may be more natural to argue about the confidence interval of the mean in the t-test than about the (pseudo)median estimated by the Wilcoxon test.
- Fulfillment of assumptions: For small samples, the assumptions of Student's t-test may not be met. In such cases, it is often safer to choose a nonparametric test. However, if the assumptions of the t-test are met, it has greater statistical power than the Wilcoxon test.
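The confidence intervals mentioned in the list above can be obtained roughly as follows. This sketch uses two-sided tests so that both intervals are bounded on both sides, which differs from the one-tailed tests used earlier. The warnings about ties and zeros are suppressed on purpose, and `x` and `y` are again assumed to be drug 2 and drug 1 from the built-in `sleep` data set.

```r
x <- sleep$extra[sleep$group == 2]
y <- sleep$extra[sleep$group == 1]

# 95% confidence interval for the mean difference (paired t-test)
t.ci <- t.test(x, y, paired = TRUE)$conf.int

# 95% confidence interval for the (pseudo)median of the differences (Wilcoxon test)
w.res <- suppressWarnings(wilcox.test(x, y, paired = TRUE, conf.int = TRUE))
w.ci <- w.res$conf.int

print(t.ci)
print(w.ci)
```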
Because the sleep data set is so small, I would prefer the Wilcoxon test for these data.
What kind of test would you use?