## Preface

As we all know, you can't just boast that this year was better than last year. Show it with the frequency and quality of your posts on developepper!

To compare this year's data with last year's, you may need a statistical test.

## Tests for differences between groups, finally explained clearly!

**Homogeneity of variance**

Homogeneity of variance means that the group variances are equal, an assumption that both the t-test and analysis of variance require. When comparing two or more groups, the idea is straightforward: compare the variance of each group and check whether they are about the same size. If they differ too much, the variances are considered heterogeneous (unequal); if they differ little, they are considered homogeneous (equal). Whether a difference counts as large or small must itself be decided by a statistical test, which is why tests for homogeneity of variance exist.
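As a rough illustration of "comparing the variance of each group", the sketch below computes the ratio of the larger sample variance to the smaller one. The function name `variance_ratio` and the two data lists are hypothetical, and the ratio is only a descriptive statistic: a formal homogeneity test would compare it against a reference distribution, which this sketch does not do.

```python
from statistics import variance


def variance_ratio(group_a, group_b):
    """Return the larger sample variance divided by the smaller one.

    A ratio close to 1 suggests similar spread. Both groups must have
    nonzero variance, otherwise the division fails.
    """
    var_a = variance(group_a)  # sample variance (n - 1 denominator)
    var_b = variance(group_b)
    return max(var_a, var_b) / min(var_a, var_b)


# Hypothetical data: posts per month in two different years.
this_year = [12, 15, 11, 14, 13]
last_year = [10, 18, 6, 20, 11]

print(variance_ratio(this_year, last_year))
```

A ratio far above 1, as in this made-up data, is a hint that the equal-variance assumption deserves a formal check before running a t-test.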

## Normal distribution test

The t-test and analysis of variance require that the samples come from normally distributed populations. On this premise, a statistical test can be carried out on the sample means. The purpose of the test is to judge whether the two samples are random draws from the same population or come from different populations. Note also that when the sample size is greater than 30, the sample mean is approximately normally distributed even if the population is not, which is why the t-test can still be used in that case.
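To make the "test on the sample means" concrete, here is a minimal sketch of the pooled two-sample t statistic, which assumes equal variances and roughly normal data. The function name `pooled_t_statistic` and the sample values are illustrative assumptions; turning the statistic into a p-value requires the t distribution (from a table or a statistics package), which is left out here.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for the equal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, df


# Hypothetical measurements for two groups.
t, df = pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df)
```

The larger the absolute value of t for a given number of degrees of freedom, the stronger the evidence that the two population means differ.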

## Parametric test and nonparametric test

Differences between parametric and nonparametric tests:

1. **A parametric test is a hypothesis about parameters, whereas a nonparametric test is a hypothesis about the overall distribution; this is the key feature that distinguishes the two.** For example, the t-test for comparing two samples judges whether the means of the populations represented by the two samples differ, so it is a parametric test. The rank sum test for comparing two samples (the Wilcoxon test and the Mann-Whitney test) judges whether the locations of the populations represented by the two samples differ (that is, whether the values of one population tend to be shifted by an unknown amount relative to the other), so it is a nonparametric test.

2. The fundamental difference between the two is that a parametric test uses information about the population (its distribution and parameter characteristics such as the variance) together with sample information to infer population parameters, whereas a nonparametric test does not need that information and infers the population distribution from sample information alone.

3. Parametric tests can only be used for interval and ratio data, whereas nonparametric tests are mainly used for count data. Nonparametric tests can also be applied to interval and ratio data, but with reduced accuracy.
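The rank sum idea in point 1 can be sketched as follows: pool both samples, rank them, and compare the rank sum of one group with what chance would give. The helper names `average_ranks` and `mann_whitney_u` are hypothetical, and the p-value uses a plain normal approximation without tie or continuity correction, so it is only rough for small samples.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U, approximate two-sided p-value) for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u = min(u_a, n_a * n_b - u_a)  # the smaller of the two U statistics
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u  # z <= 0 because u is the smaller statistic
    return u, 2 * NormalDist().cdf(z)


# Hypothetical samples with no overlap between the groups.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

Note that only the order of the values matters here, not their size, which is why the test makes no assumption about the shape of the population distribution.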

**How to understand nonparametric testing**

A **parametric test** usually assumes that **the population follows a normal distribution and the sample statistic follows a t distribution**, and then makes statistical inferences about unknown parameters of the population distribution, such as the population mean, population variance, and population standard deviation. If the population distribution is unknown and the sample size is small, the central limit theorem cannot be relied on to test parameters and infer the central tendency and dispersion of the population. In that case, a nonparametric test can be used. A nonparametric test makes no assumption about the overall distribution and instead infers it directly from the samples.

Compared with parametric tests, nonparametric tests apply to a wider range of situations, especially small samples, unknown or skewed population distributions, unequal variances, and mixed samples.

Nonparametric tests are more widely applicable, but parametric tests are more accurate when their assumptions are met.

## SPSS in practice

SPSS can be used to run the various tests.

**Analysis of variance versus the t-test:** the difference is that in a t-test the grouping variable X can have only two categories, such as men and women. If X has three categories, such as below undergraduate, undergraduate, and above undergraduate, only analysis of variance can be used.

**Analysis of variance (ANOVA)** was invented by R.A. Fisher and is used to test the significance of the difference between the means of two or more samples.
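For readers who want to see what ANOVA computes outside SPSS, the sketch below calculates the one-way F statistic by hand: the variation between group means divided by the variation within groups. The function name `one_way_anova_f` and the three groups are hypothetical, and the p-value (which needs the F distribution) is not computed.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, group_means) for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores for three education levels.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

With exactly two groups, this F statistic equals the square of the pooled two-sample t statistic, which is why the t-test can be viewed as a special case of ANOVA.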

## Difference between the chi-square test and Fisher's exact test

**Both apply to unordered categorical variables**

① Chi-square test

The chi-square test is often used to analyze the association between unordered categorical variables, and it can also be used for binary categorical variables. However, the test only indicates whether the association is statistically significant; it does not reflect the strength of the association. It is therefore often combined with Cramér's V to indicate that strength.
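The relationship between the chi-square statistic and Cramér's V can be sketched in a few lines. The function name `chi_square_and_cramers_v` and the contingency table are hypothetical; the sketch applies no continuity correction and does not compute a p-value.

```python
from math import sqrt


def chi_square_and_cramers_v(table):
    """Return (chi-square statistic, Cramér's V) for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # V rescales chi-square to the range 0..1 so tables can be compared.
    smaller_dim = min(len(row_totals), len(col_totals)) - 1
    v = sqrt(chi2 / (n * smaller_dim))
    return chi2, v


# Hypothetical 2 x 2 table: rows are groups, columns are outcomes.
print(chi_square_and_cramers_v([[10, 20], [20, 10]]))
```

The chi-square statistic grows with the sample size, while V does not, which is why V is the better indicator of how strong the association is.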

② Fisher's exact test

Fisher's exact test can be used to test the association in any R × C table, but it is most commonly used for 2 × 2 tables, that is, the association between two binary variables. Unlike the chi-square test, which relies on an approximate distribution, Fisher's exact test uses the exact distribution and is better suited to small samples. Like the chi-square test, however, it only indicates whether the association is statistically significant and does not reflect its strength.
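For a 2 × 2 table, the "exact distribution" is the hypergeometric distribution of one cell given fixed row and column totals. The sketch below sums the probabilities of all tables that are no more likely than the observed one, which is one common definition of the two-sided p-value. The function name `fisher_exact_2x2` and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def probability(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table at least as extreme (no more probable) as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical small-sample table.
print(fisher_exact_2x2(3, 1, 1, 3))
```

Because every probability is computed exactly from counts, the result stays valid even when expected cell counts are too small for the chi-square approximation.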

## Some theorems about the normal distribution

(1) The mean of the sample means of all possible samples of size n drawn at random from a population equals the population mean.

(2) For a normal population, the distribution of the **sample mean** over all possible random samples of size n is also normal.

(3) Even if the population is not normally distributed, when the sample size is large, the sampling distribution of the **sample mean**, which reflects the population μ and σ, is also close to the normal distribution.
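Theorems (1) and (3) can be checked with a small simulation. The sketch below draws repeated samples from an exponential population (clearly not normal, with mean 1 and standard deviation 1) and looks at the distribution of the sample means. The function name `simulate_sample_means`, the choice of population, and the sample size of 50 are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_sample_means(sample_size, repetitions, seed=0):
    """Draw repeated samples from an exponential population and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        mean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(repetitions)
    ]


sample_means = simulate_sample_means(sample_size=50, repetitions=2000)

# Theorem (1): the mean of the sample means should be close to the population mean, 1.
print("mean of sample means:", mean(sample_means))
# The spread of the sample means should be close to sigma / sqrt(n).
print("sd of sample means:", stdev(sample_means), "expected about", 1 / sqrt(50))
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the underlying population is strongly skewed.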

If the original data follow a normal distribution, the t-test is recommended. If the data deviate substantially from normality, a nonparametric test is recommended. If the sample size is large, either method is acceptable.