tecdat: Econometrics in R: applying dummy variables in a linear regression model

Time:2021-8-8

Original link:http://tecdat.cn/?p=22805 

Why do I need dummy variables?

Most data can be measured numerically, such as height and weight. Variables such as gender, season, and location, however, are categorical and have no natural numeric scale. We represent them with dummy variables, which take the value 1 when an observation belongs to a category and 0 otherwise.

Example: Gender

Let’s assume that the effect of x on y differs between men and women.

For men, y = 10 + 5x + e.

For women, y = 5 + x + e.

Here e is a random error with mean zero. In the true relationship between y and x, gender therefore affects both the intercept and the slope.

First, let’s generate the data we need.

# True slope: male = 5, female = 1
d$y <- ifelse(d$gender == 1, 10 + 5 * d$x + e, 5 + d$x + e)
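The snippet above assumes that a data frame d (with columns x and gender) and an error vector e already exist. A minimal sketch of how such data might be generated follows; the sample size, seed, range of x, and error spread are illustrative assumptions, not values from the original post.

```r
# Hypothetical setup: n, the seed, and the distributions are assumptions.
set.seed(123)
n <- 200
d <- data.frame(
  x      = runif(n, min = 0, max = 10),         # continuous regressor
  gender = sample(c(0, 1), n, replace = TRUE)   # dummy: 1 = male, 0 = female
)
e <- rnorm(n, mean = 0, sd = 2)                 # random error with mean zero

# True model from the text: men y = 10 + 5x + e, women y = 5 + x + e
d$y <- ifelse(d$gender == 1, 10 + 5 * d$x + e, 5 + d$x + e)

head(d)
```

Coding gender as 0/1 keeps the comparison d$gender == 1 simple; a factor would work equally well in lm().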

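Next, we can look at the relationship between x and y, coloring the points by gender.

library(ggplot2)  # assumed plotting package; the original call is incomplete
ggplot(d, aes(x, y, color = factor(gender))) + geom_point()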

Obviously, the relationship between Y and X should not be described by a single line. We need two: one for men and one for women.

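If we regress y only on x and gender, without an interaction term, the model lets gender shift the intercept but forces both groups to share a single slope.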

The estimated coefficient of x is then incorrect: it is a blend of the two true slopes, 1 and 5, and matches neither group.
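To see why a shared-slope model misleads, here is a minimal sketch on simulated data. The sample size, seed, and error spread are assumptions chosen for illustration; only the two true equations come from the text.

```r
# Simulated data; n, the seed, and sd are illustrative assumptions
set.seed(123)
n <- 200
d <- data.frame(x = runif(n, 0, 10),
                gender = sample(c(0, 1), n, replace = TRUE))
e <- rnorm(n, mean = 0, sd = 2)
d$y <- ifelse(d$gender == 1, 10 + 5 * d$x + e, 5 + d$x + e)

# Additive model: gender shifts the intercept, but both groups share one slope
fit_additive <- lm(y ~ x + gender, data = d)
coef(fit_additive)["x"]   # lies between the true slopes 1 and 5
```

With one common slope, least squares can only return a compromise between the two groups, so the estimate describes neither men nor women.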

The correct specification lets gender affect both the intercept and the slope, which means including gender itself and its interaction with x.

Alternatively, the dummy variable can be added in the following way.
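A minimal sketch of a specification in which gender shifts both the intercept and the slope, using simulated data whose sample size, seed, and error spread are assumptions. In R's formula syntax, y ~ x * gender is shorthand for y ~ x + gender + x:gender, so the two fits below give identical coefficients.

```r
# Simulated data; n, the seed, and sd are illustrative assumptions
set.seed(123)
n <- 200
d <- data.frame(x = runif(n, 0, 10),
                gender = sample(c(0, 1), n, replace = TRUE))
e <- rnorm(n, mean = 0, sd = 2)
d$y <- ifelse(d$gender == 1, 10 + 5 * d$x + e, 5 + d$x + e)

# Shorthand: main effects plus the interaction
fit_star <- lm(y ~ x * gender, data = d)

# Explicit form: the same model written term by term
fit_explicit <- lm(y ~ x + gender + x:gender, data = d)

coef(fit_star)
all.equal(coef(fit_star), coef(fit_explicit))
```

The gender coefficient estimates the difference in intercepts, and the x:gender coefficient estimates the difference in slopes between men and women.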

The model shows that for women (gender = 0), the estimated model is y = 5.20 + 0.99x. For men (gender = 1), the estimated relationship is y = (5.20 + 4.50) + (0.99 + 4.02)x, that is, y = 9.70 + 5.01x, which is quite close to the true relationship.
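The arithmetic above can be written as a small helper. This is only a sketch: the function name line_for and the coefficient labels are hypothetical, and the numbers are the estimates quoted in the text.

```r
# Estimates as quoted in the text; the labels are hypothetical
b <- c(intercept = 5.20, x = 0.99, gender = 4.50, x_gender = 4.02)

# Group-specific line: the dummy switches the extra intercept and slope on or off
line_for <- function(b, gender) {
  c(intercept = b[["intercept"]] + gender * b[["gender"]],
    slope     = b[["x"]]         + gender * b[["x_gender"]])
}

line_for(b, 0)  # women: intercept 5.20, slope 0.99
line_for(b, 1)  # men:   intercept 9.70, slope 5.01
```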

Next, let’s try two dummy variables: gender and location.

Dummy variables for gender and location

Gender is not important, but location is important

Let’s generate some data in which gender does not matter but location does.
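The original data-generating code for this case is not shown, so the following is only a sketch. The intercepts and slopes for Toronto and Chicago, the sample size, and the seed are all assumptions; the one feature taken from the text is that y depends on location but not on gender.

```r
# All numeric values here are illustrative assumptions
set.seed(42)
n <- 200
d <- data.frame(
  x        = runif(n, 0, 10),
  gender   = factor(sample(c("0", "1"), n, replace = TRUE)),
  location = factor(sample(c("Toronto", "Chicago"), n, replace = TRUE))
)
e <- rnorm(n, mean = 0, sd = 2)

# Gender does not enter the formula; location changes intercept and slope
d$y <- ifelse(d$location == "Chicago", 10 + 3 * d$x + e, 1 + 1 * d$x + e)

table(d$gender, d$location)   # cell counts for each gender/location pair
```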

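Plot the relationship between x and y, coloring the points by gender and splitting the panels by location.

library(ggplot2)  # assumed plotting package; the original call is garbled
ggplot(d, aes(x, y, color = factor(gender))) + geom_point() + facet_grid(~ location)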

The effect of gender on y does not appear to be significant. When you compare the Chicago data with the Toronto data, however, both the intercept and the slope differ.

If we ignore both gender and location, the model is a simple regression of y on x.

R-squared is quite low.

We know that gender is not important, but we add it anyway to see whether it makes a difference.

As expected, the impact of gender is not significant.
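A sketch of this comparison on simulated data follows. The location-specific intercepts and slopes, the sample size, and the seed are assumptions; gender is deliberately left out of the data-generating formula.

```r
# Simulated data; all numeric values are illustrative assumptions
set.seed(42)
n <- 200
d <- data.frame(
  x        = runif(n, 0, 10),
  gender   = factor(sample(c("0", "1"), n, replace = TRUE)),
  location = factor(sample(c("Toronto", "Chicago"), n, replace = TRUE))
)
e <- rnorm(n, mean = 0, sd = 2)
d$y <- ifelse(d$location == "Chicago", 10 + 3 * d$x + e, 1 + 1 * d$x + e)

fit_x  <- lm(y ~ x, data = d)            # ignores gender and location
fit_xg <- lm(y ~ x + gender, data = d)   # adds the gender dummy

# Estimate, standard error, t value, and p-value for the gender dummy
summary(fit_xg)$coefficients["gender1", ]

# Adding an irrelevant dummy barely changes the fit
c(r2_x = summary(fit_x)$r.squared, r2_xg = summary(fit_xg)$r.squared)
```

Because gender is stored as a factor with levels "0" and "1", lm() creates the dummy automatically and labels its coefficient gender1.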

Now let’s look at the impact of location.

The impact of location is large. But this specification only allows location to change the intercept.

What if location changes the intercept and the slope at the same time?
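One way to check this is to fit both specifications and compare them with an F-test. The sketch below uses simulated data in which location changes both intercept and slope; the numeric values, sample size, and seed are assumptions.

```r
# Simulated data; all numeric values are illustrative assumptions
set.seed(42)
n <- 200
d <- data.frame(
  x        = runif(n, 0, 10),
  location = factor(sample(c("Toronto", "Chicago"), n, replace = TRUE))
)
e <- rnorm(n, mean = 0, sd = 2)
d$y <- ifelse(d$location == "Chicago", 10 + 3 * d$x + e, 1 + 1 * d$x + e)

fit_shift <- lm(y ~ x + location, data = d)   # location changes the intercept only
fit_inter <- lm(y ~ x * location, data = d)   # location changes intercept and slope

# F-test of the nested models: does the interaction term improve the fit?
anova(fit_shift, fit_inter)

c(r2_shift = summary(fit_shift)$r.squared,
  r2_inter = summary(fit_inter)$r.squared)
```

Factor levels are sorted alphabetically, so Chicago is the reference category here and the coefficients are labeled locationToronto and x:locationToronto.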

You can also try this.

In this data set, gender is not important, and location changes both the intercept and the slope.

Gender and location are important, two locations

Now let’s generate some data in which both gender and location matter. Let’s start with two locations.

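# The source omits the first branch (gender "1" in Toronto); those rows fall through to NA here.
d$y <- ifelse(d$gender == "0" & d$location == "Toronto", 1 + 1 * d$x + e,
       ifelse(d$gender == "1" & d$location == "Chicago", 20 + 2 * d$x + e,
       ifelse(d$gender == "0" & d$location == "Chicago", 2 + 2 * d$x + e, NA)))
library(ggplot2)  # assumed plotting package; the original call is garbled
ggplot(d, aes(x, y, color = factor(gender))) + geom_point() + facet_grid(~ location)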
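When both dummies matter, each gender/location cell can have its own intercept and slope, which a model with all interactions can capture. The sketch below is not the original analysis: the equation for gender "1" in Toronto is missing from the source and is an assumption here, as are the sample size, seed, and error spread.

```r
# n, the seed, and sd are illustrative assumptions
set.seed(1)
n <- 400
d <- data.frame(
  x        = runif(n, 0, 10),
  gender   = factor(sample(c("0", "1"), n, replace = TRUE)),
  location = factor(sample(c("Toronto", "Chicago"), n, replace = TRUE))
)
e <- rnorm(n, mean = 0, sd = 1)

d$y <- ifelse(d$gender == "1" & d$location == "Toronto", 10 + 1 * d$x + e,  # ASSUMED branch
       ifelse(d$gender == "0" & d$location == "Toronto",  1 + 1 * d$x + e,
       ifelse(d$gender == "1" & d$location == "Chicago", 20 + 2 * d$x + e,
                                                          2 + 2 * d$x + e)))

# Full interaction: every gender/location cell gets its own intercept and slope
fit_full <- lm(y ~ x * gender * location, data = d)
round(coef(fit_full), 2)
```

The reference cell is gender "0" in Chicago, so the intercept and the x coefficient estimate that cell's line (2 + 2x), and every other coefficient is a difference relative to it.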

Gender and location are important, five locations

Finally, let’s try a model with five locations.

# The branches before Chicago and the Shanghai branches are truncated in the source;
# rows that match none of the branches below fall through to NA.
d$y <- ifelse(d$gender == "1" & d$location == "Chicago",  2 + 10 * d$x + e,
       ifelse(d$gender == "0" & d$location == "Chicago",  2 + 2 * d$x + e,
       ifelse(d$gender == "1" & d$location == "New York", 3 + 15 * d$x + e,
       ifelse(d$gender == "0" & d$location == "New York", 3 + 5 * d$x + e,
       ifelse(d$gender == "1" & d$location == "Beijing",  8 + 30 * d$x + e,
       ifelse(d$gender == "0" & d$location == "Beijing",  8 + 2 * d$x + e, NA))))))
library(ggplot2)  # assumed plotting package; the original call is garbled
ggplot(d, aes(x, y, color = factor(gender))) + geom_point() + facet_grid(~ location)

Therefore, if you think that some factors (gender, location, season, etc.) may affect the relationship between your explanatory variables and the response, include them in the model as dummy variables.
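In R you rarely build the dummies by hand: storing the variable as a factor is enough, and lm() expands it into 0/1 columns. The sketch below uses model.matrix() to show that expansion for a five-level location factor. The list of locations is partly an assumption, since the source names only four of the five.

```r
# Five locations; Toronto as the fifth is an assumption
location <- factor(c("Toronto", "Chicago", "New York", "Beijing", "Shanghai"))

# Design matrix that lm() would build: an intercept plus one dummy per
# non-reference level (k levels give k - 1 dummy columns)
X <- model.matrix(~ location)
colnames(X)
X
```

The first level in alphabetical order, Beijing, becomes the reference category and gets no column of its own. Its effect is absorbed by the intercept, which avoids perfect collinearity among the dummies.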

