# Tecdat: Econometrics in R: applying dummy variables in linear regression models

Time：2021-8-8

### Why do I need dummy variables?

Most data can be measured numerically, such as height and weight. Categorical variables such as gender, season, and location cannot. Instead, we use dummy variables (0/1 indicators) to represent them.
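To make the idea concrete, here is a minimal sketch of how R turns a categorical column into a 0/1 dummy column. The data frame `toy` and its values are invented for illustration only; `model.matrix()` is the base R function that builds the design matrix `lm()` would use.

```r
# Hypothetical toy data: one numeric column and one categorical column.
toy <- data.frame(
  height = c(170, 165, 180, 158),
  gender = factor(c("male", "female", "male", "female"))
)

# model.matrix() shows the design matrix that lm() would build.
# The factor becomes a single 0/1 column; the first level in
# alphabetical order ("female") is the baseline and gets no column.
X <- model.matrix(~ height + gender, data = toy)
print(X)
```

A factor with k levels yields k - 1 dummy columns, because the baseline level is absorbed into the intercept.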

## Example: Gender

Let’s assume that the effect of x on y differs between men and women.

For men, $y = 10 + 5x + e$.

For women, $y = 5 + x + e$.

Here e is a random error term with mean zero. In the true relationship between y and x, gender therefore affects both the intercept and the slope.

First, let’s generate the data we need.

```r
# True slope: male = 5, female = 1
# (d holds x and gender; e is the mean-zero error term)
d$y <- ifelse(d$gender == 1, 10 + 5 * d$x + e, 5 + d$x + e)
```
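The setup of `d` and `e` is not shown, so the following is only a sketch of one way to generate data matching the two equations. The seed, sample size, range of x, and error standard deviation are all assumptions, not values from the original.

```r
set.seed(1)                       # assumed seed, for reproducibility only
n <- 200                          # assumed sample size

d <- data.frame(
  x      = runif(n, 0, 10),       # assumed range for x
  gender = rbinom(n, 1, 0.5)      # 1 = male, 0 = female
)
e <- rnorm(n, mean = 0, sd = 2)   # mean-zero random error (sd is assumed)

# True model from the text: men y = 10 + 5x + e, women y = 5 + x + e
d$y <- ifelse(d$gender == 1, 10 + 5 * d$x + e, 5 + d$x + e)
head(d)
```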

First, we can look at the relationship between x and y and color the data by gender. With the ggplot2 package loaded, one way to draw it is:

``ggplot(d, aes(x, y, color = factor(gender))) + geom_point()``

Clearly, the relationship between y and x should not be described by a single line. We need two: one for men and one for women.

If we only regress y on x and gender, the estimated coefficient of x is incorrect, because this model forces men and women to share a single slope.

The correct specification lets gender affect both the intercept and the slope, that is, it includes the gender dummy together with its interaction with x. The fitted model shows that for women (gender = 0) the estimated relationship is y = 5.20 + 0.99x; for men (gender = 1) it is y = 5.20 + 0.99x + 4.5 + 4.02x, that is, y = 9.7 + 5.01x, which is quite close to the true relationship.
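The two specifications can be compared in a short sketch. The data are regenerated here under assumed settings (seed, sample size, error sd), so the estimates will not reproduce the exact numbers quoted above; the names `fit_add` and `fit_int` are illustrative.

```r
set.seed(1)                                   # assumed settings throughout
n <- 500
d <- data.frame(x = runif(n, 0, 10), gender = rbinom(n, 1, 0.5))
e <- rnorm(n, sd = 2)
d$y <- ifelse(d$gender == 1, 10 + 5 * d$x + e, 5 + d$x + e)

# Additive model: gender shifts only the intercept, so both groups share one slope.
fit_add <- lm(y ~ x + gender, data = d)

# Interaction model: x * gender expands to x + gender + x:gender,
# so gender can change the intercept and the slope.
fit_int <- lm(y ~ x * gender, data = d)
b <- coef(fit_int)

# Recover the per-group lines from the interaction coefficients.
women <- c(intercept = unname(b["(Intercept)"]),
           slope     = unname(b["x"]))
men   <- c(intercept = unname(b["(Intercept)"] + b["gender"]),
           slope     = unname(b["x"] + b["x:gender"]))

print(coef(fit_add))
print(rbind(women, men))
```

The men's line is obtained by adding the `gender` coefficient to the intercept and the `x:gender` coefficient to the slope, which is the same arithmetic as in the paragraph above.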

Next, let’s try two dummy variables: gender and location.

## Dummy variables for gender and location

### Gender is not important, but location is important

Let’s generate some data in which gender does not matter but location does.

Plot x against y, color the points by gender, and split the panels by location.

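With ggplot2 loaded, one way to draw it is:

``ggplot(d, aes(x, y, color = factor(gender))) + geom_point() + facet_grid(. ~ location)``

The effect of gender on y does not seem to be significant. But when you compare the Chicago data with the Toronto data, the intercept is different and the slope is different.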

If we ignore the impact of gender and location, the model’s R-squared is quite low.

We know that gender is not important, but we still add it to see whether it makes a difference. As expected, the impact of gender is not significant.

Now let’s look at the impact of location. The impact of location is large. But our model setup basically means that location only changes the intercept.
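The three models discussed in this subsection can be sketched as follows. The data-generating coefficients are not given in the text, so the values below (a common slope of 2 and a Chicago intercept 9 units above Toronto) are purely hypothetical, as are the seed and sample size.

```r
set.seed(2)                                   # assumed settings throughout
n <- 400
d <- data.frame(
  x        = runif(n, 0, 10),
  gender   = rbinom(n, 1, 0.5),
  location = factor(sample(c("Toronto", "Chicago"), n, replace = TRUE),
                    levels = c("Toronto", "Chicago"))  # Toronto is the baseline
)
e <- rnorm(n, sd = 2)

# HYPOTHETICAL truth: gender has no effect; Chicago shifts the intercept by +9.
d$y <- ifelse(d$location == "Chicago", 10 + 2 * d$x + e, 1 + 2 * d$x + e)

fit_x   <- lm(y ~ x, data = d)                      # ignores both dummies
fit_g   <- lm(y ~ x + gender, data = d)             # adds gender
fit_loc <- lm(y ~ x + gender + location, data = d)  # adds location as well

r2 <- c(x_only        = summary(fit_x)$r.squared,
        plus_gender   = summary(fit_g)$r.squared,
        plus_location = summary(fit_loc)$r.squared)
print(round(r2, 3))
print(coef(fit_loc))
```

Because `location` enters only additively, the `locationChicago` coefficient is a pure intercept shift; both cities are forced to share the slope on x.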

What if location changes the intercept and the slope at the same time, while gender still does not matter? You can try that as well.
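A minimal sketch of that case, again with hypothetical coefficients (Toronto y = 1 + x, Chicago y = 10 + 3x) and assumed seed and sample size. It compares the intercept-only specification with one that adds the x-by-location interaction.

```r
set.seed(3)                                   # assumed settings throughout
n <- 400
d <- data.frame(
  x        = runif(n, 0, 10),
  gender   = rbinom(n, 1, 0.5),
  location = factor(sample(c("Toronto", "Chicago"), n, replace = TRUE),
                    levels = c("Toronto", "Chicago"))
)
e <- rnorm(n, sd = 2)

# HYPOTHETICAL truth: location changes intercept and slope; gender has no effect.
d$y <- ifelse(d$location == "Chicago", 10 + 3 * d$x + e, 1 + 1 * d$x + e)

fit_shift <- lm(y ~ x + location, data = d)   # location moves the intercept only
fit_both  <- lm(y ~ x * location, data = d)   # location moves intercept and slope

# F-test for the nested models: does the x:location term add explanatory power?
cmp <- anova(fit_shift, fit_both)
print(cmp)
print(coef(fit_both))
```

The `x:locationChicago` coefficient estimates how much steeper the Chicago line is than the Toronto line.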

### Gender and location are important, two locations

Now let’s generate data in which both gender and location matter. Let’s start with two locations.

```r
# The opening branch (gender "1" in Toronto) is missing here, so those rows evaluate to NA.
ifelse(d$gender == "0" & d$location == "Toronto", 1 + 1 * d$x + e,
       ifelse(d$gender == "1" & d$location == "Chicago", 20 + 2 * d$x + e,
              ifelse(d$gender == "0" & d$location == "Chicago", 2 + 2 * d$x + e, NA)))
```
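As a sketch of how such data could be generated and fitted end to end: three of the four branches below use the coefficients shown above, while the gender "1" / Toronto branch (10 + 1x) is a hypothetical stand-in, since that branch is not visible. The seed, sample size, error sd, and the `group` column are also assumptions.

```r
set.seed(4)                                   # assumed settings throughout
n <- 400
d <- data.frame(
  x        = runif(n, 0, 10),
  gender   = sample(c("0", "1"), n, replace = TRUE),
  location = sample(c("Toronto", "Chicago"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
e <- rnorm(n, sd = 1)

d$y <- ifelse(d$gender == "1" & d$location == "Toronto", 10 + 1 * d$x + e,  # HYPOTHETICAL branch
       ifelse(d$gender == "0" & d$location == "Toronto",  1 + 1 * d$x + e,
       ifelse(d$gender == "1" & d$location == "Chicago", 20 + 2 * d$x + e,
       ifelse(d$gender == "0" & d$location == "Chicago",  2 + 2 * d$x + e, NA))))

# One dummy per (gender, location) cell; dropping the global intercept (0 +)
# makes each coefficient read directly as that cell's intercept or slope.
d$group <- interaction(d$gender, d$location)
fit <- lm(y ~ 0 + group + group:x, data = d)
print(round(coef(fit), 2))
```

This cell-by-cell parameterization fits the same lines as `y ~ x * gender * location`; it is used here only because the coefficients are easier to compare with the data-generating values.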
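With ggplot2 loaded, one way to plot the data, colored by gender and split by location, is:

``ggplot(d, aes(x, y, color = gender)) + geom_point() + facet_grid(. ~ location)``

### Gender and location are important, five locations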

Finally, let’s try a model with five locations.

```r
# Fragment: the opening branches before Chicago and the Shanghai branches are cut off.
ifelse(d$gender == "1" & d$location == "Chicago", 2 + 10 * d$x + e,
  ifelse(d$gender == "0" & d$location == "Chicago", 2 + 2 * d$x + e,
    ifelse(d$gender == "1" & d$location == "New York", 3 + 15 * d$x + e,
      ifelse(d$gender == "0" & d$location == "New York", 3 + 5 * d$x + e,
        ifelse(d$gender == "1" & d$location == "Beijing", 8 + 30 * d$x + e,
          ifelse(d$gender == "0" & d$location == "Beijing", 8 + 2 * d$x + e,
            # ifelse(d$gender == "1" & d$location == "Shanghai", ... (cut off here)
            NA))))))
```
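With many cells, a coefficient lookup table can be easier to read than deeply nested `ifelse()` calls. The sketch below takes that alternative route. The Chicago, New York, and Beijing rows use the coefficients visible above; the Toronto and Shanghai rows are hypothetical placeholders, and the seed, sample size, and the `truth` table itself are assumptions.

```r
set.seed(5)                                   # assumed settings throughout
n <- 1000
locations <- c("Toronto", "Chicago", "New York", "Beijing", "Shanghai")
d <- data.frame(
  x        = runif(n, 0, 10),
  gender   = factor(sample(c("0", "1"), n, replace = TRUE)),
  location = factor(sample(locations, n, replace = TRUE), levels = locations)
)
e <- rnorm(n)

# One row per (location, gender) cell: intercept a and slope b.
# Toronto and Shanghai rows are HYPOTHETICAL placeholders.
truth <- data.frame(
  location = rep(locations, each = 2),
  gender   = rep(c("1", "0"), times = 5),
  a = c(1, 1,   2, 2,   3, 3,   8, 8,   5, 5),
  b = c(4, 1,  10, 2,  15, 5,  30, 2,  20, 3)
)

# Look up each observation's cell and build y from that cell's line.
idx <- match(paste(d$location, d$gender), paste(truth$location, truth$gender))
d$y <- truth$a[idx] + truth$b[idx] * d$x + e

# A factor with k levels contributes k - 1 dummy columns (plus the intercept).
print(colnames(model.matrix(~ location, data = d)))

# Full interaction: every (gender, location) cell gets its own intercept and slope.
fit <- lm(y ~ x * gender * location, data = d)
print(length(coef(fit)))
```

The full interaction model estimates one intercept and one slope per cell, so the number of coefficients grows with the product of the factor levels.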
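With ggplot2 loaded, one way to plot the result is:

``ggplot(d, aes(x, y, color = gender)) + geom_point() + facet_grid(. ~ location)``

Therefore, if you think that some factors (gender, location, season, etc.) may affect the relationship between your explanatory variables and the response, set them as dummy variables.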
