Monthly Archives: February 2016

[STATA] Creating Dummy Variable Techniques

You could type

. gen young = 0
. replace young = 1 if age<25

or you could type

. gen young = (age<25)
This statement does the same thing as the first two statements. age<25 is an expression, and Stata evaluates it, returning 1 if the expression is true and 0 if it is false.
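
As a quick check of this equivalence, here is a minimal sketch on a made-up five-observation dataset. The ages and the names young1 and young2 are invented for illustration and are not part of the original example. If the two approaches agree, assert finishes silently:

. clear
. set obs 5
. gen age = 20 + 2*_n
. gen young1 = 0
. replace young1 = 1 if age<25
. gen young2 = (age<25)
. assert young1 == young2

Here age takes the values 22, 24, 26, 28, and 30, so both variables are 1 for the first two observations and 0 for the rest.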

If you have missing values in your data, it is better to type

. gen young = 0
. replace young = 1 if age<25
. replace young = . if missing(age)

or

. gen young = (age<25) if !missing(age)
Stata treats a missing value as positive infinity, so the expression age<25 evaluates to 0, not missing, when age is missing. (If the expression were age>25, the expression would evaluate to 1 when age is missing.)
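
To see the difference, here is a small sketch on invented data: three observations whose ages are 20, 30, and missing. The variable names naive and safe are hypothetical.

. clear
. set obs 3
. gen age = 20 in 1
. replace age = 30 in 2
. gen naive = (age<25)
. gen safe = (age<25) if !missing(age)
. list age naive safe

In the third observation, where age is missing, naive is 0, so the person is silently classified as not young, while safe is missing.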

You do not have to type the parentheses around the expression. Typing

. gen young = age<25 if !missing(age)
is good enough. Here are some more illustrations of generating dummy variables:

. gen male = sex==1

. gen top = answer=="very much"

. gen eligible = sex=="male" & (age>55 | (age>40 & enrolled))
In the above line, enrolled is itself a dummy variable—a variable taking on values zero and one. We could have typed & enrolled==1, but typing & enrolled is good enough.
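
The earlier caution about missing values applies to compound expressions too. A sketch, again on invented data: three men, all enrolled, with ages 60, 45, and missing. The name eligible2 is hypothetical.

. clear
. set obs 3
. gen str4 sex = "male"
. gen age = 60 in 1
. replace age = 45 in 2
. gen enrolled = 1
. gen eligible = sex=="male" & (age>55 | (age>40 & enrolled))
. gen eligible2 = sex=="male" & (age>55 | (age>40 & enrolled)) if !missing(age, enrolled)
. list

Because a missing age counts as larger than 55, eligible is 1 in all three observations, whereas eligible2 is 1, 1, and missing. missing() accepts several arguments and returns 1 if any of them is missing.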

Just as Stata returns 1 for true and 0 for false, Stata assumes that 1 means true and that 0 means false.
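
One consequence worth knowing: when a variable is used as a condition, Stata treats any nonzero value as true, and that includes missing. A minimal sketch on invented data, where enrolled is 1, 0, 1, and missing:

. clear
. set obs 4
. gen enrolled = mod(_n, 2)
. replace enrolled = . in 4
. count if enrolled
. count if enrolled==1

The first count reports 3, because the missing value is counted as true; the second reports 2. If a dummy may contain missing values, the explicit ==1 form is the safer choice.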