#Data Generation Here we generate a simple relationship between height and strength, where IAT’s residuals only come from random noise and thus shouldn’t be identified as biased by IAT. In other words, there’s no “direct” correlation between gender and height. ##strength
set.seed(123)
f_e=data.frame(strength=rnorm(150000,62,6),label='female')
m_e=data.frame(strength=rnorm(300000,68,10),label='male') #males still have higher strength in general
employee=rbind(f_e,m_e)
##proxy: height While starting with different strengths between male and female groups, the correlation between height and strength is perfect, so IAT should say that gender doesn’t play a role in deciding the height cutoff.
set.seed(123)
f_u=rnorm(150000,0,3)
for (i in 1:nrow(f_e)){
f_e$height[i]=f_e$strength[i]^2*0.018+f_u[i]
}
m_u=rnorm(300000,0,3)
for (i in 1:nrow(m_e)){
m_e$height[i]=m_e$strength[i]^2*0.018+m_u[i]
}
employee=rbind(f_e,m_e)
#IAT
first_step=lm(height~strength, data=employee)
cut_employee = employee %>% mutate(g_binary = if_else(label == 'female',1,0))
cut_employee$resid=first_step$residuals
second_step=lm(resid~g_binary, data=cut_employee)
summary(second_step)
Call:
lm(formula = resid ~ g_binary, data = cut_employee)
Residuals:
Min 1Q Median 3Q Max
-1.807 -1.375 -0.549 0.379 39.994
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.389125 0.003942 -98.71 <2e-16 ***
g_binary 1.167374 0.006828 170.97 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.159 on 449998 degrees of freedom
Multiple R-squared: 0.061, Adjusted R-squared: 0.06099
F-statistic: 2.923e+04 on 1 and 449998 DF, p-value: < 2.2e-16