#Data Generation Here we generate a simple relationship between height and strength, where IAT’s residuals only come from random noise and thus shouldn’t be identified as biased by IAT. In other words, there’s no “direct” correlation between gender and height. ##strength

set.seed(123)
f_e=data.frame(strength=rnorm(150000,62,6),label='female')
m_e=data.frame(strength=rnorm(300000,68,10),label='male') #males still have higher strength in general
employee=rbind(f_e,m_e)

##proxy: height While starting with different strengths between male and female groups, the correlation between height and strength is perfect, so IAT should say that gender doesn’t play a role in deciding the height cutoff.

set.seed(123)
f_u=rnorm(150000,0,3)
for (i in 1:nrow(f_e)){
  f_e$height[i]=f_e$strength[i]^2*0.018+f_u[i]
}
m_u=rnorm(300000,0,3)
for (i in 1:nrow(m_e)){
  m_e$height[i]=m_e$strength[i]^2*0.018+m_u[i]
}
employee=rbind(f_e,m_e)

#IAT

first_step=lm(height~strength, data=employee)
cut_employee = employee %>% mutate(g_binary = if_else(label == 'female',1,0))
cut_employee$resid=first_step$residuals
second_step=lm(resid~g_binary, data=cut_employee)
summary(second_step)

Call:
lm(formula = resid ~ g_binary, data = cut_employee)

Residuals:
   Min     1Q Median     3Q    Max 
-1.807 -1.375 -0.549  0.379 39.994 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.389125   0.003942  -98.71   <2e-16 ***
g_binary     1.167374   0.006828  170.97   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.159 on 449998 degrees of freedom
Multiple R-squared:  0.061, Adjusted R-squared:  0.06099 
F-statistic: 2.923e+04 on 1 and 449998 DF,  p-value: < 2.2e-16