Machine Learning (ML) algorithms provide powerful data-driven tools for approximating high-dimensional and/or non-linear nuisance functions of the confounders without making assumptions about the true functional form ex ante.
- Speaker
- Date: Wednesday 28 Feb 2024, 13:00 - 14:00
- Type: Seminar
- Room: 3.09
- Building: Polak Building
In this paper, we develop estimators of causal parameters for panel data models which allow for non-linear effects of the confounding regressors, and investigate the performance of these estimators using well-known ML algorithms (namely LASSO, classification and regression trees, gradient boosting, and random forests). We use the Double Machine Learning (DML) approach of Chernozhukov et al. (2018) to estimate the homogeneous treatment effect in panel data models with unobserved individual heterogeneity (or fixed effects) and no unobserved confounding, extending Robinson's (1988) partially linear regression model.
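For reference, a partially linear panel model of the kind described here can be written as below. The notation is assumed for illustration and is not taken from the paper: $y_{it}$ is the outcome, $d_{it}$ the treatment, $x_{it}$ the confounders, $\theta$ the homogeneous treatment effect, $g$ and $m$ the unknown nuisance functions, and $\alpha_i$, $\gamma_i$ the unobserved individual effects.

```latex
\begin{aligned}
y_{it} &= \theta\, d_{it} + g(x_{it}) + \alpha_i + u_{it}, \\
d_{it} &= m(x_{it}) + \gamma_i + v_{it}.
\end{aligned}
```

Under no unobserved confounding, the errors $u_{it}$ and $v_{it}$ are mean-independent of the confounders and the individual effects, and $\theta$ is identified from the relationship between the parts of $y_{it}$ and $d_{it}$ that the confounders and individual effects do not explain.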
Alternative approaches
We develop three alternative approaches for handling the fixed effects by adapting the within-group estimator, the first-difference estimator, and the correlated random effects estimator (Mundlak, 1978) to non-linear models. Using Monte Carlo simulations, we find that conventional least squares estimators can perform well even when the data generating process is non-linear, provided it is smooth, but that the ML-based estimators deliver substantial reductions in bias when the true effect of the regressors is non-linear and discontinuous. However, for the same scenarios, we also find inference to be problematic for tree-based learners, despite extensive hyperparameter tuning, because these learners produce highly non-normal distributions of the estimator and severely underestimated sampling variance.
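To make the correlated random effects idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simulated panel, a random forest for both nuisance functions, and Mundlak-style unit means of the confounders as extra features. The function names (`unit_means`, `simulate_panel`, `dml_cre`) and the data generating process are hypothetical. Cross-fitting splits the sample by individual so that no unit appears in both the training and the held-out fold.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold


def unit_means(x, ids):
    """Broadcast each unit's time-average of x back to that unit's rows."""
    n_units = ids.max() + 1
    counts = np.bincount(ids, minlength=n_units)
    means = np.column_stack(
        [np.bincount(ids, weights=x[:, j], minlength=n_units) / counts
         for j in range(x.shape[1])]
    )
    return means[ids]


def simulate_panel(n_units=300, n_periods=5, theta=0.5, seed=0):
    """Hypothetical DGP: non-linear, partly discontinuous confounding."""
    rng = np.random.default_rng(seed)
    ids = np.repeat(np.arange(n_units), n_periods)
    n = ids.size
    x = rng.normal(size=(n, 2))
    xbar = unit_means(x, ids)
    # Individual effects depend on the unit means of x (Mundlak-style),
    # plus independent unit-level noise in each equation.
    alpha_d = xbar[:, 0] + rng.normal(scale=0.5, size=n_units)[ids]
    alpha_y = xbar[:, 0] + rng.normal(scale=0.5, size=n_units)[ids]
    d = np.cos(x[:, 0]) + (x[:, 1] > 0) + alpha_d + rng.normal(size=n)
    y = (theta * d + np.sin(x[:, 0]) + x[:, 1] ** 2
         + alpha_y + rng.normal(size=n))
    return ids, x, d, y


def dml_cre(ids, x, d, y, n_folds=5, seed=0):
    """Cross-fitted partialling-out with unit means as extra features."""
    features = np.hstack([x, unit_means(x, ids)])
    d_res = np.empty_like(d)
    y_res = np.empty_like(y)
    folds = GroupKFold(n_splits=n_folds).split(features, groups=ids)
    for train, test in folds:
        for target, res in ((d, d_res), (y, y_res)):
            learner = RandomForestRegressor(
                n_estimators=100, min_samples_leaf=5, random_state=seed
            )
            learner.fit(features[train], target[train])
            # Out-of-fold residual: target minus its predicted conditional mean.
            res[test] = target[test] - learner.predict(features[test])
    # Residual-on-residual least squares gives the treatment effect estimate.
    theta_hat = (d_res @ y_res) / (d_res @ d_res)
    # Cluster-robust standard error: sum the score within each unit first.
    score = d_res * (y_res - theta_hat * d_res)
    cluster_sums = np.bincount(ids, weights=score)
    se = np.sqrt(cluster_sums @ cluster_sums) / (d_res @ d_res)
    return theta_hat, se


ids, x, d, y = simulate_panel()
theta_hat, se = dml_cre(ids, x, d, y)
print(f"theta_hat = {theta_hat:.3f} (cluster-robust SE {se:.3f})")
```

In line with the simulation findings above, the standard error from a tree-based learner such as this one should be read with caution. Replacing the random forest with another scikit-learn regressor only requires changing the `learner` line.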
Finally, we provide an illustrative example of DML for observational panel data showing the impact of the introduction of the national minimum wage on voting behaviour in the UK.