Source
Double/debiased machine learning
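- Zhang Tao, Li Junchao. Network infrastructure, inclusive green growth, and regional disparities: causal inference based on double machine learning [J]. 数量经济技术经济研究, 2023, 40(04): 113-135.
- 中国知网 (CNKI) [data + Stata code]
Installing the packages
Run in Stata: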
ssc install ddml, replace
ssc install pystacked, replace
Then run in cmd:
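The cmd command itself is not reproduced here. Independently of it, pystacked runs its learners through Stata's Python integration and scikit-learn, so a quick check from inside Stata can confirm that both are visible before running the example. This is a minimal sketch and not part of the original instructions; it assumes Stata 16 or later, where the python command is available.

```stata
* Sketch (not from the original post): check that Stata can reach Python and scikit-learn
* Show which Python installation Stata is linked to
python query
* Report where the sklearn module is found; errors if it is missing
python which sklearn
```

If the second command fails, installing scikit-learn into the Python installation reported by python query is the usual remedy.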
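Example code
Environment: Stata 18 MP
The sample is split 1:4 (kfolds(5)), and a random forest (rf) is used to predict both the main regression and the auxiliary regression: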
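* Set the working directory and load the data
cd "C:\Download\数据"
use data, clear
* Outcome, controls (including the *2 variables and year/id indicators), treatment
global Y PR
global X Edu Constru Urban Pass Fre Inv Inter Fis Unemp Size Consump Sci Cap Edu2 Constru2 Urban2 Pass2 Fre2 Inv2 Inter2 Fis2 Unemp2 Size2 Consump2 Sci2 Cap2 i.year i.id
global D Broadband
* Fix the seed so that the fold split and the random forests are reproducible
set seed 42
* Partially linear model with 5-fold cross-fitting
ddml init partial, kfolds(5)
* Random-forest learners for E[D|X] and E[Y|X]
ddml E[D|X]: pystacked $D $X, type(reg) method(rf)
ddml E[Y|X]: pystacked $Y $X, type(reg) method(rf)
* Cross-fit the learners, then estimate with robust standard errors
ddml crossfit
ddml estimate, robust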
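For orientation, the partial model set up by ddml init partial is the partially linear model, with Y = PR and D = Broadband in this example. The notation below (θ, g, m, ℓ, U, V) follows common usage in the double machine learning literature and is an assumption here, not taken from the post:

```latex
\begin{aligned}
Y &= \theta D + g(X) + U, & \mathbb{E}[U \mid X, D] &= 0,\\
D &= m(X) + V, & \mathbb{E}[V \mid X] &= 0,\\
\hat{\theta} &= \frac{\sum_{i} \hat{V}_i \,\bigl(Y_i - \hat{\ell}(X_i)\bigr)}{\sum_{i} \hat{V}_i^{2}}, & \hat{V}_i &= D_i - \hat{m}(X_i),
\end{aligned}
```

Here ℓ(X) = E[Y|X], and the two ddml E[...] lines supply the learners for ℓ and m. Under cross-fitting, the predictions for an observation come from learners trained on the folds that do not contain it, so with kfolds(5) each learner is fitted on four fifths of the sample and predicts the remaining fifth. The expression for θ̂ is written without an intercept; ddml additionally reports a constant term in the final regression.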
Results
The run takes about 3 minutes:
Cross-fitting E[y|X] equation: PR
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
Cross-fitting E[D|X] equation: Broadband
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
. ddml estimate, robust
DDML estimation results:
spec  r     Y learner     D learner         b        SE
 opt  1  Y1_pystacked  D1_pystacked     0.049  ( 0.057)
opt = minimum MSE specification for that resample.
Min MSE DDML model
y-E[y|X]  = Y1_pystacked_1                         Number of obs   =      2820
D-E[D|X,Z]= D1_pystacked_1
------------------------------------------------------------------------------
             |               Robust
          PR | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   Broadband |    .049493   .0567812     0.87   0.383    -.0617961     .160782
       _cons |   .0100916   .0126023     0.80   0.423    -.0146085    .0347917
------------------------------------------------------------------------------
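As a quick sanity check on the table, the z statistic, p-value, and 95% confidence interval can be recomputed from the reported coefficient and robust standard error. This is a minimal sketch that uses only the printed numbers and Stata's normal-distribution functions; the scalar names b_hat and se_hat are illustrative.

```stata
* Recompute the test statistic and interval for Broadband from the printed table
scalar b_hat  = .049493
scalar se_hat = .0567812
display "z       = " b_hat/se_hat
display "p-value = " 2*(1 - normal(abs(b_hat/se_hat)))
display "95% CI  = [" b_hat - invnormal(.975)*se_hat ", " b_hat + invnormal(.975)*se_hat "]"
```

The results should agree, up to rounding, with the z, P>|z|, and interval columns printed by ddml. In this run the Broadband coefficient is therefore not statistically distinguishable from zero at conventional levels.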
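Journal layout
The example above demonstrates the result in column (8):
[Table image] Source: Zhang Tao and Li Junchao (2023)
For other variants, see the original paper and its appendix code.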
Error analysis
With Stata 17, the following error may appear:
Cross-fitting fold 1 unrecognized command
No fix for this problem is known at present.
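A few checks can at least help narrow down which command Stata fails to recognize. The following is a hedged sketch, not a confirmed remedy. It assumes the error comes from a missing or outdated installation of one of the commands called during cross-fitting, which the source does not establish.

```stata
* Sketch: inspect the Stata version and the installed ado-files
* Version of the running Stata
display c(stata_version)
* Path and version line of ddml; capture noisily keeps a do-file running if it is missing
capture noisily which ddml
* Path and version line of pystacked, the learner called in each fold
capture noisily which pystacked
```

If either which command reports that the command is not found, rerunning the corresponding ssc install line from the installation step is the natural next thing to try.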
(End)