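Source literature
Machine learning is usually implemented in Python, R, or Matlab, and papers that use Stata are few and far between. This post collects one such example for reference.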
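Hancevic and Sandoval (2023) follow theory and the literature to construct the variables used to examine the determinants of the probability of photovoltaic (PV) adoption. Many of these variables are redundant and highly correlated. The authors therefore take a data-driven approach, using machine learning methods to obtain the best predictors of PV adoption, build a model that predicts well, and improve the understanding of the potential drivers of PV adoption.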
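The authors fit linear and logit models with cross-validated lasso, adaptive lasso, and elastic net for prediction. That is, they select the variables that are associated with the outcome in one dataset and test whether the same variables predict the outcome in another dataset.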
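For orientation, the three estimators can be read as minimizing one penalized objective. The notation below (N observations, p covariates, loss f, penalty weights \omega_j) is introduced here as an assumption for illustration; it follows a common parameterization and is not quoted from the paper.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
\frac{1}{N}\sum_{i=1}^{N} f\!\left(y_i,\ \beta_0 + x_i^{\top}\beta\right)
\;+\; \lambda \sum_{j=1}^{p} \omega_j
\left[\alpha\,\lvert\beta_j\rvert + \frac{1-\alpha}{2}\,\beta_j^{2}\right]
```

Here f is half the squared error for the linear model and the negative log-likelihood for the logit model. With \alpha = 1 and \omega_j = 1 this is the lasso, \alpha = 0 gives ridge regression, and 0 < \alpha < 1 gives the elastic net. The adaptive lasso keeps \alpha = 1 but uses weights that grow as the previous step's coefficient shrinks, for example \omega_j = 1/|\hat{\beta}_j|.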
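- Cross-validated lasso (CV lasso) selects \lambda^* (the lasso penalty parameter) as the value that minimizes the out-of-sample prediction error, also called the cross-validation function. The authors use the default grid of 100 \lambda values, which are evenly spaced on a logarithmic scale.
- Adaptive lasso consists of multiple lassos, with each lasso step using cross-validation to select a \lambda^* . After each lasso, variables with zero coefficients are removed and the remaining variables are given penalty weights designed to drive small coefficients to zero. The authors use the default of two lassos, because the selected \lambda^* typically does not change after the second lasso.
- Elastic net extends the lasso with a more general penalty term that mixes the absolute-value penalty used by lasso and the squared penalty used by ridge regression, which makes the coefficient estimates more robust to the presence of highly correlated covariates (Zou and Hastie, 2005). To fit the model, the authors specify (0.2, 0.4, 0.6, 0.8) as the set of candidate \alpha values together with the default grid of \lambda values. CV is performed over the combined set of (\alpha, \lambda) values, and the pair with the smallest CV function is selected.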
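The grid-and-pick step can be sketched in a few lines of Python. This is an illustration only, not the authors' code. It assumes a logarithmic grid whose smallest value is `ratio` times the largest (with `ratio = 1e-4` as an assumed default), and it reuses the \lambda values and CV mean deviances that the Stata log in this post prints for the CV lasso logit. The function names are hypothetical.

```python
import math

def log_grid(lambda_max, n_points=100, ratio=1e-4):
    """Grid of n_points lambdas, evenly spaced in logs,
    running from lambda_max down to ratio * lambda_max."""
    step = math.log(ratio) / (n_points - 1)
    return [lambda_max * math.exp(i * step) for i in range(n_points)]

def select_lambda(cv_function):
    """Return the grid ID whose CV function value is smallest."""
    return min(cv_function, key=cv_function.get)

# First lambda reported for the CV lasso logit in the log.
grid = log_grid(0.0376531)

# CV mean deviance by grid ID, copied from the rows Stata prints
# (IDs 1, 11, 12, 13, 16; other grid points are not shown in the log).
cv_deviance = {
    1: 0.4974787,
    11: 0.4855901,
    12: 0.4848208,
    13: 0.4849987,
    16: 0.4882351,
}

best_id = select_lambda(cv_deviance)
print(best_id, round(grid[best_id - 1], 7))  # 12 0.0135318
```

Under these assumptions the rebuilt grid reproduces the selected \lambda^* of .0135318 at ID 12, which suggests the log-spaced reading of the default grid is consistent with the output.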
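Because the authors want to evaluate predictions on a sample that is not used to fit the models, they first randomly split the sample into a training sample (75%) and a validation sample (25%). They then fit the models on the training sample and test their predictions on the validation sample, that is, they assess goodness of fit and compare out-of-sample predictive power.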
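As a rough sketch of what such a split does, the Python snippet below assigns 75% of 784 observations to training and the rest to validation under a fixed seed. It is not a reimplementation of Stata's splitsample: the random number generators differ, so the individual assignments will not match, and the function name is hypothetical.

```python
import random

def split_sample(n_obs, train_share=0.75, seed=13531):
    """Return one label per observation: 1 = training, 2 = validation."""
    rng = random.Random(seed)            # fixed seed makes the split reproducible
    n_train = round(n_obs * train_share)
    labels = [1] * n_train + [2] * (n_obs - n_train)
    rng.shuffle(labels)                  # random assignment of observations
    return labels

labels = split_sample(784)
print(labels.count(1), labels.count(2))  # 588 196
```

The group sizes match the tabulation of sample1 in the results (588 training, 196 validation).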
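In addition, the authors use the leaps-and-bounds algorithm (Furnival and Wilson, 2000; Lindsey and Sheather, 2010) for variable selection in linear regression. The algorithm finds the most predictive subset of variables according to an information criterion. However, its main properties do not carry over to out-of-sample model selection problems (Gluzmann and Panigo, 2015). The authors include this procedure in the comparison of results because it has been used in the residential literature (Davidson et al., 2014).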
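Example code
Install packages: vselect is a user-written command (Lindsey and Sheather, 2010) and must be installed separately; the other commands listed below are built into Stata 16 and later.
Key commands:
- vl: manage variable lists
- splitsample: split data into random samples
- lasso: lasso for prediction and model selection
- elasticnet: elastic net for prediction and model selection
- lassocoef: display coefficients after lasso estimation results
- lassogof: goodness of fit after lasso for prediction
- vselect: implement the leaps-and-bounds algorithm
Run the code: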
cd C:\Download\1-s2.0-S0140988323004772-mmc1\Replication3
use Base_Est_Final_L_ML, clear
eststo clear
* Variable list
vl create allvars = (est_tarifa_v3 est_lntrab est_mipyme_mi est_mipyme_p est_propio comercio comercio_men ///
est_opwkend est_op4dom est_oplun_vie est_oplun_sab est_oplun_dom est_techo_ext_v1 est_elevador ///
est_ampliacion_v1 est_colinda_v1 est_aisl_estructura est_aisl_techo est_aisl_pared est_aisl_ventana ///
est_aisl_puerta est_aislamiento est_AC est_calef_ele est_calefaccion est_espacio_oficina est_espacio_clientes ///
est_espacio_deposito mun_ags est_luz_sensor est_luzext_sensor est_luzint_sensor est_computer est_impresora ///
est_equip_oficina est_servidor est_generador est_regulador est_ventilador est_refri_com est_refri est_cocina ///
est_refri_dom est_micro est_cafe est_tv est_bomba est_gasdiesel est_cambio_g1ene est_conoce est_kwh_conoce ///
est_gasto_conoce est_AC_meses_6 est_AC_tempf est_calentsol est_hab3 est_accionH_panel est_accionH_apagar)
* Sample split
splitsample, generate(sample1) split(.75 .25) rseed(13531)
la def slabel 1 "Training" 2 "Validation"
la values sample1 slabel
ta sample1, m
* Cross-validation
lasso logit est_pv $allvars if sample1==1, nolog rseed(13531)
est store logit_cv
lasso linear est_pv $allvars if sample1==1, nolog rseed(13531)
est store linear_cv
* Adaptive
lasso logit est_pv $allvars if sample1==1, nolog select(adaptive) rseed(13531)
est store logit_adp
lasso linear est_pv $allvars if sample1==1, nolog select(adaptive) rseed(13531)
est store linear_adp
* Elastic net
elasticnet logit est_pv $allvars if sample1==1, nolog alpha(.2 .4 .6 .8) rseed(13531)
est store logit_enet
elasticnet linear est_pv $allvars if sample1==1, nolog alpha(.2 .4 .6 .8) rseed(13531)
est store linear_enet
* Selected variables
lassocoef logit_cv linear_cv logit_adp linear_adp logit_enet linear_enet, ///
sort(coef, standardized)
* Goodness of fit
lassogof logit_cv logit_adp logit_enet, over(sample1) postselection
lassogof logit_cv logit_adp logit_enet, over(sample1)
lassogof linear_cv linear_adp linear_enet, over(sample1) postselection
* Leaps-and-bounds algorithm (union of the selected variables); produces the last column of Table 11:
vl create selectedvars = (est_tarifa_v3 est_lntrab est_mipyme_mi est_mipyme_p est_propio comercio_men ///
est_opwkend est_oplun_vie est_oplun_dom est_aisl_techo est_aisl_ventana est_calef_ele est_calefaccion ///
est_luz_sensor est_luzint_sensor est_generador est_regulador est_ventilador est_tv est_bomba ///
est_gasdiesel est_conoce est_gasto_conoce est_AC_meses_6 est_calentsol est_hab3 est_accionH_panel)
vselect est_pv $selectedvars, best
Results
sample1 | Freq. Percent Cum.
------------+-----------------------------------
Training | 588 75.00 75.00
Validation | 196 25.00 100.00
------------+-----------------------------------
Total | 784 100.00
.
.
. * Cross-validation
. * ----------------
.
. *logit
. lasso logit est_pv $allvars if sample1==1, nolog rseed(13531)
Lasso logit model No. of obs = 582
No. of covariates = 58
Selection: Cross-validation No. of CV folds = 10
--------------------------------------------------------------------------
| No. of Out-of-
| nonzero sample CV mean
ID | Description lambda coef. dev. ratio deviance
---------+----------------------------------------------------------------
1 | first lambda .0376531 0 -0.0118 .4974787
11 | lambda before .0148511 15 0.0124 .4855901
* 12 | selected lambda .0135318 16 0.0139 .4848208
13 | lambda after .0123297 16 0.0136 .4849987
16 | last lambda .009327 22 0.0070 .4882351
--------------------------------------------------------------------------
* lambda selected by cross-validation.
. est store logit_cv
.
. *linear
. lasso linear est_pv $allvars if sample1==1, nolog rseed(13531)
Lasso linear model No. of obs = 582
No. of covariates = 58
Selection: Cross-validation No. of CV folds = 10
--------------------------------------------------------------------------
| No. of Out-of- CV mean
| nonzero sample prediction
ID | Description lambda coef. R-squared error
---------+----------------------------------------------------------------
1 | first lambda .0376531 0 -0.0059 .062887
13 | lambda before .0123297 16 0.0065 .0621143
* 14 | selected lambda .0112343 17 0.0065 .062111
15 | lambda after .0102363 22 0.0061 .0621378
20 | last lambda .0064287 28 0.0030 .0623309
--------------------------------------------------------------------------
* lambda selected by cross-validation.
. est store linear_cv
.
.
. * Adaptive
. * --------
.
. *logit
. lasso logit est_pv $allvars if sample1==1, nolog select(adaptive) rseed(13531)
Lasso logit model No. of obs = 582
No. of covariates = 58
Selection: Adaptive No. of lasso steps = 2
Final adaptive step results
--------------------------------------------------------------------------
| No. of Out-of-
| nonzero sample CV mean
ID | Description lambda coef. dev. ratio deviance
---------+----------------------------------------------------------------
17 | first lambda .6635375 0 -0.0056 .494403
45 | lambda before .0490402 11 0.0789 .4528947
* 46 | selected lambda .0446836 11 0.0790 .4528263
47 | lambda after .040714 11 0.0785 .4530514
116 | last lambda .0000664 16 0.0490 .4675954
--------------------------------------------------------------------------
* lambda selected by cross-validation in final adaptive step.
. est store logit_adp
.
. *linear
. lasso linear est_pv $allvars if sample1==1, nolog select(adaptive) rseed(13531)
Lasso linear model No. of obs = 582
No. of covariates = 58
Selection: Adaptive No. of lasso steps = 2
Final adaptive step results
--------------------------------------------------------------------------
| No. of Out-of- CV mean
| nonzero sample prediction
ID | Description lambda coef. R-squared error
---------+----------------------------------------------------------------
21 | first lambda .2273381 0 -0.0040 .0627684
56 | lambda before .0087605 13 0.0419 .0598975
* 57 | selected lambda .0079823 13 0.0421 .0598853
58 | lambda after .0072732 13 0.0420 .0598922
108 | last lambda .0000694 17 0.0350 .0603292
--------------------------------------------------------------------------
* lambda selected by cross-validation in final adaptive step.
. est store linear_adp
.
.
. * Elastic net
. * -----------
.
. *logit
. elasticnet logit est_pv $allvars if sample1==1, nolog alpha(.2 .4 .6 .8) rseed(13531)
Elastic net logit model No. of obs = 582
No. of covariates = 58
Selection: Cross-validation No. of CV folds = 10
-------------------------------------------------------------------------------
| No. of Out-of-
| nonzero sample CV mean
alpha ID | Description lambda coef. dev. ratio deviance
---------------+---------------------------------------------------------------
0.800 |
1 | first lambda .1882653 0 -0.0051 .4942004
18 | last lambda .0470663 0 -0.0110 .4970868
---------------+---------------------------------------------------------------
0.600 |
19 | first lambda .1882653 0 -0.0051 .4942004
32 | last lambda .0627551 0 -0.0100 .4966058
---------------+---------------------------------------------------------------
0.400 |
33 | first lambda .1882653 0 -0.0051 .4942004
42 | last lambda .0857702 2 -0.0113 .4972324
---------------+---------------------------------------------------------------
0.200 |
43 | first lambda .1882653 0 -0.0072 .4952196
59 | lambda before .0474719 26 0.0312 .4763224
* 60 | selected lambda .0470663 27 0.0313 .4763048
61 | lambda after .0428851 29 0.0312 .4763404
64 | last lambda .032441 33 0.0259 .4789148
-------------------------------------------------------------------------------
* alpha and lambda selected by cross-validation.
. est store logit_enet
.
. *linear
. elasticnet linear est_pv $allvars if sample1==1, nolog alpha(.2 .4 .6 .8) rseed(13531)
Elastic net linear model No. of obs = 582
No. of covariates = 58
Selection: Cross-validation No. of CV folds = 10
-------------------------------------------------------------------------------
| No. of Out-of- CV mean
| nonzero sample prediction
alpha ID | Description lambda coef. R-squared error
---------------+---------------------------------------------------------------
0.800 |
1 | first lambda .1882653 0 -0.0025 .0626783
18 | last lambda .0470663 0 -0.0058 .0628851
---------------+---------------------------------------------------------------
0.600 |
19 | first lambda .1882653 0 -0.0025 .0626783
33 | last lambda .0571801 2 -0.0080 .0630211
---------------+---------------------------------------------------------------
0.400 |
34 | first lambda .1882653 0 -0.0025 .0626783
43 | last lambda .0857702 2 -0.0079 .0630115
---------------+---------------------------------------------------------------
0.200 |
44 | first lambda .1882653 0 -0.0054 .0628602
57 | lambda before .0627551 16 0.0077 .062041
* 58 | selected lambda .0571801 19 0.0079 .0620259
59 | lambda after .0521004 21 0.0076 .0620477
65 | last lambda .032441 28 0.0048 .0622229
-------------------------------------------------------------------------------
* alpha and lambda selected by cross-validation.
. est store linear_enet
.
.
.
. * Results (selected variables & goodness of fit)
. * ----------------------------------------------
. * Note: Table 11 contains the following results:
.
.
. * Selected variables
. * ------------------
.
. lassocoef logit_cv linear_cv logit_adp linear_adp logit_enet linear_enet, ///
> sort(coef, standardized)
----------------------------------------------------------------------------------------
| logit_cv linear_cv logit_adp linear_adp logit_enet linear_enet
------------------+---------------------------------------------------------------------
_cons | x x x x x x
est_mipyme_p | x x x x x x
est_propio | x x x x x x
est_regulador | x x x x x x
est_tv | x x x x x x
est_calentsol | x x x x x x
est_luzint_sensor | x x x x x x
comercio_men | x x x x x x
est_ventilador | x x x x x x
est_calef_ele | x x x x x x
est_bomba | x x x x x x
est_luz_sensor | x x x x
est_opwkend | x x x x x x
est_aisl_techo | x x x x x
est_hab3 | x x x x x
est_gasdiesel | x x x x
est_gasto_conoce | x x x x
est_accionH_panel | x x x
est_oplun_vie | x x
est_calefaccion | x
est_AC_meses_6 | x
est_oplun_dom | x x
est_mipyme_mi | x
est_lntrab | x
est_conoce | x
est_tarifa_v3 | x
est_aisl_ventana | x
est_generador | x
----------------------------------------------------------------------------------------
Legend:
b - base level
e - empty cell
o - omitted
x - estimated
.
. * Goodness of fit
. * ---------------
.
. *Logit: postselection coefficients (Not reported in the paper. See footnote 23)
. lassogof logit_cv logit_adp logit_enet, over(sample1) postselection
Postselection coefficients
-------------------------------------------------------------
| Deviance
Name sample1 | Deviance ratio Obs
------------------------+------------------------------------
logit_cv |
Training | .3871364 0.2126 582
Validation | .5684995 0.1374 196
------------------------+------------------------------------
logit_adp |
Training | .3931777 0.1944 588
Validation | .6020389 0.0866 196
------------------------+------------------------------------
logit_enet |
Training | .3740188 0.2393 582
Validation | .5549257 0.1580 196
-------------------------------------------------------------
.
. *Logit: penalized coefficients
. lassogof logit_cv logit_adp logit_enet, over(sample1)
Penalized coefficients
-------------------------------------------------------------
| Deviance
Name sample1 | Deviance ratio Obs
------------------------+------------------------------------
logit_cv |
Training | .4198183 0.1461 582
Validation | .5908799 0.1035 196
------------------------+------------------------------------
logit_adp |
Training | .4010848 0.1782 588
Validation | .5929706 0.1003 196
------------------------+------------------------------------
logit_enet |
Training | .4171508 0.1516 582
Validation | .5842422 0.1136 196
-------------------------------------------------------------
.
. *Linear: postselection coefficients
. lassogof linear_cv linear_adp linear_enet, over(sample1) postselection
Postselection coefficients
-------------------------------------------------------------
Name sample1 | MSE R-squared Obs
------------------------+------------------------------------
linear_cv |
Training | .0551813 0.1174 582
Validation | .0805131 0.1213 196
------------------------+------------------------------------
linear_adp |
Training | .0554396 0.1132 582
Validation | .0809207 0.1169 196
------------------------+------------------------------------
linear_enet |
Training | .0551286 0.1182 582
Validation | .0801774 0.1250 196
-------------------------------------------------------------
. vselect est_pv $selectedvars, best
Response : est_pv
Selected predictors: est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_tec
> ho est_tv est_calef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_
> dom est_AC_meses_6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend
> est_bomba est_lntrab est_calefaccion est_gasto_conoce est_tarifa_v3 est_conoce est_luz_sensor
> est_oplun_vie
Optimal models:
# Preds R2ADJ C AIC AICC BIC
1 .0288639 71.88853 120.1136 120.1446 129.4271
2 .0457239 58.13183 107.4848 107.5366 121.455
3 .057392 48.92366 98.90898 98.9867 117.5359
4 .0686766 40.07437 90.53296 90.64191 113.8166
5 .0792588 31.85912 82.6352 82.78066 110.5756
6 .0861862 26.83056 77.75122 77.93847 110.3483
7 .0919189 22.85007 73.8454 74.07978 111.0992
8 .097318 19.17037 70.19481 70.48164 112.1053
9 .1024355 15.74561 66.75925 67.10389 113.3265
10 .1057263 13.90499 64.88789 65.29574 116.1119
11 .1077097 13.19593 64.14553 64.62197 120.0262
12 .1097064 12.47983 63.38628 63.93674 123.9237
13 .1119995 11.51404 62.36218 62.9921 127.5563
14 .1138601 10.9238 61.71135 62.4262 131.5622
15 .115093 10.87461 61.60781 62.41307 136.1154
16 .1155846 11.46141 62.1538 63.05499 141.3182
17 .1158985 12.20088 62.85459 63.85722 146.6757
18 .1159928 13.12819 63.74725 64.8569 152.2251
19 .1159156 14.20163 64.78951 66.01173 157.924
20 .1156514 15.4338 65.99494 67.33533 163.7862
21 .1150141 16.98192 67.52695 68.99114 169.9749
22 .1142892 18.60286 69.13415 70.72778 176.2389
23 .1134968 20.2793 70.79871 72.52743 182.5601
24 .1124984 22.12814 72.64194 74.51145 189.0601
25 .1114506 24.01637 74.52602 76.54202 195.6009
26 .1102783 26.00727 76.51658 78.6848 202.2482
27 .1091006 28 78.50903 80.83523 208.8974
predictors for each model:
1 : est_calentsol
2 : est_calentsol est_mipyme_p
3 : est_calentsol est_hab3 est_mipyme_p
4 : est_calentsol est_hab3 est_regulador est_mipyme_p
5 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p
6 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_tv
7 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv
8 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_calef_ele
> est_opwkend
9 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_opwkend
10 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_luzint_sensor est_opwkend
11 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele comercio_men est_luzint_sensor est_opwkend
12 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi comercio_men est_oplun_dom est_luzint_sensor
13 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi comercio_men est_accionH_panel est_oplun_dom est_luzint_sensor
14 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_luzint_se
> nsor
15 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor
16 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel
17 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana
18 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_gasto_conoce
19 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_opwkend est_gasto_conoce
20 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend est_gasto_conoce
21 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend est_lntrab est_ga
> sto_conoce
22 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend est_bomba est_lnt
> rab est_gasto_conoce
23 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend est_bomba est_lnt
> rab est_calefaccion est_gasto_conoce
24 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend est_bomba est_lnt
> rab est_calefaccion est_gasto_conoce est_tarifa_v3
25 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend est_bomba est_lnt
> rab est_calefaccion est_gasto_conoce est_tarifa_v3 est_conoce
26 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend est_bomba est_lnt
> rab est_calefaccion est_gasto_conoce est_tarifa_v3 est_conoce est_luz_sensor
27 : est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend est_bomba est_lnt
> rab est_calefaccion est_gasto_conoce est_tarifa_v3 est_conoce est_luz_sensor est_oplun_vie
As presented in the journal
1: Best predictors
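Hancevic and Sandoval (2023): Table 11 contains the results obtained from each procedure. Because the main goal is to build a model suited to prediction, it is customary to report only the variables selected by each method, without coefficients or p-values. The most important question is which model performs better in out-of-sample prediction. The bottom of the table reports the in-sample (training sample) and out-of-sample (validation sample) predictive performance of the three lasso methods. The authors focus on the out-of-sample mean squared error (MSE) and deviance as the measures of predictive ability for the linear and nonlinear models, respectively. For the linear models, goodness of fit is computed with the postselection coefficient estimates, while for the logit models the penalized coefficient estimates are used.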
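Postselection coefficients are obtained by taking the covariates selected by the lasso and re-estimating their coefficients with an unpenalized estimator (for example, OLS or logistic regression). Penalized coefficients are the shrunken coefficients estimated by the lasso itself. In linear models, postselection coefficients are in most cases theoretically preferable to penalized coefficients for prediction (Belloni and Chernozhukov, 2013; Belloni et al., 2012). In nonlinear models, however, there is no theoretical basis for using them, so penalized coefficients are used for the logit models. The main results remain unchanged if postselection coefficients are used instead.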
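The difference between the two kinds of coefficients is easiest to see with a single predictor, where the lasso has a closed form. The sketch below is a toy illustration under stated assumptions: one centered predictor, a squared-error objective of the form (1/(2N)) * sum((y - b*x)^2) + lam * |b|, and made-up data. The function names are hypothetical and none of this is taken from the paper.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_one_predictor(x, y, lam):
    """Penalized and postselection slopes for one centered predictor.

    Minimizes (1/(2N)) * sum((y - b*x)^2) + lam * |b|.
    """
    n = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y)) / n
    sxx = sum(xi * xi for xi in x) / n
    penalized = soft_threshold(sxy, lam) / sxx
    # Postselection: if the lasso keeps the variable, refit it by plain OLS;
    # if the lasso drops it, the variable is excluded from the refit.
    postselection = sxy / sxx if penalized != 0.0 else 0.0
    return penalized, postselection

# Toy data (made up for illustration), already centered at zero.
x = [-2, -1, 0, 1, 2]
y = [-5, -1, 0, 2, 4]

for lam in (1.0, 5.0):
    pen, post = lasso_one_predictor(x, y, lam)
    print(lam, round(pen, 4), round(post, 4))
# 1.0 1.6 2.1
# 5.0 0.0 0.0
```

With the smaller penalty the variable is kept, and the postselection slope (2.1) undoes the shrinkage in the penalized slope (1.6). With the larger penalty the variable is dropped and both are zero.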
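As the table shows, the three lasso methods produce very similar out-of-sample MSE and deviance values. The errors of the linear models differ only from the fourth decimal place onward, while those of the nonlinear models differ from the second decimal place onward. Regardless of the model used, adaptive lasso has the highest out-of-sample MSE and deviance and therefore performs worst, while the elastic net has the lowest MSE and deviance and performs best.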
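The ranking can be checked against the validation-sample rows of the lassogof output shown earlier in this post. The short Python sketch below only copies those figures and sorts them; the dictionary and function names are hypothetical.

```python
# Validation-sample figures copied from the lassogof output in this post.
validation_mse = {            # linear models, postselection coefficients
    "linear_cv": 0.0805131,
    "linear_adp": 0.0809207,
    "linear_enet": 0.0801774,
}
validation_deviance = {       # logit models, penalized coefficients
    "logit_cv": 0.5908799,
    "logit_adp": 0.5929706,
    "logit_enet": 0.5842422,
}

def rank_models(metric):
    """Return model names ordered from best (smallest error) to worst."""
    return sorted(metric, key=metric.get)

for label, metric in (("MSE", validation_mse), ("deviance", validation_deviance)):
    ranking = rank_models(metric)
    spread = max(metric.values()) - min(metric.values())
    print(label, ranking, round(spread, 7))
```

In both families the elastic net comes first and the adaptive lasso last, and the gap between best and worst is small (about 0.0007 for the MSE and 0.0087 for the deviance).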
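In the linear model, the elastic net selects the most variables, 20 in total. CV lasso selects a subset of these (18 variables), and adaptive lasso selects a subset of that subset (14 variables). These counts appear to include the constant, since the Stata log reports 19, 17, and 13 nonzero coefficients. It is not surprising that the elastic net selects more variables than CV lasso, because some variables are redundant and hence highly correlated. The elastic net combines ridge regression and lasso. When variables are highly correlated with one another, the lasso penalty drops many of the correlated variables, whereas the ridge penalty shrinks the coefficients across the correlated variables (Cameron and Trivedi, 2022). Adaptive lasso selects a subset of the CV lasso variables because it applies a second cross-validated lasso step, which removes additional variables. A similar pattern is observed in the nonlinear model: the elastic net selects the largest number of variables, CV lasso selects a subset of them, and adaptive lasso selects a subset of that subset. In this case, however, the elastic net selects a much larger number of variables, 28 in total.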
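The last column of Table 11 reports the best subset of variables according to the leaps-and-bounds method based on the AIC. This method selects 15 variables, all but two of which overlap with the variables selected by the other three methods under the linear model.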
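The choice of 15 variables can be traced in the vselect table printed in the results above, which lists one best model per subset size. The Python sketch below copies the AIC and BIC columns from that table and picks the size with the smallest value; the helper name is hypothetical, and BIC is included only to show that the answer depends on the criterion.

```python
# AIC and BIC by number of predictors (1..27), copied from the vselect table.
aic = [
    120.1136, 107.4848, 98.90898, 90.53296, 82.6352, 77.75122, 73.8454,
    70.19481, 66.75925, 64.88789, 64.14553, 63.38628, 62.36218, 61.71135,
    61.60781, 62.1538, 62.85459, 63.74725, 64.78951, 65.99494, 67.52695,
    69.13415, 70.79871, 72.64194, 74.52602, 76.51658, 78.50903,
]
bic = [
    129.4271, 121.455, 117.5359, 113.8166, 110.5756, 110.3483, 111.0992,
    112.1053, 113.3265, 116.1119, 120.0262, 123.9237, 127.5563, 131.5622,
    136.1154, 141.3182, 146.6757, 152.2251, 157.924, 163.7862, 169.9749,
    176.2389, 182.5601, 189.0601, 195.6009, 202.2482, 208.8974,
]

def best_size(criterion):
    """Number of predictors at which the criterion is smallest (sizes start at 1)."""
    return criterion.index(min(criterion)) + 1

print(best_size(aic), best_size(bic))  # 15 6
```

The AIC minimum at 15 predictors matches the count reported for the last column of Table 11. BIC, which penalizes model size more heavily, would stop at 6 predictors on the same table.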
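Interestingly, the four procedures tend to select at least one variable from each of the main categories: firm characteristics, building characteristics, heating systems, and the stock of electrical equipment, as well as from the sets of behavioral and attitude-related variables. In other words, no single category of variables dominates the selection, which suggests that the decision to adopt a PV system is a complex one in which multiple aspects of the business must be considered.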
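Although what matters for prediction is the set of variables as a whole, it is worth noting that most of the variables selected by the data-driven methods coincide with those used in the earlier literature on the determinants of PV adoption. This overlap lends further confidence to the results on these determinants. In particular, building ownership status, weekend operation, operating in the trade sector, and the presence of electric heating systems, televisions, voltage regulators/stabilizers, and solar water heaters are all selected by the machine learning methods, and these variables have significant and positive effects on PV adoption.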
2: Potential environmental impact
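Hancevic and Sandoval (2023): The purpose of this section is to quantify the environmental impact of solar panels. This is done by estimating the savings in electricity consumption from the grid and the associated reduction in emissions. The observed savings are then compared with the potential savings that would arise if every firm for which adopting solar panels is profitable did so. In addition, the savings are expressed in monetary terms, using current electricity tariffs as well as different social marginal costs to correct for existing distortions in retail pricing. From a social perspective, the marginal price paid by electricity users can be well below or above the social marginal cost, and several factors cause this distortion. The distribution and commercialization segments of the market exhibit economies of scale. Moreover, time-varying regulated tariffs do not match production costs. Finally, electricity prices do not fully account for the external cost of the air pollution caused by generation. These aspects of electricity pricing push in different directions and can sometimes offset one another. The social marginal cost of electricity in the paper follows an approach similar to Borenstein and Bushnell (2022) and Hancevic and Sandoval (2022), in which the social marginal cost has three main components: the private marginal cost of electricity (the nodal price), the external cost associated with air pollution, and distribution and commercialization costs.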
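Row 1 of Table 12 shows the annual reduction in CO2-equivalent emissions of the sampled firms, in tons and as a percentage of total emissions. Column (I) shows the reduction attributable to observed adopters. Columns (II) to (IV) also include potential adopters, that is, firms that have not adopted solar panels but would profit from investing in them. Specifically, column (II) considers profitable adoption under the current tariff scheme, while columns (III) and (IV) consider the social cost of electricity with carbon taxes of US$50 and US$100 per ton of CO2, respectively. Row 2 of Table 12 shows the associated cost savings in monetary terms. Columns (III) and (IV), which compute social cost savings, take into account the local marginal price (that is, the nodal price), distribution costs, and the external cost of emissions.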
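Among the sampled firms, the solar PV systems currently installed reduce total emissions by roughly 12%. If potential adopters are included, the reduction could reach 79% under the current tariff scheme, and between 59% and 98% when tariffs reflect the social marginal cost of electricity. In monetary terms, the social savings are about 5 million pesos per year (column I). Columns (II) to (IV) again take potential adopters into account. Under the current tariff scheme, private savings would exceed 45 million pesos per year, while social savings would range from 29 million to 55 million pesos per year.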
References
- Belloni, A., & Chernozhukov, V. (2013). Least squares after model selection in high-dimensional sparse models. Bernoulli, 19(2), 521–547.
- Belloni, A., et al. (2012). Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain. Econometrica, 80(6), 2369–2429.
- Borenstein, S., & Bushnell, J. B. (2022). Do two electricity pricing wrongs make a right? Cost recovery, externalities, and efficiency. American Economic Journal: Economic Policy, 14(4), 80–110.
- Cameron, A. C., & Trivedi, P. K. (2022). Microeconometrics Using Stata (Vol. 2). College Station, TX: Stata Press.
- Davidson, C., Drury, E., Lopez, A., Elmore, R., & Margolis, R. (2014). Modeling photovoltaic diffusion: an analysis of geospatial datasets. Environmental Research Letters, 9(7), 074009.
- Furnival, G. M., & Wilson, R. W. (2000). Regressions by Leaps and Bounds. Technometrics, 42(1), 69.
- Gluzmann, P., & Panigo, D. (2015). Global Search Regression: A New Automatic Model-selection Technique for Cross-section, Time-series, and Panel-data Regressions. The Stata Journal: Promoting Communications on Statistics and Stata, 15(2), 325–349.
- Hancevic, P. I., & Sandoval, H. H. (2022). Low-income energy efficiency programs and energy consumption. Journal of Environmental Economics and Management, 113.
- Hancevic, P. I., & Sandoval, H. H. (2023). Solar panel adoption among Mexican small and medium-sized commercial and service businesses.
- Appendix B. Supplementary data (data + Stata code)
- Lindsey, C., & Sheather, S. (2010). Variable Selection in Linear Regression. The Stata Journal: Promoting Communications on Statistics and Stata, 10(4), 650–669.
(End)