
Learning Stata: How to Use Machine Learning to Obtain the Best Predictors

Stata与R学习 • 1 year ago • 260 views

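Source

Machine learning is usually implemented in Python, R, or Matlab, and articles that use Stata are few and far between. This post collects one such example for reference.

Hancevic and Sandoval (2023) follow theory and the literature to construct the variables used to examine the determinants of the probability of photovoltaic (PV) adoption. Many of these variables are redundant and highly correlated. The authors therefore take a data-driven approach, using machine learning methods to obtain the best predictors of PV adoption, to build a model that predicts well, and to improve the understanding of the potential drivers of PV adoption.

The authors use cross-validation lasso, adaptive lasso, and elastic net, each with linear and logit models, for prediction: variables related to the outcome are selected in one dataset, and the same variables are then tested on whether they predict the outcome in another dataset.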

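  • Cross-validation lasso (CV lasso) selects \lambda^* (the lasso penalty parameter) as the value that minimizes the out-of-sample prediction error, also known as the cross-validation (CV) function. The authors use the default grid of 100 values of \lambda, which are equally spaced on a logarithmic scale.
  • Adaptive lasso consists of multiple lassos, with each lasso step using cross-validation to select a \lambda^*. After each lasso, variables with zero coefficients are dropped and the remaining variables are given penalty weights designed to drive small coefficients to zero. The authors use the default of two lassos, because the selected \lambda^* typically does not change after the second lasso.
  • Elastic net extends the lasso by using a more general penalty term, a mix of the absolute-value penalty used by lasso and the squared penalty used by ridge regression, which makes the coefficient estimates more robust to the presence of highly correlated covariates (Zou and Hastie, 2005). To fit the model, the authors specify (0.2, 0.4, 0.6, 0.8) as the set of candidate \alpha values together with the default grid of \lambda values. CV is performed on all combinations of (\alpha, \lambda), and the pair with the smallest CV function is selected.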
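
For reference, the three penalties can be written out for the linear model. The notation below is an assumption made for illustration: it follows a common parametrization in which \alpha = 1 gives the lasso and \alpha = 0 gives ridge regression, the weights w_j are built from the coefficients \tilde\beta_j of the previous lasso step, and \delta > 0 is a power parameter. The exact scaling used by the software may differ.

```latex
\begin{aligned}
\text{lasso:}\quad
  & \min_{\beta_0,\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i'\beta\bigr)^2
    + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert \\[4pt]
\text{adaptive lasso:}\quad
  & \min_{\beta_0,\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i'\beta\bigr)^2
    + \lambda\sum_{j=1}^{p} w_j\lvert\beta_j\rvert,
    \qquad w_j = \frac{1}{\lvert\tilde\beta_j\rvert^{\delta}} \\[4pt]
\text{elastic net:}\quad
  & \min_{\beta_0,\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i'\beta\bigr)^2
    + \lambda\sum_{j=1}^{p}\Bigl[\alpha\lvert\beta_j\rvert + \frac{1-\alpha}{2}\,\beta_j^{2}\Bigr]
\end{aligned}
```

For the logit models the squared-error term is replaced by the negative log-likelihood of the logistic model, with the penalty terms unchanged.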

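Because the authors want to evaluate predictions on a sample that was not used to fit the models, they first randomly split the sample into a training sample (75%) and a validation sample (25%). They then fit the models on the training sample and test their predictions on the validation sample, that is, they check goodness of fit and compare the models' out-of-sample predictive ability.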
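
The split-then-validate workflow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Stata implementation: the toy data, the helper names (`soft_threshold`, `lasso_cd`, `cv_select`), the 10-point grid, and the 5 folds are all hypothetical, the model has no intercept, and the lasso is fitted by plain coordinate descent.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t; exactly zero when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2N)*RSS + lam*sum|b_j| (no intercept)."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    norms = [sum(v * v for v in c) / n for c in cols]
    b, resid = [0.0] * p, list(y)
    for _ in range(n_iter):
        for j in range(p):
            rho = sum(c * r for c, r in zip(cols[j], resid)) / n + norms[j] * b[j]
            new = soft_threshold(rho, lam) / norms[j]
            delta = new - b[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, cols[j])]
                b[j] = new
    return b

def mse(X, y, b):
    """Mean squared prediction error of the linear predictor X b."""
    return sum((yi - sum(x * bj for x, bj in zip(row, b))) ** 2
               for row, yi in zip(X, y)) / len(y)

def cv_select(X, y, grid, k=5, seed=13531):
    """Return the lambda in grid with the smallest k-fold CV mean MSE."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best_lam, best_err = None, float("inf")
    for lam in grid:
        err = 0.0
        for fold in folds:
            held = set(fold)
            tr = [i for i in idx if i not in held]
            b = lasso_cd([X[i] for i in tr], [y[i] for i in tr], lam)
            err += mse([X[i] for i in fold], [y[i] for i in fold], b) / k
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

# Toy data (hypothetical): 5 predictors, only the first two matter.
rng = random.Random(1)
n, p = 200, 5
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * r[0] - 1.0 * r[1] + rng.gauss(0, 1) for r in X]

# 75/25 random split into training and validation samples.
order = list(range(n))
rng.shuffle(order)
cut = int(0.75 * n)
train, valid = order[:cut], order[cut:]
X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
X_va, y_va = [X[i] for i in valid], [y[i] for i in valid]

# Log-spaced grid starting at lambda_max, where every coefficient is zero.
lam_max = max(abs(sum(X_tr[i][j] * y_tr[i] for i in range(cut))) / cut
              for j in range(p))
grid = [lam_max * (1e-3) ** (g / 9) for g in range(10)]

lam_star = cv_select(X_tr, y_tr, grid)    # chosen on the training sample only
b_star = lasso_cd(X_tr, y_tr, lam_star)
print("selected lambda:", lam_star)
print("validation MSE:", mse(X_va, y_va, b_star))
```

The validation sample plays no part in choosing the penalty; it is touched only once, at the end, to measure out-of-sample error.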

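In addition, the authors use the leaps-and-bounds algorithm (Furnival and Wilson, 2000; Lindsey and Sheather, 2010) for variable selection in linear regression; the algorithm identifies the most predictive subset of variables according to an information criterion. However, its main properties do not carry over to out-of-sample model selection problems (Gluzmann and Panigo, 2015). The authors include this procedure in the comparison of results because it has been used in the residential adoption literature (Davidson et al., 2014).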
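
To make the goal of best-subset selection concrete, the sketch below finds, for each model size, the subset with the lowest residual sum of squares and then compares sizes by AIC. It is only an illustration: it enumerates every subset, which is the brute-force search that leaps-and-bounds is designed to avoid, so it is practical only for a handful of predictors. The toy data, the function names, and the AIC form n·ln(RSS/n) + 2·(k+1) are assumptions, and the AIC values may differ by a constant from those reported by vselect.

```python
import math
import random
from itertools import combinations

def ols_rss(X, y, cols):
    """Residual sum of squares from OLS of y on an intercept plus the given columns."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(Z[0])
    # Augmented normal equations [Z'Z | Z'y].
    A = [[sum(z[a] * z[c] for z in Z) for c in range(k)]
         + [sum(z[a] * yi for z, yi in zip(Z, y))]
         for a in range(k)]
    # Gaussian elimination with partial pivoting.
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [ar - f * ac for ar, ac in zip(A[r], A[c])]
    # Back substitution.
    b = [0.0] * k
    for r in reversed(range(k)):
        b[r] = (A[r][k] - sum(A[r][c] * b[c] for c in range(r + 1, k))) / A[r][r]
    return sum((yi - sum(zi * bi for zi, bi in zip(z, b))) ** 2
               for z, yi in zip(Z, y))

def best_subsets(X, y):
    """For each size, return (size, lowest-RSS subset, AIC of that subset)."""
    n, p = len(X), len(X[0])
    out = []
    for size in range(1, p + 1):
        rss, cols = min((ols_rss(X, y, c), c)
                        for c in combinations(range(p), size))
        aic = n * math.log(rss / n) + 2 * (size + 1)
        out.append((size, cols, aic))
    return out

# Toy data (hypothetical): 4 predictors, only columns 0 and 1 matter.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(100)]
y = [1.0 + 3.0 * r[0] + 2.0 * r[1] + rng.gauss(0, 0.5) for r in X]

results = best_subsets(X, y)
best = min(results, key=lambda t: t[2])   # size with the lowest AIC
for size, cols, aic in results:
    print(size, cols, round(aic, 2))
print("AIC-best subset:", best[1])
```

Everything here is computed in-sample, which is why the text cautions against reading an information-criterion ranking as evidence of out-of-sample performance.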

Example code

Install the package:

ssc install vselect

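Key commands:

  • vl: manage variable lists
  • splitsample: split the data into random samples
  • lasso: lasso for prediction and model selection
  • elasticnet: elastic net for prediction and model selection
  • lassocoef: display coefficients after lasso estimation results
  • lassogof: goodness of fit after lasso for prediction
  • vselect: implements the leaps-and-bounds algorithm

Run the code: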

cd C:\Download\1-s2.0-S0140988323004772-mmc1\Replication3
use Base_Est_Final_L_ML, clear
eststo clear

* Variable list
vl create allvars = (est_tarifa_v3 est_lntrab est_mipyme_mi est_mipyme_p est_propio comercio comercio_men ///
    est_opwkend est_op4dom est_oplun_vie est_oplun_sab est_oplun_dom est_techo_ext_v1 est_elevador ///
    est_ampliacion_v1 est_colinda_v1 est_aisl_estructura est_aisl_techo est_aisl_pared est_aisl_ventana ///
    est_aisl_puerta est_aislamiento est_AC est_calef_ele est_calefaccion est_espacio_oficina est_espacio_clientes ///
    est_espacio_deposito mun_ags est_luz_sensor est_luzext_sensor est_luzint_sensor est_computer est_impresora ///
    est_equip_oficina est_servidor est_generador est_regulador est_ventilador est_refri_com est_refri est_cocina ///
    est_refri_dom est_micro est_cafe est_tv est_bomba est_gasdiesel est_cambio_g1ene est_conoce est_kwh_conoce ///
    est_gasto_conoce est_AC_meses_6 est_AC_tempf est_calentsol est_hab3 est_accionH_panel est_accionH_apagar)

* Split the sample (75% training, 25% validation)
splitsample, generate(sample1) split(.75 .25) rseed(13531)
la def slabel 1 "Training" 2 "Validation"
la values sample1 slabel
ta sample1, m

* Cross-validation lasso
lasso logit est_pv $allvars if sample1==1, nolog rseed(13531)
est store logit_cv
lasso linear est_pv $allvars if sample1==1, nolog rseed(13531)
est store linear_cv

* Adaptive lasso
lasso logit est_pv $allvars if sample1==1, nolog select(adaptive) rseed(13531)
est store logit_adp
lasso linear est_pv $allvars if sample1==1, nolog select(adaptive) rseed(13531)
est store linear_adp

* Elastic net
elasticnet logit est_pv $allvars if sample1==1, nolog alpha(.2 .4 .6 .8) rseed(13531)
est store logit_enet
elasticnet linear est_pv $allvars if sample1==1, nolog alpha(.2 .4 .6 .8) rseed(13531)
est store linear_enet

* Selected variables
lassocoef logit_cv linear_cv logit_adp linear_adp logit_enet linear_enet, ///
    sort(coef, standardized)

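* Goodness of fit
* Logit: postselection coefficients
lassogof logit_cv logit_adp logit_enet, over(sample1) postselection
* Logit: penalized coefficients
lassogof logit_cv logit_adp logit_enet, over(sample1)
* Linear: postselection coefficients
lassogof linear_cv linear_adp linear_enet, over(sample1) postselection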

* Leaps-and-bounds algorithm (union of the selected variables); produces the last column of Table 11:
vl create selectedvars = (est_tarifa_v3 est_lntrab est_mipyme_mi est_mipyme_p est_propio comercio_men ///
    est_opwkend est_oplun_vie est_oplun_dom est_aisl_techo est_aisl_ventana est_calef_ele est_calefaccion ///
    est_luz_sensor est_luzint_sensor est_generador est_regulador est_ventilador est_tv est_bomba ///
    est_gasdiesel est_conoce est_gasto_conoce est_AC_meses_6 est_calentsol est_hab3 est_accionH_panel)
vselect est_pv $selectedvars, best

Output:

    sample1 |      Freq.     Percent        Cum.
------------+-----------------------------------
   Training |        588       75.00       75.00
 Validation |        196       25.00      100.00
------------+-----------------------------------
      Total |        784      100.00

. 
. 
. * Cross-validation
. * ----------------
. 
. *logit
. lasso logit est_pv $allvars if sample1==1, nolog rseed(13531)

Lasso logit model                           No. of obs        =        582
                                            No. of covariates =         58
Selection: Cross-validation                 No. of CV folds   =         10

--------------------------------------------------------------------------
         |                                No. of      Out-of-
         |                               nonzero       sample      CV mean
      ID |     Description      lambda     coef.   dev. ratio     deviance
---------+----------------------------------------------------------------
       1 |    first lambda    .0376531         0      -0.0118     .4974787
      11 |   lambda before    .0148511        15       0.0124     .4855901
    * 12 | selected lambda    .0135318        16       0.0139     .4848208
      13 |    lambda after    .0123297        16       0.0136     .4849987
      16 |     last lambda     .009327        22       0.0070     .4882351
--------------------------------------------------------------------------
* lambda selected by cross-validation.

. est store logit_cv

. 
. *linear
. lasso linear est_pv $allvars if sample1==1, nolog rseed(13531)

Lasso linear model                          No. of obs        =        582
                                            No. of covariates =         58
Selection: Cross-validation                 No. of CV folds   =         10

--------------------------------------------------------------------------
         |                                No. of      Out-of-      CV mean
         |                               nonzero       sample   prediction
      ID |     Description      lambda     coef.    R-squared        error
---------+----------------------------------------------------------------
       1 |    first lambda    .0376531         0      -0.0059      .062887
      13 |   lambda before    .0123297        16       0.0065     .0621143
    * 14 | selected lambda    .0112343        17       0.0065      .062111
      15 |    lambda after    .0102363        22       0.0061     .0621378
      20 |     last lambda    .0064287        28       0.0030     .0623309
--------------------------------------------------------------------------
* lambda selected by cross-validation.

. est store linear_cv

. 
. 
. * Adaptive
. * --------
. 
. *logit
. lasso logit est_pv $allvars if sample1==1, nolog select(adaptive) rseed(13531)

Lasso logit model                          No. of obs         =        582
                                           No. of covariates  =         58
Selection: Adaptive                        No. of lasso steps =          2

Final adaptive step results
--------------------------------------------------------------------------
         |                                No. of      Out-of-
         |                               nonzero       sample      CV mean
      ID |     Description      lambda     coef.   dev. ratio     deviance
---------+----------------------------------------------------------------
      17 |    first lambda    .6635375         0      -0.0056      .494403
      45 |   lambda before    .0490402        11       0.0789     .4528947
    * 46 | selected lambda    .0446836        11       0.0790     .4528263
      47 |    lambda after     .040714        11       0.0785     .4530514
     116 |     last lambda    .0000664        16       0.0490     .4675954
--------------------------------------------------------------------------
* lambda selected by cross-validation in final adaptive step.

. est store logit_adp

. 
. *linear
. lasso linear est_pv $allvars if sample1==1, nolog select(adaptive) rseed(13531)

Lasso linear model                         No. of obs         =        582
                                           No. of covariates  =         58
Selection: Adaptive                        No. of lasso steps =          2

Final adaptive step results
--------------------------------------------------------------------------
         |                                No. of      Out-of-      CV mean
         |                               nonzero       sample   prediction
      ID |     Description      lambda     coef.    R-squared        error
---------+----------------------------------------------------------------
      21 |    first lambda    .2273381         0      -0.0040     .0627684
      56 |   lambda before    .0087605        13       0.0419     .0598975
    * 57 | selected lambda    .0079823        13       0.0421     .0598853
      58 |    lambda after    .0072732        13       0.0420     .0598922
     108 |     last lambda    .0000694        17       0.0350     .0603292
--------------------------------------------------------------------------
* lambda selected by cross-validation in final adaptive step.

. est store linear_adp

. 
. 
. * Elastic net
. * -----------
. 
. *logit
. elasticnet logit est_pv $allvars if sample1==1, nolog alpha(.2 .4 .6 .8) rseed(13531)

Elastic net logit model                          No. of obs        =        582
                                                 No. of covariates =         58
Selection: Cross-validation                      No. of CV folds   =         10

-------------------------------------------------------------------------------
               |                               No. of      Out-of-
               |                              nonzero       sample      CV mean
alpha       ID |     Description      lambda    coef.   dev. ratio     deviance
---------------+---------------------------------------------------------------
0.800          |
             1 |    first lambda    .1882653        0      -0.0051     .4942004
            18 |     last lambda    .0470663        0      -0.0110     .4970868
---------------+---------------------------------------------------------------
0.600          |
            19 |    first lambda    .1882653        0      -0.0051     .4942004
            32 |     last lambda    .0627551        0      -0.0100     .4966058
---------------+---------------------------------------------------------------
0.400          |
            33 |    first lambda    .1882653        0      -0.0051     .4942004
            42 |     last lambda    .0857702        2      -0.0113     .4972324
---------------+---------------------------------------------------------------
0.200          |
            43 |    first lambda    .1882653        0      -0.0072     .4952196
            59 |   lambda before    .0474719       26       0.0312     .4763224
          * 60 | selected lambda    .0470663       27       0.0313     .4763048
            61 |    lambda after    .0428851       29       0.0312     .4763404
            64 |     last lambda     .032441       33       0.0259     .4789148
-------------------------------------------------------------------------------
* alpha and lambda selected by cross-validation.

. est store logit_enet

. 
. *linear
. elasticnet linear est_pv $allvars if sample1==1, nolog alpha(.2 .4 .6 .8) rseed(13531)

Elastic net linear model                         No. of obs        =        582
                                                 No. of covariates =         58
Selection: Cross-validation                      No. of CV folds   =         10

-------------------------------------------------------------------------------
               |                               No. of      Out-of-      CV mean
               |                              nonzero       sample   prediction
alpha       ID |     Description      lambda    coef.    R-squared        error
---------------+---------------------------------------------------------------
0.800          |
             1 |    first lambda    .1882653        0      -0.0025     .0626783
            18 |     last lambda    .0470663        0      -0.0058     .0628851
---------------+---------------------------------------------------------------
0.600          |
            19 |    first lambda    .1882653        0      -0.0025     .0626783
            33 |     last lambda    .0571801        2      -0.0080     .0630211
---------------+---------------------------------------------------------------
0.400          |
            34 |    first lambda    .1882653        0      -0.0025     .0626783
            43 |     last lambda    .0857702        2      -0.0079     .0630115
---------------+---------------------------------------------------------------
0.200          |
            44 |    first lambda    .1882653        0      -0.0054     .0628602
            57 |   lambda before    .0627551       16       0.0077      .062041
          * 58 | selected lambda    .0571801       19       0.0079     .0620259
            59 |    lambda after    .0521004       21       0.0076     .0620477
            65 |     last lambda     .032441       28       0.0048     .0622229
-------------------------------------------------------------------------------
* alpha and lambda selected by cross-validation.

. est store linear_enet

. 
. 
. 
. * Results (selected variables & goodness of fit)
. * ----------------------------------------------
. * Note: Table 11 contains the following results:
. 
. 
. * Selected variables
. * ------------------
. 
. lassocoef logit_cv linear_cv logit_adp linear_adp logit_enet linear_enet, ///
>     sort(coef, standardized)

----------------------------------------------------------------------------------------
                  | logit_cv  linear_cv  logit_adp  linear_adp  logit_enet  linear_enet 
------------------+---------------------------------------------------------------------
            _cons |     x         x          x           x           x           x      
     est_mipyme_p |     x         x          x           x           x           x      
       est_propio |     x         x          x           x           x           x      
    est_regulador |     x         x          x           x           x           x      
           est_tv |     x         x          x           x           x           x      
    est_calentsol |     x         x          x           x           x           x      
est_luzint_sensor |     x         x          x           x           x           x      
     comercio_men |     x         x          x           x           x           x      
   est_ventilador |     x         x          x           x           x           x      
    est_calef_ele |     x         x          x           x           x           x      
        est_bomba |     x         x          x           x           x           x      
   est_luz_sensor |     x         x                                  x           x      
      est_opwkend |     x         x          x           x           x           x      
   est_aisl_techo |     x         x                      x           x           x      
         est_hab3 |     x         x                      x           x           x      
    est_gasdiesel |     x         x                                  x           x      
 est_gasto_conoce |     x         x                                  x           x      
est_accionH_panel |               x                                  x           x      
    est_oplun_vie |                                                  x           x      
  est_calefaccion |                                                  x     
   est_AC_meses_6 |                                                  x     
    est_oplun_dom |                                                  x           x      
    est_mipyme_mi |                                                  x     
       est_lntrab |                                                  x     
       est_conoce |                                                  x     
    est_tarifa_v3 |                                                  x     
 est_aisl_ventana |                                                  x     
    est_generador |                                                  x     
----------------------------------------------------------------------------------------
Legend:
  b - base level
  e - empty cell
  o - omitted
  x - estimated

. 
. * Goodness of fit
. * ---------------
. 
. *Logit: postselection coefficients (Not reported in the paper. See footnote 23)
. lassogof logit_cv logit_adp logit_enet, over(sample1) postselection

Postselection coefficients
-------------------------------------------------------------
                        |                 Deviance
Name            sample1 |    Deviance        ratio        Obs
------------------------+------------------------------------
logit_cv                |
               Training |    .3871364       0.2126        582
             Validation |    .5684995       0.1374        196
------------------------+------------------------------------
logit_adp               |
               Training |    .3931777       0.1944        588
             Validation |    .6020389       0.0866        196
------------------------+------------------------------------
logit_enet              |
               Training |    .3740188       0.2393        582
             Validation |    .5549257       0.1580        196
-------------------------------------------------------------

. 
. *Logit: penalized coefficients
. lassogof logit_cv logit_adp logit_enet, over(sample1) 

Penalized coefficients
-------------------------------------------------------------
                        |                 Deviance
Name            sample1 |    Deviance        ratio        Obs
------------------------+------------------------------------
logit_cv                |
               Training |    .4198183       0.1461        582
             Validation |    .5908799       0.1035        196
------------------------+------------------------------------
logit_adp               |
               Training |    .4010848       0.1782        588
             Validation |    .5929706       0.1003        196
------------------------+------------------------------------
logit_enet              |
               Training |    .4171508       0.1516        582
             Validation |    .5842422       0.1136        196
-------------------------------------------------------------

. 
. *Linear: postselection coefficients
. lassogof linear_cv linear_adp linear_enet, over(sample1) postselection

Postselection coefficients
-------------------------------------------------------------
Name            sample1 |         MSE    R-squared        Obs
------------------------+------------------------------------
linear_cv               |
               Training |    .0551813       0.1174        582
             Validation |    .0805131       0.1213        196
------------------------+------------------------------------
linear_adp              |
               Training |    .0554396       0.1132        582
             Validation |    .0809207       0.1169        196
------------------------+------------------------------------
linear_enet             |
               Training |    .0551286       0.1182        582
             Validation |    .0801774       0.1250        196
-------------------------------------------------------------

Response :             est_pv
Selected predictors:   est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_tec
> ho est_tv est_calef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_
> dom est_AC_meses_6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend 
> est_bomba est_lntrab est_calefaccion est_gasto_conoce est_tarifa_v3 est_conoce est_luz_sensor 
> est_oplun_vie

Optimal models: 

   # Preds     R2ADJ         C       AIC      AICC       BIC
         1  .0288639  71.88853  120.1136  120.1446  129.4271
         2  .0457239  58.13183  107.4848  107.5366   121.455
         3   .057392  48.92366  98.90898   98.9867  117.5359
         4  .0686766  40.07437  90.53296  90.64191  113.8166
         5  .0792588  31.85912   82.6352  82.78066  110.5756
         6  .0861862  26.83056  77.75122  77.93847  110.3483
         7  .0919189  22.85007   73.8454  74.07978  111.0992
         8   .097318  19.17037  70.19481  70.48164  112.1053
         9  .1024355  15.74561  66.75925  67.10389  113.3265
        10  .1057263  13.90499  64.88789  65.29574  116.1119
        11  .1077097  13.19593  64.14553  64.62197  120.0262
        12  .1097064  12.47983  63.38628  63.93674  123.9237
        13  .1119995  11.51404  62.36218   62.9921  127.5563
        14  .1138601   10.9238  61.71135   62.4262  131.5622
        15   .115093  10.87461  61.60781  62.41307  136.1154
        16  .1155846  11.46141   62.1538  63.05499  141.3182
        17  .1158985  12.20088  62.85459  63.85722  146.6757
        18  .1159928  13.12819  63.74725   64.8569  152.2251
        19  .1159156  14.20163  64.78951  66.01173   157.924
        20  .1156514   15.4338  65.99494  67.33533  163.7862
        21  .1150141  16.98192  67.52695  68.99114  169.9749
        22  .1142892  18.60286  69.13415  70.72778  176.2389
        23  .1134968   20.2793  70.79871  72.52743  182.5601
        24  .1124984  22.12814  72.64194  74.51145  189.0601
        25  .1114506  24.01637  74.52602  76.54202  195.6009
        26  .1102783  26.00727  76.51658   78.6848  202.2482
        27  .1091006        28  78.50903  80.83523  208.8974

predictors for each model:

1  :  est_calentsol
2  :  est_calentsol est_mipyme_p
3  :  est_calentsol est_hab3 est_mipyme_p
4  :  est_calentsol est_hab3 est_regulador est_mipyme_p
5  :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p
6  :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_tv
7  :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv
8  :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_calef_ele 
> est_opwkend
9  :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_opwkend
10 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_luzint_sensor est_opwkend
11 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele comercio_men est_luzint_sensor est_opwkend
12 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi comercio_men est_oplun_dom est_luzint_sensor
13 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi comercio_men est_accionH_panel est_oplun_dom est_luzint_sensor
14 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_luzint_se
> nsor
15 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor
16 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel
17 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana
18 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_gasto_conoce
19 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_opwkend est_gasto_conoce
20 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend est_gasto_conoce
21 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend est_lntrab est_ga
> sto_conoce
22 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend est_bomba est_lnt
> rab est_gasto_conoce
23 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend est_bomba est_lnt
> rab est_calefaccion est_gasto_conoce
24 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend est_bomba est_lnt
> rab est_calefaccion est_gasto_conoce est_tarifa_v3
25 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend est_bomba est_lnt
> rab est_calefaccion est_gasto_conoce est_tarifa_v3 est_conoce
26 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend est_bomba est_lnt
> rab est_calefaccion est_gasto_conoce est_tarifa_v3 est_conoce est_luz_sensor
27 :  est_calentsol est_hab3 est_propio est_regulador est_mipyme_p est_aisl_techo est_tv est_cal
> ef_ele est_mipyme_mi est_ventilador comercio_men est_accionH_panel est_oplun_dom est_AC_meses_
> 6 est_luzint_sensor est_gasdiesel est_aisl_ventana est_generador est_opwkend est_bomba est_lnt
> rab est_calefaccion est_gasto_conoce est_tarifa_v3 est_conoce est_luz_sensor est_oplun_vie

Results as presented in the journal

1: Best predictors

Hancevic, P. I., & Sandoval, H. H. (2023).

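Table 11 contains the results obtained from each procedure. Because the main goal is to build a model suited to prediction, only the variables selected by each method are reported, without coefficients or p-values, as is customary. The most important question is which model performs better in out-of-sample prediction. The bottom of the table reports the in-sample (training sample) and out-of-sample (validation sample) predictive performance of the three lasso methods. The focus is on the out-of-sample mean squared error (MSE) for the linear models and the out-of-sample deviance for the nonlinear models as measures of predictive ability. For the linear models, goodness of fit is computed with the postselection coefficient estimates, whereas for the logit models the penalized coefficient estimates are used.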
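
The two measures of fit can be sketched as follows. The definitions are assumptions made for illustration: the deviance is taken as minus twice the mean Bernoulli log-likelihood, and the deviance ratio as one minus the ratio of the model deviance to the deviance of a null model that predicts the sample mean. The function names and toy numbers are hypothetical.

```python
import math

def mse(y, yhat):
    """Mean squared error of predictions yhat for outcomes y."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def mean_deviance(y, p):
    """Minus twice the mean Bernoulli log-likelihood of probabilities p."""
    loglik = sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
                 for yi, pi in zip(y, p))
    return -2.0 * loglik / len(y)

def deviance_ratio(y, p):
    """1 - D/D_null, where the null model predicts the sample mean of y."""
    ybar = sum(y) / len(y)
    return 1.0 - mean_deviance(y, p) / mean_deviance(y, [ybar] * len(y))

# Toy validation sample (hypothetical): one adopter out of four firms.
y_valid = [1, 0, 0, 0]
p_model = [0.7, 0.1, 0.1, 0.1]   # predicted adoption probabilities

print("MSE:", mse(y_valid, p_model))
print("deviance:", mean_deviance(y_valid, p_model))
print("deviance ratio:", deviance_ratio(y_valid, p_model))
```

A lower MSE or deviance on the validation sample indicates better out-of-sample prediction, and a deviance ratio near zero means the model does little better than predicting the overall adoption rate.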

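Postselection coefficients are obtained by taking the covariates selected by lasso and re-estimating their coefficients with an unpenalized estimator (for example, OLS or logistic regression). Penalized coefficients are the shrunken coefficients estimated by the lasso itself. In linear models, postselection coefficients are in most cases theoretically preferable to penalized coefficients for prediction (Belloni and Chernozhukov, 2013; Belloni et al., 2012). There is, however, no theoretical basis for using them in nonlinear models, so penalized coefficients are used for the logit models. The main results remain unchanged if postselection coefficients are used instead.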
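
The difference between the two kinds of coefficients is easiest to see with a single predictor, where the lasso solution has a closed form. The sketch below assumes a linear model with no intercept and the objective (1/2N)·RSS + λ|b|; the function names and toy data are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; exactly zero when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_one_predictor(x, y, lam):
    """Penalized and postselection slopes for one predictor, no intercept.

    Minimizes (1/2N) * sum((y_i - x_i * b)^2) + lam * |b|.
    """
    n = len(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y)) / n
    sxx = sum(xi * xi for xi in x) / n
    b_pen = soft_threshold(sxy, lam) / sxx           # shrunken (penalized) slope
    b_post = sxy / sxx if b_pen != 0.0 else 0.0      # OLS refit if selected
    return b_pen, b_post

x = [-1.0, 1.0, -1.0, 1.0]
y = [-2.0, 2.0, -1.0, 1.0]
print(lasso_one_predictor(x, y, 0.5))  # (1.0, 1.5)
print(lasso_one_predictor(x, y, 2.0))  # (0.0, 0.0)
```

With a moderate penalty the variable is kept, but its penalized slope (1.0) is pulled toward zero relative to the unpenalized refit (1.5); with a large penalty the variable is dropped and both coefficients are zero.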

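As the table shows, the three lasso methods produce very similar out-of-sample MSE and deviance values. The errors of the linear models differ only from the fourth decimal place onward, and those of the nonlinear models only from the second decimal place onward. Whichever model is used, adaptive lasso has the highest out-of-sample MSE and deviance and therefore performs worst, while elastic net has the lowest MSE and deviance and performs best.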

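In the linear models, elastic net selects the most variables, 20 in total. CV lasso selects a subset of these (18 variables), and adaptive lasso selects a subset of that subset (14 variables). These counts match the lassocoef output above, which includes the constant term. It is not surprising that elastic net selects more variables than CV lasso, because some variables are redundant and hence highly correlated. Elastic net combines ridge regression and lasso. When variables are highly correlated with one another, the lasso penalty tends to drop many of the correlated variables, whereas the ridge penalty shrinks the coefficients of correlated variables toward each other (Cameron and Trivedi, 2022). Adaptive lasso selects a subset of the CV lasso variables because it applies a second cross-validated lasso step, which removes additional variables. A similar pattern is observed in the nonlinear models: elastic net selects the largest number of variables, CV lasso selects a subset of them, and adaptive lasso selects a subset of that subset. In this case, however, elastic net selects many more variables, 28 in total.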

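The last column of Table 11 reports the best subset of variables according to the leaps-and-bounds method under the AIC criterion. This method selects 15 variables, all but two of which overlap with the variables selected by the other three methods in the linear models.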

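Interestingly, the four procedures tend to select at least one variable from each of the main categories (firm characteristics, building characteristics, heating systems, and the stock of electrical equipment) as well as from the sets of behavioral and attitude-related variables. In other words, no single category of variables dominates the selection, which suggests that the decision to adopt a PV system is a complex one in which multiple aspects of the business must be considered.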

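Although what matters for prediction is the set of variables as a whole, it is worth noting that most of the variables selected by the data-driven approach coincide with those identified in the earlier literature on the determinants of PV adoption. This overlap lends further confidence to the results on these determinants. In particular, building ownership status, operating on weekends, operating in the commerce sector, and the presence of electric heating systems, televisions, voltage regulators/stabilizers, and solar water heaters are all variables selected by the machine learning methods that have a significant positive effect on PV adoption.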

2: Potential environmental impact

Hancevic, P. I., & Sandoval, H. H. (2023).

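The purpose of this section is to quantify the environmental impact of solar panels. This is done by estimating the savings in electricity consumed from the grid and the associated reduction in emissions. The observed savings are then compared with the potential savings that would arise if every firm for which adopting solar panels is profitable did so. In addition, the savings are expressed in monetary terms, using both current electricity tariffs and alternative social marginal costs that correct for existing distortions in retail pricing. From a social perspective, the marginal price paid by electricity users can be well below or well above the social marginal cost, and several factors contribute to this distortion. The distribution and commercialization segments of the market exhibit economies of scale. Time-varying regulated tariffs do not match production costs. Moreover, electricity prices do not fully account for the externality costs of air pollution from generation. These features of electricity pricing push in different directions and can sometimes even offset one another. The social marginal cost of electricity in the paper follows an approach similar to Borenstein and Bushnell (2022) and Hancevic and Sandoval (2022), in which the social marginal cost has three main components: the private marginal cost of electricity (the nodal price), the externality cost associated with air pollution, and distribution and commercialization costs.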

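Row 1 of Table 12 shows the annual reduction in CO2-equivalent emissions of the sampled firms, in tons and as a percentage of total emissions. Column (I) shows the reduction attributable to observed adopters. Columns (II) to (IV) also include potential adopters, that is, firms that have not adopted solar panels but would profit from investing in them. Specifically, column (II) considers profitable adoption under the current tariff scheme, while columns (III) and (IV) consider the social cost of electricity with a carbon tax of 50 and 100 US dollars per ton of CO2, respectively. Row 2 of Table 12 shows the associated cost savings in monetary terms. Columns (III) and (IV), which compute social cost savings, take into account the locational marginal price (i.e., the nodal price), distribution costs, and the externality cost of emissions.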

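Among the sampled firms, the solar PV systems currently in place reduce total emissions by about 12%. If potential adopters are included, the reduction reaches 79% under the current tariff scheme, and between 59% and 98% when tariffs reflect the social marginal cost of electricity. Social savings in monetary terms are about 5 million pesos per year (column I). Columns (II) to (IV) again take potential adopters into account: under the current tariff scheme, private savings exceed 45 million pesos per year, while social savings range from 29 to 55 million pesos per year.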

References

  1. Belloni, A., & Chernozhukov, V. (2013). Least squares after model selection in high-dimensional sparse models. Bernoulli, 19(2), 521–547.
  2. Belloni, A., et al. (2012). Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain. Econometrica, 80(6), 2369–2429.
  3. Borenstein, S., & Bushnell, J. B. (2022). Do two electricity pricing wrongs make a right? Cost recovery, externalities, and efficiency. American Economic Journal: Economic Policy, 14(4), 80–110.
  4. Cameron, A. C., & Trivedi, P. K. (2022). Microeconometrics Using Stata (Vol. 2). College Station, TX: Stata Press.
  5. Davidson, C., Drury, E., Lopez, A., Elmore, R., & Margolis, R. (2014). Modeling photovoltaic diffusion: an analysis of geospatial datasets. Environmental Research Letters, 9(7), 074009.
  6. Furnival, G. M., & Wilson, R. W. (2000). Regressions by Leaps and Bounds. Technometrics, 42(1), 69.
  7. Gluzmann, P., & Panigo, D. (2015). Global Search Regression: A New Automatic Model-selection Technique for Cross-section, Time-series, and Panel-data Regressions. The Stata Journal: Promoting Communications on Statistics and Stata, 15(2), 325–349.
  8. Hancevic, P. I., & Sandoval, H. H. (2022). Low-income energy efficiency programs and energy consumption. Journal of Environmental Economics and Management, 113.
  9. Hancevic, P. I., & Sandoval, H. H. (2023). Solar panel adoption among Mexican small and medium-sized commercial and service businesses.
    1. Appendix B. Supplementary data [data + Stata code]
  10. Lindsey, C., & Sheather, S. (2010). Variable Selection in Linear Regression. The Stata Journal: Promoting Communications on Statistics and Stata, 10(4), 650–669.

(End)

Article URL: http://www.python88.com/topic/160703