Stata: Panel-Data Mixed Logit Choice Models with cmxtmixlogit

Author: 丁雅文 (Peking University)
Email: 1901111380@pku.edu.cn

Editor's note: Parts of this article draw on the following material, with thanks.
Source: Joerg Luedicke. 2019. Performing and interpreting discrete choice analyses in Stata. -PDF-

1. Introduction
2. Command overview
3. Worked examples
4. References
5. Related posts

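1. Introduction

Discrete choice models (DCM) are a powerful tool for studying individual choice behavior. Widely used Stata commands in this area include logit, probit, mlogit, nlogit, and ologit; for details, see the lianxh (连享会) topic collection "Probit-Logit".

The conditional logit model relies on the independence of irrelevant alternatives (IIA) assumption, which is often at odds with reality, and it cannot readily accommodate heterogeneity in individual preferences. The mixed logit model extends the standard conditional logit model by allowing one or more coefficients to be random, that is, to follow a distribution across individuals. Stata 16 introduced a new suite of cm commands for discrete choice models. The suite includes mixed logit estimators and works with margins, which makes flexible marginal-effects analysis possible. This post is a hands-on introduction to these cm commands.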
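For reference, the panel mixed logit model described above can be written as follows. This is the standard textbook formulation rather than something taken from the original post, and all symbols are introduced here for illustration: $i$ indexes individuals, $j$ alternatives, and $t$ time periods; $x_{ijt}$ holds alternative-specific variables with random coefficients $\beta_i$, $w_{ijt}$ holds alternative-specific variables with fixed coefficients $\alpha$, $z_{it}$ holds case-specific variables with alternative-specific coefficients $\delta_j$, and $y_{ijt}$ equals 1 for the chosen alternative.

```latex
% Utility of alternative j for individual i at time t
U_{ijt} = x_{ijt}\beta_i + w_{ijt}\alpha + z_{it}\delta_j + \varepsilon_{ijt},
\qquad \beta_i \sim f(\beta \mid \theta),
\qquad \varepsilon_{ijt} \ \text{i.i.d. type I extreme value}

% Choice probability conditional on a draw of beta
P_{ijt}(\beta) =
  \frac{\exp\!\left(x_{ijt}\beta + w_{ijt}\alpha + z_{it}\delta_j\right)}
       {\sum_{a}\exp\!\left(x_{iat}\beta + w_{iat}\alpha + z_{it}\delta_a\right)}

% Likelihood contribution of individual i: the same beta applies in every period
L_i = \int \prod_{t}\prod_{j} P_{ijt}(\beta)^{\,y_{ijt}} \, f(\beta \mid \theta)\, d\beta
```

The integral has no closed form and is approximated by simulation, which is why the estimation output further down reports a log simulated-likelihood. In the transport example used later, trtime plays the role of $x$, trcost of $w$, and age and income of $z$.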

2. Command overview

The cm commands introduced in Stata 16 include cmclogit, cmmprobit, cmroprobit, cmrologit, cmmixlogit, and cmxtmixlogit, so a wide range of choice models can be fit and analyzed in a consistent way.

Before any estimation, the data must be declared as choice model data with cmset:

cmset caseidvar altvar [, force] declares cross-sectional choice model data;
cmset panelvar timevar altvar [, tsoptions force] declares panel choice model data.

. use http://www.stata-press.com/data/r16/transport.dta, clear
(Transportation choice data)

. cmset id t alt
note: case identifier _caseid generated from id and t.
note: panel by alternatives identifier _panelaltid generated from id and alt.

                    Panel data: Panels id and time t
              Case ID variable: _caseid
         Alternatives variable: alt
Panel by alternatives variable: _panelaltid (strongly balanced)
                 Time variable: t, 1 to 3
                         Delta: 1 unit
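A quick look at the raw rows helps make the cmset declaration concrete. The sketch below assumes the same transport.dta data and the variable names that appear later in this post; n_chosen is a hypothetical helper variable introduced only for this check.

```stata
* Load the example data and declare the panel choice structure
webuse transport.dta, clear
cmset id t alt

* One row per individual (id), period (t), and alternative (alt);
* choice equals 1 for the chosen alternative and 0 otherwise
list id t alt choice trcost trtime age income if id == 1, sepby(t)

* Each case (id-t pair) should contain exactly one chosen alternative
bysort id t: egen n_chosen = total(choice)
tabulate n_chosen
```

Case-specific variables such as age and income should repeat within each id-t block, while trcost and trtime vary across the alternatives.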

Next, commands such as cmchoiceset, cmtab, and cmsample can be used to describe the data.

. **tabulate choice sets
. cmchoiceset

Tabulation of choice-set possibilities
 Choice set |      Freq.     Percent        Cum.
------------+-----------------------------------
    1 2 3 4 |      1,500      100.00      100.00
------------+-----------------------------------
      Total |      1,500      100.00
Note: Total is number of cases.
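Two further descriptive commands can be tried in the same way. This is a minimal sketch that assumes the data are already loaded and cmset as above; no output is shown because it was not part of the original post.

```stata
* Tabulate how often each alternative is chosen
cmtab, choice(choice)

* Summarize the alternative-specific variables by chosen alternative
cmsummarize trcost trtime, choice(choice)
```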

cmsample reports the reasons why observations would be excluded from the estimation sample:

.   preserve 
.   replace trcost=. in 5                 
.   replace alt=. in 2                    
.   replace choice=0 if t==3 & id==1     
.   replace income=1 in 1                 
.   cmsample trcost trtime, choice(choice) casevars(age income)

              Reason for exclusion |      Freq.     Percent        Cum.
-----------------------------------+-----------------------------------
             observations included |      5,988       99.80       99.80
     alternatives variable missing |          4        0.07       99.87
             choice variable all 0 |          4        0.07       99.93
casevars not constant within case* |          4        0.07      100.00
-----------------------------------+-----------------------------------
                             Total |      6,000      100.00
* indicates an error

.   restore

After these checks, the following commands can be used to fit the various discrete choice models:

cmclogit: conditional logit model (McFadden's choice model)
cmmixlogit: mixed logit model
cmxtmixlogit: panel-data mixed logit model
cmmprobit: multinomial probit model
cmroprobit: rank-ordered probit model
cmrologit: rank-ordered logit model
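To see how two of these commands relate, the sketch below fits a conditional logit and a cross-sectional mixed logit on a single period of the transport data. Restricting the sample to t == 1 is an assumption made here so that the data can be declared as cross-sectional; the stored names m_clogit and m_mixed are hypothetical.

```stata
webuse transport.dta, clear

* Keep one period so that each individual contributes a single case
keep if t == 1
cmset id alt

* Conditional logit: every coefficient is fixed
cmclogit choice trcost trtime, casevars(age income)
estimates store m_clogit

* Mixed logit: the coefficient on trtime is random (normal by default)
cmmixlogit choice trcost, casevars(age income) random(trtime)
estimates store m_mixed

* Coefficients and standard errors side by side
estimates table m_clogit m_mixed, b(%9.3f) se(%9.3f)
```

With only one case per individual, the standard deviation of the random coefficient may be weakly identified, which is one motivation for the panel estimator used in the next section.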

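3. Worked examples

This section uses cmxtmixlogit as the example. cmxtmixlogit, new in Stata 16, fits mixed logit models to panel data. We illustrate it with the transport.dta data. First, we fit a model of how travel cost (trcost) and travel time (trtime) affect the choice of travel mode, with a fixed coefficient on trcost, a normally distributed random coefficient on trtime, and age and income as case-specific variables: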

. webuse transport.dta, clear
. cmset id t alt
. cmxtmixlogit choice trcost, casevars(age income) random(trtime) nolog

Mixed logit choice model                     Number of obs        =      6,000
                                             Number of cases      =      1,500
Panel variable: id                           Number of panels     =        500
Time variable: t                             Cases per panel: min =          3
                                                              avg =        3.0
                                                              max =          3
Alternatives variable: alt                   Alts per case:   min =          4
                                                              avg =        4.0
                                                              max =          4
Integration sequence:      Hammersley
Integration points:               594             Wald chi2(8)    =     432.68
Log simulated-likelihood = -1005.9899             Prob > chi2     =     0.0000
------------------------------------------------------------------------------
      choice | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
alt          |
      trcost |     -0.839      0.044   -19.13   0.000       -0.925      -0.753
      trtime |     -1.509      0.264    -5.71   0.000       -2.026      -0.991
-------------+----------------------------------------------------------------
/Normal      |
   sd(trtime)|      1.946      0.259                         1.498       2.527
-------------+----------------------------------------------------------------
Car          |  (base alternative)
-------------+----------------------------------------------------------------
Public       |
         age |      0.154      0.067     2.29   0.022        0.022       0.286
      income |     -0.382      0.035   -10.98   0.000       -0.450      -0.313
       _cons |     -0.576      0.352    -1.64   0.102       -1.265       0.113
-------------+----------------------------------------------------------------
Bicycle      |
         age |      0.206      0.085     2.43   0.015        0.040       0.373
      income |     -0.523      0.046   -11.28   0.000       -0.613      -0.432
       _cons |     -1.137      0.446    -2.55   0.011       -2.012      -0.263
-------------+----------------------------------------------------------------
Walk         |
         age |      0.310      0.107     2.89   0.004        0.100       0.519
      income |     -0.902      0.069   -13.14   0.000       -1.036      -0.767
       _cons |     -0.418      0.561    -0.75   0.456       -1.517       0.681
------------------------------------------------------------------------------
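The random coefficient on trtime can be explored a little further. The sketch below assumes the model above has just been fit and uses the rounded estimates from the table (mean -1.509, standard deviation 1.946); the choice of 1000 integration points is arbitrary and only illustrates a stability check.

```stata
* Replay the results with coefficient names, useful before test or nlcom
cmxtmixlogit, coeflegend

* Implied share of individuals with a positive trtime coefficient
* under the normal assumption: Pr(beta > 0) = Phi(mean / sd)
display normal(-1.509 / 1.946)

* Refit with more integration points to check that the simulated
* likelihood and the estimates are stable
cmxtmixlogit choice trcost, casevars(age income) random(trtime) intpoints(1000) nolog
```

The display line returns roughly 0.22: under these estimates, about a fifth of individuals would have a positive coefficient on travel time, a consequence of the normal distribution's unbounded support.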

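Next, we can run margins to analyze marginal effects. margins is quite flexible, and the examples below show several ways to use it.

Example 1: Expected probabilities of choosing each travel mode when annual income is set to USD 30,000 for everyone in the sample.

. margins, at(income=3)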

Predictive margins                                       Number of obs = 6,000
Model VCE: OIM
Expression: Pr(alt), predict()
At: income = 3
------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    _outcome |
        Car  |      0.333      0.020    16.93   0.000        0.295       0.372
     Public  |      0.221      0.018    12.00   0.000        0.185       0.257
    Bicycle  |      0.168      0.018     9.23   0.000        0.132       0.203
       Walk  |      0.278      0.024    11.41   0.000        0.230       0.326
------------------------------------------------------------------------------
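The same call can be narrowed or widened. This is a small sketch, assuming the model above is still the active estimation result; the income value 8 is chosen arbitrarily for illustration.

```stata
* Same scenario, reported only for the Car outcome
margins, at(income=3) outcome(Car)

* Two income levels in one call (income is measured in USD 10,000)
margins, at(income=(3 8))
```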

Example 2: Change in the expected probability of choosing each travel mode, at each time period, when annual income is USD 40,000 rather than USD 30,000.

. margins, at(income=(3 4)) contrast(at(r) nowald) over(t)

Contrasts of predictive margins                  Number of obs = 6,000
Model VCE: OIM
Expression: Pr(alt), predict()
Over:       t
1._at: 1.t
           income = 3
1._at: 2.t
           income = 3
1._at: 3.t
           income = 3
2._at: 1.t
           income = 4
2._at: 2.t
           income = 4
2._at: 3.t
           income = 4


---------------------------------------------------------------------
                    |            Delta-method
                    |   Contrast   std. err.     [95% conf. interval]
--------------------+------------------------------------------------
     _at@_outcome#t |
    (2 vs 1) Car#1  |      0.079      0.004         0.071       0.087
    (2 vs 1) Car#2  |      0.083      0.004         0.074       0.091
    (2 vs 1) Car#3  |      0.079      0.004         0.071       0.087
 (2 vs 1) Public#1  |      0.007      0.005        -0.003       0.016
 (2 vs 1) Public#2  |      0.005      0.005        -0.004       0.015
 (2 vs 1) Public#3  |      0.008      0.005        -0.001       0.017
(2 vs 1) Bicycle#1  |     -0.009      0.006        -0.020       0.002
(2 vs 1) Bicycle#2  |     -0.008      0.005        -0.019       0.002
(2 vs 1) Bicycle#3  |     -0.007      0.005        -0.018       0.004
   (2 vs 1) Walk#1  |     -0.077      0.010        -0.097      -0.058
   (2 vs 1) Walk#2  |     -0.079      0.010        -0.099      -0.060
   (2 vs 1) Walk#3  |     -0.080      0.010        -0.099      -0.060
---------------------------------------------------------------------

With marginsplot, we can visualize how these changes in expected probabilities vary over time.

. marginsplot
Variables that uniquely identify margins: t _outcome
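The plot layout can be adjusted with standard marginsplot options. The sketch below assumes the contrasts from Example 2 are the most recent margins results; the file name is hypothetical.

```stata
* One panel per travel mode, with a reference line at zero
marginsplot, bydimension(_outcome) yline(0)

* Save the graph to disk
graph export "contrast_income_by_t.png", replace
```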

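Example 3: Expected probabilities of choosing each travel mode across the whole income range (income = 1 to 16, that is, USD 10,000 to USD 160,000).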

. margins,at(income=(1 (1) 16))

Predictive margins                                       Number of obs = 6,000
Model VCE: OIM
Expression: Pr(alt), predict()
1._at:  income =  1
2._at:  income =  2
3._at:  income =  3
......
------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
_outcome#_at |
     Car# 1  |      0.187      0.021     8.85   0.000        0.145       0.228
     Car# 2  |      0.256      0.021    12.13   0.000        0.215       0.297
     Car# 3  |      0.333      0.020    16.93   0.000        0.295       0.372
    ......
    Walk#14  |      0.001      0.000     1.90   0.058       -0.000       0.002
    Walk#15  |      0.000      0.000     1.66   0.096       -0.000       0.001
    Walk#16  |      0.000      0.000     1.48   0.140       -0.000       0.000
------------------------------------------------------------------------------

. marginsplot,recast(line) ciopts(recast(rarea) color(%20))
Variables that uniquely identify margins: income _outcome

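Example 4: If the cost of car travel increased by 25%, how would that affect the probability of choosing the car, and the probabilities of choosing the other travel modes? The results below are evaluated for the first time period (t == 1).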

. margins, alternative(Car) at(trcost=generate(trcost)) ///   
>   at(trcost=generate(1.25*trcost)) subpop(if t==1)

Predictive margins                                     Number of obs   = 6,000
Model VCE: OIM                                         Subpop. no. obs = 2,000
Expression:  Pr(alt), predict()
Alternative: Car
1._at: trcost =      trcost
2._at: trcost = 1.25*trcost
------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
_outcome#_at |
      Car#1  |      0.544      0.011    47.71   0.000        0.522       0.566
      Car#2  |      0.441      0.010    43.61   0.000        0.421       0.460
   Public#1  |      0.201      0.010    19.26   0.000        0.181       0.221
   Public#2  |      0.255      0.012    21.60   0.000        0.232       0.278
  Bicycle#1  |      0.126      0.010    13.14   0.000        0.107       0.144
  Bicycle#2  |      0.157      0.011    14.21   0.000        0.135       0.178
     Walk#1  |      0.130      0.010    12.76   0.000        0.110       0.149
     Walk#2  |      0.148      0.011    13.43   0.000        0.126       0.169
------------------------------------------------------------------------------

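Going a step further, we can contrast the probabilities of choosing each travel mode after the 25% increase in car travel cost against the probabilities at the original cost.

. margins, alternative(Car) at(trcost=generate(trcost))      ///
>   at(trcost=generate(1.25*trcost)) contrast(at(r) nowald)  ///
>   subpop(if t==1)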

Contrasts of predictive margins             Number of obs   = 6,000
Model VCE: OIM                              Subpop. no. obs = 2,000
Expression:  Pr(alt), predict()
Alternative: Car
1._at: trcost =      trcost
2._at: trcost = 1.25*trcost
-------------------------------------------------------------------
                  |            Delta-method
                  |   Contrast   std. err.     [95% conf. interval]
------------------+------------------------------------------------
     _at@_outcome |
    (2 vs 1) Car  |     -0.103      0.003        -0.108      -0.098
 (2 vs 1) Public  |      0.054      0.002         0.049       0.058
(2 vs 1) Bicycle  |      0.031      0.002         0.027       0.035
   (2 vs 1) Walk  |      0.018      0.002         0.015       0.022
-------------------------------------------------------------------

. marginsplot, recast(dot) yline(0) plotopts(msymbol(square))
Variables that uniquely identify margins: _outcome
Multiple at() options specified:
      _atoption=1: trcost=generate(trcost)
      _atoption=2: trcost=generate(1.25*trcost)
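The contrast for Car can be cross-checked by hand from the two predicted probabilities in Example 4. The sketch below uses the rounded values 0.544 and 0.441 from that table; the scalar names are hypothetical, and the arc elasticity is an added illustration rather than something computed in the original post.

```stata
* Rounded predicted probabilities of choosing Car from Example 4
scalar p_base = 0.544    // at the observed trcost
scalar p_new  = 0.441    // at 1.25 * trcost

* Absolute change in probability (compare with the Car contrast)
display %6.3f p_new - p_base

* Relative change, and the implied arc elasticity for a 25% cost increase
display %6.3f (p_new - p_base) / p_base
display %6.3f ((p_new - p_base) / p_base) / 0.25
```

The three lines give about -0.103, -0.189, and -0.757: a 25% increase in car cost lowers the car share by roughly 19% of its baseline, an elasticity of about -0.76 over this interval.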

Example 5: How does the probability of choosing car travel change as car travel time changes?

. margins, dydx(trtime) outcome(Car) alternative(Car)

Average marginal effects                                 Number of obs = 6,000
Model VCE: OIM
Expression:  Pr(alt), predict()
Alternative: Car
Outcome:     Car
dy/dx wrt:   trtime
------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
trtime       |
       _cons |     -0.158      0.027    -5.88   0.000       -0.211      -0.105
------------------------------------------------------------------------------

Example 6: How does the probability of choosing public transport change as car travel time changes?

. margins, dydx(trtime) outcome(Public) alternative(Car)

Average marginal effects                                 Number of obs = 6,000
Model VCE: OIM
Expression:  Pr(alt), predict()
Alternative: Car
Outcome:     Public
dy/dx wrt:   trtime
------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
trtime       |
       _cons |      0.106      0.017     6.15   0.000        0.072       0.139
------------------------------------------------------------------------------

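Example 7: How does the probability of choosing car travel change as the travel time of each alternative changes? With outcome(Car) and no alternative() option, margins reports one effect per alternative whose travel time is varied.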

. margins, dydx(trtime) outcome(Car)

Average marginal effects                                 Number of obs = 6,000
Model VCE: OIM
Expression: Pr(alt), predict()
Outcome:    Car
dy/dx wrt:  trtime
------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
trtime       |
         alt |
        Car  |     -0.158      0.027    -5.88   0.000       -0.211      -0.105
     Public  |      0.106      0.017     6.15   0.000        0.072       0.139
    Bicycle  |      0.037      0.007     5.11   0.000        0.023       0.052
       Walk  |      0.015      0.004     3.52   0.000        0.007       0.024
------------------------------------------------------------------------------
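A quick consistency check on this table, assuming each row is the effect on Pr(Car) of changing one alternative's travel time: adding the same amount to the travel time of every alternative shifts all utilities of an individual equally and leaves the choice probabilities unchanged, so the four effects should sum to zero. With the rounded values reported above:

```latex
\sum_{k \in \{\text{Car},\,\text{Public},\,\text{Bicycle},\,\text{Walk}\}}
  \frac{\partial \Pr(\text{Car})}{\partial\, \text{trtime}_k}
  = -0.158 + 0.106 + 0.037 + 0.015 = 0.000
```

The Public row (0.106) also equals the effect in Example 6, which reflects the symmetry of cross-effects in logit-type models when the same coefficient applies to trtime for every alternative.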

4. References

Joerg Luedicke. 2019. Performing and interpreting discrete choice analyses in Stata. -PDF-
钟经樊, 连玉君. 2010. 计量分析与 STATA 应用 (Econometric Analysis and Stata Applications), Chapter 15: Logistic Models, version 2.0, June 2010.

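5. Related posts

Note: the list of posts below was generated with the Stata command:
lianxh logit probit, m
To install the latest version of lianxh:
ssc install lianxh, replace

Topic: Stata commands

New Stata commands: panel Logit FE and Probit FE

Topic: Interaction terms and moderation

Interaction terms and marginal-effect plots in Logit-Probit models

Topic: Probit-Logit

Logit-Probit: interpreting marginal effects of interaction terms in nonlinear models
Logistic regression in a nutshell: logit and mlogit explained
reg2logit: estimating Logit model parameters with OLS
feologit: fixed-effects ordered Logit models
Stata: multinomial Logit models explained (mlogit)
Stata: the Logit model in one read
The "completely determined" problem in Logit/Probit models explained
Stata: a review of the Logit model
Binary choice models: Probit or Logit?
Stata: when to use the linear probability model rather than Logit?
Stata: nested Logit models (Nested Logit)
Stata: binary Probit models
Dynamic Probit models and their implementation in Stata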
