In R, I can run a multiple linear regression like this:
temp = lm(log(volume_1[11:62])~log(price_1[11:62])+log(volume_1[10:61]))
In Python, statsmodels accepts R-style formulas, so I expected the following code to work as well:
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
rando = lambda x: np.random.randint(low=1, high=100, size=x)
df = pd.DataFrame(data={'volume_1': rando(62), 'price_1': rando(62)})
temp = smf.ols(formula='np.log(volume_1)[11:62] ~ np.log(price_1)[11:62] + np.log(volume_1)[10:61]',
data=df)
# np.log(volume_1)[10:61] is meant to express the lagged volume
But I get this error:
PatsyError: Number of rows mismatch between data argument and volume_1[11:62] (62 versus 51)
volume_1[11:62] ~ price_1[11:62] + volume_1[10:61]
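I suppose it is not possible to regress on only a subset of the rows in a column this way, because data=df has 62 rows while the sliced variables have only 51.
Is there a way to run this regression as conveniently as in R?
df is a pandas DataFrame with the columns volume_1 and price_1.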
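
A minimal sketch of one way this might be approached, not necessarily the only or the best one: instead of slicing inside the formula, build the lagged regressor as its own column with pandas shift, select the rows first, and pass that subset as data. The column name volume_lag and the seeded random data are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative random data with the same shape as in the question (62 rows).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "volume_1": rng.integers(1, 100, size=62),
    "price_1": rng.integers(1, 100, size=62),
})

# Hypothetical helper column: volume_1 lagged by one row.
# shift(1) leaves NaN in the first row, which the slice below excludes.
df["volume_lag"] = df["volume_1"].shift(1)

# Select the rows first (positions 11..61), so every term in the formula
# has the same number of rows as the data argument.
subset = df.iloc[11:62]

model = smf.ols(
    "np.log(volume_1) ~ np.log(price_1) + np.log(volume_lag)",
    data=subset,
)
result = model.fit()
print(result.params)
```

Within the subset, volume_lag holds the values of volume_1 at positions 10..60, which is what np.log(volume_1)[10:61] was trying to express. Note that Python slices are 0-based and exclude the end point, so [11:62] selects 51 rows, whereas 11:62 in R selects 52; the slice bounds may need adjusting to reproduce the R fit exactly.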