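I'd organize the data differently: the index is date, the columns are portf, and the values are base.
First we need to reshape the data and resample it to daily frequency. After that it's a simple pivot.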
cols = ['portf', 'base']
# Melt the date columns into a single 'date' column, then forward-fill
# each (portf, base, original row) group to daily frequency.
s = (df.reset_index()
       .melt(cols+['index'], value_name='date')
       .set_index('date')
       .groupby(cols+['index'], group_keys=False)
       .resample('D').ffill()
       .drop(columns=['variable', 'index'])
       .reset_index())
res = s.pivot(index='date', columns='portf')
res = res.resample('D').first()  # restore dates not covered by any range
Output:
res
           base
portf         a    b    c    d
2018-01-01   no   no  NaN  NaN
2018-01-02   no   no  NaN  NaN
2018-01-03  own   no  NaN  NaN
2018-01-04  own   no  NaN  NaN
2018-01-05  NaN   no  NaN  NaN
2018-01-06  NaN  own  NaN  NaN
2018-01-07  NaN  own  NaN  NaN
2018-01-08  NaN  NaN  NaN  NaN
2018-01-09  NaN  NaN  own  own
2018-01-10  NaN  NaN  own  NaN
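For anyone who wants to try this without the original frame, here is a minimal self-contained sketch. The input is reconstructed from the output above, so the column names 'start' and 'end' are assumptions, and it swaps the melt/resample chain for an explicit pd.date_range per row, which may be easier to follow. Passing values='base' to pivot also gives flat columns instead of the two-level header shown above.

```python
import pandas as pd

# Hypothetical input, reconstructed from the output above.
# The column names 'start' and 'end' are assumptions.
df = pd.DataFrame({
    'portf': ['a', 'a', 'b', 'b', 'c', 'd'],
    'base':  ['no', 'own', 'no', 'own', 'own', 'own'],
    'start': pd.to_datetime(['2018-01-01', '2018-01-03', '2018-01-01',
                             '2018-01-06', '2018-01-09', '2018-01-09']),
    'end':   pd.to_datetime(['2018-01-02', '2018-01-04', '2018-01-05',
                             '2018-01-07', '2018-01-10', '2018-01-09']),
})

# Long form: one row per (portf, base, day) covered by each range.
daily = pd.concat(
    [pd.DataFrame({'portf': r.portf,
                   'base': r.base,
                   'date': pd.date_range(r.start, r.end, freq='D')})
     for r in df.itertuples(index=False)],
    ignore_index=True,
)

# Wide form: one column per portfolio, one row per calendar day.
wide = daily.pivot(index='date', columns='portf', values='base')
wide = wide.asfreq('D')  # insert days that no range covers (here 2018-01-08)
print(wide)
```

The Python-level loop over rows is fine for a small frame; for many long ranges the resample approach above may scale better.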
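If you need the other output, we can get there with a couple of less-than-ideal `apply` calls. This is very slow for a large DataFrame; I'd seriously consider keeping the layout above.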
s.set_index('date').apply(tuple, axis=1).groupby('date').apply(tuple)
date
2018-01-01 ((a, no), (b, no))
2018-01-02 ((a, no), (b, no))
2018-01-03 ((a, own), (b, no))
2018-01-04 ((a, own), (b, no))
2018-01-05 ((b, no),)
2018-01-06 ((b, own),)
2018-01-07 ((b, own),)
2018-01-09 ((c, own), (d, own))
2018-01-10 ((c, own),)
dtype: object
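If the row-wise apply turns out to be the bottleneck, one possible variation is to build the (portf, base) pairs with zip and only group afterwards. This is a sketch, not a benchmarked replacement; the small frame `s` below is a hypothetical stand-in for the long-form frame from the first step.

```python
import pandas as pd

# Hypothetical stand-in for the long-form frame `s` (columns: date, portf, base).
s = pd.DataFrame({
    'date': pd.to_datetime(['2018-01-03', '2018-01-03', '2018-01-05',
                            '2018-01-09', '2018-01-09']),
    'portf': ['a', 'b', 'b', 'c', 'd'],
    'base': ['own', 'no', 'no', 'own', 'own'],
})

# Build the (portf, base) pairs in one pass with zip instead of apply(axis=1),
# then collect the pairs that share a date into a tuple.
pairs = pd.Series(list(zip(s['portf'], s['base'])), index=s['date'])
out = pairs.groupby(level=0).apply(tuple)
print(out)
```

There is still one Python call per date in the groupby step, so the wide layout remains the better choice for large data.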