
ALollz

Recent replies by ALollz
5 years ago
Replied to a topic created by ALollz » Python: How do I combine two columns of a DataFrame into one list?

The underlying NumPy array is already laid out as array([[row1], [row2], ..., [rowN]]), so we can simply ravel it, which should be fast.

df[['data1', 'data2']].to_numpy().ravel().tolist()
#[20, 120, 30, 456, 40, 34]
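
To make the one-liner above reproducible, here is a minimal sketch. The frame contents are an assumption, chosen so that the result matches the output shown above; the question's actual data is not reproduced on this page.

```python
import pandas as pd

# Hypothetical frame: values chosen to reproduce the output shown above.
df = pd.DataFrame({'data1': [20, 30, 40], 'data2': [120, 456, 34]})

# to_numpy() yields a 2-D array with one row per DataFrame row;
# ravel() flattens it row by row, interleaving data1 and data2.
combined = df[['data1', 'data2']].to_numpy().ravel().tolist()
print(combined)
# [20, 120, 30, 456, 40, 34]
```

Because the flattening is row-major, the order of the column list decides which value comes first in each pair.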

import perfplot
import pandas as pd
import numpy as np
from itertools import chain

perfplot.show(
    setup=lambda n: pd.DataFrame(np.random.randint(1, 10, (n, 2))),
    kernels=[
        lambda df: df[[0, 1]].to_numpy().ravel().tolist(),
        lambda df: [x for i in zip(df[0], df[1]) for x in i],
        lambda df: [*chain.from_iterable(df[[0, 1]].to_numpy())],
        lambda df: df[[0, 1]].stack().tolist()  # proposed by @anky_91
    ],
    labels=['ravel', 'zip', 'chain', 'stack'],
    n_range=[2 ** k for k in range(20)],
    equality_check=np.allclose,
    xlabel="len(df)"
)

[Figure: perfplot timing comparison of the ravel, zip, chain, and stack approaches against len(df)]
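
The benchmark only makes sense if all four kernels return the same list, which is what equality_check guards. As a rough sketch, the same check can be done by hand on a small frame; the seed and the frame size here are arbitrary choices, not part of the original answer.

```python
from itertools import chain

import numpy as np
import pandas as pd

# Small random frame with integer column labels 0 and 1, as in the benchmark setup.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 10, (5, 2)))

results = {
    'ravel': df[[0, 1]].to_numpy().ravel().tolist(),
    'zip': [x for pair in zip(df[0], df[1]) for x in pair],
    'chain': [*chain.from_iterable(df[[0, 1]].to_numpy())],
    'stack': df[[0, 1]].stack().tolist(),
}

# Normalise NumPy integers to plain ints before comparing the lists.
expected = results['ravel']
all_equal = all(list(map(int, r)) == expected for r in results.values())
print(all_equal)
```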

6 years ago
Replied to a topic created by ALollz » Python: filling in tuples within tuples over a range in a DataFrame

I would organize the data differently: the index is the date, the columns are the portf values, and the cell values are base.

First we need to reshape the data and resample it to daily frequency. After that it is a simple pivot.

cols = ['portf', 'base']
s = (df.reset_index()
       .melt(cols+['index'], value_name='date')
       .set_index('date')
       .groupby(cols+['index'], group_keys=False)
       .resample('D').ffill()
       .drop(columns=['variable', 'index'])
       .reset_index())

res = s.pivot(index='date', columns='portf')
res = res.resample('D').first()  # Recover missing dates between

Output of res:

           base               
portf         a    b    c    d
2018-01-01   no   no  NaN  NaN
2018-01-02   no   no  NaN  NaN
2018-01-03  own   no  NaN  NaN
2018-01-04  own   no  NaN  NaN
2018-01-05  NaN   no  NaN  NaN
2018-01-06  NaN  own  NaN  NaN
2018-01-07  NaN  own  NaN  NaN
2018-01-08  NaN  NaN  NaN  NaN
2018-01-09  NaN  NaN  own  own
2018-01-10  NaN  NaN  own  NaN
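
The input frame is not shown on this page, so the following is only a sketch of the same idea under stated assumptions: a hypothetical input with one row per holding period and inclusive 'start' and 'end' columns (both names are guesses). It also swaps the melt plus groupby-resample step for a plain row expansion with pd.date_range, and passes values='base' to pivot, so the columns come out flat rather than nested under 'base'.

```python
import pandas as pd

# Hypothetical input: one row per (portf, base) holding period with
# inclusive 'start' and 'end' dates. The column names are assumptions.
df = pd.DataFrame({
    'portf': ['a', 'a', 'b', 'b', 'c', 'd'],
    'base': ['no', 'own', 'no', 'own', 'own', 'own'],
    'start': pd.to_datetime(['2018-01-01', '2018-01-03', '2018-01-01',
                             '2018-01-06', '2018-01-09', '2018-01-09']),
    'end': pd.to_datetime(['2018-01-02', '2018-01-04', '2018-01-05',
                           '2018-01-07', '2018-01-10', '2018-01-09']),
})

# Expand every period into one record per calendar day.
rows = [
    {'date': day, 'portf': portf, 'base': base}
    for portf, base, start, end in df.itertuples(index=False)
    for day in pd.date_range(start, end, freq='D')
]
daily = pd.DataFrame(rows)

# Pivot to a date x portf grid, then reinsert days that no period covers.
res = daily.pivot(index='date', columns='portf', values='base')
res = res.asfreq('D')
print(res)
```

With this input the grid covers 2018-01-01 through 2018-01-10, and 2018-01-08 appears as an all-NaN row because no period includes it.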

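If you need the other output, we can get there with some less-than-ideal apply calls. This will perform very badly on a large DataFrame; I would seriously consider keeping the layout above.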

s.set_index('date').apply(tuple, axis=1).groupby('date').apply(tuple)

date
2018-01-01      ((a, no), (b, no))
2018-01-02      ((a, no), (b, no))
2018-01-03     ((a, own), (b, no))
2018-01-04     ((a, own), (b, no))
2018-01-05              ((b, no),)
2018-01-06             ((b, own),)
2018-01-07             ((b, own),)
2018-01-09    ((c, own), (d, own))
2018-01-10             ((c, own),)
dtype: object
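
If the nested-tuple layout really is required, one possible variation is to build the inner (portf, base) pairs with zip instead of a row-wise apply and then aggregate per date. This is a sketch on a tiny hypothetical frame s, not the answer's own method, and whether it helps enough on a large frame would need measuring.

```python
import pandas as pd

# Hypothetical long-format frame with the same columns as `s` above.
s = pd.DataFrame({
    'date': pd.to_datetime(['2018-01-01', '2018-01-01', '2018-01-02']),
    'portf': ['a', 'b', 'b'],
    'base': ['no', 'no', 'own'],
})

# Build the inner (portf, base) pairs in plain Python, indexed by date.
pairs = pd.Series(list(zip(s['portf'], s['base'])), index=s['date'])

# Collect all pairs that share a date into one outer tuple.
out = pairs.groupby(level=0).agg(tuple)
print(out.tolist())
# [(('a', 'no'), ('b', 'no')), (('b', 'own'),)]
```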