Python pandas equivalent of a SQL CASE statement using lead and lag window functions

I'm new to Python and would like to see whether there is a more elegant solution.

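I have time-series data from a telematics device that includes a motion indicator. I need to extend the motion indicator by +/-1 row around the points where motion actually starts and stops (shown by the motion2 column below). In SQL I do this with a CASE statement and the LEAD and LAG window functions. I am now trying to convert my code to Python...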

Here is the data.

import pandas as pd

data = {'device':[1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2], 
    'time':[1,2,3,4,5,6,7,8,9,10,11,12,5,6,7,8,9,10,11,12,13,14],
    'motion':[0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,1,1,1,0,1,0,0]}

df = pd.DataFrame.from_dict(data)
df = df[['device','time','motion']]

##sort data chronologically for each device
df.sort_values(['device','time'], ascending = True, inplace = True)

This is what df looks like:

device, time, motion
1,1,0
1,2,0
1,3,1
1,4,1
1,5,1
1,6,0
1,7,0
1,8,0
1,9,1
1,10,1
1,11,0
1,12,0
2,5,0
2,6,0
2,7,0
2,8,1
2,9,1
2,10,1
2,11,0
2,12,1
2,13,0
2,14,0

What I need is the motion2 column below added to the DataFrame.

device, time, motion, motion2
1,1,0,0
1,2,0,1
1,3,1,1
1,4,1,1
1,5,1,1
1,6,0,1
1,7,0,0
1,8,0,1
1,9,1,1
1,10,1,1
1,11,0,1
1,12,0,0
2,5,0,0
2,6,0,0
2,7,0,1
2,8,1,1
2,9,1,1
2,10,1,1
2,11,0,1
2,12,1,1
2,13,0,1
2,14,0,0

Below is Python code that works. However, I'm wondering whether there is a more elegant way.

##create new columns for prior and next motion indicator
df['prev_motion'] = df.groupby(['device'])['motion'].shift(1)
df['next_motion'] = df.groupby(['device'])['motion'].shift(-1)

##create the desired motion2 indicator to expand +/- 1 record of the motion start and stop

df['motion2'] = df[['prev_motion', 'motion', 'next_motion']].apply(
    lambda row: 1 if row['motion'] == 1
    else (1 if row['prev_motion'] == 1 or row['next_motion'] == 1 else 0),
    axis=1)

##drop unwanted columns        
df.drop(columns=['prev_motion', 'next_motion'], inplace = True)

This is much easier in SQL using a CASE statement and window functions (LEAD and LAG).

case 
when motion = 1 then 1
when motion = 0 and (lead(motion) over (partition by device order by time) = 1) then 1
when motion = 0 and (lag(motion) over (partition by device order by time) = 1) then 1
else 0
end as motion2
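
For comparison, the CASE expression above maps fairly directly onto numpy.select, with groupby(...).shift standing in for LAG and LEAD. This is a minimal sketch rather than a definitive translation; the names grouped, lag, lead and conditions are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'device': [1] * 12 + [2] * 10,
    'time': list(range(1, 13)) + list(range(5, 15)),
    'motion': [0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,1,1,1,0,1,0,0],
}).sort_values(['device', 'time'])

# PARTITION BY device ORDER BY time: the frame is already sorted, so group by device
grouped = df.groupby('device')['motion']
lag = grouped.shift(1)    # NaN on each device's first row, like NULL in SQL
lead = grouped.shift(-1)  # NaN on each device's last row

# One condition per WHEN branch, evaluated in order; default plays the role of ELSE
conditions = [
    df['motion'].eq(1),
    df['motion'].eq(0) & lead.eq(1),
    df['motion'].eq(0) & lag.eq(1),
]
df['motion2'] = np.select(conditions, [1, 1, 1], default=0)
```

Comparisons against NaN evaluate to False, so the first and last row of each device fall through to the default, much as a NULL comparison would in SQL.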

import numpy as np  # pd.np has been removed from recent pandas versions, so NumPy is imported directly

df['motion2'] = np.where(df.motion.values | np.roll(df.motion.values, 1) | np.roll(df.motion.values, -1), 1, 0)

# The following is essentially equivalent, but maybe a bit clearer / more efficient
df['motion2'] = np.stack((df.motion.values, np.roll(df.motion.values, 1), np.roll(df.motion.values, -1))).any(0).astype(int)

# Note: np.roll shifts the whole column and wraps around at the ends, so it does not respect device boundaries.

>>> df
    device  time  motion  motion2
0        1     1       0        0
1        1     2       0        1
2        1     3       1        1
3        1     4       1        1
4        1     5       1        1
5        1     6       0        1
6        1     7       0        0
7        1     8       0        1
8        1     9       1        1
9        1    10       1        1
10       1    11       0        1
11       1    12       0        0
12       2     5       0        0
13       2     6       0        0
14       2     7       0        1
15       2     8       1        1
16       2     9       1        1
17       2    10       1        1
18       2    11       0        1
19       2    12       1        1
20       2    13       0        1
21       2    14       0        0
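
To keep the window inside each device, one option is a centered rolling maximum per group instead of np.roll. The sketch below uses a small hypothetical frame (demo, not the data from the question) in which device 1 ends in motion and device 2 starts at rest, which is the case where a roll over the whole column would leak across devices.

```python
import pandas as pd

# Hypothetical data: device 1 ends in motion, device 2 starts at rest
demo = pd.DataFrame({
    'device': [1, 1, 1, 2, 2, 2],
    'time':   [1, 2, 3, 1, 2, 3],
    'motion': [0, 0, 1, 0, 0, 1],
}).sort_values(['device', 'time'])

# Centered 3-row window within each device: max over (previous, current, next).
# min_periods=1 lets the first and last row of a device use a shorter window.
demo['motion2'] = (
    demo.groupby('device')['motion']
        .transform(lambda s: s.rolling(3, center=True, min_periods=1).max())
        .astype(int)
)
```

Here the first row of device 2 stays 0, whereas rolling the whole column would set it to 1 because the last row of device 1 is in motion. The window size of 3 corresponds to the +/-1 row requirement; a wider window would extend the indicator further.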