While many of the other answers show how to improve the apply operation, I would suggest using pd.cut:
df['composition_comment'] = pd.cut(
    df['composition_score'] / composition_score_value,  # Divide to get percent
    bins=[0, 0.7, 0.8, 0.9, np.inf],  # Set bounds
    labels=[composition_comment_d_level,  # Set labels, lowest bin first
            composition_comment_c_level,
            composition_comment_b_level,
            composition_comment_a_level],
    right=False  # Make the lower bound of each bin inclusive
)
df:
composition_score composition_comment
0 40 Good work.
1 35 Satisfactory work.
2 31 Improvement needed.
3 27 Unsatisfactory work.
*Context: right=False makes the lower bound of each bin inclusive, which means the bins are:
[0.0, 0.7) # 0.0 (inclusive) up to 0.7 (not inclusive)
[0.7, 0.8) # 0.7 (inclusive) up to 0.8 (not inclusive)
[0.8, 0.9) # 0.8 (inclusive) up to 0.9 (not inclusive)
[0.9, inf) # 0.9 (inclusive) up to infinity
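Notes:
- inf can be replaced if there is a fixed upper bound. However, 1 will not work as the upper bound with right=False, because 1 is not strictly less than 1, so a perfect score would fall outside every bin.
- np.NINF (-np.inf in NumPy 2.0 and later, where np.NINF was removed) can be used as the lower bound instead of 0 if values below 0 are expected.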
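The first note can be checked with a small sketch. The series pct and the short labels D to A are hypothetical stand-ins for the percentages and comment strings used in this answer; only pd.cut and its bins, labels and right parameters come from the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical percentages of the maximum score; 1.0 is a perfect score.
pct = pd.Series([1.0, 0.9, 0.7, 0.69])
labels = ["D", "C", "B", "A"]  # hypothetical short labels, lowest bin first

# Upper edge of 1 with right=False: the last bin is [0.9, 1.0),
# so a perfect score of 1.0 lands in no bin and becomes NaN.
capped = pd.cut(pct, bins=[0, 0.7, 0.8, 0.9, 1], labels=labels, right=False)

# Upper edge of np.inf: the last bin is [0.9, inf), so 1.0 is labelled.
open_ended = pd.cut(pct, bins=[0, 0.7, 0.8, 0.9, np.inf], labels=labels, right=False)

print(capped.isna().tolist())   # [True, False, False, False]
print(open_ended.tolist())      # ['A', 'A', 'C', 'D']
```

Values that sit exactly on an edge (0.9 and 0.7 here) go to the bin that starts at that edge, which is what right=False is meant to achieve.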
The main benefit is that a Categorical is returned. This means that sort_values will not sort alphabetically, but by category order:
['Unsatisfactory work.' < 'Improvement needed.' < 'Satisfactory work.' < 'Good work.']
df = df.sort_values('composition_comment')
df:
composition_score composition_comment
3 27 Unsatisfactory work.
2 31 Improvement needed.
1 35 Satisfactory work.
0 40 Good work.
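As a rough illustration of what the ordered Categorical provides, the sketch below compares category-order sorting with plain string sorting. The frame scores and its column names are hypothetical; the labels are the comment strings from this answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame of score percentages, one per grade band.
scores = pd.DataFrame({"pct": [1.0, 0.875, 0.775, 0.675]})
labels = ["Unsatisfactory work.", "Improvement needed.",
          "Satisfactory work.", "Good work."]
scores["comment"] = pd.cut(scores["pct"], bins=[0, 0.7, 0.8, 0.9, np.inf],
                           labels=labels, right=False)

# pd.cut with labels returns an ordered Categorical by default.
print(scores["comment"].cat.ordered)  # True

# Sorting the Categorical follows the bin order, worst to best.
by_category = scores.sort_values("comment")["comment"].tolist()

# Sorting the same values as plain strings is alphabetical instead.
by_alphabet = scores["comment"].astype(str).sort_values().tolist()

# Ordered categories also support comparisons against a label.
at_least_satisfactory = (scores["comment"] >= "Satisfactory work.").tolist()

print(by_category)
print(by_alphabet)
print(at_least_satisfactory)  # [True, True, False, False]
```

Converting with astype(str) drops the ordering, so it is worth keeping the column as a Categorical if the grade order matters later on.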
Program setup:
import numpy as np
import pandas as pd
df = pd.DataFrame({'composition_score': [40, 35, 31, 27]})
composition_score_value = 40 # calculated by another process
# For scores falling between 100 and 90 percent of composition_score_value.
composition_comment_a_level = "Good work."
# For scores between 89 and 80.
composition_comment_b_level = "Satisfactory work."
# For scores between 79 and 70.
composition_comment_c_level = "Improvement needed."
# For Scores below 70
composition_comment_d_level = "Unsatisfactory work."