
Henry Ecker

4 years ago
Replied to the topic created by Henry Ecker » Python and Pandas: constructing lambda arguments

While many of the other answers show how to improve the apply operation, I would suggest using pd.cut instead:

df['composition_comment'] = pd.cut(
    df['composition_score'] / composition_score_value,  # Divide to get percent
    bins=[0, 0.7, 0.8, 0.9, np.inf],                    # Set Bounds
    labels=[composition_comment_d_level,                # Set Labels
            composition_comment_c_level,
            composition_comment_b_level,
            composition_comment_a_level],
    right=False                                         # Set Lower bound inclusive
)

df :

   composition_score   composition_comment
0                 40            Good work.
1                 35    Satisfactory work.
2                 31   Improvement needed.
3                 27  Unsatisfactory work.

*Setting right=False makes the lower bound of each bin inclusive, which means the bins are:

[0.0, 0.7)  # 0.0 (inclusive) up to 0.7 (not inclusive)
[0.7, 0.8)  # 0.7 (inclusive) up to 0.8 (not inclusive)
[0.8, 0.9)  # 0.8 (inclusive) up to 0.9 (not inclusive)
[0.9, inf)  # 0.9 (inclusive) up to infinity
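To make the effect of right=False concrete, here is a minimal sketch that bins a few boundary ratios both ways. The short labels "D" to "A" and the sample ratios are made up for illustration and are not part of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical ratios sitting exactly on (or just below) the bin edges.
ratios = pd.Series([0.69, 0.7, 0.8, 0.9, 1.0])
bins = [0, 0.7, 0.8, 0.9, np.inf]
labels = ["D", "C", "B", "A"]  # illustrative short labels

# Left-closed bins: [0, 0.7), [0.7, 0.8), [0.8, 0.9), [0.9, inf)
left_closed = pd.cut(ratios, bins=bins, labels=labels, right=False)

# Default right=True gives right-closed bins: (0, 0.7], (0.7, 0.8], ...
right_closed = pd.cut(ratios, bins=bins, labels=labels)

comparison = pd.DataFrame({
    "ratio": ratios,
    "right=False": left_closed,
    "right=True": right_closed,
})
print(comparison)
```

With right=False a ratio of exactly 0.7 lands in the "C" bin, while the default right=True would push it down into "D".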

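Notes:

  1. np.inf can be replaced if there is a fixed upper limit. Note that 1 does not work as the upper bound with right=False: a ratio of exactly 1 would fall outside the last bin, since 1 is not strictly less than 1.
  2. -np.inf (np.NINF before NumPy 2.0) can be used as the lower bound instead of 0 if values below 0 are expected.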
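Both notes can be checked with a small sketch. The ratios and the short labels below are hypothetical, chosen only to hit the edge cases.

```python
import numpy as np
import pandas as pd

# Hypothetical ratios: one negative, one ordinary, one perfect score.
ratios = pd.Series([-0.1, 0.5, 1.0])
labels = ["D", "C", "B", "A"]  # illustrative short labels

# Upper bound of 1 with right=False: the last bin is [0.9, 1),
# so a ratio of exactly 1.0 is left unbinned (NaN), as is -0.1.
capped = pd.cut(ratios, bins=[0, 0.7, 0.8, 0.9, 1],
                labels=labels, right=False)

# Open-ended on both sides: every finite ratio gets a label.
open_ended = pd.cut(ratios, bins=[-np.inf, 0.7, 0.8, 0.9, np.inf],
                    labels=labels, right=False)

print(pd.DataFrame({"ratio": ratios, "capped": capped, "open_ended": open_ended}))
```

Values that fall outside every bin come back as NaN rather than raising an error, so a too-tight bound can go unnoticed unless the result is checked for missing values.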

The main benefit is that an ordered Categorical is returned. This means sort_values will sort by category order rather than alphabetically.

['Unsatisfactory work.' < 'Improvement needed.' < 'Satisfactory work.' < 'Good work.']
df = df.sort_values('composition_comment')

df :

   composition_score   composition_comment
3                 27  Unsatisfactory work.
2                 31   Improvement needed.
1                 35    Satisfactory work.
0                 40            Good work.
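For contrast, a minimal sketch of how the same comments sort as plain strings versus as an ordered categorical. The category order is written out by hand here, on the assumption that it matches the label order passed to pd.cut above.

```python
import pandas as pd

comments = ["Good work.", "Satisfactory work.",
            "Improvement needed.", "Unsatisfactory work."]

# Category order from worst to best, written out by hand for this sketch.
order = ["Unsatisfactory work.", "Improvement needed.",
         "Satisfactory work.", "Good work."]

as_text = pd.Series(comments)
as_category = pd.Series(pd.Categorical(comments, categories=order, ordered=True))

alphabetical = as_text.sort_values().tolist()      # sorted as plain strings
by_category = as_category.sort_values().tolist()   # sorted by category order

print(alphabetical)
print(by_category)
```

Sorted as text, "Good work." comes first simply because G precedes I, S and U; sorted as a categorical, the rows follow the grading order instead.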

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame({'composition_score': [40, 35, 31, 27]})
composition_score_value = 40  # calculated by another process

# For scores falling between 100 and 90 percent of composition_score_value.
composition_comment_a_level = "Good work."
# For scores between 89 and 80.
composition_comment_b_level = "Satisfactory work."
# For scores between 79 and 70.
composition_comment_c_level = "Improvement needed."
# For Scores below 70
composition_comment_d_level = "Unsatisfactory work."