
Python and Pandas: constructing lambda arguments

Daniel Hutchinson • 3 years ago • 1560 views

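I'm writing a script in Python and Pandas that uses lambda statements to write pre-formatted comments into a CSV column, based on a numeric grade assigned to each row. I've been able to do this for a number of series, but I'm stuck on one case.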

Here is the structure of the CSV:

[image: structure of the CSV]

Below is the working code that writes the new column composition_comment. (I'm sure there is a more concise way to express this, but I'm still learning Python and Pandas.)

import pandas as pd

df = pd.read_csv('stack.csv')
composition_score_value = 40  #calculated by another process

composition_comment_a_level = "Good work." # For scores falling between 100 and 90 percent of composition_score_value.
composition_comment_b_level = "Satisfactory work." # For scores between 89 and 80.
composition_comment_c_level = "Improvement needed." # For scores between 79 and 70.
composition_comment_d_level = "Unsatisfactory work." # For scores below 69.

df['composition_comment'] = df['composition_score'].apply(lambda element: composition_comment_a_level if element <= (composition_score_value * 1) else element >= (composition_score_value *.90))
df['composition_comment'] = df['composition_score'].apply(lambda element: composition_comment_b_level if element <= (composition_score_value *.899) else element >= (composition_score_value *.80))
df['composition_comment'] = df['composition_score'].apply(lambda element: composition_comment_c_level if element <= (composition_score_value *.799) else element >= (composition_score_value *.70))
df['composition_comment'] = df['composition_score'].apply(lambda element: composition_comment_d_level if element <= (composition_score_value *.699) else element >= (composition_score_value *.001))

df 

df.to_csv('stack.csv', index=False)

The expected output is:

[image: expected output]

But the actual output is:

[image: actual output]

Any idea why the value True is being written, and why only the last row is processed correctly? Thanks for any help.

Article URL: http://www.python88.com/topic/133719
Posts [ 6 ]  |  Latest post 3 years ago
Muhteva • Reply 1 • 4 years ago

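Each line overwrites what the previous lines did, so only the last assignment survives. When the last lambda sees an element that satisfies element <= (composition_score_value *.699), it returns composition_comment_d_level. If the value does not satisfy that condition, it returns element >= (composition_score_value *.001), which is just a boolean: True or False.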
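To see the behaviour described above in isolation, here is a minimal sketch that runs only the last of the four lambdas. It assumes the sample scores used elsewhere in this thread (40, 35, 31, 27) stand in for the CSV; the names scores and result exist only for this illustration.

```python
import pandas as pd

# Sample scores borrowed from the other answers (assumed to match the CSV).
scores = pd.Series([40, 35, 31, 27])
composition_score_value = 40

# `A if cond else B` evaluates to A or to B. Here B is itself a comparison,
# so every row that fails `cond` receives a boolean instead of a comment.
result = scores.apply(
    lambda element: "Unsatisfactory work."
    if element <= composition_score_value * .699
    else element >= composition_score_value * .001
)

print(result.tolist())
```

Only the score of 27 passes the condition, so it is the only row that gets a comment; the other three rows get True.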

This should work:

def composition_comment(element):
    composition_score_value = 40
    if element <= (composition_score_value *.699) :
        return "Unsatisfactory work."  
    elif element <= (composition_score_value *.799):
        return "Improvement needed."
    elif  element <= (composition_score_value *.899):
        return "Satisfactory work."
    elif element <= (composition_score_value * 1):
        return "Good work."
    else:
        return None

df['composition_comment'] = df['composition_score'].apply(composition_comment)
df

rahlf23 • Reply 2 • 4 years ago

You can use pd.cut() to perform the mapping, which is cleaner than having to nest if-else statements:

import pandas as pd

df = pd.DataFrame({'composition_score': [40, 35, 31, 27]})
composition_score_value = 40

bins = pd.IntervalIndex.from_tuples([(0, 70), (70,80), (80,90), (90,100)])
labels = ['Unsatisfactory work.','Improvement needed.','Satisfactory work.','Good work.']
d = dict(zip(bins,labels))
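# Note: `right` is ignored when bins is an IntervalIndex, so these bins stay right-closed, e.g. (80, 90]
x = pd.cut(df['composition_score']/composition_score_value*100, bins, right=False).map(d)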

Output:

0              Good work.
1      Satisfactory work.
2     Improvement needed.
3    Unsatisfactory work.
Name: composition_score, dtype: category
Categories (4, object): ['Unsatisfactory work.' < 'Improvement needed.' < 'Satisfactory work.' < 'Good work.']
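One detail worth checking in the snippet above is which side of each interval is closed. According to the pandas documentation, the right argument of pd.cut is ignored when bins is an IntervalIndex, so the closed side comes from the index itself. A small sketch, using a hypothetical percentage that sits exactly on the 90 boundary (the variable names are made up for this illustration):

```python
import pandas as pd

percent = pd.Series([90.0])  # hypothetical score exactly on a boundary

tuples = [(0, 70), (70, 80), (80, 90), (90, 100)]
right_closed = pd.IntervalIndex.from_tuples(tuples)                # default closed='right'
left_closed = pd.IntervalIndex.from_tuples(tuples, closed='left')

# `right=False` has no effect here because the bins are an IntervalIndex.
in_right_closed = pd.cut(percent, right_closed, right=False).iloc[0]
in_left_closed = pd.cut(percent, left_closed).iloc[0]

print(in_right_closed)  # the interval chosen from the right-closed bins
print(in_left_closed)   # the interval chosen from the left-closed bins
```

With the default right-closed bins, a score of exactly 90 percent falls in the 80 to 90 bin. With closed='left' it moves up to the 90 to 100 bin, but a score of exactly 100 would then fall outside [90, 100), so the top edge would need to be widened.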
Sky Scraper • Reply 3 • 4 years ago

Use NumPy instead of if/elif:

import numpy as np

# Compare percentages of composition_score_value rather than raw scores
percent = df['composition_score'] / composition_score_value * 100

conditions = [(percent >= 90),
              (percent >= 80) & (percent < 90),
              (percent >= 70) & (percent < 80),
              (percent < 70)]

choices = ['Great', 'Not Bad', 'Poor', 'very ugly']

df['composition_comment'] = np.select(conditions, choices, default='')

Note: default='' is the value used when none of the conditions is met.
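Since np.select takes the first condition that matches for each element, the conditions can also be listed from highest to lowest with only a lower bound each. A minimal sketch under that assumption, reusing the comment strings from the question and a small sample frame in place of stack.csv:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'composition_score': [40, 35, 31, 27]})  # sample data, not the real CSV
composition_score_value = 40

percent = df['composition_score'] / composition_score_value * 100

# Order matters: for each row, np.select uses the first condition that is True.
conditions = [percent >= 90, percent >= 80, percent >= 70]
choices = ["Good work.", "Satisfactory work.", "Improvement needed."]

# Anything that matches none of the conditions falls through to the default.
df['composition_comment'] = np.select(conditions, choices, default="Unsatisfactory work.")
print(df)
```

Using the default for the lowest band keeps the list of conditions short, at the cost of also labelling unexpected values (such as negative scores) as unsatisfactory.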

Acccumulation • Reply 4 • 4 years ago

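First, a style note: you should use more line breaks, and your variable names are rather long (including both "score" and "value" in the name is a bit redundant). I believe the following still behaves the same and doesn't require side-scrolling: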

df['composition_comment'] = df['composition_score'].apply(
    lambda element: comp_comment_a if element <= (comp_score * 1) 
        else element >= (comp_score *.90))

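As for what's happening: the code above tells Python that you want comp_comment_a if element is less than or equal to comp_score * 1, and otherwise you want element >= (comp_score *.90). The expression element >= (comp_score *.90) is a boolean. I'm not entirely clear on what your intended result is, but based on my guess at what you want, you should have and rather than else. Your code can be made cleaner, for example: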

import pandas as pd

df = pd.read_csv('stack.csv')
comp_score = 40  #calculated by another process

comp_comments = ["Good work.", "Satisfactory work.", "Improvement needed.", 
    "Unsatisfactory work."]

def score_to_comment(score):
    if score > comp_score:
        #it's not clear what you want to do in this case
        #but this follows your original code
        return None
    if score >= comp_score * .9:
        return comp_comments[0]
    if score >= comp_score * .8:
        return comp_comments[1]
    if score >= comp_score * .7:
        return comp_comments[2]
    return comp_comments[3]

df['composition_comment'] = df['composition_score'].apply(score_to_comment)

df.to_csv('stack.csv', index=False)
RJ Adriaansen • Reply 5 • 4 years ago

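The else branch in your lambda functions doesn't return a comment; it returns the result of a comparison, so you just get True. I'd suggest combining them into a single function and reversing the order: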

composition_score_value = 40  #calculated by another process

def return_level(element):
    if element <= (composition_score_value *.699):
        return "Unsatisfactory work." # For scores below 69.
    elif element <= (composition_score_value *.799):
        return "Improvement needed." # For scores between 79 and 70.
    elif element <= (composition_score_value *.899):
        return "Satisfactory work." # For scores between 89 and 80.
    elif element <= (composition_score_value * 1):
        return "Good work." # For scores falling between 100 and 90 percent of composition_score_value.
    else:
        return None

df['composition_comment'] = df['composition_score'].apply(return_level)

Result:

   composition_score   composition_comment
0                 40            Good work.
1                 35    Satisfactory work.
2                 31   Improvement needed.
3                 27  Unsatisfactory work.
Henry Ecker • Reply 6 • 4 years ago

While many of the other answers show how to improve the apply operation, I'd suggest using pd.cut:

df['composition_comment'] = pd.cut(
    df['composition_score'] / composition_score_value,  # Divide to get percent
    bins=[0, 0.7, 0.8, 0.9, np.inf],                    # Set Bounds
    labels=[composition_comment_d_level,                # Set Labels
            composition_comment_c_level,
            composition_comment_b_level,
            composition_comment_a_level],
    right=False                                         # Set Lower bound inclusive
)

df :

   composition_score   composition_comment
0                 40            Good work.
1                 35    Satisfactory work.
2                 31   Improvement needed.
3                 27  Unsatisfactory work.

*Setting right=False makes the lower bound inclusive, which means the bins are:

[0.0, 0.7)  # 0.0 (inclusive) up to 0.7 (not inclusive)
[0.7, 0.8)  # 0.7 (inclusive) up to 0.8 (not inclusive)
[0.8, 0.9)  # 0.8 (inclusive) up to 0.9 (not inclusive)
[0.9, inf)  # 0.9 (inclusive) up to infinity

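Notes:

  1. inf can be changed if there is a fixed upper bound. 1 does not work as the upper bound with right=False, because 1 is not strictly less than 1.
  2. -np.inf (np.NINF before NumPy 2.0, where that alias was removed) can be used as the lower bound instead of 0 if values below 0 are expected.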
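The first note can be checked with a short sketch. The labels "D" to "A" and the two sample percentages are placeholders chosen for this illustration, not values from the question:

```python
import numpy as np
import pandas as pd

percent = pd.Series([1.0, 0.875])  # a full-marks score and a mid-range one
labels = ["D", "C", "B", "A"]      # placeholder labels for this illustration

# Same edges as above, once capped at 1 and once open-ended.
capped = pd.cut(percent, bins=[0, 0.7, 0.8, 0.9, 1], labels=labels, right=False)
open_ended = pd.cut(percent, bins=[0, 0.7, 0.8, 0.9, np.inf], labels=labels, right=False)

print(capped.tolist())      # 1.0 falls outside [0.9, 1) and becomes NaN
print(open_ended.tolist())  # 1.0 lands in [0.9, inf)
```

If a hard cap is still wanted, an upper edge slightly above 1 would be one way to keep a full-marks score inside the last bin.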

The main benefit is that a Categorical is returned. That means sort_values sorts by category order rather than alphabetically.

['Unsatisfactory work.' < 'Improvement needed.' < 'Satisfactory work.' < 'Good work.']
df = df.sort_values('composition_comment')

df :

   composition_score   composition_comment
3                 27  Unsatisfactory work.
2                 31   Improvement needed.
1                 35    Satisfactory work.
0                 40            Good work.
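For contrast, here is a minimal sketch of how the same comments sort as plain strings versus as an ordered Categorical. The Categorical is built by hand here rather than through pd.cut, and the names comments, order, as_text and as_category are made up for this illustration:

```python
import pandas as pd

comments = ["Good work.", "Satisfactory work.", "Improvement needed.", "Unsatisfactory work."]
order = ["Unsatisfactory work.", "Improvement needed.", "Satisfactory work.", "Good work."]

as_text = pd.Series(comments)
as_category = pd.Series(pd.Categorical(comments, categories=order, ordered=True))

print(as_text.sort_values().tolist())      # alphabetical order
print(as_category.sort_values().tolist())  # worst to best, following `order`
```

As plain strings, "Good work." sorts first simply because G comes before I, S and U; as an ordered Categorical it sorts last because it is the highest category.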

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame({'composition_score': [40, 35, 31, 27]})
composition_score_value = 40  # calculated by another process

# For scores falling between 100 and 90 percent of composition_score_value.
composition_comment_a_level = "Good work."
# For scores between 89 and 80.
composition_comment_b_level = "Satisfactory work."
# For scores between 79 and 70.
composition_comment_c_level = "Improvement needed."
# For Scores below 70
composition_comment_d_level = "Unsatisfactory work."