
[Python] Pandas tips: nsmallest and nlargest

机器学习初学者 • 2 years ago • 406 views

WeChat official account: 尤而小屋
Author: Peter
Editor: Peter

This article shows how to use two pandas functions: nsmallest and nlargest.


Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nsmallest.html

DataFrame.nsmallest(
    n,  # int
    columns,  # column name(s)
    keep='first'  # how duplicate values are handled: {'first', 'last', 'all'}, default 'first'
)

Sample data

import pandas as pd
import numpy as np
df = pd.DataFrame({"name":["xiaosun","zhoujuan","xiaozhang","wangfeng","xiaoming","zhangjun"],
                   "score":[100,128,100,150,100,145],
                   "age":[21,25,23,21,25,25],
                   "height":[1.75,1.8,1.77,1.8,1.9,1.71]
                  })
df

        name  score  age  height
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77
3   wangfeng    150   21    1.80
4   xiaoming    100   25    1.90
5   zhangjun    145   25    1.71

nsmallest

Default behavior

Take the two rows with the smallest values in the given column:

df.nsmallest(2, "score")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77

Take 4 rows:

df.nsmallest(4, "score")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77
4   xiaoming    100   25    1.90
1   zhoujuan    128   25    1.80

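As the result shows, by default each row with a duplicate value counts as a separate row: the three rows with a score of 100 fill three of the four slots.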
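The pandas documentation describes nsmallest as equivalent to sorting in ascending order and taking the first n rows, only more performant. The snippet below is a minimal sketch that checks this on the sample data. The names by_nsmallest and by_sort are made up for illustration, and kind="mergesort" is an assumption added here so that tied rows keep their original order.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
})

# nsmallest with the default keep="first"
by_nsmallest = df.nsmallest(4, "score")

# Stable ascending sort, then the first 4 rows (illustrative comparison)
by_sort = df.sort_values("score", kind="mergesort").head(4)

print(by_nsmallest.index.tolist())
print(by_nsmallest.equals(by_sort))
```

Both approaches should pick the same rows in the same order; nsmallest simply avoids sorting the whole frame.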

The keep parameter

# Same result as above; "first" is the default
df.nsmallest(4, "score", keep="first")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77
4   xiaoming    100   25    1.90
1   zhoujuan    128   25    1.80

df.nsmallest(4, "score", keep="last")

        name  score  age  height
4   xiaoming    100   25    1.90
2  xiaozhang    100   23    1.77
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80

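The result shows that the order has changed: the tied rows are now listed starting from the largest index, 4.

How should keep="all" be understood?

df.nsmallest(2, "score")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77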

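With keep="all", every row tied with the last selected value is returned, so the result can contain more than n rows:

df.nsmallest(2, "score", keep="all")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77
4   xiaoming    100   25    1.90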
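To see the three keep options side by side, here is a small sketch that collects the selected index labels for each option. The dictionary name picked is hypothetical and used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
})

# Index labels selected by nsmallest(2, "score") for each keep option
picked = {
    keep: df.nsmallest(2, "score", keep=keep).index.tolist()
    for keep in ("first", "last", "all")
}

for keep, labels in picked.items():
    print(keep, labels)
```

Only keep="all" can return more than n rows, which makes it a reasonable choice when dropping a tied row would be misleading.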

Selecting by multiple columns

df.nsmallest(4,["age","height"])

        name  score  age  height
0    xiaosun    100   21    1.75
3   wangfeng    150   21    1.80
2  xiaozhang    100   23    1.77
5   zhangjun    145   25    1.71
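When several columns are given, the later columns act as tie-breakers for the earlier ones. A minimal sketch, assuming the same sample data, that compares the result with a multi-column sort (the variable names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# Order by age first; height breaks ties between rows of the same age
by_nsmallest = df.nsmallest(4, ["age", "height"])

# The same selection written as a stable sort followed by head
by_sort = df.sort_values(["age", "height"], kind="mergesort").head(4)

print(by_nsmallest.index.tolist())
print(by_nsmallest.equals(by_sort))
```

Of the three rows with age 25, only zhangjun is selected because that row has the smallest height.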

nlargest

This function returns the n rows with the largest values in the given column(s), in descending order.
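As a quick illustration of the descending order, the sketch below compares nlargest with a descending sort followed by head. The variable names are hypothetical, and kind="mergesort" is an assumption added here to keep the comparison deterministic.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
})

# The three highest scores, largest first
top3 = df.nlargest(3, "score")

# The same selection via a descending sort (illustrative comparison)
top3_sorted = df.sort_values("score", ascending=False, kind="mergesort").head(3)

print(top3["score"].tolist())
print(top3.equals(top3_sorted))
```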

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nlargest.html#pandas.DataFrame.nlargest

DataFrame.nlargest(
    n,
    columns,
    keep='first'  # {'first', 'last', 'all'}, default 'first'
)

df.nlargest(3,"score")

       name  score  age  height
3  wangfeng    150   21    1.80
5  zhangjun    145   25    1.71
1  zhoujuan    128   25    1.80

df.nlargest(3,"age")

       name  score  age  height
1  zhoujuan    128   25    1.80
4  xiaoming    100   25    1.90
5  zhangjun    145   25    1.71

df.nlargest(2,"age",keep="first")

       name  score  age  height
1  zhoujuan    128   25     1.8
4  xiaoming    100   25     1.9

df.nlargest(2,"age",keep="last")

       name  score  age  height
5  zhangjun    145   25    1.71
4  xiaoming    100   25    1.90

df.nlargest(2,"age",keep="all")

       name  score  age  height
1  zhoujuan    128   25    1.80
4  xiaoming    100   25    1.90
5  zhangjun    145   25    1.71

nlargest + drop_duplicates

Requirement: find the top 2 rows by age; when several rows share the same age, keep only one of them.

df

        name  score  age  height
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77
3   wangfeng    150   21    1.80
4   xiaoming    100   25    1.90
5   zhangjun    145   25    1.71

df["age"].value_counts()

25    3
21    2
23    1
Name: age, dtype: int64

The largest age is 25, and three rows share it. Deduplicate by age first:

df1 = df.drop_duplicates(subset=["age"], keep="first")
df1

        name  score  age  height
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77

df1.nlargest(2,"age")

        name  score  age  height
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77
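The two steps can also be chained into a single expression. The sketch below does that and, as an illustration, shows that the keep argument of drop_duplicates decides which row represents each age. The names top_first and top_last are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# Keep the first row seen for each age, then take the 2 largest ages
top_first = df.drop_duplicates(subset=["age"], keep="first").nlargest(2, "age")

# Keep the last row seen for each age instead
top_last = df.drop_duplicates(subset=["age"], keep="last").nlargest(2, "age")

print(top_first["name"].tolist())
print(top_last["name"].tolist())
```

The ages returned are the same in both cases (25 and 23); only the row chosen to represent age 25 differs.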


    
Article URL: http://www.python88.com/topic/148334
 