[Python] Pandas Tips: nsmallest and nlargest

机器学习初学者 • 2 years ago • 419 views

WeChat official account: 尤而小屋
Author: Peter
Editor: Peter

This article shows how to use two pandas functions: nsmallest and nlargest.

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nsmallest.html

DataFrame.nsmallest(
    n,  # int: number of rows to return
    columns,  # column label, or list of labels, to order by
    keep='first'  # how ties are handled: {'first', 'last', 'all'}, default 'first'
)

Sample data

import pandas as pd
import numpy as np
df = pd.DataFrame({"name":["xiaosun","zhoujuan","xiaozhang","wangfeng","xiaoming","zhangjun"],
                   "score":[100,128,100,150,100,145],
                   "age":[21,25,23,21,25,25],
                   "height":[1.75,1.8,1.77,1.8,1.9,1.71]
                  })
df

        name  score  age  height
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77
3   wangfeng    150   21    1.80
4   xiaoming    100   25    1.90
5   zhangjun    145   25    1.71

nsmallest

Default behavior

Take the two rows with the smallest values in the specified column:

df.nsmallest(2, "score")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77

Take four rows:

df.nsmallest(4, "score")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77
4   xiaoming    100   25    1.90
1   zhoujuan    128   25    1.80

As the result shows, by default each row with a duplicated value counts as its own row toward n.
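The pandas documentation describes nsmallest as equivalent to sorting by the column and taking the first n rows, only more performant. The following is a minimal sketch of that equivalence on the sample data; the variable names via_nsmallest and via_sort are illustrative, and kind="stable" is an assumption added here so that tied rows keep their original order in the sorted version.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# Four smallest scores via nsmallest (keep="first" is the default).
via_nsmallest = df.nsmallest(4, "score")

# The same selection via a full sort; a stable sort keeps tied rows
# in their original order, which matches keep="first".
via_sort = df.sort_values("score", ascending=True, kind="stable").head(4)

print(via_nsmallest.index.tolist())
print(via_nsmallest.index.tolist() == via_sort.index.tolist())
```

On a frame this small the two approaches are interchangeable; nsmallest mainly pays off when n is much smaller than the number of rows.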

The keep parameter

# Same result as above; keep="first" is the default
df.nsmallest(4, "score", keep="first")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77
4   xiaoming    100   25    1.90
1   zhoujuan    128   25    1.80

df.nsmallest(4, "score", keep="last")

        name  score  age  height
4   xiaoming    100   25    1.90
2  xiaozhang    100   23    1.77
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80

The result shows that the order of the tied rows has changed: it now starts from the largest index, 4.
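The effect of keep is easier to see when n cuts through a group of tied values. In this sketch (the variable names are illustrative, not from the original article), only two of the three rows with score 100 can be returned, so keep decides which of them are selected, not just their order.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# Three rows share the minimum score of 100 (index 0, 2 and 4).
first_two = df.nsmallest(2, "score", keep="first").index.tolist()
last_two = df.nsmallest(2, "score", keep="last").index.tolist()

print(first_two)  # [0, 2]: the first occurrences win the tie
print(last_two)   # [4, 2]: the last occurrences win the tie
```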

How should keep="all" be understood?

df.nsmallest(2, "score")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77

With keep="all", every row tied with the last selected value is returned, so the result can contain more than n rows:

df.nsmallest(2, "score", keep="all")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77
4   xiaoming    100   25    1.90
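One way to check this behavior is to compare row counts. The sketch below, with illustrative variable names, shows that keep="all" returns extra rows only when the cut-off value is tied: for n=2 the cut-off score 100 is shared by three rows, while for n=4 the cut-off score 128 is unique.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# n=2: the second-smallest score (100) is tied, so all three tied rows come back.
rows_for_two = len(df.nsmallest(2, "score", keep="all"))

# n=4: the fourth-smallest score (128) is unique, so exactly four rows come back.
rows_for_four = len(df.nsmallest(4, "score", keep="all"))

print(rows_for_two, rows_for_four)  # 3 4
```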

Selecting by multiple columns

df.nsmallest(4, ["age", "height"])

        name  score  age  height
0    xiaosun    100   21    1.75
3   wangfeng    150   21    1.80
2  xiaozhang    100   23    1.77
5   zhangjun    145   25    1.71
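When a list of columns is passed, the first column determines the ordering and later columns are only used to break ties. A small sketch, with illustrative variable names, that swaps the column order on the same sample data to make this visible:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# Order by age first; height only decides among rows with equal age.
by_age_then_height = df.nsmallest(4, ["age", "height"]).index.tolist()

# Order by height first; age only decides among rows with equal height.
by_height_then_age = df.nsmallest(4, ["height", "age"]).index.tolist()

print(by_age_then_height)  # [0, 3, 2, 5]
print(by_height_then_age)  # [5, 0, 2, 3]
```

In the second call, rows 1 and 3 are tied at height 1.80 for the fourth slot, and row 3 is chosen because its age (21) is smaller.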

nlargest

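nlargest is the counterpart of nsmallest: it returns the n rows with the largest values in the given columns, in descending order.

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nlargest.html#pandas.DataFrame.nlargest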

DataFrame.nlargest(
    n,
    columns,
    keep='first'  # {'first', 'last', 'all'}, default 'first'
)

df.nlargest(3, "score")

       name  score  age  height
3  wangfeng    150   21    1.80
5  zhangjun    145   25    1.71
1  zhoujuan    128   25    1.80

df.nlargest(3, "age")

       name  score  age  height
1  zhoujuan    128   25    1.80
4  xiaoming    100   25    1.90
5  zhangjun    145   25    1.71

df.nlargest(2, "age", keep="first")

       name  score  age  height
1  zhoujuan    128   25     1.8
4  xiaoming    100   25     1.9

df.nlargest(2, "age", keep="last")

       name  score  age  height
5  zhangjun    145   25    1.71
4  xiaoming    100   25    1.90

df.nlargest(2, "age", keep="all")

       name  score  age  height
1  zhoujuan    128   25    1.80
4  xiaoming    100   25    1.90
5  zhangjun    145   25    1.71
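pandas also provides Series.nlargest and Series.nsmallest, which take n and keep but no columns argument. As a small sketch that goes slightly beyond the examples above, indexing the sample data by name (a choice made here for illustration) gives a compact "top scorers" lookup:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# Use the name as the index so the resulting Series maps name -> score.
scores = df.set_index("name")["score"]

# Three highest scores, in descending order.
top3 = scores.nlargest(3)

print(top3.index.tolist())  # ['wangfeng', 'zhangjun', 'zhoujuan']
print(top3.tolist())        # [150, 145, 128]
```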

nlargest + drop_duplicates

Requirement: find the two largest ages; when several people share the same age, keep only one of them.

df

        name  score  age  height
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77
3   wangfeng    150   21    1.80
4   xiaoming    100   25    1.90
5   zhangjun    145   25    1.71

df["age"].value_counts()

25    3
21    2
23    1
Name: age, dtype: int64

The largest age is 25, and three people share it. Drop the duplicates based on age:

df1 = df.drop_duplicates(subset=["age"], keep="first")
df1

        name  score  age  height
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77

df1.nlargest(2, "age")

        name  score  age  height
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77
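The two steps above can be wrapped into a small reusable function. This is only a sketch: top_n_distinct is a hypothetical helper name introduced here, not a pandas API, and it assumes that keeping the first occurrence of each value is acceptable.

```python
import pandas as pd

def top_n_distinct(frame, column, n):
    """Return one row for each of the n largest distinct values in `column`."""
    # Keep only the first row seen for each distinct value.
    deduped = frame.drop_duplicates(subset=[column], keep="first")
    # Then pick the n largest values among the remaining rows.
    return deduped.nlargest(n, column)

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

result = top_n_distinct(df, "age", 2)
print(result["name"].tolist())  # ['zhoujuan', 'xiaozhang']
print(result["age"].tolist())   # [25, 23]
```

Deduplicating before calling nlargest matters here: calling nlargest(2, "age") on the full frame would return two rows that both have age 25.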


    
Article URL: http://www.python88.com/topic/148334
 