WeChat official account: 尤而小屋
Author: Peter
Editor: Peter
This article introduces two pandas functions, nsmallest and nlargest, and how to use them.

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nsmallest.html
DataFrame.nsmallest(
    n,            # int: number of rows to return
    columns,      # column label, or list of labels, to order by
    keep='first'  # how ties are handled; {'first', 'last', 'all'}, default 'first'
)
Sample data
import pandas as pd
import numpy as np
df = pd.DataFrame({"name":["xiaosun","zhoujuan","xiaozhang","wangfeng","xiaoming","zhangjun"],
"score":[100,128,100,150,100,145],
"age":[21,25,23,21,25,25],
"height":[1.75,1.8,1.77,1.8,1.9,1.71]
})
df
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 1 | zhoujuan | 128 | 25 | 1.80 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
| 3 | wangfeng | 150 | 21 | 1.80 |
| 4 | xiaoming | 100 | 25 | 1.90 |
| 5 | zhangjun | 145 | 25 | 1.71 |
nsmallest
Default behavior
Take the two rows with the smallest values in the specified column:
df.nsmallest(2, "score")
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
Take 4 rows:
df.nsmallest(4, "score")
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
| 4 | xiaoming | 100 | 25 | 1.90 |
| 1 | zhoujuan | 128 | 25 | 1.80 |
As the result above shows, tied values are not collapsed by default: each duplicate row counts toward n.
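To make the default behavior concrete, here is a minimal sketch comparing nsmallest with a sort-then-slice approach. The frame is a trimmed copy of the sample data, and the variable names `smallest` and `sorted_head` are illustrative. A stable sort is assumed so that tied rows keep their original order, which mirrors keep="first".

```python
import pandas as pd

# Trimmed copy of the sample data (only the columns needed here).
df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
})

# Direct call: the 4 rows with the smallest scores.
smallest = df.nsmallest(4, "score")

# Sort-then-slice; kind="stable" keeps tied rows in their original order.
sorted_head = df.sort_values("score", ascending=True, kind="stable").head(4)

print(list(smallest.index))
print(list(sorted_head.index))
```

Both approaches select the same rows here; nsmallest simply avoids spelling out the sort and the slice.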
The keep parameter
# Same result as above; keep defaults to "first"
df.nsmallest(4, "score", keep="first")
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
| 4 | xiaoming | 100 | 25 | 1.90 |
| 1 | zhoujuan | 128 | 25 | 1.80 |
df.nsmallest(4, "score", keep="last")
|   | name | score | age | height |
|---|---|---|---|---|
| 4 | xiaoming | 100 | 25 | 1.90 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
| 0 | xiaosun | 100 | 21 | 1.75 |
| 1 | zhoujuan | 128 | 25 | 1.80 |
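As the result above shows, the order of the tied rows changes: they are now listed starting from the largest index, 4.
How should keep="all" be understood?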
df.nsmallest(2, "score")
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
With keep="all", every row tied with the selected values is kept, so the result can contain more than n rows:
df.nsmallest(2, "score", keep="all")
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
| 4 | xiaoming | 100 | 25 | 1.90 |
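The three keep options can be compared side by side. The sketch below is only an illustration: it uses a one-column frame holding the sample scores, and the names `scores`, `results`, and `row_counts` are made up for this example.

```python
import pandas as pd

# Only the score column of the sample data is needed for this comparison.
scores = pd.DataFrame({"score": [100, 128, 100, 150, 100, 145]})

# Run the same query with each keep option and collect the results.
results = {
    keep: scores.nsmallest(2, "score", keep=keep)
    for keep in ("first", "last", "all")
}

# Number of rows returned per option; "all" may exceed n when ties exist.
row_counts = {keep: len(result) for keep, result in results.items()}

for keep, result in results.items():
    print(keep, row_counts[keep], list(result.index))
```

With n=2 and three rows tied at 100, only keep="all" returns every tied row.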
Ordering by multiple columns
df.nsmallest(4,["age","height"])
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 3 | wangfeng | 150 | 21 | 1.80 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
| 5 | zhangjun | 145 | 25 | 1.71 |
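When a list of columns is passed, later columns act as tie-breakers for earlier ones. A minimal sketch of this, assuming the sample data above, compares the call with an explicit multi-column sort. The names `by_nsmallest` and `by_sort` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# Order by age first; height breaks ties between rows of equal age.
by_nsmallest = df.nsmallest(4, ["age", "height"])

# The same selection written as a multi-column sort followed by head().
by_sort = df.sort_values(["age", "height"]).head(4)

print(list(by_nsmallest.index))
print(list(by_sort.index))
```

Among the three rows with age 25, zhangjun is picked for the fourth slot because that row has the smallest height.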
nlargest
This function returns the n rows with the largest values in the specified column(s), in descending order.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nlargest.html#pandas.DataFrame.nlargest
DataFrame.nlargest(
n,
columns,
keep='first' # {'first', 'last', 'all'}, default 'first'
)
df.nlargest(3,"score")
|   | name | score | age | height |
|---|---|---|---|---|
| 3 | wangfeng | 150 | 21 | 1.80 |
| 5 | zhangjun | 145 | 25 | 1.71 |
| 1 | zhoujuan | 128 | 25 | 1.80 |
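As with nsmallest, the result above can be reproduced by sorting and slicing. The following sketch assumes the sample scores and uses illustrative variable names; the three largest scores are distinct, so tie handling does not come into play.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
})

# Direct call: the 3 rows with the largest scores, in descending order.
largest = df.nlargest(3, "score")

# Sort in descending order, then keep the first 3 rows.
sorted_head = df.sort_values("score", ascending=False).head(3)

print(list(largest.index))
print(list(sorted_head.index))
```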
df.nlargest(3,"age")
|   | name | score | age | height |
|---|---|---|---|---|
| 1 | zhoujuan | 128 | 25 | 1.80 |
| 4 | xiaoming | 100 | 25 | 1.90 |
| 5 | zhangjun | 145 | 25 | 1.71 |
df.nlargest(2,"age",keep="first")
|   | name | score | age | height |
|---|---|---|---|---|
| 1 | zhoujuan | 128 | 25 | 1.8 |
| 4 | xiaoming | 100 | 25 | 1.9 |
df.nlargest(2,"age",keep="last")
|   | name | score | age | height |
|---|---|---|---|---|
| 5 | zhangjun | 145 | 25 | 1.71 |
| 4 | xiaoming | 100 | 25 | 1.90 |
df.nlargest(2,"age",keep="all")
|   | name | score | age | height |
|---|---|---|---|---|
| 1 | zhoujuan | 128 | 25 | 1.80 |
| 4 | xiaoming | 100 | 25 | 1.90 |
| 5 | zhangjun | 145 | 25 | 1.71 |
nlargest + drop_duplicates
Requirement: find the rows with the two largest age values; when several rows share the same age, keep only one of them.
df
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 1 | zhoujuan | 128 | 25 | 1.80 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
| 3 | wangfeng | 150 | 21 | 1.80 |
| 4 | xiaoming | 100 | 25 | 1.90 |
| 5 | zhangjun | 145 | 25 | 1.71 |
df["age"].value_counts()
25 3
21 2
23 1
Name: age, dtype: int64
The largest age is 25, shared by 3 rows. Deduplicate on age first:
df1 = df.drop_duplicates(subset=["age"], keep="first")
df1
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 1 | zhoujuan | 128 | 25 | 1.80 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
df1.nlargest(2,"age")
|   | name | score | age | height |
|---|---|---|---|---|
| 1 | zhoujuan | 128 | 25 | 1.80 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
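The two steps above can also be wrapped into one small function. This is only a sketch: `top_n_distinct` is a hypothetical helper name, not a pandas API, and it assumes that keeping the first row of each duplicated value is acceptable.

```python
import pandas as pd


def top_n_distinct(frame: pd.DataFrame, column: str, n: int) -> pd.DataFrame:
    """Hypothetical helper: rows for the n largest distinct values of `column`."""
    # Keep only the first row for each value of the column.
    deduped = frame.drop_duplicates(subset=[column], keep="first")
    # Then take the n largest remaining values.
    return deduped.nlargest(n, column)


df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

top2 = top_n_distinct(df, "age", 2)
print(top2)
```

Deduplicating before calling nlargest matters here: calling nlargest(2, "age") first would return two rows that both have age 25.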