Python: counting the columns that exactly match a list of index labels

SG Kwon • 3 years ago • 1613 views

I have a DataFrame of 0s and 1s:

a   1 1 1 1 0 0 0 1 0 0 0 0 0
b   1 1 1 1 0 0 0 1 1 0 0 0 0
c   1 1 1 1 0 0 0 1 1 1 1 0 0
d   1 1 1 1 0 0 0 1 1 1 1 0 0
e   1 1 1 1 0 0 0 0 0 0 0 1 1
f   1 1 1 1 1 1 1 0 0 0 0 0 0

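I want to write a function that takes a list of strings (row names) and returns the number of columns that exactly match that list, i.e. columns that contain 1 in the listed rows and 0 in every other row.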

For example:

def exact_match(ls1):
  ~~~~~
  return col_num

print(exact_match(['c', 'd']))
>>> 2

The output is 2 because only two columns match exactly.

Original post: http://www.python88.com/topic/133033

Replies [ 2 ] | Latest reply 3 years ago

• Reply 1

MoRe 3 years ago

If I understand you correctly, your DataFrame looks like this:

import pandas as pd

# The first element of each row is the row name; the rest are the 0/1 values
df = pd.DataFrame(data = [
    ["a", 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    ["b", 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    ["c", 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    ["d", 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    ["e", 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    ["f", 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
])
df = df.rename(columns = {0:"name"}).set_index("name")

Then:

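def exact_match(lst):
    # Columns in which every row named in lst is 1
    # (the sums only work as counts because all values are 0 or 1)
    cols = df.columns[df.loc[lst].sum(axis = 0) == len(lst)]
    # Of those, keep columns whose total over ALL rows is also len(lst),
    # which means no row outside lst contains a 1
    s = df[cols].sum(axis = 0) == len(lst)
    return len(s[s])
exact_match(["c","d"]) # output: 2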
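
To make the two filtering steps above easier to follow, here is a minimal self-contained sketch that returns the intermediate column labels instead of just the count. The helper name `exact_match_steps`, the `frame` parameter, and the explicit 1..13 column labels are assumptions for illustration only; the logic is the same sum-based idea and relies on the data containing only 0s and 1s.

```python
import pandas as pd

data = {
    "a": [1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    "b": [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    "c": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    "d": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    "e": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    "f": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
}
# Row names become the index; columns are labelled 1..13 (an assumption)
df = pd.DataFrame(list(data.values()), index=list(data.keys()), columns=range(1, 14))


def exact_match_steps(frame, labels):
    """Return (candidate columns, exactly matching columns) as lists of labels."""
    n = len(labels)
    # Step 1: columns in which every selected row is 1
    selected_sum = frame.loc[labels].sum(axis=0)
    candidates = selected_sum.index[selected_sum == n]
    # Step 2: of those, columns whose total over all rows is also n,
    # so no row outside `labels` contributes a 1
    totals = frame[candidates].sum(axis=0)
    exact = totals.index[totals == n]
    return candidates.tolist(), exact.tolist()


candidates, exact = exact_match_steps(df, ["c", "d"])
print(candidates)   # [1, 2, 3, 4, 8, 9, 10, 11]
print(exact)        # [10, 11]
print(len(exact))   # 2
```

Passing the DataFrame as an argument rather than reading a global `df` is a design choice of this sketch, not something the answer requires.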

• Reply 2

mozway 3 years ago

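The question is not entirely clear, but if you want the number of columns that have 1s in all of the provided index rows and no 1s in any other row, you can use: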

def exact_match(ls1):
    # 1s on the provided indices
    m1 = df.loc[ls1].eq(1).all()
    # no 1s in the other rows
    m2 = df.drop(ls1).ne(1).all()
    # slice and get shape
    return df.loc[:, m1&m2].shape[1]
    # or
    # return (m1&m2).sum()

print(exact_match(['c', 'd']))
# 2
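
The same condition can also be phrased as "the column equals an indicator vector of the requested rows". The sketch below is an alternative formulation, not the method from either answer; the function name `exact_match_columns`, the `frame` parameter, and the 1..13 column labels are hypothetical. It returns the matching column labels, so the count is simply the length of the result.

```python
import pandas as pd

data = {
    "a": [1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    "b": [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    "c": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    "d": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    "e": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    "f": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
}
# Row names become the index; columns are labelled 1..13 (an assumption)
df = pd.DataFrame(list(data.values()), index=list(data.keys()), columns=range(1, 14))


def exact_match_columns(frame, labels):
    """Labels of the columns that are 1 in exactly the given rows and 0 elsewhere."""
    # Indicator vector: 1 for the requested rows, 0 for every other row
    target = pd.Series(frame.index.isin(labels).astype(int), index=frame.index)
    # Compare each column with the indicator row by row; keep full matches
    matches = frame.eq(target, axis=0).all(axis=0)
    return matches[matches].index.tolist()


print(exact_match_columns(df, ["c", "d"]))       # [10, 11]
print(len(exact_match_columns(df, ["c", "d"])))  # 2
print(exact_match_columns(df, ["f"]))            # [5, 6, 7]
```

One difference worth noting: `isin` silently ignores labels that are not in the index, whereas `df.loc[ls1]` and `df.drop(ls1)` raise a `KeyError` for unknown labels, so the `.loc`-based version is stricter about input validation.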
