
perl

Recent replies by perl

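As suggested in the question, we first generate the data and find the coordinates of the non-zero cells.

Then we use cKDTree to find all pairs of neighbors within a distance of 1 with query_pairs.

Next we build a networkx graph from these edges with from_edgelist and run connected_components on it.

The last step is visualization.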
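Here is a minimal sketch of the same three steps on a tiny hand-made grid, where the expected components are known in advance. The grid values are made up for illustration. Because the points are integer grid coordinates, a radius of 1 links only face-adjacent cells, since diagonal neighbors are √2 or √3 apart. Unlike nx.from_edgelist, adding every node explicitly keeps isolated cells as single-node components.

```python
import networkx as nx
import numpy as np
from scipy.spatial import cKDTree

# Hand-made 3x3x3 grid (illustrative values only)
data = np.zeros((3, 3, 3), dtype=int)
data[0, 0, 0] = data[0, 0, 1] = data[0, 1, 1] = 1  # three face-connected cells
data[2, 2, 1] = data[2, 2, 2] = 1                  # two face-connected cells
data[2, 0, 0] = 1                                  # a cell with no face neighbors

# Step 1: coordinates of the non-zero cells, one row per cell
cs = np.argwhere(data > 0)

# Step 2: index pairs (i, j), i < j, whose Euclidean distance is at most 1
pairs = cKDTree(cs).query_pairs(1)

# Step 3: graph + connected components; adding every node first keeps
# isolated cells as single-node components (from_edgelist would drop them)
G = nx.Graph()
G.add_nodes_from(range(len(cs)))
G.add_edges_from(pairs)
components = [sorted(tuple(int(v) for v in cs[i]) for i in comp)
              for comp in nx.connected_components(G)]

for comp in sorted(components, key=len):
    print(len(comp), comp)
```

This prints one component each of sizes 1, 2 and 3, matching the cells set by hand.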

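import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from scipy.spatial import cKDTree  # scipy.spatial.ckdtree is a deprecated module path
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers the '3d' projection on older matplotlib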

# create data
data = np.random.binomial(1, 0.1, 1000)
data = data.reshape((10,10,10))

# find coordinates
cs = np.argwhere(data > 0)

# build k-d tree
kdt = cKDTree(cs)
edges = kdt.query_pairs(1)

# create graph
G = nx.from_edgelist(edges)

# find connected components
ccs = nx.connected_components(G)
node_component = {v:k for k,vs in enumerate(ccs) for v in vs}

# visualize
df = pd.DataFrame(cs, columns=['x','y','z'])
df['c'] = pd.Series(node_component)

# to include single-node connected components
# df.loc[df['c'].isna(), 'c'] = df.loc[df['c'].isna(), 'c'].isna().cumsum() + df['c'].max()

fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(111, projection='3d')
cmhot = plt.get_cmap("hot")
ax.scatter(df['x'], df['y'], df['z'], c=df['c'], s=50, cmap=cmhot)
plt.show()

Output: a 3D scatter plot of the nodes, colored by connected component (the image is not preserved).

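  • I lowered the probability of generating a node from 0.4 to 0.1 to make the visualization more "readable".
  • I did not show connected components that consist of a single node. They can be included by uncommenting the line below # to include single-node connected components
  • The dataframe df contains the coordinates x, y, z and the connected component ID c for each node: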
print(df)

Output:

     x  y  z     c
0    0  0  3  20.0
1    0  1  8  21.0
2    0  2  1   6.0
3    0  2  3  22.0
4    0  3  0  23.0
...
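  • Based on the dataframe df, we can also check, for example, the sizes of the largest connected components (along with their component numbers):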
df['c'].value_counts().nlargest(5)

Output:

4.0    5
1.0    4
7.0    3
8.0    3
5.0    2
Name: c, dtype: int64
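As a cross-check, here is a sketch using a different technique from the answer's graph approach. scipy.ndimage.label labels connected regions of a binary array directly. Its default 3D structuring element connects face neighbors only, which is the same connectivity as query_pairs(1) on integer coordinates. The sketch seeds the random generator, which the original code does not. It also adds all nodes to the graph so that both methods count single-node components.

```python
import networkx as nx
import numpy as np
from scipy import ndimage
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)  # seeded for reproducibility (the original is unseeded)
data = rng.binomial(1, 0.1, size=(10, 10, 10))

# Graph approach from the answer, with every node added explicitly so that
# single-node components are counted as well
cs = np.argwhere(data > 0)
G = nx.Graph()
G.add_nodes_from(range(len(cs)))
G.add_edges_from(cKDTree(cs).query_pairs(1))
graph_sizes = sorted(len(c) for c in nx.connected_components(G))

# ndimage.label with its default structure (face connectivity in 3D);
# labels are 1..n_labels, and 0 marks the background
labels, n_labels = ndimage.label(data)
label_sizes = sorted(np.bincount(labels.ravel())[1:].tolist())  # drop label 0

print(n_labels, graph_sizes == label_sizes)
```

Both methods should report the same multiset of component sizes.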
6 years ago
Replied to the topic created by perl » Merging subgroup values using python pandas

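For example, the following code splits all lines into two lists, z[0] and z[1], depending on whether they contain "Accounting Order". It then runs read_fwf on the non-accounting-order lines in z[1] and attaches the accounting order numbers from z[0], back-filled with bfill so that each row gets the order number that follows it: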

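import io

import numpy as np
import pandas as pd

with open('input.txt') as f:
    s = f.read()

# z[0]: the order number on "Accounting Order" lines, NaN on all other lines
# z[1]: the raw text of all other lines, '' on "Accounting Order" lines
z = list(zip(*[(x.split('Accounting Order')[1].strip(), '') if 'Accounting Order' in x
               else (np.nan, x)
               for x in s.splitlines()]))

df = pd.concat([
    # each detail row takes the next order number below it
    pd.DataFrame(z[0], columns=['Accounting Order']).bfill(),
    # keep blank lines so the parsed rows stay aligned with z[0]
    pd.read_fwf(io.StringIO('\n'.join(z[1])), header=None, skip_blank_lines=False)],
    axis=1).dropna()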

print(df)

Output:

   Accounting Order     0           1           2            3
0            190291   1.0  2019-03-01      Travel  1500 DCA CR
1            190291   4.0  2019-03-01   Allowance   300 ATC DR
2            190291   5.0  2019-03-02  Local Trip   100 TCO CR
5            195297  22.0  2019-02-01     Charges  2500 DCA CR
6            195297  98.0  2019-02-08   Allowance   900 ATC DR
7            195297  36.0  2019-01-30  Local Trip    50 TCO CR
8            195297  74.0  2019-02-09  Court fees   300 ATC DR
11           180876  33.0  2019-03-01      Travel  1500 DCA CR
12           180876  97.0  2019-03-01   Allowance   300 ATC DR
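The same split can also be sketched inside a single pandas Series. This keeps the order numbers aligned with their rows without building two parallel lists. It is an alternative to the code above, not the author's method. The input text is made up, because the real input.txt is not shown. The sketch therefore only assumes, as the bfill above implies, that each group of detail lines is followed by an "Accounting Order <number>" line. The detail lines are kept as raw text, and read_fwf could still be applied to them afterwards.

```python
import pandas as pd

# Hypothetical input: each group of detail lines is followed by its order line
text = """\
1  2019-03-01  Travel      1500 DCA CR
4  2019-03-01  Allowance    300 ATC DR
Accounting Order 190291

22 2019-02-01  Charges     2500 DCA CR
Accounting Order 195297
"""

lines = pd.Series(text.splitlines())

# Order number on "Accounting Order" lines, NaN on every other line
order = lines.str.extract(r'Accounting Order\s*(\S+)', expand=False)

# Back-fill so each detail line takes the order number that follows it
df = pd.DataFrame({'Accounting Order': order.bfill(), 'line': lines})

# Keep only detail lines: not an order line and not blank
df = df[order.isna() & lines.str.strip().ne('')].reset_index(drop=True)
print(df)
```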