
Optimizing Python code using numpy and pandas

Yeison H. Arias • 5 years ago • 1360 views

I have the following code:

import numpy as np
import pandas as pd
colum1 = [0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05]
colum2 = [1,2,3,4,5,6,7,8,9,10,11,12]
colum3 = [0.85,0.80,0.80,0.80,0.85,0.0,0.0,0.0,0.0,0.0,0.0,0.0]
colum4 = [1743.85, 1485.58, 1250.07, 1021.83, 818.96, 628.05, 455.40, 319.03, 190.86 , 97.07, 26.96 , 0.00]
df = pd.DataFrame({
    'colum1' : colum1,
    'colum2' : colum2,
    'colum3' : colum3,
    'colum4' : colum4,
})

df['result'] = 0
for i in range(len(colum2)):
    df['result'] = np.where(
        df['colum2'] <= 5,
        np.where(
            df['colum2'] == 1,
            df['colum4'],
            np.where(
                ( df['colum4'] - (df['result'].shift(1) * (df['colum1'] * df['colum3'])) )>0,
                ( df['colum4'] - (df['result'].shift(1) * (df['colum1'] * df['colum3'])) ),
                0
            )
        ),
        np.where(
            ( df['colum4'] - (df['result'].shift(1) * df['colum1']) )>0,
            ( df['colum4'] - (df['result'].shift(1) * df['colum1']) ),
            0
        )
    )

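I need to perform the same operation without using the for loop. That would be very helpful, because I am working with thousands of records and the loop is very slow.

My expected result is the following: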

    colum1  colum2  colum3   colum4       result
0     0.05       1    0.85  1743.85  1743.850000
1     0.05       2    0.80  1485.58  1415.826000
2     0.05       3    0.80  1250.07  1193.436960
3     0.05       4    0.80  1021.83   974.092522
4     0.05       5    0.85   818.96   777.561068
5     0.05       6    0.00   628.05   589.171947
6     0.05       7    0.00   455.40   425.941403
7     0.05       8    0.00   319.03   297.732930
8     0.05       9    0.00   190.86   175.973354
9     0.05      10    0.00    97.07    88.271332
10    0.05      11    0.00    26.96    22.546433
11    0.05      12    0.00     0.00     0.000000
Article URL: http://www.python88.com/topic/40162
 
1360 次点击  
文章 [ 1 ]  |  最新文章 5 年前
Answer 1 • jpp • 6 years ago

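The first step is to remove the loop over the index and use np.maximum. This works because np.where(a > 0, a, 0) is, for our purposes, equivalent to np.maximum(0, a).

At the same time, define the longer expressions separately to keep the code readable: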
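
A minimal sketch of that equivalence on a small made-up array (the values and variable names are illustrative and not taken from the question). It also shows the one case where the two forms differ, NaN input, which is presumably why the equivalence is qualified with "for our purposes":

```python
import numpy as np

# Illustrative values only; not the question's data.
a = np.array([-2.5, 0.0, 3.0, 7.25])

clipped_where = np.where(a > 0, a, 0)   # keep positives, replace the rest with 0
clipped_max = np.maximum(0, a)          # element-wise maximum against 0

same = np.array_equal(clipped_where, clipped_max)
print(same)

# The two forms differ only on NaN: NaN > 0 is False, so np.where yields 0,
# whereas np.maximum propagates the NaN.
with_nan = np.array([np.nan, 1.0])
print(np.where(with_nan > 0, with_nan, 0))
print(np.maximum(0, with_nan))
```

In the sample data the only NaN comes from shift(1) in the first row, and that row takes its value from colum4 because colum2 == 1 there, so the difference should not affect the result.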

s1 = df['colum4'] - (df['result'].shift(1) * (df['colum1'] * df['colum3']))
s2 = df['colum4'] - (df['result'].shift(1) * df['colum1'])

df['result'] = np.where(df['colum2'] <= 5,
                        np.where(df['colum2'] == 1, df['colum4'],
                                 np.maximum(0, s1)),
                        np.maximum(0, s2))

The next step is to use np.select to remove the nested np.where calls:

m1 = df['colum2'] <= 5
m2 = df['colum2'] == 1

conds = [m1 & m2, m1 & ~m2]
choices = [df['colum4'], np.maximum(0, s1)]

df['result'] = np.select(conds, choices, np.maximum(0, s2))

This version will be easier to manage.
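
One thing to keep in mind: each result value depends on the result in the previous row (through shift(1)), so the later rows only settle on the expected values once the earlier rows are already final. As a hedged alternative, not part of the answer above, the sketch below computes the same recurrence in a single pass over plain NumPy arrays. It still uses a Python loop, but it visits each row once instead of recomputing the whole column on every iteration. The function name running_result is hypothetical, and the data is the sample from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'colum1': [0.05] * 12,
    'colum2': list(range(1, 13)),
    'colum3': [0.85, 0.80, 0.80, 0.80, 0.85] + [0.0] * 7,
    'colum4': [1743.85, 1485.58, 1250.07, 1021.83, 818.96, 628.05,
               455.40, 319.03, 190.86, 97.07, 26.96, 0.00],
})

def running_result(frame):
    """Hypothetical helper: one-pass version of the question's recurrence."""
    c1 = frame['colum1'].to_numpy(dtype=float)
    c2 = frame['colum2'].to_numpy()
    c3 = frame['colum3'].to_numpy(dtype=float)
    c4 = frame['colum4'].to_numpy(dtype=float)

    # Factor applied to the previous result: colum1 * colum3 while
    # colum2 <= 5, plain colum1 afterwards.
    rate = np.where(c2 <= 5, c1 * c3, c1)

    out = np.zeros(len(frame))
    prev = np.nan  # mirrors the NaN that shift(1) produces for the first row
    for i in range(len(frame)):
        if c2[i] == 1:
            out[i] = c4[i]
        else:
            value = c4[i] - prev * rate[i]
            # Same rule as np.where(value > 0, value, 0); NaN falls to 0.
            out[i] = value if value > 0 else 0.0
        prev = out[i]
    return out

df['result'] = running_result(df)
print(df)
```

On the sample data this should reproduce the result column shown in the question. If the loop is still too slow for very large inputs, the same function body is a candidate for a JIT compiler, though that has not been tried here.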