
chitown88

Topics recently created by chitown88
Recent replies by chitown88
3 years ago
Replied to a topic created by chitown88 » Scraping some data from a football website with Selenium and Python

Are you sure you need Selenium? You can pull those tables easily enough with pandas and requests.

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.soccerstats.com/matches.asp?matchday=1#'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
links = soup.find_all('a', text='stats')

filtered_links = []
for link in links:
    if 'pmatch' in link['href']:
        filtered_links.append(link['href'])

tables = {}
for count, link in enumerate(filtered_links, start=1):
    try:
        html = requests.get('https://www.soccerstats.com/' + link).text
        soup = BeautifulSoup(html, 'html.parser')
        
        goalsTable = soup.find('h2', text='Goal statistics')
        
        teams = goalsTable.find_next('table')
        teamsStr = teams.find_all('td')[0].text + ' ' + teams.find_all('td')[-1].text
        
        goalsTable = teams.find_next('table')
        df = pd.read_html(str(goalsTable))[0]
        
        print(f'{count} of {len(filtered_links)}: {teamsStr}')
        tables[teamsStr] = df
        
    except Exception:
        # The 'Goal statistics' section wasn't found for this match
        print(f'{count} of {len(filtered_links)}: {link} !! NO GOALS STATISTICS !!')

Output: (screenshot of the collected tables in the original post)
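
If you then want every match in one frame, a minimal follow-up sketch; it assumes the per-match goal tables share the same column layout, which the site doesn't guarantee:

# Stack the per-match frames; the dict keys ('Home Away' strings) become an extra index level
combined = pd.concat(tables)
print(combined.head())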

3 years ago
Replied to a topic created by chitown88 » Python: Sum based on a group and show it as an additional column

Do a .groupby() on channel and sum the units. Then simply divide units by units_per_channel:

import pandas as pd


df = pd.DataFrame([['Offline',    'Bournemouth',    62],
['Offline' ,    'Kettering'  ,    90],
['Offline' ,    'Manchester' ,    145],
['Online'  ,    'Bournemouth',    220],
['Online'  ,    'Kettering',      212],
['Online'  ,    'Manchester',     272]],
                  columns=['channel','store','units'],)


df['units_per_channel'] = df.groupby('channel')['units'].transform('sum')
df['store_share'] = df['units'] / df['units_per_channel']

Output:

print(df)
   channel        store  units  units_per_channel  store_share
0  Offline  Bournemouth     62                297     0.208754
1  Offline    Kettering     90                297     0.303030
2  Offline   Manchester    145                297     0.488215
3   Online  Bournemouth    220                704     0.312500
4   Online    Kettering    212                704     0.301136
5   Online   Manchester    272                704     0.386364
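
Equivalently, if you don't need to keep the intermediate column, the share can be computed in one step with the same transform:

# Same result without keeping units_per_channel around
df['store_share'] = df['units'] / df.groupby('channel')['units'].transform('sum')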
6 years ago
Replied to a topic created by chitown88 » Getting data in Python from a web page with multiple tables

The site has an API endpoint that returns the data to you in a nice JSON format. You can get the response as JSON and then normalize it to create the table. It actually returns two tables, so I'm not sure whether you need the second one. If not, just keep them separate; here I append them together.

import requests    
from pandas.io.json import json_normalize

url = 'https://api.bseindia.com/BseIndiaAPI/api/MktHighLowData/w?Grpcode=&HLflag=H&indexcode=&scripcode='

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'}

payload = {
'Grpcode':'', 
'HLflag': 'H',
'indexcode':'' ,
'scripcode':'' }

jsonObj = requests.get(url, headers=headers, params=payload).json()

df_table = json_normalize(jsonObj['Table'])
df_table1 = json_normalize(jsonObj['Table1'])

df = df_table.append(df_table1)

Output:

print (df)
     ALLTimeHigh         ...                         dt_tm
0        1019.95         ...           2019-02-25T16:00:03
1         263.00         ...           2019-02-25T16:00:03
2          24.00         ...           2019-02-25T16:00:03
3          35.90         ...           2019-02-25T16:00:03
4          29.75         ...           2019-02-25T16:00:03
5          43.00         ...           2019-02-25T16:00:03
6         140.40         ...           2019-02-25T16:00:03
7          15.39         ...           2019-02-25T16:00:03
8         724.00         ...           2019-02-25T16:00:03
9        1495.00         ...           2019-02-25T16:00:03
10        123.15         ...           2019-02-25T16:00:03
11        121.00         ...           2019-02-25T16:00:03
12        238.50         ...           2019-02-25T16:00:03
13         89.00         ...           2019-02-25T16:00:03
14        819.95         ...           2019-02-25T16:00:03
15        112.40         ...           2019-02-25T16:00:03
16         49.95         ...           2019-02-25T16:00:03
17        330.85         ...           2019-02-25T16:00:03
18        167.45         ...           2019-02-25T16:00:03
19         25.10         ...           2019-02-25T16:00:03
20        940.00         ...           2019-02-25T16:00:03
21        165.00         ...           2019-02-25T16:00:03
22           NaN         ...           2019-02-25T16:00:03
23        239.00         ...           2019-02-25T16:00:03
24        151.55         ...           2019-02-25T16:00:03
25         34.35         ...           2019-02-25T16:00:03
26        256.15         ...           2019-02-25T16:00:03
27         49.75         ...           2019-02-25T16:00:03
28        103.25         ...           2019-02-25T16:00:03
29         50.50         ...           2019-02-25T16:00:03
..           ...         ...                           ...
87        135.00         ...           2019-02-25T16:00:03
88        219.80         ...           2019-02-25T16:00:03
89         58.00         ...           2019-02-25T16:00:03
90        494.00         ...           2019-02-25T16:00:03
91        285.30         ...           2019-02-25T16:00:03
92         55.65         ...           2019-02-25T16:00:03
93          4.45         ...           2019-02-25T16:00:03
94         50.00         ...           2019-02-25T16:00:03
95         50.00         ...           2019-02-25T16:00:03
96         92.50         ...           2019-02-25T16:00:03
97        154.80         ...           2019-02-25T16:00:03
98         82.40         ...           2019-02-25T16:00:03
99        293.85         ...           2019-02-25T16:00:03
100       396.00         ...           2019-02-25T16:00:03
101        98.00         ...           2019-02-25T16:00:03
102       144.60         ...           2019-02-25T16:00:03
103        11.50         ...           2019-02-25T16:00:03
104        42.95         ...           2019-02-25T16:00:03
105       313.00         ...           2019-02-25T16:00:03
106      1120.00         ...           2019-02-25T16:00:03
107        87.00         ...           2019-02-25T16:00:03
108        82.00         ...           2019-02-25T16:00:03
109       214.00         ...           2019-02-25T16:00:03
110       505.00         ...           2019-02-25T16:00:03
111      1525.00         ...           2019-02-25T16:00:03
112       220.00         ...           2019-02-25T16:00:03
113        36.00         ...           2019-02-25T16:00:03
114       170.00         ...           2019-02-25T16:00:03
115       549.50         ...           2019-02-25T16:00:03
116      4990.00         ...           2019-02-25T16:00:03

[168 rows x 19 columns]
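
A note for newer pandas versions: json_normalize now lives at the top level of pandas, and DataFrame.append has since been deprecated (and removed in pandas 2.0), so combining the same two tables would look roughly like this (reusing jsonObj from the request above):

import pandas as pd

df_table = pd.json_normalize(jsonObj['Table'])
df_table1 = pd.json_normalize(jsonObj['Table1'])

# Concatenate instead of append, resetting the row index
df = pd.concat([df_table, df_table1], ignore_index=True)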
6 years ago
Replied to a topic created by chitown88 » Reading JSON in Python

A couple of things:

1 - The emotionsAll key sits inside the objects key, in the first element of that list [0], under the attributes key.

2 - Your JSON file was written with a bytes prefix, so when you read it back it starts with b'. You can either a) decode/encode so the file gets written without that prefix, or just manipulate the string:

import json

# Read the raw bytes and work on their string representation
data = repr(open('file.json', 'rb').read())

# Keep only what sits between the first '{' and the last '}' (drops the b'...' wrapper)
data = data.split('{', 1)[-1]
data = data.rsplit('}', 1)[0]

# Rebuild a clean JSON string and parse it
data = ''.join(['{', data, '}'])
jsonObj = json.loads(data)

print(jsonObj['objects'][0]['attributes']['emotionsAll']['neutral']) 

Output:

print(jsonObj['objects'][0]['attributes']['emotionsAll']['neutral']) 
0.0
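
For option (a) above, the cleaner fix is to decode the bytes before writing, so the file holds plain JSON and none of the string surgery is needed. A minimal sketch, assuming the writing side has the raw response bytes in a variable called raw_bytes (hypothetical name):

import json

# Decode first, so the file contains plain JSON text rather than a bytes repr
with open('file.json', 'w') as f:
    f.write(raw_bytes.decode('utf-8'))

# Reading it back then becomes a straightforward json.load
with open('file.json') as f:
    jsonObj = json.load(f)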
6 years ago
Replied to a topic created by chitown88 » How to structure elements with multiple properties in Python

In my opinion, pandas would be a good way to do this. But you can of course use a dictionary:

elements = ['A', 'B', 'C', 'D']
colors = ['red','red', 'blue', 'red']
shapes = ['square', 'circle', 'circle', 'triangle']


dict1 = { element: {'color':colors[index], 'shape':shapes[index]} for index,element in enumerate(elements)}


def find_keys(keyword):
    result = []
    for key, val in dict1.items():
        for k, v in val.items():
            if v == keyword:
                result.append(key)
    return result

print (find_keys('red'))

Output:

 print (find_keys('red'))
['A', 'B', 'D']

print (find_keys('circle')) 
['B', 'C']
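
For reference, a rough sketch of the pandas route mentioned at the top (reusing the elements/colors/shapes lists from above), where the reverse lookup becomes a simple boolean filter:

import pandas as pd

df = pd.DataFrame({'element': elements, 'color': colors, 'shape': shapes})

def find_elements(keyword):
    # Rows whose color or shape matches the keyword
    mask = (df['color'] == keyword) | (df['shape'] == keyword)
    return df.loc[mask, 'element'].tolist()

print(find_elements('red'))      # ['A', 'B', 'D']
print(find_elements('circle'))   # ['B', 'C']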
6 years ago
Replied to a topic created by chitown88 » Extracting attributes from an HTML tag with Python Selenium [duplicate]

Use .attrs:

import bs4

html = '''<input  
class="text header_login_text_box ignore_interaction" 
type="text" 
name="email" tabindex="1"
data-group="js-editable"
placeholder="Email"
w2cid="wZgD2YHa18" 
id="__w2_wZgD2YHa18_email">'''

soup = bs4.BeautifulSoup(html, 'html.parser')


for tag in soup:
    attr_dict = (tag.attrs)

Output: print(attr_dict)

{'class': ['text', 'header_login_text_box', 'ignore_interaction'], 
'type': 'text', 
'name': 'email', 
'tabindex': '1', 
'data-group': 'js-editable', 
'placeholder': 'Email', 
'w2cid': 'wZgD2YHa18', 
'id': '__w2_wZgD2YHa18_email'}

This will open the browser and then click the dropdown menu. You can then click the option you want to continue:

from selenium import webdriver 

driver = webdriver.Chrome()
url = 'http://www.mpcci.com/members_list.php'
driver.get(url) 

driver.find_element_by_xpath('//*[@id="select_gp_id"]').click()
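
On newer Selenium (4.x) the find_element_by_* helpers were deprecated and later removed, so the same click would look roughly like this:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('http://www.mpcci.com/members_list.php')

# Same dropdown click, written with the By locator API
driver.find_element(By.XPATH, '//*[@id="select_gp_id"]').click()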
6 years ago
Replied to a topic created by chitown88 » Need help with scraping live streaming data using Python

Your code is incomplete. Specifically: 1) you never actually use BeautifulSoup to do anything, and 2) your function doesn't return anything, which is why it prints "None".

import pandas as pd
import bs4 
from requests_html import HTMLSession 
import time

def get_count():

    url = 'http://10.0.0.206/apps/cy8ckit_062_demo/main.html'

    session = HTMLSession()
    r = session.get(url)
    r.html.render(sleep=5,timeout=8)

    soup = bs4.BeautifulSoup(r.html.html, 'html.parser')  # parse the rendered HTML, not the raw response

    data = soup.findAll('div', {'id':'currentData'})[0]
    temp_data = data.findAll('p')
    current_time = temp_data[0].text
    current_date = temp_data[1].text
    current_usage = temp_data[2].text

    print ('%s\n%s\n%s' %(current_time, current_date, current_usage))



while True:
    get_count()
    time.sleep(8)
5 years ago
Replied to a topic created by chitown88 » Python list only prints a single row into the CSV

The way it is, it won't write row by row, because you're basically building the lists column by column. There's a way to do it with zip, I believe, but you can also write each row into a dataframe and then write the dataframe to a file with pandas:

import requests
from bs4 import BeautifulSoup
import pandas as pd

def write_output(data):
    data.to_csv('data.csv', index=False)    



def fetch_data():
    df = pd.DataFrame()
    base_url = 'http://leevers.com/'
    r = requests.get(base_url)
    soup = BeautifulSoup(r.text, 'lxml')

    locations = soup.find_all('div',{'class':'border'})

    for stores in locations:
        store = stores.find_all('p')

        name = store[0].text
        address = store[1].text
        city, state_zip = store[2].text.split(',')
        state, zip_code = state_zip.strip().split(' ')
        phone = store[3].text

        temp_df = pd.DataFrame([[base_url,name,address,city,state, zip_code,'US','<MISSING>',
                                phone,'<MISSING>','<MISSING>','<MISSING>','<MISSING>']],
                                columns=["locator_domain", "location_name", "street_address", "city", "state", "zip", "country_code",
                                         "store_number", "phone", "location_type", "latitude", "longitude", "hours_of_operation"])

        df = df.append(temp_df).reset_index(drop=True)
    return df

data = fetch_data()
write_output(data)

Output:

print (df.to_string())
         locator_domain          location_name        street_address           city state    zip country_code store_number             phone location_type   latitude  longitude hours_of_operation
0   http://leevers.com/  Colorado Ranch Market   11505 E. Colfax Ave         Aurora    CO  80010           US    <MISSING>  PH: 720-343-2195     <MISSING>  <MISSING>  <MISSING>          <MISSING>
1   http://leevers.com/             Save-A-Lot    4255 W Florida Ave         Denver    CO  80219           US    <MISSING>  PH: 303-935-0880     <MISSING>  <MISSING>  <MISSING>          <MISSING>
2   http://leevers.com/             Save-A-Lot      15220 E. 6th Ave         Aurora    CO  80011           US    <MISSING>  PH: 720-343-2011     <MISSING>  <MISSING>  <MISSING>          <MISSING>
3   http://leevers.com/             Save-A-Lot      3045 W. 74th Ave    Westminster    CO  80030           US    <MISSING>  PH: 303-339-2610     <MISSING>  <MISSING>  <MISSING>          <MISSING>
4   http://leevers.com/             Save-A-Lot    1110 Bonforte Blvd         Pueblo    CO  81001           US    <MISSING>  PH: 719-544-6057     <MISSING>  <MISSING>  <MISSING>          <MISSING>
5   http://leevers.com/             Save-A-Lot          698 Peria St         Aurora    CO  80011           US    <MISSING>  PH: 303-365-0393     <MISSING>  <MISSING>  <MISSING>          <MISSING>
6   http://leevers.com/             Save-A-Lot         4860 Pecos St         Denver    CO  80221           US    <MISSING>  PH: 720-235-3900     <MISSING>  <MISSING>  <MISSING>          <MISSING>
7   http://leevers.com/             Save-A-Lot      2630 W. 38th Ave         Denver    CO  80211           US    <MISSING>  PH: 303-433-4405     <MISSING>  <MISSING>  <MISSING>          <MISSING>
8   http://leevers.com/             Save-A-Lot       405 S Circle Dr  Colo. Springs    CO  80910           US    <MISSING>  PH: 719-520-5620     <MISSING>  <MISSING>  <MISSING>          <MISSING>
9   http://leevers.com/             Save-A-Lot       1750 N. Main St       Longmont    CO  80501           US    <MISSING>  PH: 720-864-8060     <MISSING>  <MISSING>  <MISSING>          <MISSING>
10  http://leevers.com/             Save-A-Lot       630 W. 84th Ave       Thornton    CO  80260           US    <MISSING>  PH: 303-468-6290     <MISSING>  <MISSING>  <MISSING>          <MISSING>
11  http://leevers.com/             Save-A-Lot  1951 S. Federal Blvd         Denver    CO  80219           US    <MISSING>  PH: 303-407-0430     <MISSING>  <MISSING>  <MISSING>          <MISSING>
12  http://leevers.com/             Save-A-Lot        7290 Manaco St  Commerce City    CO  80022           US    <MISSING>  PH: 303-288-1747     <MISSING>  <MISSING>  <MISSING>          <MISSING>
13  http://leevers.com/             Save-A-Lot    6601 W. Colfax Ave       Lakewood    CO  80214           US    <MISSING>  PH: 303-468-6290     <MISSING>  <MISSING>  <MISSING>          <MISSING>
14  http://leevers.com/             Save-A-Lot           816 25th St        Greeley    CO  80631           US    <MISSING>  PH: 970-356-7498     <MISSING>  <MISSING>  <MISSING>          <MISSING>