
Why does reading HTML from Python not work?

TropicalMagic • 3 years ago • 1487 views

I want to use the pandas read_html() function in Python to scrape data from a Yahoo Finance table, the one outlined in red in the screenshot.

[Screenshot: the Yahoo Finance table, outlined in red]

However, I get an HTTPError: HTTP Error 404: Not Found

Here is my code:

!pip install pandas
!pip install requests
!pip install bs4
!pip install requests_html
!pip install pytest-astropy
!pip install nest_asyncio
!pip install plotly

import pandas as pd
from bs4 import BeautifulSoup
import requests
import requests_html
import nest_asyncio
import lxml
import html5lib
nest_asyncio.apply()

url_link = "https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27"
read_html_pandas_data = pd.read_html(url_link)
 
QHarr • Reply 1 • 4 years ago

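The site requires a User-Agent header, so calling read_html on the URL directly fails. Fetch the page with requests first, passing a suitable header, then hand the table over to pandas: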

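from io import StringIO

import requests
from bs4 import BeautifulSoup as bs
from pandas import read_html as rh

# Send a browser-like User-Agent; without it the request fails with 404.
r = requests.get('https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27', headers={'User-Agent': 'Mozilla/5.0'})
soup = bs(r.content, 'lxml')
# Select the historical-prices element and let pandas parse only that markup.
# StringIO is used because newer pandas deprecates passing literal HTML strings.
table = rh(StringIO(str(soup.select_one('[data-test="historical-prices"]'))))[0]
print(table)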
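
To see why the header matters, here is a small standard-library sketch. It assumes that read_html, when given a URL, fetches the page with Python's default urllib User-Agent rather than a browser-like one. The local server `PickyHandler` and the helper `status_for` are hypothetical stand-ins written for this illustration; they only mimic the observed behavior (404 without a browser-like header) and say nothing about how Yahoo actually decides.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PickyHandler(BaseHTTPRequestHandler):
    """Hypothetical server: answers 404 unless the User-Agent looks like a browser."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if agent.startswith("Mozilla/"):
            body = b"<table><tr><td>ok</td></tr></table>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Ignore any proxy settings so the request goes straight to the local server.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def status_for(url, headers=None):
    """Return the HTTP status code of a GET request sent with optional headers."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


# Start the stand-in server on a free local port.
server = HTTPServer(("127.0.0.1", 0), PickyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

default_status = status_for(url)  # urllib sends "Python-urllib/3.x"
browser_status = status_for(url, {"User-Agent": "Mozilla/5.0"})

server.shutdown()
server.server_close()

print(default_status, browser_status)  # prints "404 200"
```

The same URL returns a different status depending only on the header, which matches the fix above: send the request yourself with a browser-like User-Agent, then pass the HTML to pandas.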
F.Hoque • Reply 2 • 4 years ago

Try the following:

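from io import StringIO

import pandas as pd
import requests

url_link = 'https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27'
# Send a full browser User-Agent string with the request.
r = requests.get(url_link, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'})
# Wrap the HTML in StringIO; newer pandas deprecates passing literal HTML strings.
read_html_pandas_data = pd.read_html(StringIO(r.text))
print(read_html_pandas_data)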
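
Both answers follow the same two steps, so they can be folded into a small helper. This is only a sketch: the names `BROWSER_HEADERS`, `parse_tables`, and `fetch_tables` are made up for illustration, and it assumes pandas, requests, and an HTML parser such as lxml are installed. It also checks the status code before parsing, so a 404 surfaces as an exception instead of an error page being handed to pandas.

```python
from io import StringIO

import pandas as pd
import requests

# Assumption: a generic browser-like User-Agent is enough for the target site.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0"}


def parse_tables(html: str) -> list[pd.DataFrame]:
    """Parse every <table> found in an HTML string into a DataFrame."""
    # StringIO is used because newer pandas deprecates literal HTML strings.
    return pd.read_html(StringIO(html))


def fetch_tables(url: str, timeout: float = 10.0) -> list[pd.DataFrame]:
    """Download a page with a browser-like header, then parse its tables."""
    response = requests.get(url, headers=BROWSER_HEADERS, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return parse_tables(response.text)
```

Calling `fetch_tables(url_link)` with the URL from the question would then return a list of DataFrames, one per table on the page. Note that `pd.read_html` raises `ValueError` when the HTML contains no tables.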