
F.Hoque

Recent replies by F.Hoque
3 years ago
Replied to the topic created by F.Hoque » How to get a string with XPath in Python bs4?

In XPath, just use text() to select an element's text nodes:

from bs4 import BeautifulSoup
from lxml import etree

html_doc = """
<html>
<head>
</head>
<body>
   <div class="container">
      <section id="page">
         <div class="content">   
            <div class="box">  
               <ul>
                  <li>Name: Peter</li>
                  <li>Age: 21</li>
                  <li>Status: Active</li>
               </ul> 
            </div>
         </div>
      </section>
   </div>
</body>
</html>
"""

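# Parse with BeautifulSoup, then re-serialize so lxml can build an XPath-capable tree
soup = BeautifulSoup(html_doc, 'lxml')
dom = etree.HTML(str(soup))
# text() returns the text nodes of the third <li> as a list
print(dom.xpath('/html/body/div/section/div[1]/div[1]/ul/li[3]/text()'))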

Output:

 ['Status: Active']
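Since BeautifulSoup is only used above to re-serialize the markup, a minimal sketch that parses the string directly with `lxml.etree.HTML` should give the same result. The `html_doc` below is a stand-in assumed to have the same structure as the document above.

```python
from lxml import etree

# Stand-in for the html_doc string above (assumption: same element structure)
html_doc = """
<html><body>
  <div class="container">
    <section id="page">
      <div class="content">
        <div class="box">
          <ul>
            <li>Name: Peter</li>
            <li>Age: 21</li>
            <li>Status: Active</li>
          </ul>
        </div>
      </div>
    </section>
  </div>
</body></html>
"""

# etree.HTML parses the markup with lxml's HTML parser; no BeautifulSoup round-trip
dom = etree.HTML(html_doc)

# text() selects the text nodes of the third <li>; xpath() returns them as a list
result = dom.xpath('/html/body/div/section/div[1]/div[1]/ul/li[3]/text()')
print(result)  # ['Status: Active']
```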

# or

for li in dom.xpath('/html/body/div/section/div[1]/div[1]/ul/li[3]/text()'):
    # split() breaks "Status: Active" on whitespace; index 1 is the value
    txt = li.split()[1]
    print(txt)

Output:

Active
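`split()[1]` keeps only the second whitespace-separated word, so a value containing spaces (e.g. a two-word name) would be truncated. A sketch that instead splits each item once on the colon with `str.partition` and collects every field into a dict; the markup is a trimmed copy of the list above.

```python
from lxml import etree

# Trimmed copy of the list markup above
html_doc = """
<div class="box">
  <ul>
    <li>Name: Peter</li>
    <li>Age: 21</li>
    <li>Status: Active</li>
  </ul>
</div>
"""

dom = etree.HTML(html_doc)

fields = {}
for text in dom.xpath('//*[@class="box"]/ul/li/text()'):
    # partition splits on the first colon only, so values may contain spaces
    key, _, value = text.partition(':')
    fields[key.strip()] = value.strip()

print(fields)            # {'Name': 'Peter', 'Age': '21', 'Status': 'Active'}
print(fields['Status'])  # Active
```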

# or

print(' '.join(dom.xpath('/html/body/div/section/div[1]/div[1]/ul/li[3]/text()')))

Output:

Status: Active
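When a single string is wanted rather than a list, XPath's `normalize-space()` function can replace `text()`: the expression then evaluates to a string, so lxml's `xpath()` returns a string instead of a list, with leading and trailing whitespace stripped and internal runs collapsed. A minimal sketch on a trimmed copy of the markup (the extra spaces in the third item are added only to show the normalization):

```python
from lxml import etree

# Trimmed copy of the markup; extra whitespace added to illustrate normalize-space()
html_doc = '<div class="box"><ul><li>Name: Peter</li><li>Age: 21</li><li>  Status:   Active </li></ul></div>'

dom = etree.HTML(html_doc)

# normalize-space() yields an XPath string, so xpath() returns a str, not a list
status = dom.xpath('normalize-space(//*[@class="box"]/ul/li[3])')
print(status)  # Status: Active
```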

# or

print(''.join(dom.xpath('//*[@class="box"]/ul/li[3]/text()')))

Output:

Status: Active
4 years ago
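BeautifulSoup itself has no XPath support; its native way to address the same element is a CSS selector via `select_one()`, which relies on the soupsieve package that recent bs4 releases install as a dependency. A sketch, assuming bs4 4.7+ and a trimmed copy of the markup, using the built-in `html.parser` so lxml is not required:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup above
html_doc = """
<div class="box">
  <ul>
    <li>Name: Peter</li>
    <li>Age: 21</li>
    <li>Status: Active</li>
  </ul>
</div>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# li:nth-of-type(3) matches the third <li> inside the .box list
li = soup.select_one('div.box ul li:nth-of-type(3)')
status = li.get_text(strip=True)
print(status)  # Status: Active

# Split once on the colon to keep only the value
value = status.split(':', 1)[1].strip()
print(value)  # Active
```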
Replied to the topic created by F.Hoque » Why is reading HTML in Python not working?

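Try the following: fetch the page with requests, sending a browser User-Agent header, then pass the HTML text to pd.read_html: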

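from io import StringIO

import pandas as pd
import requests

url_link = 'https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27'
# Send a browser-like User-Agent; requests with the default requests User-Agent may be blocked
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}
r = requests.get(url_link, headers=headers)
# Wrap the HTML text in StringIO; newer pandas versions deprecate passing literal HTML strings
read_html_pandas_data = pd.read_html(StringIO(r.text))
print(read_html_pandas_data)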
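`pd.read_html` returns a list of DataFrames, one per `<table>` it finds, so the result usually needs indexing. A sketch on a small hard-coded table (the column names and numbers are invented placeholders, not Yahoo data) showing how to pick one out. The `StringIO` wrapper is used because recent pandas versions deprecate passing literal HTML strings, and `read_html` needs lxml, or BeautifulSoup plus html5lib, to be installed.

```python
from io import StringIO

import pandas as pd

# Placeholder table; values are invented for illustration only
html = """
<table>
  <thead><tr><th>Date</th><th>Close</th></tr></thead>
  <tbody>
    <tr><td>2021-01-04</td><td>100.5</td></tr>
    <tr><td>2021-01-05</td><td>101.25</td></tr>
  </tbody>
</table>
"""

# One DataFrame per <table> element in the input
tables = pd.read_html(StringIO(html))
print(len(tables))  # 1

df = tables[0]
print(df.columns.tolist())   # ['Date', 'Close']
print(df['Close'].tolist())  # [100.5, 101.25]
```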