
How do I get Chrome WebDriver and Selenium in Python to recognize the category links on a page?

JamesLC • 5 years ago • 1444 views

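Using Python 3, I'm trying to get Chrome WebDriver and Selenium to recognize the various "Classifieds" categories on the www.jtinsight.com web page and print out their titles. So far, with the code below, all I can get it to print are the first two: "All Categories" and "Cars (Private)". I've determined that the HTML of these two differs from the others, and I've tried many different lines of code (listed as comments in the code), but I can't find the right tag/class/XPath etc. Any help would be greatly appreciated.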

from selenium import webdriver
from selenium.webdriver.common.by import By

# Creating the WebDriver object using the ChromeDriver
driver = webdriver.Chrome()

# Directing the driver to the defined url
driver.get("https://www.jtinsight.com/JTIRA/JTIRA.aspx#!/main")

# Locate the categories

# Each code line runs but only returns the first two categories
# categories = driver.find_elements_by_xpath('//div[@class="col-md-3 col-sm-4 col-xs-6"]')
# categories = driver.find_elements_by_xpath('//div[@class="mainCatEntry"]')
# categories = driver.find_elements_by_xpath('//div[@class="Description"]')

# Process ran but finished with exit code 0
# categories = driver.find_elements_by_xpath('//*[@class="col-md-3 col-sm-4 col-xs-6 ng-scope"]')
# categories = driver.find_elements_by_xpath('//div[@class="col-md-3 col-sm-4 col-xs-6 ng-scope"]')
# categories = driver.find_elements_by_partial_link_text('//href[@class="divLink"]')
# categories = driver.find_elements_by_tag_name('bindonce')
# categories = driver.find_elements_by_xpath('//div[@class="divLink"]')

# Error before finished running
# categories = driver.find_elements(By.CLASS_NAME, "col-md-3 col-sm-4 col-xs-6 ng-scope")
# categories = driver.find_elements(By.XPATH, '//div bindonce[@class="col-md-3 col-sm-4 col-xs-6 ng-scope"]')
# categories = driver.find_elements_by_class_name('//div bindonce[@class="col-md-3 col-sm-4 col-xs-6 ng-scope"]')

# Print out all categories on the current page
print(len(categories))
for category in categories:
    print(category.text)

# Clean up (quit the browser and end the WebDriver session once the task is completed)
driver.quit()
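On the attempts marked "Error before finished running": By.CLASS_NAME is meant to take a single class name, so a space-separated list such as "col-md-3 col-sm-4 col-xs-6 ng-scope" is not treated as "all of these classes" (older Selenium/ChromeDriver combinations rejected it as a compound class name), and '//div bindonce[...]' is not valid XPath, because bindonce is an attribute of the div and would be matched with '//div[@bindonce]'. The helpers below are a hypothetical sketch, not part of Selenium: they turn a class attribute copied from DevTools into a CSS selector, or into an XPath that matches each class as a whole token in any order. They assume the class names need no CSS escaping, which holds for Bootstrap-style names like these.

```python
# Hypothetical helpers (not part of Selenium): build locators that match an
# element carrying *all* of the given classes, regardless of their order.


def css_for_classes(class_attr: str, tag: str = "") -> str:
    """Return a CSS selector such as 'div.col-md-3.col-sm-4'.

    Assumes the class names contain no characters that need CSS escaping.
    """
    classes = class_attr.split()
    if not classes:
        raise ValueError("class_attr must contain at least one class name")
    return tag + "".join("." + name for name in classes)


def xpath_for_classes(class_attr: str, tag: str = "*") -> str:
    """Return an XPath that tests each class as a whole token.

    The concat/normalize-space idiom pads @class with spaces so that
    'col-md-3' does not accidentally match 'col-md-30'.
    """
    classes = class_attr.split()
    if not classes:
        raise ValueError("class_attr must contain at least one class name")
    tests = " and ".join(
        f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"
        for name in classes
    )
    return f"//{tag}[{tests}]"
```

With a driver open, a call such as driver.find_elements(By.CSS_SELECTOR, css_for_classes("col-md-3 col-sm-4 col-xs-6", "div")) would replace the exact-string @class comparison. This only addresses locator syntax, not the page-loading timing discussed in the answers.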
Article URL: http://www.python88.com/topic/51231
DebanjanB • Answer #1 • 6 years ago

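To identify the categories in the Classifieds section of https://www.jtinsight.com/JTIRA/JTIRA.aspx#!/main and print their titles, e.g. All Categories, Cars (Private) and so on, you need to scroll down a little and induce WebDriverWait for visibility_of_all_elements_located(). You can use the following solution: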

  • Code block:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.ChromeOptions()
    options.add_argument("--start-maximized")
    options.add_argument("--disable-infobars")
    # Newer Selenium 4 releases no longer accept chrome_options/executable_path; pass the driver path via a Service
    driver = webdriver.Chrome(service=Service(r'C:\Utility\BrowserDrivers\chromedriver.exe'), options=options)
    driver.get("https://www.jtinsight.com/JTIRA/JTIRA.aspx#!/main")
    # Wait for the "Classifieds" heading to be visible, then scroll it into view
    heading = WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='ng-scope' and text()='Classifieds']")))
    driver.execute_script("arguments[0].scrollIntoView(true);", heading)
    # Wait for the category descriptions to be visible and print each title
    descriptions = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='mainCatEntry']//div[@class='Description']")))
    print([elem.get_attribute("innerHTML") for elem in descriptions])
    driver.quit()
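Why can a wait still return only "All Categories" and "Cars (Private)"? The sketch below is our own simplified restatement of the check that visibility_of_all_elements_located performs, not Selenium's actual source: it succeeds as soon as the locator matches at least one element and every element matched at that moment is displayed. Cards the page has not rendered yet are simply not in the list. FakeElement, ProgressiveFakeDriver and poll_until are hypothetical offline stand-ins for a WebElement, a WebDriver and WebDriverWait.until.

```python
from typing import Callable, List, Sequence, Tuple

Locator = Tuple[str, str]


def all_located_visible(locator: Locator) -> Callable:
    """Simplified stand-in for EC.visibility_of_all_elements_located."""
    def _predicate(driver):
        elements = driver.find_elements(*locator)
        if not elements:
            # Nothing matched yet: report "not done" so the wait keeps polling.
            return False
        if not all(element.is_displayed() for element in elements):
            # Something matched but is hidden: keep polling.
            return False
        # Every element matched *right now* is visible, so the wait ends here,
        # even if the page would render more matches later.
        return elements
    return _predicate


def poll_until(condition: Callable, driver, max_polls: int = 10):
    """Rough stand-in for WebDriverWait.until (no sleeping between polls)."""
    for _ in range(max_polls):
        result = condition(driver)
        if result:
            return result
    raise TimeoutError("condition never became truthy")


class FakeElement:
    """Hypothetical element that is always displayed."""

    def __init__(self, text: str):
        self.text = text

    def is_displayed(self) -> bool:
        return True


class ProgressiveFakeDriver:
    """Hypothetical driver: each find_elements call sees the next render batch."""

    def __init__(self, batches: Sequence[List[str]]):
        self._batches = list(batches)
        self._calls = 0

    def find_elements(self, by: str, value: str) -> List[FakeElement]:
        index = min(self._calls, len(self._batches) - 1)
        self._calls += 1
        return [FakeElement(text) for text in self._batches[index]]
```

If the fake page renders nothing, then the first two titles, then a third placeholder item, poll_until returns only the first two: the condition is already satisfied before the third item appears.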
    
Breaks Software • Answer #2 • 6 years ago

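It really is a timing issue. If I add a sleep(5) before collecting the categories, it finds all 24. Interestingly, when I use WebDriverWait it still only pulls 2 items, so to force the driver to do more work I extended the XPath. The following works for me: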

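from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
categories = WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located((By.XPATH, '//div[@class="mainCatEntry"]/div[@class="Description"]')))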