
Selenium Python - this is not a duplicate question... actually unable to locate the element correctly [duplicate]

asp • 6 years ago • 2114 views

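I'm trying to build a basic web scraper for Amazon search results. As I iterate over the results, I sometimes get as far as page 5 (sometimes only page 2), and then a StaleElementReferenceException is thrown. When I look at the browser after the exception is thrown, I can see that the driver/page has not scrolled down to where the page numbers are (the bottom bar).

My code: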

driver.get('https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush')

for page in range(1,last_page_number +1):

    driver.implicitly_wait(10)

    bottom_bar = driver.find_element_by_class_name('pagnCur')
    driver.execute_script("arguments[0].scrollIntoView(true);", bottom_bar)

    current_page_number = int(driver.find_element_by_class_name('pagnCur').text)

    if page == current_page_number:
        next_page = driver.find_element_by_xpath('//div[@id="pagn"]/span[@class="pagnLink"]/a[text()="{0}"]'.format(current_page_number+1))
        next_page.click()
        print('page #',page,': going to next page')
    else:
        print('page #: ', page,'error')

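I have looked at this question, and I guess a similar fix could be applied, but I'm not sure how to find the content on the page that disappears. Also, judging by how quickly the print statements appear, I can see that implicitly_wait(10) does not actually wait the full 10 seconds.

The exception points to the line that starts with "driver.execute_script". This is the exception: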

StaleElementReferenceException: Message: The element reference of <span class="pagnCur"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

Sometimes I get a ValueError:

ValueError: invalid literal for int() with base 10: ''

So these errors/exceptions lead me to believe that something goes wrong while waiting for the page to refresh completely.
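
On the ValueError specifically: the traceback shows that `int()` received an empty string, which is consistent with the page-number element being read before its text was rendered. A minimal sketch of a guard is below. The helper name `parse_page_number` is hypothetical and not part of Selenium; it only turns "no digits yet" into `None` so the caller can wait and read the element again.

```python
import re
from typing import Optional


def parse_page_number(text: str) -> Optional[int]:
    """Return the page number held in `text`, or None if it has no digits yet.

    Hypothetical helper for illustration; it is not part of Selenium.
    """
    match = re.fullmatch(r"\s*(\d+)\s*", text or "")
    return int(match.group(1)) if match else None


# An empty string (what the failing int() call received) yields None
# instead of raising ValueError.
print(parse_page_number(""))     # None
print(parse_page_number(" 5 "))  # 5
```

This only hides the symptom; the element still has to be looked up again once the new page has loaded.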

Source: http://www.python88.com/topic/40890
2 replies  |  latest reply 6 years ago
DebanjanB • Reply #1 • 7 years ago

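You were almost there.

Keeping your concept of scrolling through scrollIntoView() and printing some helpful debug messages, I have made some minor adjustments, including the use of WebDriverWait. You can use the following solution:

  • Code block: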

    from selenium import webdriver
    from selenium.common.exceptions import WebDriverException
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    options.add_argument("--disable-extensions")
    # Selenium 3 style; Selenium 4 takes the driver path through a Service object instead of executable_path.
    driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get("https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush")
    current_page_number = None  # stays None if the first page never loads
    while True:
        try:
            # Wait for the current-page marker to be visible, then scroll to it.
            current_page_number_element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.pagnCur")))
            driver.execute_script("arguments[0].scrollIntoView(true);", current_page_number_element)
            current_page_number = current_page_number_element.get_attribute("innerHTML")
            # Wait until the "next" arrow is clickable before clicking it.
            WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span.pagnNextArrow"))).click()
            print("page # {} : going to next page".format(current_page_number))
        except WebDriverException:
            # Includes the TimeoutException raised when no "next" arrow becomes clickable.
            print("page # {} : error, no more pages".format(current_page_number))
            break
    driver.quit()
    
  • Console output:

    page # 1 : going to next page
    page # 2 : going to next page
    page # 3 : going to next page
    page # 4 : going to next page
    page # 5 : going to next page
    page # 6 : going to next page
    page # 7 : going to next page
    page # 8 : going to next page
    page # 9 : going to next page
    page # 10 : going to next page
    page # 11 : going to next page
    page # 12 : going to next page
    page # 13 : going to next page
    page # 14 : going to next page
    page # 15 : going to next page
    page # 16 : going to next page
    page # 17 : going to next page
    page # 18 : going to next page
    page # 19 : going to next page
    page # 20 : error, no more pages
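
A different technique that is often paired with explicit waits, and is not the one used in the code block above, is to look the element up again and retry when a stale reference is hit. The sketch below shows only the retry logic and runs without a browser. `StaleElementError` is a placeholder standing in for `selenium.common.exceptions.StaleElementReferenceException`, and `retry_on_stale` and `read_current_page` are hypothetical names introduced for illustration.

```python
import time


class StaleElementError(Exception):
    """Placeholder for selenium.common.exceptions.StaleElementReferenceException."""


def retry_on_stale(action, attempts=3, delay=0.0, stale_exc=StaleElementError):
    """Call `action()` and retry when it raises `stale_exc`.

    `action` should look the element up again on every call, so that a
    retry works with a fresh reference rather than the stale one.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except stale_exc:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(delay)


# Simulated lookup: the first two reads hit a stale reference, the third works.
calls = {"n": 0}


def read_current_page():
    calls["n"] += 1
    if calls["n"] < 3:
        raise StaleElementError("element is stale")
    return "4"


print(retry_on_stale(read_current_page))  # 4
```

With a real driver, one would presumably pass `stale_exc=StaleElementReferenceException` and an `action` such as `lambda: driver.find_element(By.CSS_SELECTOR, "span.pagnCur").text`, so that each attempt performs a new lookup.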
    
Andersson • Reply #2 • 7 years ago

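If you just want the script to iterate over all result pages, you don't need any complicated logic: just click the "Next" button for as long as that is possible: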

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()

driver.get('https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush')

while True:
    try:
        wait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'a > span#pagnNextString'))).click()
    except TimeoutException:
        break

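Also note that implicitly_wait(10) is not supposed to wait the full 10 seconds, but to wait up to 10 seconds for the element to appear in the HTML DOM. So if the element is found within 1 or 2 seconds, the wait is over and you will not wait the remaining 8-9 seconds...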
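
The "wait up to N seconds" behaviour can be illustrated with a small polling loop. This is a sketch of the general idea, not Selenium's actual implementation. `wait_until` and `element_present` are hypothetical names, and the lookup is simulated so the example runs without a browser.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Poll `condition()` until it returns a truthy value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # found: return immediately, no leftover waiting
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll)


# Simulated lookup that succeeds on the third poll.
attempts = {"n": 0}


def element_present():
    attempts["n"] += 1
    return "element" if attempts["n"] >= 3 else None


start = time.monotonic()
found = wait_until(element_present, timeout=10.0, poll=0.1)
elapsed = time.monotonic() - start
print(found, attempts["n"])  # element 3
print(elapsed < 10.0)        # True
```

The timeout is an upper bound only: the call returns on the first successful poll, which is why the print statements in the question appear well before 10 seconds have passed.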