社区所有版块导航
Python
python开源   Django   Python   DjangoApp   pycharm  
DATA
docker   Elasticsearch  
aigc
aigc   chatgpt  
WEB开发
linux   MongoDB   Redis   DATABASE   NGINX   其他Web框架   web工具   zookeeper   tornado   NoSql   Bootstrap   js   peewee   Git   bottle   IE   MQ   Jquery  
机器学习
机器学习算法  
Python88.com
反馈   公告   社区推广  
产品
短视频  
印度
印度  
Py学习  »  Python

如何在python中使用Selenium在Myntra上单击“显示更多”

Dhruv Darda • 3 年前 • 1175 次点击  

我想在Myntra网站上提取规格和“完整外观”,只有点击“显示更多”才能看到。我为此编写了以下代码:

url = 'https://www.myntra.com/kurtas/jompers/jompers-men-yellow-printed-straight-kurta/11226756/buy'

df = pd.DataFrame(columns=['name','title','price','description','Size & fit','Material & care', 'Complete the look'])
metadata = dict.fromkeys(['name','title','price','description','Size & fit','Material & care', 'Complete the look'])
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome('chromedriver')
specs = dict()
for i in range(1): #len(links)
    driver.get(url)
    try:
        metadata['title'] = driver.find_element_by_class_name('pdp-title').get_attribute("innerHTML")
        metadata['name'] = driver.find_element_by_class_name('pdp-name').get_attribute("innerHTML")
        metadata['price'] = driver.find_element_by_class_name('pdp-price').find_element_by_xpath('./strong').get_attribute("innerHTML")
        metadata['description'] = driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[8]/div/div[1]/p').text
        #metadata['Specifications'] = driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[7]/div/div[4]/div[1]/div[1]/div[1]').text
        if driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[7]/div/div[4]/div[2]'):
            print('yes')
            element = driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[7]/div/div[4]/div[2]')
            element.click()
        for i in range(1,20):
            try:
                specs[driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[7]/div/div[4]/div[1]/div[{}]/div[1]'.format(i)).text] = driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[7]/div/div[4]/div[1]/div[{}]/div[2]'.format(i)).text
            except:
                break
        metadata['Complete the look'] = driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[8]/div/div[4]/div[2]/div/p/p').text
    except NoSuchElementException:  
        pass
    df = df.append(metadata, ignore_index=True)

我在输出中得到了一个“是”,我想这表明单击了“显示更多”选项,但在我的数据框的“完成外观”列中得到了一个“无”。如何获取隐藏在“show more”中的详细信息,它有以下标签:

    <div class="index-sizeFitDesc">
<h4 class="index-sizeFitDescTitle index-product-description-title" style="padding-bottom: 12px;">Specifications</h4>
<div class="index-tableContainer">
<div class="index-row">
<div class="index-rowKey">Sleeve Length</div>
<div class="index-rowValue">Long Sleeves</div>
</div><div class="index-row">
<div class="index-rowKey">Shape</div>
<div class="index-rowValue">Straight</div>
</div><div class="index-row">
<div class="index-rowKey">Neck</div>
<div class="index-rowValue">Mandarin Collar</div>
</div><div class="index-row">
<div class="index-rowKey">Print or Pattern Type</div>
<div class="index-rowValue">Geometric</div>
</div><div class="index-row">
<div class="index-rowKey">Design Styling</div>
<div class="index-rowValue">Regular</div></div>
<div class="index-row">
<div class="index-rowKey">Slit Detail</div>
<div class="index-rowValue">Side Slits</div>
</div><div class="index-row">
<div class="index-rowKey">Length</div>
<div class="index-rowValue">Above Knee</div>
</div><div class="index-row">
<div class="index-rowKey">Hemline</div>
<div class="index-rowValue">Curved</div></div></div>
<div class="index-showMoreText">See More</div></div>
Python社区是高质量的Python/Django开发社区
本文地址:http://www.python88.com/topic/129827
 
1175 次点击  
文章 [ 2 ]  |  最新文章 3 年前
pmadhu
Reply   •   1 楼
pmadhu    3 年前

这个 Specifications 在里面 Product Details 是多个部分的组合。

最好在这些部分中逐一提取细节。

最好试着找到 relative xpaths 对于元素。

url = 'https://www.myntra.com/kurtas/jompers/jompers-men-yellow-printed-straight-kurta/11226756/buy'

# df = pd.DataFrame(columns=['name','title','price','description','Size & fit','Material & care', 'Complete the look'])
metadata = dict.fromkeys(['name','title','price','description','Size & fit','Material & care','Specifications', 'Complete the look'])
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome('chromedriver')
specs = dict()
specfication = []
for i in range(1): #len(links)
    driver.get(url)
    try:
        metadata['title'] = driver.find_element_by_class_name('pdp-title').get_attribute("innerHTML")
        metadata['name'] = driver.find_element_by_class_name('pdp-name').get_attribute("innerHTML")
        metadata['price'] = driver.find_element_by_class_name('pdp-price').find_element_by_xpath('./strong').get_attribute("innerHTML")

        # Details were extracted even without scrolling, but it would be better to scroll down.
        driver.execute_script("arguments[0].scrollIntoView(true);",driver.find_element_by_xpath("//div[@class='pdp-productDescriptorsContainer']"))
        metadata['description'] = driver.find_element_by_xpath("//p[@class='pdp-product-description-content']").text
        #metadata['Specifications'] = driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[7]/div/div[4]/div[1]/div[1]/div[1]').text
        if driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[7]/div/div[4]/div[2]'):
            print('yes')
            element = driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[7]/div/div[4]/div[2]')
            element.click()
        metadata['Size & fit'] = driver.find_element_by_xpath("//h4[contains(text(),'Size')]/following-sibling::p").text
        metadata['Material & care']=driver.find_element_by_xpath("//h4[contains(text(),'Material')]/following-sibling::p").text

        # from Sleeve Length to Hemline
        specn1 = driver.find_elements_by_xpath("//div[@class='index-sizeFitDesc']/div[1]/div")
        for spec in specn1:
            key = spec.find_element_by_xpath("./div[@class='index-rowKey']").text
            value = spec.find_element_by_xpath("./div[@class='index-rowValue']").text
            specfication.append([key,value])

        #from Colour Family to Occasion
        specn2 = driver.find_elements_by_xpath("//div[@class='index-sizeFitDesc']/div[2]/div[1]/div")
        for spec in specn2:
            key = spec.find_element_by_xpath("./div[@class='index-rowKey']").text
            value = spec.find_element_by_xpath("./div[@class='index-rowValue']").text
            specfication.append([key, value])

        metadata['Specifications'] = specfication
        metadata['Complete the look'] = driver.find_element_by_xpath("//h4[contains(text(),'Complete')]/following-sibling::p").text
        # metadata['Complete the look'] = driver.find_element_by_xpath('//*[@id="mountRoot"]/div/div/div/main/div[2]/div[2]/div[8]/div/div[4]/div[2]/div/p/p').text
    except Exception as e:
        print(e)
        pass
for key,value in metadata.items():
    print(f"{key} : {value}")
    # df = df.append(metadata, ignore_index=True)
yes
name : Men Yellow Printed Straight Kurta
title : Jompers
price : Rs. 892
description : Yellow printed straight kurta, has a mandarin collar, long sleeves, straight hem, and side slits
Size & fit : The model (height 6') is wearing a size M
Material & care : Material: Cotton
Hand Wash
Specifications : [['Sleeve Length', 'Long Sleeves'], ['Shape', 'Straight'], ['Neck', 'Mandarin Collar'], ['Print or Pattern Type', 'Solid'], ['Design Styling', 'Regular'], ['Slit Detail', 'Side Slits'], ['Length', 'Knee Length'], ['Hemline', 'Straight'], ['Colour Family', 'Bright'], ['Weave Pattern', 'Regular'], ['Weave Type', 'Machine Weave'], ['Occasion', 'Daily']]
Complete the look : Sport this classic kurta from Jompers this season. Achieve a comfortably chic look for your next dinner party or family outing when you team this yellow piece with slim trousers and minimal flair.
cruisepandey
Reply   •   2 楼
cruisepandey    3 年前

我没有通读你写的所有代码,但为了点击show more,我尝试了下面的代码,可能你可以用现有代码注入下面的代码。

  1. 我们必须 scroll to that particular element Selenium 知道元素的确切位置。

  2. 我用过JS .click() 点击 展示更多

示例代码:

driver = webdriver.Chrome(driver_path)
driver.maximize_window()
#driver.implicitly_wait(50)
wait = WebDriverWait(driver, 20)
driver.get("https://www.myntra.com/kurtas/jompers/jompers-men-yellow-printed-straight-kurta/11226756/buy")
ele = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.index-showMoreText")))
driver.execute_script("arguments[0].scrollIntoView(true);", ele)
ActionChains(driver).move_to_element(ele).perform()
driver.execute_script("arguments[0].click();", ele)
Complete_The_Look = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p.index-product-description-content"))).text
print(Complete_The_Look)

进口:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains

输出:

Sport this classic kurta from Jompers this season. Achieve a comfortably chic look for your next dinner party or family outing when you team this yellow piece with slim trousers and minimal flair.