社区所有版块导航
Python
python开源   Django   Python   DjangoApp   pycharm  
DATA
docker   Elasticsearch  
aigc
aigc   chatgpt  
WEB开发
linux   MongoDB   Redis   DATABASE   NGINX   其他Web框架   web工具   zookeeper   tornado   NoSql   Bootstrap   js   peewee   Git   bottle   IE   MQ   Jquery  
机器学习
机器学习算法  
Python88.com
反馈   公告   社区推广  
产品
短视频  
印度
印度  
Py学习  »  Elasticsearch

Elasticsearch应用之京东搜索|七日打卡

全栈自学社区 • 3 年前 • 212 次点击  
阅读 155

Elasticsearch应用之京东搜索|七日打卡

Sometimes all we need is just a new perspective. 有时候我们只是需要新的角度

Elasticsearch应用之京东搜索

开发环境

  • spring boot 2.4.2
  • elasticsearch 7.10.1
  • lombok
  • 解析网页 jsoup 1.10.2
  • alibaba fastjson 1.2.73
  • jdk 1.8
  • 集成IDE idea
  • elasticsearch-head

所有开发环境 全栈自学社区 公众号回复 电脑环境 关键字即可获取.

项目概况

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>cn.com.codingce</groupId>
        <artifactId>codingce-es</artifactId>
        <version>0.0.1-SNAPSHOT</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>cn.com.codingce</groupId>
    <artifactId>codingce-es-jd</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>codingce-es-jd</name>
    <description>Demo project for Spring Boot</description>

    <properties>
        <java.version>1.8</java.version>
        <!-- 指定和本地版本一致 -->
        <elasticsearch.version>7.10.1</elasticsearch.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        


    
</dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-thymeleaf</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-configuration-processor</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>

        <!-- 解析网页 -->
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.10.2</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.73</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>
复制代码

实现代码

ElasticSearchConfig

这里仅仅是单个 Elasticsearch

/**
 * ElasticSearch配置类
 * 找到对象, 放到Spring里面就可以用了
 *
 * @author mxz
 */
@Configuration
public class ElasticSearchConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")));
//                        new HttpHost("localhost", 9201, "http")));
        return client;
    }

}
复制代码

实体类 Content

用于对应京东单个商品数据

/**
 * 实体类
 *
 * @author mxz
 */
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Content {
    private String title;
    private String img;
    private


    
 String price;
}
复制代码

工具类 HtmlParseUtil

用于解析京东搜索的数据

/**
 * @author mxz
 */
@Component
public class HtmlParseUtil {

    public List<Content> parseJD(String keywords) throws Exception {

        // 获取请求 https://search.jd.com/Search?keyword=java
        String url = "https://search.jd.com/Search?keyword=" + keywords;

        // 解析网页 (返回 Document 就是浏览器 Document 对象)
        Document document = Jsoup.parse(new URL(url), 30000);

        // 所有在js中可以使用的方法, 这里都可以使用
        Element element = document.getElementById("J_goodsList");
//        System.out.println(element.html());


        ArrayList<Content> goodList = new ArrayList<>();

        // 获取所有的li元素
        Elements elements = element.getElementsByTag("li");
        // 获取元素中的内容
        for (Element el : elements) {
            // 关于这种图片特别多的网站, 所有的图片都是延迟加载的
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = el.getElementsByClass("p-price").eq(0).text();
            String title = el.getElementsByClass("p-name").eq(0).text();
            Content content = new Content(title, img, price);
            goodList.add(content);
        }

        return goodList;

    }

    public static void main(String[] args) throws IOException {

        // 获取请求 https://search.jd.com/Search?keyword=java
        String url = "https://search.jd.com/Search?keyword=java";

        // 解析网页 (返回 Document 就是浏览器 Document 对象)
        Document document = Jsoup.parse(new URL(url), 30000);

        // 所有在js中可以使用的方法, 这里都可以使用
        Element element = document.getElementById("J_goodsList");
//        System.out.println(element.html());

        // 获取所有的li元素
        Elements elements = element.getElementsByTag("li");
        // 获取元素中的内容
        for (Element el : elements) {
            // 关于这种图片特别多的网站, 所有的图片都是延迟加载的   source-data-Lazy-img
            String img = el.getElementsByTag("img").eq(0).attr("src");
            String price = el.getElementsByClass("p-price").eq(0).text();
            String title = el.getElementsByClass("p-name").eq(0).text();
            System.out.println("=====================================");
            System.out.println(img);
            System.out.println(price);
            System.out.println(title);
        }
    }


}
复制代码

分析京东搜索商品

业务逻辑层 ContentService

/**
 * 业务逻辑层
 *
 * @author mxz
 */
@Service
public class ContentService {

    public static final String ES_INDEX = "jd_goods";

    @Autowired
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient client;

    /**
     * 1 解析数据 放入 es 中
     *
     * @param keywords
     * @return
     * @throws Exception
     */
    public Boolean parseContent(String keywords) throws Exception {
        List<Content> contents = new HtmlParseUtil().parseJD(keywords);

        // 查询出来的数据放入到 es 中
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout(TimeValue.timeValueSeconds(2));
        for (int i = 0; i < contents.size(); i++) {

            bulkRequest.add(new IndexRequest(ES_INDEX)
                    .source(JSON.toJSONString(contents.get(i)), XContentType.JSON));

        }
        BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);

        return !bulkResponse.hasFailures();
    }


    /**
     * 2 获取这些数据  实现搜索功能
     *
     * @param keywords
     * @param pageNo
     * @param pageSize
     * @return
     * @throws IOException
     */
    public List<Map<String, Object>> searchPage(String keywords, int pageNo, int pageSize) throws IOException {
        if (pageNo <= 1) {
            pageNo = 1;
        }

        // 条件搜索
        SearchRequest searchRequest = new SearchRequest(ES_INDEX);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // 分页
        searchSourceBuilder.from(pageNo);
        searchSourceBuilder.size(pageSize);

        //精准匹配
        TermQueryBuilder termQuery = QueryBuilders.termQuery("title", keywords);
        searchSourceBuilder.query(termQuery);
        searchSourceBuilder.timeout(TimeValue.timeValueSeconds(60));

        // 执行搜索
        searchRequest.source(searchSourceBuilder);

        // 通过客户端查询
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

        // 解析结果
        List<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit documentFields : searchResponse.getHits().getHits()) {
            list.add(documentFields.getSourceAsMap());
        }
        return list;
    }


    /**
     * 3 实现搜索功能高亮
     *
     * @param keywords
     * @param pageNo
     * @param pageSize
     * @return
     * @throws IOException
     */
    public List<Map<String, Object>> searchPageHighlighter(String keywords, int pageNo, int pageSize) throws IOException {

        if (pageNo <= 1) {
            pageNo = 1;
        }

        // 条件搜索
        SearchRequest searchRequest = new


    
 SearchRequest(ES_INDEX);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // 分页
        searchSourceBuilder.from(pageNo);
        searchSourceBuilder.size(pageSize);

        //精准匹配
        TermQueryBuilder termQuery = QueryBuilders.termQuery("title", keywords);
        searchSourceBuilder.query(termQuery);
        searchSourceBuilder.timeout(TimeValue.timeValueSeconds(60));

        //生成高亮查询器
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        //高亮查询字段
        highlightBuilder.field("title");
        //如果要多个字段高亮,这项要为false
        highlightBuilder.requireFieldMatch(false);
        //高亮设置
        highlightBuilder.preTags("<span style = 'color:red'>");
        highlightBuilder.postTags("</span>");
        //下面这两项,如果你要高亮如文字内容等有很多字的字段,必须配置,不然会导致高亮不全,文章内容缺失等
        //最大高亮分片数
        highlightBuilder.fragmentSize(800000);
        //从第一个分片获取高亮片段
        highlightBuilder.numOfFragments(0);
        searchSourceBuilder.highlighter(highlightBuilder);


        // 执行搜索
        searchRequest.source(searchSourceBuilder);

        // 通过客户端查询
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

        // 解析结果
        List<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit documentFields : searchResponse.getHits().getHits()) {
            // 解析高亮字段
            Map<String, HighlightField> highlightFields = documentFields.getHighlightFields();
            HighlightField title = highlightFields.get("title");
            Map<String, Object> sourceAsMap = documentFields.getSourceAsMap();  // 原来的结果
            // 解析高亮字段 将原来的字段换为高亮字段
            // 千万记得要记得判断是不是为空, 不然你匹配的第一个结果没有高亮内容, 那么就会报空指针异常, 这个错误一开始真的搞了很久
            if (title != null) {
                Text[] fragments = title.fragments();
                String newTitle = "";
                for (Text fragment : fragments) {
                    newTitle += fragment;
                }
                sourceAsMap.put("title", newTitle);
                System.out.println(newTitle);
            }
            list.add(sourceAsMap);
        }
        return list;
    }

}
复制代码

控制层 RestController

@RestController
public class ContentController {

    @Autowired
    private ContentService contentService;

    @GetMapping("/parse/{keywords}")
    public Boolean parse(@PathVariable("keywords") String keywords) throws Exception {
        return contentService.parseContent(keywords);
    }

    @GetMapping("/search/{keywords}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> searchPage(@PathVariable("keywords") String keywords,
                                                @PathVariable("pageNo") int pageNo,
                                                @PathVariable("pageSize") int pageSize) throws IOException {
        // return contentService.searchPage(keywords, pageNo, pageSize);
        // 高亮
        return contentService.searchPageHighlighter(keywords, pageNo, pageSize);
    }

}
复制代码

前端页面 myindex.html

<!DOCTYPE html>
<html lang="zh-en" xmlns:th="http://www.thymeleaf.org">
<head>
    <meta charset="UTF-8">
    <title>全栈自学社区搜索</title>

    <link rel="stylesheet" href="https://v4.bootcss.com/docs/4.6/dist/css/bootstrap.min.css">
    <link rel="apple-touch-icon" href="https://v4.bootcss.com/docs/4.6/assets/img/favicons/apple-touch-icon.png"
          sizes="180x180">
    <link rel="icon" href="https://v4.bootcss.com/docs/4.6/assets/img/favicons/favicon-32x32.png" sizes="32x32"
          type="image/png">
    <link rel="icon" href="https://v4.bootcss.com/docs/4.6/assets/img/favicons/favicon-16x16.png" sizes="16x16"
          type="image/png">
    <link rel="mask-icon" href="https://v4.bootcss.com/docs/4.6/assets/img/favicons/safari-pinned-tab.svg"
          color="#563d7c">
    <link rel="icon" href="https://v4.bootcss.com/docs/4.6/assets/img/favicons/favicon.ico">
    <meta name="msapplication-config" content="/docs/4.6/assets/img/favicons/browserconfig.xml">
    <meta name="theme-color" content="#563d7c">

    <style>
        .bd-placeholder-img {
            font-size: 1.125rem;
            text-anchor: middle;
            -webkit-user-select: none;
            -moz-user-select: none;
            -ms-user-select: none;
            user-select: none;
        }

        @media (min-width: 768px) {
            .bd-placeholder-img-lg {
                font-size: 3.5rem;
            }
        }
    </style>


    

</head>
<body>
<div id="app">
    <nav class="navbar navbar-expand-md navbar-dark bg-dark mb-4">
        <a class="navbar-brand" href="https://v4.bootcss.com/docs/examples/navbar-static/#">全栈自学社区</a>
        <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarCollapse"
                aria-controls="navbarCollapse" aria-expanded="false" aria-label="Toggle navigation">
            <span class="navbar-toggler-icon"></span>
        </button>
        <div class="collapse navbar-collapse" id="navbarCollapse">
            <ul class="navbar-nav mr-auto">
                <li class="nav-item active">
                    <a class="nav-link" href="https://v4.bootcss.com/docs/examples/navbar-static/#">主页 <span
                            class="sr-only">(current)</span></a>
                </li>
                <li class="nav-item">
                    <a class="nav-link" href="https://v4.bootcss.com/docs/examples/navbar-static/#">友链</a>
                </li>
                <li class="nav-item">
                    <a class="nav-link disabled" href="https://v4.bootcss.com/docs/examples/navbar-static/#"
                       tabindex="-1" aria-disabled="true">控制面板</a>
                </li>
            </ul>
            <form class="form-inline mt-2 mt-md-0">
                <input class="form-control mr-sm-2" v-model="keyword" type="text" placeholder="输入名称"
                       aria-label="Search">
                <button class="btn btn-outline-success my-2 my-sm-0" @click.prevent="searchKey" type="submit">搜索
                </button>
            </form>
        </div>
    </nav>

    <main role="main" class="container">
        <div class="row">
            <div class="col-md-4" v-for="result in results">
                <img :src="result.img" class="img-thumbnail"/>
                <p><a v-html="result.title"></a></p>
                <p class="lead">{{result.price}}</p>
            </div>
        </div>
    </main>
</div>
<script type="text/javascript" th:src="@{/js/axios.min.js}"></script>
<script type="text/javascript" th:src="@{/js/vue.min.js}"></script>
<script>
    new Vue({
        el: '#app',
        data: {
            keyword: '', //搜索关键字
            results: []
        },
        methods: {
            searchKey() {
                var keyword = this.keyword;
                console.log(keyword);
                // 对接后端接口
                axios.get('/search/' + keyword + '/1/' + '10').then(response => {
                    console


    
.log(response);
                    this.results = response.data;  // 绑定数据
                })
            }
        }
    })
</script>

</body>
</html>
复制代码

项目运行

  • 搜索前

  • 搜索后

文章已上传gitee gitee.com/codingce/he…
项目地址: github.com/xzMhehe/cod…

Python社区是高质量的Python/Django开发社区
本文地址:http://www.python88.com/topic/106557
 
212 次点击