Why does the Elasticsearch terms bucket size affect the doc_count of an inner reverse_nested aggregation?

    "aggs": {
      "mainGrouping": {
        "nested": { "path": "parent.child" },
        "aggs": {
          "uniqueCount": {
            "cardinality": { "field": "parent.child.id" }
          },
          "groupBy": {
            "terms": {
              "field": "parent.child.id",
              "size": 20, <- If I change this, my doc count for noOfParents changes
              "order": [ { "noOfParents": "desc" } ]
            },
            "aggs": {
              "noOfParents": { "reverse_nested": {} }
            }
          }
        }
      }
    }

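What you are observing is most likely caused by the terms aggregation, because its document counts are approximate. It has nothing to do with the reverse_nested aggregation, nor with the nested aggregation.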

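In short, because the data is distributed across shards, Elasticsearch first makes a local best guess on each shard and then combines the per-shard results. Each shard returns only its own top terms, so a term that misses a shard's local list loses that shard's documents from its final count, and changing size changes how many terms each shard returns. For a better and more detailed explanation, see this section of the documentation.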
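To make the effect concrete, here is a minimal Python sketch of that two-phase merge. It is a simplified model, not Elasticsearch's actual implementation: the shard contents, the function names, and the explicit `shard_size` argument are all made up for illustration, and the terms are ranked by plain count rather than by a sub-aggregation.

```python
from collections import Counter


def shard_top_terms(shard_docs, shard_size):
    """Phase 1: a shard reports only its local top `shard_size` terms."""
    return Counter(shard_docs).most_common(shard_size)


def merged_terms(shards, size, shard_size):
    """Phase 2: the coordinator sums the partial lists and keeps the top `size`."""
    merged = Counter()
    for docs in shards:
        for term, count in shard_top_terms(docs, shard_size):
            merged[term] += count
    return dict(merged.most_common(size))


# Two hypothetical shards; each list entry is the term value of one document.
shards = [
    ["a"] * 6 + ["b"] * 4 + ["c"] * 3,
    ["c"] * 5 + ["b"] * 4 + ["a"] * 1,
]

# True totals across both shards: a=7, b=8, c=8.
small = merged_terms(shards, size=2, shard_size=2)
large = merged_terms(shards, size=3, shard_size=3)

print(small["a"])  # 6: the single "a" document on the second shard was cut off
print(large["a"])  # 7: with a larger size, every shard reports "a"
```

In this toy data the count for "a" moves from 6 to 7 purely because the requested size changed, which is the same kind of shift described in the question. Note also that "c" is missing from the small result even though its true total (8) ties for the highest.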

To confirm that this is what is happening, you can add a top_hits aggregation with explain enabled:

      "aggs": {
        "noOfParents": {
          "reverse_nested": {},
          "aggs": {
            "top hits": {
              "top_hits": {
                "size": 10,
                "explain": true
              }
            }
          }
        }
      }

This will give you the list of matching parent documents together with their shard IDs. Something like this:

  "aggregations": {
    "mainGrouping": {
      ...
      "groupBy": {
        ...
        "buckets": [
          {
            "key": "1",
            "doc_count": 5,
            "noOfParents": {
              "doc_count": 5,
              "top hits": {
                "hits": {
                  "total": 5,
                  "max_score": 1,
                  "hits": [
                    {
                      "_shard": "[my-index-2018-12][0]", <-- this is the shard
                      "_node": "7JNqOhTtROqzQR9QBUENcg",
                      "_index": "my-index-2018-12",
                      "_type": "doc",
                      "_id": "AWdpyZ4Y3HZjlM-Ibd7O",
                      "_score": 1,
                      "_source": {
                        "parent": "A",
                        "child": {
                          "id": "1"
                        }
                      },
                      "_explanation": ...
                    },

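Another way to prove that this is the root cause is to isolate the query to a single shard. To do so, it is enough to add routing to the search request: ?routing=0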

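This will make your terms bucket counts stable within a single shard. Then compare noOfParents with the expected number of parents (again, within that same shard).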
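If you want to compare two runs mechanically, a small helper along these lines may be handy. It is only a sketch: it assumes the responses are already parsed into Python dicts with the shape shown in the output above (aggregations, mainGrouping, groupBy, buckets), and the function names and the two sample responses are hypothetical.

```python
def parents_per_key(response):
    """Map each groupBy bucket key to its noOfParents doc_count."""
    buckets = response["aggregations"]["mainGrouping"]["groupBy"]["buckets"]
    return {b["key"]: b["noOfParents"]["doc_count"] for b in buckets}


def changed_counts(resp_a, resp_b):
    """Return {key: (count_a, count_b)} for keys in both responses whose counts differ."""
    a, b = parents_per_key(resp_a), parents_per_key(resp_b)
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}


def make_response(counts):
    """Build a minimal fake response (illustration only, not real output)."""
    buckets = [
        {"key": key, "doc_count": n, "noOfParents": {"doc_count": n}}
        for key, n in counts.items()
    ]
    return {"aggregations": {"mainGrouping": {"groupBy": {"buckets": buckets}}}}


# Hypothetical results of the same query run with two different terms sizes.
with_small_size = make_response({"1": 5, "2": 3})
with_large_size = make_response({"1": 5, "2": 4, "3": 2})

diff = changed_counts(with_small_size, with_large_size)
print(diff)  # only key "2" differs between the two runs
```

Running the same comparison on two responses fetched with ?routing=0 should yield an empty result if the approximation across shards is indeed the cause.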

Hope that helps!