Elasticsearch: find documents with distinct values and then aggregate over them -
My index has a log-like structure: a new version of a document is inserted whenever an event occurs. For example, here are the documents in the index:
{ "key": "a", subkey: 0 } { "key": "a", subkey: 0 } { "key": "a", subkey: 1 } { "key": "a", subkey: 1 } { "key": "b", subkey: 0 } { "key": "b", subkey: 0 } { "key": "b", subkey: 1 } { "key": "b", subkey: 1 }
I'm trying to construct a query in Elasticsearch that is equivalent to the following SQL query:
select count(*), key, subkey from (select distinct key, subkey from t) group by key, subkey
The result of this query would be:
(1, a, 0) (1, a, 1) (1, b, 0) (1, b, 1)
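To make the intended semantics concrete, here is a minimal Python sketch of that SQL logic applied to the sample documents above (an illustration only, not part of Elasticsearch):

```python
# The sample documents from the index above.
docs = [
    {"key": "a", "subkey": 0}, {"key": "a", "subkey": 0},
    {"key": "a", "subkey": 1}, {"key": "a", "subkey": 1},
    {"key": "b", "subkey": 0}, {"key": "b", "subkey": 0},
    {"key": "b", "subkey": 1}, {"key": "b", "subkey": 1},
]

# SELECT DISTINCT key, subkey FROM t ...
distinct_pairs = {(d["key"], d["subkey"]) for d in docs}

# ... then COUNT(*) per group: each distinct pair counts exactly once.
result = sorted((1, key, subkey) for key, subkey in distinct_pairs)
print(result)  # [(1, 'a', 0), (1, 'a', 1), (1, 'b', 0), (1, 'b', 1)]
```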
How can I replicate this query in Elasticsearch? I came up with the following:
GET test_index/test_type/_search?search_type=count
{
  "aggregations": {
    "count_aggr": {
      "terms": {
        "field": "concatenated_key"
      },
      "aggs": {
        "sample_doc": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}
Here, concatenated_key is a concatenation of key and subkey. This query creates a bucket for each (key, subkey) combination and returns a sample document for each bucket. However, I don't know how I can aggregate on the fields of _source.
I would appreciate any ideas. Thanks!
If you don't have the possibility to re-index your documents and add your own concatenated key field, this is one way of doing it:
GET /my_index/my_type/_search?search_type=count
{
  "aggs": {
    "key_agg": {
      "terms": {
        "field": "key",
        "size": 10
      },
      "aggs": {
        "sub_key_agg": {
          "terms": {
            "field": "subkey",
            "size": 10
          }
        }
      }
    }
  }
}
It will give you something like this:
"buckets": [ { "key": "a", "doc_count": 4, "sub_key_agg": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": 0, "doc_count": 2 }, { "key": 1, "doc_count": 2 } ] } }, { "key": "b", "doc_count": 4, "sub_key_agg": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": 0, "doc_count": 2 }, { "key": 1, "doc_count": 2 } ] } } ]
For each key (e.g. "key": "a"), you get the number of documents matching each combination of that key and subkey, i.e. key=a and subkey=0 or key=a and subkey=1:
"buckets": [ { "key": 0, "doc_count": 2 }, { "key": 1, "doc_count": 2 } ]
The same goes for the other key.
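To get back flat (key, subkey, count) tuples like the SQL result, you can flatten the nested buckets on the client side. A small Python sketch, assuming the response shape shown above (the `flatten` helper is hypothetical, not an Elasticsearch API):

```python
# Aggregation section of a response shaped like the one above.
response_aggs = {
    "key_agg": {
        "buckets": [
            {"key": "a", "doc_count": 4, "sub_key_agg": {"buckets": [
                {"key": 0, "doc_count": 2}, {"key": 1, "doc_count": 2}]}},
            {"key": "b", "doc_count": 4, "sub_key_agg": {"buckets": [
                {"key": 0, "doc_count": 2}, {"key": 1, "doc_count": 2}]}},
        ]
    }
}

def flatten(aggs):
    """Turn the nested key/subkey buckets into (key, subkey, doc_count) tuples."""
    return [
        (outer["key"], inner["key"], inner["doc_count"])
        for outer in aggs["key_agg"]["buckets"]
        for inner in outer["sub_key_agg"]["buckets"]
    ]

print(flatten(response_aggs))
# [('a', 0, 2), ('a', 1, 2), ('b', 0, 2), ('b', 1, 2)]
```

Note that the doc_count here counts all matching documents (2 per combination in the sample data), whereas the SQL query counted each distinct combination once.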