Aggregation¶
The Aggs class provides:
- multiple syntaxes to declare and update an aggregation
- aggregation clause validation
- ability to insert clauses at specific locations (and not just below last manipulated clause)
Declaration¶
From native “dict” query¶
Given the following aggregation:
>>> expected_aggs = {
>>>     "decade": {
>>>         "histogram": {"field": "year", "interval": 10},
>>>         "aggs": {
>>>             "genres": {
>>>                 "terms": {"field": "genres", "size": 3},
>>>                 "aggs": {
>>>                     "max_nb_roles": {
>>>                         "max": {"field": "nb_roles"}
>>>                     },
>>>                     "avg_rank": {
>>>                         "avg": {"field": "rank"}
>>>                     }
>>>                 }
>>>             }
>>>         }
>>>     }
>>> }
To declare Aggs, simply pass the “dict” query as an argument:
>>> from pandagg.agg import Aggs
>>> a = Aggs(expected_aggs)
A visual representation of the query is available with show():
>>> a.show()
<Aggregations>
decade <histogram, field="year", interval=10>
└── genres <terms, field="genres", size=3>
    ├── max_nb_roles <max, field="nb_roles">
    └── avg_rank <avg, field="rank">
Call to_dict() to convert it to a native dict:
>>> a.to_dict() == expected_aggs
True
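To make the link between the nested dict and the tree view concrete, here is a minimal, standard-library-only sketch that walks a native aggregation dict and builds a comparable outline. render_aggs is a hypothetical helper written for illustration, not part of pandagg, and its output only approximates what show() prints. It assumes each clause holds a single aggregation type key plus an optional "aggs" (or "aggregations") key.

```python
def render_aggs(aggs, prefix=""):
    """Return outline lines for a native aggregation dict.

    Hypothetical helper for illustration; not pandagg's show() implementation.
    """
    lines = []
    items = list(aggs.items())
    for index, (name, clause) in enumerate(items):
        is_last = index == len(items) - 1
        # Assumption: the only keys in a clause are the aggregation type
        # and, optionally, "aggs"/"aggregations" for sub-aggregations.
        agg_type = next(k for k in clause if k not in ("aggs", "aggregations"))
        body = ", ".join(f"{k}={v!r}" for k, v in clause[agg_type].items())
        label = f"<{agg_type}, {body}>" if body else f"<{agg_type}>"
        connector = "└── " if is_last else "├── "
        lines.append(f"{prefix}{connector}{name} {label}")
        # Recurse into sub-aggregations, extending the indentation prefix.
        children = clause.get("aggs") or clause.get("aggregations") or {}
        child_prefix = prefix + ("    " if is_last else "│   ")
        lines.extend(render_aggs(children, child_prefix))
    return lines


example = {
    "decade": {
        "histogram": {"field": "year", "interval": 10},
        "aggs": {"genres": {"terms": {"field": "genres", "size": 3}}},
    }
}
print("\n".join(render_aggs(example)))
```

Each nesting level under "aggs" becomes one level of indentation, which is the same parent/child relation the tree view expresses.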
With DSL classes¶
Pandagg provides a DSL to declare this query in a similar fashion:
>>> from pandagg.agg import Histogram, Terms, Max, Avg
>>>
>>> a = Histogram("decade", field="year", interval=10, aggs=[
>>>     Terms("genres", field="genres", size=3, aggs=[
>>>         Max("max_nb_roles", field="nb_roles"),
>>>         Avg("avg_rank", field="rank")
>>>     ]),
>>> ])
All these classes inherit from Aggs and thus provide the same interface.
>>> from pandagg.agg import Aggs
>>> isinstance(a, Aggs)
True
With flattened syntax¶
In the flattened syntax, the first argument is the aggregation name, the second argument is the aggregation type, and the remaining keyword arguments define the aggregation body:
>>> from pandagg.agg import Aggs
>>> a = Aggs('genres', 'terms', field='genres', size=3)
>>> a.to_dict()
{'genres': {'terms': {'field': 'genres', 'size': 3}}}
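The mapping from flattened arguments to the native dict can be sketched in a few lines of plain Python. flat_to_dict is a hypothetical stand-in used only to show the shape of the result; it is not how pandagg implements the Aggs constructor, and it performs no validation.

```python
def flat_to_dict(name, agg_type, **body):
    """Build a native single-clause aggregation dict.

    Hypothetical illustration of the flattened syntax:
    name -> aggregation type -> body built from keyword arguments.
    """
    return {name: {agg_type: body}}


print(flat_to_dict("genres", "terms", field="genres", size=3))
```

The name becomes the outer key, the type becomes the inner key, and the keyword arguments are collected into the aggregation body.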
Aggregations enrichment¶
Aggregations can be enriched using two methods:
- aggs()
- groupby()
Both methods return a new Aggs instance and leave the initial aggregation unchanged.
For instance:
>>> from pandagg.agg import Aggs
>>> initial_a = Aggs()
>>> enriched_a = initial_a.agg('genres_agg', 'terms', field='genres')
>>> initial_a.to_dict()
None
>>> enriched_a.to_dict()
{'genres_agg': {'terms': {'field': 'genres'}}}
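The "return a new instance, leave the original untouched" behaviour can be illustrated with plain dicts. This is a simplified sketch under stated assumptions: with_agg is a hypothetical function, it only adds a top-level clause, and it says nothing about how pandagg actually copies or positions clauses internally.

```python
import copy


def with_agg(aggs, name, agg_type, **body):
    """Return a new aggregation dict with one more top-level clause.

    Hypothetical helper: the input dict is deep-copied, so it is never mutated.
    """
    enriched = copy.deepcopy(aggs)
    enriched[name] = {agg_type: body}
    return enriched


initial = {}
enriched = with_agg(initial, "genres_agg", "terms", field="genres")
print(initial)   # the original dict is still empty
print(enriched)  # the copy carries the new clause
```

Working on a copy means an intermediate aggregation can be reused as the starting point for several different enrichments without side effects.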
Note
Calling to_dict() on an empty Aggregation returns None:
>>> from pandagg.agg import Aggs
>>> Aggs().to_dict()
None
TODO