pandagg.aggs module

class pandagg.aggs.Aggs(*args, **kwargs)[source]

Bases: pandagg.tree._tree.Tree

Combination of aggregation clauses. This class provides handy methods to build an aggregation (see aggs() and groupby()), and is also used to parse aggregation responses into convenient formats.

Declaring a mapping is optional, but doing so validates the aggregation and automatically adds missing nested clauses.

All of the following syntaxes are equivalent:

From a dict:

>>> Aggs({"per_user":{"terms":{"field":"user"}}})

Using shortcut declaration: first argument is the aggregation type, other arguments are aggregation body parameters:

>>> Aggs('terms', name='per_user', field='user')

Using DSL class:

>>> from pandagg.aggs import Terms
>>> Aggs(Terms('per_user', field='user'))

The dict and DSL class syntaxes allow declaring aggregations with multiple clauses:

>>> Aggs({"per_user":{"terms":{"field":"user"}, "aggs": {"avg_age": {"avg": {"field": "age"}}}}})

Which is similar to:

>>> from pandagg.aggs import Terms, Avg
>>> Terms('per_user', field='user', aggs=Avg('avg_age', field='age'))
Keyword Arguments:
 
  • mapping (dict or pandagg.tree.mapping.Mapping) – Mapping of the requested index or indices. If provided, aggregations are validated against it, and required nested clauses are added if missing.
  • nested_autocorrect (bool) – If True, missing nested clauses are added to the aggregation automatically; otherwise an error is raised.
  • remaining kwargs – Used as the aggregation body
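
The shortcut declaration described above maps directly onto the dict syntax. The following is a minimal sketch in plain Python, not pandagg's implementation: the helper name shortcut_to_dict is hypothetical and only illustrates how the aggregation type, the name, and the body parameters fit together in the Elasticsearch dict form.

```python
def shortcut_to_dict(agg_type, name, aggs=None, **body):
    """Hypothetical helper: build the dict form of a single aggregation clause."""
    clause = {agg_type: body}
    if aggs:
        # Sub-aggregations sit next to the aggregation type, under "aggs".
        clause["aggs"] = aggs
    return {name: clause}


# Single clause, same shape as the dict passed to Aggs above.
per_user = shortcut_to_dict("terms", name="per_user", field="user")

# Two clauses: a terms aggregation holding an avg sub-aggregation.
nested = shortcut_to_dict(
    "terms",
    name="per_user",
    field="user",
    aggs=shortcut_to_dict("avg", name="avg_age", field="age"),
)
```

Here per_user and nested have the same shape as the dicts used in the examples above.
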
aggs(*args, **kwargs)[source]

Arrange passed aggregations “horizontally”.

Given the initial aggregation:

A──> B
└──> C

If passing multiple aggregations with insert_below = ‘A’:

A──> B
└──> C
└──> new1
└──> new2

Note: the aggregations are placed under the clause named by insert_below if provided, otherwise under the deepest linear bucket aggregation, provided there is no ambiguity:

OK:

A──> B ─> C ─> new

KO:

A──> B
└──> C

args accepts a single occurrence or a sequence of the following formats:

  • string (concise declaration of a terms aggregation)
  • regular Elasticsearch dict syntax
  • AggNode instance (for example Terms, Filters, etc.)
Keyword Arguments:
 
  • insert_below (string) – Parent aggregation name under which these aggregations should be placed
  • at_root (string) – Insert aggregations at root of aggregation query
  • remaining kwargs: Used as body in aggregation
Return type:

pandagg.aggs.Aggs
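
The "horizontal" arrangement can be pictured on the plain Elasticsearch dict form. The sketch below is not pandagg's implementation: the helpers find_clause and add_siblings and the field names are hypothetical, and only show the new clauses landing next to the existing children of the insert_below clause.

```python
import copy


def find_clause(aggs, name):
    """Return the clause named `name` anywhere in an aggs dict, or None."""
    for key, clause in aggs.items():
        if key == name:
            return clause
        found = find_clause(clause.get("aggs", {}), name)
        if found is not None:
            return found
    return None


def add_siblings(aggs, insert_below, new_aggs):
    """Hypothetical helper: place `new_aggs` side by side under `insert_below`."""
    result = copy.deepcopy(aggs)  # leave the input query untouched
    parent = find_clause(result, insert_below)
    if parent is None:
        raise KeyError(insert_below)
    parent.setdefault("aggs", {}).update(new_aggs)
    return result


# A with two children B and C, as in the diagram above.
initial = {
    "A": {
        "terms": {"field": "a"},
        "aggs": {
            "B": {"terms": {"field": "b"}},
            "C": {"terms": {"field": "c"}},
        },
    }
}
new = {"new1": {"avg": {"field": "x"}}, "new2": {"max": {"field": "x"}}}
updated = add_siblings(initial, "A", new)
print(list(updated["A"]["aggs"]))  # ['B', 'C', 'new1', 'new2']
```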

applied_nested_path_at_node(nid)[source]
deepest_linear_bucket_agg

Return deepest bucket aggregation node (pandagg.nodes.abstract.BucketAggNode) of that aggregation that neither has siblings, nor has an ancestor with siblings.
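
The definition above can be sketched on the plain Elasticsearch dict form. This is an illustration under stated assumptions, not pandagg's implementation: the helper names are hypothetical, BUCKET_TYPES is a deliberately non-exhaustive set, and each clause is assumed to hold only its aggregation type and an optional "aggs" key.

```python
# Assumption: a small, non-exhaustive set of bucket aggregation types.
BUCKET_TYPES = {"terms", "filters", "histogram", "date_histogram", "range", "filter", "nested"}


def agg_type(clause):
    """Return the aggregation type of a clause, i.e. its first key other than "aggs"."""
    return next(key for key in clause if key != "aggs")


def deepest_linear_bucket(aggs):
    """Hypothetical helper: name of the deepest bucket clause on a branch-free path from the root."""
    deepest = None
    level = aggs
    # Walk down while the current level holds exactly one clause (no siblings).
    while len(level) == 1:
        name, clause = next(iter(level.items()))
        if agg_type(clause) not in BUCKET_TYPES:
            break  # not a bucket aggregation: stop at the last bucket seen
        deepest = name
        level = clause.get("aggs", {})
    return deepest


linear = {"A": {"terms": {"field": "a"}, "aggs": {
    "B": {"terms": {"field": "b"}, "aggs": {
        "C": {"terms": {"field": "c"}}}}}}}
branching = {"A": {"terms": {"field": "a"}, "aggs": {
    "B": {"terms": {"field": "b"}},
    "C": {"terms": {"field": "c"}}}}}

print(deepest_linear_bucket(linear))     # C
print(deepest_linear_bucket(branching))  # A
```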

groupby(*args, **kwargs)[source]

Arrange passed aggregations in vertical/nested manner, above or below another agg clause.

Given the initial aggregation:

A──> B
└──> C

If insert_below = ‘A’:

A──> new──> B
      └──> C

If insert_above = ‘B’:

A──> new──> B
└──> C

The by argument accepts a single occurrence or a sequence of the following formats:

  • string (concise declaration of a terms aggregation)
  • regular Elasticsearch dict syntax
  • AggNode instance (for example Terms, Filters, etc.)

If neither insert_below nor insert_above is provided, by is placed between the deepest linear bucket aggregation, provided there is no ambiguity, and its children:

A──> B      : OK generates     A──> B ─> C ─> by

A──> B      : KO, ambiguous, must specify either A, B or C
└──> C

Accepts all Aggs.__init__ syntaxes:

>>> Aggs().groupby('terms', name='per_user_id', field='user_id')
{"per_user_id":{"terms":{"field":"user_id"}}}

Passing a dict:

>>> Aggs().groupby({"terms_on_my_field":{"terms":{"field":"some_field"}}})
{"terms_on_my_field":{"terms":{"field":"some_field"}}}

Using DSL class:

>>> from pandagg.aggs import Terms
>>> Aggs().groupby(Terms('terms_on_my_field', field='some_field'))
{"terms_on_my_field":{"terms":{"field":"some_field"}}}

Shortcut syntax for terms aggregation: creates a terms aggregation, using field as aggregation name

>>> Aggs().groupby('some_field')
{"some_field":{"terms":{"field":"some_field"}}}

Using an Aggs object:

>>> Aggs().groupby(Aggs('terms', name='per_user_id', field='user_id'))
{"per_user_id":{"terms":{"field":"user_id"}}}

Accepted declarations for multiple aggregations:

Keyword Arguments:
 
  • insert_below (string) – Parent aggregation name under which these aggregations should be placed
  • insert_above (string) – Aggregation name above which these aggregations should be placed
  • at_root (string) – Insert aggregations at root of aggregation query
  • remaining kwargs: Used as body in aggregation
Return type:

pandagg.aggs.Aggs
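
The two "vertical" placements can be pictured on the plain Elasticsearch dict form. The sketch below is not pandagg's implementation: the helpers find_level, group_below and group_above and the field names are hypothetical. It only shows which clauses end up nested under the new one in each case.

```python
import copy


def find_level(aggs, name):
    """Return the mapping of sibling clauses that contains `name`."""
    if name in aggs:
        return aggs
    for clause in aggs.values():
        try:
            return find_level(clause.get("aggs", {}), name)
        except KeyError:
            continue
    raise KeyError(name)


def group_below(aggs, insert_below, new_name, new_body):
    """Hypothetical helper: the new clause takes over all children of `insert_below`."""
    result = copy.deepcopy(aggs)
    parent = find_level(result, insert_below)[insert_below]
    new_clause = dict(new_body)
    children = parent.pop("aggs", None)
    if children:
        new_clause["aggs"] = children
    parent["aggs"] = {new_name: new_clause}
    return result


def group_above(aggs, insert_above, new_name, new_body):
    """Hypothetical helper: the new clause wraps only `insert_above`; siblings stay put."""
    result = copy.deepcopy(aggs)
    level = find_level(result, insert_above)
    new_clause = dict(new_body)
    new_clause["aggs"] = {insert_above: level.pop(insert_above)}
    level[new_name] = new_clause
    return result


# A with two children B and C, as in the diagrams above.
initial = {"A": {"terms": {"field": "a"}, "aggs": {
    "B": {"terms": {"field": "b"}},
    "C": {"terms": {"field": "c"}}}}}
new_body = {"terms": {"field": "n"}}

below = group_below(initial, "A", "new", new_body)  # A -> new -> (B, C)
above = group_above(initial, "B", "new", new_body)  # A -> (new -> B, C)
```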

node_class

alias of pandagg.node.aggs.abstract.AggNode

show(*args, **kwargs)[source]

Return tree structure in hierarchy style.

Parameters:
  • nid – Node identifier from which tree traversal will start. If None tree root will be used
  • filter_ – filter function applied to nodes. Nodes rejected by the filter function, and their children, are not displayed
  • key – key used to order nodes of the same parent
  • reverse – reverse parameter applied when sorting nodes of the same level
  • line_type – display type choice
  • limit – int, truncate tree display to this number of lines
  • kwargs – kwargs params passed to node line_repr method
Return type:

unicode in Python 2, str in Python 3

to_dict(from_=None, depth=None, with_name=True)[source]
class pandagg.aggs.Terms(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'terms'
class pandagg.aggs.Filters(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'filters'
class pandagg.aggs.Histogram(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'histogram'
class pandagg.aggs.DateHistogram(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'date_histogram'
class pandagg.aggs.Range(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'range'
class pandagg.aggs.Global(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'global'
class pandagg.aggs.Filter(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'filter'
class pandagg.aggs.Missing(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'missing'
class pandagg.aggs.Nested(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'nested'
class pandagg.aggs.ReverseNested(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'reverse_nested'
class pandagg.aggs.Avg(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractLeafAgg

KEY = 'avg'
class pandagg.aggs.Max(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractLeafAgg

KEY = 'max'
class pandagg.aggs.Sum(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractLeafAgg

KEY = 'sum'
class pandagg.aggs.Min(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractLeafAgg

KEY = 'min'
class pandagg.aggs.Cardinality(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractLeafAgg

KEY = 'cardinality'
class pandagg.aggs.Stats(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractLeafAgg

KEY = 'stats'
class pandagg.aggs.ExtendedStats(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractLeafAgg

KEY = 'extended_stats'
class pandagg.aggs.Percentiles(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractLeafAgg

Percents body argument can be passed to specify which percentiles to fetch.

KEY = 'percentiles'
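
As an illustration of the percents body argument mentioned above, here is the regular Elasticsearch dict syntax for a percentiles aggregation. The aggregation name load_time_outliers and the field load_time are hypothetical. By analogy with the Terms examples in this module, the same body would presumably be passed as keyword arguments to the DSL class, which is an assumption and is not verified here.

```python
import json

# Dict syntax for a percentiles aggregation; names are hypothetical.
load_time_percentiles = {
    "load_time_outliers": {
        "percentiles": {
            "field": "load_time",
            # Only these percentiles are requested instead of the defaults.
            "percents": [95, 99, 99.9],
        }
    }
}

print(json.dumps(load_time_percentiles, indent=2))
```
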
class pandagg.aggs.PercentileRanks(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractLeafAgg

KEY = 'percentile_ranks'
class pandagg.aggs.GeoBound(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractLeafAgg

KEY = 'geo_bounds'
class pandagg.aggs.GeoCentroid(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractLeafAgg

KEY = 'geo_centroid'
class pandagg.aggs.TopHits(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractLeafAgg

KEY = 'top_hits'
class pandagg.aggs.ValueCount(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractLeafAgg

KEY = 'value_count'
class pandagg.aggs.AvgBucket(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'avg_bucket'
class pandagg.aggs.Derivative(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'derivative'
class pandagg.aggs.MaxBucket(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'max_bucket'
class pandagg.aggs.MinBucket(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'min_bucket'
class pandagg.aggs.SumBucket(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'sum_bucket'
class pandagg.aggs.StatsBucket(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'stats_bucket'
class pandagg.aggs.ExtendedStatsBucket(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'extended_stats_bucket'
class pandagg.aggs.PercentilesBucket(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'percentiles_bucket'
class pandagg.aggs.MovingAvg(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'moving_avg'
class pandagg.aggs.CumulativeSum(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'cumulative_sum'
class pandagg.aggs.BucketScript(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'bucket_script'
class pandagg.aggs.BucketSelector(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'bucket_selector'
class pandagg.aggs.BucketSort(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'bucket_sort'
class pandagg.aggs.SerialDiff(*args, **kwargs)[source]

Bases: pandagg.tree.aggs.aggs.AbstractParentAgg

KEY = 'serial_diff'