pandagg.agg module

class pandagg.agg.Agg(from_=None, mapping=None, identifier=None, client=None, query=None, index_name=None)[source]

Bases: pandagg.tree._tree.Tree

Tree combination of aggregation nodes.

Mapping declaration is optional; providing one allows the aggregation tree to be validated against it.

DEFAULT_OUTPUT = 'dataframe'
add_node(node, pid=None)[source]

If a mapping is provided, nested and outnested clauses are applied automatically.

agg(arg, insert_below=None, **kwargs)[source]

Arrange the aggregations passed in the arg argument “horizontally” (as siblings).

They are placed under the insert_below aggregation clause id if provided, else under the deepest linear bucket aggregation if there is no ambiguity:

OK: A ──> B ──> C ──> arg

KO: A ──> B
      └──> C

The arg argument accepts a single occurrence or a sequence of the following formats:
  • string (concise declaration of a terms aggregation)
  • regular Elasticsearch dict syntax
  • AggNode instance (for instance Terms, Filters, etc.)

Parameters:
  • arg – aggregation clause(s) to insert “horizontally”
  • insert_below – parent aggregation id under which these aggregations should be placed
  • kwargs – agg body arguments when using “string” syntax for terms aggregation
Return type:

pandagg.agg.Agg
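To picture the “horizontal” arrangement, here is a minimal sketch using plain dicts in regular Elasticsearch syntax (no pandagg required; the clause names and fields are illustrative, not taken from the library):

```python
# Illustrative Elasticsearch body: two metric clauses inserted
# "horizontally" (as siblings) below an existing terms bucket.
parent = {"genres": {"terms": {"field": "genre", "size": 10}, "aggs": {}}}

# Each inserted clause becomes one entry of the parent's "aggs" dict.
siblings = {
    "avg_rank": {"avg": {"field": "rank"}},
    "avg_nb_roles": {"avg": {"field": "nb_roles"}},
}
parent["genres"]["aggs"].update(siblings)

assert set(parent["genres"]["aggs"]) == {"avg_rank", "avg_nb_roles"}
```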

applied_nested_path_at_node(nid)[source]
bind(client, index_name=None)[source]
deepest_linear_bucket_agg

Return the deepest bucket aggregation node (pandagg.node.agg.abstract.BucketAggNode) of this aggregation that neither has siblings nor has an ancestor with siblings.

classmethod deserialize(from_)[source]
execute(index=None, output='dataframe', **kwargs)[source]
groupby(by, insert_below=None, insert_above=None, **kwargs)[source]

Arrange the aggregations passed in the by argument “vertically” (in a nested manner), above or below another aggregation clause.

Given the initial aggregation:
A ──> B
  └──> C

If insert_below = ‘A’:
A ──> by ──> B
        └──> C

If insert_above = ‘B’:
A ──> by ──> B
  └──> C

The by argument accepts a single occurrence or a sequence of the following formats:
  • string (concise declaration of a terms aggregation)
  • regular Elasticsearch dict syntax
  • AggNode instance (for instance Terms, Filters, etc.)

If neither insert_below nor insert_above is provided, by is placed between the deepest linear bucket aggregation (if there is no ambiguity) and its children:

A ──> B ──> C : OK, generates A ──> B ──> C ──> by

A ──> B
  └──> C : KO, ambiguous: you must specify either A, B or C

Parameters:
  • by – aggregation clause(s) to insert “vertically”
  • insert_below – parent aggregation id under which these aggregations should be placed
  • insert_above – aggregation id above which these aggregations should be placed
  • kwargs – agg body arguments when using “string” syntax for terms aggregation
Return type:

pandagg.agg.Agg
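The “vertical” arrangement amounts to each grouping level wrapping the next one under an aggs key in the serialized Elasticsearch body. A hypothetical helper (not part of pandagg) sketching that nesting with plain dicts:

```python
def nest(levels):
    """Nest named agg bodies "vertically": each (name, body) pair in
    `levels` wraps the following one under its "aggs" key."""
    result = {}
    current = result
    for name, body in levels:
        current[name] = dict(body)
        nxt = {}
        current[name]["aggs"] = nxt
        current = nxt
    return result

tree = nest([
    ("genres", {"terms": {"field": "genre"}}),
    ("decade", {"terms": {"field": "decade"}}),
])
# "decade" buckets are computed inside each "genres" bucket.
assert "decade" in tree["genres"]["aggs"]
```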

node_class

alias of pandagg.node.agg.abstract.AggNode

paste(nid, new_tree, deep=False)[source]

Paste a tree, handling nested implications if a mapping is provided. The provided tree should be validated beforehand.

query(query, validate=False, **kwargs)[source]
query_dict(from_=None, depth=None, with_name=True)[source]
serialize_response(aggs, output, **kwargs)[source]
set_mapping(mapping)[source]
validate_tree(exc=False)[source]

Validate the tree definition against the defined mapping.

Parameters:
  • exc – if set to True, raise an exception if the tree is invalid
Returns:
  boolean

class pandagg.agg.MatchAll(name, meta=None, aggs=None)[source]

Bases: pandagg.node.agg.bucket.Filter

class pandagg.agg.Terms(name, field, missing=None, size=None, aggs=None, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.MultipleBucketAgg

Terms aggregation.

BLACKLISTED_MAPPING_TYPES = []
KEY = 'terms'
VALUE_ATTRS = ['doc_count', 'doc_count_error_upper_bound', 'sum_other_doc_count']
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.
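In plain Elasticsearch syntax, a terms aggregation and the bucket filter its get_filter corresponds to can be sketched as follows (field and clause names are illustrative; this uses raw dicts, not pandagg calls):

```python
# Terms aggregation body, as serialized to Elasticsearch.
terms_agg = {
    "genres": {"terms": {"field": "genre", "size": 10, "missing": "unknown"}}
}

# For a bucket key such as "Drama", the filter selecting the documents
# of that bucket is a term query on the same field.
def terms_bucket_filter(field, key):
    return {"term": {field: key}}

assert terms_bucket_filter("genre", "Drama") == {"term": {"genre": "Drama"}}
```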

class pandagg.agg.Filters(name, filters, other_bucket=False, other_bucket_key=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.MultipleBucketAgg

DEFAULT_OTHER_KEY = '_other_'
IMPLICIT_KEYED = True
KEY = 'filters'
VALUE_ATTRS = ['doc_count']
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.
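Since IMPLICIT_KEYED is True, filters are declared as a keyed mapping; the bucket filter for a given key is then simply the named query itself. A sketch in raw Elasticsearch syntax (names illustrative):

```python
# A keyed "filters" aggregation: each named entry is an arbitrary query.
filters_agg = {
    "by_status": {
        "filters": {
            "filters": {
                "active": {"term": {"status": "active"}},
                "archived": {"term": {"status": "archived"}},
            },
            # documents matching no filter land under this key
            # (mirrors DEFAULT_OTHER_KEY)
            "other_bucket_key": "_other_",
        }
    }
}

# The filter for a given bucket key is the named query itself.
bucket_filter = filters_agg["by_status"]["filters"]["filters"]["active"]
assert bucket_filter == {"term": {"status": "active"}}
```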

class pandagg.agg.Histogram(name, field, interval, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.MultipleBucketAgg

KEY = 'histogram'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.
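A histogram bucket keyed by k covers the half-open interval [k, k + interval), so its document filter is a range query on the same field. A raw-dict sketch (field names illustrative):

```python
interval = 10
hist_agg = {"rank_hist": {"histogram": {"field": "rank", "interval": interval}}}

# Documents of the bucket keyed by `key` fall in [key, key + interval).
def histogram_bucket_filter(field, key, interval):
    return {"range": {field: {"gte": key, "lt": key + interval}}}

assert histogram_bucket_filter("rank", 20, interval) == {
    "range": {"rank": {"gte": 20, "lt": 30}}
}
```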

class pandagg.agg.DateHistogram(name, field, interval=None, calendar_interval=None, fixed_interval=None, meta=None, keyed=False, key_as_string=True, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.MultipleBucketAgg

KEY = 'date_histogram'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['date']
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.
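The signature exposes both calendar_interval and fixed_interval because recent Elasticsearch versions split the deprecated interval parameter in two. A raw-dict sketch of the serialized body (field name illustrative):

```python
# calendar_interval takes calendar-aware units (e.g. "1M", "1y");
# fixed_interval takes fixed durations (e.g. "30d", "12h").
date_hist = {
    "per_month": {
        "date_histogram": {
            "field": "release_date",
            "calendar_interval": "1M",
            "keyed": False,
        }
    }
}
assert date_hist["per_month"]["date_histogram"]["calendar_interval"] == "1M"
```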

class pandagg.agg.Range(name, field, ranges, keyed=False, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.MultipleBucketAgg

KEY = 'range'
KEY_SEP = '-'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
from_key
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.

to_key
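A range bucket's key combines its from and to bounds (KEY_SEP = '-' suggests keys like "2-5"), and the matching document filter is a half-open range query on the same bounds. A raw-dict sketch (field names illustrative):

```python
ranges = [{"to": 2}, {"from": 2, "to": 5}, {"from": 5}]
range_agg = {"rank_ranges": {"range": {"field": "rank", "ranges": ranges}}}

# The filter for a bucket reuses its bounds: "from" is inclusive,
# "to" is exclusive, and either bound may be absent.
def range_bucket_filter(field, bucket_range):
    clause = {}
    if "from" in bucket_range:
        clause["gte"] = bucket_range["from"]
    if "to" in bucket_range:
        clause["lt"] = bucket_range["to"]
    return {"range": {field: clause}}

assert range_bucket_filter("rank", ranges[1]) == {
    "range": {"rank": {"gte": 2, "lt": 5}}
}
```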
class pandagg.agg.Global(name, meta=None, aggs=None)[source]

Bases: pandagg.node.agg.abstract.UniqueBucketAgg

KEY = 'global'
VALUE_ATTRS = ['doc_count']
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.

class pandagg.agg.Filter(name, filter, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.UniqueBucketAgg

KEY = 'filter'
VALUE_ATTRS = ['doc_count']
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.

class pandagg.agg.Nested(name, path, meta=None, aggs=None)[source]

Bases: pandagg.node.agg.abstract.UniqueBucketAgg

KEY = 'nested'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['nested']
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.
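A nested aggregation scopes its sub-aggregations to documents of a nested field path, and the equivalent document-level filter wraps a query in a nested clause. A raw-dict sketch (path and field names illustrative):

```python
# Nested aggregation body: sub-aggregations run on nested "roles" docs.
nested_agg = {
    "roles": {
        "nested": {"path": "roles"},
        "aggs": {"avg_salary": {"avg": {"field": "roles.salary"}}},
    }
}

# The document-level counterpart wraps a query in a nested clause.
def nested_filter(path, query):
    return {"nested": {"path": path, "query": query}}

f = nested_filter("roles", {"term": {"roles.name": "actor"}})
assert f["nested"]["path"] == "roles"
```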

class pandagg.agg.ReverseNested(name, path=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.UniqueBucketAgg

KEY = 'reverse_nested'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['nested']
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.

class pandagg.agg.Avg(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'avg'
VALUE_ATTRS = ['value']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.agg.Max(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'max'
VALUE_ATTRS = ['value']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.agg.Sum(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'sum'
VALUE_ATTRS = ['value']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.agg.Min(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'min'
VALUE_ATTRS = ['value']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.agg.Cardinality(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'cardinality'
VALUE_ATTRS = ['value']
class pandagg.agg.Stats(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'stats'
VALUE_ATTRS = ['count', 'min', 'max', 'avg', 'sum']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.agg.ExtendedStats(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'extended_stats'
VALUE_ATTRS = ['count', 'min', 'max', 'avg', 'sum', 'sum_of_squares', 'variance', 'std_deviation', 'std_deviation_bounds']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.agg.Percentiles(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

The percents body argument can be passed to specify which percentiles to fetch.

KEY = 'percentiles'
VALUE_ATTRS = ['values']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
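A raw-dict sketch of a percentiles body using the percents argument (field name illustrative); results come back under "values", which matches VALUE_ATTRS:

```python
# Restrict the computed percentiles to the quartiles.
percentiles_agg = {
    "rank_percentiles": {
        "percentiles": {"field": "rank", "percents": [25, 50, 75]}
    }
}
assert percentiles_agg["rank_percentiles"]["percentiles"]["percents"] == [25, 50, 75]
```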
class pandagg.agg.PercentileRanks(name, field, values, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'percentile_ranks'
VALUE_ATTRS = ['values']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.agg.GeoBound(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'geo_bounds'
VALUE_ATTRS = ['bounds']
WHITELISTED_MAPPING_TYPES = ['geo_point']
class pandagg.agg.GeoCentroid(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'geo_centroid'
VALUE_ATTRS = ['location']
WHITELISTED_MAPPING_TYPES = ['geo_point']
class pandagg.agg.TopHits(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.MetricAgg

KEY = 'top_hits'
VALUE_ATTRS = ['hits']
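Unlike the numeric metrics, top_hits returns actual documents per bucket, exposed under "hits" (hence VALUE_ATTRS). A raw-dict sketch of a typical body (field names illustrative):

```python
# Keep the single most recent document of each bucket.
top_hits_agg = {
    "latest": {
        "top_hits": {
            "size": 1,
            "sort": [{"release_date": {"order": "desc"}}],
            "_source": ["title", "release_date"],
        }
    }
}
assert top_hits_agg["latest"]["top_hits"]["size"] == 1
```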
class pandagg.agg.ValueCount(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

BLACKLISTED_MAPPING_TYPES = []
KEY = 'value_count'
VALUE_ATTRS = ['value']
class pandagg.agg.AvgBucket(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'avg_bucket'
VALUE_ATTRS = ['value']
class pandagg.agg.Derivative(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'derivative'
VALUE_ATTRS = ['value']
class pandagg.agg.MaxBucket(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'max_bucket'
VALUE_ATTRS = ['value']
class pandagg.agg.MinBucket(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'min_bucket'
VALUE_ATTRS = ['value']
class pandagg.agg.SumBucket(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'sum_bucket'
VALUE_ATTRS = ['value']
class pandagg.agg.StatsBucket(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'stats_bucket'
VALUE_ATTRS = ['count', 'min', 'max', 'avg', 'sum']
class pandagg.agg.ExtendedStatsBucket(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'extended_stats_bucket'
VALUE_ATTRS = ['count', 'min', 'max', 'avg', 'sum', 'sum_of_squares', 'variance', 'std_deviation', 'std_deviation_bounds']
class pandagg.agg.PercentilesBucket(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'percentiles_bucket'
VALUE_ATTRS = ['values']
class pandagg.agg.MovingAvg(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'moving_avg'
VALUE_ATTRS = ['value']
class pandagg.agg.CumulativeSum(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'cumulative_sum'
VALUE_ATTRS = ['value']
class pandagg.agg.BucketScript(name, script, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.ScriptPipeline

KEY = 'bucket_script'
VALUE_ATTRS = ['value']
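bucket_script combines sibling metric values via a script: buckets_path maps script variables to sibling aggregation names. A raw-dict sketch of the serialized body (aggregation names are illustrative):

```python
# Compute a per-bucket ratio from two sibling metrics.
bucket_script_agg = {
    "error_ratio": {
        "bucket_script": {
            # script variables -> sibling metric aggregation names
            "buckets_path": {"num": "error_count", "den": "total_count"},
            "script": "params.num / params.den",
            "gap_policy": "insert_zeros",
        }
    }
}
assert set(bucket_script_agg["error_ratio"]["bucket_script"]["buckets_path"]) == {"num", "den"}
```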
class pandagg.agg.BucketSelector(name, script, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.ScriptPipeline

KEY = 'bucket_selector'
VALUE_ATTRS = None
class pandagg.agg.BucketSort(name, script, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.ScriptPipeline

KEY = 'bucket_sort'
VALUE_ATTRS = None
class pandagg.agg.SerialDiff(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'serial_diff'
VALUE_ATTRS = ['value']