pandagg.agg module

class pandagg.agg.Agg(from_=None, mapping=None, identifier=None, client=None, query=None, index_name=None)[source]

Bases: pandagg.tree._tree.Tree

Tree combination of aggregation nodes.

Mapping declaration is optional; providing one allows the aggregation tree to be validated against it.

DEFAULT_OUTPUT = 'dataframe'
add_node(node, pid=None)[source]

If a mapping is provided, nested and outnested clauses are applied automatically.

agg(arg, insert_below=None, **kwargs)[source]

Arrange the aggregations passed in the arg argument “horizontally” (as siblings).

They are placed under the insert_below aggregation clause id if provided, else under the deepest linear bucket aggregation if there is no ambiguity:

OK: A ──> B ──> C ──> arg

KO: A ──> B
      └──> C

The arg argument accepts a single occurrence or a sequence of the following formats:
  • string (concise declaration of a terms aggregation)
  • regular Elasticsearch dict syntax
  • AggNode instance (for instance Terms, Filters, etc.)

Parameters:
  • arg – aggregation clause(s) to insert “horizontally”
  • insert_below – parent aggregation id under which these aggregations should be placed
  • kwargs – agg body arguments when using “string” syntax for terms aggregation
Return type:

pandagg.agg.Agg
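To picture the “horizontal” arrangement, here is a minimal sketch using plain dicts in regular Elasticsearch syntax (no pandagg required; the clause names and fields are illustrative, not taken from the library):

```python
# Illustrative Elasticsearch body: two metric clauses inserted
# "horizontally" (as siblings) below an existing terms bucket.
parent = {"genres": {"terms": {"field": "genre", "size": 10}, "aggs": {}}}

# Each inserted clause becomes one entry of the parent's "aggs" dict.
siblings = {
    "avg_rank": {"avg": {"field": "rank"}},
    "avg_nb_roles": {"avg": {"field": "nb_roles"}},
}
parent["genres"]["aggs"].update(siblings)

assert set(parent["genres"]["aggs"]) == {"avg_rank", "avg_nb_roles"}
```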

applied_nested_path_at_node(nid)[source]
bind(client, index_name=None)[source]
deepest_linear_bucket_agg

Return the deepest bucket aggregation node (pandagg.node.agg.abstract.BucketAggNode) of this aggregation that neither has siblings nor has an ancestor with siblings.

classmethod deserialize(from_)[source]
execute(index=None, output='dataframe', **kwargs)[source]
groupby(by, insert_below=None, insert_above=None, **kwargs)[source]

Arrange the aggregations passed in the by argument “vertically” (in a nested manner), above or below another aggregation clause.

Given the initial aggregation:
A ──> B
  └──> C

If insert_below = ‘A’:
A ──> by ──> B
        └──> C

If insert_above = ‘B’:
A ──> by ──> B
  └──> C

The by argument accepts a single occurrence or a sequence of the following formats:
  • string (concise declaration of a terms aggregation)
  • regular Elasticsearch dict syntax
  • AggNode instance (for instance Terms, Filters, etc.)

If neither insert_below nor insert_above is provided, by is placed between the deepest linear bucket aggregation (if there is no ambiguity) and its children:

A ──> B ──> C : OK, generates A ──> B ──> C ──> by

A ──> B
  └──> C : KO, ambiguous: you must specify either A, B or C

Parameters:
  • by – aggregation clause(s) to insert “vertically”
  • insert_below – parent aggregation id under which these aggregations should be placed
  • insert_above – aggregation id above which these aggregations should be placed
  • kwargs – agg body arguments when using “string” syntax for terms aggregation
Return type:

pandagg.agg.Agg
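The “vertical” arrangement amounts to each grouping level wrapping the next one under an aggs key in the serialized Elasticsearch body. A hypothetical helper (not part of pandagg) sketching that nesting with plain dicts:

```python
def nest(levels):
    """Nest named agg bodies "vertically": each (name, body) pair in
    `levels` wraps the following one under its "aggs" key."""
    result = {}
    current = result
    for name, body in levels:
        current[name] = dict(body)
        nxt = {}
        current[name]["aggs"] = nxt
        current = nxt
    return result

tree = nest([
    ("genres", {"terms": {"field": "genre"}}),
    ("decade", {"terms": {"field": "decade"}}),
])
# "decade" buckets are computed inside each "genres" bucket.
assert "decade" in tree["genres"]["aggs"]
```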

node_class

alias of pandagg.node.agg.abstract.AggNode

paste(nid, new_tree, deep=False)[source]

Paste a tree, handling nested implications if a mapping is provided. The provided tree should be validated beforehand.

query(query, validate=False, **kwargs)[source]
query_dict(from_=None, depth=None, with_name=True)[source]
serialize_response(aggs, output, **kwargs)[source]
set_mapping(mapping)[source]
validate_tree(exc=False)[source]

Validate the tree definition against the defined mapping.

Parameters:
  • exc – if set to True, raise an exception if the tree is invalid
Returns:
  boolean

class pandagg.agg.MatchAll(name, meta=None, aggs=None)[source]

Bases: pandagg.node.agg.bucket.Filter

class pandagg.agg.Terms(name, field, missing=None, size=None, aggs=None, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.MultipleBucketAgg

Terms aggregation.

BLACKLISTED_MAPPING_TYPES = []
KEY = 'terms'
VALUE_ATTRS = ['doc_count', 'doc_count_error_upper_bound', 'sum_other_doc_count']
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.
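In plain Elasticsearch syntax, a terms aggregation and the bucket filter its get_filter corresponds to can be sketched as follows (field and clause names are illustrative; this uses raw dicts, not pandagg calls):

```python
# Terms aggregation body, as serialized to Elasticsearch.
terms_agg = {
    "genres": {"terms": {"field": "genre", "size": 10, "missing": "unknown"}}
}

# For a bucket key such as "Drama", the filter selecting the documents
# of that bucket is a term query on the same field.
def terms_bucket_filter(field, key):
    return {"term": {field: key}}

assert terms_bucket_filter("genre", "Drama") == {"term": {"genre": "Drama"}}
```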

class pandagg.agg.Filters(name, filters, other_bucket=False, other_bucket_key=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.MultipleBucketAgg

DEFAULT_OTHER_KEY = '_other_'
IMPLICIT_KEYED = True
KEY = 'filters'
VALUE_ATTRS = ['doc_count']
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.
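Since IMPLICIT_KEYED is True, filters are declared as a keyed mapping; the bucket filter for a given key is then simply the named query itself. A sketch in raw Elasticsearch syntax (names illustrative):

```python
# A keyed "filters" aggregation: each named entry is an arbitrary query.
filters_agg = {
    "by_status": {
        "filters": {
            "filters": {
                "active": {"term": {"status": "active"}},
                "archived": {"term": {"status": "archived"}},
            },
            # documents matching no filter land under this key
            # (mirrors DEFAULT_OTHER_KEY)
            "other_bucket_key": "_other_",
        }
    }
}

# The filter for a given bucket key is the named query itself.
bucket_filter = filters_agg["by_status"]["filters"]["filters"]["active"]
assert bucket_filter == {"term": {"status": "active"}}
```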

class pandagg.agg.Histogram(name, field, interval, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.MultipleBucketAgg

KEY = 'histogram'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.
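A histogram bucket keyed by k covers the half-open interval [k, k + interval), so its document filter is a range query on the same field. A raw-dict sketch (field names illustrative):

```python
interval = 10
hist_agg = {"rank_hist": {"histogram": {"field": "rank", "interval": interval}}}

# Documents of the bucket keyed by `key` fall in [key, key + interval).
def histogram_bucket_filter(field, key, interval):
    return {"range": {field: {"gte": key, "lt": key + interval}}}

assert histogram_bucket_filter("rank", 20, interval) == {
    "range": {"rank": {"gte": 20, "lt": 30}}
}
```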

class pandagg.agg.DateHistogram(name, field, interval=None, calendar_interval=None, fixed_interval=None, meta=None, keyed=False, key_as_string=True, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.MultipleBucketAgg

KEY = 'date_histogram'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['date']
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.
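The signature exposes both calendar_interval and fixed_interval because recent Elasticsearch versions split the deprecated interval parameter in two. A raw-dict sketch of the serialized body (field name illustrative):

```python
# calendar_interval takes calendar-aware units (e.g. "1M", "1y");
# fixed_interval takes fixed durations (e.g. "30d", "12h").
date_hist = {
    "per_month": {
        "date_histogram": {
            "field": "release_date",
            "calendar_interval": "1M",
            "keyed": False,
        }
    }
}
assert date_hist["per_month"]["date_histogram"]["calendar_interval"] == "1M"
```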

class pandagg.agg.Range(name, field, ranges, keyed=False, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.MultipleBucketAgg

KEY = 'range'
KEY_SEP = '-'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
from_key
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.

to_key
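A range bucket's key combines its from and to bounds (KEY_SEP = '-' suggests keys like "2-5"), and the matching document filter is a half-open range query on the same bounds. A raw-dict sketch (field names illustrative):

```python
ranges = [{"to": 2}, {"from": 2, "to": 5}, {"from": 5}]
range_agg = {"rank_ranges": {"range": {"field": "rank", "ranges": ranges}}}

# The filter for a bucket reuses its bounds: "from" is inclusive,
# "to" is exclusive, and either bound may be absent.
def range_bucket_filter(field, bucket_range):
    clause = {}
    if "from" in bucket_range:
        clause["gte"] = bucket_range["from"]
    if "to" in bucket_range:
        clause["lt"] = bucket_range["to"]
    return {"range": {field: clause}}

assert range_bucket_filter("rank", ranges[1]) == {
    "range": {"rank": {"gte": 2, "lt": 5}}
}
```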
class pandagg.agg.Global(name, meta=None, aggs=None)[source]

Bases: pandagg.node.agg.abstract.UniqueBucketAgg

KEY = 'global'
VALUE_ATTRS = ['doc_count']
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.

class pandagg.agg.Filter(name, filter, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.UniqueBucketAgg

KEY = 'filter'
VALUE_ATTRS = ['doc_count']
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.

class pandagg.agg.Nested(name, path, meta=None, aggs=None)[source]

Bases: pandagg.node.agg.abstract.UniqueBucketAgg

KEY = 'nested'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['nested']
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.
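A nested aggregation scopes its sub-aggregations to documents of a nested field path, and the equivalent document-level filter wraps a query in a nested clause. A raw-dict sketch (path and field names illustrative):

```python
# Nested aggregation body: sub-aggregations run on nested "roles" docs.
nested_agg = {
    "roles": {
        "nested": {"path": "roles"},
        "aggs": {"avg_salary": {"avg": {"field": "roles.salary"}}},
    }
}

# The document-level counterpart wraps a query in a nested clause.
def nested_filter(path, query):
    return {"nested": {"path": path, "query": query}}

f = nested_filter("roles", {"term": {"roles.name": "actor"}})
assert f["nested"]["path"] == "roles"
```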

class pandagg.agg.ReverseNested(name, path=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.UniqueBucketAgg

KEY = 'reverse_nested'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['nested']
get_filter(key)[source]

Provide the filter to get documents belonging to the bucket of the given key.

class pandagg.agg.Avg(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'avg'
VALUE_ATTRS = ['value']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.agg.Max(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'max'
VALUE_ATTRS = ['value']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.agg.Sum(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'sum'
VALUE_ATTRS = ['value']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.agg.Min(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'min'
VALUE_ATTRS = ['value']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.agg.Cardinality(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'cardinality'
VALUE_ATTRS = ['value']
class pandagg.agg.Stats(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'stats'
VALUE_ATTRS = ['count', 'min', 'max', 'avg', 'sum']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.agg.ExtendedStats(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'extended_stats'
VALUE_ATTRS = ['count', 'min', 'max', 'avg', 'sum', 'sum_of_squares', 'variance', 'std_deviation', 'std_deviation_bounds']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.agg.Percentiles(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

The percents body argument can be passed to specify which percentiles to fetch.

KEY = 'percentiles'
VALUE_ATTRS = ['values']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
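A raw-dict sketch of a percentiles body using the percents argument (field name illustrative); results come back under "values", which matches VALUE_ATTRS:

```python
# Restrict the computed percentiles to the quartiles.
percentiles_agg = {
    "rank_percentiles": {
        "percentiles": {"field": "rank", "percents": [25, 50, 75]}
    }
}
assert percentiles_agg["rank_percentiles"]["percentiles"]["percents"] == [25, 50, 75]
```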
class pandagg.agg.PercentileRanks(name, field, values, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'percentile_ranks'
VALUE_ATTRS = ['values']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.agg.GeoBound(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'geo_bounds'
VALUE_ATTRS = ['bounds']
WHITELISTED_MAPPING_TYPES = ['geo_point']
class pandagg.agg.GeoCentroid(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

KEY = 'geo_centroid'
VALUE_ATTRS = ['location']
WHITELISTED_MAPPING_TYPES = ['geo_point']
class pandagg.agg.TopHits(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.MetricAgg

KEY = 'top_hits'
VALUE_ATTRS = ['hits']
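Unlike the numeric metrics, top_hits returns actual documents per bucket, exposed under "hits" (hence VALUE_ATTRS). A raw-dict sketch of a typical body (field names illustrative):

```python
# Keep the single most recent document of each bucket.
top_hits_agg = {
    "latest": {
        "top_hits": {
            "size": 1,
            "sort": [{"release_date": {"order": "desc"}}],
            "_source": ["title", "release_date"],
        }
    }
}
assert top_hits_agg["latest"]["top_hits"]["size"] == 1
```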
class pandagg.agg.ValueCount(name, meta=None, **body)[source]

Bases: pandagg.node.agg.abstract.FieldOrScriptMetricAgg

BLACKLISTED_MAPPING_TYPES = []
KEY = 'value_count'
VALUE_ATTRS = ['value']
class pandagg.agg.AvgBucket(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'avg_bucket'
VALUE_ATTRS = ['value']
class pandagg.agg.Derivative(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'derivative'
VALUE_ATTRS = ['value']
class pandagg.agg.MaxBucket(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'max_bucket'
VALUE_ATTRS = ['value']
class pandagg.agg.MinBucket(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'min_bucket'
VALUE_ATTRS = ['value']
class pandagg.agg.SumBucket(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'sum_bucket'
VALUE_ATTRS = ['value']
class pandagg.agg.StatsBucket(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'stats_bucket'
VALUE_ATTRS = ['count', 'min', 'max', 'avg', 'sum']
class pandagg.agg.ExtendedStatsBucket(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'extended_stats_bucket'
VALUE_ATTRS = ['count', 'min', 'max', 'avg', 'sum', 'sum_of_squares', 'variance', 'std_deviation', 'std_deviation_bounds']
class pandagg.agg.PercentilesBucket(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'percentiles_bucket'
VALUE_ATTRS = ['values']
class pandagg.agg.MovingAvg(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'moving_avg'
VALUE_ATTRS = ['value']
class pandagg.agg.CumulativeSum(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'cumulative_sum'
VALUE_ATTRS = ['value']
class pandagg.agg.BucketScript(name, script, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.ScriptPipeline

KEY = 'bucket_script'
VALUE_ATTRS = ['value']
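bucket_script combines sibling metric values via a script: buckets_path maps script variables to sibling aggregation names. A raw-dict sketch of the serialized body (aggregation names are illustrative):

```python
# Compute a per-bucket ratio from two sibling metrics.
bucket_script_agg = {
    "error_ratio": {
        "bucket_script": {
            # script variables -> sibling metric aggregation names
            "buckets_path": {"num": "error_count", "den": "total_count"},
            "script": "params.num / params.den",
            "gap_policy": "insert_zeros",
        }
    }
}
assert set(bucket_script_agg["error_ratio"]["bucket_script"]["buckets_path"]) == {"num", "den"}
```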
class pandagg.agg.BucketSelector(name, script, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.ScriptPipeline

KEY = 'bucket_selector'
VALUE_ATTRS = None
class pandagg.agg.BucketSort(name, script, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.ScriptPipeline

KEY = 'bucket_sort'
VALUE_ATTRS = None
class pandagg.agg.SerialDiff(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]

Bases: pandagg.node.agg.abstract.Pipeline

KEY = 'serial_diff'
VALUE_ATTRS = ['value']