pandagg.agg module¶
-
class
pandagg.agg.
Agg
(from_=None, mapping=None, identifier=None, client=None, query=None, index_name=None)[source]¶ Bases:
pandagg.tree._tree.Tree
Tree combination of aggregation nodes.
Declaring a mapping is optional, but when one is provided it is used to validate the aggregations.
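As an illustration of what mapping-based validation could look like, the sketch below checks a field's mapping type against the WHITELISTED_MAPPING_TYPES and BLACKLISTED_MAPPING_TYPES values listed for the node classes on this page. The helper `is_type_allowed` and the rule "None means no restriction" are assumptions made for this example. They are not taken from pandagg's source.

```python
def is_type_allowed(field_type, whitelist=None, blacklist=None):
    """Return True if a field of mapping type `field_type` may be aggregated.

    Assumption for this sketch: None means "no restriction" for either list.
    """
    if whitelist is not None and field_type not in whitelist:
        return False
    if blacklist is not None and field_type in blacklist:
        return False
    return True


# Values copied from the class attributes documented on this page.
DATE_HISTOGRAM_WHITELIST = ["date"]  # DateHistogram.WHITELISTED_MAPPING_TYPES
TERMS_BLACKLIST = []                 # Terms.BLACKLISTED_MAPPING_TYPES

print(is_type_allowed("date", whitelist=DATE_HISTOGRAM_WHITELIST))     # True
print(is_type_allowed("keyword", whitelist=DATE_HISTOGRAM_WHITELIST))  # False
print(is_type_allowed("keyword", blacklist=TERMS_BLACKLIST))           # True
```

Under this reading, an empty blacklist (as on Terms and ValueCount) rejects nothing, while a whitelist restricts the node to the listed types.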
-
DEFAULT_OUTPUT
= 'dataframe'¶
-
add_node
(node, pid=None)[source]¶ If a mapping is provided, nested and outnested are applied automatically.
-
agg
(arg, insert_below=None, **kwargs)[source]¶ Arrange passed aggregations in arg arguments “horizontally”.
Those will be placed under the insert_below aggregation clause id if provided, else under the deepest linear bucket aggregation if there is no ambiguity:
OK: A──> B ─> C ─> arg
KO: A──> B
    └──> C
arg argument accepts single occurrence or sequence of following formats:
- string (for terms agg concise declaration)
- regular Elasticsearch dict syntax
- AggNode instance (for instance Terms, Filters etc)
Parameters: - arg – aggregation(s) clauses to insert “horizontally”
- insert_below – parent aggregation id under which these aggregations should be placed
- kwargs – agg body arguments when using “string” syntax for terms aggregation
Return type:
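To make the “horizontal” arrangement concrete, here is a minimal sketch using only plain Elasticsearch dict syntax. It does not call pandagg. The helper `insert_horizontally` and the field names (`genre`, `rating`) are hypothetical. The sketch only shows the shape of the request body when several clauses sit side by side under the same parent.

```python
import json


def insert_horizontally(parent_clause, new_aggs):
    """Add each clause in `new_aggs` as a direct child of `parent_clause`.

    Hypothetical helper: all inserted clauses end up as siblings.
    """
    children = parent_clause.setdefault("aggs", {})
    for name, clause in new_aggs.items():
        children[name] = clause
    return parent_clause


# Regular Elasticsearch dict syntax for a terms aggregation (made-up field names).
by_genre = {"terms": {"field": "genre", "size": 5}}

insert_horizontally(
    by_genre,
    {
        "avg_rating": {"avg": {"field": "rating"}},
        "max_rating": {"max": {"field": "rating"}},
    },
)

body = {"aggs": {"by_genre": by_genre}}
print(json.dumps(body, indent=2))
```

Both metric clauses land at the same depth under `by_genre`, which is the sibling layout the method description calls horizontal.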
-
deepest_linear_bucket_agg
¶ Return the deepest bucket aggregation node (pandagg.nodes.abstract.BucketAggNode) of this aggregation that has no siblings and no ancestor with siblings.
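The definition above can be sketched on a toy tree. This is not pandagg's implementation: the function `deepest_linear_node` and the `children` dictionary are made up for the example, and every node is assumed to be a bucket aggregation.

```python
def deepest_linear_node(children, root):
    """Follow the chain from `root` while the current node has exactly one child.

    `children` maps a node name to the list of its child names.
    Assumption for this sketch: every node is a bucket aggregation.
    """
    current = root
    while len(children.get(current, [])) == 1:
        current = children[current][0]
    return current


linear = {"A": ["B"], "B": ["C"]}        # A──> B ─> C
forked = {"A": ["B", "C"]}               # A has two children, B and C
late_fork = {"A": ["B"], "B": ["C", "D"]}

print(deepest_linear_node(linear, "A"))     # C
print(deepest_linear_node(forked, "A"))     # A
print(deepest_linear_node(late_fork, "A"))  # B
```

The walk stops as soon as a node has zero or several children, so the returned node never has siblings and neither does any of its ancestors.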
-
groupby
(by, insert_below=None, insert_above=None, **kwargs)[source]¶ Arrange passed aggregations in by arguments “vertically” (nested manner), above or below another agg clause.
Given the initial aggregation:
A──> B
└──> C
If insert_below = ‘A’:
A──> by──> B
      └──> C
If insert_above = ‘B’:
A──> by──> B
└──> C
The by argument accepts a single occurrence or a sequence of the following formats:
- string (for terms agg concise declaration)
- regular Elasticsearch dict syntax
- AggNode instance (for instance Terms, Filters etc)
If neither insert_below nor insert_above is provided, by will be placed between the deepest linear bucket aggregation, if there is no ambiguity, and its children:
A──> B : OK, generates A──> B ─> C ─> by
A──> B : KO, ambiguous, must specify either A, B or C
└──> C
Parameters: - by – aggregation(s) clauses to insert “vertically”
- insert_below – parent aggregation id under which these aggregations should be placed
- insert_above – aggregation id above which these aggregations should be placed
- kwargs – agg body arguments when using “string” syntax for terms aggregation
Return type:
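The “vertical” arrangement corresponds to nested `aggs` keys in an Elasticsearch request body. The sketch below builds such a body from plain dicts without using pandagg. The helper `nest_vertically` and the field names (`genre`, `released`) are hypothetical.

```python
def nest_vertically(clauses):
    """Nest (name, clause) pairs so that each one becomes the parent of the next.

    Hypothetical helper: the first pair is the outermost aggregation.
    """
    nested = {}
    for name, clause in reversed(clauses):
        node = dict(clause)  # shallow copy, leaves the input untouched
        if nested:
            node["aggs"] = nested
        nested = {name: node}
    return {"aggs": nested}


body = nest_vertically(
    [
        ("by_genre", {"terms": {"field": "genre"}}),
        ("by_year", {"date_histogram": {"field": "released", "calendar_interval": "year"}}),
    ]
)
print(body)
```

Here `by_year` is nested inside `by_genre`, so each genre bucket is split again by year. This is the layout the method description calls vertical.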
-
node_class
¶ alias of
pandagg.node.agg.abstract.AggNode
-
-
class
pandagg.agg.
Terms
(name, field, missing=None, size=None, aggs=None, meta=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.MultipleBucketAgg
Terms aggregation.
-
BLACKLISTED_MAPPING_TYPES
= []¶
-
KEY
= 'terms'¶
-
VALUE_ATTRS
= ['doc_count', 'doc_count_error_upper_bound', 'sum_other_doc_count']¶
-
-
class
pandagg.agg.
Filters
(name, filters, other_bucket=False, other_bucket_key=None, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.MultipleBucketAgg
-
DEFAULT_OTHER_KEY
= '_other_'¶
-
IMPLICIT_KEYED
= True¶
-
KEY
= 'filters'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
-
class
pandagg.agg.
Histogram
(name, field, interval, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.MultipleBucketAgg
-
KEY
= 'histogram'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.agg.
DateHistogram
(name, field, interval=None, calendar_interval=None, fixed_interval=None, meta=None, keyed=False, key_as_string=True, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.MultipleBucketAgg
-
KEY
= 'date_histogram'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
WHITELISTED_MAPPING_TYPES
= ['date']¶
-
-
class
pandagg.agg.
Range
(name, field, ranges, keyed=False, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.MultipleBucketAgg
-
KEY
= 'range'¶
-
KEY_SEP
= '-'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
from_key
¶
-
to_key
¶
-
-
class
pandagg.agg.
Global
(name, meta=None, aggs=None)[source]¶ Bases:
pandagg.node.agg.abstract.UniqueBucketAgg
-
KEY
= 'global'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
-
class
pandagg.agg.
Filter
(name, filter, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.UniqueBucketAgg
-
KEY
= 'filter'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
-
class
pandagg.agg.
Nested
(name, path, meta=None, aggs=None)[source]¶ Bases:
pandagg.node.agg.abstract.UniqueBucketAgg
-
KEY
= 'nested'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
WHITELISTED_MAPPING_TYPES
= ['nested']¶
-
-
class
pandagg.agg.
ReverseNested
(name, path=None, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.UniqueBucketAgg
-
KEY
= 'reverse_nested'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
WHITELISTED_MAPPING_TYPES
= ['nested']¶
-
-
class
pandagg.agg.
Avg
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.FieldOrScriptMetricAgg
-
KEY
= 'avg'¶
-
VALUE_ATTRS
= ['value']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.agg.
Max
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.FieldOrScriptMetricAgg
-
KEY
= 'max'¶
-
VALUE_ATTRS
= ['value']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.agg.
Sum
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.FieldOrScriptMetricAgg
-
KEY
= 'sum'¶
-
VALUE_ATTRS
= ['value']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.agg.
Min
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.FieldOrScriptMetricAgg
-
KEY
= 'min'¶
-
VALUE_ATTRS
= ['value']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.agg.
Cardinality
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.FieldOrScriptMetricAgg
-
KEY
= 'cardinality'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.agg.
Stats
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.FieldOrScriptMetricAgg
-
KEY
= 'stats'¶
-
VALUE_ATTRS
= ['count', 'min', 'max', 'avg', 'sum']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.agg.
ExtendedStats
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.FieldOrScriptMetricAgg
-
KEY
= 'extended_stats'¶
-
VALUE_ATTRS
= ['count', 'min', 'max', 'avg', 'sum', 'sum_of_squares', 'variance', 'std_deviation', 'std_deviation_bounds']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.agg.
Percentiles
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.FieldOrScriptMetricAgg
The percents body argument can be passed to specify which percentiles to fetch.
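For reference, this is what a percentiles clause with an explicit `percents` list looks like in regular Elasticsearch dict syntax. The aggregation name `load_time_pcts` and the field `load_time` are made up. Passing `percents` as a keyword argument to Percentiles presumably ends up in this position through `**body`, but that is an assumption based on the signature above.

```python
# Plain Elasticsearch request fragment; names are illustrative only.
percentiles_clause = {
    "load_time_pcts": {
        "percentiles": {
            "field": "load_time",
            "percents": [50, 95, 99],  # which percentiles to fetch
        }
    }
}

request_body = {"size": 0, "aggs": percentiles_clause}
print(request_body["aggs"]["load_time_pcts"]["percentiles"]["percents"])  # [50, 95, 99]
```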
-
KEY
= 'percentiles'¶
-
VALUE_ATTRS
= ['values']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.agg.
PercentileRanks
(name, field, values, meta=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.FieldOrScriptMetricAgg
-
KEY
= 'percentile_ranks'¶
-
VALUE_ATTRS
= ['values']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.agg.
GeoBound
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.FieldOrScriptMetricAgg
-
KEY
= 'geo_bounds'¶
-
VALUE_ATTRS
= ['bounds']¶
-
WHITELISTED_MAPPING_TYPES
= ['geo_point']¶
-
-
class
pandagg.agg.
GeoCentroid
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.FieldOrScriptMetricAgg
-
KEY
= 'geo_centroid'¶
-
VALUE_ATTRS
= ['location']¶
-
WHITELISTED_MAPPING_TYPES
= ['geo_point']¶
-
-
class
pandagg.agg.
TopHits
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.MetricAgg
-
KEY
= 'top_hits'¶
-
VALUE_ATTRS
= ['hits']¶
-
-
class
pandagg.agg.
ValueCount
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.FieldOrScriptMetricAgg
-
BLACKLISTED_MAPPING_TYPES
= []¶
-
KEY
= 'value_count'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.agg.
AvgBucket
(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.Pipeline
-
KEY
= 'avg_bucket'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.agg.
Derivative
(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.Pipeline
-
KEY
= 'derivative'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.agg.
MaxBucket
(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.Pipeline
-
KEY
= 'max_bucket'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.agg.
MinBucket
(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.Pipeline
-
KEY
= 'min_bucket'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.agg.
SumBucket
(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.Pipeline
-
KEY
= 'sum_bucket'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.agg.
StatsBucket
(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.Pipeline
-
KEY
= 'stats_bucket'¶
-
VALUE_ATTRS
= ['count', 'min', 'max', 'avg', 'sum']¶
-
-
class
pandagg.agg.
ExtendedStatsBucket
(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.Pipeline
-
KEY
= 'extended_stats_bucket'¶
-
VALUE_ATTRS
= ['count', 'min', 'max', 'avg', 'sum', 'sum_of_squares', 'variance', 'std_deviation', 'std_deviation_bounds']¶
-
-
class
pandagg.agg.
PercentilesBucket
(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.Pipeline
-
KEY
= 'percentiles_bucket'¶
-
VALUE_ATTRS
= ['values']¶
-
-
class
pandagg.agg.
MovingAvg
(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.Pipeline
-
KEY
= 'moving_avg'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.agg.
CumulativeSum
(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.Pipeline
-
KEY
= 'cumulative_sum'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.agg.
BucketScript
(name, script, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.ScriptPipeline
-
KEY
= 'bucket_script'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.agg.
BucketSelector
(name, script, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.ScriptPipeline
-
KEY
= 'bucket_selector'¶
-
VALUE_ATTRS
= None¶
-
-
class
pandagg.agg.
BucketSort
(name, script, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.ScriptPipeline
-
KEY
= 'bucket_sort'¶
-
VALUE_ATTRS
= None¶
-
-
class
pandagg.agg.
SerialDiff
(name, buckets_path, gap_policy=None, meta=None, aggs=None, **body)[source]¶ Bases:
pandagg.node.agg.abstract.Pipeline
-
KEY
= 'serial_diff'¶
-
VALUE_ATTRS
= ['value']¶
-