pandagg¶
Principles¶
This library focuses on two principles:
- stick to the tree structure of Elasticsearch objects
- provide simple and flexible interfaces that are easy and intuitive to use, especially in an interactive context
Elasticsearch tree structures¶
Many Elasticsearch objects have a tree structure, i.e. they are built from a hierarchy of nodes:
- a mapping (tree) is a hierarchy of fields (nodes)
- a query (tree) is a hierarchy of query clauses (nodes)
- an aggregation (tree) is a hierarchy of aggregation clauses (nodes)
- an aggregation response (tree) is a hierarchy of response buckets (nodes)
This library sticks to that structure by providing a flexible syntax distinguishing trees and nodes: trees all inherit from the lighttree.Tree class, whereas nodes all inherit from the lighttree.Node class.
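For instance, a query tree can be instantiated from its clauses and displayed as a tree (a minimal sketch using the Query class detailed in the User Guide below):
>>> from pandagg.query import Query
>>> q = Query({'bool': {'filter': [{'range': {'year': {'gte': 1990}}}]}})
>>> q.show()
<Query>
bool
└── filter
    └── range, field=year, gte=1990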
Interactive usage¶
pandagg is designed both for “regular” code repository usage and for “interactive” usage (ipython or jupyter notebooks, with autocompletion features inspired by pandas design).
Some classes are intended to be used only in interactive mode (ipython), since their purpose is to provide auto-completion features and convenient representations.
Namely:
IMapping
: used to interactively navigate in mapping and run quick aggregations on some fields
IResponse
: used to interactively navigate in an aggregation response
These use cases will be detailed in the following sections.
User Guide¶
The pandagg library provides interfaces to perform read operations on an Elasticsearch cluster.
Search¶
Search
class is intended to perform requests, and corresponds to the
Elasticsearch search API:
>>> from elasticsearch import Elasticsearch
>>> from pandagg.search import Search
>>>
>>> client = Elasticsearch(hosts=['localhost:9200'])
>>> search = Search(using=client, index='movies')\
>>> .size(2)\
>>> .groupby('decade', 'histogram', interval=10, field='year')\
>>> .groupby('genres', size=3)\
>>> .aggs('avg_rank', 'avg', field='rank')\
>>> .aggs('avg_nb_roles', 'avg', field='nb_roles')\
>>> .filter('range', year={"gte": 1990})
>>> search
{
"query": {
"bool": {
"filter": [
{
"range": {
"year": {
"gte": 1990
}
}
}
]
}
},
"aggs": {
"decade": {
"histogram": {
"field": "year",
"interval": 10
},
"aggs": {
"genres": {
"terms": {
"field": "genres",
"size": 3
},
"aggs": {
"avg_rank": {
"avg": {
"field": "rank"
}
},
"avg_nb_roles": {
"avg": {
"field": "nb_roles"
}
}
}
}
}
}
},
"size": 2
}
It relies on:
Query
to build queries (query or post_filter parts, see Query),
Aggs
to build aggregations (see Aggregation)
Note
All methods described below return a new Search
instance, and leave the initial search request unchanged.
>>> from pandagg.search import Search
>>> initial_s = Search()
>>> enriched_s = initial_s.query('terms', genres=['Comedy', 'Short'])
>>> initial_s.to_dict()
{}
>>> enriched_s.to_dict()
{'query': {'terms': {'genres': ['Comedy', 'Short']}}}
Query part¶
The query and post_filter parts of a Search
instance are available under the _query and _post_filter attributes respectively.
>>> search._query.__class__
pandagg.tree.query.abstract.Query
>>> search._query.show()
<Query>
bool
└── filter
    └── range, field=year, gte=1990
To enrich the query of a search request, the methods are exactly the same as for a
Query
instance.
>>> Search().must_not('range', year={'lt': 1980})
{
"query": {
"bool": {
"must_not": [
{
"range": {
"year": {
"lt": 1980
}
}
}
]
}
}
}
See section Query for more details.
To enrich the post_filter of a search request, use post_filter()
:
>>> Search().post_filter('term', genres='Short')
{
"post_filter": {
"term": {
"genres": {
"value": "Short"
}
}
}
}
Aggregations part¶
The aggregations part of a Search
instance is available under the _aggs attribute.
>>> search._aggs.__class__
pandagg.tree.aggs.aggs.Aggs
>>> search._aggs.show()
<Aggregations>
decade <histogram, field="year", interval=10>
└── genres <terms, field="genres", size=3>
    ├── avg_nb_roles <avg, field="nb_roles">
    └── avg_rank <avg, field="rank">
To enrich the aggregations of a search request, the methods are exactly the same as for an
Aggs
instance.
>>> Search()\
>>> .groupby('decade', 'histogram', interval=10, field='year')\
>>> .aggs('avg_rank', 'avg', field='rank')
{
"aggs": {
"decade": {
"histogram": {
"field": "year",
"interval": 10
},
"aggs": {
"avg_rank": {
"avg": {
"field": "rank"
}
}
}
}
}
}
See section Aggregation for more details.
Other search request parameters¶
size, sources, limit, etc.: all those parameters are documented in the Search
documentation and their usage is quite self-explanatory (see the sketch below).
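A minimal sketch with the size parameter (the output shown is assumed, following the to_dict() examples above):
>>> from pandagg.search import Search
>>> Search().size(2).to_dict()
{'size': 2}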
Request execution¶
To execute a search request, you must first bind it to an Elasticsearch client:
>>> from elasticsearch import Elasticsearch
>>> client = Elasticsearch(hosts=['localhost:9200'])
Either at instantiation:
>>> from pandagg.search import Search
>>> search = Search(using=client, index='movies')
Or with the using()
method:
>>> from pandagg.search import Search
>>> search = Search()\
>>> .using(client=client)\
>>> .index('movies')
Executing a Search
request using execute()
will return a
Response
instance (see more in Response).
>>> response = search.execute()
>>> response
<Response> took 58ms, success: True, total result >=10000, contains 2 hits
>>> response.__class__
pandagg.response.Response
Query¶
The Query
class provides:
- multiple syntaxes to declare and update a query
- query validation (with nested clauses validation)
- ability to insert clauses at specific points
- tree-like visual representation
Declaration¶
From native “dict” query¶
Given the following query:
>>> expected_query = {'bool': {'must': [
>>> {'terms': {'genres': ['Action', 'Thriller']}},
>>> {'range': {'rank': {'gte': 7}}},
>>> {'nested': {
>>> 'path': 'roles',
>>> 'query': {'bool': {'must': [
>>> {'term': {'roles.gender': {'value': 'F'}}},
>>> {'term': {'roles.role': {'value': 'Reporter'}}}]}
>>> }
>>> }}
>>> ]}}
To instantiate Query
, simply pass the “dict” query as an argument:
>>> from pandagg.query import Query
>>> q = Query(expected_query)
A visual representation of the query is available with show()
:
>>> q.show()
<Query>
bool
└── must
    ├── nested, path="roles"
    │   └── query
    │       └── bool
    │           └── must
    │               ├── term, field=roles.gender, value="F"
    │               └── term, field=roles.role, value="Reporter"
    ├── range, field=rank, gte=7
    └── terms, genres=["Action", "Thriller"]
Call to_dict()
to convert it to native dict:
>>> q.to_dict()
{'bool': {
    'must': [
        {'range': {'rank': {'gte': 7}}},
        {'terms': {'genres': ['Action', 'Thriller']}},
        {'nested': {
            'path': 'roles',
            'query': {'bool': {'must': [
                {'term': {'roles.role': {'value': 'Reporter'}}},
                {'term': {'roles.gender': {'value': 'F'}}}]}}}}
    ]
}}
>>> from pandagg.utils import equal_queries
>>> equal_queries(q.to_dict(), expected_query)
True
Note
The equal_queries function doesn't consider the order of clauses in must/should parameters, since it doesn't matter for Elasticsearch execution, i.e.
>>> equal_queries({'must': [A, B]}, {'must': [B, A]})
True
With DSL classes¶
Pandagg provides a DSL to declare this query in a quite similar fashion:
>>> from pandagg.query import Nested, Bool, Range, Term, Terms
>>> q = Bool(must=[
>>> Terms(genres=['Action', 'Thriller']),
>>> Range(rank={"gte": 7}),
>>> Nested(
>>> path='roles',
>>> query=Bool(must=[
>>> Term(roles__gender='F'),
>>> Term(roles__role='Reporter')
>>> ])
>>> )
>>> ])
All these classes inherit from Query
and thus provide the same interface.
>>> from pandagg.query import Query
>>> isinstance(q, Query)
True
With flattened syntax¶
In the flattened syntax, the query clause type is used as first argument:
>>> from pandagg.query import Query
>>> q = Query('terms', genres=['Action', 'Thriller'])
Query enrichment¶
All methods described below return a new Query
instance, and leave the initial query unchanged.
For instance:
>>> from pandagg.query import Query
>>> initial_q = Query()
>>> enriched_q = initial_q.query('terms', genres=['Comedy', 'Short'])
>>> initial_q.to_dict()
None
>>> enriched_q.to_dict()
{'terms': {'genres': ['Comedy', 'Short']}}
Note
Calling to_dict()
on an empty Query returns None
>>> from pandagg.query import Query
>>> Query().to_dict()
None
query() method¶
The base method to enrich a Query
is query()
.
Considering this query:
>>> from pandagg.query import Query
>>> q = Query()
query()
accepts the following syntaxes:
from a dictionary:
>>> q.query({"terms": {"genres": ['Comedy', 'Short']})
flattened syntax:
>>> q.query("terms", genres=['Comedy', 'Short'])
from a Query instance (this includes DSL classes):
>>> from pandagg.query import Terms
>>> q.query(Terms(genres=['Action', 'Thriller']))
Compound clauses specific methods¶
Query
instance also exposes the following methods for specific compound queries:
(TODO: detail allowed syntaxes)
Specific to bool queries: for instance filter() and must_not() (both demonstrated in this guide).
Specific to other compound queries: for instance nested() (demonstrated below); a combined sketch follows.
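A minimal sketch of how some of these methods can be chained, assuming (as stated in the Search section above) that filter() and must_not() behave on Query exactly as they do on Search:
>>> from pandagg.query import Query, Term
>>> q = Query()\
>>> .filter('range', year={"gte": 1990})\
>>> .must_not('range', year={"lte": 1970})\
>>> .nested(path='roles', query=Term('roles.gender', value='F'))
Each call returns a new Query instance, so intermediate queries are left unchanged.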
Inserted clause location¶
With all insertion methods detailed above, the inserted clause is by default placed at the top level of your query, generating a bool clause if necessary.
Considering the following query:
>>> from pandagg.query import Query
>>> q = Query('terms', genres=['Action', 'Thriller'])
>>> q.show()
<Query>
terms, genres=["Action", "Thriller"]
A bool query will be created:
>>> q = q.query('range', rank={"gte": 7})
>>> q.show()
<Query>
bool
└── must
    ├── range, field=rank, gte=7
    └── terms, genres=["Action", "Thriller"]
And reused if necessary:
>>> q = q.must_not('range', year={"lte": 1970})
>>> q.show()
<Query>
bool
├── must
│   ├── range, field=rank, gte=7
│   └── terms, genres=["Action", "Thriller"]
└── must_not
    └── range, field=year, lte=1970
Inserting a clause at a specific location requires naming clauses:
>>> from pandagg.query import Term
>>> q = q.nested(path='roles', _name='nested_roles', query=Term('roles.gender', value='F'))
>>> q.show()
<Query>
bool
├── must
│   ├── nested, _name=nested_roles, path="roles"
│   │   └── query
│   │       └── term, field=roles.gender, value="F"
│   ├── range, field=rank, gte=7
│   └── terms, genres=["Action", "Thriller"]
└── must_not
    └── range, field=year, lte=1970
Doing so allows inserting clauses above or below a given named clause using the parent/child parameters:
>>> q = q.query('term', roles__role='Reporter', parent='nested_roles')
>>> q.show()
<Query>
bool
├── must
│   ├── nested, _name=nested_roles, path="roles"
│   │   └── query
│   │       └── bool
│   │           └── must
│   │               ├── term, field=roles.role, value="Reporter"
│   │               └── term, field=roles.gender, value="F"
│   ├── range, field=rank, gte=7
│   └── terms, genres=["Action", "Thriller"]
└── must_not
    └── range, field=year, lte=1970
TODO: explain parent_param, child_param, mode merging strategies on same named clause etc..
Aggregation¶
The Aggs
class provides:
- multiple syntaxes to declare and update an aggregation
- aggregation clause validation
- ability to insert clauses at specific locations (and not just below the last manipulated clause)
Declaration¶
From native “dict” aggregation¶
Given the following aggregation:
>>> expected_aggs = {
>>> "decade": {
>>> "histogram": {"field": "year", "interval": 10},
>>> "aggs": {
>>> "genres": {
>>> "terms": {"field": "genres", "size": 3},
>>> "aggs": {
>>> "max_nb_roles": {
>>> "max": {"field": "nb_roles"}
>>> },
>>> "avg_rank": {
>>> "avg": {"field": "rank"}
>>> }
>>> }
>>> }
>>> }
>>> }
>>> }
To declare Aggs
, simply pass the “dict” aggregation as an argument:
>>> from pandagg.aggs import Aggs
>>> a = Aggs(expected_aggs)
A visual representation of the aggregation is available with show()
:
>>> a.show()
<Aggregations>
decade <histogram, field="year", interval=10>
└── genres <terms, field="genres", size=3>
    ├── max_nb_roles <max, field="nb_roles">
    └── avg_rank <avg, field="rank">
Call to_dict()
to convert it to native dict:
>>> a.to_dict() == expected_aggs
True
With DSL classes¶
Pandagg provides a DSL to declare this aggregation in a quite similar fashion:
>>> from pandagg.aggs import Histogram, Terms, Max, Avg
>>>
>>> a = Histogram("decade", field='year', interval=10, aggs=[
>>> Terms("genres", field="genres", size=3, aggs=[
>>> Max("max_nb_roles", field="nb_roles"),
>>> Avg("avg_rank", field="range")
>>> ]),
>>> ])
All these classes inherit from Aggs
and thus provide the same interface.
>>> from pandagg.aggs import Aggs
>>> isinstance(a, Aggs)
True
With flattened syntax¶
In the flattened syntax, the first argument is the aggregation name, the second argument is the aggregation type, the following keyword arguments define the aggregation body:
>>> from pandagg.aggs import Aggs
>>> a = Aggs('genres', 'terms', field='genres', size=3)
>>> a.to_dict()
{'genres': {'terms': {'field': 'genres', 'size': 3}}}
Aggregations enrichment¶
Aggregations can be enriched using two methods: groupby() and aggs(), both demonstrated below.
Both methods return a new Aggs
instance, and leave the initial aggregation unchanged.
For instance:
>>> from pandagg.aggs import Aggs
>>> initial_a = Aggs()
>>> enriched_a = initial_a.aggs('genres_agg', 'terms', field='genres')
>>> initial_a.to_dict()
None
>>> enriched_a.to_dict()
{'genres_agg': {'terms': {'field': 'genres'}}}
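To illustrate the difference between the two (a minimal sketch mirroring the Search example above): groupby() adds a grouping level, while aggs() adds clauses below the deepest grouping level.
>>> from pandagg.aggs import Aggs
>>> a = Aggs()\
>>> .groupby('decade', 'histogram', interval=10, field='year')\
>>> .aggs('avg_rank', 'avg', field='rank')
>>> a.to_dict()
{'decade': {'histogram': {'field': 'year', 'interval': 10},
            'aggs': {'avg_rank': {'avg': {'field': 'rank'}}}}}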
Note
Calling to_dict()
on an empty Aggregation returns None
>>> from pandagg.aggs import Aggs
>>> Aggs().to_dict()
None
TODO
Response¶
When executing a search request via execute()
method of Search
,
a Response
instance is returned.
>>> from elasticsearch import Elasticsearch
>>> from pandagg.search import Search
>>>
>>> client = Elasticsearch(hosts=['localhost:9200'])
>>> response = Search(using=client, index='movies')\
>>> .size(2)\
>>> .filter('term', genres='Documentary')\
>>> .aggs('avg_rank', 'avg', field='rank')\
>>> .execute()
>>> response
<Response> took 9ms, success: True, total result >=10000, contains 2 hits
>>> response.__class__
pandagg.response.Response
The Elasticsearch raw dict response is available under the data attribute:
>>> response.data
{
 'took': 9, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 10000, 'relation': 'gte'},
          'max_score': 0.0,
          'hits': [{'_index': 'movies', ...}]},
 'aggregations': {'avg_rank': {'value': 6.496829211219546}}
}
Hits¶
Hits are available under the hits attribute:
>>> response.hits
<Hits> total: >10000, contains 2 hits
>>> response.hits.total
{'value': 10000, 'relation': 'gte'}
>>> response.hits.hits
[<Hit 642> score=0.00, <Hit 643> score=0.00]
Those hits are instances of Hit
.
Directly iterating over Response
will return those hits:
>>> list(response)
[<Hit 642> score=0.00, <Hit 643> score=0.00]
>>> hit = next(iter(response))
Each hit contains the raw dict under data attribute:
>>> hit.data
{'_index': 'movies',
'_type': '_doc',
'_id': '642',
'_score': 0.0,
'_source': {'movie_id': 642,
'name': '10 Tage in Calcutta',
'year': 1984,
'genres': ['Documentary'],
'roles': None,
'nb_roles': 0,
'directors': [{'director_id': 33096,
'first_name': 'Reinhard',
'last_name': 'Hauff',
'full_name': 'Reinhard Hauff',
'genres': ['Documentary', 'Drama', 'Musical', 'Short']}],
'nb_directors': 1,
'rank': None}}
>>> hit._index
'movies'
>>> hit._source
{'movie_id': 642,
'name': '10 Tage in Calcutta',
'year': 1984,
'genres': ['Documentary'],
'roles': None,
'nb_roles': 0,
'directors': [{'director_id': 33096,
'first_name': 'Reinhard',
'last_name': 'Hauff',
'full_name': 'Reinhard Hauff',
'genres': ['Documentary', 'Drama', 'Musical', 'Short']}],
'nb_directors': 1,
'rank': None}
If the pandas dependency is installed, hits can be parsed as a dataframe:
>>> response.hits.to_dataframe()
_index _score _type directors genres movie_id name nb_directors nb_roles rank roles year
_id
642 movies 0.0 _doc [{'director_id': 33096, 'first_name': 'Reinhard', 'last_name': 'Hauff', 'full_name': 'Reinhard Hauff', 'genres': ['Documentary', 'Drama', 'Musical', 'Short']}] [Documentary] 642 10 Tage in Calcutta 1 0 None None 1984
643 movies 0.0 _doc [{'director_id': 32148, 'first_name': 'Tanja', 'last_name': 'Hamilton', 'full_name': 'Tanja Hamilton', 'genres': ['Documentary']}] [Documentary] 643 10 Tage, ein ganzes Leben 1 0 None None 2004
Aggregations¶
Aggregations are handled differently: the aggregations attribute of a Response
returns an Aggregations
instance, which provides specific parsing abilities in addition to exposing
the raw aggregations response under its data attribute.
Let’s build a bit more complex aggregation query to showcase its functionalities:
>>> from elasticsearch import Elasticsearch
>>> from pandagg.search import Search
>>>
>>> client = Elasticsearch(hosts=['localhost:9200'])
>>> response = Search(using=client, index='movies')\
>>> .size(0)\
>>> .groupby('decade', 'histogram', interval=10, field='year')\
>>> .groupby('genres', size=3)\
>>> .aggs('avg_rank', 'avg', field='rank')\
>>> .aggs('avg_nb_roles', 'avg', field='nb_roles')\
>>> .filter('range', year={"gte": 1990})\
>>> .execute()
Note
For more details about how to build an aggregation query, consult the Aggregation section.
Using the data attribute:
>>> response.aggregations.data
{'decade': {'buckets': [{'key': 1990.0,
'doc_count': 79495,
'genres': {'doc_count_error_upper_bound': 0,
'sum_other_doc_count': 38060,
'buckets': [{'key': 'Drama',
'doc_count': 12232,
'avg_nb_roles': {'value': 18.518067364290385},
'avg_rank': {'value': 5.981429367965072}},
{'key': 'Short',
...
Tree serialization¶
Using to_normalized()
:
>>> response.aggregations.to_normalized()
{'level': 'root',
'key': None,
'value': None,
'children': [{'level': 'decade',
'key': 1990.0,
'value': 79495,
'children': [{'level': 'genres',
'key': 'Drama',
'value': 12232,
'children': [{'level': 'avg_rank',
'key': None,
'value': 5.981429367965072},
{'level': 'avg_nb_roles', 'key': None, 'value': 18.518067364290385}]},
{'level': 'genres',
'key': 'Short',
'value': 12197,
'children': [{'level': 'avg_rank',
'key': None,
'value': 6.311325829450123},
...
Using to_interactive_tree()
:
>>> response.aggregations.to_interactive_tree()
<IResponse>
root
├── decade=1990 79495
│ ├── genres=Documentary 8393
│ │ ├── avg_nb_roles 3.7789824854045038
│ │ └── avg_rank 6.517093241977517
│ ├── genres=Drama 12232
│ │ ├── avg_nb_roles 18.518067364290385
│ │ └── avg_rank 5.981429367965072
│ └── genres=Short 12197
│ ├── avg_nb_roles 3.023284414200213
│ └── avg_rank 6.311325829450123
└── decade=2000 57649
├── genres=Documentary 8639
│ ├── avg_nb_roles 5.581433036231045
│ └── avg_rank 6.980897812811443
├── genres=Drama 11500
│ ├── avg_nb_roles 14.385391304347825
│ └── avg_rank 6.269675415719865
└── genres=Short 13451
├── avg_nb_roles 4.053081555274701
└── avg_rank 6.83625304327684
Tabular serialization¶
Doing so requires identifying a level that draws the line between:
- grouping levels: those used to identify rows (here decade and genres), providing a doc_count per row
- column levels: those used to populate columns and cells (here avg_nb_roles and avg_rank)
The tabular format especially suits aggregations with a T shape.
Using to_dataframe()
:
>>> response.aggregations.to_dataframe()
avg_nb_roles avg_rank doc_count
decade genres
1990.0 Drama 18.518067 5.981429 12232
Short 3.023284 6.311326 12197
Documentary 3.778982 6.517093 8393
2000.0 Short 4.053082 6.836253 13451
Drama 14.385391 6.269675 11500
Documentary 5.581433 6.980898 8639
Using to_tabular()
:
>>> response.aggregations.to_tabular()
(['decade', 'genres'],
{(1990.0, 'Drama'): {'doc_count': 12232,
'avg_rank': 5.981429367965072,
'avg_nb_roles': 18.518067364290385},
(1990.0, 'Short'): {'doc_count': 12197,
'avg_rank': 6.311325829450123,
'avg_nb_roles': 3.023284414200213},
(1990.0, 'Documentary'): {'doc_count': 8393,
'avg_rank': 6.517093241977517,
'avg_nb_roles': 3.7789824854045038},
(2000.0, 'Short'): {'doc_count': 13451,
'avg_rank': 6.83625304327684,
'avg_nb_roles': 4.053081555274701},
(2000.0, 'Drama'): {'doc_count': 11500,
'avg_rank': 6.269675415719865,
'avg_nb_roles': 14.385391304347825},
(2000.0, 'Documentary'): {'doc_count': 8639,
'avg_rank': 6.980897812811443,
'avg_nb_roles': 5.581433036231045}})
Note
TODO - explain parameters:
- index_orient
- grouped_by
- expand_columns
- expand_sep
- normalize
- with_single_bucket_groups
Interactive features¶
Features described in this module are primarily designed for interactive usage, for instance in an ipython shell (https://ipython.org/), since one of their key benefits is the intuitive usage provided by auto-completion.
Cluster indices discovery¶
discover()
function lists all indices on a cluster matching a provided pattern:
>>> from elasticsearch import Elasticsearch
>>> from pandagg.discovery import discover
>>> client = Elasticsearch(hosts=['xxx'])
>>> indices = discover(client, index='mov*')
>>> indices
<Indices> ['movies', 'movies_fake']
Each of the indices is accessible via autocompletion:
>>> indices.movies
<Index 'movies'>
An Index
exposes: settings, mapping (interactive), aliases and name:
>>> movies = indices.movies
>>> movies.settings
{'index': {'creation_date': '1591824202943',
'number_of_shards': '1',
'number_of_replicas': '1',
'uuid': 'v6Amj9x1Sk-trBShI-188A',
'version': {'created': '7070199'},
'provided_name': 'movies'}}
>>> movies.mapping
<Mapping>
_
├── directors [Nested]
│ ├── director_id Keyword
│ ├── first_name Text
│ │ └── raw ~ Keyword
│ ├── full_name Text
│ │ └── raw ~ Keyword
│ ├── genres Keyword
│ └── last_name Text
│ └── raw ~ Keyword
├── genres Keyword
├── movie_id Keyword
├── name Text
│ └── raw ~ Keyword
├── nb_directors Integer
├── nb_roles Integer
├── rank Float
├── roles [Nested]
│ ├── actor_id Keyword
│ ├── first_name Text
│ │ └── raw ~ Keyword
│ ├── full_name Text
│ │ └── raw ~ Keyword
│ ├── gender Keyword
│ ├── last_name Text
│ │ └── raw ~ Keyword
│ └── role Keyword
└── year Integer
Note
Examples will be based on the IMDB dataset.
Search
class is intended to perform requests (see Search)
>>> from elasticsearch import Elasticsearch
>>> from pandagg.search import Search
>>>
>>> client = Elasticsearch(hosts=['localhost:9200'])
>>> search = Search(using=client, index='movies')\
>>> .size(2)\
>>> .groupby('decade', 'histogram', interval=10, field='year')\
>>> .groupby('genres', size=3)\
>>> .aggs('avg_rank', 'avg', field='rank')\
>>> .aggs('avg_nb_roles', 'avg', field='nb_roles')\
>>> .filter('range', year={"gte": 1990})
>>> search
{
"query": {
"bool": {
"filter": [
{
"range": {
"year": {
"gte": 1990
}
}
}
]
}
},
"aggs": {
"decade": {
"histogram": {
"field": "year",
"interval": 10
},
"aggs": {
"genres": {
"terms": {
...
..truncated..
...
}
}
},
"size": 2
}
It relies on:
Query
to build queries (see Query),
Aggs
to build aggregations (see Aggregation)

>>> search._query.show()
<Query>
bool
└── filter
    └── range, field=year, gte=1990

>>> search._aggs.show()
<Aggregations>
decade <histogram, field="year", interval=10>
└── genres <terms, field="genres", size=3>
    ├── avg_nb_roles <avg, field="nb_roles">
    └── avg_rank <avg, field="rank">
Executing a Search
request using execute()
will return a
Response
instance (see Response).
>>> response = search.execute()
>>> response
<Response> took 58ms, success: True, total result >=10000, contains 2 hits
>>> response.hits.hits
[<Hit 640> score=0.00, <Hit 641> score=0.00]
>>> response.aggregations.to_dataframe()
avg_nb_roles avg_rank doc_count
decade genres
1990.0 Drama 18.518067 5.981429 12232
Short 3.023284 6.311326 12197
Documentary 3.778982 6.517093 8393
2000.0 Short 4.053082 6.836253 13451
Drama 14.385391 6.269675 11500
Documentary 5.581433 6.980898 8639
On top of that some interactive features are available (see Interactive features).
IMDB dataset¶
You might know the Internet Movie Database, commonly called IMDB.
It is a simple example to showcase some of Elasticsearch's capabilities.
In this case, relational databases (SQL) are a good fit to store this kind of data consistently. Yet indexing some of this data in an optimized search engine allows more powerful queries.
Query requirements¶
In this example, we’ll suppose most usage/queries requirements will be around the concept of movie (rather than usages focused on fetching actors or directors, even though it will still be possible with this data structure).
The index should provide good performance when answering these kinds of questions (non-exhaustive):
- in which movies did this actor play?
- which movie genres were most popular across decades?
- which actors have played in the best-rated or worst-rated movies?
- which actors do movie directors prefer to cast in their movies?
- which are the best-ranked movies of the last decade in the Action or Documentary genres?
- …
Data source¶
I exported the following SQL tables from MariaDB following these instructions.
The relational schema is the following:
(figure: imdb tables relational schema)
Index mapping¶
Overview¶
The base unit (document) will be a movie, having a name, rank (ratings), year of release, a list of actors and a list of directors.
Schematically:
Movie:
- name
- year
- rank
- [] genres
- [] directors
- [] actor roles
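For illustration, here is what one such document looks like (values taken from the sample hit shown in the Response section):
{'movie_id': 642,
 'name': '10 Tage in Calcutta',
 'year': 1984,
 'genres': ['Documentary'],
 'rank': None,
 'nb_roles': 0,
 'roles': None,
 'nb_directors': 1,
 'directors': [{'director_id': 33096,
                'first_name': 'Reinhard',
                'last_name': 'Hauff',
                'full_name': 'Reinhard Hauff',
                'genres': ['Documentary', 'Drama', 'Musical', 'Short']}]}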
Which fields require nesting?¶
Since genres contain a single keyword field, there is no need to store it as a nested field. On the contrary, actor roles and directors require a nested mapping if we intend to apply multiple simultaneous query clauses on their sub-fields (for instance, searching for movies in which an actor is a woman AND whose role is nurse). More information on the distinction between array and nested fields here.
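As a minimal sketch, the "woman AND nurse" condition above could be expressed with the DSL classes introduced in the Query section (the 'Nurse' role value is purely illustrative):
>>> from pandagg.query import Nested, Bool, Term
>>> q = Nested(
>>>     path='roles',
>>>     query=Bool(must=[
>>>         Term(roles__gender='F'),
>>>         Term(roles__role='Nurse')
>>>     ])
>>> )
Without a nested mapping, the two term clauses could match on different role objects of the same movie.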
Text or keyword fields?¶
Some fields are easy to choose: gender will never require full-text search, so we'll store it as a keyword. On the other hand, actor and director names (first and last) will require full-text search, so we'll opt for a text field. Yet we might also want to aggregate on exact keywords, for instance to count the number of movies per actor. More information on the distinction between text and keyword fields here.
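As a hedged sketch, the raw Elasticsearch mapping for such a field combines a text type with a keyword sub-field (matching the "raw" sub-fields visible in the mapping below):
>>> name_field = {
>>>     "type": "text",
>>>     "fields": {
>>>         "raw": {"type": "keyword"}
>>>     }
>>> }
Full-text queries then target name, while aggregations and exact matches target name.raw.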
Mapping¶
<Mapping>
_
├── directors [Nested]
│ ├── director_id Keyword
│ ├── first_name Text
│ │ └── raw ~ Keyword
│ ├── full_name Text
│ │ └── raw ~ Keyword
│ ├── genres Keyword
│ └── last_name Text
│ └── raw ~ Keyword
├── genres Keyword
├── movie_id Keyword
├── name Text
│ └── raw ~ Keyword
├── nb_directors Integer
├── nb_roles Integer
├── rank Float
├── roles [Nested]
│ ├── actor_id Keyword
│ ├── first_name Text
│ │ └── raw ~ Keyword
│ ├── full_name Text
│ │ └── raw ~ Keyword
│ ├── gender Keyword
│ ├── last_name Text
│ │ └── raw ~ Keyword
│ └── role Keyword
└── year Integer
Steps to start playing with your index¶
You can either directly use the demo index available here
(credentials: user pandagg, password pandagg).
Access it with the following client instantiation:
from elasticsearch import Elasticsearch
client = Elasticsearch(
hosts=['https://beba020ee88d49488d8f30c163472151.eu-west-2.aws.cloud.es.io:9243/'],
http_auth=('pandagg', 'pandagg')
)
Or follow the steps below to install it yourself locally.
In this case, you can either generate the files yourself, or download them from here (file md5 b363dee23720052501e24d15361ed605).
Dump tables¶
Follow the instructions at the bottom of the https://relational.fit.cvut.cz/dataset/IMDb page and dump the following tables in a directory:
- movies.csv
- movies_genres.csv
- movies_directors.csv
- directors.csv
- directors_genres.csv
- roles.csv
- actors.csv
Clone pandagg and setup environment¶
git clone git@github.com:alkemics/pandagg.git
cd pandagg
virtualenv env
python setup.py develop
pip install pandas simplejson jupyter seaborn
Then copy conf.py.dist
file into conf.py
and edit the variables as needed, for instance:
# your cluster address
ES_HOST = 'localhost:9200'
# where your table dumps are stored, and where serialized output will be written
DATA_DIR = '/path/to/dumps/'
OUTPUT_FILE_NAME = 'serialized.json'
Serialize movie documents and insert them¶
# generate serialized movies documents, ready to be inserted in ES
# can take a while
python examples/imdb/serialize.py
# create index with mapping if necessary, bulk insert documents in ES
python examples/imdb/load.py
Explore pandagg notebooks¶
An example notebook is available to showcase some of pandagg
functionalities: here it is.
Code is present in examples/imdb/IMDB exploration.py
file.
pandagg package¶
Subpackages¶
pandagg.interactive package¶
Submodules¶
Module contents¶
pandagg.node package¶
Subpackages¶
pandagg.node.aggs package¶
-
class
pandagg.node.aggs.abstract.
AggNode
(name, meta=None, **body)[source]¶ Bases:
pandagg.node._node.Node
Wrapper around elasticsearch aggregation concept. https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-aggregations.html
Each aggregation can be seen as a Node that can be encapsulated in a parent agg.
Define a method to build aggregation request.
-
BLACKLISTED_MAPPING_TYPES
= None¶
-
KEY
= None¶
-
VALUE_ATTRS
= None¶
-
WHITELISTED_MAPPING_TYPES
= None¶
-
get_filter
(key)[source]¶ Return filter query to list documents having this aggregation key. :param key: string :return: elasticsearch filter query
-
to_dict
(with_name=False)[source]¶ ElasticSearch aggregation queries follow this formatting:
{ "<aggregation_name>" : { "<aggregation_type>" : { <aggregation_body> } [,"meta" : { [<meta_data_body>] } ]? } }
Query dict returns the following part (without aggregation name):
{ "<aggregation_type>" : { <aggregation_body> } [,"meta" : { [<meta_data_body>] } ]? }
-
-
class
pandagg.node.aggs.abstract.
BucketAggNode
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.AggNode
Bucket aggregations have special abilities: they can encapsulate other aggregations as children. Each time, the extracted value is a 'doc_count'.
Provides methods:
- to build the aggregation request (with children aggregations)
- to extract buckets from the raw response
- to build the query filtering documents belonging to a given bucket
Note: the aggs attribute's only purpose is children initiation, with the following syntax:
>>> from pandagg.aggs import Terms, Avg
>>> agg = Terms(
>>>     name='term_agg',
>>>     field='some_path',
>>>     aggs=[
>>>         Avg(agg_name='avg_agg', field='some_other_path')
>>>     ]
>>> )
-
VALUE_ATTRS
= None¶
-
-
class
pandagg.node.aggs.abstract.
FieldOrScriptMetricAgg
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.MetricAgg
Metric aggregation based on single field.
-
VALUE_ATTRS
= None¶
-
-
class
pandagg.node.aggs.abstract.
MetricAgg
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.AggNode
Metric aggregations provide a single bucket, with value attributes to be extracted.
-
VALUE_ATTRS
= None¶
-
-
class
pandagg.node.aggs.abstract.
MultipleBucketAgg
(name, keyed=None, key_path='key', meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.BucketAggNode
-
IMPLICIT_KEYED
= False¶
-
VALUE_ATTRS
= None¶
-
-
class
pandagg.node.aggs.abstract.
Pipeline
(name, buckets_path, gap_policy=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.UniqueBucketAgg
-
VALUE_ATTRS
= None¶
-
-
class
pandagg.node.aggs.abstract.
ScriptPipeline
(name, script, buckets_path, gap_policy=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.Pipeline
-
KEY
= None¶
-
VALUE_ATTRS
= 'value'¶
-
-
class
pandagg.node.aggs.abstract.
ShadowRoot
[source]¶ Bases:
pandagg.node.aggs.abstract.UniqueBucketAgg
Not a real aggregation.
-
KEY
= 'shadow_root'¶
-
-
class
pandagg.node.aggs.abstract.
UniqueBucketAgg
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.BucketAggNode
Aggregations providing a single bucket.
-
VALUE_ATTRS
= None¶
-
Not implemented aggregations include:
- children agg
- geo-distance
- geo-hash grid
- ipv4
- sampler
- significant terms
-
class
pandagg.node.aggs.bucket.
Composite
(name, keyed=None, key_path='key', meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.MultipleBucketAgg
-
KEY
= 'composite'¶
-
-
class
pandagg.node.aggs.bucket.
DateHistogram
(name, field, interval=None, calendar_interval=None, fixed_interval=None, meta=None, keyed=False, key_as_string=True, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.MultipleBucketAgg
-
KEY
= 'date_histogram'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
WHITELISTED_MAPPING_TYPES
= ['date']¶
-
-
class
pandagg.node.aggs.bucket.
DateRange
(name, field, key_as_string=True, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.bucket.Range
-
KEY
= 'date_range'¶
-
KEY_SEP
= '::'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
WHITELISTED_MAPPING_TYPES
= ['date']¶
-
-
class
pandagg.node.aggs.bucket.
Filter
(name, filter=None, meta=None, **kwargs)[source]¶ Bases:
pandagg.node.aggs.abstract.UniqueBucketAgg
-
KEY
= 'filter'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
-
class
pandagg.node.aggs.bucket.
Filters
(name, filters, other_bucket=False, other_bucket_key=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.MultipleBucketAgg
-
DEFAULT_OTHER_KEY
= '_other_'¶
-
IMPLICIT_KEYED
= True¶
-
KEY
= 'filters'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
-
class
pandagg.node.aggs.bucket.
Global
(name, meta=None)[source]¶ Bases:
pandagg.node.aggs.abstract.UniqueBucketAgg
-
KEY
= 'global'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
-
class
pandagg.node.aggs.bucket.
Histogram
(name, field, interval, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.MultipleBucketAgg
-
KEY
= 'histogram'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.node.aggs.bucket.
Missing
(name, field, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.UniqueBucketAgg
-
BLACKLISTED_MAPPING_TYPES
= []¶
-
KEY
= 'missing'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
-
class
pandagg.node.aggs.bucket.
Nested
(name, path, meta=None)[source]¶ Bases:
pandagg.node.aggs.abstract.UniqueBucketAgg
-
KEY
= 'nested'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
WHITELISTED_MAPPING_TYPES
= ['nested']¶
-
-
class
pandagg.node.aggs.bucket.
Range
(name, field, ranges, keyed=False, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.MultipleBucketAgg
-
KEY
= 'range'¶
-
KEY_SEP
= '-'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
from_key
¶
-
to_key
¶
-
-
class
pandagg.node.aggs.bucket.
ReverseNested
(name, path=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.UniqueBucketAgg
-
KEY
= 'reverse_nested'¶
-
VALUE_ATTRS
= ['doc_count']¶
-
WHITELISTED_MAPPING_TYPES
= ['nested']¶
-
-
class
pandagg.node.aggs.bucket.
Terms
(name, field, missing=None, size=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.MultipleBucketAgg
Terms aggregation.
-
BLACKLISTED_MAPPING_TYPES
= []¶
-
KEY
= 'terms'¶
-
VALUE_ATTRS
= ['doc_count', 'doc_count_error_upper_bound', 'sum_other_doc_count']¶
-
-
class
pandagg.node.aggs.metric.
Avg
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.FieldOrScriptMetricAgg
-
KEY
= 'avg'¶
-
VALUE_ATTRS
= ['value']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.node.aggs.metric.
Cardinality
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.FieldOrScriptMetricAgg
-
KEY
= 'cardinality'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.node.aggs.metric.
ExtendedStats
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.FieldOrScriptMetricAgg
-
KEY
= 'extended_stats'¶
-
VALUE_ATTRS
= ['count', 'min', 'max', 'avg', 'sum', 'sum_of_squares', 'variance', 'std_deviation', 'std_deviation_bounds']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.node.aggs.metric.
GeoBound
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.FieldOrScriptMetricAgg
-
KEY
= 'geo_bounds'¶
-
VALUE_ATTRS
= ['bounds']¶
-
WHITELISTED_MAPPING_TYPES
= ['geo_point']¶
-
-
class
pandagg.node.aggs.metric.
GeoCentroid
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.FieldOrScriptMetricAgg
-
KEY
= 'geo_centroid'¶
-
VALUE_ATTRS
= ['location']¶
-
WHITELISTED_MAPPING_TYPES
= ['geo_point']¶
-
-
class
pandagg.node.aggs.metric.
Max
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.FieldOrScriptMetricAgg
-
KEY
= 'max'¶
-
VALUE_ATTRS
= ['value']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.node.aggs.metric.
Min
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.FieldOrScriptMetricAgg
-
KEY
= 'min'¶
-
VALUE_ATTRS
= ['value']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.node.aggs.metric.
PercentileRanks
(name, field, values, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.FieldOrScriptMetricAgg
-
KEY
= 'percentile_ranks'¶
-
VALUE_ATTRS
= ['values']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.node.aggs.metric.
Percentiles
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.FieldOrScriptMetricAgg
Percents body argument can be passed to specify which percentiles to fetch.
-
KEY
= 'percentiles'¶
-
VALUE_ATTRS
= ['values']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.node.aggs.metric.
Stats
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.FieldOrScriptMetricAgg
-
KEY
= 'stats'¶
-
VALUE_ATTRS
= ['count', 'min', 'max', 'avg', 'sum']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.node.aggs.metric.
Sum
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.FieldOrScriptMetricAgg
-
KEY
= 'sum'¶
-
VALUE_ATTRS
= ['value']¶
-
WHITELISTED_MAPPING_TYPES
= ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']¶
-
-
class
pandagg.node.aggs.metric.
TopHits
(name, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.MetricAgg
-
KEY
= 'top_hits'¶
-
VALUE_ATTRS
= ['hits']¶
-
Pipeline aggregations: https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-aggregations-pipeline.html
-
class
pandagg.node.aggs.pipeline.
AvgBucket
(name, buckets_path, gap_policy=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.Pipeline
-
KEY
= 'avg_bucket'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.node.aggs.pipeline.
BucketScript
(name, script, buckets_path, gap_policy=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.ScriptPipeline
-
KEY
= 'bucket_script'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.node.aggs.pipeline.
BucketSelector
(name, script, buckets_path, gap_policy=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.ScriptPipeline
-
KEY
= 'bucket_selector'¶
-
VALUE_ATTRS
= None¶
-
-
class
pandagg.node.aggs.pipeline.
BucketSort
(name, script, buckets_path, gap_policy=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.ScriptPipeline
-
KEY
= 'bucket_sort'¶
-
VALUE_ATTRS
= None¶
-
-
class
pandagg.node.aggs.pipeline.
CumulativeSum
(name, buckets_path, gap_policy=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.Pipeline
-
KEY
= 'cumulative_sum'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.node.aggs.pipeline.
Derivative
(name, buckets_path, gap_policy=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.Pipeline
-
KEY
= 'derivative'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.node.aggs.pipeline.
ExtendedStatsBucket
(name, buckets_path, gap_policy=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.Pipeline
-
KEY
= 'extended_stats_bucket'¶
-
VALUE_ATTRS
= ['count', 'min', 'max', 'avg', 'sum', 'sum_of_squares', 'variance', 'std_deviation', 'std_deviation_bounds']¶
-
-
class
pandagg.node.aggs.pipeline.
MaxBucket
(name, buckets_path, gap_policy=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.Pipeline
-
KEY
= 'max_bucket'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.node.aggs.pipeline.
MinBucket
(name, buckets_path, gap_policy=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.Pipeline
-
KEY
= 'min_bucket'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.node.aggs.pipeline.
MovingAvg
(name, buckets_path, gap_policy=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.Pipeline
-
KEY
= 'moving_avg'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.node.aggs.pipeline.
PercentilesBucket
(name, buckets_path, gap_policy=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.Pipeline
-
KEY
= 'percentiles_bucket'¶
-
VALUE_ATTRS
= ['values']¶
-
-
class
pandagg.node.aggs.pipeline.
SerialDiff
(name, buckets_path, gap_policy=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.Pipeline
-
KEY
= 'serial_diff'¶
-
VALUE_ATTRS
= ['value']¶
-
-
class
pandagg.node.aggs.pipeline.
StatsBucket
(name, buckets_path, gap_policy=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.Pipeline
-
KEY
= 'stats_bucket'¶
-
VALUE_ATTRS
= ['count', 'min', 'max', 'avg', 'sum']¶
-
-
class
pandagg.node.aggs.pipeline.
SumBucket
(name, buckets_path, gap_policy=None, meta=None, **body)[source]¶ Bases:
pandagg.node.aggs.abstract.Pipeline
-
KEY
= 'sum_bucket'¶
-
VALUE_ATTRS
= ['value']¶
-
pandagg.node.mapping package¶
-
class
pandagg.node.mapping.abstract.
Field
(name, key, **body)[source]¶ Bases:
pandagg.node._node.Node
-
body
¶
-
-
class
pandagg.node.mapping.abstract.
ShadowRoot
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedComplexField
-
KEY
= '_'¶
-
-
class
pandagg.node.mapping.abstract.
UnnamedComplexField
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
-
KEY
= None¶
-
-
class
pandagg.node.mapping.abstract.
UnnamedField
(**body)[source]¶ Bases:
object
-
KEY
= None¶
-
classmethod
get_dsl_class
(name)¶
-
-
class
pandagg.node.mapping.abstract.
UnnamedRegularField
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
-
KEY
= None¶
-
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html
-
class
pandagg.node.mapping.field_datatypes.
Alias
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
Defines an alias to an existing field.
-
KEY
= 'alias'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Binary
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'binary'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Boolean
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'boolean'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Byte
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'byte'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Completion
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
To provide auto-complete suggestions
-
KEY
= 'completion'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Date
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'date'¶
-
-
class
pandagg.node.mapping.field_datatypes.
DateNanos
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'date_nanos'¶
-
-
class
pandagg.node.mapping.field_datatypes.
DateRange
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'date_range'¶
-
-
class
pandagg.node.mapping.field_datatypes.
DenseVector
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
Record dense vectors of float values.
-
KEY
= 'dense_vector'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Double
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'double'¶
-
-
class
pandagg.node.mapping.field_datatypes.
DoubleRange
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'double_range'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Flattened
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
Allows an entire JSON object to be indexed as a single field.
-
KEY
= 'flattened'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Float
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'float'¶
-
-
class
pandagg.node.mapping.field_datatypes.
FloatRange
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'float_range'¶
-
-
class
pandagg.node.mapping.field_datatypes.
GeoPoint
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
For lat/lon points
-
KEY
= 'geo_point'¶
-
-
class
pandagg.node.mapping.field_datatypes.
GeoShape
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
For complex shapes like polygons
-
KEY
= 'geo_shape'¶
-
-
class
pandagg.node.mapping.field_datatypes.
HalfFloat
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'half_float'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Histogram
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
For pre-aggregated numerical values for percentiles aggregations.
-
KEY
= 'histogram'¶
-
-
class
pandagg.node.mapping.field_datatypes.
IP
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
for IPv4 and IPv6 addresses
-
KEY
= 'IP'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Integer
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'integer'¶
-
-
class
pandagg.node.mapping.field_datatypes.
IntegerRange
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'integer_range'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Join
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
Defines parent/child relation for documents within the same index
-
KEY
= 'join'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Keyword
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'keyword'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Long
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'long'¶
-
-
class
pandagg.node.mapping.field_datatypes.
LongRange
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'long_range'¶
-
-
class
pandagg.node.mapping.field_datatypes.
MapperAnnotatedText
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
To index text containing special markup (typically used for identifying named entities)
-
KEY
= 'annotated-text'¶
-
-
class
pandagg.node.mapping.field_datatypes.
MapperMurMur3
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
To compute hashes of values at index-time and store them in the index
-
KEY
= 'murmur3'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Nested
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedComplexField
-
KEY
= 'nested'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Object
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedComplexField
-
KEY
= 'object'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Percolator
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
Accepts queries from the query-dsl
-
KEY
= 'percolator'¶
-
-
class
pandagg.node.mapping.field_datatypes.
RankFeature
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
Record numeric feature to boost hits at query time.
-
KEY
= 'rank_feature'¶
-
-
class
pandagg.node.mapping.field_datatypes.
RankFeatures
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
Record numeric features to boost hits at query time.
-
KEY
= 'rank_features'¶
-
-
class
pandagg.node.mapping.field_datatypes.
ScaledFloat
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'scaled_float'¶
-
-
class
pandagg.node.mapping.field_datatypes.
SearchAsYouType
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
A text-like field optimized for queries to implement as-you-type completion
-
KEY
= 'search_as_you_type'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Shape
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
For arbitrary cartesian geometries.
-
KEY
= 'shape'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Short
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'short'¶
-
-
class
pandagg.node.mapping.field_datatypes.
SparseVector
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
Record sparse vectors of float values.
-
KEY
= 'sparse_vector'¶
-
-
class
pandagg.node.mapping.field_datatypes.
Text
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'text'¶
-
-
class
pandagg.node.mapping.field_datatypes.
TokenCount
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
To count the number of tokens in a string
-
KEY
= 'token_count'¶
-
-
class
pandagg.node.mapping.meta_fields.
FieldNames
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
All fields in the document which contain non-null values.
-
KEY
= '_field_names'¶
-
-
class
pandagg.node.mapping.meta_fields.
Id
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
The document’s ID.
-
KEY
= '_id'¶
-
-
class
pandagg.node.mapping.meta_fields.
Ignored
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
All fields in the document that have been ignored at index time because of ignore_malformed.
-
KEY
= '_ignored'¶
-
-
class
pandagg.node.mapping.meta_fields.
Index
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
The index to which the document belongs.
-
KEY
= '_index'¶
-
-
class
pandagg.node.mapping.meta_fields.
Meta
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
Application specific metadata.
-
KEY
= '_meta'¶
-
-
class
pandagg.node.mapping.meta_fields.
Routing
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
A custom routing value which routes a document to a particular shard.
-
KEY
= '_routing'¶
-
-
class
pandagg.node.mapping.meta_fields.
Size
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
The size of the _source field in bytes, provided by the mapper-size plugin.
-
KEY
= '_size'¶
-
-
class
pandagg.node.mapping.meta_fields.
Source
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
The original JSON representing the body of the document.
-
KEY
= '_source'¶
-
-
class
pandagg.node.mapping.meta_fields.
Type
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
The document’s mapping type.
-
KEY
= '_type'¶
-
pandagg.node.query package¶
-
class
pandagg.node.query.abstract.
AbstractSingleFieldQueryClause
(field, _name=None, **body)[source]¶
-
class
pandagg.node.query.abstract.
FlatFieldQueryClause
(field, _name=None, **body)[source]¶ Bases:
pandagg.node.query.abstract.AbstractSingleFieldQueryClause
Query clause applied on one single field. Example:
Exists: {"exists": {"field": "user"}}
-> field = "user"
-> body = {"field": "user"}
q = Exists(field="user")
DistanceFeature: {"distance_feature": {"field": "production_date", "pivot": "7d", "origin": "now"}}
-> field = "production_date"
-> body = {"field": "production_date", "pivot": "7d", "origin": "now"}
q = DistanceFeature(field="production_date", pivot="7d", origin="now")
-
class
pandagg.node.query.abstract.
KeyFieldQueryClause
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.AbstractSingleFieldQueryClause
Clause with field used as key in clause body:
Term: {"term": {"user": {"value": "Kimchy", "boost": 1}}}
-> field = "user"
-> body = {"user": {"value": "Kimchy", "boost": 1}}
q1 = Term(user={"value": "Kimchy", "boost": 1})
q2 = Term(field="user", value="Kimchy", boost=1)
Can accept an "_implicit_param" attribute specifying the equivalent key when the inner body isn't a dict but a raw value. For Term: _implicit_param = "value"
q = Term(user="Kimchy")
{"term": {"user": {"value": "Kimchy"}}}
-> field = "user"
-> body = {"user": {"value": "Kimchy"}}
-
class
pandagg.node.query.abstract.
ParentParameterClause
(**body)[source]¶ Bases:
pandagg.node.query.abstract.QueryClause
-
MULTIPLE
= False¶
-
-
class
pandagg.node.query.compound.
Bool
(**body)[source]¶ Bases:
pandagg.node.query.compound.CompoundClause
-
KEY
= 'bool'¶
-
-
class
pandagg.node.query.compound.
Boosting
(**body)[source]¶ Bases:
pandagg.node.query.compound.CompoundClause
-
KEY
= 'boosting'¶
-
-
class
pandagg.node.query.compound.
CompoundClause
(**body)[source]¶ Bases:
pandagg.node.query.abstract.QueryClause
Compound clauses can encapsulate other query clauses.
-
class
pandagg.node.query.compound.
ConstantScore
(**body)[source]¶ Bases:
pandagg.node.query.compound.CompoundClause
-
KEY
= 'constant_score'¶
-
-
class
pandagg.node.query.compound.
DisMax
(**body)[source]¶ Bases:
pandagg.node.query.compound.CompoundClause
-
KEY
= 'dis_max'¶
-
-
class
pandagg.node.query.compound.
FunctionScore
(**body)[source]¶ Bases:
pandagg.node.query.compound.CompoundClause
-
KEY
= 'function_score'¶
-
-
class
pandagg.node.query.full_text.
Common
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'common'¶
-
-
class
pandagg.node.query.full_text.
Intervals
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'intervals'¶
-
-
class
pandagg.node.query.full_text.
Match
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'match'¶
-
-
class
pandagg.node.query.full_text.
MatchBoolPrefix
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'match_bool_prefix'¶
-
-
class
pandagg.node.query.full_text.
MatchPhrase
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'match_phrase'¶
-
-
class
pandagg.node.query.full_text.
MatchPhrasePrefix
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'match_phrase_prefix'¶
-
-
class
pandagg.node.query.full_text.
MultiMatch
(fields, _name=None, **body)[source]¶ Bases:
pandagg.node.query.abstract.MultiFieldsQueryClause
-
KEY
= 'multi_match'¶
-
-
class
pandagg.node.query.full_text.
QueryString
(**body)[source]¶ Bases:
pandagg.node.query.abstract.LeafQueryClause
-
KEY
= 'query_string'¶
-
-
class
pandagg.node.query.full_text.
SimpleQueryString
(**body)[source]¶ Bases:
pandagg.node.query.abstract.LeafQueryClause
-
KEY
= 'simple_string'¶
-
-
class
pandagg.node.query.geo.
GeoBoundingBox
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'geo_bounding_box'¶
-
-
class
pandagg.node.query.geo.
GeoDistance
(distance, **body)[source]¶ Bases:
pandagg.node.query.abstract.AbstractSingleFieldQueryClause
-
KEY
= 'geo_distance'¶
-
-
class
pandagg.node.query.geo.
GeoPolygone
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'geo_polygon'¶
-
-
class
pandagg.node.query.geo.
GeoShape
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'geo_shape'¶
-
-
class
pandagg.node.query.joining.
HasChild
(**body)[source]¶ Bases:
pandagg.node.query.compound.CompoundClause
-
KEY
= 'has_child'¶
-
-
class
pandagg.node.query.joining.
HasParent
(**body)[source]¶ Bases:
pandagg.node.query.compound.CompoundClause
-
KEY
= 'has_parent'¶
-
-
class
pandagg.node.query.joining.
Nested
(path, **kwargs)[source]¶ Bases:
pandagg.node.query.compound.CompoundClause
-
KEY
= 'nested'¶
-
-
class
pandagg.node.query.joining.
ParentId
(**body)[source]¶ Bases:
pandagg.node.query.abstract.LeafQueryClause
-
KEY
= 'parent_id'¶
-
-
class
pandagg.node.query.shape.
Shape
(**body)[source]¶ Bases:
pandagg.node.query.abstract.LeafQueryClause
-
KEY
= 'shape'¶
-
-
class
pandagg.node.query.specialized.
DistanceFeature
(field, _name=None, **body)[source]¶ Bases:
pandagg.node.query.abstract.FlatFieldQueryClause
-
KEY
= 'distance_feature'¶
-
-
class
pandagg.node.query.specialized.
MoreLikeThis
(fields, _name=None, **body)[source]¶ Bases:
pandagg.node.query.abstract.MultiFieldsQueryClause
-
KEY
= 'more_like_this'¶
-
-
class
pandagg.node.query.specialized.
Percolate
(field, _name=None, **body)[source]¶ Bases:
pandagg.node.query.abstract.FlatFieldQueryClause
-
KEY
= 'percolate'¶
-
-
class
pandagg.node.query.specialized.
RankFeature
(field, _name=None, **body)[source]¶ Bases:
pandagg.node.query.abstract.FlatFieldQueryClause
-
KEY
= 'rank_feature'¶
-
-
class
pandagg.node.query.specialized.
Script
(**body)[source]¶ Bases:
pandagg.node.query.abstract.LeafQueryClause
-
KEY
= 'script'¶
-
-
class
pandagg.node.query.specialized.
Wrapper
(**body)[source]¶ Bases:
pandagg.node.query.abstract.LeafQueryClause
-
KEY
= 'wrapper'¶
-
-
class
pandagg.node.query.specialized_compound.
PinnedQuery
(**body)[source]¶ Bases:
pandagg.node.query.compound.CompoundClause
-
KEY
= 'pinned'¶
-
-
class
pandagg.node.query.specialized_compound.
ScriptScore
(**body)[source]¶ Bases:
pandagg.node.query.compound.CompoundClause
-
KEY
= 'script_score'¶
-
-
class
pandagg.node.query.term_level.
Exists
(field, _name=None)[source]¶ Bases:
pandagg.node.query.abstract.LeafQueryClause
-
KEY
= 'exists'¶
-
-
class
pandagg.node.query.term_level.
Fuzzy
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'fuzzy'¶
-
-
class
pandagg.node.query.term_level.
Ids
(values, _name=None)[source]¶ Bases:
pandagg.node.query.abstract.LeafQueryClause
-
KEY
= 'ids'¶
-
-
class
pandagg.node.query.term_level.
Prefix
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'prefix'¶
-
-
class
pandagg.node.query.term_level.
Range
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'range'¶
-
-
class
pandagg.node.query.term_level.
Regexp
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'regexp'¶
-
-
class
pandagg.node.query.term_level.
Term
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'term'¶
-
-
class
pandagg.node.query.term_level.
Terms
(**body)[source]¶ Bases:
pandagg.node.query.abstract.AbstractSingleFieldQueryClause
-
KEY
= 'terms'¶
-
-
class
pandagg.node.query.term_level.
TermsSet
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'terms_set'¶
-
-
class
pandagg.node.query.term_level.
Type
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'type'¶
-
-
class
pandagg.node.query.term_level.
Wildcard
(field=None, _name=None, _expand__to_dot=True, **params)[source]¶ Bases:
pandagg.node.query.abstract.KeyFieldQueryClause
-
KEY
= 'wildcard'¶
-
pandagg.node.response package¶
-
class
pandagg.node.response.bucket.
Bucket
(value, key=None, level=None)[source]¶ Bases:
pandagg.node._node.Node
-
ROOT_NAME
= 'root'¶
-
attr_name
¶ Determine under which attribute name the bucket will be available in the response tree. Dots are replaced by _ characters so that they don't prevent attribute access.
If the resulting name is still not a valid python attribute name, the bucket remains accessible through item access (dict-like); see 'utils.Obj' for more details.
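For illustration only (the response tree and level names below are hypothetical): a level named 'global_metrics.field' would be reachable as the attribute global_metrics_field, while a bucket key that is not a valid python identifier stays reachable through item access:
>>> response.global_metrics_field      # dots replaced by '_'
>>> response['2019']                   # e.g. a numeric histogram key: use dict-like access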
-
Module contents¶
pandagg.tree package¶
Subpackages¶
pandagg.tree.aggs package¶
-
class
pandagg.tree.aggs.aggs.
AbstractLeafAgg
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.Aggs
-
KEY
= None¶ Allow following syntax:
>>> a = Avg("my_terms_agg", field="yolo")
-
-
class
pandagg.tree.aggs.aggs.
AbstractParentAgg
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.Aggs
-
KEY
= None¶ Allow following syntax:
>>> a = Terms("my_terms_agg", field="yolo", aggs={...})
-
-
class
pandagg.tree.aggs.aggs.
Aggs
(*args, **kwargs)[source]¶ Bases:
pandagg.tree._tree.Tree
Combination of aggregation clauses. This class provides handy methods to build an aggregation (see
aggs()
and groupby()
), and is also used to parse aggregation responses into convenient formats. Mapping declaration is optional, but providing one validates the aggregation and automatically handles missing nested clauses.
All following syntaxes are identical:
From a dict:
>>> Aggs({"per_user":{"terms":{"field":"user"}}})
Using shortcut declaration: first argument is the aggregation type, other arguments are aggregation body parameters:
>>> Aggs('terms', name='per_user', field='user')
Using DSL class:
>>> from pandagg.aggs import Terms >>> Aggs(Terms('per_user', field='user'))
Dict and DSL class syntaxes allow to provide multiple clauses aggregations:
>>> Aggs({"per_user":{"terms":{"field":"user"}, "aggs": {"avg_age": {"avg": {"field": "age"}}}}})
Which is similar to:
>>> from pandagg.aggs import Terms, Avg >>> Terms('per_user', field='user', aggs=Avg('avg_age', field='age'))
Keyword Arguments: - mapping (
dict
or pandagg.tree.mapping.Mapping
) – Mapping of the requested index or indices. Providing it will validate aggregations, and add required nested clauses if missing. - nested_autocorrect (
bool
) – In case of missing nested clauses in aggregation, if True, automatically add missing nested clauses, else raise error. - remaining kwargs: Used as body in aggregation
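As a minimal sketch of these two keyword arguments (the mapping below is hypothetical, declaring 'comments' as a nested field; exact behavior may vary by version):
>>> a = Aggs(
...     {"avg_note": {"avg": {"field": "comments.note"}}},
...     mapping={"properties": {"comments": {"type": "nested", "properties": {"note": {"type": "float"}}}}},
...     nested_autocorrect=True,
... )
>>> # the missing 'nested' clause on 'comments' should be inserted automatically;
>>> # with nested_autocorrect=False, a missing nested clause raises an error instead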
-
aggs
(*args, **kwargs)[source]¶ Arrange passed aggregations “horizontally”.
Given the initial aggregation:
A──> B
└──> C
If passing multiple aggregations with insert_below = 'A':
A──> B
└──> C
└──> new1
└──> new2
Note: those will be placed under the insert_below aggregation clause id if provided, else under the deepest linear bucket aggregation if there is no ambiguity:
OK:
A──> B ─> C ─> new
KO:
A──> B
└──> C
args accepts single occurrence or sequence of following formats:
- string (for terms agg concise declaration)
- regular Elasticsearch dict syntax
- AggNode instance (for instance Terms, Filters etc)
Keyword Arguments: - insert_below (
string
) – Parent aggregation name under which these aggregations should be placed - at_root (
string
) – Insert aggregations at root of aggregation query - remaining kwargs: Used as body in aggregation
Return type:
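A minimal sketch of insert_below using the plain dict syntax listed above (aggregation names and fields are hypothetical):
>>> a = Aggs({"per_brand": {"terms": {"field": "brand"}}})
>>> a = a.aggs({"avg_price": {"avg": {"field": "price"}}}, insert_below="per_brand")
>>> # 'avg_price' is now a child of 'per_brand' in the returned Aggs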
-
deepest_linear_bucket_agg
¶ Return deepest bucket aggregation node (pandagg.nodes.abstract.BucketAggNode) of that aggregation that neither has siblings, nor has an ancestor with siblings.
-
groupby
(*args, **kwargs)[source]¶ Arrange passed aggregations in vertical/nested manner, above or below another agg clause.
Given the initial aggregation:
A──> B
└──> C
If insert_below = 'A':
A──> new──> B
      └──> C
If insert_above = 'B':
A──> new──> B
└──> C
The by argument accepts a single occurrence or a sequence of the following formats:
- string (for terms agg concise declaration)
- regular Elasticsearch dict syntax
- AggNode instance (for instance Terms, Filters etc)
If neither insert_below nor insert_above is provided, by will be placed between the deepest linear bucket aggregation (if there is no ambiguity) and its children:
A──> B : OK, generates A──> B ─> C ─> by
A──> B : KO, ambiguous, must specify either A, B or C
└──> C
All Aggs.__init__ syntaxes are accepted:
>>> Aggs()\
>>>     .groupby('terms', name='per_user_id', field='user_id')
{"per_user_id": {"terms": {"field": "user_id"}}}
Passing a dict:
>>> Aggs().groupby({"terms_on_my_field":{"terms":{"field":"some_field"}}}) {"terms_on_my_field":{"terms":{"field":"some_field"}}}
Using DSL class:
>>> from pandagg.aggs import Terms >>> Aggs().groupby(Terms('terms_on_my_field', field='some_field')) {"terms_on_my_field":{"terms":{"field":"some_field"}}}
Shortcut syntax for terms aggregation: creates a terms aggregation, using field as aggregation name
>>> Aggs().groupby('some_field') {"some_field":{"terms":{"field":"some_field"}}}
Using an Aggs object:
>>> Aggs().groupby(Aggs('per_user_id', 'terms', field='user_id'))
{"per_user_id": {"terms": {"field": "user_id"}}}
Accepted declarations for multiple aggregations:
Keyword Arguments: - insert_below (
string
) – Parent aggregation name under which these aggregations should be placed - insert_above (
string
) – Aggregation name above which these aggregations should be placed - at_root (
string
) – Insert aggregations at root of aggregation query - remaining kwargs: Used as body in aggregation
Return type:
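And a sketch of insert_above (aggregation names and fields are hypothetical):
>>> a = Aggs({"per_brand": {"terms": {"field": "brand"}}})
>>> a = a.groupby({"per_country": {"terms": {"field": "country"}}}, insert_above="per_brand")
>>> # 'per_country' now wraps 'per_brand', which becomes its child in the returned Aggs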
-
node_class
¶ alias of
pandagg.node.aggs.abstract.AggNode
-
show
(*args, **kwargs)[source]¶ Return tree structure in hierarchy style.
Parameters: - nid – Node identifier from which tree traversal will start. If None tree root will be used
- filter_ – filter function performed on nodes. Nodes excluded by the filter function won't be displayed, nor will their children
- reverse – the
reverse
param for sortingNode
objects in the same level - key – key used to order nodes of same parent
- reverse – reverse parameter applied at sorting
- line_type – display type choice
- limit – int, truncate tree display to this number of lines
- kwargs – kwargs params passed to node
line_repr
method
Return type: unicode in python2, str in python3
-
class
pandagg.tree.aggs.bucket.
Composite
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'composite'¶
-
-
class
pandagg.tree.aggs.bucket.
DateHistogram
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'date_histogram'¶
-
-
class
pandagg.tree.aggs.bucket.
DateRange
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'date_range'¶
-
-
class
pandagg.tree.aggs.bucket.
Filter
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'filter'¶
-
-
class
pandagg.tree.aggs.bucket.
Filters
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'filters'¶
-
-
class
pandagg.tree.aggs.bucket.
Global
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'global'¶
-
-
class
pandagg.tree.aggs.bucket.
Histogram
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'histogram'¶
-
-
class
pandagg.tree.aggs.bucket.
Missing
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'missing'¶
-
-
class
pandagg.tree.aggs.bucket.
Nested
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'nested'¶
-
-
class
pandagg.tree.aggs.bucket.
Range
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'range'¶
-
-
class
pandagg.tree.aggs.bucket.
ReverseNested
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'reverse_nested'¶
-
-
class
pandagg.tree.aggs.bucket.
Terms
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'terms'¶
-
-
class
pandagg.tree.aggs.metric.
Avg
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'avg'¶
-
-
class
pandagg.tree.aggs.metric.
Cardinality
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'cardinality'¶
-
-
class
pandagg.tree.aggs.metric.
ExtendedStats
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'extended_stats'¶
-
-
class
pandagg.tree.aggs.metric.
GeoBound
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'geo_bounds'¶
-
-
class
pandagg.tree.aggs.metric.
GeoCentroid
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'geo_centroid'¶
-
-
class
pandagg.tree.aggs.metric.
Max
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'max'¶
-
-
class
pandagg.tree.aggs.metric.
Min
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'min'¶
-
-
class
pandagg.tree.aggs.metric.
PercentileRanks
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'percentile_ranks'¶
-
-
class
pandagg.tree.aggs.metric.
Percentiles
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
Percents body argument can be passed to specify which percentiles to fetch.
-
KEY
= 'percentiles'¶
-
-
class
pandagg.tree.aggs.metric.
Stats
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'stats'¶
-
-
class
pandagg.tree.aggs.metric.
Sum
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'sum'¶
-
-
class
pandagg.tree.aggs.metric.
TopHits
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'top_hits'¶
-
-
class
pandagg.tree.aggs.metric.
ValueCount
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'value_count'¶
-
Pipeline aggregations (implemented here as AbstractParentAgg subclasses): https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-aggregations-pipeline.html
-
class
pandagg.tree.aggs.pipeline.
AvgBucket
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'avg_bucket'¶
-
-
class
pandagg.tree.aggs.pipeline.
BucketScript
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'bucket_script'¶
-
-
class
pandagg.tree.aggs.pipeline.
BucketSelector
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'bucket_selector'¶
-
-
class
pandagg.tree.aggs.pipeline.
BucketSort
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'bucket_sort'¶
-
-
class
pandagg.tree.aggs.pipeline.
CumulativeSum
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'cumulative_sum'¶
-
-
class
pandagg.tree.aggs.pipeline.
Derivative
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'derivative'¶
-
-
class
pandagg.tree.aggs.pipeline.
ExtendedStatsBucket
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'extended_stats_bucket'¶
-
-
class
pandagg.tree.aggs.pipeline.
MaxBucket
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'max_bucket'¶
-
-
class
pandagg.tree.aggs.pipeline.
MinBucket
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'min_bucket'¶
-
-
class
pandagg.tree.aggs.pipeline.
MovingAvg
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'moving_avg'¶
-
-
class
pandagg.tree.aggs.pipeline.
PercentilesBucket
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'percentiles_bucket'¶
-
-
class
pandagg.tree.aggs.pipeline.
SerialDiff
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'serial_diff'¶
-
-
class
pandagg.tree.aggs.pipeline.
StatsBucket
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'stats_bucket'¶
-
-
class
pandagg.tree.aggs.pipeline.
SumBucket
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'sum_bucket'¶
-
pandagg.tree.query package¶
-
class
pandagg.tree.query.abstract.
Compound
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Query
-
KEY
= None¶
-
-
class
pandagg.tree.query.abstract.
Leaf
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Query
-
KEY
= None¶
-
-
class
pandagg.tree.query.abstract.
Query
(*args, **kwargs)[source]¶ Bases:
pandagg.tree._tree.Tree
Combination of query clauses.
Mapping declaration is optional, but doing so validates query validity and automatically inserts nested clauses when necessary.
Keyword Arguments: - mapping (
dict
or pandagg.tree.mapping.Mapping
) – Mapping of the requested index or indices. Providing it will add validation features, and add required nested clauses if missing. - nested_autocorrect (
bool
) – In case of missing nested clauses in query, if True, automatically add missing nested clauses, else raise error. - remaining kwargs: Used as body in query clauses.
-
KEY
= None¶
-
node_class
¶
-
query
(*args, **kwargs)[source]¶ Insert new clause(s) in current query.
Inserted clauses accept the following syntaxes.
Given an empty query:
>>> from pandagg.query import Query
>>> q = Query()
flat syntax: clause type, followed by query clause body as keyword arguments:
>>> q.query('term', some_field=23)
{'term': {'some_field': 23}}
from regular Elasticsearch dict query:
>>> q.query({'term': {'some_field': 23}})
{'term': {'some_field': 23}}
using pandagg DSL:
>>> from pandagg.query import Term
>>> q.query(Term(some_field=23))
{'term': {'some_field': 23}}
Keyword Arguments: - parent (
str
) – named query clause under which the inserted clauses should be placed. - parent_param (
str
optional parameter when using parent param) – parameter under which inserted clauses will be placed. For instance if parent clause is a boolean, can be ‘must’, ‘filter’, ‘should’, ‘must_not’. - child (
str
) – named query clause above which the inserted clauses should be placed. - child_param (
str
optional parameter when using parent param) – parameter of inserted boolean clause under which child clauses will be placed. For instance if inserted clause is a boolean, can be ‘must’, ‘filter’, ‘should’, ‘must_not’. - mode (
str
one of ‘add’, ‘replace’, ‘replace_all’) – merging strategy when inserting clauses on a existing compound clause.- ‘add’ (default) : adds new clauses keeping initial ones
- ‘replace’ : for each parameter (for instance in ‘bool’ case : ‘filter’, ‘must’, ‘must_not’, ‘should’), replace existing clauses under this parameter, by new ones only if declared in inserted compound query
- ‘replace_all’ : existing compound clause is completely replaced by the new one
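An illustrative sketch of the default merging strategy (field names are hypothetical; with mode='add', new clauses are added next to existing ones as described above):
>>> q = Query().filter('term', genre='drama')
>>> q = q.filter('range', year={'gte': 1990})
>>> # both filters should now sit side by side under the bool filter parameter;
>>> # mode='replace' or mode='replace_all' on an insertion would overwrite existing clauses instead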
-
show
(*args, **kwargs)[source]¶ Return tree structure in hierarchy style.
Parameters: - nid – Node identifier from which tree traversal will start. If None tree root will be used
- filter_ – filter function performed on nodes. Nodes excluded by the filter function won't be displayed, nor will their children
- reverse – the
reverse
param for sortingNode
objects in the same level - key – key used to order nodes of same parent
- reverse – reverse parameter applied at sorting
- line_type – display type choice
- limit – int, truncate tree display to this number of lines
- kwargs – kwargs params passed to node
line_repr
method
Return type: unicode in python2, str in python3
-
class
pandagg.tree.query.compound.
Bool
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'bool'¶
-
-
class
pandagg.tree.query.compound.
Boosting
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'boosting'¶
-
-
class
pandagg.tree.query.compound.
ConstantScore
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'constant_score'¶
-
-
class
pandagg.tree.query.compound.
DisMax
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'dis_max'¶
-
-
class
pandagg.tree.query.compound.
FunctionScore
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'function_score'¶
-
-
class
pandagg.tree.query.full_text.
Common
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'common'¶
-
-
class
pandagg.tree.query.full_text.
Intervals
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'intervals'¶
-
-
class
pandagg.tree.query.full_text.
Match
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'match'¶
-
-
class
pandagg.tree.query.full_text.
MatchBoolPrefix
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'match_bool_prefix'¶
-
-
class
pandagg.tree.query.full_text.
MatchPhrase
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'match_phrase'¶
-
-
class
pandagg.tree.query.full_text.
MatchPhrasePrefix
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'match_phrase_prefix'¶
-
-
class
pandagg.tree.query.full_text.
MultiMatch
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'multi_match'¶
-
-
class
pandagg.tree.query.full_text.
QueryString
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'query_string'¶
-
-
class
pandagg.tree.query.full_text.
SimpleQueryString
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'simple_string'¶
-
-
class
pandagg.tree.query.geo.
GeoBoundingBox
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'geo_bounding_box'¶
-
-
class
pandagg.tree.query.geo.
GeoDistance
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'geo_distance'¶
-
-
class
pandagg.tree.query.geo.
GeoPolygone
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'geo_polygon'¶
-
-
class
pandagg.tree.query.geo.
GeoShape
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'geo_shape'¶
-
-
class
pandagg.tree.query.joining.
HasChild
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'has_child'¶
-
-
class
pandagg.tree.query.joining.
HasParent
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'has_parent'¶
-
-
class
pandagg.tree.query.joining.
Nested
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'nested'¶
-
-
class
pandagg.tree.query.joining.
ParentId
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'parent_id'¶
-
-
class
pandagg.tree.query.shape.
Shape
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'shape'¶
-
-
class
pandagg.tree.query.specialized.
DistanceFeature
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'distance_feature'¶
-
-
class
pandagg.tree.query.specialized.
MoreLikeThis
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'more_like_this'¶
-
-
class
pandagg.tree.query.specialized.
Percolate
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'percolate'¶
-
-
class
pandagg.tree.query.specialized.
RankFeature
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'rank_feature'¶
-
-
class
pandagg.tree.query.specialized.
Script
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'script'¶
-
-
class
pandagg.tree.query.specialized.
Wrapper
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'wrapper'¶
-
-
class
pandagg.tree.query.specialized_compound.
PinnedQuery
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'pinned'¶
-
-
class
pandagg.tree.query.specialized_compound.
ScriptScore
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'script_score'¶
-
-
class
pandagg.tree.query.term_level.
Exists
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'exists'¶
-
-
class
pandagg.tree.query.term_level.
Fuzzy
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'fuzzy'¶
-
-
class
pandagg.tree.query.term_level.
Ids
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'ids'¶
-
-
class
pandagg.tree.query.term_level.
Prefix
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'prefix'¶
-
-
class
pandagg.tree.query.term_level.
Range
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'range'¶
-
-
class
pandagg.tree.query.term_level.
Regexp
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'regexp'¶
-
-
class
pandagg.tree.query.term_level.
Term
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'term'¶
-
-
class
pandagg.tree.query.term_level.
Terms
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'terms'¶
-
-
class
pandagg.tree.query.term_level.
TermsSet
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'terms_set'¶
-
-
class
pandagg.tree.query.term_level.
Type
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'type'¶
-
-
class
pandagg.tree.query.term_level.
Wildcard
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'wildcard'¶
-
Submodules¶
pandagg.tree.mapping module¶
-
class
pandagg.tree.mapping.
Mapping
(*args, **kwargs)[source]¶ Bases:
pandagg.tree._tree.Tree
-
KEY
= None¶
-
get
(key)[source]¶ Get a node by its id. :param nid: str, identifier of node to fetch :rtype: lighttree.node.Node
-
node_class
¶ alias of
pandagg.node.mapping.abstract.Field
-
validate_agg_node
(agg_node, exc=True)[source]¶ Ensure that, if the node has a field or path, it exists in the mapping, and that the requested aggregation type is allowed on this kind of field. :param agg_node: AggNode you want to validate on this mapping :param exc: boolean, if set to True raise exception if invalid :rtype: boolean
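A hedged usage sketch (the mapping content, field names and the agg-node import path are assumptions):
>>> from pandagg.mapping import Mapping
>>> from pandagg.node.aggs.bucket import Terms   # assumed import path for the Terms agg node
>>> m = Mapping({"properties": {"brand": {"type": "keyword"}}})
>>> m.validate_agg_node(Terms("per_brand", field="brand"))                # True if valid
>>> m.validate_agg_node(Terms("per_brand", field="unknown"), exc=False)   # False instead of raising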
-
pandagg.tree.response module¶
-
class
pandagg.tree.response.
AggsResponseTree
(aggs, index)[source]¶ Bases:
pandagg.tree._tree.Tree
Tree representation of an ElasticSearch response.
-
bucket_properties
(bucket, properties=None, end_level=None, depth=None)[source]¶ Recursive method returning a given bucket's properties in the form of an ordered dictionary. Travels from the current bucket through all ancestors until reaching the root.
Parameters: - bucket – instance of pandagg.buckets.buckets.Bucket
- properties – OrderedDict accumulator of ‘level’ -> ‘key’
- end_level – optional parameter to specify until which level properties are fetched
- depth – optional parameter to specify a limit number of levels which are fetched
Returns: OrderedDict of structure ‘level’ -> ‘key’
-
get_bucket_filter
(nid)[source]¶ Build query filtering documents belonging to that bucket. Suppose the following configuration:
Base                          <- filter on base
|── Nested_A                  no filter on A (nested still must be applied for children)
|   |── SubNested A1
|   └── SubNested A2          <- filter on A2
└── Nested_B                  <- filter on B
-
parse
(raw_response)[source]¶ Build response tree from ElasticSearch aggregation response
Note: if the root aggregation node can generate multiple buckets, a response root is crafted to avoid having multiple roots.
Parameters: raw_response – ElasticSearch aggregation response Returns: self
-
show
(**kwargs)[source]¶ Return tree structure in hierarchy style.
Parameters: - nid – Node identifier from which tree traversal will start. If None tree root will be used
- filter_ – filter function performed on nodes. Nodes excluded by the filter function won't be displayed, nor will their children
- reverse – the
reverse
param for sortingNode
objects in the same level - key – key used to order nodes of same parent
- reverse – reverse parameter applied at sorting
- line_type – display type choice
- limit – int, truncate tree display to this number of lines
- kwargs – kwargs params passed to node
line_repr
method
Return type: unicode in python2, str in python3
-
Module contents¶
Submodules¶
pandagg.aggs module¶
-
class
pandagg.aggs.
Aggs
(*args, **kwargs)[source]¶ Bases:
pandagg.tree._tree.Tree
Combination of aggregation clauses. This class provides handy methods to build an aggregation (see
aggs()
and groupby()
), and is also used to parse aggregation responses into convenient formats. Mapping declaration is optional, but providing one validates the aggregation and automatically handles missing nested clauses.
All following syntaxes are identical:
From a dict:
>>> Aggs({"per_user":{"terms":{"field":"user"}}})
Using shortcut declaration: first argument is the aggregation type, other arguments are aggregation body parameters:
>>> Aggs('terms', name='per_user', field='user')
Using DSL class:
>>> from pandagg.aggs import Terms >>> Aggs(Terms('per_user', field='user'))
Dict and DSL class syntaxes allow to provide multiple clauses aggregations:
>>> Aggs({"per_user":{"terms":{"field":"user"}, "aggs": {"avg_age": {"avg": {"field": "age"}}}}})
Which is similar to:
>>> from pandagg.aggs import Terms, Avg >>> Terms('per_user', field='user', aggs=Avg('avg_age', field='age'))
Keyword Arguments: - mapping (
dict
or pandagg.tree.mapping.Mapping
) – Mapping of the requested index or indices. Providing it will validate aggregations, and add required nested clauses if missing. - nested_autocorrect (
bool
) – In case of missing nested clauses in aggregation, if True, automatically add missing nested clauses, else raise error. - remaining kwargs: Used as body in aggregation
-
aggs
(*args, **kwargs)[source]¶ Arrange passed aggregations “horizontally”.
Given the initial aggregation:
A──> B
└──> C
If passing multiple aggregations with insert_below = 'A':
A──> B
└──> C
└──> new1
└──> new2
Note: those will be placed under the insert_below aggregation clause id if provided, else under the deepest linear bucket aggregation if there is no ambiguity:
OK:
A──> B ─> C ─> new
KO:
A──> B
└──> C
args accepts single occurrence or sequence of following formats:
- string (for terms agg concise declaration)
- regular Elasticsearch dict syntax
- AggNode instance (for instance Terms, Filters etc)
Keyword Arguments: - insert_below (
string
) – Parent aggregation name under which these aggregations should be placed - at_root (
string
) – Insert aggregations at root of aggregation query - remaining kwargs: Used as body in aggregation
Return type:
-
deepest_linear_bucket_agg
¶ Return deepest bucket aggregation node (pandagg.nodes.abstract.BucketAggNode) of that aggregation that neither has siblings, nor has an ancestor with siblings.
-
groupby
(*args, **kwargs)[source]¶ Arrange passed aggregations in vertical/nested manner, above or below another agg clause.
Given the initial aggregation:
A──> B
└──> C
If insert_below = 'A':
A──> new──> B
      └──> C
If insert_above = 'B':
A──> new──> B
└──> C
The by argument accepts a single occurrence or a sequence of the following formats:
- string (for terms agg concise declaration)
- regular Elasticsearch dict syntax
- AggNode instance (for instance Terms, Filters etc)
If neither insert_below nor insert_above is provided, by will be placed between the deepest linear bucket aggregation (if there is no ambiguity) and its children:
A──> B : OK, generates A──> B ─> C ─> by
A──> B : KO, ambiguous, must specify either A, B or C
└──> C
All Aggs.__init__ syntaxes are accepted:
>>> Aggs()\
>>>     .groupby('terms', name='per_user_id', field='user_id')
{"per_user_id": {"terms": {"field": "user_id"}}}
Passing a dict:
>>> Aggs().groupby({"terms_on_my_field":{"terms":{"field":"some_field"}}}) {"terms_on_my_field":{"terms":{"field":"some_field"}}}
Using DSL class:
>>> from pandagg.aggs import Terms >>> Aggs().groupby(Terms('terms_on_my_field', field='some_field')) {"terms_on_my_field":{"terms":{"field":"some_field"}}}
Shortcut syntax for terms aggregation: creates a terms aggregation, using field as aggregation name
>>> Aggs().groupby('some_field') {"some_field":{"terms":{"field":"some_field"}}}
Using an Aggs object:
>>> Aggs().groupby(Aggs('per_user_id', 'terms', field='user_id'))
{"per_user_id": {"terms": {"field": "user_id"}}}
Accepted declarations for multiple aggregations:
Keyword Arguments: - insert_below (
string
) – Parent aggregation name under which these aggregations should be placed - insert_above (
string
) – Aggregation name above which these aggregations should be placed - at_root (
string
) – Insert aggregations at root of aggregation query - remaining kwargs: Used as body in aggregation
Return type:
-
node_class
¶ alias of
pandagg.node.aggs.abstract.AggNode
-
show
(*args, **kwargs)[source]¶ Return tree structure in hierarchy style.
Parameters: - nid – Node identifier from which tree traversal will start. If None tree root will be used
- filter_ – filter function performed on nodes. Nodes excluded by the filter function won't be displayed, nor will their children
- reverse – the
reverse
param for sortingNode
objects in the same level - key – key used to order nodes of same parent
- reverse – reverse parameter applied at sorting
- line_type – display type choice
- limit – int, truncate tree display to this number of lines
- kwargs – kwargs params passed to node
line_repr
method
Return type: unicode in python2, str in python3
-
class
pandagg.aggs.
Terms
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'terms'¶
-
-
class
pandagg.aggs.
Filters
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'filters'¶
-
-
class
pandagg.aggs.
Histogram
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'histogram'¶
-
-
class
pandagg.aggs.
DateHistogram
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'date_histogram'¶
-
-
class
pandagg.aggs.
Range
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'range'¶
-
-
class
pandagg.aggs.
Global
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'global'¶
-
-
class
pandagg.aggs.
Filter
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'filter'¶
-
-
class
pandagg.aggs.
Missing
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'missing'¶
-
-
class
pandagg.aggs.
Nested
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'nested'¶
-
-
class
pandagg.aggs.
ReverseNested
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'reverse_nested'¶
-
-
class
pandagg.aggs.
Avg
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'avg'¶
-
-
class
pandagg.aggs.
Max
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'max'¶
-
-
class
pandagg.aggs.
Sum
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'sum'¶
-
-
class
pandagg.aggs.
Min
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'min'¶
-
-
class
pandagg.aggs.
Cardinality
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'cardinality'¶
-
-
class
pandagg.aggs.
Stats
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'stats'¶
-
-
class
pandagg.aggs.
ExtendedStats
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'extended_stats'¶
-
-
class
pandagg.aggs.
Percentiles
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
Percents body argument can be passed to specify which percentiles to fetch.
-
KEY
= 'percentiles'¶
-
-
class
pandagg.aggs.
PercentileRanks
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'percentile_ranks'¶
-
-
class
pandagg.aggs.
GeoBound
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'geo_bounds'¶
-
-
class
pandagg.aggs.
GeoCentroid
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'geo_centroid'¶
-
-
class
pandagg.aggs.
TopHits
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'top_hits'¶
-
-
class
pandagg.aggs.
ValueCount
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractLeafAgg
-
KEY
= 'value_count'¶
-
-
class
pandagg.aggs.
AvgBucket
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'avg_bucket'¶
-
-
class
pandagg.aggs.
Derivative
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'derivative'¶
-
-
class
pandagg.aggs.
MaxBucket
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'max_bucket'¶
-
-
class
pandagg.aggs.
MinBucket
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'min_bucket'¶
-
-
class
pandagg.aggs.
SumBucket
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'sum_bucket'¶
-
-
class
pandagg.aggs.
StatsBucket
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'stats_bucket'¶
-
-
class
pandagg.aggs.
ExtendedStatsBucket
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'extended_stats_bucket'¶
-
-
class
pandagg.aggs.
PercentilesBucket
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'percentiles_bucket'¶
-
-
class
pandagg.aggs.
MovingAvg
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'moving_avg'¶
-
-
class
pandagg.aggs.
CumulativeSum
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'cumulative_sum'¶
-
-
class
pandagg.aggs.
BucketScript
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'bucket_script'¶
-
-
class
pandagg.aggs.
BucketSelector
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'bucket_selector'¶
-
-
class
pandagg.aggs.
BucketSort
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'bucket_sort'¶
-
-
class
pandagg.aggs.
SerialDiff
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.aggs.aggs.AbstractParentAgg
-
KEY
= 'serial_diff'¶
-
pandagg.connections module¶
-
class
pandagg.connections.
Connections
[source]¶ Bases:
object
Class responsible for holding connections to different clusters. Used as a singleton in this module.
-
configure
(**kwargs)[source]¶ Configure multiple connections at once, useful for passing in config dictionaries obtained from other sources, like Django’s settings or a configuration management tool.
Example:
connections.configure(
    default={'hosts': 'localhost'},
    dev={'hosts': ['esdev1.example.com:9200'], 'sniff_on_start': True},
)
Connections will only be constructed lazily when requested through
get_connection
.
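Assuming the module exposes a module-level singleton named connections (as implied by the class docstring above), the lazy construction looks like:
from pandagg.connections import connections   # assumed singleton instance of Connections
connections.configure(default={'hosts': 'localhost'})
client = connections.get_connection('default')  # the Elasticsearch client is only built at this point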
-
create_connection
(alias='default', **kwargs)[source]¶ Construct an instance of
elasticsearch.Elasticsearch
and register it under given alias.
-
get_connection
(alias='default')[source]¶ Retrieve a connection, construct it if necessary (only configuration was passed to us). If a non-string alias has been passed through we assume it’s already a client instance and will just return it as-is.
Raises
KeyError
if no client (or its definition) is registered under the alias.
-
pandagg.discovery module¶
pandagg.exceptions module¶
-
exception
pandagg.exceptions.
AbsentMappingFieldError
[source]¶ Bases:
pandagg.exceptions.MappingError
Field is not present in mapping.
-
exception
pandagg.exceptions.
InvalidAggregation
[source]¶ Bases:
Exception
Wrong aggregation definition
-
exception
pandagg.exceptions.
InvalidOperationMappingFieldError
[source]¶ Bases:
pandagg.exceptions.MappingError
Invalid aggregation type on this mapping field.
pandagg.mapping module¶
-
class
pandagg.mapping.
Mapping
(*args, **kwargs)[source]¶ Bases:
pandagg.tree._tree.Tree
-
KEY
= None¶
-
get
(key)[source]¶ Get a node by its id. :param nid: str, identifier of node to fetch :rtype: lighttree.node.Node
-
node_class
¶ alias of
pandagg.node.mapping.abstract.Field
-
validate_agg_node
(agg_node, exc=True)[source]¶ Ensure that, if the node has a field or path, it exists in the mapping, and that the requested aggregation type is allowed on this kind of field. :param agg_node: AggNode you want to validate on this mapping :param exc: boolean, if set to True raise exception if invalid :rtype: boolean
-
-
class
pandagg.mapping.
IMapping
(*args, **kwargs)[source]¶ Bases:
lighttree.interactive.TreeBasedObj
Interactive wrapper upon mapping tree, allowing field navigation and quick access to single clause aggregations computation.
-
class
pandagg.mapping.
Text
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'text'¶
-
-
class
pandagg.mapping.
Keyword
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'keyword'¶
-
-
class
pandagg.mapping.
Long
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'long'¶
-
-
class
pandagg.mapping.
Integer
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'integer'¶
-
-
class
pandagg.mapping.
Short
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'short'¶
-
-
class
pandagg.mapping.
Byte
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'byte'¶
-
-
class
pandagg.mapping.
Double
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'double'¶
-
-
class
pandagg.mapping.
HalfFloat
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'half_float'¶
-
-
class
pandagg.mapping.
ScaledFloat
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'scaled_float'¶
-
-
class
pandagg.mapping.
Date
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'date'¶
-
-
class
pandagg.mapping.
DateNanos
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'date_nanos'¶
-
-
class
pandagg.mapping.
Boolean
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'boolean'¶
-
-
class
pandagg.mapping.
Binary
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'binary'¶
-
-
class
pandagg.mapping.
IntegerRange
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'integer_range'¶
-
-
class
pandagg.mapping.
Float
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'float'¶
-
-
class
pandagg.mapping.
FloatRange
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'float_range'¶
-
-
class
pandagg.mapping.
LongRange
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'long_range'¶
-
-
class
pandagg.mapping.
DoubleRange
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'double_range'¶
-
-
class
pandagg.mapping.
DateRange
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
-
KEY
= 'date_range'¶
-
-
class
pandagg.mapping.
Object
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedComplexField
-
KEY
= 'object'¶
-
-
class
pandagg.mapping.
Nested
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedComplexField
-
KEY
= 'nested'¶
-
-
class
pandagg.mapping.
GeoPoint
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
For lat/lon points
-
KEY
= 'geo_point'¶
-
-
class
pandagg.mapping.
GeoShape
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
For complex shapes like polygons
-
KEY
= 'geo_shape'¶
-
-
class
pandagg.mapping.
IP
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
for IPv4 and IPv6 addresses
-
KEY
= 'IP'¶
-
-
class
pandagg.mapping.
Completion
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
To provide auto-complete suggestions
-
KEY
= 'completion'¶
-
-
class
pandagg.mapping.
TokenCount
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
To count the number of tokens in a string
-
KEY
= 'token_count'¶
-
-
class
pandagg.mapping.
MapperMurMur3
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
To compute hashes of values at index-time and store them in the index
-
KEY
= 'murmur3'¶
-
-
class
pandagg.mapping.
MapperAnnotatedText
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
To index text containing special markup (typically used for identifying named entities)
-
KEY
= 'annotated-text'¶
-
-
class
pandagg.mapping.
Percolator
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
Accepts queries from the query-dsl
-
KEY
= 'percolator'¶
-
-
class
pandagg.mapping.
Join
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
Defines parent/child relation for documents within the same index
-
KEY
= 'join'¶
-
-
class
pandagg.mapping.
RankFeature
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
Record numeric feature to boost hits at query time.
-
KEY
= 'rank_feature'¶
-
-
class
pandagg.mapping.
RankFeatures
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
Record numeric features to boost hits at query time.
-
KEY
= 'rank_features'¶
-
-
class
pandagg.mapping.
DenseVector
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
Record dense vectors of float values.
-
KEY
= 'dense_vector'¶
-
-
class
pandagg.mapping.
SparseVector
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
Record sparse vectors of float values.
-
KEY
= 'sparse_vector'¶
-
-
class
pandagg.mapping.
SearchAsYouType
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
A text-like field optimized for queries to implement as-you-type completion
-
KEY
= 'search_as_you_type'¶
-
-
class
pandagg.mapping.
Alias
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
Defines an alias to an existing field.
-
KEY
= 'alias'¶
-
-
class
pandagg.mapping.
Flattened
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
Allows an entire JSON object to be indexed as a single field.
-
KEY
= 'flattened'¶
-
-
class
pandagg.mapping.
Shape
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
For arbitrary cartesian geometries.
-
KEY
= 'shape'¶
-
-
class
pandagg.mapping.
Histogram
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedRegularField
For pre-aggregated numerical values for percentiles aggregations.
-
KEY
= 'histogram'¶
-
-
class
pandagg.mapping.
Index
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
The index to which the document belongs.
-
KEY
= '_index'¶
-
-
class
pandagg.mapping.
Type
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
The document’s mapping type.
-
KEY
= '_type'¶
-
-
class
pandagg.mapping.
Id
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
The document’s ID.
-
KEY
= '_id'¶
-
-
class
pandagg.mapping.
FieldNames
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
All fields in the document which contain non-null values.
-
KEY
= '_field_names'¶
-
-
class
pandagg.mapping.
Source
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
The original JSON representing the body of the document.
-
KEY
= '_source'¶
-
-
class
pandagg.mapping.
Size
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
The size of the _source field in bytes, provided by the mapper-size plugin.
-
KEY
= '_size'¶
-
-
class
pandagg.mapping.
Ignored
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
All fields in the document that have been ignored at index time because of ignore_malformed.
-
KEY
= '_ignored'¶
-
-
class
pandagg.mapping.
Routing
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
A custom routing value which routes a document to a particular shard.
-
KEY
= '_routing'¶
-
-
class
pandagg.mapping.
Meta
(**body)[source]¶ Bases:
pandagg.node.mapping.abstract.UnnamedField
Application specific metadata.
-
KEY
= '_meta'¶
-
pandagg.query module¶
-
class
pandagg.query.
Query
(*args, **kwargs)[source]¶ Bases:
pandagg.tree._tree.Tree
Combination of query clauses.
Mapping declaration is optional, but doing so validates query validity and automatically inserts nested clauses when necessary.
Keyword Arguments: - mapping (
dict
or pandagg.tree.mapping.Mapping
) – Mapping of the requested index or indices. Providing it will add validation features, and add required nested clauses if missing. - nested_autocorrect (
bool
) – In case of missing nested clauses in query, if True, automatically add missing nested clauses, else raise error. - remaining kwargs: Used as body in query clauses.
-
KEY
= None¶
-
node_class
¶
-
query
(*args, **kwargs)[source]¶ Insert new clause(s) in current query.
Inserted clauses accept the following syntaxes.
Given an empty query:
>>> from pandagg.query import Query
>>> q = Query()
flat syntax: clause type, followed by query clause body as keyword arguments:
>>> q.query('term', some_field=23)
{'term': {'some_field': 23}}
from regular Elasticsearch dict query:
>>> q.query({'term': {'some_field': 23}})
{'term': {'some_field': 23}}
using pandagg DSL:
>>> from pandagg.query import Term
>>> q.query(Term(some_field=23))
{'term': {'some_field': 23}}
Keyword Arguments: - parent (
str
) – named query clause under which the inserted clauses should be placed. - parent_param (
str
optional parameter when using parent param) – parameter under which inserted clauses will be placed. For instance if parent clause is a boolean, can be ‘must’, ‘filter’, ‘should’, ‘must_not’. - child (
str
) – named query clause above which the inserted clauses should be placed. - child_param (
str
optional parameter when using parent param) – parameter of inserted boolean clause under which child clauses will be placed. For instance if inserted clause is a boolean, can be ‘must’, ‘filter’, ‘should’, ‘must_not’. - mode (
str
one of ‘add’, ‘replace’, ‘replace_all’) – merging strategy when inserting clauses on a existing compound clause.- ‘add’ (default) : adds new clauses keeping initial ones
- ‘replace’ : for each parameter (for instance in ‘bool’ case : ‘filter’, ‘must’, ‘must_not’, ‘should’), replace existing clauses under this parameter, by new ones only if declared in inserted compound query
- ‘replace_all’ : existing compound clause is completely replaced by the new one
-
show
(*args, **kwargs)[source]¶ Return tree structure in hierarchy style.
Parameters: - nid – Node identifier from which tree traversal will start. If None tree root will be used
- filter_ – filter function performed on nodes. Nodes excluded by the filter function won't be displayed, nor will their children
- reverse – the
reverse
param for sortingNode
objects in the same level - key – key used to order nodes of same parent
- reverse – reverse parameter applied at sorting
- line_type – display type choice
- limit – int, truncate tree display to this number of lines
- kwargs – kwargs params passed to node
line_repr
method
Return type: unicode in python2, str in python3
-
class
pandagg.query.
Exists
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'exists'¶
-
-
class
pandagg.query.
Fuzzy
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'fuzzy'¶
-
-
class
pandagg.query.
Ids
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'ids'¶
-
-
class
pandagg.query.
Prefix
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'prefix'¶
-
-
class
pandagg.query.
Range
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'range'¶
-
-
class
pandagg.query.
Regexp
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'regexp'¶
-
-
class
pandagg.query.
Term
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'term'¶
-
-
class
pandagg.query.
Terms
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'terms'¶
-
-
class
pandagg.query.
TermsSet
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'terms_set'¶
-
-
class
pandagg.query.
Type
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'type'¶
-
-
class
pandagg.query.
Wildcard
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'wildcard'¶
-
-
class
pandagg.query.
Intervals
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'intervals'¶
-
-
class
pandagg.query.
Match
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'match'¶
-
-
class
pandagg.query.
MatchBoolPrefix
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'match_bool_prefix'¶
-
-
class
pandagg.query.
MatchPhrase
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'match_phrase'¶
-
-
class
pandagg.query.
MatchPhrasePrefix
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'match_phrase_prefix'¶
-
-
class
pandagg.query.
MultiMatch
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'multi_match'¶
-
-
class
pandagg.query.
Common
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'common'¶
-
-
class
pandagg.query.
QueryString
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'query_string'¶
-
-
class
pandagg.query.
SimpleQueryString
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'simple_string'¶
-
-
class
pandagg.query.
Bool
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'bool'¶
-
-
class
pandagg.query.
Boosting
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'boosting'¶
-
-
class
pandagg.query.
ConstantScore
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'constant_score'¶
-
-
class
pandagg.query.
FunctionScore
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'function_score'¶
-
-
class
pandagg.query.
DisMax
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'dis_max'¶
-
-
class
pandagg.query.
Nested
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'nested'¶
-
-
class
pandagg.query.
HasParent
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'has_parent'¶
-
-
class
pandagg.query.
HasChild
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'has_child'¶
-
-
class
pandagg.query.
ParentId
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'parent_id'¶
-
-
class
pandagg.query.
Shape
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'shape'¶
-
-
class
pandagg.query.
GeoShape
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'geo_shape'¶
-
-
class
pandagg.query.
GeoPolygone
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'geo_polygon'¶
-
-
class
pandagg.query.
GeoDistance
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'geo_distance'¶
-
-
class
pandagg.query.
GeoBoundingBox
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'geo_bounding_box'¶
-
-
class
pandagg.query.
DistanceFeature
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'distance_feature'¶
-
-
class
pandagg.query.
MoreLikeThis
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'more_like_this'¶
-
-
class
pandagg.query.
Percolate
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'percolate'¶
-
-
class
pandagg.query.
RankFeature
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'rank_feature'¶
-
-
class
pandagg.query.
Script
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'script'¶
-
-
class
pandagg.query.
Wrapper
(*args, **kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Leaf
-
KEY
= 'wrapper'¶
-
-
class
pandagg.query.
ScriptScore
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'script_score'¶
-
-
class
pandagg.query.
PinnedQuery
(**kwargs)[source]¶ Bases:
pandagg.tree.query.abstract.Compound
-
KEY
= 'pinned'¶
-
pandagg.response module¶
-
class
pandagg.response.
Aggregations
(data, aggs, query, index, client)[source]¶ Bases:
object
-
serialize
(output='tabular', **kwargs)[source]¶ Parameters: - output – output format, one of “raw”, “tree”, “interactive_tree”, “normalized”, “tabular”, “dataframe”
- kwargs – tabular serialization kwargs
Returns:
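For instance (hedged sketch; resp is assumed to be an executed search response exposing this Aggregations object under resp.aggregations):
>>> resp.aggregations.serialize(output="normalized")   # nested dict of buckets
>>> resp.aggregations.serialize(output="dataframe")    # pandas DataFrame built from the tabular view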
-
to_tabular
(index_orient=True, grouped_by=None, expand_columns=True, expand_sep='|', normalize=True, with_single_bucket_groups=False)[source]¶ Build a tabular view of the ES response: row levels are built by grouping buckets until the 'grouped_by' aggregation node (included) is reached, and the children aggregations of that grouping level provide the values of each generated group (columns).
Suppose an aggregation of this shape (A & B bucket aggregations):
A──> B──> C1
     ├──> C2
     └──> C3
With grouped_by='B', the ElasticSearch response (tree structure) is broken down into a tabular structure of this shape:
                 C1   C2   C3
A       B
wood    blue     10    4    0
        red       7    5    2
steel   blue      1    9    0
        red      23    4    2
Parameters: - index_orient – if True, level-key samples are returned as tuples, else in a dictionary
- grouped_by – name of the aggregation node used as last grouping level
- normalize – if True, normalize columns buckets
Returns: index, index_names, values
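A hedged usage sketch matching the shape above (resp and the aggregation name 'B' are assumptions):
>>> resp.aggregations.to_tabular(grouped_by="B", index_orient=True)
>>> # rows are keyed by (A key, B key) tuples; columns hold the C1, C2, C3 values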
-
pandagg.search module¶
-
class
pandagg.search.
MultiSearch
(**kwargs)[source]¶ Bases:
pandagg.search.Request
Combine multiple
Search
objects into a single request.
-
class
pandagg.search.
Request
(using, index=None)[source]¶ Bases:
object
-
index
(*index)[source]¶ Set the index for the search. If called empty it will remove all information.
Example:
s = Search()
s = s.index('twitter-2015.01.01', 'twitter-2015.01.02')
s = s.index(['twitter-2015.01.01', 'twitter-2015.01.02'])
-
params
(**kwargs)[source]¶ Specify query params to be used when executing the search. All the keyword arguments will override the current values. See https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.Elasticsearch.search for all available parameters.
Example:
s = Search()
s = s.params(routing='user-1', preference='local')
-
-
class
pandagg.search.
Search
(using=None, index=None, mapping=None, nested_autocorrect=False, repr_auto_execute=False)[source]¶ Bases:
pandagg.search.Request
-
aggs
(*args, **kwargs)[source]¶ Arrange passed aggregations “horizontally”.
Given the initial aggregation:
A──> B
└──> C
If passing multiple aggregations with insert_below = 'A':
A──> B
└──> C
└──> new1
└──> new2
Note: those will be placed under the insert_below aggregation clause id if provided, else under the deepest linear bucket aggregation if there is no ambiguity:
OK:
A──> B ─> C ─> new
KO:
A──> B
└──> C
args accepts single occurrence or sequence of following formats:
- string (for terms agg concise declaration)
- regular Elasticsearch dict syntax
- AggNode instance (for instance Terms, Filters etc)
Keyword Arguments: - insert_below (
string
) – Parent aggregation name under which these aggregations should be placed - at_root (
string
) – Insert aggregations at root of aggregation query - remaining kwargs: Used as body in aggregation
Return type:
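For instance, a minimal sketch (aggregation and field names are illustrative) placing two metric aggregations side by side under a terms bucket:

    s = Search()\
        .groupby('genres', size=3)\
        .aggs('avg_rank', 'avg', field='rank')\
        .aggs('avg_nb_roles', 'avg', field='nb_roles')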
-
count()[source]¶
Return the number of hits matching the query and filters. Note that only the actual number is returned.
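A quick sketch (client, index and query are illustrative):

    n = Search(using=client, index='movies')\
        .filter('range', year={"gte": 1990})\
        .count()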
-
classmethod from_dict(d)[source]¶
Construct a new Search instance from a raw dict containing the search body. Useful when migrating from raw dictionaries.

Example:

    s = Search.from_dict({
        "query": {
            "bool": {
                "must": [...]
            }
        },
        "aggs": {...}
    })
    s = s.filter('term', published=True)
-
groupby(*args, **kwargs)[source]¶
Arrange passed aggregations in a vertical/nested manner, above or below another aggregation clause.

Given the initial aggregation:

    A──> B
    └──> C

if insert_below='A':

    A──> new──> B
          └──> C

if insert_above='B':

    A──> new──> B
    └──> C

The by argument accepts a single occurrence or a sequence of the following formats:
- string (for terms agg concise declaration)
- regular Elasticsearch dict syntax
- AggNode instance (for instance Terms, Filters etc.)

If neither insert_below nor insert_above is provided, by will be placed between the deepest linear bucket aggregation (if there is no ambiguity) and its children:

    A──> B ─> C   : OK, generates A──> B ─> C ─> by

    A──> B        : KO, ambiguous, must specify either A, B or C
    └──> C

All Aggs.__init__ syntaxes are accepted:

>>> Aggs()\
>>> .groupby('terms', name='per_user_id', field='user_id')
{"per_user_id": {"terms": {"field": "user_id"}}}

Passing a dict:

>>> Aggs().groupby({"terms_on_my_field": {"terms": {"field": "some_field"}}})
{"terms_on_my_field": {"terms": {"field": "some_field"}}}

Using a DSL class:

>>> from pandagg.aggs import Terms
>>> Aggs().groupby(Terms('terms_on_my_field', field='some_field'))
{"terms_on_my_field": {"terms": {"field": "some_field"}}}

Shortcut syntax for a terms aggregation: creates a terms aggregation, using the field name as the aggregation name:

>>> Aggs().groupby('some_field')
{"some_field": {"terms": {"field": "some_field"}}}

Using an Aggs object:

>>> Aggs().groupby(Aggs('per_user_id', 'terms', field='user_id'))
{"per_user_id": {"terms": {"field": "user_id"}}}

Accepted declarations for multiple aggregations:

Keyword Arguments:
- insert_below (string) – parent aggregation name under which these aggregations should be placed
- insert_above (string) – aggregation name above which these aggregations should be placed
- at_root (string) – insert aggregations at the root of the aggregation query
- remaining kwargs – used as body in the aggregation

Return type: pandagg.search.Search
-
highlight(*fields, **kwargs)[source]¶
Request highlighting of some fields. All keyword arguments passed in will be used as parameters for all the fields in the fields parameter.

Example:

    Search().highlight('title', 'body', fragment_size=50)

will produce the equivalent of:

    {
        "highlight": {
            "fields": {
                "body": {"fragment_size": 50},
                "title": {"fragment_size": 50}
            }
        }
    }

If you want to have different options for different fields you can call highlight twice:

    Search().highlight('title', fragment_size=50).highlight('body', fragment_size=100)

which will produce:

    {
        "highlight": {
            "fields": {
                "body": {"fragment_size": 100},
                "title": {"fragment_size": 50}
            }
        }
    }
-
highlight_options(**kwargs)[source]¶
Update the global highlighting options used for this request. For example:

    s = Search()
    s = s.highlight_options(order='score')
-
query(*args, **kwargs)[source]¶
Insert new clause(s) in the current query.

Inserted clauses accept the following syntaxes.

Given an empty query:

>>> from pandagg.query import Query
>>> q = Query()

flat syntax: clause type, followed by query clause body as keyword arguments:

>>> q.query('term', some_field=23)
{'term': {'some_field': 23}}

from a regular Elasticsearch dict query:

>>> q.query({'term': {'some_field': 23}})
{'term': {'some_field': 23}}

using pandagg DSL:

>>> from pandagg.query import Term
>>> q.query(Term(some_field=23))
{'term': {'some_field': 23}}

Keyword Arguments:
- parent (str) – named query clause under which the inserted clauses should be placed.
- parent_param (str, optional when using parent) – parameter under which the inserted clauses will be placed. For instance if the parent clause is a bool, can be 'must', 'filter', 'should' or 'must_not'.
- child (str) – named query clause above which the inserted clauses should be placed.
- child_param (str, optional when using child) – parameter of the inserted bool clause under which child clauses will be placed. For instance if the inserted clause is a bool, can be 'must', 'filter', 'should' or 'must_not'.
- mode (str, one of 'add', 'replace', 'replace_all') – merging strategy when inserting clauses on an existing compound clause:
  - 'add' (default): adds new clauses, keeping the initial ones
  - 'replace': for each parameter (for instance in the 'bool' case: 'filter', 'must', 'must_not', 'should'), replace existing clauses under this parameter by new ones, only for parameters declared in the inserted compound query
  - 'replace_all': the existing compound clause is completely replaced by the new one
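An illustrative sketch of the mode keyword (the clause bodies are made up; behaviour as described above):

>>> q = Query().query('bool', filter=[{'range': {'year': {'gte': 1990}}}])
>>> # 'replace_all': the newly inserted bool clause completely replaces the existing one
>>> q = q.query('bool', must=[{'term': {'genres': 'Comedy'}}], mode='replace_all')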
-
scan()[source]¶
Turn the search into a scan search and return a generator that will iterate over all the documents matching the query.

Use the params() method to specify any additional arguments you wish to pass to the underlying scan helper from elasticsearch-py: https://elasticsearch-py.readthedocs.io/en/master/helpers.html#elasticsearch.helpers.scan
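A short sketch (client, index and query are illustrative):

    s = Search(using=client, index='movies').filter('range', year={"gte": 1990})
    for hit in s.params(scroll='5m').scan():
        ...  # process each matching document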
-
script_fields(**kwargs)[source]¶
Define script fields to be calculated on hits. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html for more details.

Example:

    s = Search()
    s = s.script_fields(times_two="doc['field'].value * 2")
    s = s.script_fields(
        times_three={
            'script': {
                'inline': "doc['field'].value * params.n",
                'params': {'n': 3}
            }
        }
    )
-
sort(*keys)[source]¶
Add sorting information to the search request. If called without arguments it will remove all sort requirements. Otherwise it will replace them. Acceptable arguments are:

    'some.field'
    '-some.other.field'
    {'different.field': {'any': 'dict'}}

so for example:

    s = Search().sort(
        'category',
        '-title',
        {"price": {"order": "asc", "mode": "avg"}}
    )

will sort by category, title (in descending order) and price in ascending order using the avg mode.

The API returns a copy of the Search object and can thus be chained.
-
source(fields=None, **kwargs)[source]¶
Selectively control how the _source field is returned.

Parameters: fields – wildcard string, array of wildcards, or dictionary of includes and excludes

If fields is None, the entire document will be returned for each hit. If fields is a dictionary with keys of 'includes' and/or 'excludes' the fields will be either included or excluded appropriately.

Calling this multiple times with the same named parameter will override the previous values with the new ones.

Example:

    s = Search()
    s = s.source(includes=['obj1.*'], excludes=["*.description"])

    s = Search()
    s = s.source(includes=['obj1.*']).source(excludes=["*.description"])
-
suggest(name, text, **kwargs)[source]¶
Add a suggestions request to the search.

Parameters:
- name – name of the suggestion
- text – text to suggest on

All keyword arguments will be added to the suggestions body. For example:

    s = Search()
    s = s.suggest('suggestion-1', 'Elasticsearch', term={'field': 'body'})
-
to_dict(count=False, **kwargs)[source]¶
Serialize the search into the dictionary that will be sent over as the request's body.

Parameters: count – a flag to specify if we are interested in a body for count: no aggregations, no pagination bounds, etc.

All additional keyword arguments will be included in the dictionary.
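A sketch of the count flag (query and size are illustrative, output shown indicatively):

>>> s = Search().query('term', genres='Comedy').size(2)
>>> s.to_dict()
{'query': {'term': {'genres': {'value': 'Comedy'}}}, 'size': 2}
>>> s.to_dict(count=True)  # pagination bounds dropped for a count body
{'query': {'term': {'genres': {'value': 'Comedy'}}}}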
-
pandagg.utils module¶
-
class pandagg.utils.DslMeta(name, bases, attrs)[source]¶
Bases: type

Base metaclass for DslBase subclasses that builds a registry of all classes for a given DslBase subclass (== all the query types for the Query subclass of DslBase).

It then uses the information from that registry (as well as the name and deserializer attributes from the base class) to construct any subclass based on its name.
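A rough illustrative sketch of the registry idea (simplified, not pandagg's actual implementation):

    class DslMeta(type):
        # collect concrete subclasses (those defining a KEY) into a registry
        _registry = {}

        def __init__(cls, name, bases, attrs):
            super().__init__(name, bases, attrs)
            key = attrs.get("KEY")
            if key is not None:
                DslMeta._registry[key] = cls

    class Leaf(metaclass=DslMeta):
        KEY = None

    class Script(Leaf):
        KEY = "script"

    # DslMeta._registry["script"] is Script: the clause name alone is enough
    # to pick the right class when deserializing a raw query dict.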
Module contents¶
Contributing to Pandagg¶
We want to make contributing to this project as easy and transparent as possible.
Our Development Process¶
We use github to host code, to track issues and feature requests, as well as accept pull requests.
Pull Requests¶
We actively welcome your pull requests.
- Fork the repo and create your branch from master.
- If you've added code that should be tested, add tests.
- If you've changed APIs, update the documentation.
- Ensure the test suite passes.
- Make sure your code lints.
Any contributions you make will be under the MIT Software License¶
In short, when you submit code changes, your submissions are understood to be under the same MIT License that covers the project. Feel free to contact the maintainers if that’s a concern.
Issues¶
We use GitHub issues to track public bugs. Please ensure your description is clear and has sufficient instructions to be able to reproduce the issue.
Report bugs using Github’s issues¶
Report a bug by opening a new issue; it's that easy!
Write bug reports with detail, background, and sample code¶
Great bug reports tend to have:
- A quick summary and/or background
- Steps to reproduce
  - Be specific!
  - Give sample code if you can.
- What you expected would happen
- What actually happens
- Notes (possibly including why you think this might be happening, or stuff you tried that didn't work)
License¶
By contributing, you agree that your contributions will be licensed under its MIT License.
References¶
This document was adapted from the open-source contribution guidelines of briandk’s gist
pandagg is a Python package providing a simple interface to manipulate Elasticsearch queries and aggregations. It brings the following features:
- flexible aggregation and search queries declaration
- query validation based on provided mapping
- parsing of aggregation results in handy formats: interactive bucket tree, normalized tree or tabular breakdown
- interactive mapping navigation
Installing¶
pandagg can be installed with pip:
$ pip install pandagg
Alternatively, you can grab the latest source code from GitHub:
$ git clone git://github.com/alkemics/pandagg.git
$ python setup.py install
Usage¶
The User Guide is the place to go to learn how to use the library.
An example based on publicly available IMDB data is documented in the repository's examples/imdb directory, with a Jupyter notebook showcasing some of pandagg's functionalities.
The pandagg package documentation provides API-level documentation.
License¶
pandagg is made available under the Apache 2.0 License. For more details, see LICENSE.txt.
Contributing¶
We happily welcome contributions, please see Contributing to Pandagg for details.