pandagg

Principles

This library focuses on two principles:

  • stick to the tree structure of Elasticsearch objects
  • provide simple and flexible interfaces to make it easy and intuitive to use in an interactive usage

Elasticsearch tree structures

Many Elasticsearch objects have a tree structure, i.e. they are built from a hierarchy of nodes:

  • a mappings (tree) is a hierarchy of fields (nodes)
  • a query (tree) is a hierarchy of query clauses (nodes)
  • an aggregation (tree) is a hierarchy of aggregation clauses (nodes)
  • an aggregation response (tree) is a hierarchy of response buckets (nodes)

This library sticks to that structure by providing a flexible syntax distinguishing trees and nodes: trees all inherit from the lighttree.Tree class, whereas nodes all inherit from the lighttree.Node class.
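
For instance, a minimal check (lighttree being a pandagg dependency):

>>> from lighttree import Tree
>>> from pandagg.query import Query
>>> isinstance(Query(), Tree)
True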

Interactive usage

pandagg is designed both for “regular” usage in a code repository, and for “interactive” usage (ipython or jupyter notebook usage, with autocompletion features inspired by pandas design).

Some classes are not intended to be used outside of interactive mode (ipython), since their purpose is to serve auto-completion features and convenient representations.

Namely:

  • IMapping: used to interactively navigate a mapping and run quick aggregations on some fields
  • IResponse: used to interactively navigate an aggregation response

These use cases are detailed in the following sections.

User Guide

The pandagg library provides interfaces to perform read operations on a cluster.

Query

The Query class provides:

  • multiple syntaxes to declare and update a query
  • query validation (with nested clauses validation)
  • ability to insert clauses at specific points
  • tree-like visual representation

Declaration

From native “dict” query

Given the following query:

>>> expected_query = {'bool': {'must': [
>>>    {'terms': {'genres': ['Action', 'Thriller']}},
>>>    {'range': {'rank': {'gte': 7}}},
>>>    {'nested': {
>>>        'path': 'roles',
>>>        'query': {'bool': {'must': [
>>>            {'term': {'roles.gender': {'value': 'F'}}},
>>>            {'term': {'roles.role': {'value': 'Reporter'}}}]}
>>>         }
>>>    }}
>>> ]}}

To instantiate Query, simply pass the “dict” query as argument:

>>> from pandagg.query import Query
>>> q = Query(expected_query)

A visual representation of the query is available with show():

>>> q.show()
<Query>
bool
└── must
    ├── nested, path="roles"
    │   └── query
    │       └── bool
    │           └── must
    │               ├── term, field=roles.gender, value="F"
    │               └── term, field=roles.role, value="Reporter"
    ├── range, field=rank, gte=7
    └── terms, genres=["Action", "Thriller"]

Call to_dict() to convert it to a native dict:

>>> q.to_dict()
{'bool': {'must': [
    {'range': {'rank': {'gte': 7}}},
    {'terms': {'genres': ['Action', 'Thriller']}},
    {'nested': {
        'path': 'roles',
        'query': {'bool': {'must': [
            {'term': {'roles.role': {'value': 'Reporter'}}},
            {'term': {'roles.gender': {'value': 'F'}}}]}
        }
    }}
]}}
>>> from pandagg.utils import equal_queries
>>> equal_queries(q.to_dict(), expected_query)
True

Note

The equal_queries function won’t consider the order of clauses in must/should parameters, since order doesn’t matter in Elasticsearch execution, i.e.

>>> equal_queries({'must': [A, B]}, {'must': [B, A]})
True

With DSL classes

Pandagg provides a DSL to declare this query in a very similar fashion:

>>> from pandagg.query import Nested, Bool, Range, Term, Terms
>>> q = Bool(must=[
>>>     Terms(genres=['Action', 'Thriller']),
>>>     Range(rank={"gte": 7}),
>>>     Nested(
>>>         path='roles',
>>>         query=Bool(must=[
>>>             Term(roles__gender='F'),
>>>             Term(roles__role='Reporter')
>>>         ])
>>>     )
>>> ])

All these classes inherit from Query and thus provide the same interface.

>>> from pandagg.query import Query
>>> isinstance(q, Query)
True

With flattened syntax

In the flattened syntax, the query clause type is used as first argument:

>>> from pandagg.query import Query
>>> q = Query('terms', genres=['Action', 'Thriller'])
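
The resulting query serializes as expected:

>>> q.to_dict()
{'terms': {'genres': ['Action', 'Thriller']}}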

Query enrichment

All methods described below return a new Query instance and leave the initial query unchanged.

For instance:

>>> from pandagg.query import Query
>>> initial_q = Query()
>>> enriched_q = initial_q.query('terms', genres=['Comedy', 'Short'])
>>> initial_q.to_dict()
None
>>> enriched_q.to_dict()
{'terms': {'genres': ['Comedy', 'Short']}}

Note

Calling to_dict() on an empty Query returns None

>>> from pandagg.query import Query
>>> Query().to_dict()
None

query() method

The base method to enrich a Query is query().

Considering this query:

>>> from pandagg.query import Query
>>> q = Query()

query() accepts the following syntaxes:

from dictionary:

>>> q.query({"terms": {"genres": ['Comedy', 'Short']}})

flattened syntax:

>>> q.query("terms", genres=['Comedy', 'Short'])

from Query instance (this includes DSL classes):

>>> from pandagg.query import Terms
>>> q.query(Terms(genres=['Action', 'Thriller']))

Compound clauses specific methods

A Query instance also exposes the following methods for specific compound queries (see the sketch after the lists below):

(TODO: detail allowed syntaxes)

Specific to bool queries:

  • bool()
  • filter()
  • must()
  • must_not()
  • should()

Specific to other compound queries:

  • nested()
  • constant_score()
  • dis_max()
  • function_score()
  • has_child()
  • has_parent()
  • parent_id()
  • pinned_query()
  • script_score()
  • boost()
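
As a minimal sketch (assuming these methods accept the same flattened syntax as query()), must() and filter() insert clauses under the corresponding occurrence of the bool query:

>>> from pandagg.query import Query
>>> q = Query()\
>>>     .must('terms', genres=['Action'])\
>>>     .filter('range', year={'gte': 1990})
>>> q.show()
<Query>
bool
├── filter
│   └── range, field=year, gte=1990
└── must
    └── terms, genres=["Action"]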

Inserted clause location

With all insertion methods detailed above, by default the inserted clause is placed at the top level of the query, generating a bool clause if necessary.

Considering the following query:

>>> from pandagg.query import Query
>>> q = Query('terms', genres=['Action', 'Thriller'])
>>> q.show()
<Query>
terms, genres=["Action", "Thriller"]

A bool query will be created:

>>> q = q.query('range', rank={"gte": 7})
>>> q.show()
<Query>
bool
└── must
    ├── range, field=rank, gte=7
    └── terms, genres=["Action", "Thriller"]

And reused if necessary:

>>> q = q.must_not('range', year={"lte": 1970})
>>> q.show()
<Query>
bool
├── must
│   ├── range, field=rank, gte=7
│   └── terms, genres=["Action", "Thriller"]
└── must_not
    └── range, field=year, lte=1970

Specifying an insertion location requires naming queries:

>>> from pandagg.query import Term
>>> q = q.nested(path='roles', _name='nested_roles', query=Term('roles.gender', value='F'))
>>> q.show()
<Query>
bool
├── must
│   ├── nested, _name=nested_roles, path="roles"
│   │   └── query
│   │       └── term, field=roles.gender, value="F"
│   ├── range, field=rank, gte=7
│   └── terms, genres=["Action", "Thriller"]
└── must_not
    └── range, field=year, lte=1970

Doing so allows inserting clauses above or below a given clause using the parent/child parameters:

>>> q = q.query('term', roles__role='Reporter', parent='nested_roles')
>>> q.show()
<Query>
bool
├── must
│   ├── nested, _name=nested_roles, path="roles"
│   │   └── query
│   │       └── bool
│   │           └── must
│   │               ├── term, field=roles.role, value="Reporter"
│   │               └── term, field=roles.gender, value="F"
│   ├── range, field=rank, gte=7
│   └── terms, genres=["Action", "Thriller"]
└── must_not
    └── range, field=year, lte=1970

TODO: explain parent_param, child_param, and mode (merging strategies on same named clause), etc.

Aggregation

The Aggs class provides:

  • multiple syntaxes to declare and update an aggregation
  • aggregation clause validation
  • ability to insert clauses at specific locations (and not just below the last manipulated clause)

Declaration

From native “dict” aggregation

Given the following aggregation:

>>> expected_aggs = {
>>>   "decade": {
>>>     "histogram": {"field": "year", "interval": 10},
>>>     "aggs": {
>>>       "genres": {
>>>         "terms": {"field": "genres", "size": 3},
>>>         "aggs": {
>>>           "max_nb_roles": {
>>>             "max": {"field": "nb_roles"}
>>>           },
>>>           "avg_rank": {
>>>             "avg": {"field": "rank"}
>>>           }
>>>         }
>>>       }
>>>     }
>>>   }
>>> }

To declare Aggs, simply pass the “dict” aggregation as argument:

>>> from pandagg.aggs import Aggs
>>> a = Aggs(expected_aggs)

A visual representation of the aggregation is available with show():

>>> a.show()
<Aggregations>
decade                                         <histogram, field="year", interval=10>
└── genres                                            <terms, field="genres", size=3>
    ├── max_nb_roles                                          <max, field="nb_roles">
    └── avg_rank                                                  <avg, field="rank">

Call to_dict() to convert it to a native dict:

>>> a.to_dict() == expected_aggs
True

With DSL classes

Pandagg provides a DSL to declare this aggregation in a very similar fashion:

>>> from pandagg.aggs import Histogram, Terms, Max, Avg
>>>
>>> a = Histogram("decade", field='year', interval=10, aggs=[
>>>     Terms("genres", field="genres", size=3, aggs=[
>>>         Max("max_nb_roles", field="nb_roles"),
>>>         Avg("avg_rank", field="rank")
>>>     ]),
>>> ])

All these classes inherit from Aggs and thus provide the same interface.

>>> from pandagg.aggs import Aggs
>>> isinstance(a, Aggs)
True

With flattened syntax

In the flattened syntax, the first argument is the aggregation name, the second argument is the aggregation type, the following keyword arguments define the aggregation body:

>>> from pandagg.aggs import Aggs
>>> a = Aggs('genres', 'terms', field='genres', size=3)
>>> a.to_dict()
{'genres': {'terms': {'field': 'genres', 'size': 3}}}

Aggregations enrichment

Aggregations can be enriched using two methods:

  • agg()
  • groupby()

Both methods return a new Aggs instance and leave the initial aggregation unchanged.

For instance:

>>> from pandagg.aggs import Aggs
>>> initial_a = Aggs()
>>> enriched_a = initial_a.agg('genres_agg', 'terms', field='genres')
>>> initial_a.to_dict()
None
>>> enriched_a.to_dict()
{'genres_agg': {'terms': {'field': 'genres'}}}
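
groupby() differs from agg() in that it inserts a new grouping level, below which subsequent clauses are nested. A minimal sketch (assuming the same flattened syntax as agg()):

>>> from pandagg.aggs import Aggs
>>> a = Aggs()\
>>>     .groupby('decade', 'histogram', field='year', interval=10)\
>>>     .agg('avg_rank', 'avg', field='rank')
>>> a.to_dict()
{'decade': {'histogram': {'field': 'year', 'interval': 10},
  'aggs': {'avg_rank': {'avg': {'field': 'rank'}}}}}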

Note

Calling to_dict() on an empty Aggregation returns None

>>> from pandagg.aggs import Aggs
>>> Aggs().to_dict()
None


Response

When executing a search request via the execute() method of Search, a Response instance is returned.

>>> from elasticsearch import Elasticsearch
>>> from pandagg.search import Search
>>>
>>> client = Elasticsearch(hosts=['localhost:9200'])
>>> response = Search(using=client, index='movies')\
>>>     .size(2)\
>>>     .filter('term', genres='Documentary')\
>>>     .agg('avg_rank', 'avg', field='rank')\
>>>     .execute()
>>> response
<Response> took 9ms, success: True, total result >=10000, contains 2 hits
>>> response.__class__
pandagg.response.Response

The Elasticsearch raw dict response is available under the data attribute:

>>> response.data
{
    'took': 9, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
    'hits': {'total': {'value': 10000, 'relation': 'gte'},
        'max_score': 0.0,
        'hits': [{'_index': 'movies', ...}]},
    'aggregations': {'avg_rank': {'value': 6.496829211219546}}
}

Hits

Hits are available under the hits attribute:

>>> response.hits
<Hits> total: >10000, contains 2 hits
>>> response.hits.total
{'value': 10000, 'relation': 'gte'}
>>> response.hits.hits
[<Hit 642> score=0.00, <Hit 643> score=0.00]

Those hits are instances of Hit.

Directly iterating over the Response will return those hits:

>>> list(response)
[<Hit 642> score=0.00, <Hit 643> score=0.00]
>>> hit = next(iter(response))

Each hit exposes the raw dict under its data attribute:

>>> hit.data
{'_index': 'movies',
 '_type': '_doc',
 '_id': '642',
 '_score': 0.0,
 '_source': {'movie_id': 642,
  'name': '10 Tage in Calcutta',
  'year': 1984,
  'genres': ['Documentary'],
  'roles': None,
  'nb_roles': 0,
  'directors': [{'director_id': 33096,
    'first_name': 'Reinhard',
    'last_name': 'Hauff',
    'full_name': 'Reinhard Hauff',
    'genres': ['Documentary', 'Drama', 'Musical', 'Short']}],
  'nb_directors': 1,
  'rank': None}}
>>> hit._index
'movies'
>>> hit._source
{'movie_id': 642,
 'name': '10 Tage in Calcutta',
 'year': 1984,
 'genres': ['Documentary'],
 'roles': None,
 'nb_roles': 0,
 'directors': [{'director_id': 33096,
   'first_name': 'Reinhard',
   'last_name': 'Hauff',
   'full_name': 'Reinhard Hauff',
   'genres': ['Documentary', 'Drama', 'Musical', 'Short']}],
 'nb_directors': 1,
 'rank': None}

If the pandas dependency is installed, hits can be parsed as a dataframe:

>>> response.hits.to_dataframe()
     _index  _score _type                                                                                                                                                        directors         genres  movie_id                       name  nb_directors  nb_roles  rank roles  year
_id
642  movies     0.0  _doc  [{'director_id': 33096, 'first_name': 'Reinhard', 'last_name': 'Hauff', 'full_name': 'Reinhard Hauff', 'genres': ['Documentary', 'Drama', 'Musical', 'Short']}]  [Documentary]       642        10 Tage in Calcutta             1         0  None  None  1984
643  movies     0.0  _doc                               [{'director_id': 32148, 'first_name': 'Tanja', 'last_name': 'Hamilton', 'full_name': 'Tanja Hamilton', 'genres': ['Documentary']}]  [Documentary]       643  10 Tage, ein ganzes Leben             1         0  None  None  2004

Aggregations

Aggregations are handled differently: the aggregations attribute of a Response returns an Aggregations instance, which provides specific parsing abilities in addition to exposing the raw aggregations response under its data attribute.

Let’s build a bit more complex aggregation query to showcase its functionalities:

>>> from elasticsearch import Elasticsearch
>>> from pandagg.search import Search
>>>
>>> client = Elasticsearch(hosts=['localhost:9200'])
>>> response = Search(using=client, index='movies')\
>>>     .size(0)\
>>>     .groupby('decade', 'histogram', interval=10, field='year')\
>>>     .groupby('genres', size=3)\
>>>     .agg('avg_rank', 'avg', field='rank')\
>>>     .agg('avg_nb_roles', 'avg', field='nb_roles')\
>>>     .filter('range', year={"gte": 1990})\
>>>     .execute()

Note

For more details on how to build aggregation queries, consult the Aggregation section.

Using the data attribute:

>>> response.aggregations.data
{'decade': {'buckets': [{'key': 1990.0,
'doc_count': 79495,
'genres': {'doc_count_error_upper_bound': 0,
 'sum_other_doc_count': 38060,
 'buckets': [{'key': 'Drama',
   'doc_count': 12232,
   'avg_nb_roles': {'value': 18.518067364290385},
   'avg_rank': {'value': 5.981429367965072}},
  {'key': 'Short',
...

Tree serialization

Using to_normalized():

>>> response.aggregations.to_normalized()
{'level': 'root',
 'key': None,
 'value': None,
 'children': [{'level': 'decade',
   'key': 1990.0,
   'value': 79495,
   'children': [{'level': 'genres',
     'key': 'Drama',
     'value': 12232,
     'children': [{'level': 'avg_rank',
       'key': None,
       'value': 5.981429367965072},
      {'level': 'avg_nb_roles', 'key': None, 'value': 18.518067364290385}]},
    {'level': 'genres',
     'key': 'Short',
     'value': 12197,
     'children': [{'level': 'avg_rank',
       'key': None,
       'value': 6.311325829450123},
    ...

Using to_interactive_tree():

>>> response.aggregations.to_interactive_tree()
<IResponse>
root
├── decade=1990                                        79495
│   ├── genres=Documentary                              8393
│   │   ├── avg_nb_roles                  3.7789824854045038
│   │   └── avg_rank                       6.517093241977517
│   ├── genres=Drama                                   12232
│   │   ├── avg_nb_roles                  18.518067364290385
│   │   └── avg_rank                       5.981429367965072
│   └── genres=Short                                   12197
│       ├── avg_nb_roles                   3.023284414200213
│       └── avg_rank                       6.311325829450123
└── decade=2000                                        57649
    ├── genres=Documentary                              8639
    │   ├── avg_nb_roles                   5.581433036231045
    │   └── avg_rank                       6.980897812811443
    ├── genres=Drama                                   11500
    │   ├── avg_nb_roles                  14.385391304347825
    │   └── avg_rank                       6.269675415719865
    └── genres=Short                                   13451
        ├── avg_nb_roles                   4.053081555274701
        └── avg_rank                        6.83625304327684

Tabular serialization

Doing so requires identifying a level that will draw the line between:

  • grouping levels: those used to identify rows (here decade and genres), providing the doc_count per row
  • column levels: those used to populate columns and cells (here avg_nb_roles and avg_rank)

The tabular format is especially well suited to aggregations with a T shape.

Using to_dataframe():

>>> response.aggregations.to_dataframe()
                        avg_nb_roles  avg_rank  doc_count
decade genres
1990.0 Drama           18.518067  5.981429      12232
       Short            3.023284  6.311326      12197
       Documentary      3.778982  6.517093       8393
2000.0 Short            4.053082  6.836253      13451
       Drama           14.385391  6.269675      11500
       Documentary      5.581433  6.980898       8639

Using to_tabular():

>>> response.aggregations.to_tabular()
(['decade', 'genres'],
 {(1990.0, 'Drama'): {'doc_count': 12232,
   'avg_rank': 5.981429367965072,
   'avg_nb_roles': 18.518067364290385},
  (1990.0, 'Short'): {'doc_count': 12197,
   'avg_rank': 6.311325829450123,
   'avg_nb_roles': 3.023284414200213},
  (1990.0, 'Documentary'): {'doc_count': 8393,
   'avg_rank': 6.517093241977517,
   'avg_nb_roles': 3.7789824854045038},
  (2000.0, 'Short'): {'doc_count': 13451,
   'avg_rank': 6.83625304327684,
   'avg_nb_roles': 4.053081555274701},
  (2000.0, 'Drama'): {'doc_count': 11500,
   'avg_rank': 6.269675415719865,
   'avg_nb_roles': 14.385391304347825},
  (2000.0, 'Documentary'): {'doc_count': 8639,
   'avg_rank': 6.980897812811443,
   'avg_nb_roles': 5.581433036231045}})

Note

TODO - explain parameters:

  • index_orient
  • grouped_by
  • expand_columns
  • expand_sep
  • normalize
  • with_single_bucket_groups

Interactive features

Features described in this module are primarily designed for interactive usage, for instance in an ipython shell (https://ipython.org/), since one of the key features is the intuitive usage provided by auto-completion.

Cluster indices discovery

The discover() function lists all indices on a cluster matching a provided pattern:

>>> from elasticsearch import Elasticsearch
>>> from pandagg.discovery import discover
>>> client = Elasticsearch(hosts=['xxx'])
>>> indices = discover(client, index='mov*')
>>> indices
<Indices> ['movies', 'movies_fake']

Each of the indices is accessible via autocompletion:

>>> indices.movies
 <Index 'movies'>

An Index exposes: settings, mapping (interactive), aliases and name:

>>> movies = indices.movies
>>> movies.settings
{'index': {'creation_date': '1591824202943',
  'number_of_shards': '1',
  'number_of_replicas': '1',
  'uuid': 'v6Amj9x1Sk-trBShI-188A',
  'version': {'created': '7070199'},
  'provided_name': 'movies'}}
>>> movies.mapping
<Mapping>
_
├── directors                                                [Nested]
│   ├── director_id                                           Keyword
│   ├── first_name                                            Text
│   │   └── raw                                             ~ Keyword
│   ├── full_name                                             Text
│   │   └── raw                                             ~ Keyword
│   ├── genres                                                Keyword
│   └── last_name                                             Text
│       └── raw                                             ~ Keyword
├── genres                                                    Keyword
├── movie_id                                                  Keyword
├── name                                                      Text
│   └── raw                                                 ~ Keyword
├── nb_directors                                              Integer
├── nb_roles                                                  Integer
├── rank                                                      Float
├── roles                                                    [Nested]
│   ├── actor_id                                              Keyword
│   ├── first_name                                            Text
│   │   └── raw                                             ~ Keyword
│   ├── full_name                                             Text
│   │   └── raw                                             ~ Keyword
│   ├── gender                                                Keyword
│   ├── last_name                                             Text
│   │   └── raw                                             ~ Keyword
│   └── role                                                  Keyword
└── year                                                      Integer

Note

Examples are based on the IMDB dataset.

The Search class is intended to perform requests (see Search):

>>> from elasticsearch import Elasticsearch
>>> from pandagg.search import Search
>>>
>>> client = Elasticsearch(hosts=['localhost:9200'])
>>> search = Search(using=client, index='movies')\
>>>     .size(2)\
>>>     .groupby('decade', 'histogram', interval=10, field='year')\
>>>     .groupby('genres', size=3)\
>>>     .agg('avg_rank', 'avg', field='rank')\
>>>     .agg('avg_nb_roles', 'avg', field='nb_roles')\
>>>     .filter('range', year={"gte": 1990})
>>> search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "year": {
              "gte": 1990
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "decade": {
      "histogram": {
        "field": "year",
        "interval": 10
      },
      "aggs": {
        "genres": {
          "terms": {
        ...
        ..truncated..
        ...
      }
    }
  },
  "size": 2
}

It relies on:

  • Query to build queries (see Query),

  • Aggs to build aggregations (see Aggregation)

    >>> search._query.show()
    <Query>
    bool
    └── filter
        └── range, field=year, gte=1990
    
    >>> search._aggs.show()
    <Aggregations>
    decade                                         <histogram, field="year", interval=10>
    └── genres                                            <terms, field="genres", size=3>
        ├── avg_nb_roles                                          <avg, field="nb_roles">
        └── avg_rank                                                  <avg, field="rank">
    

Executing a Search request using execute() will return a Response instance (see Response).

>>> response = search.execute()
>>> response
<Response> took 58ms, success: True, total result >=10000, contains 2 hits
>>> response.hits.hits
[<Hit 640> score=0.00, <Hit 641> score=0.00]
>>> response.aggregations.to_dataframe()
                        avg_nb_roles  avg_rank  doc_count
decade genres
1990.0 Drama           18.518067  5.981429      12232
       Short            3.023284  6.311326      12197
       Documentary      3.778982  6.517093       8393
2000.0 Short            4.053082  6.836253      13451
       Drama           14.385391  6.269675      11500
       Documentary      5.581433  6.980898       8639

On top of that some interactive features are available (see Interactive features).

IMDB dataset

You might know the Internet Movie Database, commonly called IMDB.

We’ll use it as a simple example to showcase some of Elasticsearch’s capabilities.

In this case, relational databases (SQL) are a good fit to store this kind of data consistently. Yet indexing some of this data in an optimized search engine will allow more powerful queries.

Query requirements

In this example, we’ll suppose most usage/query requirements will be centered on the concept of a movie (rather than on fetching actors or directors, even though that will still be possible with this data structure).

The index should provide good performance when trying to answer these kinds of questions (non-exhaustive):

  • in which movies did this actor play?
  • which movie genres were most popular across decades?
  • which actors have played in the best-rated, or worst-rated, movies?
  • which actors do movie directors prefer to cast in their movies?
  • which are the best-ranked movies of the last decade in the Action or Documentary genres?

Data source

I exported the following SQL tables from MariaDB following these instructions.

Relational schema is the following:

[figure: imdb tables relational schema (_images/imdb_ijs.svg)]

Index mappings

Overview

The base unit (document) will be a movie, having a name, rank (ratings), year of release, a list of actors and a list of directors.

Schematically:

Movie:
 - name
 - year
 - rank
 - [] genres
 - [] directors
 - [] actor roles
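
For instance, the serialized movie document shown in the Response section looks like this:

{
    "movie_id": 642,
    "name": "10 Tage in Calcutta",
    "year": 1984,
    "rank": None,
    "genres": ["Documentary"],
    "directors": [{"director_id": 33096,
                   "first_name": "Reinhard",
                   "last_name": "Hauff",
                   "full_name": "Reinhard Hauff",
                   "genres": ["Documentary", "Drama", "Musical", "Short"]}],
    "nb_directors": 1,
    "roles": None,
    "nb_roles": 0
}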

Which fields require nesting?

Since genres contains a single keyword field, we never need it to be stored as a nested field. On the contrary, actor roles and directors require a nested field if we consider applying multiple simultaneous query clauses on their sub-fields (for instance, searching for movies in which an actor is a woman AND whose role is nurse). More information on the distinction between array and nested fields here.

Text or keyword fields?

Some fields are easy to choose: in no situation will gender require full-text search, thus we’ll store it as a keyword. On the other hand, actor and director names (first and last) will require full-text search, so we’ll opt for a text field. Yet we might want to aggregate on exact keywords, for instance to count the number of movies per actor. More information on the distinction between text and keyword fields here.
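
As a sketch of how these choices translate into pandagg's mappings DSL (assuming field classes are importable from pandagg.mappings, and that multi-fields are declared through the fields argument):

>>> from pandagg.mappings import Mappings, Keyword, Text, Float, Integer, Nested
>>> mappings = Mappings(properties={
>>>     'genres': Keyword(),                       # exact values only, no nesting needed
>>>     'name': Text(fields={'raw': Keyword()}),   # full-text search + exact 'raw' sub-field
>>>     'rank': Float(),
>>>     'year': Integer(),
>>>     'roles': Nested(properties={               # nested: combined clauses on sub-fields
>>>         'full_name': Text(fields={'raw': Keyword()}),
>>>         'gender': Keyword(),
>>>         'role': Keyword(),
>>>     }),
>>> })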

Mappings

<Mappings>
_
├── directors                                                [Nested]
│   ├── director_id                                           Keyword
│   ├── first_name                                            Text
│   │   └── raw                                             ~ Keyword
│   ├── full_name                                             Text
│   │   └── raw                                             ~ Keyword
│   ├── genres                                                Keyword
│   └── last_name                                             Text
│       └── raw                                             ~ Keyword
├── genres                                                    Keyword
├── movie_id                                                  Keyword
├── name                                                      Text
│   └── raw                                                 ~ Keyword
├── nb_directors                                              Integer
├── nb_roles                                                  Integer
├── rank                                                      Float
├── roles                                                    [Nested]
│   ├── actor_id                                              Keyword
│   ├── first_name                                            Text
│   │   └── raw                                             ~ Keyword
│   ├── full_name                                             Text
│   │   └── raw                                             ~ Keyword
│   ├── gender                                                Keyword
│   ├── last_name                                             Text
│   │   └── raw                                             ~ Keyword
│   └── role                                                  Keyword
└── year                                                      Integer

Steps to start playing with your index

You can either directly use the demo index available here, with credentials user: pandagg, password: pandagg.

Access it with the following client instantiation:

from elasticsearch import Elasticsearch
client = Elasticsearch(
    hosts=['https://beba020ee88d49488d8f30c163472151.eu-west-2.aws.cloud.es.io:9243/'],
    http_auth=('pandagg', 'pandagg')
)
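
You can then check connectivity with a standard elasticsearch-py call:

# returns True if the cluster is reachable
print(client.ping())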

Or follow the steps below to install it yourself locally. In this case, you can either generate the files yourself, or download them from here (file md5 b363dee23720052501e24d15361ed605).

Dump tables

Follow the instructions at the bottom of the https://relational.fit.cvut.cz/dataset/IMDb page and dump the following tables in a directory:

  • movies.csv
  • movies_genres.csv
  • movies_directors.csv
  • directors.csv
  • directors_genres.csv
  • roles.csv
  • actors.csv

Clone pandagg and setup environment

git clone git@github.com:alkemics/pandagg.git
cd pandagg

virtualenv env
source env/bin/activate
python setup.py develop
pip install pandas simplejson jupyter seaborn

Then copy the conf.py.dist file to conf.py and edit the variables as suits you, for instance:

# your cluster address
ES_HOST = 'localhost:9200'

# where your table dumps are stored, and where serialized output will be written
DATA_DIR = '/path/to/dumps/'
OUTPUT_FILE_NAME = 'serialized.json'

Serialize movie documents and insert them

# generate serialized movies documents, ready to be inserted in ES
# can take a while
python examples/imdb/serialize.py

# create index with mappings if necessary, bulk insert documents in ES
python examples/imdb/load.py

Explore pandagg notebooks

An example notebook is available to showcase some of pandagg’s functionalities: here it is.

The code is available in the examples/imdb/IMDB exploration.py file.

pandagg package

Subpackages

pandagg.interactive package

Submodules
pandagg.interactive.mappings module
class pandagg.interactive.mappings.IMappings(mappings: pandagg.tree.mappings.Mappings, client: Optional[elasticsearch.client.Elasticsearch] = None, index: Optional[List[str]] = None, depth: int = 1, root_path: Optional[str] = None, initial_tree: Optional[pandagg.tree.mappings.Mappings] = None)[source]

Bases: pandagg.utils.DSLMixin, lighttree.interactive.TreeBasedObj

Interactive wrapper around a mappings tree, allowing field navigation and quick access to single-clause aggregation computation.

pandagg.interactive.response module
Module contents

pandagg.node package

Subpackages
pandagg.node.aggs package
Submodules
pandagg.node.aggs.abstract module
pandagg.node.aggs.abstract.A(name: str, type_or_agg: Union[str, Dict[str, Dict[str, Any]], pandagg.node.aggs.abstract.AggClause, None] = None, **body) → pandagg.node.aggs.abstract.AggClause[source]

Accepts multiple syntaxes, returns an AggClause instance.

Parameters:
  • name – aggregation clause name
  • type_or_agg
  • body
Returns:

AggClause
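
A minimal sketch of the flattened syntax (the returned clause serializes without its name, as described under AggClause.to_dict() below):

>>> from pandagg.node.aggs.abstract import A
>>> a = A('genres', 'terms', field='genres')
>>> a.to_dict()
{'terms': {'field': 'genres'}}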

class pandagg.node.aggs.abstract.AggClause(meta: Optional[Dict[str, Any]] = None, identifier: Optional[str] = None, **body)[source]

Bases: pandagg.node._node.Node

Wrapper around elasticsearch aggregation concept. https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-aggregations.html

Each aggregation can be seen as a Node that can be encapsulated in a parent agg.

Define a method to build aggregation request.

classmethod extract_bucket_value(response: Union[pandagg.types.BucketsWrapperDict, Dict[str, Any]], value_as_dict: bool = False) → Any[source]
extract_buckets(response_value: Union[pandagg.types.BucketsWrapperDict, Dict[str, Any]]) → Iterator[Tuple[Union[None, str, float, Dict[str, Union[str, float, None]]], Dict[str, Any]]][source]
is_convertible_to_composite_source() → bool[source]
line_repr(depth: int, **kwargs) → Tuple[str, str][source]

Control how node is displayed in tree representation. First returned string is how node is represented on left, second string is how node is represented on right.

MyTree
├── one             OneEnd
│   └── two         twoEnd
└── three           threeEnd

to_dict() → Dict[str, Dict[str, Any]][source]

ElasticSearch aggregation queries follow this formatting:

{
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        }
        [,"meta" : {  [<meta_data_body>] } ]?
    }
}

to_dict() returns the following part (without aggregation name):

{
    "<aggregation_type>" : {
        <aggregation_body>
    }
    [,"meta" : {  [<meta_data_body>] } ]?
}
classmethod valid_on_field_type(field_type: str) → bool[source]
class pandagg.node.aggs.abstract.BucketAggClause(**body)[source]

Bases: pandagg.node.aggs.abstract.AggClause

Bucket aggregations have special abilities: they can encapsulate other aggregations as children. Each time, the extracted value is a ‘doc_count’.

Provides methods:

  • to build aggregation request (with children aggregations)
  • to extract buckets from raw response
  • to build query to filter documents belonging to that bucket

Note: the aggs attribute’s only purpose is for children initiation, with the following syntax:

>>> from pandagg.aggs import Terms, Avg
>>> agg = Terms(
>>>     field='some_path',
>>>     aggs={
>>>         'avg_agg': Avg(field='some_other_path')
>>>     }
>>> )

extract_buckets(response_value: Union[pandagg.types.BucketsWrapperDict, Dict[str, Any]]) → Iterator[Tuple[Union[None, str, float, Dict[str, Union[str, float, None]]], Dict[str, Any]]][source]
class pandagg.node.aggs.abstract.FieldOrScriptMetricAgg(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.MetricAgg

Metric aggregation based on single field.

class pandagg.node.aggs.abstract.MetricAgg(meta: Optional[Dict[str, Any]] = None, identifier: Optional[str] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.AggClause

Metric aggregations provide a single bucket, with value attributes to be extracted.

extract_buckets(response_value: Union[pandagg.types.BucketsWrapperDict, Dict[str, Any]]) → Iterator[Tuple[Union[None, str, float, Dict[str, Union[str, float, None]]], Dict[str, Any]]][source]
class pandagg.node.aggs.abstract.MultipleBucketAgg(keyed: bool = False, key_as_string: bool = False, **body)[source]

Bases: pandagg.node.aggs.abstract.BucketAggClause

IMPLICIT_KEYED = False
extract_buckets(response_value: Union[pandagg.types.BucketsWrapperDict, Dict[str, Any]]) → Iterator[Tuple[Union[None, str, float, Dict[str, Union[str, float, None]]], Dict[str, Any]]][source]
class pandagg.node.aggs.abstract.Pipeline(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

class pandagg.node.aggs.abstract.Root(meta: Optional[Dict[str, Any]] = None, identifier: Optional[str] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.AggClause

Not a real aggregation. Just the initial empty dict (used as lighttree.Tree root).

KEY = '_root'
classmethod extract_bucket_value(response: Union[pandagg.types.BucketsWrapperDict, Dict[str, Any]], value_as_dict: bool = False) → Any[source]
extract_buckets(response_value: Union[pandagg.types.BucketsWrapperDict, Dict[str, Any]]) → Iterator[Tuple[Union[None, str, float, Dict[str, Union[str, float, None]]], Dict[str, Any]]][source]
line_repr(depth: int, **kwargs) → Tuple[str, str][source]

Control how node is displayed in tree representation. First returned string is how node is represented on left, second string is how node is represented on right.

MyTree
├── one             OneEnd
│   └── two         twoEnd
└── three           threeEnd

class pandagg.node.aggs.abstract.ScriptPipeline(script: pandagg.types.Script, buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

VALUE_ATTRS = ['value']
class pandagg.node.aggs.abstract.UniqueBucketAgg(**body)[source]

Bases: pandagg.node.aggs.abstract.BucketAggClause

Aggregations providing a single bucket.

extract_buckets(response_value: Union[pandagg.types.BucketsWrapperDict, Dict[str, Any]]) → Iterator[Tuple[Union[None, str, float, Dict[str, Union[str, float, None]]], Dict[str, Any]]][source]
pandagg.node.aggs.bucket module
class pandagg.node.aggs.bucket.AdjacencyMatrix(filters: Dict[str, Dict[str, Dict[str, Any]]], separator: Optional[str] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'adjacency_matrix'
VALUE_ATTRS = ['doc_count']
class pandagg.node.aggs.bucket.AutoDateHistogram(field: str, buckets: Optional[int] = None, format: Optional[str] = None, time_zone: Optional[str] = None, minimum_interval: Optional[str] = None, missing: Optional[str] = None, key_as_string: bool = True, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'auto_date_histogram'
VALUE_ATTRS = ['doc_count']
class pandagg.node.aggs.bucket.Children(type: str, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'children'
VALUE_ATTRS = ['doc_count']
class pandagg.node.aggs.bucket.DateHistogram(field: str, interval: str = None, calendar_interval: str = None, fixed_interval: str = None, key_as_string: bool = True, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'date_histogram'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['date']
is_convertible_to_composite_source() → bool[source]
class pandagg.node.aggs.bucket.DateRange(field: str, ranges: List[pandagg.types.RangeDict], keyed: bool = False, **body)[source]

Bases: pandagg.node.aggs.bucket.Range

KEY = 'date_range'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['date']
class pandagg.node.aggs.bucket.DiversifiedSampler(field: str, shard_size: Optional[int], max_docs_per_value: Optional[int] = None, execution_hint: Optional[typing_extensions.Literal['map', 'global_ordinals', 'bytes_hash']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'diversified_sampler'
VALUE_ATTRS = ['doc_count']
class pandagg.node.aggs.bucket.Filter(filter: Optional[Dict[str, Dict[str, Any]]] = None, meta: Optional[Dict[str, Any]] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'filter'
VALUE_ATTRS = ['doc_count']
class pandagg.node.aggs.bucket.Filters(filters: Dict[str, Dict[str, Dict[str, Any]]], other_bucket: bool = False, other_bucket_key: Optional[str] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

DEFAULT_OTHER_KEY = '_other_'
IMPLICIT_KEYED = True
KEY = 'filters'
VALUE_ATTRS = ['doc_count']
class pandagg.node.aggs.bucket.GeoDistance(field: str, origin: str, ranges: List[pandagg.types.RangeDict], unit: Optional[str] = None, distance_type: Optional[typing_extensions.Literal['arc', 'plane']] = None, keyed: bool = False, **body)[source]

Bases: pandagg.node.aggs.bucket.Range

KEY = 'geo_distance'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['geo_point']
class pandagg.node.aggs.bucket.GeoHashGrid(field: str, precision: Optional[int] = None, bounds: Optional[Dict] = None, size: Optional[int] = None, shard_size: Optional[int] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'geohash_grid'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['geo_point', 'geo_shape']
class pandagg.node.aggs.bucket.GeoTileGrid(field: str, precision: Optional[int] = None, bounds: Optional[Dict] = None, size: Optional[int] = None, shard_size: Optional[int] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'geotile_grid'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['geo_point', 'geo_shape']
class pandagg.node.aggs.bucket.Global(**body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'global'
VALUE_ATTRS = ['doc_count']
class pandagg.node.aggs.bucket.Histogram(field: str, interval: int, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'histogram'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
is_convertible_to_composite_source() → bool[source]
class pandagg.node.aggs.bucket.IPRange(field: str, ranges: List[pandagg.types.RangeDict], keyed: bool = False, **body)[source]

Bases: pandagg.node.aggs.bucket.Range

KEY = 'ip_range'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['ip']
class pandagg.node.aggs.bucket.MatchAll(**body)[source]

Bases: pandagg.node.aggs.bucket.Filter

class pandagg.node.aggs.bucket.Missing(field: str, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'missing'
VALUE_ATTRS = ['doc_count']
class pandagg.node.aggs.bucket.MultiTerms(terms: List[Dict], **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'multi_terms'
VALUE_ATTRS = ['doc_count', 'doc_count_error_upper_bound', 'sum_other_doc_count']
class pandagg.node.aggs.bucket.Nested(path: str, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'nested'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['nested']
class pandagg.node.aggs.bucket.Parent(type: str, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'parent'
VALUE_ATTRS = ['doc_count']
class pandagg.node.aggs.bucket.Range(field: str, ranges: List[pandagg.types.RangeDict], keyed: bool = False, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'range'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.node.aggs.bucket.RareTerms(field: str, max_doc_count: Optional[int] = None, precision: Optional[float] = None, include: Union[str, List[str], None] = None, exclude: Union[str, List[str], None] = None, missing: Optional[Any] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'rare_terms'
VALUE_ATTRS = ['doc_count']
class pandagg.node.aggs.bucket.ReverseNested(path: Optional[str] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'reverse_nested'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['nested']
class pandagg.node.aggs.bucket.Sampler(shard_size: Optional[int] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'sampler'
VALUE_ATTRS = ['doc_count']
class pandagg.node.aggs.bucket.SignificantTerms(field: str, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'significant_terms'
VALUE_ATTRS = ['doc_count', 'score', 'bg_count']
class pandagg.node.aggs.bucket.SignificantText(field: str, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'significant_text'
VALUE_ATTRS = ['doc_count', 'score', 'bg_count']
WHITELISTED_MAPPING_TYPES = ['text']
class pandagg.node.aggs.bucket.Terms(field: str, missing: Union[str, int, None] = None, size: Optional[int] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

Terms aggregation.

KEY = 'terms'
VALUE_ATTRS = ['doc_count', 'doc_count_error_upper_bound', 'sum_other_doc_count']
is_convertible_to_composite_source() → bool[source]
class pandagg.node.aggs.bucket.VariableWidthHistogram(field: str, buckets: int, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'variable_width_histogram'
VALUE_ATTRS = ['doc_count', 'min', 'max']
pandagg.node.aggs.composite module
class pandagg.node.aggs.composite.Composite(sources: List[Dict[str, Dict[str, Dict[str, Any]]]], size: Optional[int] = None, after: Optional[Dict[str, Any]] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.BucketAggClause

KEY = 'composite'
VALUE_ATTRS = ['doc_count']
after
extract_buckets(response_value: Union[pandagg.types.BucketsWrapperDict, Dict[str, Any]]) → Iterator[Tuple[Dict[str, Union[str, float, None]], Dict[str, Any]]][source]
size
source_names
sources
pandagg.node.aggs.metric module
class pandagg.node.aggs.metric.Avg(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'avg'
VALUE_ATTRS = ['value']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.node.aggs.metric.Cardinality(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'cardinality'
VALUE_ATTRS = ['value']
class pandagg.node.aggs.metric.ExtendedStats(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'extended_stats'
VALUE_ATTRS = ['count', 'min', 'max', 'avg', 'sum', 'sum_of_squares', 'variance', 'std_deviation', 'std_deviation_bounds']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.node.aggs.metric.GeoBound(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'geo_bounds'
VALUE_ATTRS = ['bounds']
WHITELISTED_MAPPING_TYPES = ['geo_point']
class pandagg.node.aggs.metric.GeoCentroid(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'geo_centroid'
VALUE_ATTRS = ['location']
WHITELISTED_MAPPING_TYPES = ['geo_point']
class pandagg.node.aggs.metric.Max(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'max'
VALUE_ATTRS = ['value']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.node.aggs.metric.Min(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'min'
VALUE_ATTRS = ['value']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.node.aggs.metric.PercentileRanks(field: str, values: List[float], **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'percentile_ranks'
VALUE_ATTRS = ['values']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.node.aggs.metric.Percentiles(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

Percents body argument can be passed to specify which percentiles to fetch.

KEY = 'percentiles'
VALUE_ATTRS = ['values']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.node.aggs.metric.Stats(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'stats'
VALUE_ATTRS = ['count', 'min', 'max', 'avg', 'sum']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.node.aggs.metric.Sum(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'sum'
VALUE_ATTRS = ['value']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.node.aggs.metric.TopHits(meta: Optional[Dict[str, Any]] = None, identifier: Optional[str] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.MetricAgg

KEY = 'top_hits'
VALUE_ATTRS = ['hits']
class pandagg.node.aggs.metric.ValueCount(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'value_count'
VALUE_ATTRS = ['value']
pandagg.node.aggs.pipeline module

Pipeline aggregations: https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-aggregations-pipeline.html

class pandagg.node.aggs.pipeline.AvgBucket(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'avg_bucket'
VALUE_ATTRS = ['value']
class pandagg.node.aggs.pipeline.BucketScript(script: pandagg.types.Script, buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.ScriptPipeline

KEY = 'bucket_script'
VALUE_ATTRS = ['value']
class pandagg.node.aggs.pipeline.BucketSelector(script: pandagg.types.Script, buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.ScriptPipeline

KEY = 'bucket_selector'
VALUE_ATTRS = []
class pandagg.node.aggs.pipeline.BucketSort(script: pandagg.types.Script, buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.ScriptPipeline

KEY = 'bucket_sort'
VALUE_ATTRS = []
class pandagg.node.aggs.pipeline.CumulativeSum(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'cumulative_sum'
VALUE_ATTRS = ['value']
class pandagg.node.aggs.pipeline.Derivative(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'derivative'
VALUE_ATTRS = ['value']
class pandagg.node.aggs.pipeline.ExtendedStatsBucket(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'extended_stats_bucket'
VALUE_ATTRS = ['count', 'min', 'max', 'avg', 'sum', 'sum_of_squares', 'variance', 'std_deviation', 'std_deviation_bounds']
class pandagg.node.aggs.pipeline.MaxBucket(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'max_bucket'
VALUE_ATTRS = ['value']
class pandagg.node.aggs.pipeline.MinBucket(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'min_bucket'
VALUE_ATTRS = ['value']
class pandagg.node.aggs.pipeline.MovingAvg(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'moving_avg'
VALUE_ATTRS = ['value']
class pandagg.node.aggs.pipeline.PercentilesBucket(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'percentiles_bucket'
VALUE_ATTRS = ['values']
class pandagg.node.aggs.pipeline.SerialDiff(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'serial_diff'
VALUE_ATTRS = ['value']
class pandagg.node.aggs.pipeline.StatsBucket(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'stats_bucket'
VALUE_ATTRS = ['count', 'min', 'max', 'avg', 'sum']
class pandagg.node.aggs.pipeline.SumBucket(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'sum_bucket'
VALUE_ATTRS = ['value']
Module contents
pandagg.node.mappings package
Submodules
pandagg.node.mappings.abstract module
class pandagg.node.mappings.abstract.ComplexField(properties: Optional[Union[Dict, Type[DocumentSource]]] = None, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

is_valid_value(v: Any) → bool[source]
class pandagg.node.mappings.abstract.Field(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node._node.Node

is_valid_value(v: Any) → bool[source]
line_repr(depth: int, **kwargs) → Tuple[str, str][source]

Control how node is displayed in tree representation. First returned string is how node is represented on left, second string is how node is represented on right.

MyTree
├── one             OneEnd
│   └── two         twoEnd
└── three           threeEnd

to_dict() → Dict[str, Any][source]
class pandagg.node.mappings.abstract.RegularField(**body)[source]

Bases: pandagg.node.mappings.abstract.Field

is_valid_value(v: Any) → bool[source]
class pandagg.node.mappings.abstract.Root(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

KEY = ''
line_repr(depth: int, **kwargs) → Tuple[str, str][source]

Control how node is displayed in tree representation. First returned string is how node is represented on left, second string is how node is represented on right.

MyTree
├── one             OneEnd
│   └── two         twoEnd
└── three           threeEnd

pandagg.node.mappings.field_datatypes module

https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html

class pandagg.node.mappings.field_datatypes.Alias(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

Defines an alias to an existing field.

KEY = 'alias'
class pandagg.node.mappings.field_datatypes.Binary(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'binary'
class pandagg.node.mappings.field_datatypes.Boolean(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'boolean'
class pandagg.node.mappings.field_datatypes.Byte(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'byte'
class pandagg.node.mappings.field_datatypes.Completion(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

To provide auto-complete suggestions

KEY = 'completion'
class pandagg.node.mappings.field_datatypes.ConstantKeyword(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'constant_keyword'
class pandagg.node.mappings.field_datatypes.Date(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'date'
class pandagg.node.mappings.field_datatypes.DateNanos(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'date_nanos'
class pandagg.node.mappings.field_datatypes.DateRange(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'date_range'
class pandagg.node.mappings.field_datatypes.DenseVector(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

Record dense vectors of float values.

KEY = 'dense_vector'
class pandagg.node.mappings.field_datatypes.Double(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'double'
class pandagg.node.mappings.field_datatypes.DoubleRange(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'double_range'
class pandagg.node.mappings.field_datatypes.Flattened(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

Allows an entire JSON object to be indexed as a single field.

KEY = 'flattened'
class pandagg.node.mappings.field_datatypes.Float(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'float'
class pandagg.node.mappings.field_datatypes.FloatRange(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'float_range'
class pandagg.node.mappings.field_datatypes.GeoPoint(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

For lat/lon points

KEY = 'geo_point'
class pandagg.node.mappings.field_datatypes.GeoShape(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

For complex shapes like polygons

KEY = 'geo_shape'
class pandagg.node.mappings.field_datatypes.HalfFloat(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'half_float'
class pandagg.node.mappings.field_datatypes.Histogram(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

For pre-aggregated numerical values for percentiles aggregations.

KEY = 'histogram'
class pandagg.node.mappings.field_datatypes.IP(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

For IPv4 and IPv6 addresses.

KEY = 'ip'
class pandagg.node.mappings.field_datatypes.Integer(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'integer'
class pandagg.node.mappings.field_datatypes.IntegerRange(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'integer_range'
class pandagg.node.mappings.field_datatypes.IpRange(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'ip_range'
class pandagg.node.mappings.field_datatypes.Join(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

Defines parent/child relation for documents within the same index

KEY = 'join'
class pandagg.node.mappings.field_datatypes.Keyword(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'keyword'
class pandagg.node.mappings.field_datatypes.Long(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'long'
class pandagg.node.mappings.field_datatypes.LongRange(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'long_range'
class pandagg.node.mappings.field_datatypes.MapperAnnotatedText(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

To index text containing special markup (typically used for identifying named entities)

KEY = 'annotated-text'
class pandagg.node.mappings.field_datatypes.MapperMurMur3(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

To compute hashes of values at index-time and store them in the index

KEY = 'murmur3'
class pandagg.node.mappings.field_datatypes.Nested(properties: Optional[Union[Dict, Type[DocumentSource]]] = None, **body)[source]

Bases: pandagg.node.mappings.abstract.ComplexField

KEY = 'nested'
class pandagg.node.mappings.field_datatypes.Object(properties: Optional[Union[Dict, Type[DocumentSource]]] = None, **body)[source]

Bases: pandagg.node.mappings.abstract.ComplexField

KEY = 'object'
class pandagg.node.mappings.field_datatypes.Percolator(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

Accepts queries from the query-dsl

KEY = 'percolator'
class pandagg.node.mappings.field_datatypes.RankFeature(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

Record numeric feature to boost hits at query time.

KEY = 'rank_feature'
class pandagg.node.mappings.field_datatypes.RankFeatures(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

Record numeric features to boost hits at query time.

KEY = 'rank_features'
class pandagg.node.mappings.field_datatypes.ScaledFloat(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'scaled_float'
class pandagg.node.mappings.field_datatypes.SearchAsYouType(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

A text-like field optimized for queries to implement as-you-type completion

KEY = 'search_as_you_type'
class pandagg.node.mappings.field_datatypes.Shape(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

For arbitrary cartesian geometries.

KEY = 'shape'
class pandagg.node.mappings.field_datatypes.Short(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'short'
class pandagg.node.mappings.field_datatypes.SparseVector(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

Record sparse vectors of float values.

KEY = 'sparse_vector'
class pandagg.node.mappings.field_datatypes.Text(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'text'
class pandagg.node.mappings.field_datatypes.TokenCount(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

To count the number of tokens in a string

KEY = 'token_count'
class pandagg.node.mappings.field_datatypes.WildCard(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'wildcard'
pandagg.node.mappings.meta_fields module
class pandagg.node.mappings.meta_fields.FieldNames(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

All fields in the document which contain non-null values.

KEY = '_field_names'
class pandagg.node.mappings.meta_fields.Id(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

The document’s ID.

KEY = '_id'
class pandagg.node.mappings.meta_fields.Ignored(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

All fields in the document that have been ignored at index time because of ignore_malformed.

KEY = '_ignored'
class pandagg.node.mappings.meta_fields.Index(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

The index to which the document belongs.

KEY = '_index'
class pandagg.node.mappings.meta_fields.Meta(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

Application specific metadata.

KEY = '_meta'
class pandagg.node.mappings.meta_fields.Routing(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

A custom routing value which routes a document to a particular shard.

KEY = '_routing'
class pandagg.node.mappings.meta_fields.Size(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

The size of the _source field in bytes, provided by the mapper-size plugin.

KEY = '_size'
class pandagg.node.mappings.meta_fields.Source(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

The original JSON representing the body of the document.

KEY = '_source'
class pandagg.node.mappings.meta_fields.Type(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

The document’s mappings type.

KEY = '_type'
Module contents
pandagg.node.query package
Submodules
pandagg.node.query.abstract module
class pandagg.node.query.abstract.AbstractSingleFieldQueryClause(field: str, _name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

class pandagg.node.query.abstract.FlatFieldQueryClause(field: str, _name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.AbstractSingleFieldQueryClause

Query clause applied on one single field. Example:

Exists: {"exists": {"field": "user"}}
-> field = "user"
-> body = {"field": "user"}

>>> from pandagg.query import Exists
>>> q = Exists(field="user")

DistanceFeature: {"distance_feature": {"field": "production_date", "pivot": "7d", "origin": "now"}}
-> field = "production_date"
-> body = {"field": "production_date", "pivot": "7d", "origin": "now"}

>>> from pandagg.query import DistanceFeature
>>> q = DistanceFeature(field="production_date", pivot="7d", origin="now")

class pandagg.node.query.abstract.KeyFieldQueryClause(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.AbstractSingleFieldQueryClause

Clause with field used as key in clause body:

Term: {"term": {"user": {"value": "Kimchy", "boost": 1}}}
-> field = "user"
-> body = {"user": {"value": "Kimchy", "boost": 1}}

>>> from pandagg.query import Term
>>> q1 = Term(user={"value": "Kimchy", "boost": 1})
>>> q2 = Term(field="user", value="Kimchy", boost=1)

A "_implicit_param" class attribute can specify the equivalent key to use when the inner body isn't a dict but a raw value. For Term: _implicit_param = "value"

>>> q = Term(user="Kimchy")

is equivalent to {"term": {"user": {"value": "Kimchy"}}}:
-> field = "user"
-> body = {"user": {"value": "Kimchy"}}

line_repr(depth: int, **kwargs) → Tuple[str, str][source]

Control how a node is displayed in the tree representation. The first returned string is how the node is represented on the left, the second how it is represented on the right.

MyTree
├── one          OneEnd
│   └── two      twoEnd
└── three        threeEnd

class pandagg.node.query.abstract.LeafQueryClause(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.QueryClause

class pandagg.node.query.abstract.MultiFieldsQueryClause(fields: List[str], _name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

line_repr(depth: int, **kwargs) → Tuple[str, str][source]

Control how a node is displayed in the tree representation. The first returned string is how the node is represented on the left, the second how it is represented on the right.

MyTree
├── one          OneEnd
│   └── two      twoEnd
└── three        threeEnd

class pandagg.node.query.abstract.ParentParameterClause[source]

Bases: pandagg.node.query.abstract.QueryClause

line_repr(depth: int, **kwargs) → Tuple[str, str][source]

Control how a node is displayed in the tree representation. The first returned string is how the node is represented on the left, the second how it is represented on the right.

MyTree
├── one          OneEnd
│   └── two      twoEnd
└── three        threeEnd

pandagg.node.query.abstract.Q(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None] = None, **body) → pandagg.node.query.abstract.QueryClause[source]

Accepts multiple syntaxes and returns a QueryClause node.

Parameters:
  • type_or_query – either clause type (str), clause of dict format, or QueryClause instance
  • body – clause body, when type_or_query is provided as a string (remaining kwargs)
Returns:

QueryClause
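A hedged sketch of the accepted syntaxes (the three calls are assumed to produce equivalent clauses):

>>> from pandagg.node.query.abstract import Q
>>> Q('term', user='Kimchy')            # clause type as string, body as kwargs
>>> Q({'term': {'user': 'Kimchy'}})     # clause in dict format
>>> from pandagg.query import Term
>>> Q(Term(user='Kimchy'))              # QueryClause instance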

class pandagg.node.query.abstract.QueryClause(_name: Optional[str] = None, accept_children: bool = True, keyed: bool = True, _children: Any = None, **body)[source]

Bases: pandagg.node._node.Node

line_repr(depth: int, **kwargs) → Tuple[str, str][source]

Control how a node is displayed in the tree representation. The first returned string is how the node is represented on the left, the second how it is represented on the right.

MyTree
├── one          OneEnd
│   └── two      twoEnd
└── three        threeEnd

name
to_dict() → Dict[str, Any][source]
pandagg.node.query.compound module
class pandagg.node.query.compound.Bool(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

>>> Bool(must=[], should=[], filter=[], must_not=[], boost=1.2)
KEY = 'bool'
class pandagg.node.query.compound.Boosting(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'boosting'
class pandagg.node.query.compound.CompoundClause(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.QueryClause

Compound clauses can encapsulate other query clauses (see the sketch below).
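For instance, a Bool compound clause wraps leaf clauses under its parameters; a minimal sketch (serialization shape assumed per the Term/Range conventions documented above):

>>> from pandagg.query import Bool, Term, Range
>>> b = Bool(filter=[Term(genre='Action'), Range(rank={'gte': 7})])
>>> # b.to_dict() -> {'bool': {'filter': [{'term': {'genre': {'value': 'Action'}}},
>>> #                                     {'range': {'rank': {'gte': 7}}}]}}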

class pandagg.node.query.compound.ConstantScore(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'constant_score'
class pandagg.node.query.compound.DisMax(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'dis_max'
class pandagg.node.query.compound.FunctionScore(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'function_score'
pandagg.node.query.full_text module
class pandagg.node.query.full_text.Common(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'common'
class pandagg.node.query.full_text.Intervals(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'intervals'
class pandagg.node.query.full_text.Match(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'match'
class pandagg.node.query.full_text.MatchBoolPrefix(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'match_bool_prefix'
class pandagg.node.query.full_text.MatchPhrase(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'match_phrase'
class pandagg.node.query.full_text.MatchPhrasePrefix(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'match_phrase_prefix'
class pandagg.node.query.full_text.MultiMatch(fields: List[str], _name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.MultiFieldsQueryClause

KEY = 'multi_match'
class pandagg.node.query.full_text.QueryString(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

KEY = 'query_string'
class pandagg.node.query.full_text.SimpleQueryString(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

KEY = 'simple_string'
pandagg.node.query.geo module
class pandagg.node.query.geo.GeoBoundingBox(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'geo_bounding_box'
class pandagg.node.query.geo.GeoDistance(distance: str, **body)[source]

Bases: pandagg.node.query.abstract.AbstractSingleFieldQueryClause

KEY = 'geo_distance'
line_repr(depth: int, **kwargs) → Tuple[str, str][source]

Control how a node is displayed in the tree representation. The first returned string is how the node is represented on the left, the second how it is represented on the right.

MyTree
├── one          OneEnd
│   └── two      twoEnd
└── three        threeEnd

class pandagg.node.query.geo.GeoPolygone(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'geo_polygon'
class pandagg.node.query.geo.GeoShape(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'geo_shape'
pandagg.node.query.joining module
class pandagg.node.query.joining.HasChild(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'has_child'
class pandagg.node.query.joining.HasParent(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'has_parent'
class pandagg.node.query.joining.Nested(path: str, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'nested'
class pandagg.node.query.joining.ParentId(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

KEY = 'parent_id'
pandagg.node.query.shape module
class pandagg.node.query.shape.Shape(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

KEY = 'shape'
pandagg.node.query.span module
pandagg.node.query.specialized module
class pandagg.node.query.specialized.DistanceFeature(field: str, _name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.FlatFieldQueryClause

KEY = 'distance_feature'
class pandagg.node.query.specialized.MoreLikeThis(fields: List[str], _name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.MultiFieldsQueryClause

KEY = 'more_like_this'
class pandagg.node.query.specialized.Percolate(field: str, _name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.FlatFieldQueryClause

KEY = 'percolate'
class pandagg.node.query.specialized.RankFeature(field: str, _name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.FlatFieldQueryClause

KEY = 'rank_feature'
class pandagg.node.query.specialized.Script(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

KEY = 'script'
class pandagg.node.query.specialized.Wrapper(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

KEY = 'wrapper'
pandagg.node.query.specialized_compound module
class pandagg.node.query.specialized_compound.PinnedQuery(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'pinned'
class pandagg.node.query.specialized_compound.ScriptScore(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'script_score'
pandagg.node.query.term_level module
class pandagg.node.query.term_level.Exists(field: str, _name: Optional[str] = None)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

KEY = 'exists'
line_repr(depth: int, **kwargs) → Tuple[str, str][source]

Control how a node is displayed in the tree representation. The first returned string is how the node is represented on the left, the second how it is represented on the right.

MyTree
├── one          OneEnd
│   └── two      twoEnd
└── three        threeEnd

class pandagg.node.query.term_level.Fuzzy(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'fuzzy'
class pandagg.node.query.term_level.Ids(values: List[Union[str, int]], _name: Optional[str] = None)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

KEY = 'ids'
line_repr(depth: int, **kwargs) → Tuple[str, str][source]

Control how a node is displayed in the tree representation. The first returned string is how the node is represented on the left, the second how it is represented on the right.

MyTree
├── one          OneEnd
│   └── two      twoEnd
└── three        threeEnd

class pandagg.node.query.term_level.Prefix(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'prefix'
class pandagg.node.query.term_level.Range(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'range'
class pandagg.node.query.term_level.Regexp(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'regexp'
class pandagg.node.query.term_level.Term(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'term'
class pandagg.node.query.term_level.Terms(**body)[source]

Bases: pandagg.node.query.abstract.AbstractSingleFieldQueryClause

KEY = 'terms'
class pandagg.node.query.term_level.TermsSet(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'terms_set'
class pandagg.node.query.term_level.Type(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'type'
class pandagg.node.query.term_level.Wildcard(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'wildcard'
Module contents
pandagg.node.response package
Submodules
pandagg.node.response.bucket module
Module contents
Submodules
pandagg.node.types module
Module contents

pandagg.tree package

Submodules
pandagg.tree.aggs module
class pandagg.tree.aggs.Aggs(aggs: Union[Dict[str, Union[Dict[str, Dict[str, Any]], pandagg.node.aggs.abstract.AggClause]], Aggs, None] = None, mappings: Union[pandagg.types.MappingsDict, Mappings, None] = None, nested_autocorrect: bool = False, _groupby_ptr: Optional[str] = None)[source]

Bases: pandagg.tree._tree.TreeReprMixin, lighttree.tree.Tree

Combination of aggregation clauses. This class provides a handful of methods to build an aggregation (see aggs() and groupby()), and is also used to parse aggregation responses into easy-to-manipulate formats.

Declaring mappings is optional, but doing so enables validation of the aggregation and automatic handling of missing nested clauses.

Accepts the following syntaxes:

from a dict:

>>> Aggs({"per_user": {"terms": {"field": "user"}}})

from another Aggs instance:

>>> Aggs(Aggs({"per_user": {"terms": {"field": "user"}}}))

from a dict with AggClause instances as values:

>>> from pandagg.aggs import Terms, Avg
>>> Aggs({'per_user': Terms(field='user')})

Parameters:
  • mappings – dict or pandagg.tree.mappings.Mappings. Mappings of requested indice(s); if provided, aggregations validity is checked.
  • nested_autocorrect – bool. In case of missing nested clauses in the aggregation: if True, automatically add the missing nested clauses, else raise an error. Ignored if mappings are not provided.
  • _groupby_ptr – str. Identifier of the aggregation clause used as grouping element (used by the clone method).
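A hedged illustration of nested_autocorrect (the name given to the auto-inserted nested clause is an implementation detail, hence not shown):

>>> from pandagg.aggs import Aggs
>>> mappings = {'properties': {
>>>     'comments': {'type': 'nested', 'properties': {
>>>         'author': {'type': 'keyword'}
>>>     }}
>>> }}
>>> a = Aggs(
>>>     {'per_author': {'terms': {'field': 'comments.author'}}},
>>>     mappings=mappings,
>>>     nested_autocorrect=True,
>>> )
>>> # a.to_dict() wraps the 'per_author' clause below an auto-inserted
>>> # {'nested': {'path': 'comments'}} aggregation clause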

agg(name: str, type_or_agg: Union[str, Dict[str, Dict[str, Any]], pandagg.node.aggs.abstract.AggClause, None] = None, insert_below: Optional[str] = None, at_root: bool = False, **body) → pandagg.tree.aggs.Aggs[source]

Insert provided agg clause in copy of initial Aggs.

Accepts the following syntaxes for the type_or_agg argument:

string, with body provided in kwargs:

>>> Aggs().agg(name='some_agg', type_or_agg='terms', field='some_field')

python dict format:

>>> Aggs().agg(name='some_agg', type_or_agg={'terms': {'field': 'some_field'}})

AggClause instance:

>>> from pandagg.aggs import Terms
>>> Aggs().agg(name='some_agg', type_or_agg=Terms(field='some_field'))

Parameters:
  • name – inserted agg clause name
  • type_or_agg – either agg type (str), or agg clause of dict format, or AggClause instance
  • insert_below – name of aggregation below which provided aggs should be inserted
  • at_root – if True, aggregation is inserted at root
  • body – aggregation clause body when providing string type_or_agg (remaining kwargs)
Returns:

copy of initial Aggs with provided agg inserted

aggs(aggs: Union[Dict[str, Union[Dict[str, Dict[str, Any]], pandagg.node.aggs.abstract.AggClause]], Aggs], insert_below: Optional[str] = None, at_root: bool = False) → pandagg.tree.aggs.Aggs[source]

Insert provided aggs in copy of initial Aggs.

Accepts the following syntaxes for provided aggs:

python dict format:

>>> Aggs().aggs({'some_agg': {'terms': {'field': 'some_field'}}, 'other_agg': {'avg': {'field': 'age'}}})

Aggs instance:

>>> Aggs().aggs(Aggs({'some_agg': {'terms': {'field': 'some_field'}}, 'other_agg': {'avg': {'field': 'age'}}}))

dict with AggClause instances as values:

>>> from pandagg.aggs import Terms, Avg
>>> Aggs().aggs({'some_agg': Terms(field='some_field'), 'other_agg': Avg(field='age')})

Parameters:
  • aggs – aggregations to insert into existing aggregation
  • insert_below – name of aggregation below which provided aggs should be inserted
  • at_root – if True, aggregation is inserted at root
Returns:

copy of initial Aggs with provided aggs inserted

applied_nested_path_at_node(nid: str) → Optional[str][source]

Return nested path applied at a clause.

Parameters:nid – clause identifier
Returns:None if no nested is applied, else applied path (str)
apply_reverse_nested(nid: Optional[str] = None) → None[source]
as_composite(size: int, after: Optional[Dict[str, Any]] = None) → pandagg.tree.aggs.Aggs[source]

Convert current aggregation into a composite aggregation. For now, it only supports conversion of the root aggregation clause, and doesn't handle multiple sources.
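A minimal sketch, assuming a root terms aggregation (which is composite-compatible):

>>> a = Aggs({'per_user': {'terms': {'field': 'user_id'}}})
>>> c = a.as_composite(size=100)
>>> # c.to_dict() renders the root clause as a composite aggregation with a
>>> # single terms source; pagination is done by passing the previous
>>> # response's after_key via the `after` argument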

get_composition_supporting_agg() → Tuple[str, pandagg.node.aggs.abstract.AggClause][source]

Return first composite-compatible aggregation clause if possible, raise an error otherwise.

groupby(name: str, type_or_agg: Union[str, Dict[str, Dict[str, Any]], pandagg.node.aggs.abstract.AggClause, None] = None, insert_below: Optional[str] = None, at_root: bool = False, **body) → pandagg.tree.aggs.Aggs[source]

Insert provided aggregation clause in copy of initial Aggs.

Given the initial aggregation:

A──> B
└──> C

If insert_below = ‘A’:

A──> new──> B
       └──> C
>>> Aggs().groupby('per_user_id', 'terms', field='user_id')
{"per_user_id":{"terms":{"field":"user_id"}}}
>>> Aggs().groupby('per_user_id', {'terms': {"field": "user_id"}})
{"per_user_id":{"terms":{"field":"user_id"}}}
>>> from pandagg.aggs import Terms
>>> Aggs().groupby('per_user_id', Terms(field="user_id"))
{"per_user_id":{"terms":{"field":"user_id"}}}
Return type:pandagg.aggs.Aggs
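Since groupby moves the grouping pointer onto the inserted clause, a subsequent agg call inserts below it; a minimal sketch (the resulting nesting is assumed per the diagrams above):

>>> Aggs()\
>>> .groupby('per_user_id', 'terms', field='user_id')\
>>> .agg('avg_rank', 'avg', field='rank')
{"per_user_id":{"terms":{"field":"user_id"},"aggs":{"avg_rank":{"avg":{"field":"rank"}}}}}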
grouped_by(agg_name: Optional[str] = None, deepest: bool = False) → pandagg.tree.aggs.Aggs[source]

Define which aggregation will be used as grouping pointer.

Either provide an aggregation name, or specify deepest=True to use the deepest linear eligible aggregation node as the pointer.

id_from_key(key: str) → str[source]

Find node identifier based on key. If multiple nodes have the same key, takes the first one.

Useful because of how pandagg implements lighttree.Tree. A bit of context:

Elasticsearch allows queries to contain multiple similarly named clauses (both in queries and aggregations). As a consequence, clause names are not used as clause identifiers in Trees; internally, pandagg (like lighttree) uses auto-generated uuids to distinguish them.

But for usability reasons, notably when declaring that an aggregation clause must be placed relative to another one, the latter is identified by its name rather than by its internal id. Since it is technically possible for multiple clauses to share the same name (not recommended, but allowed), some pandagg features are ambiguous and not recommended in such a context.

show(*args, line_max_length: int = 80, **kwargs) → str[source]

Return compact representation of Aggs.

>>> Aggs({
>>>     "genres": {
>>>         "terms": {"field": "genres", "size": 3},
>>>         "aggs": {
>>>             "movie_decade": {
>>>                 "date_histogram": {"field": "year", "fixed_interval": "3650d"}
>>>             }
>>>         },
>>>     }
>>> }).show()
<Aggregations>
genres                                           <terms, field="genres", size=3>
└── movie_decade          <date_histogram, field="year", fixed_interval="3650d">

All *args and **kwargs are propagated to the lighttree.Tree.show method.

Returns:str

to_dict(from_: Optional[str] = None, depth: Optional[int] = None) → Dict[str, Dict[str, Dict[str, Any]]][source]

Serialize Aggs as dict.

Parameters:
  • from_ – identifier of an aggregation clause; if provided, limits serialization to this clause and its children (used for recursion, shouldn't be useful)
  • depth – integer; if provided, limits the serialization to a given depth
Returns:dict

pandagg.tree.mappings module
class pandagg.tree.mappings.Mappings(properties: Optional[Dict[str, Union[Dict[str, Any], pandagg.node.mappings.abstract.Field]]] = None, dynamic: Optional[bool] = None, **body)[source]

Bases: pandagg.tree._tree.TreeReprMixin, lighttree.tree.Tree

list_nesteds_at_field(field_path: str) → List[str][source]

List nested paths that apply at a given path.

>>> mappings = Mappings(dynamic=False, properties={
>>>     'id': {'type': 'keyword'},
>>>     'comments': {'type': 'nested', 'properties': {
>>>         'comment_text': {'type': 'text'},
>>>         'date': {'type': 'date'}
>>>     }}
>>> })
>>> mappings.list_nesteds_at_field('id')
[]
>>> mappings.list_nesteds_at_field('comments')
['comments']
>>> mappings.list_nesteds_at_field('comments.comment_text')
['comments']
mapping_type_of_field(field_path: str) → str[source]

Return field type of provided field path.

>>> mappings = Mappings(dynamic=False, properties={
>>>     'id': {'type': 'keyword'},
>>>     'comments': {'type': 'nested', 'properties': {
>>>         'comment_text': {'type': 'text'},
>>>         'date': {'type': 'date'}
>>>     }}
>>> })
>>> mappings.mapping_type_of_field('id')
'keyword'
>>> mappings.mapping_type_of_field('comments')
'nested'
>>> mappings.mapping_type_of_field('comments.comment_text')
'text'
nested_at_field(field_path: str) → Optional[str][source]

Return nested path applied on a given path. Returns None if none applies.

>>> mappings = Mappings(dynamic=False, properties={
>>>     'id': {'type': 'keyword'},
>>>     'comments': {'type': 'nested', 'properties': {
>>>         'comment_text': {'type': 'text'},
>>>         'date': {'type': 'date'}
>>>     }}
>>> })
>>> mappings.nested_at_field('id')
None
>>> mappings.nested_at_field('comments')
'comments'
>>> mappings.nested_at_field('comments.comment_text')
'comments'
to_dict(from_: Optional[str] = None, depth: Optional[int] = None) → pandagg.types.MappingsDict[source]

Serialize Mappings as dict.

Parameters:
  • from_ – identifier of a field; if provided, limits serialization to this field and its children (used for recursion, shouldn't be useful)
  • depth – integer; if provided, limits the serialization to a given depth
Returns:dict

validate_agg_clause(agg_clause: pandagg.node.aggs.abstract.AggClause, exc: bool = True) → bool[source]

Ensure that, if the aggregation clause relates to a field (field or path), this field exists in the mappings, and that the aggregation type is allowed on this kind of field.

Parameters:
  • agg_clause – AggClause you want to validate on these mappings
  • exc – boolean, if set to True raise exception if invalid
Return type:

boolean
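A hedged usage sketch, reusing the mappings declared in the examples above ('comments.date' is a date field, 'comments.comment_text' a text field):

>>> from pandagg.aggs import Avg
>>> mappings.validate_agg_clause(Avg(field='comments.date'))
True
>>> # avg is not allowed on text fields; with exc=False this returns False
>>> # instead of raising
>>> mappings.validate_agg_clause(Avg(field='comments.comment_text'), exc=False)
False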

validate_document(d: Union[DocSource, DocumentSource]) → None[source]
class pandagg.tree.mappings.MappingsDictOrNode[source]

Bases: dict

pandagg.tree.query module
class pandagg.tree.query.Query(q: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query, None] = None, mappings: Union[pandagg.types.MappingsDict, pandagg.tree.mappings.Mappings, None] = None, nested_autocorrect: bool = False)[source]

Bases: lighttree.tree.Tree

applied_nested_path_at_node(nid: str) → Optional[str][source]

Return nested path applied at a clause.

Parameters:nid – clause identifier
Returns:None if no nested is applied, else applied path (str)
bool(must: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], None] = None, should: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], None] = None, must_not: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], None] = None, filter: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], None] = None, insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
>>> Query().bool(must={"term": {"some_field": "yolo"}})
boosting(positive: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None] = None, negative: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None] = None, insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
constant_score(filter: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None] = None, boost: Optional[float] = None, insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
dis_max(queries: List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
filter(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → pandagg.tree.query.Query[source]
function_score(query: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
has_child(query: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
has_parent(query: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
must(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → pandagg.tree.query.Query[source]

Create a copy of the initial Query and insert the provided clause under the "bool" query's "must" parameter.

>>> Query().must('term', some_field=1)
>>> Query().must({'term': {'some_field': 1}})
>>> from pandagg.query import Term
>>> Query().must(Term(some_field=1))
Keyword Arguments:
 
  • insert_below (str) – named query clause under which the inserted clauses should be placed.
  • compound_param (str) – param under which inserted clause will be placed in compound query
  • on (str) – named compound query clause on which the inserted compound clause should be merged.
  • mode (str, one of 'add', 'replace', 'replace_all') – merging strategy when inserting clauses on an existing compound clause (see the example below).
    • 'add' (default): adds new clauses, keeping the initial ones
    • 'replace': for each parameter (for instance, in the 'bool' case: 'filter', 'must', 'must_not', 'should'), existing clauses under this parameter are replaced by the new ones, but only for parameters declared in the inserted compound query
    • 'replace_all': the existing compound clause is completely replaced by the new one
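For instance, with the default 'add' mode, successive must calls accumulate clauses under the same bool query; a minimal sketch (clause order in the serialized list is assumed):

>>> Query()\
>>> .must({'term': {'some_field': 1}})\
>>> .must({'term': {'other_field': 2}})
{'bool': {'must': [{'term': {'some_field': 1}}, {'term': {'other_field': 2}}]}}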
must_not(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → pandagg.tree.query.Query[source]
nested(path: str, query: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None] = None, insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
pinned_query(organic: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
query(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', compound_param: Optional[str] = None, **body) → pandagg.tree.query.Query[source]

Insert provided clause in copy of initial Query.

>>> from pandagg.query import Query
>>> Query().query('term', some_field=23)
{'term': {'some_field': 23}}
>>> from pandagg.query import Term
>>> Query()\
>>> .query({'term': {'some_field': 23}})\
>>> .query(Term(other_field=24))
{'bool': {'must': [{'term': {'some_field': 23}}, {'term': {'other_field': 24}}]}}
Keyword Arguments:
 
  • insert_below (str) – named query clause under which the inserted clauses should be placed.
  • compound_param (str) – param under which inserted clause will be placed in compound query
  • on (str) – named compound query clause on which the inserted compound clause should be merged.
  • mode (str, one of 'add', 'replace', 'replace_all') – merging strategy when inserting clauses on an existing compound clause.
    • 'add' (default): adds new clauses, keeping the initial ones
    • 'replace': for each parameter (for instance, in the 'bool' case: 'filter', 'must', 'must_not', 'should'), existing clauses under this parameter are replaced by the new ones, but only for parameters declared in the inserted compound query
    • 'replace_all': the existing compound clause is completely replaced by the new one
script_score(query: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
should(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → pandagg.tree.query.Query[source]
show(*args, line_max_length: int = 80, **kwargs) → str[source]

Return compact representation of Query.

>>> Query()\
>>> .must({"exists": {"field": "some_field"}})\
>>> .must({"term": {"other_field": {"value": 5}}})\
>>> .show()
<Query>
bool
└── must
    ├── exists                                                  field=some_field
    └── term                                          field=other_field, value=5

All *args and **kwargs are propagated to the lighttree.Tree.show method.

to_dict(from_: Optional[str] = None) → Optional[Dict[str, Dict[str, Any]]][source]
pandagg.tree.response module
Module contents

Submodules

pandagg.aggs module

class pandagg.aggs.Aggs(aggs: Union[Dict[str, Union[Dict[str, Dict[str, Any]], pandagg.node.aggs.abstract.AggClause]], Aggs, None] = None, mappings: Union[pandagg.types.MappingsDict, Mappings, None] = None, nested_autocorrect: bool = False, _groupby_ptr: Optional[str] = None)[source]

Bases: pandagg.tree._tree.TreeReprMixin, lighttree.tree.Tree

Re-export of pandagg.tree.aggs.Aggs; see the pandagg.tree.aggs module above for the full documentation of this class and its methods.

class pandagg.aggs.Terms(field: str, missing: Union[str, int, None] = None, size: Optional[int] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

Terms aggregation.

KEY = 'terms'
VALUE_ATTRS = ['doc_count', 'doc_count_error_upper_bound', 'sum_other_doc_count']
is_convertible_to_composite_source() → bool[source]
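A minimal usage sketch, following the Aggs syntaxes documented above:

>>> from pandagg.aggs import Aggs, Terms
>>> Aggs({'genres': Terms(field='genres', size=3)}).to_dict()
{'genres': {'terms': {'field': 'genres', 'size': 3}}}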
class pandagg.aggs.Filters(filters: Dict[str, Dict[str, Dict[str, Any]]], other_bucket: bool = False, other_bucket_key: Optional[str] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

DEFAULT_OTHER_KEY = '_other_'
IMPLICIT_KEYED = True
KEY = 'filters'
VALUE_ATTRS = ['doc_count']
class pandagg.aggs.Histogram(field: str, interval: int, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'histogram'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
is_convertible_to_composite_source() → bool[source]
class pandagg.aggs.DateHistogram(field: str, interval: Optional[str] = None, calendar_interval: Optional[str] = None, fixed_interval: Optional[str] = None, key_as_string: bool = True, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'date_histogram'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['date']
is_convertible_to_composite_source() → bool[source]
class pandagg.aggs.Range(field: str, ranges: List[pandagg.types.RangeDict], keyed: bool = False, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'range'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.aggs.Global(**body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'global'
VALUE_ATTRS = ['doc_count']
class pandagg.aggs.Filter(filter: Optional[Dict[str, Dict[str, Any]]] = None, meta: Optional[Dict[str, Any]] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'filter'
VALUE_ATTRS = ['doc_count']
class pandagg.aggs.Missing(field: str, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'missing'
VALUE_ATTRS = ['doc_count']
class pandagg.aggs.Nested(path: str, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'nested'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['nested']
class pandagg.aggs.ReverseNested(path: Optional[str] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'reverse_nested'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['nested']
class pandagg.aggs.Avg(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'avg'
VALUE_ATTRS = ['value']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.aggs.Max(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'max'
VALUE_ATTRS = ['value']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.aggs.Sum(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'sum'
VALUE_ATTRS = ['value']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.aggs.Min(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'min'
VALUE_ATTRS = ['value']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.aggs.Cardinality(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'cardinality'
VALUE_ATTRS = ['value']
class pandagg.aggs.Stats(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'stats'
VALUE_ATTRS = ['count', 'min', 'max', 'avg', 'sum']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.aggs.ExtendedStats(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'extended_stats'
VALUE_ATTRS = ['count', 'min', 'max', 'avg', 'sum', 'sum_of_squares', 'variance', 'std_deviation', 'std_deviation_bounds']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.aggs.Percentiles(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

Percents body argument can be passed to specify which percentiles to fetch.

KEY = 'percentiles'
VALUE_ATTRS = ['values']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.aggs.PercentileRanks(field: str, values: List[float], **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'percentile_ranks'
VALUE_ATTRS = ['values']
WHITELISTED_MAPPING_TYPES = ['long', 'integer', 'short', 'byte', 'double', 'float', 'half_float', 'scaled_float', 'ip', 'token_count', 'date', 'boolean']
class pandagg.aggs.GeoBound(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'geo_bounds'
VALUE_ATTRS = ['bounds']
WHITELISTED_MAPPING_TYPES = ['geo_point']
class pandagg.aggs.GeoCentroid(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'geo_centroid'
VALUE_ATTRS = ['location']
WHITELISTED_MAPPING_TYPES = ['geo_point']
class pandagg.aggs.TopHits(meta: Optional[Dict[str, Any]] = None, identifier: Optional[str] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.MetricAgg

KEY = 'top_hits'
VALUE_ATTRS = ['hits']
class pandagg.aggs.ValueCount(field: Optional[str] = None, script: Optional[pandagg.types.Script] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.FieldOrScriptMetricAgg

KEY = 'value_count'
VALUE_ATTRS = ['value']
class pandagg.aggs.AvgBucket(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'avg_bucket'
VALUE_ATTRS = ['value']
class pandagg.aggs.Derivative(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'derivative'
VALUE_ATTRS = ['value']
class pandagg.aggs.MaxBucket(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'max_bucket'
VALUE_ATTRS = ['value']
class pandagg.aggs.MinBucket(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'min_bucket'
VALUE_ATTRS = ['value']
class pandagg.aggs.SumBucket(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'sum_bucket'
VALUE_ATTRS = ['value']
class pandagg.aggs.StatsBucket(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'stats_bucket'
VALUE_ATTRS = ['count', 'min', 'max', 'avg', 'sum']
class pandagg.aggs.ExtendedStatsBucket(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'extended_stats_bucket'
VALUE_ATTRS = ['count', 'min', 'max', 'avg', 'sum', 'sum_of_squares', 'variance', 'std_deviation', 'std_deviation_bounds']
class pandagg.aggs.PercentilesBucket(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'percentiles_bucket'
VALUE_ATTRS = ['values']
class pandagg.aggs.MovingAvg(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'moving_avg'
VALUE_ATTRS = ['value']
class pandagg.aggs.CumulativeSum(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'cumulative_sum'
VALUE_ATTRS = ['value']
class pandagg.aggs.BucketScript(script: pandagg.types.Script, buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.ScriptPipeline

KEY = 'bucket_script'
VALUE_ATTRS = ['value']
class pandagg.aggs.BucketSelector(script: pandagg.types.Script, buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.ScriptPipeline

KEY = 'bucket_selector'
VALUE_ATTRS = []
class pandagg.aggs.BucketSort(script: pandagg.types.Script, buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.ScriptPipeline

KEY = 'bucket_sort'
VALUE_ATTRS = []
class pandagg.aggs.SerialDiff(buckets_path: str, gap_policy: Optional[typing_extensions.Literal['skip', 'insert_zeros', 'keep_values']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.Pipeline

KEY = 'serial_diff'
VALUE_ATTRS = ['value']
class pandagg.aggs.MatchAll(**body)[source]

Bases: pandagg.node.aggs.bucket.Filter

class pandagg.aggs.Composite(sources: List[Dict[str, Dict[str, Dict[str, Any]]]], size: Optional[int] = None, after: Optional[Dict[str, Any]] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.BucketAggClause

KEY = 'composite'
VALUE_ATTRS = ['doc_count']
after
extract_buckets(response_value: Union[pandagg.types.BucketsWrapperDict, Dict[str, Any]]) → Iterator[Tuple[Dict[str, Union[str, float, None]], Dict[str, Any]]][source]
size
source_names
sources
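
A minimal sketch of declaring a composite aggregation with the DSL (the 'genre'/'decade' source names and underlying fields are hypothetical):

>>> from pandagg.aggs import Aggs, Composite
>>> a = Aggs({'my_composite': Composite(
>>>     sources=[
>>>         {'genre': {'terms': {'field': 'genres'}}},
>>>         {'decade': {'histogram': {'field': 'year', 'interval': 10}}}
>>>     ],
>>>     size=100
>>> )})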
class pandagg.aggs.GeoHashGrid(field: str, precision: Optional[int] = None, bounds: Optional[Dict[KT, VT]] = None, size: Optional[int] = None, shard_size: Optional[int] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'geohash_grid'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['geo_point', 'geo_shape']
class pandagg.aggs.GeoDistance(field: str, origin: str, ranges: List[pandagg.types.RangeDict], unit: Optional[str] = None, distance_type: Optional[typing_extensions.Literal['arc', 'plane']] = None, keyed: bool = False, **body)[source]

Bases: pandagg.node.aggs.bucket.Range

KEY = 'geo_distance'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['geo_point']
class pandagg.aggs.AdjacencyMatrix(filters: Dict[str, Dict[str, Dict[str, Any]]], separator: Optional[str] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'adjacency_matrix'
VALUE_ATTRS = ['doc_count']
class pandagg.aggs.AutoDateHistogram(field: str, buckets: Optional[int] = None, format: Optional[str] = None, time_zone: Optional[str] = None, minimum_interval: Optional[str] = None, missing: Optional[str] = None, key_as_string: bool = True, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'auto_date_histogram'
VALUE_ATTRS = ['doc_count']
class pandagg.aggs.VariableWidthHistogram(field: str, buckets: int, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'variable_width_histogram'
VALUE_ATTRS = ['doc_count', 'min', 'max']
class pandagg.aggs.SignificantTerms(field: str, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'significant_terms'
VALUE_ATTRS = ['doc_count', 'score', 'bg_count']
class pandagg.aggs.RareTerms(field: str, max_doc_count: Optional[int] = None, precision: Optional[float] = None, include: Union[str, List[str], None] = None, exclude: Union[str, List[str], None] = None, missing: Optional[Any] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'rare_terms'
VALUE_ATTRS = ['doc_count']
class pandagg.aggs.GeoTileGrid(field: str, precision: Optional[int] = None, bounds: Optional[Dict[KT, VT]] = None, size: Optional[int] = None, shard_size: Optional[int] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'geotile_grid'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['geo_point', 'geo_shape']
class pandagg.aggs.IPRange(field: str, ranges: List[pandagg.types.RangeDict], keyed: bool = False, **body)[source]

Bases: pandagg.node.aggs.bucket.Range

KEY = 'ip_range'
VALUE_ATTRS = ['doc_count']
WHITELISTED_MAPPING_TYPES = ['ip']
class pandagg.aggs.Sampler(shard_size: Optional[int] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'sampler'
VALUE_ATTRS = ['doc_count']
class pandagg.aggs.DiversifiedSampler(field: str, shard_size: Optional[int], max_docs_per_value: Optional[int] = None, execution_hint: Optional[typing_extensions.Literal['map', 'global_ordinals', 'bytes_hash']] = None, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'diversified_sampler'
VALUE_ATTRS = ['doc_count']
class pandagg.aggs.Children(type: str, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'children'
VALUE_ATTRS = ['doc_count']
class pandagg.aggs.Parent(type: str, **body)[source]

Bases: pandagg.node.aggs.abstract.UniqueBucketAgg

KEY = 'parent'
VALUE_ATTRS = ['doc_count']
class pandagg.aggs.SignificantText(field: str, **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'significant_text'
VALUE_ATTRS = ['doc_count', 'score', 'bg_count']
WHITELISTED_MAPPING_TYPES = ['text']
class pandagg.aggs.MultiTerms(terms: List[Dict[KT, VT]], **body)[source]

Bases: pandagg.node.aggs.abstract.MultipleBucketAgg

KEY = 'multi_terms'
VALUE_ATTRS = ['doc_count', 'doc_count_error_upper_bound', 'sum_other_doc_count']

pandagg.discovery module

class pandagg.discovery.Index(name: str, settings: Dict[str, Any], mappings: pandagg.types.MappingsDict, aliases: Any, client: Optional[elasticsearch.client.Elasticsearch] = None)[source]

Bases: object

client = None
imappings
search(nested_autocorrect: bool = True, repr_auto_execute: bool = True) → pandagg.search.Search[source]
class pandagg.discovery.Indices(**kwargs)[source]

Bases: lighttree.interactive.Obj

pandagg.discovery.discover(using: elasticsearch.client.Elasticsearch, index: str = '*') → pandagg.discovery.Indices[source]
Parameters:
  • using – Elasticsearch client
  • index – Comma-separated list or wildcard expression of index names used to limit the request.
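
A minimal usage sketch, assuming a reachable cluster exposing a hypothetical 'movies' index:

>>> from elasticsearch import Elasticsearch
>>> from pandagg.discovery import discover
>>> client = Elasticsearch(hosts=['localhost:9200'])
>>> indices = discover(client, index='mov*')
>>> movies = indices.movies  # discovered indices are exposed as attributes, enabling autocompletion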

pandagg.exceptions module

exception pandagg.exceptions.AbsentMappingFieldError[source]

Bases: pandagg.exceptions.MappingError

Field is not present in mappings.

exception pandagg.exceptions.InvalidAggregation[source]

Bases: Exception

Wrong aggregation definition

exception pandagg.exceptions.InvalidOperationMappingFieldError[source]

Bases: pandagg.exceptions.MappingError

Invalid aggregation type on this mappings field.

exception pandagg.exceptions.MappingError[source]

Bases: Exception

Basic Mappings Error

exception pandagg.exceptions.VersionIncompatibilityError[source]

Bases: Exception

Pandagg is not compatible with this ElasticSearch version.

pandagg.mappings module

class pandagg.mappings.Mappings(properties: Optional[Dict[str, Union[Dict[str, Any], pandagg.node.mappings.abstract.Field]]] = None, dynamic: Optional[bool] = None, **body)[source]

Bases: pandagg.tree._tree.TreeReprMixin, lighttree.tree.Tree

list_nesteds_at_field(field_path: str) → List[str][source]

List nested paths that apply at a given path.

>>> mappings = Mappings(dynamic=False, properties={
>>>     'id': {'type': 'keyword'},
>>>     'comments': {'type': 'nested', 'properties': {
>>>         'comment_text': {'type': 'text'},
>>>         'date': {'type': 'date'}
>>>     }}
>>> })
>>> mappings.list_nesteds_at_field('id')
[]
>>> mappings.list_nesteds_at_field('comments')
['comments']
>>> mappings.list_nesteds_at_field('comments.comment_text')
['comments']
mapping_type_of_field(field_path: str) → str[source]

Return field type of provided field path.

>>> mappings = Mappings(dynamic=False, properties={
>>>     'id': {'type': 'keyword'},
>>>     'comments': {'type': 'nested', 'properties': {
>>>         'comment_text': {'type': 'text'},
>>>         'date': {'type': 'date'}
>>>     }}
>>> })
>>> mappings.mapping_type_of_field('id')
'keyword'
>>> mappings.mapping_type_of_field('comments')
'nested'
>>> mappings.mapping_type_of_field('comments.comment_text')
'text'
nested_at_field(field_path: str) → Optional[str][source]

Return the nested path applied on a given path, or None if none applies.

>>> mappings = Mappings(dynamic=False, properties={
>>>     'id': {'type': 'keyword'},
>>>     'comments': {'type': 'nested', 'properties': {
>>>         'comment_text': {'type': 'text'},
>>>         'date': {'type': 'date'}
>>>     }}
>>> })
>>> mappings.nested_at_field('id')
None
>>> mappings.nested_at_field('comments')
'comments'
>>> mappings.nested_at_field('comments.comment_text')
'comments'
to_dict(from_: Optional[str] = None, depth: Optional[int] = None) → pandagg.types.MappingsDict[source]

Serialize Mappings as dict.

Parameters:
  • from – identifier of a field; if provided, limits serialization to this field and its children (used for recursion, shouldn’t be useful)
  • depth – integer; if provided, limits the serialization to a given depth
Returns:

dict

validate_agg_clause(agg_clause: pandagg.node.aggs.abstract.AggClause, exc: bool = True) → bool[source]

Ensure that, if the aggregation clause relates to a field (through ‘field’ or ‘path’), this field exists in the mappings, and that the requested aggregation type is allowed on this kind of field.

Parameters:
  • agg_clause – AggClause you want to validate on these mappings
  • exc – boolean, if set to True raise exception if invalid
Return type:

boolean
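
For instance, reusing the mappings declared in the doctests above, a sketch of validating agg clauses (a terms aggregation on a keyword field should be allowed, an avg aggregation on it rejected):

>>> from pandagg.aggs import Terms, Avg
>>> mappings.validate_agg_clause(Terms(field='id'))
True
>>> mappings.validate_agg_clause(Avg(field='id'), exc=False)
False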

validate_document(d: Union[DocSource, DocumentSource]) → None[source]
class pandagg.mappings.IMappings(mappings: pandagg.tree.mappings.Mappings, client: Optional[elasticsearch.client.Elasticsearch] = None, index: Optional[List[str]] = None, depth: int = 1, root_path: Optional[str] = None, initial_tree: Optional[pandagg.tree.mappings.Mappings] = None)[source]

Bases: pandagg.utils.DSLMixin, lighttree.interactive.TreeBasedObj

Interactive wrapper upon mappings tree, allowing field navigation and quick access to single clause aggregations computation.
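
A minimal interactive sketch, with hypothetical fields:

>>> from pandagg.mappings import Mappings, IMappings
>>> imap = IMappings(Mappings(properties={
>>>     'id': {'type': 'keyword'},
>>>     'comments': {'type': 'nested', 'properties': {'date': {'type': 'date'}}}
>>> }))
>>> imap.comments.date  # fields are navigable attributes, enabling autocompletion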

class pandagg.mappings.IpRange(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'ip_range'
class pandagg.mappings.Text(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'text'
class pandagg.mappings.Keyword(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'keyword'
class pandagg.mappings.ConstantKeyword(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'constant_keyword'
class pandagg.mappings.WildCard(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'wildcard'
class pandagg.mappings.Long(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'long'
class pandagg.mappings.Integer(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'integer'
class pandagg.mappings.Short(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'short'
class pandagg.mappings.Byte(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'byte'
class pandagg.mappings.Double(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'double'
class pandagg.mappings.HalfFloat(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'half_float'
class pandagg.mappings.ScaledFloat(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'scaled_float'
class pandagg.mappings.Date(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'date'
class pandagg.mappings.DateNanos(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'date_nanos'
class pandagg.mappings.Boolean(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'boolean'
class pandagg.mappings.Binary(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'binary'
class pandagg.mappings.IntegerRange(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'integer_range'
class pandagg.mappings.Float(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'float'
class pandagg.mappings.FloatRange(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'float_range'
class pandagg.mappings.LongRange(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'long_range'
class pandagg.mappings.DoubleRange(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'double_range'
class pandagg.mappings.DateRange(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

KEY = 'date_range'
class pandagg.mappings.Object(properties: Optional[Union[Dict, Type[DocumentSource]]] = None, **body)[source]

Bases: pandagg.node.mappings.abstract.ComplexField

KEY = 'object'
class pandagg.mappings.Nested(properties: Optional[Union[Dict, Type[DocumentSource]]] = None, **body)[source]

Bases: pandagg.node.mappings.abstract.ComplexField

KEY = 'nested'
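
These field classes allow declaring mappings in a DSL fashion; for instance, a sketch equivalent to the dict-based declaration used in the Mappings doctests above:

>>> from pandagg.mappings import Mappings, Keyword, Text, Date, Nested
>>> mappings = Mappings(dynamic=False, properties={
>>>     'id': Keyword(),
>>>     'comments': Nested(properties={
>>>         'comment_text': Text(),
>>>         'date': Date()
>>>     })
>>> })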
class pandagg.mappings.GeoPoint(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

For lat/lon points

KEY = 'geo_point'
class pandagg.mappings.GeoShape(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

For complex shapes like polygons

KEY = 'geo_shape'
class pandagg.mappings.IP(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

for IPv4 and IPv6 addresses

KEY = 'ip'
class pandagg.mappings.Completion(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

To provide auto-complete suggestions

KEY = 'completion'
class pandagg.mappings.TokenCount(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

To count the number of tokens in a string

KEY = 'token_count'
class pandagg.mappings.MapperMurMur3(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

To compute hashes of values at index-time and store them in the index

KEY = 'murmur3'
class pandagg.mappings.MapperAnnotatedText(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

To index text containing special markup (typically used for identifying named entities)

KEY = 'annotated-text'
class pandagg.mappings.Percolator(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

Accepts queries from the query-dsl

KEY = 'percolator'
class pandagg.mappings.Join(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

Defines parent/child relation for documents within the same index

KEY = 'join'
class pandagg.mappings.RankFeature(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

Record numeric feature to boost hits at query time.

KEY = 'rank_feature'
class pandagg.mappings.RankFeatures(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

Record numeric features to boost hits at query time.

KEY = 'rank_features'
class pandagg.mappings.DenseVector(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

Record dense vectors of float values.

KEY = 'dense_vector'
class pandagg.mappings.SparseVector(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

Record sparse vectors of float values.

KEY = 'sparse_vector'
class pandagg.mappings.SearchAsYouType(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

A text-like field optimized for queries to implement as-you-type completion

KEY = 'search_as_you_type'
class pandagg.mappings.Alias(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

Defines an alias to an existing field.

KEY = 'alias'
class pandagg.mappings.Flattened(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

Allows an entire JSON object to be indexed as a single field.

KEY = 'flattened'
class pandagg.mappings.Shape(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

For arbitrary cartesian geometries.

KEY = 'shape'
class pandagg.mappings.Histogram(**body)[source]

Bases: pandagg.node.mappings.abstract.RegularField

For pre-aggregated numerical values for percentiles aggregations.

KEY = 'histogram'
class pandagg.mappings.Index(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

The index to which the document belongs.

KEY = '_index'
class pandagg.mappings.Type(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

The document’s mappings type.

KEY = '_type'
class pandagg.mappings.Id(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

The document’s ID.

KEY = '_id'
class pandagg.mappings.FieldNames(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

All fields in the document which contain non-null values.

KEY = '_field_names'
class pandagg.mappings.Source(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

The original JSON representing the body of the document.

KEY = '_source'
class pandagg.mappings.Size(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

The size of the _source field in bytes, provided by the mapper-size plugin.

KEY = '_size'
class pandagg.mappings.Ignored(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

All fields in the document that have been ignored at index time because of ignore_malformed.

KEY = '_ignored'
class pandagg.mappings.Routing(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

A custom routing value which routes a document to a particular shard.

KEY = '_routing'
class pandagg.mappings.Meta(*, multiple: Optional[bool] = None, required: bool = False, **body)[source]

Bases: pandagg.node.mappings.abstract.Field

Application specific metadata.

KEY = '_meta'

pandagg.query module

class pandagg.query.Query(q: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query, None] = None, mappings: Union[pandagg.types.MappingsDict, pandagg.tree.mappings.Mappings, None] = None, nested_autocorrect: bool = False)[source]

Bases: lighttree.tree.Tree

applied_nested_path_at_node(nid: str) → Optional[str][source]

Return nested path applied at a clause.

Parameters:nid – clause identifier
Returns:None if no nested is applied, else applied path (str)
bool(must: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], None] = None, should: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], None] = None, must_not: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], None] = None, filter: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], None] = None, insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
>>> Query().bool(must={"term": {"some_field": "yolo"}})
boosting(positive: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None] = None, negative: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None] = None, insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
constant_score(filter: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None] = None, boost: Optional[float] = None, insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
dis_max(queries: List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
filter(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → pandagg.tree.query.Query[source]
function_score(query: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
has_child(query: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
has_parent(query: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
must(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → pandagg.tree.query.Query[source]

Create copy of initial Query and insert provided clause under “bool” query “must”.

>>> Query().must('term', some_field=1)
>>> Query().must({'term': {'some_field': 1}})
>>> from pandagg.query import Term
>>> Query().must(Term(some_field=1))
Keyword Arguments:
 
  • insert_below (str) – named query clause under which the inserted clauses should be placed.
  • compound_param (str) – param under which inserted clause will be placed in compound query
  • on (str) – named compound query clause on which the inserted compound clause should be merged.
  • mode (str one of ‘add’, ‘replace’, ‘replace_all’) – merging strategy when inserting clauses on an existing compound clause.
    • ‘add’ (default): adds new clauses, keeping initial ones
    • ‘replace’: for each parameter (for instance in ‘bool’ case: ‘filter’, ‘must’, ‘must_not’, ‘should’), replaces existing clauses under this parameter by new ones, but only for parameters declared in the inserted compound query
    • ‘replace_all’: existing compound clause is completely replaced by the new one
must_not(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → pandagg.tree.query.Query[source]
nested(path: str, query: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None] = None, insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
pinned_query(organic: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
query(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', compound_param: Optional[str] = None, **body) → pandagg.tree.query.Query[source]

Insert provided clause in copy of initial Query.

>>> from pandagg.query import Query
>>> Query().query('term', some_field=23)
{'term': {'some_field': 23}}
>>> from pandagg.query import Term
>>> Query()\
>>> .query({'term': {'some_field': 23}})\
>>> .query(Term(other_field=24))
{'bool': {'must': [{'term': {'some_field': 23}}, {'term': {'other_field': 24}}]}}
Keyword Arguments:
 
  • insert_below (str) – named query clause under which the inserted clauses should be placed.
  • compound_param (str) – param under which inserted clause will be placed in compound query
  • on (str) – named compound query clause on which the inserted compound clause should be merged.
  • mode (str one of ‘add’, ‘replace’, ‘replace_all’) – merging strategy when inserting clauses on an existing compound clause.
    • ‘add’ (default): adds new clauses, keeping initial ones
    • ‘replace’: for each parameter (for instance in ‘bool’ case: ‘filter’, ‘must’, ‘must_not’, ‘should’), replaces existing clauses under this parameter by new ones, but only for parameters declared in the inserted compound query
    • ‘replace_all’: existing compound clause is completely replaced by the new one
script_score(query: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, None], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → pandagg.tree.query.Query[source]
should(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → pandagg.tree.query.Query[source]
show(*args, line_max_length: int = 80, **kwargs) → str[source]

Return compact representation of Query.

>>> Query()\
>>> .must({"exists": {"field": "some_field"}})\
>>> .must({"term": {"other_field": {"value": 5}}})\
>>> .show()
<Query>
bool
└── must
    ├── exists                                                  field=some_field
    └── term                                          field=other_field, value=5

All *args and **kwargs are propagated to the lighttree.Tree.show method.

to_dict(from_: Optional[str] = None) → Optional[Dict[str, Dict[str, Any]]][source]
class pandagg.query.Exists(field: str, _name: Optional[str] = None)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

KEY = 'exists'
line_repr(depth: int, **kwargs) → Tuple[str, str][source]

Control how the node is displayed in the tree representation. The first returned string is the node’s left-hand representation, the second its right-hand representation.

MyTree
├── one                                  OneEnd
│   └── two                              twoEnd
└── three                                threeEnd

class pandagg.query.Fuzzy(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'fuzzy'
class pandagg.query.Ids(values: List[Union[str, int]], _name: Optional[str] = None)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

KEY = 'ids'
line_repr(depth: int, **kwargs) → Tuple[str, str][source]

Control how the node is displayed in the tree representation. The first returned string is the node’s left-hand representation, the second its right-hand representation.

MyTree
├── one                                  OneEnd
│   └── two                              twoEnd
└── three                                threeEnd

class pandagg.query.Prefix(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'prefix'
class pandagg.query.Range(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'range'
class pandagg.query.Regexp(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'regexp'
class pandagg.query.Term(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'term'
class pandagg.query.Terms(**body)[source]

Bases: pandagg.node.query.abstract.AbstractSingleFieldQueryClause

KEY = 'terms'
class pandagg.query.TermsSet(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'terms_set'
class pandagg.query.Type(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'type'
class pandagg.query.Wildcard(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'wildcard'
class pandagg.query.Intervals(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'intervals'
class pandagg.query.Match(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'match'
class pandagg.query.MatchBoolPrefix(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'match_bool_prefix'
class pandagg.query.MatchPhrase(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'match_phrase'
class pandagg.query.MatchPhrasePrefix(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'match_phrase_prefix'
class pandagg.query.MultiMatch(fields: List[str], _name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.MultiFieldsQueryClause

KEY = 'multi_match'
class pandagg.query.Common(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'common'
class pandagg.query.QueryString(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

KEY = 'query_string'
class pandagg.query.SimpleQueryString(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

KEY = 'simple_string'
class pandagg.query.Bool(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

>>> Bool(must=[], should=[], filter=[], must_not=[], boost=1.2)
KEY = 'bool'
class pandagg.query.Boosting(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'boosting'
class pandagg.query.ConstantScore(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'constant_score'
class pandagg.query.FunctionScore(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'function_score'
class pandagg.query.DisMax(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'dis_max'
class pandagg.query.Nested(path: str, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'nested'
class pandagg.query.HasParent(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'has_parent'
class pandagg.query.HasChild(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'has_child'
class pandagg.query.ParentId(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

KEY = 'parent_id'
class pandagg.query.Shape(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

KEY = 'shape'
class pandagg.query.GeoShape(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'geo_shape'
class pandagg.query.GeoPolygone(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'geo_polygon'
class pandagg.query.GeoDistance(distance: str, **body)[source]

Bases: pandagg.node.query.abstract.AbstractSingleFieldQueryClause

KEY = 'geo_distance'
line_repr(depth: int, **kwargs) → Tuple[str, str][source]

Control how the node is displayed in the tree representation. The first returned string is the node’s left-hand representation, the second its right-hand representation.

MyTree
├── one                                  OneEnd
│   └── two                              twoEnd
└── three                                threeEnd

class pandagg.query.GeoBoundingBox(field: Optional[str] = None, _name: Optional[str] = None, _expand__to_dot: bool = True, **params)[source]

Bases: pandagg.node.query.abstract.KeyFieldQueryClause

KEY = 'geo_bounding_box'
class pandagg.query.DistanceFeature(field: str, _name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.FlatFieldQueryClause

KEY = 'distance_feature'
class pandagg.query.MoreLikeThis(fields: List[str], _name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.MultiFieldsQueryClause

KEY = 'more_like_this'
class pandagg.query.Percolate(field: str, _name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.FlatFieldQueryClause

KEY = 'percolate'
class pandagg.query.RankFeature(field: str, _name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.FlatFieldQueryClause

KEY = 'rank_feature'
class pandagg.query.Script(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

KEY = 'script'
class pandagg.query.Wrapper(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.abstract.LeafQueryClause

KEY = 'wrapper'
class pandagg.query.ScriptScore(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'script_score'
class pandagg.query.PinnedQuery(_name: Optional[str] = None, **body)[source]

Bases: pandagg.node.query.compound.CompoundClause

KEY = 'pinned'

pandagg.response module

class pandagg.response.Aggregations(data: 'AggregationsResponseDict', _search: 'Search')[source]

Bases: object

keys() → List[str][source]
parse_group_by(*, response: Dict[str, Union[pandagg.types.BucketsWrapperDict, Dict[str, Any]]], until: Optional[str], with_single_bucket_groups: bool = False, row_as_tuple: bool = False) → Tuple[List[str], Union[List[Tuple[Tuple[Union[None, str, float], ...], Dict[str, Any]]], List[Tuple[Dict[str, Union[str, float, None]], Dict[str, Any]]]]][source]
to_dataframe(grouped_by: Optional[str] = None, normalize_children: bool = True, with_single_bucket_groups: bool = False) → pd.DataFrame[source]
to_normalized() → pandagg.response.NormalizedBucketDict[source]
to_tabular(*, index_orient: bool = True, grouped_by: Optional[str] = None, expand_columns: bool = True, expand_sep: str = '|', normalize: bool = True, with_single_bucket_groups: bool = False) → Tuple[List[str], Union[Dict[Tuple[Union[None, str, float], ...], Dict[str, Any]], List[Dict[str, Any]]]][source]

Build a tabular view of the ES response: rows are built by grouping levels until the ‘grouped_by’ aggregation node (included) is reached, and the children aggregations of the grouping level are used as values for each generated group (columns).

Suppose an aggregation of this shape (A & B bucket aggregations):

A──> B──> C1
     ├──> C2
     └──> C3

With grouped_by=’B’, the Elasticsearch response (a tree structure) is broken down into a tabular structure of this shape:

                      C1     C2    C3
A           B
wood        blue      10     4     0
            red       7      5     2
steel       blue      1      9     0
            red       23     4     2
Parameters:
  • index_orient – if True, level-key samples are returned as tuples, else in a dictionary
  • grouped_by – name of the aggregation node used as last grouping level
  • normalize – if True, normalize columns buckets
Returns:

index_names, values
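
A usage sketch, assuming search is a Search instance whose declared aggregations match the hypothetical A/B/C shape above:

>>> index_names, rows = search.execute().aggregations.to_tabular(grouped_by='B')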

class pandagg.response.Hit(data: 'HitDict', _document_class: 'Optional[DocumentMeta]')[source]

Bases: object

class pandagg.response.Hits(data: 'Optional[HitsDict]', _document_class: 'Optional[DocumentMeta]')[source]

Bases: object

hits
max_score
to_dataframe(expand_source: bool = True, source_only: bool = True) → pd.DataFrame[source]

Return hits as a pandas dataframe. Requires the pandas dependency.

Parameters:
  • expand_source – if True, _source sub-fields are expanded as columns
  • source_only – if True, doesn’t include hit metadata (except id, which is used as dataframe index)

total
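
A sketch of retrieving hits as a dataframe, assuming pandas is installed and search is a bound Search instance (hypothetical):

>>> df = search.execute().hits.to_dataframe()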
class pandagg.response.NormalizedBucketDict[source]

Bases: dict

class pandagg.response.SearchResponse(data: 'SearchResponseDict', _search: 'Search')[source]

Bases: object

aggregations
hits
profile
success
timed_out
took

pandagg.search module

class pandagg.search.MultiSearch(using: Optional[elasticsearch.client.Elasticsearch], index: Union[str, Tuple[str], List[str], None] = None)[source]

Bases: pandagg.search.Request

Combine multiple Search objects into a single request.

add(search: pandagg.search.Search) → MultiSearch[source]

Adds a new Search object to the request:

ms = MultiSearch(index='my-index')
ms = ms.add(Search(doc_type=Category).filter('term', category='python'))
ms = ms.add(Search(doc_type=Blog))
execute() → List[pandagg.types.SearchResponseDict][source]

Execute the multi search request and return a list of search results.

to_dict() → List[Union[Dict[KT, VT], pandagg.types.SearchDict]][source]
class pandagg.search.Request(using: Optional[elasticsearch.client.Elasticsearch], index: Union[str, Tuple[str], List[str], None] = None)[source]

Bases: object

index(*index) → T[source]

Set the index for the search. If called empty it will remove all information.

Example:

s = Search()
s = s.index('twitter-2015.01.01', 'twitter-2015.01.02')
s = s.index(['twitter-2015.01.01', 'twitter-2015.01.02'])
params(**kwargs) → T[source]

Specify query params to be used when executing the search. All the keyword arguments will override the current values. See https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.Elasticsearch.search for all available parameters.

Example:

s = Search()
s = s.params(routing='user-1', preference='local')
using(client: elasticsearch.client.Elasticsearch) → T[source]

Associate the search request with an elasticsearch client. A fresh copy will be returned with current instance remaining unchanged.

Parameters:client – an instance of elasticsearch.Elasticsearch to use or an alias to look up in elasticsearch_dsl.connections
class pandagg.search.Search(using: Optional[Elasticsearch] = None, index: Optional[Union[str, Tuple[str], List[str]]] = None, mappings: Optional[Union[MappingsDict, Mappings]] = None, nested_autocorrect: bool = False, repr_auto_execute: bool = False, document_class: DocumentMeta = None)[source]

Bases: pandagg.utils.DSLMixin, pandagg.search.Request

agg(name: str, type_or_agg: Union[str, Dict[str, Dict[str, Any]], pandagg.node.aggs.abstract.AggClause, None] = None, insert_below: Optional[str] = None, at_root: bool = False, **body) → Search[source]

Insert provided agg clause in copy of initial Aggs.

Accepts the following syntaxes for the type_or_agg argument:

string, with body provided in kwargs:

>>> Aggs().agg(name='some_agg', type_or_agg='terms', field='some_field')

python dict format:

>>> Aggs().agg(name='some_agg', type_or_agg={'terms': {'field': 'some_field'}})

AggClause instance:

>>> from pandagg.aggs import Terms
>>> Aggs().agg(name='some_agg', type_or_agg=Terms(field='some_field'))

Parameters:
  • name – inserted agg clause name
  • type_or_agg – either agg type (str), or agg clause of dict format, or AggClause instance
  • insert_below – name of aggregation below which provided aggs should be inserted
  • at_root – if True, aggregation is inserted at root
  • body – aggregation clause body when providing string type_of_agg (remaining kwargs)
Returns:

copy of initial Aggs with provided agg inserted

aggs(aggs: Union[Dict[str, Union[Dict[str, Dict[str, Any]], pandagg.node.aggs.abstract.AggClause]], Aggs], insert_below: Optional[str] = None, at_root: bool = False) → Search[source]

Insert provided aggs in copy of initial Aggs.

Accepts the following syntaxes for provided aggs:

python dict format:

>>> Aggs().aggs({'some_agg': {'terms': {'field': 'some_field'}}, 'other_agg': {'avg': {'field': 'age'}}})

Aggs instance:

>>> Aggs().aggs(Aggs({'some_agg': {'terms': {'field': 'some_field'}}, 'other_agg': {'avg': {'field': 'age'}}}))

dict with AggClause values:

>>> from pandagg.aggs import Terms, Avg
>>> Aggs().aggs({'some_agg': Terms(field='some_field'), 'other_agg': Avg(field='age')})

Parameters:
  • aggs – aggregations to insert into existing aggregation
  • insert_below – name of aggregation below which provided aggs should be inserted
  • at_root – if True, aggregation is inserted at root
Returns:

copy of initial Aggs with provided aggs inserted

bool(must: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], None] = None, should: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], None] = None, must_not: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], None] = None, filter: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], None] = None, insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', **body) → Search[source]
>>> Query().bool(must={"term": {"some_field": "yolo"}})
count() → int[source]

Return the number of hits matching the query and filters. Note that only the actual number is returned.

delete()[source]

Execute the query by delegating to delete_by_query().
exclude(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → Search[source]

Insert provided clause under a ‘must_not’, wrapped in filter context.
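
A sketch, with a hypothetical field:

>>> Search().exclude('terms', genres=['Documentary'])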

execute() → pandagg.response.SearchResponse[source]

Execute the search and return an instance of Response wrapping all the data.

filter(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → Search[source]
classmethod from_dict(d: Dict[KT, VT]) → Search[source]

Construct a new Search instance from a raw dict containing the search body. Useful when migrating from raw dictionaries.

Example:

s = Search.from_dict({
    "query": {
        "bool": {
            "must": [...]
        }
    },
    "aggs": {...}
})
s = s.filter('term', published=True)
groupby(name: str, type_or_agg: Union[str, Dict[str, Dict[str, Any]], pandagg.node.aggs.abstract.AggClause, None] = None, insert_below: Optional[str] = None, at_root: bool = False, **body) → Search[source]

Insert provided aggregation clause in copy of initial Aggs.

Given the initial aggregation:

A──> B
└──> C

If insert_below = ‘A’:

A──> new──> B
       └──> C
>>> Aggs().groupby('per_user_id', 'terms', field='user_id')
{"per_user_id":{"terms":{"field":"user_id"}}}
>>> Aggs().groupby('per_user_id', {'terms': {"field": "user_id"}})
{"per_user_id":{"terms":{"field":"user_id"}}}
>>> from pandagg.aggs import Terms
>>> Aggs().groupby('per_user_id', Terms(field="user_id"))
{"per_user_id":{"terms":{"field":"user_id"}}}
Return type:pandagg.search.Search
highlight(*fields, **kwargs) → Search[source]

Request highlighting of some fields. All keyword arguments passed in will be used as parameters for all the fields in the fields parameter. Example:

Search().highlight('title', 'body', fragment_size=50)

will produce the equivalent of:

{
    "highlight": {
        "fields": {
            "body": {"fragment_size": 50},
            "title": {"fragment_size": 50}
        }
    }
}

If you want to have different options for different fields you can call highlight twice:

Search().highlight('title', fragment_size=50).highlight('body', fragment_size=100)

which will produce:

{
    "highlight": {
        "fields": {
            "body": {"fragment_size": 100},
            "title": {"fragment_size": 50}
        }
    }
}
highlight_options(**kwargs) → Search[source]

Update the global highlighting options used for this request. For example:

s = Search()
s = s.highlight_options(order='score')
must(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → Search[source]

Create copy of initial Query and insert provided clause under “bool” query “must”.

>>> Query().must('term', some_field=1)
>>> Query().must({'term': {'some_field': 1}})
>>> from pandagg.query import Term
>>> Query().must(Term(some_field=1))
Keyword Arguments:
 
  • insert_below (str) – named query clause under which the inserted clauses should be placed.
  • compound_param (str) – param under which inserted clause will be placed in compound query
  • on (str) – named compound query clause on which the inserted compound clause should be merged.
  • mode (str one of ‘add’, ‘replace’, ‘replace_all’) – merging strategy when inserting clauses on an existing compound clause.
    • ‘add’ (default): adds new clauses, keeping initial ones
    • ‘replace’: for each parameter (for instance in ‘bool’ case: ‘filter’, ‘must’, ‘must_not’, ‘should’), replaces existing clauses under this parameter by new ones, but only for parameters declared in the inserted compound query
    • ‘replace_all’: existing compound clause is completely replaced by the new one
must_not(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → Search[source]
post_filter(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', compound_param: Optional[str] = None, **body) → Search[source]
query(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', compound_param: Optional[str] = None, **body) → Search[source]

Insert provided clause in copy of initial Query.

>>> from pandagg.query import Query
>>> Query().query('term', some_field=23)
{'term': {'some_field': 23}}
>>> from pandagg.query import Term
>>> Query()\
>>> .query({'term': {'some_field': 23}})\
>>> .query(Term(other_field=24))
{'bool': {'must': [{'term': {'some_field': 23}}, {'term': {'other_field': 24}}]}}
Keyword Arguments:
 
  • insert_below (str) – named query clause under which the inserted clauses should be placed.
  • compound_param (str) – param under which inserted clause will be placed in compound query
  • on (str) – named compound query clause on which the inserted compound clause should be merged.
  • mode (str one of ‘add’, ‘replace’, ‘replace_all’) – merging strategy when inserting clauses on an existing compound clause.
    • ‘add’ (default): adds new clauses, keeping initial ones
    • ‘replace’: for each parameter (for instance in ‘bool’ case: ‘filter’, ‘must’, ‘must_not’, ‘should’), replaces existing clauses under this parameter by new ones, but only for parameters declared in the inserted compound query
    • ‘replace_all’: existing compound clause is completely replaced by the new one
scan() → Iterator[pandagg.response.Hit][source]

Turn the search into a scan search and return a generator that will iterate over all the documents matching the query.

Use the params method to specify any additional arguments you wish to pass to the underlying scan helper from elasticsearch-py - https://elasticsearch-py.readthedocs.io/en/master/helpers.html#elasticsearch.helpers.scan

scan_composite_agg(size: int) → Iterator[Dict[str, Any]][source]

Iterate over the whole aggregation’s composed buckets, yielding buckets.

scan_composite_agg_at_once(size: int) → pandagg.response.Aggregations[source]

Iterate over the whole aggregation’s composed buckets (converting Aggs into a composite agg if possible), and return all buckets at once in an Aggregations instance.
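
A sketch of iterating over composed buckets, assuming a hypothetical 'movies' index, an already instantiated client, and a terms aggregation convertible into a composite one:

>>> search = Search(using=client, index='movies').groupby('genres', 'terms', field='genres')
>>> for bucket in search.scan_composite_agg(size=1000):
>>>     ...  # each yielded item is a composite bucket dict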

script_fields(**kwargs) → Search[source]

Define script fields to be calculated on hits. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html for more details.

Example:

s = Search()
s = s.script_fields(times_two="doc['field'].value * 2")
s = s.script_fields(
    times_three={
        'script': {
            'inline': "doc['field'].value * params.n",
            'params': {'n': 3}
        }
    }
)
should(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → Search[source]
size(size: int) → Search[source]

Equivalent to:

s = Search().params(size=size)
sort(*keys) → Search[source]

Add sorting information to the search request. If called without arguments it will remove all sort requirements. Otherwise it will replace them. Acceptable arguments are:

'some.field'
'-some.other.field'
{'different.field': {'any': 'dict'}}

so for example:

s = Search().sort(
    'category',
    '-title',
    {"price" : {"order" : "asc", "mode" : "avg"}}
)

will sort by category, title (in descending order) and price in ascending order using the avg mode.

The API returns a copy of the Search object and can thus be chained.

source(fields: Union[str, List[str], Dict[str, Any], None] = None, **kwargs) → Search[source]

Selectively control how the _source field is returned.

Parameters:fields – wildcard string, array of wildcards, or dictionary of includes and excludes

If fields is None, the entire document will be returned for each hit. If fields is a dictionary with keys of ‘includes’ and/or ‘excludes’ the fields will be either included or excluded appropriately.

Calling this multiple times with the same named parameter will override the previous values with the new ones.

Example:

s = Search()
s = s.source(includes=['obj1.*'], excludes=["*.description"])

s = Search()
s = s.source(includes=['obj1.*']).source(excludes=["*.description"])
suggest(name: str, text: str, **kwargs) → Search[source]

Add a suggestions request to the search.

Parameters:
  • name – name of the suggestion
  • text – text to suggest on

All keyword arguments will be added to the suggestions body. For example:

s = Search()
s = s.suggest('suggestion-1', 'Elasticsearch', term={'field': 'body'})
to_dict(count: bool = False, **kwargs) → pandagg.types.SearchDict[source]

Serialize the search into the dictionary that will be sent over as the request’s body.

Parameters:count – a flag to specify if we are interested in a body for count - no aggregations, no pagination bounds etc.

All additional keyword arguments will be included into the dictionary.

update_from_dict(d: Dict[KT, VT]) → Search[source]

Apply options from a serialized body to the current instance. Modifies the object in-place. Used mostly by from_dict.

pandagg.types module

class pandagg.types.Action[source]

Bases: dict

class pandagg.types.AliasValue[source]

Bases: dict

class pandagg.types.BucketsWrapperDict[source]

Bases: dict

class pandagg.types.DeleteByQueryResponse[source]

Bases: dict

class pandagg.types.FieldDict[source]

Bases: dict

class pandagg.types.HitDict[source]

Bases: dict

class pandagg.types.HitsDict[source]

Bases: dict

class pandagg.types.MappingsDict[source]

Bases: dict

class pandagg.types.PointInTimeDict[source]

Bases: dict

class pandagg.types.ProfileDict[source]

Bases: dict

class pandagg.types.ProfileShardDict[source]

Bases: dict

class pandagg.types.RangeDict

Bases: dict

class pandagg.types.RetriesDict[source]

Bases: dict

class pandagg.types.RunTimeMappingDict[source]

Bases: dict

class pandagg.types.Script[source]

Bases: dict

class pandagg.types.SearchDict

Bases: dict

class pandagg.types.SearchResponseDict[source]

Bases: dict

class pandagg.types.ShardsDict[source]

Bases: dict

class pandagg.types.SourceIncludeDict[source]

Bases: dict

class pandagg.types.SuggestedItemDict[source]

Bases: dict

class pandagg.types.TotalDict[source]

Bases: dict

pandagg.utils module

class pandagg.utils.DSLMixin[source]

Bases: object

Base class for all DSL objects - queries, filters, aggregations etc. Wraps a dictionary representing the object’s json.

classmethod get_dsl_class(name: str) → pandagg.utils.DslMeta[source]
static get_dsl_type(name: str) → pandagg.utils.DslMeta[source]
class pandagg.utils.DslMeta(name: str, bases: Tuple, attrs: Dict[KT, VT])[source]

Bases: type

Base Metaclass for DslBase subclasses that builds a registry of all classes for given DslBase subclass (== all the query types for the Query subclass of DslBase).

Types will be: ‘agg’, ‘query’, ‘field’

Each of those types will hold a _classes dictionary pointing to all classes of same type.

KEY = ''
pandagg.utils.equal_queries(d1: Any, d2: Any) → bool[source]

Compare whether two queries are equivalent (order of clauses within nested lists is not considered).

pandagg.utils.get_action_modifier(index_name: str, _op_type_overwrite: Optional[typing_extensions.Literal['create', 'index', 'update', 'delete']] = None) → Callable[source]
pandagg.utils.is_subset(subset: Any, superset: Any) → bool[source]
pandagg.utils.ordered(obj: Any) → Any[source]

Module contents

Contributing to Pandagg

We want to make contributing to this project as easy and transparent as possible.

Our Development Process

We use github to host code, to track issues and feature requests, as well as accept pull requests.

Pull Requests

We actively welcome your pull requests.

  1. Fork the repo and create your branch from master.
  2. If you’ve added code that should be tested, add tests.
  3. If you’ve changed APIs, update the documentation.
  4. Ensure the test suite passes.
  5. Make sure your code lints.

Any contributions you make will be under the MIT Software License

In short, when you submit code changes, your submissions are understood to be under the same MIT License that covers the project. Feel free to contact the maintainers if that’s a concern.

Issues

We use GitHub issues to track public bugs. Please ensure your description is clear and has sufficient instructions to be able to reproduce the issue.

Report bugs using Github’s issues

Report a bug by opening a new issue; it’s that easy!

Write bug reports with detail, background, and sample code

Great Bug Reports tend to have:

  • A quick summary and/or background
  • Steps to reproduce
    • Be specific!
    • Give sample code if you can.
  • What you expected would happen
  • What actually happens
  • Notes (possibly including why you think this might be happening, or stuff you tried that didn’t work)

License

By contributing, you agree that your contributions will be licensed under its MIT License.

References

This document was adapted from the open-source contribution guidelines of briandk’s gist

pandagg is a Python package providing a simple interface to manipulate ElasticSearch queries and aggregations. It brings the following features:

  • flexible aggregation and search queries declaration
  • query validation based on provided mapping
  • parsing of aggregation results in handy formats: interactive bucket tree, normalized tree or tabular breakdown
  • mapping interactive navigation

Installing

pandagg can be installed with pip:

$ pip install pandagg

Alternatively, you can grab the latest source code from GitHub:

$ git clone git://github.com/alkemics/pandagg.git
$ python setup.py install

Usage

The User Guide is the place to go to learn how to use the library.

An example based on publicly available IMDB data is documented in the repository’s examples/imdb directory, with a jupyter notebook showcasing some of pandagg’s functionalities.

The pandagg package documentation provides API-level documentation.

License

pandagg is made available under the Apache 2.0 License. For more details, see LICENSE.txt.

Contributing

We happily welcome contributions, please see Contributing to Pandagg for details.