pandagg

User Guide

Introduction

Note

This is a work in progress: some sections still need to be fleshed out.

This section will introduce pandagg's tree structures and interactive objects.

Build Search query

The Query class provides multiple ways to declare and update an Elasticsearch query.

Let’s explore the different ways to declare the following query:

>>> expected_query = {'bool': {'must': [
...     {'terms': {'genres': ['Action', 'Thriller']}},
...     {'range': {'rank': {'gte': 7}}},
...     {'nested': {
...         'path': 'roles',
...         'query': {'bool': {'must': [
...             {'term': {'roles.gender': {'value': 'F'}}},
...             {'term': {'roles.role': {'value': 'Reporter'}}}]}
...     }}
... ]}}

Pandagg DSL

pandagg provides a DSL to declare this query in a very similar fashion:

>>> from pandagg.query import Nested, Bool, Query, Range, Term, Terms
>>> q = Query(
...     Bool(must=[
...         Terms('genres', terms=['Action', 'Thriller']),
...         Range('rank', gte=7),
...         Nested(
...             path='roles',
...             query=Bool(must=[
...                 Term('roles.gender', value='F'),
...                 Term('roles.role', value='Reporter')
...             ])
...         )
...     ])
... )

The serialized query is then available via the query_dict() method:

>>> q.query_dict() == expected_query
True

A visual representation of the query tree gives a clearer view:

>>> q
<Query>
bool
└── must
    ├── nested
    │   ├── path="roles"
    │   └── query
    │       └── bool
    │           └── must
    │               ├── term, field=roles.gender, value="F"
    │               └── term, field=roles.role, value="Reporter"
    ├── range, field=rank, gte=7
    └── terms, field=genres, values=['Action', 'Thriller']

Chaining

Another way to declare this query is through chaining:

>>> from pandagg.utils import equal_queries
>>> from pandagg.query import Query, Range, Term
>>> q = Query()\
...     .query({'terms': {'genres': ['Action', 'Thriller']}})\
...     .nested(path='roles', _name='nested_roles', query=Term('roles.gender', value='F'))\
...     .query(Range('rank', gte=7))\
...     .query(Term('roles.role', value='Reporter'), parent='nested_roles')
>>> equal_queries(q.query_dict(), expected_query)
True

Note

The equal_queries function ignores the order of clauses within must/should parameters, since order doesn’t matter in Elasticsearch execution, i.e.:

>>> A = {'term': {'genres': {'value': 'Action'}}}
>>> B = {'range': {'rank': {'gte': 7}}}
>>> equal_queries({'must': [A, B]}, {'must': [B, A]})
True

Regular syntax

Finally, you can also use the regular Elasticsearch dict syntax:

>>> q = Query(expected_query)
>>> q
<Query>
bool
└── must
    ├── nested
    │   ├── path="roles"
    │   └── query
    │       └── bool
    │           └── must
    │               ├── term, field=roles.gender, value="F"
    │               └── term, field=roles.role, value="Reporter"
    ├── range, field=rank, gte=7
    └── terms, field=genres, values=['Action', 'Thriller']

Build Aggregation query

TODO
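
In the meantime, aggregations would presumably be declared in the same tree-based fashion as queries. Below is a purely hypothetical sketch: the Agg, Terms and Avg classes, their import path and their signatures are assumptions, not a documented API.

>>> from pandagg.agg import Agg, Terms, Avg  # hypothetical imports
>>> agg = Agg(
...     Terms('genres_agg', field='genres', size=10, aggs=[  # hypothetical: one bucket per genre
...         Avg('avg_rank', field='rank'),                   # hypothetical: average rank per bucket
...     ])
... )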

Parse Aggregation response

TODO

Explore your cluster indices

TODO

Advanced usage

TODO

Usage example on IMDB

An example based on publicly available IMDB data is documented in the repository’s examples/imdb directory, with a Jupyter notebook showcasing some of pandagg’s functionalities.

pandagg package

Subpackages

pandagg.interactive package

Submodules
pandagg.interactive.abstract module
pandagg.interactive.client module
pandagg.interactive.index module
pandagg.interactive.mapping module
pandagg.interactive.response module
Module contents

pandagg.node package

Subpackages
pandagg.node.agg package
Submodules
pandagg.node.agg.abstract module
pandagg.node.agg.bucket module
pandagg.node.agg.deserializer module
pandagg.node.agg.metric module
pandagg.node.agg.pipeline module
Module contents
pandagg.node.mapping package
Submodules
pandagg.node.mapping.abstract module
pandagg.node.mapping.deserializer module
pandagg.node.mapping.field_datatypes module
pandagg.node.mapping.meta_fields module
Module contents
pandagg.node.query package
Submodules
pandagg.node.query.abstract module
pandagg.node.query.compound module
pandagg.node.query.deserializer module
pandagg.node.query.full_text module
pandagg.node.query.geo module
pandagg.node.query.joining module
pandagg.node.query.shape module
pandagg.node.query.span module
pandagg.node.query.specialized module
pandagg.node.query.specialized_compound module
pandagg.node.query.term_level module
Module contents
pandagg.node.response package
Submodules
pandagg.node.response.bucket module
Module contents
Submodules
pandagg.node.mixins module
pandagg.node.types module
Module contents

pandagg.tree package

Submodules
pandagg.tree.agg module
pandagg.tree.mapping module
pandagg.tree.query module
pandagg.tree.response module
Module contents

Submodules

pandagg.agg module

pandagg.client module

pandagg.exceptions module

pandagg.mapping module

pandagg.query module

pandagg.utils module

Module contents

Contributing

TODO

pandagg is a Python package providing a simple interface to manipulate Elasticsearch queries and aggregations. It brings the following features:

  • flexible declaration of aggregation and search queries (see the example below)
  • query validation based on a provided mapping
  • parsing of aggregation results into handy formats: interactive bucket tree, normalized tree, or tabular breakdown
  • interactive navigation of mappings
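
As a quick taste of the first feature, here is a condensed version of the User Guide example above (only names and behaviour documented in that section are used):

>>> from pandagg.query import Query, Bool, Range, Term
>>> q = Query(Bool(must=[Term('genres', value='Action'), Range('rank', gte=7)]))
>>> q.query_dict()
{'bool': {'must': [{'term': {'genres': {'value': 'Action'}}}, {'range': {'rank': {'gte': 7}}}]}}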

Installing

pandagg can be installed with pip:

$ pip install pandagg

Alternatively, you can grab the latest source code from GitHub:

$ git clone git://github.com/alkemics/pandagg.git
$ python setup.py install

Usage

The User Guide is the place to go to learn how to use the library and accomplish common tasks. The more in-depth Advanced usage guide covers deeply nested queries.

An example based on publicly available IMDB data is documented in the repository’s examples/imdb directory, with a Jupyter notebook showcasing some of pandagg’s functionalities.

The pandagg package section provides API-level documentation.

License

pandagg is made available under the MIT License. For more details, see LICENSE.txt.

Contributing

We happily welcome contributions; please see Contributing for details.