pandagg¶
User Guide¶
Introduction¶
Note
This is a work in progress. Some sections still need to be furnished.
About tree structure. About interactive objects.
Build Search query¶
The Query
class allows multiple ways to declare and udpate an Elasticsearch query.
Let’s explore the multiple ways we have to declare the following query:
>>> expected_query = {'bool': {'must': [
>>> {'terms': {'genres': ['Action', 'Thriller']}},
>>> {'range': {'rank': {'gte': 7}}},
>>> {'nested': {
>>> 'path': 'roles',
>>> 'query': {'bool': {'must': [
>>> {'term': {'roles.gender': {'value': 'F'}}},
>>> {'term': {'roles.role': {'value': 'Reporter'}}}]}
>>> }
>>> }}
>>> ]}}
Pandagg DSL¶
Pandagg provides a DSL to declare this query in a quite similar fashion:
>>> from pandagg.query import Nested, Bool, Query, Range, Term, Terms
>>> q = Query(
>>> Bool(must=[
>>> Terms('genres', terms=['Action', 'Thriller']),
>>> Range('rank', gte=7),
>>> Nested(
>>> path='roles',
>>> query=Bool(must=[
>>> Term('roles.gender', value='F'),
>>> Term('roles.role', value='Reporter')
>>> ])
>>> )
>>> ])
>>>)
The serialized query is then available with query_dict method:
>>> q.query_dict() == expected_query
True
A visual representation of the query helps to have a clearer view:
>>> q
<Query>
bool
└── must
├── nested
│ ├── path="roles"
│ └── query
│ └── bool
│ └── must
│ ├── term, field=roles.gender, value="F"
│ └── term, field=roles.role, value="Reporter"
├── range, field=rank, gte=7
└── terms, field=genres, values=['Action', 'Thriller']
Chaining¶
Another way to declare this query is through chaining:
>>> from pandagg.utils import equal_queries
>>> from pandagg.query import Nested, Bool, Query, Range, Term, Terms
>>> q = Query()\
>>> .query({'terms': {'genres': ['Action', 'Thriller']}})\
>>> .nested(path='roles', _name='nested_roles', query=Term('roles.gender', value='F'))\
>>> .query(Range('rank', gte=7))\
>>> .query(Term('roles.role', value='Reporter'), parent='nested_roles')
>>> equal_queries(q.query_dict(), expected_query)
True
Note
equal_queries function won’t consider order of clauses in must/should parameters since it actually doesn’t matter in Elasticsearch execution, ie
>>> equal_queries({'must': [A, B]}, {'must': [B, A]})
True
Regular syntax¶
Eventually, you can also use regular Elasticsearch dict syntax:
>>> q = Query(expected_query)
>>> q
<Query>
bool
└── must
├── nested
│ ├── path="roles"
│ └── query
│ └── bool
│ └── must
│ ├── term, field=roles.gender, value="F"
│ └── term, field=roles.role, value="Reporter"
├── range, field=rank, gte=7
└── terms, field=genres, values=['Action', 'Thriller']
Build Aggregation query¶
TODO
Parse Aggregation response¶
TODO
Explore your cluster indices¶
TODO
Advanced usage¶
TODO
Usage example on IMDB¶
An example based on publicly available IMDB data is documented in repository examples/imdb directory, with a jupyter notebook to showcase some of pandagg functionalities: here it is.
pandagg package¶
Subpackages¶
pandagg.node package¶
Subpackages¶
pandagg.node.query package¶
Module contents¶
Submodules¶
Module contents¶
Contributing¶
TODO
pandagg is a Python package providing a simple interface to manipulate ElasticSearch queries and aggregations. It brings the following features:
- flexible aggregation and search queries declaration
- query validation based on provided mapping
- parsing of aggregation results in handy format: interactive bucket tree, normalized tree or tabular breakdown
- mapping interactive navigation
Installing¶
pandagg can be installed with pip:
$ pip install pandagg
Alternatively, you can grab the latest source code from GitHub:
$ git clone git://github.com/alkemics/pandagg.git
$ python setup.py install
Usage¶
The User Guide is the place to go to learn how to use the library and accomplish common tasks. The more in-depth Advanced usage guide is the place to go for deeply nested queries.
An example based on publicly available IMDB data is documented in repository examples/imdb directory, with a jupyter notebook to showcase some of pandagg functionalities: here it is.
The pandagg package documentation provides API-level documentation.
License¶
pandagg is made available under the MIT License. For more details, see LICENSE.txt.
Contributing¶
We happily welcome contributions, please see Contributing for details.