Query¶
The Query class provides:
- multiple syntaxes to declare and update a query
- query validation (with nested clauses validation)
- ability to insert clauses at specific points
- tree-like visual representation
Declaration¶
From native “dict” query¶
Given the following query:
>>> expected_query = {'bool': {'must': [
>>> {'terms': {'genres': ['Action', 'Thriller']}},
>>> {'range': {'rank': {'gte': 7}}},
>>> {'nested': {
>>> 'path': 'roles',
>>> 'query': {'bool': {'must': [
>>> {'term': {'roles.gender': {'value': 'F'}}},
>>> {'term': {'roles.role': {'value': 'Reporter'}}}]}
>>> }
>>> }}
>>> ]}}
To instantiate Query, simply pass the “dict” query as argument:
>>> from pandagg.query import Query
>>> q = Query(expected_query)
A visual representation of the query is available with show():
>>> q.show()
<Query>
bool
└── must
├── nested, path="roles"
│ └── query
│ └── bool
│ └── must
│ ├── term, field=roles.gender, value="F"
│ └── term, field=roles.role, value="Reporter"
├── range, field=rank, gte=7
└── terms, genres=["Action", "Thriller"]
Call to_dict() to convert it to a native dict:
>>> q.to_dict()
{'bool': {
'must': [
{'range': {'rank': {'gte': 7}}},
{'terms': {'genres': ['Action', 'Thriller']}},
{'nested': {
'path': 'roles',
'query': {'bool': {'must': [
{'term': {'roles.role': {'value': 'Reporter'}}},
{'term': {'roles.gender': {'value': 'F'}}}]}}
}}
]
}}
>>> from pandagg.utils import equal_queries
>>> equal_queries(q.to_dict(), expected_query)
True
Note
The equal_queries function ignores the order of clauses in must/should parameters, since that order does not matter in Elasticsearch execution, i.e.:
>>> equal_queries({'must': [A, B]}, {'must': [B, A]})
True
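To make this order-insensitivity concrete, here is a minimal sketch of such a comparison on plain dicts. It is not pandagg's implementation of equal_queries: the names canonical, same_query and UNORDERED_KEYS are hypothetical, and treating must_not and filter as unordered too is an assumption of this sketch.

```python
import json

# Bool parameters whose clause order is treated as irrelevant
# (an assumption of this sketch, not taken from pandagg).
UNORDERED_KEYS = {"must", "should", "must_not", "filter"}


def canonical(node, parent_key=None):
    """Return a copy of `node` where lists under bool parameters are sorted."""
    if isinstance(node, dict):
        return {key: canonical(value, key) for key, value in node.items()}
    if isinstance(node, list):
        items = [canonical(item) for item in node]
        if parent_key in UNORDERED_KEYS:
            # Sort clauses by their JSON form to get a stable, comparable order.
            items.sort(key=lambda item: json.dumps(item, sort_keys=True))
        return items
    return node


def same_query(left, right):
    """Compare two query dicts, ignoring clause order in bool parameters."""
    return canonical(left) == canonical(right)


a = {"bool": {"must": [{"term": {"x": 1}}, {"range": {"y": {"gte": 2}}}]}}
b = {"bool": {"must": [{"range": {"y": {"gte": 2}}}, {"term": {"x": 1}}]}}
print(same_query(a, b))
```

Lists outside bool parameters (for example the values of a terms clause) are left in their original order in this sketch, so they are still compared position by position.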
With DSL classes¶
Pandagg provides a DSL to declare this query in a similar fashion:
>>> from pandagg.query import Nested, Bool, Range, Term, Terms
>>> q = Bool(must=[
>>> Terms(genres=['Action', 'Thriller']),
>>> Range(rank={"gte": 7}),
>>> Nested(
>>> path='roles',
>>> query=Bool(must=[
>>> Term(roles__gender='F'),
>>> Term(roles__role='Reporter')
>>> ])
>>> )
>>> ])
All these classes inherit from Query and thus provide the same interface.
>>> from pandagg.query import Query
>>> isinstance(q, Query)
True
With flattened syntax¶
In the flattened syntax, the query clause type is passed as the first argument:
>>> from pandagg.query import Query
>>> q = Query('terms', genres=['Action', 'Thriller'])
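As a rough illustration of how a flattened declaration relates to the native dict form, the following sketch builds the dict by hand. The helper flattened_to_dict is hypothetical and is not part of pandagg. It assumes that keyword names using a double underscore (as in roles__gender in the DSL example) stand for dotted field paths.

```python
def flattened_to_dict(clause_type, **body):
    """Hypothetical helper: build a native dict from a flattened declaration."""
    # "roles.role" is not a valid Python keyword name, so this sketch assumes
    # the double-underscore form "roles__role" is used to express it.
    fields = {name.replace("__", "."): value for name, value in body.items()}
    return {clause_type: fields}


print(flattened_to_dict("terms", genres=["Action", "Thriller"]))
print(flattened_to_dict("range", rank={"gte": 7}))
```

Under these assumptions, the first call produces the same structure as the terms clause of the native “dict” query shown earlier.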
Query enrichment¶
All methods described below return a new Query instance and leave the initial query unchanged.
For instance:
>>> from pandagg.query import Query
>>> initial_q = Query()
>>> enriched_q = initial_q.query('terms', genres=['Comedy', 'Short'])
>>> initial_q.to_dict()
None
>>> enriched_q.to_dict()
{'terms': {'genres': ['Comedy', 'Short']}}
Note
Calling to_dict() on an empty Query returns None:
>>> from pandagg.query import Query
>>> Query().to_dict()
None
query() method¶
The base method to enrich a Query is query().
Considering this query:
>>> from pandagg.query import Query
>>> q = Query()
query() accepts the following syntaxes:
from a dictionary:
>>> q.query({"terms": {"genres": ['Comedy', 'Short']}})
flattened syntax:
>>> q.query("terms", genres=['Comedy', 'Short'])
from Query instance (this includes DSL classes):
>>> from pandagg.query import Terms
>>> q.query(Terms(genres=['Action', 'Thriller']))
Compound clauses specific methods¶
A Query instance also exposes the following methods for specific compound queries:
(TODO: detail allowed syntaxes)
Specific to bool queries:
bool()
filter()
must()
must_not()
should()
Specific to other compound queries:
nested()
constant_score()
dis_max()
function_score()
has_child()
has_parent()
parent_id()
pinned_query()
script_score()
boost()
Inserted clause location¶
With all insertion methods detailed above, the inserted clause is placed at the top level of the query by default, and a bool clause is generated if necessary.
Considering the following query:
>>> from pandagg.query import Query
>>> q = Query('terms', genres=['Action', 'Thriller'])
>>> q.show()
<Query>
terms, genres=["Action", "Thriller"]
A bool query will be created:
>>> q = q.query('range', rank={"gte": 7})
>>> q.show()
<Query>
bool
└── must
├── range, field=rank, gte=7
└── terms, genres=["Action", "Thriller"]
And reused if necessary:
>>> q = q.must_not('range', year={"lte": 1970})
>>> q.show()
<Query>
bool
├── must
│ ├── range, field=rank, gte=7
│ └── terms, genres=["Action", "Thriller"]
└── must_not
└── range, field=year, lte=1970
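The wrap-then-reuse behaviour shown above can be mimicked on plain dicts. The sketch below is only an illustration of the idea, not pandagg's implementation: add_clause is a hypothetical helper, and its handling of an empty query (None) is an assumption. Like the Query methods, it returns a new dict and leaves its input untouched.

```python
import copy


def add_clause(query, clause, param="must"):
    """Return a new query dict with `clause` added under the bool `param`."""
    clause = copy.deepcopy(clause)
    if query is None:
        # Nothing to combine with: a lone "must" clause needs no bool wrapper.
        return clause if param == "must" else {"bool": {param: [clause]}}
    if set(query) == {"bool"}:
        # A top-level bool clause already exists: reuse it.
        result = copy.deepcopy(query)
    else:
        # Otherwise wrap the existing clause in a newly created bool clause.
        result = {"bool": {"must": [copy.deepcopy(query)]}}
    existing = result["bool"].get(param, [])
    if not isinstance(existing, list):
        # Elasticsearch also accepts a single clause instead of a list.
        existing = [existing]
    result["bool"][param] = existing + [clause]
    return result


q = {"terms": {"genres": ["Action", "Thriller"]}}
q2 = add_clause(q, {"range": {"rank": {"gte": 7}}})
q3 = add_clause(q2, {"range": {"year": {"lte": 1970}}}, param="must_not")
print(q3)
```

Here q2 gains a bool clause with both clauses under must, and q3 reuses that bool clause to add a must_not parameter, while q itself is unchanged.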
Inserting a clause at a specific location requires naming queries:
>>> from pandagg.query import Term
>>> q = q.nested(path='roles', _name='nested_roles', query=Term('roles.gender', value='F'))
>>> q.show()
<Query>
bool
├── must
│ ├── nested, _name=nested_roles, path="roles"
│ │ └── query
│ │ └── term, field=roles.gender, value="F"
│ ├── range, field=rank, gte=7
│ └── terms, genres=["Action", "Thriller"]
└── must_not
└── range, field=year, lte=1970
Doing so allows inserting clauses above or below a given clause using the parent/child parameters:
>>> q = q.query('term', roles__role='Reporter', parent='nested_roles')
>>> q.show()
<Query>
bool
├── must
│ ├── nested, _name=nested_roles, path="roles"
│ │ └── query
│ │ └── bool
│ │ └── must
│ │ ├── term, field=roles.role, value="Reporter"
│ │ └── term, field=roles.gender, value="F"
│ ├── range, field=rank, gte=7
│ └── terms, genres=["Action", "Thriller"]
└── must_not
└── range, field=year, lte=1970
TODO: explain parent_param, child_param, mode merging strategies on same named clause, etc.