Query

The Query class provides:

  • multiple syntaxes to declare and update a query
  • query validation (including validation of nested clauses)
  • ability to insert clauses at specific points
  • tree-like visual representation

Declaration

From native “dict” query

Given the following query:

>>> expected_query = {'bool': {'must': [
>>>    {'terms': {'genres': ['Action', 'Thriller']}},
>>>    {'range': {'rank': {'gte': 7}}},
>>>    {'nested': {
>>>        'path': 'roles',
>>>        'query': {'bool': {'must': [
>>>            {'term': {'roles.gender': {'value': 'F'}}},
>>>            {'term': {'roles.role': {'value': 'Reporter'}}}]}
>>>         }
>>>    }}
>>> ]}}

To instantiate a Query, pass the “dict” query as argument:

>>> from pandagg.query import Query
>>> q = Query(expected_query)

A visual representation of the query is available with show():

>>> q.show()
<Query>
bool
└── must
    ├── nested, path="roles"
    │   └── query
    │       └── bool
    │           └── must
    │               ├── term, field=roles.gender, value="F"
    │               └── term, field=roles.role, value="Reporter"
    ├── range, field=rank, gte=7
    └── terms, genres=["Action", "Thriller"]

Call to_dict() to convert it back to a native dict:

>>> q.to_dict()
{'bool': {
    'must': [
        {'range': {'rank': {'gte': 7}}},
        {'terms': {'genres': ['Action', 'Thriller']}},
        {'nested': {
            'path': 'roles',
            'query': {'bool': {'must': [
                {'term': {'roles.role': {'value': 'Reporter'}}},
                {'term': {'roles.gender': {'value': 'F'}}}]}}
        }}
    ]
}}
>>> from pandagg.utils import equal_queries
>>> equal_queries(q.to_dict(), expected_query)
True

Note

The equal_queries function ignores the order of clauses in must/should parameters, since that order does not matter in Elasticsearch execution, i.e.:

>>> equal_queries({'must': [A, B]}, {'must': [B, A]})
True
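
To make the order-insensitivity concrete, here is a minimal sketch of such a comparison on plain dicts. It is not pandagg's implementation of equal_queries: the names UNORDERED_KEYS, _canonical and same_query are hypothetical, and the sketch assumes that only the clause lists of bool parameters may be reordered.

```python
import json

# Bool parameters whose clause order is assumed not to matter (assumption of this sketch).
UNORDERED_KEYS = {"must", "should", "must_not", "filter"}


def _canonical(node, parent_key=None):
    """Return a copy of `node` in which order-insensitive clause lists are sorted."""
    if isinstance(node, dict):
        return {key: _canonical(value, key) for key, value in node.items()}
    if isinstance(node, list):
        items = [_canonical(item) for item in node]
        if parent_key in UNORDERED_KEYS:
            # Sort on a stable JSON rendering so that [A, B] and [B, A] match.
            items.sort(key=lambda item: json.dumps(item, sort_keys=True))
        return items
    return node


def same_query(left, right):
    """Hypothetical helper: compare two query dicts, ignoring bool clause order."""
    return _canonical(left) == _canonical(right)


a = {"term": {"genre": {"value": "Action"}}}
b = {"range": {"rank": {"gte": 7}}}
print(same_query({"bool": {"must": [a, b]}}, {"bool": {"must": [b, a]}}))  # True
# Lists outside bool parameters keep their order in this sketch.
print(same_query({"terms": {"genres": ["x", "y"]}},
                 {"terms": {"genres": ["y", "x"]}}))  # False
```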

With DSL classes

Pandagg provides a DSL to declare this query in a quite similar fashion:

>>> from pandagg.query import Nested, Bool, Range, Term, Terms
>>> q = Bool(must=[
>>>     Terms(genres=['Action', 'Thriller']),
>>>     Range(rank={"gte": 7}),
>>>     Nested(
>>>         path='roles',
>>>         query=Bool(must=[
>>>             Term(roles__gender='F'),
>>>             Term(roles__role='Reporter')
>>>         ])
>>>     )
>>> ])

All these classes inherit from Query and thus provide the same interface.

>>> from pandagg.query import Query
>>> isinstance(q, Query)
True
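
The DSL example writes roles__gender where the dict query uses roles.gender, since a dot is not valid in a Python keyword argument name. The sketch below illustrates that naming convention only; expand_field and term_clause are hypothetical helpers written for this example, not part of pandagg, and the library's actual expansion rules may differ.

```python
def expand_field(name):
    """Turn a keyword-friendly field name into a dotted field path."""
    return name.replace("__", ".")


def term_clause(**kwargs):
    """Hypothetical helper: build a term clause dict from one `field=value` keyword."""
    if len(kwargs) != 1:
        raise ValueError("expected exactly one field=value keyword")
    (name, value), = kwargs.items()
    return {"term": {expand_field(name): {"value": value}}}


print(term_clause(roles__gender="F"))
# {'term': {'roles.gender': {'value': 'F'}}}
```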

With flattened syntax

In the flattened syntax, the query clause type is passed as the first argument:

>>> from pandagg.query import Query
>>> q = Query('terms', genres=['Action', 'Thriller'])

Query enrichment

All methods described below return a new Query instance and leave the initial query unchanged.

For instance:

>>> from pandagg.query import Query
>>> initial_q = Query()
>>> enriched_q = initial_q.query('terms', genres=['Comedy', 'Short'])
>>> initial_q.to_dict()
None
>>> enriched_q.to_dict()
{'terms': {'genres': ['Comedy', 'Short']}}

Note

Calling to_dict() on an empty Query returns None:

>>> from pandagg.query import Query
>>> Query().to_dict()
None
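
The "return a new instance, leave the original untouched" behaviour can be sketched with a tiny stand-in class. TinyQuery is hypothetical and only wraps a dict; it is not how pandagg's Query is implemented, and it replaces the body instead of combining clauses, but it shows the pattern.

```python
import copy


class TinyQuery:
    """Hypothetical stand-in for a query wrapper with copy-on-write methods."""

    def __init__(self, body=None):
        self._body = body

    def query(self, clause_type, **params):
        # Work on a deep copy so that the receiver is never mutated.
        new = copy.deepcopy(self)
        # Simplification of this sketch: the new clause replaces the body.
        new._body = {clause_type: params}
        return new

    def to_dict(self):
        # Return a copy so callers cannot mutate the internal state either.
        return copy.deepcopy(self._body)


initial_q = TinyQuery()
enriched_q = initial_q.query("terms", genres=["Comedy", "Short"])
print(initial_q.to_dict())   # None
print(enriched_q.to_dict())  # {'terms': {'genres': ['Comedy', 'Short']}}
```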

query() method

The base method to enrich a Query is query().

Considering this query:

>>> from pandagg.query import Query
>>> q = Query()

query() accepts the following syntaxes:

from dictionary:

>>> q.query({"terms": {"genres": ['Comedy', 'Short']}})

flattened syntax:

>>> q.query("terms", genres=['Comedy', 'Short'])

from Query instance (this includes DSL classes):

>>> from pandagg.query import Terms
>>> q.query(Terms(genres=['Action', 'Thriller']))
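
One way to picture how three input forms can lead to the same clause is a small normalizing function. The sketch below is an assumption made for illustration: to_clause_dict and FakeTerms are hypothetical names, and pandagg's own argument handling is not shown here.

```python
def to_clause_dict(*args, **kwargs):
    """Hypothetical normalizer: accept a dict, a flattened call, or a query-like object."""
    if len(args) != 1:
        raise TypeError("expected exactly one positional argument")
    first = args[0]
    if isinstance(first, dict) and not kwargs:
        return first                  # from dictionary
    if isinstance(first, str):
        return {first: kwargs}        # flattened syntax
    if hasattr(first, "to_dict") and not kwargs:
        return first.to_dict()        # object exposing to_dict()
    raise TypeError("unsupported clause syntax")


class FakeTerms:
    """Hypothetical stand-in for a DSL class exposing to_dict()."""

    def __init__(self, **params):
        self._params = params

    def to_dict(self):
        return {"terms": self._params}


expected = {"terms": {"genres": ["Comedy", "Short"]}}
assert to_clause_dict({"terms": {"genres": ["Comedy", "Short"]}}) == expected
assert to_clause_dict("terms", genres=["Comedy", "Short"]) == expected
assert to_clause_dict(FakeTerms(genres=["Comedy", "Short"])) == expected
```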

Compound clauses specific methods

A Query instance also exposes the following methods for specific compound queries:

(TODO: detail allowed syntaxes)

Specific to bool queries:

  • bool()
  • filter()
  • must()
  • must_not()
  • should()

Specific to other compound queries:

  • nested()
  • constant_score()
  • dis_max()
  • function_score()
  • has_child()
  • has_parent()
  • parent_id()
  • pinned_query()
  • script_score()
  • boost()

Inserted clause location

By default, all insertion methods described above place the inserted clause at the top level of the query, creating a bool clause if necessary.

Considering the following query:

>>> from pandagg.query import Query
>>> q = Query('terms', genres=['Action', 'Thriller'])
>>> q.show()
<Query>
terms, genres=["Action", "Thriller"]

A bool query will be created:

>>> q = q.query('range', rank={"gte": 7})
>>> q.show()
<Query>
bool
└── must
    ├── range, field=rank, gte=7
    └── terms, genres=["Action", "Thriller"]

And reused if necessary:

>>> q = q.must_not('range', year={"lte": 1970})
>>> q.show()
<Query>
bool
├── must
│   ├── range, field=rank, gte=7
│   └── terms, genres=["Action", "Thriller"]
└── must_not
    └── range, field=year, lte=1970
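
The wrapping and reuse behaviour shown above can be sketched on plain dicts. This is an illustrative approximation, not pandagg's code: add_clause is a hypothetical helper, and it assumes bool parameters are always stored as lists.

```python
import copy

BOOL_PARAMS = ("must", "must_not", "should", "filter")


def add_clause(query, clause, param="must"):
    """Hypothetical helper: add `clause` at the top level under the bool `param`."""
    if param not in BOOL_PARAMS:
        raise ValueError("unknown bool parameter: %s" % param)
    query = copy.deepcopy(query)      # never mutate the caller's dict
    clause = copy.deepcopy(clause)
    if query is None:
        # Empty query: a plain clause is enough unless a specific param is asked for.
        return clause if param == "must" else {"bool": {param: [clause]}}
    if "bool" not in query:
        # Wrap the existing top-level clause in a newly created bool query.
        query = {"bool": {"must": [query]}}
    # Reuse the bool query, creating the parameter list on first use.
    query["bool"].setdefault(param, []).append(clause)
    return query


q = {"terms": {"genres": ["Action", "Thriller"]}}
q = add_clause(q, {"range": {"rank": {"gte": 7}}})
q = add_clause(q, {"range": {"year": {"lte": 1970}}}, param="must_not")
assert q == {"bool": {
    "must": [
        {"terms": {"genres": ["Action", "Thriller"]}},
        {"range": {"rank": {"gte": 7}}},
    ],
    "must_not": [{"range": {"year": {"lte": 1970}}}],
}}
```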

Inserting a clause at a specific location requires naming queries:

>>> from pandagg.query import Term
>>> q = q.nested(path='roles', _name='nested_roles', query=Term('roles.gender', value='F'))
>>> q.show()
<Query>
bool
├── must
│   ├── nested, _name=nested_roles, path="roles"
│   │   └── query
│   │       └── term, field=roles.gender, value="F"
│   ├── range, field=rank, gte=7
│   └── terms, genres=["Action", "Thriller"]
└── must_not
    └── range, field=year, lte=1970

Doing so makes it possible to insert clauses above or below a given clause using the parent/child parameters:

>>> q = q.query('term', roles__role='Reporter', parent='nested_roles')
>>> q.show()
<Query>
bool
├── must
│   ├── nested, _name=nested_roles, path="roles"
│   │   └── query
│   │       └── bool
│   │           └── must
│   │               ├── term, field=roles.role, value="Reporter"
│   │               └── term, field=roles.gender, value="F"
│   ├── range, field=rank, gte=7
│   └── terms, genres=["Action", "Thriller"]
└── must_not
    └── range, field=year, lte=1970
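
The effect of parent='nested_roles' in the example above can be approximated on plain dicts: locate the clause carrying the given _name, then combine the new clause with the query it already holds. The sketch below is a simplified assumption, not pandagg's algorithm: insert_under is a hypothetical helper, it only handles parents that hold a single query parameter, and it always merges under must.

```python
import copy


def insert_under(query, parent_name, clause):
    """Hypothetical helper: AND `clause` with the query held by the clause named `parent_name`."""
    query = copy.deepcopy(query)
    clause = copy.deepcopy(clause)

    def visit(node):
        if isinstance(node, dict):
            for body in node.values():
                if (isinstance(body, dict)
                        and body.get("_name") == parent_name
                        and "query" in body):
                    child = body["query"]
                    if "bool" in child:
                        # Reuse the existing bool query under the parent.
                        child["bool"].setdefault("must", []).append(clause)
                    else:
                        # Wrap the existing child and the new clause in a bool query.
                        body["query"] = {"bool": {"must": [child, clause]}}
                    return True
            return any(visit(value) for value in node.values())
        if isinstance(node, list):
            return any(visit(item) for item in node)
        return False

    if not visit(query):
        raise KeyError(parent_name)
    return query


q = {"bool": {"must": [
    {"nested": {"_name": "nested_roles", "path": "roles",
                "query": {"term": {"roles.gender": {"value": "F"}}}}},
    {"range": {"rank": {"gte": 7}}},
]}}
new_q = insert_under(q, "nested_roles", {"term": {"roles.role": {"value": "Reporter"}}})
assert new_q["bool"]["must"][0]["nested"]["query"] == {"bool": {"must": [
    {"term": {"roles.gender": {"value": "F"}}},
    {"term": {"roles.role": {"value": "Reporter"}}},
]}}
```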

TODO: explain parent_param, child_param, mode merging strategies on same named clause, etc.