pandagg.search module¶

class pandagg.search.MultiSearch(using: Optional[elasticsearch.client.Elasticsearch], index: Union[str, Tuple[str], List[str], None] = None)[source]¶

Bases: pandagg.search.Request

Combine multiple Search objects into a single request.

add(search: pandagg.search.Search) → MultiSearch[source]¶

Adds a new Search object to the request:

ms = MultiSearch(index='my-index')
ms = ms.add(Search(doc_type=Category).filter('term', category='python'))
ms = ms.add(Search(doc_type=Blog))

execute() → List[pandagg.types.SearchResponseDict][source]¶: Execute the multi search request and return a list of search results.

to_dict() → List[Union[Dict[KT, VT], pandagg.types.SearchDict]][source]¶

class pandagg.search.Request(using: Optional[elasticsearch.client.Elasticsearch], index: Union[str, Tuple[str], List[str], None] = None)[source]¶

Bases: object

index(*index) → T[source]¶

Set the index for the search. If called empty it will remove all information.

Example:

s = Search() s = s.index(‘twitter-2015.01.01’, ‘twitter-2015.01.02’) s = s.index([‘twitter-2015.01.01’, ‘twitter-2015.01.02’])

params(**kwargs) → T[source]¶

Specify query params to be used when executing the search. All the keyword arguments will override the current values. See https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.Elasticsearch.search for all available parameters.

Example:

s = Search()
s = s.params(routing='user-1', preference='local')

using(client: elasticsearch.client.Elasticsearch) → T[source]¶

Associate the search request with an elasticsearch client. A fresh copy will be returned with current instance remaining unchanged.

Parameters:	client – an instance of `elasticsearch.Elasticsearch` to use or an alias to look up in `elasticsearch_dsl.connections`

class pandagg.search.Search(using: Optional[Elasticsearch] = None, index: Optional[Union[str, Tuple[str], List[str]]] = None, mappings: Optional[Union[MappingsDict, Mappings]] = None, nested_autocorrect: bool = False, repr_auto_execute: bool = False, document_class: DocumentMeta = None)[source]¶

Bases: pandagg.utils.DSLMixin, pandagg.search.Request

agg(name: str, type_or_agg: Union[str, Dict[str, Dict[str, Any]], pandagg.node.aggs.abstract.AggClause, None] = None, insert_below: Optional[str] = None, at_root: bool = False, **body) → Search[source]¶

Insert provided agg clause in copy of initial Aggs.

Accept following syntaxes for type_or_agg argument:

string, with body provided in kwargs >>> Aggs().agg(name=’some_agg’, type_or_agg=’terms’, field=’some_field’)

python dict format: >>> Aggs().agg(name=’some_agg’, type_or_agg={‘terms’: {‘field’: ‘some_field’})

AggClause instance: >>> from pandagg.aggs import Terms >>> Aggs().agg(name=’some_agg’, type_or_agg=Terms(field=’some_field’))

Parameters:	name – inserted agg clause name type_or_agg – either agg type (str), or agg clause of dict format, or AggClause instance insert_below – name of aggregation below which provided aggs should be inserted at_root – if True, aggregation is inserted at root body – aggregation clause body when providing string type_of_agg (remaining kwargs)
Returns:	copy of initial Aggs with provided agg inserted

aggs(aggs: Union[Dict[str, Union[Dict[str, Dict[str, Any]], pandagg.node.aggs.abstract.AggClause]], Aggs], insert_below: Optional[str] = None, at_root: bool = False) → Search[source]¶

Insert provided aggs in copy of initial Aggs.

Accept following syntaxes for provided aggs:

python dict format: >>> Aggs().aggs({‘some_agg’: {‘terms’: {‘field’: ‘some_field’}}, ‘other_agg’: {‘avg’: {‘field’: ‘age’}}})

Aggs instance: >>> Aggs().aggs(Aggs({‘some_agg’: {‘terms’: {‘field’: ‘some_field’}}, ‘other_agg’: {‘avg’: {‘field’: ‘age’}}}))

dict with Agg clauses values: >>> from pandagg.aggs import Terms, Avg >>> Aggs().aggs({‘some_agg’: Terms(field=’some_field’), ‘other_agg’: Avg(field=’age’)})

Parameters:	aggs – aggregations to insert into existing aggregation insert_below – name of aggregation below which provided aggs should be inserted at_root – if True, aggregation is inserted at root
Returns:	copy of initial Aggs with provided aggs inserted

bool(must: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], None] = None, should: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], None] = None, must_not: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], None] = None, filter: Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, List[Union[Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause]], None] = None, insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'][add, replace, replace_all] = 'add', **body) → Search[source]¶

>>> Query().bool(must={"term": {"some_field": "yolo"}})

count() → int[source]¶: Return the number of hits matching the query and filters. Note that only the actual number is returned.

delete() executes the query by delegating to delete_by_query()[source]¶

exclude(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'][add, replace, replace_all] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → Search[source]¶: Must not wrapped in filter context.

execute() → pandagg.response.SearchResponse[source]¶: Execute the search and return an instance of Response wrapping all the data.

filter(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'][add, replace, replace_all] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → Search[source]¶

classmethod from_dict(d: Dict[KT, VT]) → Search[source]¶

Construct a new Search instance from a raw dict containing the search body. Useful when migrating from raw dictionaries.

Example:

s = Search.from_dict({
    "query": {
        "bool": {
            "must": [...]
        }
    },
    "aggs": {...}
})
s = s.filter('term', published=True)

groupby(name: str, type_or_agg: Union[str, Dict[str, Dict[str, Any]], pandagg.node.aggs.abstract.AggClause, None] = None, insert_below: Optional[str] = None, at_root: bool = False, **body) → Search[source]¶

Insert provided aggregation clause in copy of initial Aggs.

Given the initial aggregation:

A──> B
└──> C

If insert_below = ‘A’:

A──> new──> B
       └──> C

>>> Aggs().groupby('per_user_id', 'terms', field='user_id')
{"per_user_id":{"terms":{"field":"user_id"}}}

>>> Aggs().groupby('per_user_id', {'terms': {"field": "user_id"}})
{"per_user_id":{"terms":{"field":"user_id"}}}

>>> from pandagg.aggs import Terms
>>> Aggs().groupby('per_user_id', Terms(field="user_id"))
{"per_user_id":{"terms":{"field":"user_id"}}}

Return type:	pandagg.aggs.Aggs

highlight(*fields, **kwargs) → Search[source]¶

Request highlighting of some fields. All keyword arguments passed in will be used as parameters for all the fields in the fields parameter. Example:

Search().highlight('title', 'body', fragment_size=50)

will produce the equivalent of:

{
    "highlight": {
        "fields": {
            "body": {"fragment_size": 50},
            "title": {"fragment_size": 50}
        }
    }
}

If you want to have different options for different fields you can call highlight twice:

Search().highlight('title', fragment_size=50).highlight('body', fragment_size=100)

which will produce:

{
    "highlight": {
        "fields": {
            "body": {"fragment_size": 100},
            "title": {"fragment_size": 50}
        }
    }
}

highlight_options(**kwargs) → Search[source]¶

Update the global highlighting options used for this request. For example:

s = Search()
s = s.highlight_options(order='score')

must(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'][add, replace, replace_all] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → Search[source]¶

Create copy of initial Query and insert provided clause under “bool” query “must”.
>>> Query().must('term', some_field=1)
>>> Query().must({'term': {'some_field': 1}})
>>> from pandagg.query import Term
>>> Query().must(Term(some_field=1))
Keyword Arguments:

insert_below (str) – named query clause under which the inserted clauses should be placed.
compound_param (str) – param under which inserted clause will be placed in compound query
on (str) – named compound query clause on which the inserted compound clause should be merged.
mode (str one of ‘add’, ‘replace’, ‘replace_all’) – merging strategy when inserting clauses on a existing compound clause.
- ‘add’ (default) : adds new clauses keeping initial ones
- ‘replace’ : for each parameter (for instance in ‘bool’ case : ‘filter’, ‘must’, ‘must_not’, ‘should’), replace existing clauses under this parameter, by new ones only if declared in inserted compound query
- ‘replace_all’ : existing compound clause is completely replaced by the new one

must_not(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'][add, replace, replace_all] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → Search[source]¶

post_filter(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'][add, replace, replace_all] = 'add', compound_param: Optional[str] = None, **body) → Search[source]¶

query(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'][add, replace, replace_all] = 'add', compound_param: Optional[str] = None, **body) → Search[source]¶

Insert provided clause in copy of initial Query.

>>> from pandagg.query import Query
>>> Query().query('term', some_field=23)
{'term': {'some_field': 23}}

>>> from pandagg.query import Term
>>> Query()\
>>> .query({'term': {'some_field': 23})\
>>> .query(Term(other_field=24))\
{'bool': {'must': [{'term': {'some_field': 23}}, {'term': {'other_field': 24}}]}}

Keyword Arguments:

insert_below (str) – named query clause under which the inserted clauses should be placed.
compound_param (str) – param under which inserted clause will be placed in compound query
on (str) – named compound query clause on which the inserted compound clause should be merged.
mode (str one of ‘add’, ‘replace’, ‘replace_all’) – merging strategy when inserting clauses on a existing compound clause.
- ‘add’ (default) : adds new clauses keeping initial ones
- ‘replace’ : for each parameter (for instance in ‘bool’ case : ‘filter’, ‘must’, ‘must_not’, ‘should’), replace existing clauses under this parameter, by new ones only if declared in inserted compound query
- ‘replace_all’ : existing compound clause is completely replaced by the new one

scan() → Iterator[pandagg.response.Hit][source]¶

Turn the search into a scan search and return a generator that will iterate over all the documents matching the query.

Use params method to specify any additional arguments you with to pass to the underlying scan helper from elasticsearch-py - https://elasticsearch-py.readthedocs.io/en/master/helpers.html#elasticsearch.helpers.scan

scan_composite_agg(size: int) → Iterator[Dict[str, Any]][source]¶: Iterate over the whole aggregation composed buckets, yields buckets.

scan_composite_agg_at_once(size: int) → pandagg.response.Aggregations[source]¶: Iterate over the whole aggregation composed buckets (converting Aggs into composite agg if possible), and return all buckets at once in a Aggregations instance.

script_fields(**kwargs) → Search[source]¶

Define script fields to be calculated on hits. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html for more details.

Example:

s = Search()
s = s.script_fields(times_two="doc['field'].value * 2")
s = s.script_fields(
    times_three={
        'script': {
            'inline': "doc['field'].value * params.n",
            'params': {'n': 3}
        }
    }
)

should(type_or_query: Union[str, Dict[str, Dict[str, Any]], pandagg.node.query.abstract.QueryClause, Query], insert_below: Optional[str] = None, on: Optional[str] = None, mode: typing_extensions.Literal['add', 'replace', 'replace_all'][add, replace, replace_all] = 'add', bool_body: Optional[Dict[str, Any]] = None, **body) → Search[source]¶

size(size: int) → Search[source]¶

Equivalent to:

s = Search().params(size=size)

sort(*keys) → Search[source]¶

Add sorting information to the search request. If called without arguments it will remove all sort requirements. Otherwise it will replace them. Acceptable arguments are:

'some.field'
'-some.other.field'
{'different.field': {'any': 'dict'}}

so for example:

s = Search().sort(
    'category',
    '-title',
    {"price" : {"order" : "asc", "mode" : "avg"}}
)

will sort by category, title (in descending order) and price in ascending order using the avg mode.

The API returns a copy of the Search object and can thus be chained.

source(fields: Union[str, List[str], Dict[str, Any], None] = None, **kwargs) → Search[source]¶

Selectively control how the _source field is returned.

Parameters:	fields – wildcard string, array of wildcards, or dictionary of includes and excludes

If fields is None, the entire document will be returned for each hit. If fields is a dictionary with keys of ‘includes’ and/or ‘excludes’ the fields will be either included or excluded appropriately.

Calling this multiple times with the same named parameter will override the previous values with the new ones.

Example:

s = Search()
s = s.source(includes=['obj1.*'], excludes=["*.description"])

s = Search()
s = s.source(includes=['obj1.*']).source(excludes=["*.description"])

suggest(name: str, text: str, **kwargs) → Search[source]¶

Add a suggestions request to the search.

Parameters:	name – name of the suggestion text – text to suggest on

All keyword arguments will be added to the suggestions body. For example:

s = Search()
s = s.suggest('suggestion-1', 'Elasticsearch', term={'field': 'body'})

to_dict(count: bool = False, **kwargs) → pandagg.types.SearchDict[source]¶

Serialize the search into the dictionary that will be sent over as the request’s body.

Parameters:	count – a flag to specify if we are interested in a body for count - no aggregations, no pagination bounds etc.

All additional keyword arguments will be included into the dictionary.

update_from_dict(d: Dict[KT, VT]) → Search[source]¶: Apply options from a serialized body to the current instance. Modifies the object in-place. Used mostly by from_dict.