pandagg.search module

class pandagg.search.MultiSearch(**kwargs)

Bases: pandagg.search.Request

Combine multiple Search objects into a single request.

add(search)

Adds a new Search object to the request:

from pandagg.search import MultiSearch, Search

ms = MultiSearch(index='my-index')
ms = ms.add(Search().filter('term', category='python'))
ms = ms.add(Search())
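At the wire level, the Elasticsearch _msearch endpoint expects a header entry followed by a body entry for each search. The sketch below illustrates that pairing with a hypothetical helper, build_msearch_body, and plain dicts standing in for serialized Search objects; it is an assumption about the payload shape, not pandagg's own serialization.

```python
# Hypothetical helper, not part of pandagg: pairs a header with each search
# body, the layout the Elasticsearch _msearch endpoint expects.
def build_msearch_body(index, search_bodies):
    lines = []
    for body in search_bodies:
        # Header entry: target index for this search (empty when unspecified).
        header = {"index": index} if index is not None else {}
        lines.append(header)
        # Body entry: the serialized search itself.
        lines.append(body)
    return lines


# Plain dicts standing in for the two Search objects added above.
bodies = [
    {"query": {"bool": {"filter": [{"term": {"category": "python"}}]}}},
    {},
]
payload = build_msearch_body("my-index", bodies)
```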

execute()

Execute the multi search request and return a list of search results.

to_dict()

class pandagg.search.Request(using, index=None)

Bases: object

index(*index)

Set the index for the search. If called without arguments, it removes all index information.

Example:

s = Search()
s = s.index('twitter-2015.01.01', 'twitter-2015.01.02')
s = s.index(['twitter-2015.01.01', 'twitter-2015.01.02'])
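Both calls in the example are meant to target the same two indices, so strings and lists of strings are interchangeable. A minimal sketch of that argument handling, using a hypothetical normalize_indices helper (not pandagg's implementation), could look like this:

```python
# Hypothetical helper, not pandagg's implementation: flattens the arguments
# accepted by index() into a single list of index names.
def normalize_indices(*index):
    if not index:
        # Called without arguments: no index information at all.
        return None
    indices = []
    for item in index:
        if isinstance(item, (list, tuple)):
            indices.extend(item)  # a list of names contributes each element
        else:
            indices.append(item)  # a single name is kept as-is
    return indices
```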

params(**kwargs)

Specify query params to be used when executing the search. All the keyword arguments will override the current values. See https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.Elasticsearch.search for all available parameters.

Example:

s = Search()
s = s.params(routing='user-1', preference='local')

using(client)

Associate the search request with an Elasticsearch client. A fresh copy is returned; the current instance remains unchanged.

Parameters:	client – an instance of `elasticsearch.Elasticsearch` to use or an alias to look up in `elasticsearch_dsl.connections`
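The copy-on-write behaviour described for using() can be sketched with a hypothetical ChainableRequest class. This is an illustration of the pattern only; pandagg's Request may clone differently, and "my-alias" is a placeholder for a client or connection alias.

```python
import copy


class ChainableRequest:
    # Hypothetical stand-in for Request; only the client attribute is modelled.
    def __init__(self, client=None):
        self._client = client

    def _clone(self):
        # Shallow copy: sufficient here because using() only rebinds an attribute.
        return copy.copy(self)

    def using(self, client):
        # Modify the clone, never the original instance.
        clone = self._clone()
        clone._client = client
        return clone


original = ChainableRequest()
configured = original.using("my-alias")
```

A shallow copy is enough in this sketch because using() only rebinds a reference; attributes that are mutated in place (such as nested query or aggregation structures) would need a deeper copy to keep the original untouched.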

class pandagg.search.Search(using=None, index=None, mapping=None, nested_autocorrect=False)

Bases: pandagg.search.Request

aggs(*args, **kwargs)

Arrange passed aggregations “horizontally”.

Given the initial aggregation:

A──> B
└──> C

If passing multiple aggregations with insert_below = ‘A’:

A──> B
└──> C
└──> new1
└──> new2

Note: the new aggregations are placed under the insert_below aggregation clause if provided; otherwise they go under the deepest linear bucket aggregation, provided there is no ambiguity:

OK:

A──> B ─> C ─> new

KO:

A──> B
└──> C
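The placement rule in the note above can be sketched as a walk down the aggregation tree that continues while each level has exactly one child. The helper deepest_linear below is hypothetical, represents the tree as nested dicts of aggregation names, and treats every node as a bucket aggregation; it illustrates the rule rather than reproducing pandagg's code.

```python
# Hypothetical helper, not pandagg's implementation.
# `tree` maps aggregation names to their sub-trees, e.g. {"A": {"B": {}}}.
def deepest_linear(tree):
    name = None  # None means "no aggregation yet": insert at the root
    level = tree
    while level:
        if len(level) > 1:
            # Several children at this level: the insertion point is ambiguous.
            raise ValueError("ambiguous insertion point, candidates: %s" % sorted(level))
        # Exactly one child: descend into it.
        name, level = next(iter(level.items()))
    return name
```

With {"A": {"B": {"C": {}}}} (the OK case) this returns "C"; with {"A": {"B": {}, "C": {}}} (the KO case) it raises, which is where an explicit insert_below is required.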

args accepts a single occurrence or a sequence of the following formats:

string (for terms agg concise declaration)
regular Elasticsearch dict syntax
AggNode instance (for instance Terms, Filters etc)

Keyword Arguments:
	insert_below (`string`) – Parent aggregation name under which these aggregations should be placed
	at_root (`string`) – Insert aggregations at root of aggregation query
	remaining kwargs – Used as body in aggregation
Return type:	pandagg.aggs.Aggs
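In regular Elasticsearch dict syntax, "horizontal" insertion amounts to adding sibling entries to the "aggs" of the chosen parent. The sketch below uses hypothetical helpers (_find, insert_horizontally) working on plain dicts; it shows the resulting structure for the insert_below = 'A' diagram and is not pandagg's implementation.

```python
import copy


def _find(aggs, name):
    # Hypothetical helper: depth-first search for the clause named `name`,
    # following nested "aggs" keys.
    for agg_name, clause in aggs.items():
        if agg_name == name:
            return clause
        found = _find(clause.get("aggs", {}), name)
        if found is not None:
            return found
    return None


def insert_horizontally(aggs, new_aggs, insert_below=None):
    # Hypothetical helper: work on deep copies so inputs are left unchanged.
    result = copy.deepcopy(aggs)
    new_aggs = copy.deepcopy(new_aggs)
    if insert_below is None:
        result.update(new_aggs)  # new aggregations become root-level siblings
        return result
    parent = _find(result, insert_below)
    if parent is None:
        raise KeyError(insert_below)
    # New aggregations sit next to the parent's existing children.
    parent.setdefault("aggs", {}).update(new_aggs)
    return result


initial = {
    "A": {
        "terms": {"field": "a"},
        "aggs": {"B": {"terms": {"field": "b"}}, "C": {"avg": {"field": "c"}}},
    }
}
updated = insert_horizontally(
    initial,
    {"new1": {"max": {"field": "x"}}, "new2": {"min": {"field": "y"}}},
    insert_below="A",
)
```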

bool(*args, **kwargs)

count()

Return the number of hits matching the query and filters. Note that only the count itself is returned, not the matching documents.

delete()

Executes the query by delegating to delete_by_query().

exclude(*args, **kwargs)

Insert must_not clause(s) wrapped in a filter context.
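Assuming "filter context" means a bool filter wrapper, exclude() produces a must_not clause nested inside a filter. The hypothetical exclude_clause helper below builds that dict shape; pandagg may serialize an equivalent but more compact form.

```python
# Hypothetical helper, not pandagg's implementation: wrap `clause` in a bool
# must_not, itself placed under a bool filter.
def exclude_clause(clause):
    return {"bool": {"filter": [{"bool": {"must_not": [clause]}}]}}


body = exclude_clause({"term": {"status": "deleted"}})
```

Elasticsearch already executes must_not clauses in filter context (they do not contribute to scoring), so the outer filter wrapper does not change which documents are excluded.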

execute()

Execute the search and return an instance of Response wrapping all the data.

filter(*args, **kwargs)

classmethod from_dict(d)

Construct a new Search instance from a raw dict containing the search body. Useful when migrating from raw dictionaries.

Example:

s = Search.from_dict({
    "query": {
        "bool": {
            "must": [...]
        }
    },
    "aggs": {...}
})
s = s.filter('term', published=True)

groupby(*args, **kwargs)

Arrange passed aggregations in vertical/nested manner, above or below another agg clause.

Given the initial aggregation:

A──> B
└──> C

If insert_below = ‘A’:

A──> new──> B
      └──> C

If insert_above = ‘B’:

A──> new──> B
└──> C
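The two diagrams above can be reproduced on plain Elasticsearch dicts with a hypothetical insert_vertically helper: with insert_below, the new aggregation takes over the parent's children; with insert_above, it takes the target's slot and nests the target beneath it. This is a sketch of the described behaviour, not pandagg's code.

```python
import copy


def _container_of(aggs, name):
    # Hypothetical helper: return the dict holding `name` as a key
    # (the root, or some nested "aggs" dict).
    if name in aggs:
        return aggs
    for clause in aggs.values():
        found = _container_of(clause.get("aggs", {}), name)
        if found is not None:
            return found
    return None


def insert_vertically(aggs, name, clause, insert_below=None, insert_above=None):
    # Hypothetical helper working on copies; insert_below takes precedence.
    result = copy.deepcopy(aggs)
    new = copy.deepcopy(clause)
    target = insert_below if insert_below is not None else insert_above
    if target is None:
        raise ValueError("provide insert_below or insert_above")
    container = _container_of(result, target)
    if container is None:
        raise KeyError(target)
    if insert_below is not None:
        parent = container[insert_below]
        children = parent.pop("aggs", None)
        if children:
            new["aggs"] = children  # former children move under the new agg
        parent["aggs"] = {name: new}
    else:
        new["aggs"] = {insert_above: container.pop(insert_above)}
        container[name] = new  # new agg takes the displaced agg's slot
    return result


initial = {
    "A": {
        "terms": {"field": "a"},
        "aggs": {"B": {"terms": {"field": "b"}}, "C": {"terms": {"field": "c"}}},
    }
}
below = insert_vertically(initial, "new", {"terms": {"field": "n"}}, insert_below="A")
above = insert_vertically(initial, "new", {"terms": {"field": "n"}}, insert_above="B")
```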

The by argument accepts a single occurrence or a sequence of the following formats:

string (for terms agg concise declaration)
regular Elasticsearch dict syntax
AggNode instance (for instance Terms, Filters etc)

If neither insert_below nor insert_above is provided, by will be placed between the deepest linear bucket aggregation and its children, provided there is no ambiguity:

A──> B ─> C      : OK, generates     A──> B ─> C ─> by

A──> B      : KO, ambiguous, must specify either A, B or C
└──> C

All Aggs.__init__ syntaxes are accepted:

>>> Aggs().groupby('terms', name='per_user_id', field='user_id')
{"per_user_id":{"terms":{"field":"user_id"}}}

Passing a dict:

>>> Aggs().groupby({"terms_on_my_field":{"terms":{"field":"some_field"}}})
{"terms_on_my_field":{"terms":{"field":"some_field"}}}

Using DSL class:

>>> from pandagg.aggs import Terms
>>> Aggs().groupby(Terms('terms_on_my_field', field='some_field'))
{"terms_on_my_field":{"terms":{"field":"some_field"}}}

Shortcut syntax for terms aggregation: creates a terms aggregation, using field as aggregation name

>>> Aggs().groupby('some_field')
{"some_field":{"terms":{"field":"some_field"}}}

Using an Aggs object:

>>> Aggs().groupby(Aggs('per_user_id', 'terms', field='user_id'))
{"per_user_id":{"terms":{"field":"user_id"}}}
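The string shortcut and the dict form shown above both end up as regular Elasticsearch syntax. A minimal sketch of that normalization, using a hypothetical normalize_groupby helper that covers only these two forms (pandagg's own parser handles more), could be:

```python
# Hypothetical helper, not pandagg's parser: handles only the string
# shortcut and the regular dict syntax.
def normalize_groupby(by):
    if isinstance(by, str):
        # Shortcut: a terms aggregation named after the field it buckets on.
        return {by: {"terms": {"field": by}}}
    if isinstance(by, dict):
        # Assumed to already be {name: {agg_type: body}}; return a copy.
        return dict(by)
    raise TypeError("unsupported declaration: %r" % (by,))
```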

Accepted declarations for multiple aggregations:

Keyword Arguments:
	insert_below (`string`) – Parent aggregation name under which these aggregations should be placed
	insert_above (`string`) – Aggregation name above which these aggregations should be placed
	at_root (`string`) – Insert aggregations at root of aggregation query
	remaining kwargs – Used as body in aggregation
Return type:	pandagg.aggs.Aggs

highlight(*fields, **kwargs)

Request highlighting of some fields. All keyword arguments passed in will be used as parameters for all the fields in the fields parameter. Example:

Search().highlight('title', 'body', fragment_size=50)

will produce the equivalent of:

{
    "highlight": {
        "fields": {
            "body": {"fragment_size": 50},
            "title": {"fragment_size": 50}
        }
    }
}

If you want different options for different fields, you can call highlight twice:

Search().highlight('title', fragment_size=50).highlight('body', fragment_size=100)

which will produce:

{
    "highlight": {
        "fields": {
            "body": {"fragment_size": 100},
            "title": {"fragment_size": 50}
        }
    }
}
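Both outputs above follow from one merging rule: each call applies its keyword arguments to the fields it names and leaves previously declared fields alone. A sketch with a hypothetical add_highlight helper on plain dicts (not pandagg's implementation):

```python
# Hypothetical helper, not pandagg's implementation.
def add_highlight(highlight, *fields, **options):
    # Copy the current highlight section, including any global options.
    result = dict(highlight)
    result["fields"] = dict(highlight.get("fields", {}))
    for field in fields:
        # Every field named in this call receives the same options.
        result["fields"][field] = dict(options)
    return result


same_options = add_highlight({}, "title", "body", fragment_size=50)
per_field = add_highlight(
    add_highlight({}, "title", fragment_size=50), "body", fragment_size=100
)
```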

highlight_options(**kwargs)

Update the global highlighting options used for this request. For example:

s = Search()
s = s.highlight_options(order='score')

must(*args, **kwargs)

must_not(*args, **kwargs)

post_filter(*args, **kwargs)

query(*args, **kwargs)

Insert new clause(s) into the current query.

Inserted clauses accept the following syntaxes.

Given an empty query:

>>> from pandagg.query import Query
>>> q = Query()

Flat syntax: clause type, followed by the query clause body as keyword arguments:

>>> q.query('term', some_field=23)
{'term': {'some_field': 23}}

From a regular Elasticsearch dict query:

>>> q.query({'term': {'some_field': 23}})
{'term': {'some_field': 23}}

Using the pandagg DSL:

>>> from pandagg.query import Term
>>> q.query(Term(some_field=23))
{'term': {'some_field': 23}}
Keyword Arguments:

parent (str) – named query clause under which the inserted clauses should be placed.
parent_param (str, optional, used together with parent) – parameter under which inserted clauses will be placed. For instance, if the parent clause is a boolean, can be ‘must’, ‘filter’, ‘should’, ‘must_not’.
child (str) – named query clause above which the inserted clauses should be placed.
child_param (str, optional, used together with child) – parameter of the inserted boolean clause under which child clauses will be placed. For instance, if the inserted clause is a boolean, can be ‘must’, ‘filter’, ‘should’, ‘must_not’.
mode (str, one of ‘add’, ‘replace’, ‘replace_all’) – merging strategy when inserting clauses into an existing compound clause.
- ‘add’ (default): adds new clauses, keeping the initial ones
- ‘replace’: for each parameter (for instance, in the ‘bool’ case: ‘filter’, ‘must’, ‘must_not’, ‘should’), existing clauses under that parameter are replaced by the new ones, but only for parameters declared in the inserted compound query
- ‘replace_all’: the existing compound clause is completely replaced by the new one
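The three merging strategies can be illustrated on the parameters of a bool clause, modelled here as a dict mapping each parameter to a list of clauses. The merge_bool helper is hypothetical and only sketches the semantics listed above; it is not pandagg's implementation.

```python
# Hypothetical helper, not pandagg's implementation. Both arguments map bool
# parameters ('filter', 'must', 'must_not', 'should') to lists of clauses.
def merge_bool(existing, inserted, mode="add"):
    if mode not in ("add", "replace", "replace_all"):
        raise ValueError("unknown mode: %r" % (mode,))
    if mode == "replace_all":
        # The existing compound clause is discarded entirely.
        return {param: list(clauses) for param, clauses in inserted.items()}
    result = {param: list(clauses) for param, clauses in existing.items()}
    for param, clauses in inserted.items():
        if mode == "add":
            # Keep existing clauses and append the new ones.
            result.setdefault(param, []).extend(clauses)
        else:
            # 'replace': only parameters present in `inserted` are overwritten.
            result[param] = list(clauses)
    return result


existing = {"filter": [{"term": {"a": 1}}], "must": [{"match": {"b": "x"}}]}
inserted = {"filter": [{"term": {"c": 2}}]}
```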

scan()

Turn the search into a scan search and return a generator that will iterate over all the documents matching the query.

Use the params method to specify any additional arguments you wish to pass to the underlying scan helper from elasticsearch-py: https://elasticsearch-py.readthedocs.io/en/master/helpers.html#elasticsearch.helpers.scan

script_fields(**kwargs)

Define script fields to be calculated on hits. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html for more details.

Example:

s = Search()
s = s.script_fields(times_two="doc['field'].value * 2")
s = s.script_fields(
    times_three={
        'script': {
            'source': "doc['field'].value * params.n",
            'params': {'n': 3}
        }
    }
)

should(*args, **kwargs)

size(size)

Equivalent to:

s = Search().params(size=size)

sort(*keys)

Add sorting information to the search request. If called without arguments it will remove all sort requirements. Otherwise it will replace them. Acceptable arguments are:

'some.field'
'-some.other.field'
{'different.field': {'any': 'dict'}}

so for example:

s = Search().sort(
    'category',
    '-title',
    {"price" : {"order" : "asc", "mode" : "avg"}}
)

will sort by category, title (in descending order) and price in ascending order using the avg mode.

The API returns a copy of the Search object and can thus be chained.
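The three accepted argument forms map onto Elasticsearch sort entries in a simple way: a plain name is kept, a leading '-' becomes a descending order, and a dict passes through untouched. A sketch with a hypothetical parse_sort_keys helper (not pandagg's implementation):

```python
# Hypothetical helper, not pandagg's implementation: normalize sort()
# arguments into a list of Elasticsearch sort entries.
def parse_sort_keys(*keys):
    parsed = []
    for key in keys:
        if isinstance(key, str) and key.startswith("-"):
            # Leading '-' means descending order on the remaining field name.
            parsed.append({key[1:]: {"order": "desc"}})
        else:
            # Plain field names and full dict specifications are kept as-is.
            parsed.append(key)
    return parsed


sort_spec = parse_sort_keys(
    'category',
    '-title',
    {"price": {"order": "asc", "mode": "avg"}},
)
```

Calling it without arguments returns an empty list, which corresponds to removing all sort requirements.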

source(fields=None, **kwargs)

Selectively control how the _source field is returned.

Parameters:	fields – wildcard string, array of wildcards, or dictionary of includes and excludes

If fields is None, the entire document will be returned for each hit. If fields is a dictionary with keys of ‘includes’ and/or ‘excludes’ the fields will be either included or excluded appropriately.

Calling this multiple times with the same named parameter will override the previous values with the new ones.

Example:

s = Search()
s = s.source(includes=['obj1.*'], excludes=["*.description"])

s = Search()
s = s.source(includes=['obj1.*']).source(excludes=["*.description"])
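The two examples above are meant to be equivalent because each named parameter overrides only its own previous value. A hypothetical merge_source helper on plain dicts (not pandagg's implementation) makes that explicit:

```python
# Hypothetical helper, not pandagg's implementation.
def merge_source(current, **kwargs):
    # Start from a copy of the current _source setting.
    merged = dict(current or {})
    # Each named parameter (includes, excludes) overrides only its own value.
    merged.update(kwargs)
    return merged


one_call = merge_source({}, includes=['obj1.*'], excludes=["*.description"])
two_calls = merge_source(merge_source({}, includes=['obj1.*']), excludes=["*.description"])
```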

suggest(name, text, **kwargs)

Add a suggestions request to the search.

Parameters:	name – name of the suggestion
	text – text to suggest on

All keyword arguments will be added to the suggestions body. For example:

s = Search()
s = s.suggest('suggestion-1', 'Elasticsearch', term={'field': 'body'})
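In the Elasticsearch request body, a named suggestion lives under the "suggest" key, with the text next to the suggester definition passed as keyword arguments. A sketch with a hypothetical add_suggestion helper on plain dicts (not pandagg's implementation):

```python
# Hypothetical helper, not pandagg's implementation.
def add_suggestion(body, name, text, **kwargs):
    result = dict(body)
    suggest = dict(result.get("suggest", {}))
    # Keyword arguments (e.g. term={...}) sit alongside the suggestion text.
    suggest[name] = {"text": text, **kwargs}
    result["suggest"] = suggest
    return result


body = add_suggestion({}, 'suggestion-1', 'Elasticsearch', term={'field': 'body'})
```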

to_dict(count=False, **kwargs)

Serialize the search into the dictionary that will be sent over as the request’s body.

Parameters:	count – a flag specifying whether to build a body for a count request: no aggregations, no pagination bounds, etc.

All additional keyword arguments will be included into the dictionary.
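The effect of the count flag can be sketched as stripping everything but the query, assuming the count body only needs the query clause. The body_for helper below is hypothetical and works on plain dicts; it is not pandagg's implementation.

```python
# Hypothetical helper, not pandagg's implementation.
def body_for(full_body, count=False, **extra):
    if count:
        # Count body: keep only the query; aggregations, size/from, sort, etc.
        # are dropped.
        body = {"query": full_body["query"]} if "query" in full_body else {}
    else:
        body = dict(full_body)
    # Additional keyword arguments are included either way.
    body.update(extra)
    return body


full = {
    "query": {"match_all": {}},
    "aggs": {"per_x": {"terms": {"field": "x"}}},
    "size": 10,
    "from": 20,
    "sort": ["x"],
}
```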

update_from_dict(d)

Apply options from a serialized body to the current instance. Modifies the object in-place. Used mostly by from_dict.