pandagg.search module

class pandagg.search.MultiSearch(**kwargs)[source]

Bases: pandagg.search.Request

Combine multiple Search objects into a single request.

add(search)[source]

Adds a new Search object to the request:

ms = MultiSearch(index='my-index')
ms = ms.add(Search(doc_type=Category).filter('term', category='python'))
ms = ms.add(Search(doc_type=Blog))
execute()[source]

Execute the multi search request and return a list of search results.
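
For orientation, the Elasticsearch multi-search endpoint expects a newline-delimited body in which each search contributes a header line followed by a body line. The sketch below builds such lines with the standard library only. The helper name build_msearch_lines and its (index, body) input shape are hypothetical; it does not claim to reproduce how pandagg serializes the request internally.

```python
import json


def build_msearch_lines(searches):
    """Return NDJSON lines for a list of (index, body) pairs.

    Hypothetical helper: `index` may be None, in which case the header is
    left empty and the index given in the request URL applies.
    """
    lines = []
    for index, body in searches:
        header = {"index": index} if index else {}
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    return lines


lines = build_msearch_lines([
    (None, {"query": {"term": {"category": "python"}}}),
    ("blog-index", {"query": {"match_all": {}}}),
])
payload = "\n".join(lines) + "\n"  # the body must end with a newline
```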

to_dict()[source]
class pandagg.search.Request(using, index=None)[source]

Bases: object

index(*index)[source]

Set the index for the search. If called without arguments, it removes all index information.

Example:

s = Search()
s = s.index('twitter-2015.01.01', 'twitter-2015.01.02')
s = s.index(['twitter-2015.01.01', 'twitter-2015.01.02'])
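
Both call styles in the example are meant to select the same indices. A minimal sketch of how positional strings and sequences could be flattened into a single list is given below; normalize_index is a hypothetical helper written for illustration, not a pandagg function.

```python
def normalize_index(*index):
    """Flatten strings, lists and tuples of index names into one list."""
    if not index:
        return None  # called without arguments: no index information is kept
    names = []
    for item in index:
        if isinstance(item, str):
            names.append(item)
        elif isinstance(item, (list, tuple)):
            names.extend(item)
        else:
            raise TypeError(f"unsupported index value: {item!r}")
    return names


from_args = normalize_index("twitter-2015.01.01", "twitter-2015.01.02")
from_list = normalize_index(["twitter-2015.01.01", "twitter-2015.01.02"])
```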
params(**kwargs)[source]

Specify query params to be used when executing the search. All the keyword arguments will override the current values. See https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.Elasticsearch.search for all available parameters.

Example:

s = Search()
s = s.params(routing='user-1', preference='local')
using(client)[source]

Associate the search request with an elasticsearch client. A fresh copy will be returned with current instance remaining unchanged.

Parameters: client – an instance of elasticsearch.Elasticsearch to use or an alias to look up in elasticsearch_dsl.connections
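
The "fresh copy" behaviour described for using can be pictured with a small stand-in class. RequestSketch and its attributes are hypothetical and only illustrate the copy-on-change pattern; they are not pandagg's implementation.

```python
class RequestSketch:
    """Hypothetical stand-in for a request object (not pandagg's class)."""

    def __init__(self, using=None, index=None):
        self._using = using
        self._index = list(index) if index else None

    def _clone(self):
        # Build a new object so that later changes never touch `self`.
        return RequestSketch(using=self._using, index=self._index)

    def using(self, client):
        clone = self._clone()
        clone._using = client
        return clone


base = RequestSketch(using="default", index=["my-index"])
other = base.using("analytics")
```

Because every call returns a new object, the result has to be reassigned (s = s.using(...)), which is why the examples in this module always rebind the variable.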
class pandagg.search.Search(using=None, index=None, mapping=None, nested_autocorrect=False)[source]

Bases: pandagg.search.Request

aggs(*args, **kwargs)[source]

Arrange passed aggregations “horizontally”.

Given the initial aggregation:

A──> B
└──> C

If passing multiple aggregations with insert_below = ‘A’:

A──> B
└──> C
└──> new1
└──> new2

Note: those will be placed under the insert_below aggregation clause id if provided, else under the deepest linear bucket aggregation if there is no ambiguity:

OK:

A──> B ─> C ─> new

KO:

A──> B
└──> C

args accepts a single occurrence or a sequence of the following formats:

  • string (for terms agg concise declaration)
  • regular Elasticsearch dict syntax
  • AggNode instance (for instance Terms, Filters etc)
Keyword Arguments:
 
  • insert_below (string) – Parent aggregation name under which these aggregations should be placed
  • at_root (string) – Insert aggregations at root of aggregation query
  • remaining kwargs: Used as body in aggregation
Return type:

pandagg.aggs.Aggs
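
The "horizontal" placement can be illustrated on a plain Elasticsearch aggregation dict: the new aggregations become siblings of the existing children of the insert_below clause. This is a sketch under that assumption; insert_siblings is a hypothetical helper and says nothing about how pandagg stores its tree.

```python
def insert_siblings(aggs, insert_below, new_aggs):
    """Add `new_aggs` next to the existing children of `insert_below`.

    Works in place on a nested Elasticsearch "aggs" dict and returns True
    when the parent aggregation was found.
    """
    for name, node in aggs.items():
        if name == insert_below:
            node.setdefault("aggs", {}).update(new_aggs)
            return True
        if insert_siblings(node.get("aggs", {}), insert_below, new_aggs):
            return True
    return False


tree = {
    "A": {
        "terms": {"field": "a"},
        "aggs": {
            "B": {"terms": {"field": "b"}},
            "C": {"terms": {"field": "c"}},
        },
    }
}
found = insert_siblings(
    tree,
    "A",
    {"new1": {"avg": {"field": "x"}}, "new2": {"max": {"field": "x"}}},
)
```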

bool(*args, **kwargs)[source]
count()[source]

Return the number of hits matching the query and filters. Note that only the actual number is returned.

delete()[source]

Execute the query by delegating to delete_by_query().

exclude(*args, **kwargs)[source]

Add must_not clause(s) wrapped in filter context.

execute()[source]

Execute the search and return an instance of Response wrapping all the data.

filter(*args, **kwargs)[source]
classmethod from_dict(d)[source]

Construct a new Search instance from a raw dict containing the search body. Useful when migrating from raw dictionaries.

Example:

s = Search.from_dict({
    "query": {
        "bool": {
            "must": [...]
        }
    },
    "aggs": {...}
})
s = s.filter('term', published=True)
groupby(*args, **kwargs)[source]

Arrange passed aggregations in vertical/nested manner, above or below another agg clause.

Given the initial aggregation:

A──> B
└──> C

If insert_below = ‘A’:

A──> new──> B
      └──> C

If insert_above = ‘B’:

A──> new──> B
└──> C

The by argument accepts a single occurrence or a sequence of the following formats:

  • string (for terms agg concise declaration)
  • regular Elasticsearch dict syntax
  • AggNode instance (for instance Terms, Filters etc)

If neither insert_below nor insert_above is provided, by will be placed between the deepest linear bucket aggregation (if there is no ambiguity) and its children:

A──> B      : OK generates     A──> B ─> C ─> by

A──> B      : KO, ambiguous, must specify either A, B or C
└──> C
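
The ambiguity rule can be sketched as a walk down a nested "aggs" dict that stops with an error as soon as a level holds more than one aggregation. deepest_linear is a hypothetical helper; for brevity it does not check that each node is a bucket aggregation.

```python
def deepest_linear(aggs):
    """Return the name of the deepest aggregation of a single-branch chain.

    Raises ValueError when any level holds more than one aggregation,
    and returns None for an empty aggregation dict.
    """
    name = None
    level = aggs
    while level:
        if len(level) > 1:
            raise ValueError("ambiguous: specify insert_below or insert_above")
        name, node = next(iter(level.items()))
        level = node.get("aggs", {})
    return name


chain = {"A": {"terms": {"field": "a"}, "aggs": {"B": {"terms": {"field": "b"}}}}}
branched = {
    "A": {
        "terms": {"field": "a"},
        "aggs": {"B": {"terms": {"field": "b"}}, "C": {"terms": {"field": "c"}}},
    }
}
```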

Accepts all Aggs.__init__ syntaxes:

>>> Aggs()\
... .groupby('terms', name='per_user_id', field='user_id')
{"per_user_id":{"terms":{"field":"user_id"}}}

Passing a dict:

>>> Aggs().groupby({"terms_on_my_field":{"terms":{"field":"some_field"}}})
{"terms_on_my_field":{"terms":{"field":"some_field"}}}

Using DSL class:

>>> from pandagg.aggs import Terms
>>> Aggs().groupby(Terms('terms_on_my_field', field='some_field'))
{"terms_on_my_field":{"terms":{"field":"some_field"}}}

Shortcut syntax for terms aggregation: creates a terms aggregation, using field as aggregation name

>>> Aggs().groupby('some_field')
{"some_field":{"terms":{"field":"some_field"}}}

Using an Aggs object:

>>> Aggs().groupby(Aggs('per_user_id', 'terms', field='user_id'))
{"per_user_id":{"terms":{"field":"user_id"}}}

Accepted declarations for multiple aggregations:

Keyword Arguments:
 
  • insert_below (string) – Parent aggregation name under which these aggregations should be placed
  • insert_above (string) – Aggregation name above which these aggregations should be placed
  • at_root (string) – Insert aggregations at root of aggregation query
  • remaining kwargs: Used as body in aggregation
Return type:

pandagg.aggs.Aggs
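
The "vertical" placement with insert_below can be pictured on a plain Elasticsearch aggregation dict: the new aggregation becomes the only child of the parent and takes over the parent's former children. group_below is a hypothetical helper used for illustration only.

```python
def group_below(aggs, insert_below, name, body):
    """Nest a new aggregation between `insert_below` and its children.

    Works in place on a nested Elasticsearch "aggs" dict and returns True
    when the parent aggregation was found.
    """
    for key, node in aggs.items():
        if key == insert_below:
            new_node = dict(body)
            if "aggs" in node:
                # The former children now hang below the new aggregation.
                new_node["aggs"] = node.pop("aggs")
            node["aggs"] = {name: new_node}
            return True
        if group_below(node.get("aggs", {}), insert_below, name, body):
            return True
    return False


tree = {
    "A": {
        "terms": {"field": "a"},
        "aggs": {
            "B": {"terms": {"field": "b"}},
            "C": {"terms": {"field": "c"}},
        },
    }
}
found = group_below(tree, "A", "new", {"terms": {"field": "n"}})
```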

highlight(*fields, **kwargs)[source]

Request highlighting of some fields. All keyword arguments passed in will be used as parameters for all the fields in the fields parameter. Example:

Search().highlight('title', 'body', fragment_size=50)

will produce the equivalent of:

{
    "highlight": {
        "fields": {
            "body": {"fragment_size": 50},
            "title": {"fragment_size": 50}
        }
    }
}

If you want to have different options for different fields you can call highlight twice:

Search().highlight('title', fragment_size=50).highlight('body', fragment_size=100)

which will produce:

{
    "highlight": {
        "fields": {
            "body": {"fragment_size": 100},
            "title": {"fragment_size": 50}
        }
    }
}
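
The accumulation of per-field options across successive calls can be sketched on a plain request body. add_highlight is a hypothetical helper operating on dicts, not the Search method itself.

```python
import copy


def add_highlight(body, *fields, **options):
    """Return a copy of `body` with `options` applied to each listed field."""
    new_body = copy.deepcopy(body)
    targets = new_body.setdefault("highlight", {}).setdefault("fields", {})
    for field in fields:
        targets[field] = dict(options)
    return new_body


body = add_highlight({}, "title", fragment_size=50)
body = add_highlight(body, "body", fragment_size=100)
```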
highlight_options(**kwargs)[source]

Update the global highlighting options used for this request. For example:

s = Search()
s = s.highlight_options(order='score')
must(*args, **kwargs)[source]
must_not(*args, **kwargs)[source]
post_filter(*args, **kwargs)[source]
query(*args, **kwargs)[source]

Insert new clause(s) in current query.

The inserted clause accepts the following syntaxes.

Given an empty query:

>>> from pandagg.query import Query
>>> q = Query()

flat syntax: clause type, followed by query clause body as keyword arguments:

>>> q.query('term', some_field=23)
{'term': {'some_field': 23}}

from regular Elasticsearch dict query:

>>> q.query({'term': {'some_field': 23}})
{'term': {'some_field': 23}}

using pandagg DSL:

>>> from pandagg.query import Term
>>> q.query(Term(some_field=23))
{'term': {'some_field': 23}}
Keyword Arguments:
 
  • parent (str) – named query clause under which the inserted clauses should be placed.
  • parent_param (str optional parameter when using parent param) – parameter under which inserted clauses will be placed. For instance if parent clause is a boolean, can be ‘must’, ‘filter’, ‘should’, ‘must_not’.
  • child (str) – named query clause above which the inserted clauses should be placed.
  • child_param (str optional parameter when using child param) – parameter of inserted boolean clause under which child clauses will be placed. For instance if inserted clause is a boolean, can be ‘must’, ‘filter’, ‘should’, ‘must_not’.
  • mode (str one of ‘add’, ‘replace’, ‘replace_all’) – merging strategy when inserting clauses on an existing compound clause.
    • ‘add’ (default) : adds new clauses keeping initial ones
    • ‘replace’ : for each parameter (for instance in ‘bool’ case : ‘filter’, ‘must’, ‘must_not’, ‘should’), replace existing clauses under this parameter, by new ones only if declared in inserted compound query
    • ‘replace_all’ : existing compound clause is completely replaced by the new one
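
The three merging strategies can be sketched on the parameters of a bool clause, represented here as a dict mapping each parameter to a list of clauses. merge_bool is a hypothetical helper that mirrors the descriptions above; it is not taken from pandagg.

```python
def merge_bool(existing, inserted, mode="add"):
    """Merge the parameters of two bool clauses according to `mode`."""
    if mode not in ("add", "replace", "replace_all"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "replace_all":
        # The existing compound clause is dropped entirely.
        return {param: list(clauses) for param, clauses in inserted.items()}
    merged = {param: list(clauses) for param, clauses in existing.items()}
    for param, clauses in inserted.items():
        if mode == "add":
            merged.setdefault(param, []).extend(clauses)
        else:  # "replace": only parameters declared in `inserted` change
            merged[param] = list(clauses)
    return merged


t1 = {"term": {"a": 1}}
t2 = {"term": {"b": 2}}
f1 = {"range": {"c": {"gte": 3}}}
existing = {"must": [t1], "filter": [f1]}
inserted = {"must": [t2]}
```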
scan()[source]

Turn the search into a scan search and return a generator that will iterate over all the documents matching the query.

Use the params method to specify any additional arguments you wish to pass to the underlying scan helper from elasticsearch-py - https://elasticsearch-py.readthedocs.io/en/master/helpers.html#elasticsearch.helpers.scan

script_fields(**kwargs)[source]

Define script fields to be calculated on hits. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html for more details.

Example:

s = Search()
s = s.script_fields(times_two="doc['field'].value * 2")
s = s.script_fields(
    times_three={
        'script': {
            'inline': "doc['field'].value * params.n",
            'params': {'n': 3}
        }
    }
)
should(*args, **kwargs)[source]
size(size)[source]

Equivalent to:

s = Search().params(size=size)
sort(*keys)[source]

Add sorting information to the search request. If called without arguments it will remove all sort requirements. Otherwise it will replace them. Acceptable arguments are:

'some.field'
'-some.other.field'
{'different.field': {'any': 'dict'}}

so for example:

s = Search().sort(
    'category',
    '-title',
    {"price" : {"order" : "asc", "mode" : "avg"}}
)

will sort by category, title (in descending order) and price in ascending order using the avg mode.

The API returns a copy of the Search object and can thus be chained.
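
The accepted key formats can be illustrated by translating them into the list sent under "sort" in the request body. The sketch assumes that a leading '-' is expanded into an explicit descending order while other values pass through unchanged; normalize_sort is a hypothetical helper.

```python
def normalize_sort(*keys):
    """Translate sort keys into Elasticsearch sort entries."""
    normalized = []
    for key in keys:
        if isinstance(key, str) and key.startswith("-"):
            # '-title' is shorthand for a descending sort on 'title'.
            key = {key[1:]: {"order": "desc"}}
        normalized.append(key)
    return normalized


sort_body = normalize_sort(
    "category",
    "-title",
    {"price": {"order": "asc", "mode": "avg"}},
)
```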

source(fields=None, **kwargs)[source]

Selectively control how the _source field is returned.

Parameters: fields – wildcard string, array of wildcards, or dictionary of includes and excludes

If fields is None, the entire document will be returned for each hit. If fields is a dictionary with keys of ‘includes’ and/or ‘excludes’ the fields will be either included or excluded appropriately.

Calling this multiple times with the same named parameter will override the previous values with the new ones.

Example:

s = Search()
s = s.source(includes=['obj1.*'], excludes=["*.description"])

s = Search()
s = s.source(includes=['obj1.*']).source(excludes=["*.description"])
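
The two examples above are expected to yield the same _source setting, since keyword arguments accumulate across calls and a repeated keyword overrides the earlier value. A minimal sketch of that merging rule follows; update_source is a hypothetical helper, and rejecting fields combined with keyword arguments is an assumption of this sketch.

```python
def update_source(current, fields=None, **kwargs):
    """Return the new _source setting after one call."""
    if fields is not None and kwargs:
        raise ValueError("pass either fields or keyword arguments, not both")
    if fields is not None:
        return fields
    merged = dict(current) if isinstance(current, dict) else {}
    merged.update(kwargs)  # same-named parameters override earlier values
    return merged


first = update_source(None, includes=["obj1.*"])
second = update_source(first, excludes=["*.description"])
```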
suggest(name, text, **kwargs)[source]

Add a suggestions request to the search.

Parameters:
  • name – name of the suggestion
  • text – text to suggest on

All keyword arguments will be added to the suggestions body. For example:

s = Search()
s = s.suggest('suggestion-1', 'Elasticsearch', term={'field': 'body'})
to_dict(count=False, **kwargs)[source]

Serialize the search into the dictionary that will be sent over as the request’s body.

Parameters: count – a flag to specify if we are interested in a body for count - no aggregations, no pagination bounds etc.

All additional keyword arguments will be included into the dictionary.
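
As a rough picture of the count flag, a body intended for counting can be reduced to its query part. The sketch below assumes that only the "query" key is kept; body_for_count is a hypothetical helper, and the exact set of keys pandagg drops is not confirmed here.

```python
def body_for_count(body):
    """Keep only the query part of a search body (assumed count behaviour)."""
    return {"query": body["query"]} if "query" in body else {}


full_body = {
    "query": {"term": {"published": True}},
    "aggs": {"per_user": {"terms": {"field": "user_id"}}},
    "size": 20,
    "from": 40,
}
count_body = body_for_count(full_body)
```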

update_from_dict(d)[source]

Apply options from a serialized body to the current instance. Modifies the object in-place. Used mostly by from_dict.