pandagg.search module
-
class pandagg.search.MultiSearch(**kwargs)
Bases: pandagg.search.Request
Combine multiple Search objects into a single request.
-
class pandagg.search.Request(using, index=None)
Bases: object
-
index(*index)
Set the index for the search. If called without arguments, it removes all index information.
Example:
s = Search()
s = s.index('twitter-2015.01.01', 'twitter-2015.01.02')
s = s.index(['twitter-2015.01.01', 'twitter-2015.01.02'])
-
params(**kwargs)
Specify query params to be used when executing the search. All the keyword arguments will override the current values. See https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.Elasticsearch.search for all available parameters.
Example:
s = Search()
s = s.params(routing='user-1', preference='local')
-
-
class pandagg.search.Search(using=None, index=None, mapping=None, nested_autocorrect=False, repr_auto_execute=False)
Bases: pandagg.search.Request
-
aggs(*args, **kwargs)
Arrange passed aggregations "horizontally".
Given the initial aggregation:
A──> B
└──> C
If passing multiple aggregations with insert_below = 'A':
A──> B
└──> C
└──> new1
└──> new2
Note: those will be placed under the insert_below aggregation clause id if provided, else under the deepest linear bucket aggregation if there is no ambiguity:
OK:
A──> B ─> C ─> new
KO:
A──> B
└──> C
args accepts a single occurrence or a sequence of the following formats:
- string (for terms agg concise declaration)
- regular Elasticsearch dict syntax
- AggNode instance (for instance Terms, Filters etc)
Keyword Arguments:
- insert_below (string) – Parent aggregation name under which these aggregations should be placed
- at_root (string) – Insert aggregations at root of aggregation query
- remaining kwargs: Used as body in aggregation
Return type:
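To make the "horizontal" placement concrete, the sketch below works on plain Elasticsearch aggregation dicts rather than on pandagg objects. find_agg and add_siblings are hypothetical helpers written for this illustration and are not part of pandagg; the field names and metric aggregations are made up, and the sketch assumes aggregation names are unique within the tree.

```python
def find_agg(aggs, name):
    """Return the body of the aggregation called `name`, searching depth-first."""
    for agg_name, body in aggs.items():
        if agg_name == name:
            return body
        found = find_agg(body.get("aggs", {}), name)
        if found is not None:
            return found
    return None


def add_siblings(aggs, insert_below, new_aggs):
    """Place every aggregation of `new_aggs` side by side under `insert_below`."""
    parent = find_agg(aggs, insert_below)
    if parent is None:
        raise KeyError(insert_below)
    parent.setdefault("aggs", {}).update(new_aggs)
    return aggs


# Initial tree:
# A──> B
# └──> C
tree = {
    "A": {
        "terms": {"field": "a"},
        "aggs": {
            "B": {"terms": {"field": "b"}},
            "C": {"terms": {"field": "c"}},
        },
    }
}

add_siblings(
    tree,
    "A",
    {"new1": {"avg": {"field": "x"}}, "new2": {"max": {"field": "x"}}},
)
children_of_a = list(tree["A"]["aggs"])
```

After the call, new1 and new2 sit next to B and C as direct children of A, which is the layout shown in the second diagram.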
-
count()
Return the number of hits matching the query and filters. Note that only the actual number is returned.
-
classmethod from_dict(d)
Construct a new Search instance from a raw dict containing the search body. Useful when migrating from raw dictionaries.
Example:
s = Search.from_dict({
    "query": {
        "bool": {
            "must": [...]
        }
    },
    "aggs": {...}
})
s = s.filter('term', published=True)
-
groupby(*args, **kwargs)
Arrange passed aggregations in a vertical/nested manner, above or below another agg clause.
Given the initial aggregation:
A──> B
└──> C
If insert_below = 'A':
A──> new──> B
      └──> C
If insert_above = 'B':
A──> new──> B
└──> C
The by argument accepts a single occurrence or a sequence of the following formats:
- string (for terms agg concise declaration)
- regular Elasticsearch dict syntax
- AggNode instance (for instance Terms, Filters etc)
If neither insert_below nor insert_above is provided, by will be placed between the deepest linear bucket aggregation (if there is no ambiguity) and its children:
A──> B : OK, generates A──> B ─> C ─> by
A──> B : KO, ambiguous, must specify either A, B or C
└──> C
Accepts all Aggs.__init__ syntaxes:
>>> Aggs()\
...     .groupby('terms', name='per_user_id', field='user_id')
{"per_user_id":{"terms":{"field":"user_id"}}}
Passing a dict:
>>> Aggs().groupby({"terms_on_my_field":{"terms":{"field":"some_field"}}})
{"terms_on_my_field":{"terms":{"field":"some_field"}}}
Using DSL class:
>>> from pandagg.aggs import Terms
>>> Aggs().groupby(Terms('terms_on_my_field', field='some_field'))
{"terms_on_my_field":{"terms":{"field":"some_field"}}}
Shortcut syntax for terms aggregation: creates a terms aggregation, using field as aggregation name:
>>> Aggs().groupby('some_field')
{"some_field":{"terms":{"field":"some_field"}}}
Using an Aggs object:
>>> Aggs().groupby(Aggs('per_user_id', 'terms', field='user_id'))
{"per_user_id":{"terms":{"field":"user_id"}}}
Accepted declarations for multiple aggregations:
Keyword Arguments:
- insert_below (string) – Parent aggregation name under which these aggregations should be placed
- insert_above (string) – Aggregation name above which these aggregations should be placed
- at_root (string) – Insert aggregations at root of aggregation query
- remaining kwargs: Used as body in aggregation
Return type:
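The difference between insert_below and insert_above can be sketched on plain aggregation dicts. The reading assumed here is that insert_below = 'A' moves all former children of A under the new aggregation, while insert_above = 'B' wraps only B and leaves its sibling C directly under A. group_below and group_above are hypothetical helpers for illustration, not pandagg functions, and for brevity they only handle a parent located at the root of the tree.

```python
def initial_tree():
    """Build the initial tree: A with two children, B and C."""
    return {
        "A": {
            "terms": {"field": "a"},
            "aggs": {
                "B": {"terms": {"field": "b"}},
                "C": {"terms": {"field": "c"}},
            },
        }
    }


def group_below(aggs, parent, new_name, new_body):
    """Insert `new_name` under `parent`; former children of `parent` move under it."""
    node = aggs[parent]
    former_children = node.pop("aggs", {})
    new_node = dict(new_body)
    if former_children:
        new_node["aggs"] = former_children
    node["aggs"] = {new_name: new_node}
    return aggs


def group_above(aggs, parent, child, new_name, new_body):
    """Insert `new_name` between `parent` and `child`; siblings of `child` stay put."""
    children = aggs[parent]["aggs"]
    new_node = dict(new_body)
    new_node["aggs"] = {child: children.pop(child)}
    children[new_name] = new_node
    return aggs


new_body = {"terms": {"field": "n"}}
below = group_below(initial_tree(), "A", "new", new_body)
above = group_above(initial_tree(), "A", "B", "new", new_body)
```

In below, A has a single child new that holds both B and C; in above, A keeps C and gains new, which holds only B.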
-
highlight(*fields, **kwargs)
Request highlighting of some fields. All keyword arguments passed in will be used as parameters for all the fields in the fields parameter. Example:
Search().highlight('title', 'body', fragment_size=50)
will produce the equivalent of:
{
    "highlight": {
        "fields": {
            "body": {"fragment_size": 50},
            "title": {"fragment_size": 50}
        }
    }
}
If you want to have different options for different fields you can call highlight twice:
Search().highlight('title', fragment_size=50).highlight('body', fragment_size=100)
which will produce:
{
    "highlight": {
        "fields": {
            "body": {"fragment_size": 100},
            "title": {"fragment_size": 50}
        }
    }
}
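The way repeated calls accumulate per-field options can be mimicked with a few lines of plain Python. add_highlight below is a hypothetical stand-in that builds the request body directly; it is a sketch of the behaviour described above, not the method's actual implementation.

```python
def add_highlight(body, *fields, **options):
    """Return a copy of `body` where each of `fields` is highlighted with `options`."""
    new_body = dict(body)
    highlight = dict(new_body.get("highlight", {}))
    per_field = dict(highlight.get("fields", {}))
    for field in fields:
        # Every field named in this call receives the same options.
        per_field[field] = dict(options)
    highlight["fields"] = per_field
    new_body["highlight"] = highlight
    return new_body


# One call: both fields share the same options.
same = add_highlight({}, "title", "body", fragment_size=50)

# Two calls: each field gets its own options.
different = add_highlight(
    add_highlight({}, "title", fragment_size=50), "body", fragment_size=100
)
```

Returning a copy instead of mutating the input mirrors the chaining style used throughout this class, where each call yields a new object.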
-
highlight_options(**kwargs)
Update the global highlighting options used for this request. For example:
s = Search()
s = s.highlight_options(order='score')
-
query(*args, **kwargs)
Insert new clause(s) in the current query.
Inserted clauses accept the following syntaxes.
Given an empty query:
>>> from pandagg.query import Query
>>> q = Query()
flat syntax: clause type, followed by query clause body as keyword arguments:
>>> q.query('term', some_field=23)
{'term': {'some_field': 23}}
from regular Elasticsearch dict query:
>>> q.query({'term': {'some_field': 23}})
{'term': {'some_field': 23}}
using pandagg DSL:
>>> from pandagg.query import Term
>>> q.query(Term(some_field=23))
{'term': {'some_field': 23}}
Keyword Arguments:
- parent (str) – named query clause under which the inserted clauses should be placed.
- parent_param (str, optional parameter when using parent param) – parameter under which inserted clauses will be placed. For instance if parent clause is a boolean, can be 'must', 'filter', 'should', 'must_not'.
- child (str) – named query clause above which the inserted clauses should be placed.
- child_param (str, optional parameter when using child param) – parameter of inserted boolean clause under which child clauses will be placed. For instance if inserted clause is a boolean, can be 'must', 'filter', 'should', 'must_not'.
- mode (str, one of 'add', 'replace', 'replace_all') – merging strategy when inserting clauses on an existing compound clause.
  - 'add' (default): adds new clauses, keeping initial ones
  - 'replace': for each parameter (for instance in 'bool' case: 'filter', 'must', 'must_not', 'should'), replaces existing clauses under this parameter by new ones, only if declared in the inserted compound query
  - 'replace_all': existing compound clause is completely replaced by the new one
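The three mode values can be illustrated on plain bool clause bodies. merge_bool is a hypothetical helper that follows the descriptions of 'add', 'replace' and 'replace_all' given above; it is a simplified sketch, not pandagg's merging code, and the term clauses are made-up sample data.

```python
def merge_bool(existing, inserted, mode="add"):
    """Merge two bool bodies such as {"must": [...], "filter": [...]}."""
    if mode not in ("add", "replace", "replace_all"):
        raise ValueError("unknown mode: %r" % mode)
    if mode == "replace_all":
        # The existing compound clause is dropped entirely.
        return {param: list(clauses) for param, clauses in inserted.items()}
    merged = {param: list(clauses) for param, clauses in existing.items()}
    for param, clauses in inserted.items():
        if mode == "add":
            # Keep the initial clauses and append the new ones.
            merged.setdefault(param, []).extend(clauses)
        else:
            # 'replace': only parameters declared in `inserted` are overwritten.
            merged[param] = list(clauses)
    return merged


existing = {"must": [{"term": {"a": 1}}], "filter": [{"term": {"b": 2}}]}
inserted = {"must": [{"term": {"c": 3}}]}

added = merge_bool(existing, inserted, mode="add")
replaced = merge_bool(existing, inserted, mode="replace")
replaced_all = merge_bool(existing, inserted, mode="replace_all")
```

With this input, 'add' keeps both must clauses, 'replace' keeps only the new must clause but preserves filter, and 'replace_all' leaves nothing but the inserted must clause.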
-
scan()
Turn the search into a scan search and return a generator that will iterate over all the documents matching the query.
Use the params method to specify any additional arguments you wish to pass to the underlying scan helper from elasticsearch-py: https://elasticsearch-py.readthedocs.io/en/master/helpers.html#elasticsearch.helpers.scan
-
script_fields(**kwargs)
Define script fields to be calculated on hits. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html for more details.
Example:
s = Search()
s = s.script_fields(times_two="doc['field'].value * 2")
s = s.script_fields(
    times_three={
        'script': {
            'inline': "doc['field'].value * params.n",
            'params': {'n': 3}
        }
    }
)
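The example mixes two forms: a bare script string and a full dict. The sketch below shows one plausible way both end up in the request body, assuming that a bare string is shorthand for {'script': <string>} (the convention used by elasticsearch-dsl; whether pandagg does exactly this is an assumption). add_script_fields is a hypothetical helper, not a pandagg function.

```python
def add_script_fields(body, **fields):
    """Return a copy of `body` with the given script fields added."""
    new_body = dict(body)
    script_fields = dict(new_body.get("script_fields", {}))
    for name, definition in fields.items():
        if isinstance(definition, str):
            # Assumed shorthand: a bare string is treated as the script source.
            definition = {"script": definition}
        script_fields[name] = definition
    new_body["script_fields"] = script_fields
    return new_body


body = add_script_fields({}, times_two="doc['field'].value * 2")
body = add_script_fields(
    body,
    times_three={
        "script": {
            "inline": "doc['field'].value * params.n",
            "params": {"n": 3},
        }
    },
)
field_names = sorted(body["script_fields"])
```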
-
sort(*keys)
Add sorting information to the search request. If called without arguments it will remove all sort requirements. Otherwise it will replace them. Acceptable arguments are:
'some.field'
'-some.other.field'
{'different.field': {'any': 'dict'}}
so for example:
s = Search().sort(
    'category',
    '-title',
    {"price" : {"order" : "asc", "mode" : "avg"}}
)
will sort by category, title (in descending order) and price in ascending order using the avg mode.
The API returns a copy of the Search object and can thus be chained.
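A short sketch of how the three accepted key forms might translate into an Elasticsearch sort clause. build_sort is a hypothetical helper; expanding a leading '-' into {"order": "desc"} follows the description above and the elasticsearch-dsl convention, and is an assumption about pandagg's internals rather than its actual code.

```python
def build_sort(*keys):
    """Translate sort keys into the list used as the "sort" entry of a request body."""
    sort = []
    for key in keys:
        if isinstance(key, str) and key.startswith("-"):
            # '-field' is shorthand for descending order on 'field'.
            key = {key[1:]: {"order": "desc"}}
        sort.append(key)
    return sort


sort_clause = build_sort(
    "category",
    "-title",
    {"price": {"order": "asc", "mode": "avg"}},
)
no_sort = build_sort()  # calling without arguments yields no sort requirement
```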
-
source(fields=None, **kwargs)
Selectively control how the _source field is returned.
Parameters: fields – wildcard string, array of wildcards, or dictionary of includes and excludes
If fields is None, the entire document will be returned for each hit. If fields is a dictionary with keys of 'includes' and/or 'excludes' the fields will be either included or excluded appropriately.
Calling this multiple times with the same named parameter will override the previous values with the new ones.
Example:
s = Search()
s = s.source(includes=['obj1.*'], excludes=["*.description"])

s = Search()
s = s.source(includes=['obj1.*']).source(excludes=["*.description"])
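The override rule for named parameters can be sketched with a small function operating on the _source setting alone. update_source is a hypothetical helper and a simplification: it ignores edge cases such as passing both fields and keyword arguments.

```python
def update_source(current, fields=None, **kwargs):
    """Return the new _source setting after one call."""
    if fields is not None:
        # The positional form replaces the whole setting.
        return fields
    updated = dict(current) if isinstance(current, dict) else {}
    # Named parameters override earlier values of the same name only.
    updated.update(kwargs)
    return updated


one_call = update_source(None, includes=["obj1.*"], excludes=["*.description"])
chained = update_source(
    update_source(None, includes=["obj1.*"]), excludes=["*.description"]
)
overridden = update_source(chained, includes=["obj2.*"])
```

Under these assumptions the single call and the chained calls produce the same setting, and a later includes replaces the earlier one while excludes is left untouched.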
-
suggest(name, text, **kwargs)
Add a suggestions request to the search.
Parameters:
- name – name of the suggestion
- text – text to suggest on
All keyword arguments will be added to the suggestions body. For example:
s = Search()
s = s.suggest('suggestion-1', 'Elasticsearch', term={'field': 'body'})
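For orientation, the request body that the example above would be expected to produce can be built by hand. add_suggestion is a hypothetical helper that follows the standard Elasticsearch suggest layout (one entry per suggestion name, holding text plus the suggester body); it is a sketch, not pandagg's implementation.

```python
def add_suggestion(body, name, text, **kwargs):
    """Return a copy of `body` with one more named suggestion."""
    new_body = dict(body)
    suggest = dict(new_body.get("suggest", {}))
    entry = {"text": text}
    # Keyword arguments (e.g. term=...) go into the suggestion body as-is.
    entry.update(kwargs)
    suggest[name] = entry
    new_body["suggest"] = suggest
    return new_body


suggest_body = add_suggestion(
    {}, "suggestion-1", "Elasticsearch", term={"field": "body"}
)
```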
-
to_dict(count=False, **kwargs)
Serialize the search into the dictionary that will be sent over as the request's body.
Parameters: count – a flag to specify if we are interested in a body for count - no aggregations, no pagination bounds etc.
All additional keyword arguments will be included into the dictionary.
-