Principles

Note

This is a work in progress. Some sections are still incomplete.

pandagg is designed both for “regular” usage in a code repository and for “interactive” usage (in IPython or Jupyter notebooks, with autocompletion features inspired by the pandas design).

This library focuses on two principles:

  • stick to the tree structure of Elasticsearch objects
  • provide simple and flexible interfaces that are easy and intuitive to use interactively

Elasticsearch tree structures

Many Elasticsearch objects have a tree structure, i.e. they are built from a hierarchy of nodes:

  • a mapping (tree) is a hierarchy of fields (nodes)
  • a query (tree) is a hierarchy of query clauses (nodes)
  • an aggregation (tree) is a hierarchy of aggregation clauses (nodes)
  • an aggregation response (tree) is a hierarchy of response buckets (nodes)
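As a plain-Python illustration of the mapping case (this is not pandagg's API), the sketch below walks a hypothetical mapping body in which object fields nest their sub-fields under properties, and yields every field node with its dotted path. For brevity it ignores multi-fields (fields) and all other mapping parameters.

```python
from typing import Any, Dict, Iterator, Tuple

# Hypothetical mapping body: object fields nest their children under "properties".
MAPPING: Dict[str, Any] = {
    "properties": {
        "title": {"type": "text"},
        "author": {
            "properties": {
                "name": {"type": "keyword"},
                "birth_year": {"type": "integer"},
            }
        },
        "tags": {"type": "keyword"},
    }
}


def iter_fields(mapping: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Depth-first walk yielding (dotted_path, field_type) for each field node."""
    for name, field in mapping.get("properties", {}).items():
        path = f"{prefix}{name}"
        # A field with "properties" but no explicit "type" is an object field.
        yield path, field.get("type", "object")
        # Sub-fields are the child nodes of this field in the mapping tree.
        yield from iter_fields(field, prefix=f"{path}.")


for path, field_type in iter_fields(MAPPING):
    print(f"{path}: {field_type}")
```

Each printed line corresponds to one node of the mapping tree; the dotted path encodes its position in the hierarchy.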

This library aims to stick to that structure by providing a flexible syntax that distinguishes trees from nodes.
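To make the tree/node distinction concrete, here is a minimal sketch using hypothetical classes (AggNode and AggTree are illustrative names, not pandagg classes): a node holds a single aggregation clause, while the tree holds the parent/child links and can rebuild the nested Elasticsearch body. For brevity it assumes aggregation names are unique across the whole tree (Elasticsearch only requires uniqueness among siblings) and that each clause has exactly one type key besides its sub-aggregations.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Keys of an aggregation clause that hold sub-aggregations rather than the clause itself.
SUB_AGG_KEYS = ("aggs", "aggregations")


@dataclass
class AggNode:
    """Hypothetical node: one aggregation clause, without its sub-aggregations."""
    name: str
    agg_type: str
    params: Dict[str, Any]


@dataclass
class AggTree:
    """Hypothetical tree: stores nodes by name plus the parent -> children links."""
    nodes: Dict[str, AggNode] = field(default_factory=dict)
    children: Dict[Optional[str], List[str]] = field(default_factory=dict)

    def insert(self, node: AggNode, parent: Optional[str] = None) -> None:
        self.nodes[node.name] = node
        self.children.setdefault(parent, []).append(node.name)

    @classmethod
    def from_dict(cls, body: Dict[str, Any]) -> "AggTree":
        tree = cls()
        tree._load(body, parent=None)
        return tree

    def _load(self, body: Dict[str, Any], parent: Optional[str]) -> None:
        for name, clause in body.items():
            # Assumes exactly one clause-type key (e.g. "terms") besides sub-aggregations.
            agg_type, params = next(
                (k, v) for k, v in clause.items() if k not in SUB_AGG_KEYS
            )
            self.insert(AggNode(name, agg_type, params), parent)
            for key in SUB_AGG_KEYS:
                self._load(clause.get(key, {}), parent=name)

    def to_dict(self, parent: Optional[str] = None) -> Dict[str, Any]:
        """Rebuild the nested aggregation body (sub-aggregations under "aggs")."""
        body: Dict[str, Any] = {}
        for name in self.children.get(parent, []):
            node = self.nodes[name]
            clause: Dict[str, Any] = {node.agg_type: node.params}
            sub = self.to_dict(parent=name)
            if sub:
                clause["aggs"] = sub
            body[name] = clause
        return body


body = {
    "per_tag": {
        "terms": {"field": "tags"},
        "aggs": {"avg_year": {"avg": {"field": "author.birth_year"}}},
    }
}
tree = AggTree.from_dict(body)
print(tree.children)  # {None: ['per_tag'], 'per_tag': ['avg_year']}
assert tree.to_dict() == body
```

Keeping the hierarchy in the tree rather than in the nodes means a node can be inspected or replaced on its own, while the nested body is regenerated only when it is needed.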

Interactive usage

Some classes are intended only for interactive use (IPython): their purpose is to provide auto-completion and convenient representations, so they are of little use outside an interactive session.

Namely:

  • pandagg.mapping.IMapping: used to navigate a mapping interactively and run quick aggregations on some of its fields
  • pandagg.client.Elasticsearch: used to discover cluster indices and, optionally, navigate their mappings or run quick aggregations or queries
  • pandagg.agg.AggResponse: used to navigate an aggregation response interactively

These use cases are detailed in the following sections.
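The auto-completion that interactive classes offer can rest on standard Python hooks. As a hypothetical sketch (FieldNavigator is an illustrative name, not one of the pandagg classes listed above, and its internals are an assumption), an object can expose mapping sub-fields as attributes by overriding __getattr__ and advertising them through __dir__; IPython's tab-completion typically calls dir() on live objects, which is the same mechanism pandas uses to complete DataFrame column names.

```python
from typing import Any, Dict, List


class FieldNavigator:
    """Hypothetical helper exposing mapping sub-fields as attributes."""

    def __init__(self, name: str, field: Dict[str, Any]) -> None:
        self._name = name
        self._field = field

    def _children(self) -> Dict[str, Any]:
        return self._field.get("properties", {})

    def __dir__(self) -> List[str]:
        # Adding child field names here makes them show up after "obj.<TAB>".
        return sorted(list(super().__dir__()) + list(self._children()))

    def __getattr__(self, item: str) -> "FieldNavigator":
        # Only called when normal lookup fails; read __dict__ directly to avoid
        # infinite recursion if _field is not set yet (e.g. during copying).
        children = self.__dict__.get("_field", {}).get("properties", {})
        if item in children:
            return FieldNavigator(item, children[item])
        raise AttributeError(item)

    def __repr__(self) -> str:
        field_type = self._field.get("type", "object")
        return f"<{self._name}: {field_type}, children={list(self._children())}>"


mapping = {
    "properties": {
        "author": {"properties": {"name": {"type": "keyword"}}},
        "title": {"type": "text"},
    }
}
root = FieldNavigator("root", mapping)
print(root.author)       # <author: object, children=['name']>
print(root.author.name)  # <name: keyword, children=[]>
```

In an IPython session, typing root. followed by Tab should then list author and title among the completions.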