Getting started with extrapol8

extrapol8 is available as both a Scala library and a Python library, enabling you to use Apache Spark for general analytics on a GeoSpock dataset.

The primary abstraction is a Spark dataset, with a schema that contains columns for:

  • latitude
  • longitude
  • time
  • (optionally) additional dataset-specific columns

extrapol8 gives you programmatic access to GeoSpock datasets to create bespoke data queries and extract source input data, using a library built on top of Apache Spark. You can load data from any geo-temporal region into an Apache Spark dataset in parallel and then explore your data using standard Spark constructs, including SQL.

Prerequisites

To use the extrapol8 libraries, you need:

  • a deployment of the GeoSpock stack and extrapol8 in your account on your cloud services platform, such as Amazon Web Services (AWS)
  • a dataset ingested into your GeoSpock stack

Dependencies

extrapol8 has the following dependencies:

  • Spark version 2.3
  • Python 3.4
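For a Python environment, a requirements fragment matching the versions above might look like the following (the extrapol8 package itself is distributed with your GeoSpock deployment, so its exact coordinates are omitted):

```text
# Pins matching the dependency versions listed above.
pyspark==2.3.*
```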

Using extrapol8

Be aware that this is an early release of the API and it is not yet finalized; the API may change without notice.

Other resources

To find out more about:

  • illumin8: a data analytics and visualization tool that dynamically updates the map view and statistics as you modify the filter; see Getting started with illumin8
  • dataset management: for more information about your datasets, see Viewing your datasets
  • user management: for information about creating accounts and assigning roles, see Managing users