Getting started with extrapol8

extrapol8 is available as both a Scala library and a Python library. It enables you to use Apache Spark for general analytics on GeoSpock datasets.

The primary abstraction is a Spark dataset, with a schema that contains columns for:

  • latitude
  • longitude
  • time
  • (optionally) additional dataset-specific columns
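To illustrate the row shape described above, the following plain-Python sketch represents one row as a dictionary and checks that the three core columns are present and plausible. It does not use extrapol8 or Spark. The exact column names, the range checks, and the "vessel_id" extra column are assumptions for illustration, not the real extrapol8 schema. It avoids f-strings and the typing module so that it also runs on Python 3.4.

```python
from datetime import datetime

# Assumed column names, taken from the schema description; the real
# extrapol8 column names and types may differ.
CORE_COLUMNS = ("latitude", "longitude", "time")


def is_valid_row(row):
    """Return True if `row` (a dict) has the core geo-temporal columns
    with plausible values. Extra dataset-specific keys are allowed."""
    if not all(col in row for col in CORE_COLUMNS):
        return False
    lat, lon, t = row["latitude"], row["longitude"], row["time"]
    return (-90.0 <= lat <= 90.0
            and -180.0 <= lon <= 180.0
            and isinstance(t, datetime))


# One row with a hypothetical dataset-specific column, "vessel_id".
row = {
    "latitude": 52.2053,
    "longitude": 0.1218,
    "time": datetime(2018, 6, 1, 12, 0),
    "vessel_id": "ABC123",
}
print(is_valid_row(row))  # True
```

In Spark, the same idea would be expressed as a typed schema on the dataset, with any dataset-specific columns appearing alongside the core ones.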

Because extrapol8 is built on top of Apache Spark, it gives you programmatic access to GeoSpock datasets so that you can create bespoke queries and extract source input data. You can load data from any geo-temporal region (an area bounded in both space and time) into a Spark dataset in parallel, and then explore it using standard Spark constructs, including Spark SQL.
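Conceptually, selecting a geo-temporal region amounts to a bounding-box test on latitude and longitude combined with a time window. The plain-Python sketch below shows that predicate on a few in-memory rows. It is not the extrapol8 API: the function name in_region, the half-open time window, and the sample coordinates are all hypothetical, and a simple box like this does not handle regions that cross the antimeridian.

```python
from datetime import datetime


def in_region(row, min_lat, max_lat, min_lon, max_lon, start, end):
    """Hypothetical helper: True if the row lies inside the bounding box
    (inclusive) and inside the half-open time window [start, end)."""
    return (min_lat <= row["latitude"] <= max_lat
            and min_lon <= row["longitude"] <= max_lon
            and start <= row["time"] < end)


rows = [
    {"latitude": 51.50, "longitude": -0.12, "time": datetime(2018, 6, 1, 9)},
    {"latitude": 48.85, "longitude": 2.35, "time": datetime(2018, 6, 1, 9)},
    {"latitude": 51.51, "longitude": -0.10, "time": datetime(2018, 6, 2, 9)},
]

# Example bounds: a box roughly around London, for one day.
selected = [
    r for r in rows
    if in_region(r, 51.3, 51.7, -0.5, 0.3,
                 datetime(2018, 6, 1), datetime(2018, 6, 2))
]
print(len(selected))  # 1
```

Only the first row passes: the second is outside the box, and the third falls after the end of the time window. In Spark, the same condition could be written as a DataFrame filter or as a SQL WHERE clause after registering the dataset with createOrReplaceTempView, and Spark would evaluate it in parallel across partitions. This sketch does not reflect how extrapol8 itself selects data from a GeoSpock dataset.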


To use the extrapol8 libraries, you need:

  • a deployment of the GeoSpock stack and extrapol8 in your account on a cloud services platform, such as Amazon Web Services (AWS)
  • a dataset ingested into your GeoSpock stack


extrapol8 has the following dependencies:

  • Spark version 2.3
  • Python 3.4

Using extrapol8

Be aware that this is an early release and that the API is not yet final. The API will continue to change without notice.

