The GeoSpock database architecture

The GeoSpock database components run on a set of Amazon Web Services (AWS) resources in an AWS cluster. The exception is your Identity Provider (IdP), which may be hosted outside of AWS (see User authentication and authorization in the GeoSpock database for more information).

The GeoSpock database AWS resources include:

  • the core services, including an ingestor cluster
  • a deployment machine
  • the SQL access cluster
  • databases and S3 buckets for data persistence

Core services

The core services enable you to ingest and then access your data. They include the following:

  • Management service: this service provides you with access to your datasets. It runs on a single EC2 instance that is responsible for monitoring and managing data load (ingest) tasks.
  • Dataset administration: this service enables you to manage your datasets.
  • Authorization service: this service runs on an Elastic Beanstalk instance that manages user authorization for the GeoSpock database.
  • Ingestor cluster: this cluster contains components that process your source input data and create the associated indexes. It is deployed automatically when a data ingest is triggered.

Deployment machine

The deployment machine triggers the deployment of the GeoSpock database components. It uses Terraspock, a command-line orchestration tool, to create and configure the required AWS resources. This is the first component of the GeoSpock database stack that you should deploy.

SQL access

The SQL access cluster is an Amazon EMR cluster that contains the GeoSpock database SQL engine, which provides optimized SQL functions for analyzing your ingested data. This is the last component of the GeoSpock database that you deploy.

Data persistence

The data persistence resources store the GeoSpock database's metadata and log files, as well as the ingested data. This includes resources for:

  • the database
  • the metadata for the GeoSpock database
  • backups of the supporting databases
  • logging and auditing

These resources are created automatically when you deploy the core services for the GeoSpock database.
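
The deployment order described in the sections above can be summarized in a small data model. The following Python sketch is illustrative only: the class names, field names, and the `manual_steps` helper are hypothetical and are not part of Terraspock or any GeoSpock API. It also assumes that the core services are deployed between the deployment machine and the SQL access cluster, which the text implies but does not state outright.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    """How a component comes into existence (hypothetical labels)."""
    MANUAL = "deployed by you"
    AUTOMATIC = "created automatically"


@dataclass(frozen=True)
class Component:
    name: str
    trigger: Trigger
    note: str


# Order follows the text: deployment machine first, SQL access cluster last.
# The position of the core services in the middle is an assumption.
DEPLOYMENT_ORDER = [
    Component("deployment machine", Trigger.MANUAL,
              "first component to deploy; runs Terraspock"),
    Component("core services", Trigger.MANUAL,
              "management, dataset administration, authorization"),
    Component("data persistence", Trigger.AUTOMATIC,
              "created when the core services are deployed"),
    Component("SQL access cluster", Trigger.MANUAL,
              "last component to deploy"),
]

# Not part of the initial deployment sequence: created on demand.
ON_DEMAND = [
    Component("ingestor cluster", Trigger.AUTOMATIC,
              "deployed when a data ingest is triggered"),
]


def manual_steps(components):
    """Return the names of components you deploy yourself, in order."""
    return [c.name for c in components if c.trigger is Trigger.MANUAL]


if __name__ == "__main__":
    for step, name in enumerate(manual_steps(DEPLOYMENT_ORDER), start=1):
        print(f"{step}. {name}")
```

Separating the automatically created components from the manual ones makes it clear that only three deployment steps require action from you; the data persistence resources and the ingestor cluster appear as a side effect of other actions.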