Introduction to Geospatial Queries

This document provides an introduction to SQL queries in GeoSpock DB. Visit the "Query your Data" section of the main product documentation for further details.

GeoSpock DB uses indexing to optimize retrieval operations that filter data based on conditions on indexed columns. GeoSpock DB can index on location (latitude and longitude) and timestamp, helping to minimize the amount of data scanned in geospatial and geotemporal queries. Additionally, an index can be applied to a column specifying some sort of identifier in order to improve the performance of related queries.

Warning: We strongly advise against running queries that perform full table scans on large datasets, such as SELECT COUNT(*) FROM geospock.default.sg_erp2_synthesis;. Since the computational cost of a query is proportional to the amount of data scanned, these query types require large amounts of resources. Note that, in most cases, queries limit the amount of data scanned by filtering on an indexed field in the WHERE section of the query. In case queries like this are run, they can be stopped by cancelling the query.

There are several types of geotemporal queries that are frequently used in geotemporal analysis. In this document we describe several of these data access patterns and show how they can be implemented within GeoSpock DB.

1. Data access pattern: historical localized snapshot

  • Dataset A: a large dataset of events data
  • Task: retrieve data within a specific bounding box for a given timeframe

Example query

SELECT *
  FROM geospock.default.A as events
 WHERE events.longitude BETWEEN 103.950711575 AND 103.955711575 AND events.latitude BETWEEN 1.36305117308 AND 1.36805117308
   AND (events.timestamp BETWEEN TIMESTAMP '2019-08-30 17:50:00' AND TIMESTAMP '2019-08-30 17:51:00')

Example use cases

Vertical Use case
Maritime Study port or vessel incident
Insurance / automotive Accident analysis or anomaly exploration
MaaS / smart cities Anomaly exploration

2. Data access pattern: polygon queries for region filtering

  • Dataset A: a large dataset of events, POIs or Sensor data
  • Dataset B (Optional): polygons defining regions. Note that this dataset is optional, as the polygon can be specified directly in the query in WKT format.
  • Task: extract data for A within a specific polygon in B

Example query

SELECT events.*
  FROM geospock.default.A as events
  JOIN geospock.default.B as regions
    ON ST_Within( ST_Point(events.longitude, events.latitude), st_geometryfromtext( regions.polygon )
 WHERE regions.id = 'desired_region'

Example use cases across industries

Vertical Use case
Government Provide a centralized data repository that can serve datasets for different regions (nations, districts, counties, local authorities, cities, neighborhoods … )
Maritime Provide access to AIS data to different ports or areas. Added advantage of usually also requiring device and time filters for other use cases.
Automobile Analyze traffic in cities, road segments

3. Data access pattern: polygon queries for regional statistics

  • Dataset A: a large dataset of events, POIs or Sensor data
  • Dataset B: polygons defining regions
  • Task: extract summary statistics for data in A for each polygon in B

Example query

  SELECT count(events.id) as desired_stat, regions.id
    FROM geospock.default.A AS events
    JOIN geospock.default.B AS regions
      ON ST_Within( ST_Point(events.longitude, events.latitude), st_geometryfromtext( regions.polygon )
GROUP BY regions.id

Example use cases across industries

Vertical Use case
Government Provide reporting summary stats for regions
Maritime Provide reporting summary stats for regions/ports

4. Data access pattern: unique vehicle or fleet history by device ID set

  • Dataset A: a really large dataset of events data for devices (doesn’t have to have a geospatial component)
  • Task: Retrieve data for a small set of devices

Example query

SELECT *
  FROM geospock.default.A AS events
 WHERE events.id IN  ('deviceid1', 'deviceid2',...)
 AND events.timestamp BETWEEN TIMESTAMP '2019-01-01 00:01:01' AND TIMESTAMP '2019-01-14 23:59:59'

Example use cases across industries

Vertical Use case
Maritime / supply chain Analysis for each fleet owner

5. Data access pattern: POI filter & device tracking

  • Dataset A: a really large dataset of events data
  • Task: calculate which devices have been within a specific polygon/POI and then go through each device and see the history of the device’s locations

Example query

  SELECT events.deviceid
    FROM geospock.default.A as events
   WHERE ST_DISTANCE(ST_POINT(21.4, 22.1), ST_POINT(signals.lon, signals.lat)) < 0.0045)
     AND events.timestamp BETWEEN TIMESTAMP '2019-01-04 00:01:01' AND TIMESTAMP '2019-01-04 23:59:59'
GROUP BY events.deviceid

SELECT events.*
  FROM geospock.default.A as events
 WHERE events.deviceid in ( 'deviceid1', 'deviceid2',...)
   AND events.timestamp BETWEEN TIMESTAMP '2019-01-01 00:01:01' AND TIMESTAMP '2019-01-14 23:59:59'

Example use cases across industries

Vertical Use case
Datatech A datatech company wants to see what other places have been visited by people that have visited a specific retailer
National security / public health Contact tracing: given a location that poses a risk, identify those that have been there and where else they have been
Insurance / fintech Given a location that poses a risk (financial or health), identify those that have been there and where else they have been

6. Data access pattern: Spatial join of events vs. POIs

  • Dataset A: a dataset of events data
  • Dataset B: a dataset of POIs
  • Task: Obtain events (of dataset A) that are in proximity to each of the POIs in dataset B

Note: instead of all attributes for events, in some cases you may just want to get unique IDs or even an aggregate count of unique IDs

Example query

SELECT events.*,pois.id
  FROM geospock.default.A as events
  JOIN geospock.default.B as pois
    ON ST_DISTANCE(ST_POINT(pois.lon, pois.lat), ST_POINT(events.lon, events.lat)) < 0.0045)

Example use cases across industries

Vertical Use case
Datatech A datatech company wants to calculate footfall in different public venues for profiling and estimating a target audience
A datatech company wants to calculate footfall in different public venues for pricing for area-specific ads
Telecom A telecom wants to calculate footfall in different public venues for profiling users
Wants to see the occupancy of each cell tower (events close to a cell tower)
Smart city Wants to estimate air quality (or other environmental sensors) in or close to specific areas, where sensors are installed in mobile vehicles
Assets and logistics / automobile / motor insurance Calculate approximate number of vehicles in key highways/road segments