Querying indexed data
Users can potentially benefit from the indexing of certain fields (latitude, longitude, timestamp and source id) in order to improve query performance. For the best performance, datasets should be indexed according to the types of queries that users will run. For more information on how to index data, see Specifying the type of dataset.
In the following sections we will provide some guidelines for creating queries that take advantage of indexing. Although each section describes a different index (time, location and source id), it is important to note that queries can use multiple indexes in conjunction.
For examples of how GeoSpock indexing is used to optimize different use cases, please visit the Introduction to Geospatial Queries
Location (latitude and longitude)
As part of your data analysis, you may want to focus on a specific regions to answer different business questions. The GeoSpock database provides the following optimized geometry expressions which benefit in performance with location indexing:
- Simple bounding box
- Points within/intersects with a geometry shape
- Line locate point
- Recorded geometries intersect with a query geometry shape
Optimized standard geospatial functions include:
Additional custom GeoSpock functions include:
Using geometry functions provides detailed guidance on how to query location data with the GeoSpock database. Geospatial joins describes in detail how to perform such operations, which are of particularly useful in many practical use cases.
In your data analysis, you may be interested in events related to a specific time range. You can use the GeoSpock database's optimized time queries by specifying a:
- single time window in the SQL query, for example:
SELECT * FROM geospock.default.largetable WHERE logtime BETWEEN timestamp '2019-02-01 00:00:00' AND timestamp '2019-03-01 00:00:00'
- time interval around a specific time point set per POI, for example:
SELECT large.* FROM geospock.default.largetable large JOIN geospock.default.smalltable small ON ST_Distance(ST_Point(large.longitude, large.latitude), ST_Point(small.longitude, small.latitude)) < 0.001 AND small.poitime - large.eventtime < interval '1' day;
Both of these examples benefit from having the time/timestamp column indexed for events-based data.
There are some cases where indexing won't benefit query performance because an operation has been applied on the timestamp column
SELECT * FROM geospock.default.largetable WHERE ( logtime + interval '2' day ) < timestamp '2019-02-01 00:00:00'
However, this query can re-written and by moving the function to the other side as shown in the next example, it take advantage of the indexing:
SELECT * FROM geospock.default.largetable WHERE logtime < ( timestamp '2019-02-01 00:00:00' - interval '2' day )
During ingest, the GeoSpock database converts timestamps to the UTC time zone for storage, so by default timestamps returned by queries are in UTC. However, any timezone adjustments you make in your queries will be respected.
Source (source id)
In your data analysis, you may be interested in one or more specific sources of data, which can represent entities such as sensors or vehicles. Accessing them efficiently would be done by source id filtering.
SELECT * FROM geospock.default.largetable WHERE deviceid= 'id'
To have multiple source id filters, use the OR statement
SELECT * FROM geospock.default.largetable WHERE deviceid= 'id1' OR deviceid='id2'