Using the GeoSpock DB Presto connector

The GeoSpock DB Presto connector enables you to run SQL queries on your ingested data, and has been optimized to run a subset of SQL functions. You can use other SQL functions in your queries but they may run more slowly.

The following guidelines will enable you to get the best performance from the GeoSpock DB Presto connector when constructing your queries. These guidelines cover:

Joining tables

To get the best performance from the GeoSpock database Presto connector, the table containing event data must be on the left of the JOIN and the table containing point of interest (POI) data on the right. For example:

SELECT *
FROM geospock.default.events AS large
JOIN geospock.default.poi AS small
ON large.exampleid = small.exampleid;

Defining geofences

The size of the geofence can be provided as a radius in either degrees or meters. For the best performance, you should define this value in your query, rather than deriving the geofence radius from an object. To define a geofence using:

  • degrees, your query will look like this:

    SELECT *
    FROM geospock.default.largetable AS large,
    geospock.default.smalltable AS small
    WHERE ST_DISTANCE(ST_POINT(small.longitude, small.latitude), ST_POINT(large.longitude, large.latitude)) < 0.0005;
  • meters, your query will look like this:

    SELECT * 
    FROM geospock.default.events event
    JOIN geospock.default.pois poi
    ON GS_Distance_Within(to_spherical_geography(ST_POINT(poi.longitude, poi.latitude)), to_spherical_geography(ST_POINT(event.longitude, event.latitude)), 10);

    Note that this query uses the GeoSpock custom function GS_Distance_Within which has been optimized to run spatial join queries based on the distance (in meters) between two points using spherical geography. Alternatively, you could use the other optimized spatial join function, gs_great_circle_distance_within, instead:

    SELECT *
    FROM geospock.default.events event
    JOIN geospock.default.pois poi
    ON gs_great_circle_distance_within(poi.longitude, poi.latitude, event.longitude, event.latitude, 10);

Using a stored radius

If the POI table also includes a defined radius for each POI, you can use this value in the SQL query. However, in this case, you cannot apply an operational expression to this radius, such as multiplying the radius by a factor. If the radius is stored in degrees, for example, the SQL query would look something like this:

SELECT *
FROM geospock.default.largetable AS large,
geospock.default.smalltable AS small
WHERE ST_DISTANCE(ST_POINT(small.longitude, small.latitude), ST_POINT(large.longitude, large.latitude)) < small.radius;

Calculating the distance between two spatial locations

To get the best performance, use the following spatial function with the GeoSpock database:

ST_Distance(ST_POINT(longitude, latitude), ST_POINT(longitude, latitude))

The GeoSpock database can run other spatial functions but these queries may take longer to execute.

Naming POI table columns

The table containing the point of interest data must label the columns containing the location points as longitude and latitude respectively.

Specifying catalogs and schemas

GeoSpock database does not support the USE function for specifying catalogs or schemas implicitly in queries. You must use fully-qualified table names when running queries against your ingested data into the GeoSpock database, in the format:

<catalog>.<schema>.<table> 

For example:

SELECT *
FROM geospock.default.largetable AS large
JOIN geospock.default.smalltable AS small
ON large.exampleid = small.exampleid;

Creating tables for results

When running queries involving large amounts of data, you can improve the GeoSpock database's memory usage and execution time of spatial JOIN queries by creating a table using the spatial_partitioning(Geometry) function. For example:

CREATE TABLE hive.default.t_partitioning AS SELECT 
spatial_partitioning(ST_POINT(longitude, latitude)) AS v FROM
geospock.<schema_name>.<table_name>;

To use this table, set the SESSION property spatial_partitioning_table_name to the name of the table you have created, as follows:

SET SESSION spatial_partitioning_table_name = 'hive.default.t_partitioning';