Saving query results to an external table

Saving your query results to a table rather than streaming them back to the client allows you to perform further queries on the data, gives you a faster way to output the results, as well as creating a place where the results can be stored.

You can create and manage external tables using the Hive connector; follow the documentation on how to configure the Hive connector to be able to read from and write to an Amazon S3 bucket.

Be aware that:

  • an external table does not use the GeoSpock database's indexing and optimizations that have been applied to the ingested data
  • you will lose access to your external tables if the SQL cluster is destroyed or redeployed (e.g. to resize the cluster)

Prerequisites

To use an external table you need the name of the S3 bucket designated as the Hive storage bucket for your deployment. This bucket has read / write access for the GeoSpock database and will be used when creating the Hive schema.

Before creating a new table, make sure that the file location for the table is empty; if the file location already contains a table or schema, follow the instructions for deleting a table and schema.

Creating a table and its schema

You create an external table by creating a schema in an external location (making sure to specify <hive_storage_bucket_name> which is the designated Hive storage bucket selected during the deployment of the SQL cluster) and then creating a table. Contact your system administrator to get the name of the designated bucket.

To create a schema, at the Presto prompt, use the following command:

CREATE SCHEMA hive.<schema_name> WITH (location = 's3://<hive_storage_bucket_name>/<path_to_folder>');

To create a table, at the Presto prompt, use the following command:

CREATE TABLE hive.<schema_name>.<table_name> WITH (format = 'TEXTFILE') AS <your_query>

When you create the table, you can specify its storage format. In the example above, the table is stored in TEXTFILE format, a human-readable, plain text format. You may find that ORC or PARQUET provide better performance, but this may depend upon your use case. See the Hive documentation for a full list of the formats that are supported.

Destroying or resizing a GeoSpock database cluster will cause any schemas and tables created in Hive to be destroyed. Note that the data in S3 will not be deleted.

Deleting a table and its schema

You should delete an external table before creating a new table in an existing file location.

To delete a table, at the Presto prompt, use the following command:

DROP TABLE IF EXISTS hive.<schema_name>.<table_name>;

To delete a schema, at the Presto prompt, use the following command:

DROP SCHEMA IF EXISTS hive.<schema_name>;