Saving query results to an external table

Saving your query results to a table rather than streaming them back to the client allows you to perform further queries on the data, gives you a faster way to output the results, as well as creating a place where the results can be stored.

You can create and manage external tables using the Hive connector; follow the documentation on how to configure the Hive connector to be able to read from and write to an Amazon S3 bucket.

Be aware that:

  • an external table does not use the GeoSpock database's indexing and optimizations that have been applied to the ingested data
  • you will not be able to access the external table once the GeoSpock database cluster has been destroyed
  • the table name must be unique as you cannot re-use the name of table that has been deleted

Prerequisites

To use an external table you need an S3 bucket with read / write access for the GeoSpock database; follow the AWS instructions for how to create an Amazon S3 bucket.

Before creating a new table, make sure that the file location for the table is empty; if the file location already contains a table or schema, follow the instructions for deleting a table and schema.

Creating a table and its schema

You create an external table by creating a schema in an external location and then creating a table.

To create a schema, at the Presto prompt, use the following command:

CREATE SCHEMA hive.<schema_name> WITH (location = 's3://<bucket_name>/<path_to_folder>');

To create a table, at the Presto prompt, use the following command:

CREATE TABLE hive.<schema_name>.<table_name> WITH (format = 'TEXTFILE') AS <your_query>

When you create the table, you can specify its storage format. In the example above, the table is stored in TEXTFILE format, a human-readable, plain text format. You may find that ORC or PARQUET provide better performance, but this may depend upon your use case. See the Hive documentation for a full list of the formats that are supported.

Deleting a table and its schema

You should delete an external table before:

  • destroying a GeoSpock database cluster; if you do not delete the table and its schema before you destroy the cluster, you will have to delete these resources from the S3 bucket using the AWS console
  • creating a new table in an existing file location

To delete a table, at the Presto prompt, use the following command:

DROP TABLE IF EXISTS hive.<schema_name>.<table_name>;

To delete a schema, at the Presto prompt, use the following command:

DROP SCHEMA IF EXISTS hive.<schema_name>;