Saving query results to an external table
Saving your query results to a table rather than streaming them back to the client allows you to perform further queries on the data, gives you a faster way to output the results, as well as creating a place where the results can be stored.
You can create and manage external tables using the Hive connector; follow the documentation on how to configure the Hive connector to be able to read from and write to an Amazon S3 bucket.
Be aware that:
- an external table does not use the GeoSpock database's indexing and optimizations that have been applied to the ingested data
- you will not be able to access the external table once the GeoSpock database cluster has been destroyed
- the table name must be unique as you cannot re-use the name of table that has been deleted
To use an external table you need an S3 bucket with read / write access for the GeoSpock database; follow the AWS instructions for how to create an Amazon S3 bucket.
Before creating a new table, make sure that the file location for the table is empty; if the file location already contains a table or schema, follow the instructions for deleting a table and schema.
Creating a table and its schema
You create an external table by creating a schema in an external location and then creating a table.
To create a schema, at the Presto prompt, use the following command:
CREATE SCHEMA hive.<schema_name> WITH (location = 's3://<bucket_name>/<path_to_folder>');
To create a table, at the Presto prompt, use the following command:
CREATE TABLE hive.<schema_name>.<table_name> WITH (format = 'TEXTFILE') AS <your_query>
When you create the table, you can specify its storage format. In the example above, the table is stored in TEXTFILE format, a human-readable, plain text format. You may find that ORC or PARQUET provide better performance, but this may depend upon your use case. See the Hive documentation for a full list of the formats that are supported.
Deleting a table and its schema
You should delete an external table before:
- destroying a GeoSpock database cluster; if you do not delete the table and its schema before you destroy the cluster, you will have to delete these resources from the S3 bucket using the AWS console
- creating a new table in an existing file location
To delete a table, at the Presto prompt, use the following command:
DROP TABLE IF EXISTS hive.<schema_name>.<table_name>;
To delete a schema, at the Presto prompt, use the following command:
DROP SCHEMA IF EXISTS hive.<schema_name>;