Using datasets
A dataset is a set of indexed data from a single source, such as event data, Point of Interest (POI) data, or sensor data, where the source input data has been processed and loaded into the GeoSpock database. You can access and manage your datasets using the GeoSpock CLI. Using this command line tool, you can:
- create a data source description for your dataset (data-source-description); see Building a data source description
- create a new dataset and ingest data into it (dataset-create); see Ingesting source input data
- add data to an existing dataset, sometimes referred to as an incremental ingest (dataset-add-data); see Ingesting source input data
- list the datasets (dataset-list); see Listing datasets
- get the status of a dataset (dataset-status); see Getting information about a dataset
- get the history of a dataset (dataset-operations); see Getting information about a dataset
- get the data source description used for a dataset (dataset-data-source-description); see Getting information about a dataset
- give a user group permission to access a dataset (dataset-permission-grant); see Giving a GeoSpock database user group permission to access the dataset
- remove a user group's permission to access a dataset (dataset-permission-revoke); see Removing permission to access a dataset from a user group
- delete a dataset (dataset-delete); see Deleting a dataset
See The GeoSpock CLI for more information about how to install and use this command line interface.
Listing datasets
To list all the datasets that you have permission to view, use the dataset-list command:
$ geospock dataset-list --page-index <value> --page-size <value>
You can supply the page index (pages are numbered from 0), the number of datasets per page, or both. By default, page 0 is returned with 1000 datasets listed per page.
For more information about this command, use the GeoSpock CLI's help command.
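If you call dataset-list from a script, it can help to build the argument list in one place. The following is a minimal Python sketch, not part of the GeoSpock CLI: the helper names dataset_list_args and page_of are hypothetical, and only the two flags shown above are used. It does not parse the command's output, because the output format is not shown here.

```python
def dataset_list_args(page_index=None, page_size=None):
    """Build the argument list for `geospock dataset-list`.

    Hypothetical helper: a flag is left out when its value is not supplied,
    so the CLI falls back to its defaults (page 0, 1000 datasets per page).
    """
    if page_index is not None and page_index < 0:
        raise ValueError("page indexes start at 0")
    args = ["geospock", "dataset-list"]
    if page_index is not None:
        args += ["--page-index", str(page_index)]
    if page_size is not None:
        args += ["--page-size", str(page_size)]
    return args


def page_of(position, page_size=1000):
    """Return the 0-based page index that holds the dataset at a 0-based position."""
    return position // page_size
```

The returned list can be passed directly to subprocess.run, assuming the geospock executable is on your PATH.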
Getting information about a dataset
Using the GeoSpock CLI, you can get the following information about a specific dataset:
- its current status: use the dataset-status command to get information about a specified dataset, including its title, a summary of its contents, any groups that have permission to access it and the status of the most recent operation on that dataset. For example:
$ geospock dataset-status --dataset-name ingesttest1
{
  "id": "ingesttest1",
  "title": "ingesttest1",
  "description": "Ingested from \u201cingesttest1\u201d on 1/23/2020-09:45:49",
  "createdDate": "2020-01-23T09:45:50Z",
  "access": [
    {
      "schemaName": "default",
      "datasetName": "ingesttest1",
      "permissions": [
        {
          "grantType": "GRANT",
          "entitiesWithAccess": []
        },
        {
          "grantType": "MODIFY",
          "entitiesWithAccess": []
        },
        {
          "grantType": "READ",
          "entitiesWithAccess": []
        },
        {
          "grantType": "VIEW",
          "entitiesWithAccess": []
        }
      ]
    },
    {
      "schemaName": "default",
      "datasetName": "*",
      "permissions": [
        {
          "grantType": "GRANT",
          "entitiesWithAccess": [
            "root.user@geospock.com"
          ]
        },
        {
          "grantType": "MODIFY",
          "entitiesWithAccess": []
        },
        {
          "grantType": "READ",
          "entitiesWithAccess": []
        },
        {
          "grantType": "VIEW",
          "entitiesWithAccess": []
        }
      ]
    }
  ],
  "operationStatus": {
    "id": "opr-ingesttest1-7",
    "label": "Data ingested",
    "type": "INGEST",
    "status": "COMPLETED",
    "lastModifiedDate": "2020-01-22T16:57:14.985Z",
    "createdDate": "2020-01-23T09:45:51Z"
  }
}
For more information about this command, use the GeoSpock CLI's help command.
- its history: use the dataset-operations command to get a history of all the operations that have been performed on that dataset. For example:
$ geospock dataset-operations --dataset-name ingesttest1
{
  "listInfo": {
    "totalItemCount": 2,
    "pageCount": 1
  },
  "operations": [
    {
      "label": "Data ingested",
      "type": "INGEST",
      "status": "COMPLETED",
      "createdDate": "2020-01-23T09:45:51Z",
      "lastModifiedDate": "2020-01-23T10:00:09.651Z"
    },
    ...
For more information about this command, use the GeoSpock CLI's help command.
- its data source description: use the dataset-data-source-description command to get the data source description that was used during the ingestion of a specified dataset. For example:
$ geospock dataset-data-source-description --dataset-name ingesttest1
{
  "data-source-description": {
    "format": "COLUMNAR",
    "columnarFormatSeparator": ",",
    "properties": [
      {
        "id": "longitude",
        "type": "LONGITUDE",
        "sourceFieldIndex": 5
      },
      … (other properties) …
    ],
    "indexes": [
      {
        "propertyIDs": [
          "farecategory"
        ]
      }
    ]
  }
}
For more information about this command, use the GeoSpock CLI's help command.
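The access section of the dataset-status output is nested by dataset entry and grant type, so a script that needs to know who holds which permission has to flatten it. The following is a minimal Python sketch that works on the parsed JSON; the function name entities_by_grant is hypothetical, and treating the entry whose datasetName is "*" as a schema-wide entry that should be merged with the dataset's own entry is an assumption.

```python
import json


def entities_by_grant(status):
    """Map each grant type to the sorted list of entities that hold it.

    `status` is the parsed JSON printed by `geospock dataset-status`.
    Assumption: the "*" entry applies to every dataset in the schema, so
    its entities are merged with those of the named dataset entry.
    """
    merged = {}
    for entry in status.get("access", []):
        for permission in entry.get("permissions", []):
            holders = merged.setdefault(permission["grantType"], set())
            holders.update(permission.get("entitiesWithAccess", []))
    return {grant: sorted(holders) for grant, holders in merged.items()}


# Trimmed copy of the example dataset-status output (two grant types only).
sample = json.loads("""
{
  "id": "ingesttest1",
  "access": [
    {"schemaName": "default", "datasetName": "ingesttest1",
     "permissions": [{"grantType": "GRANT", "entitiesWithAccess": []},
                     {"grantType": "READ", "entitiesWithAccess": []}]},
    {"schemaName": "default", "datasetName": "*",
     "permissions": [{"grantType": "GRANT",
                      "entitiesWithAccess": ["root.user@geospock.com"]},
                     {"grantType": "READ", "entitiesWithAccess": []}]}
  ]
}
""")

print(entities_by_grant(sample))
```

A grant type that maps to an empty list, such as READ in this sample, means that no entity has been given that permission yet.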
Giving a GeoSpock database user group permission to access the dataset
To enable GeoSpock database users to access a dataset, you need to grant a user group access to that dataset. If you have GRANT permissions for the dataset schema, use the following command to give a group permission to access a specified dataset:
$ geospock dataset-permission-grant --dataset-name <dataset-name> --group-name <group-name> --grant-type <grant-type>
For example:
$ geospock dataset-permission-grant --dataset-name nycTaxiData --group-name newGroup --grant-type READ
[
  {
    "entityId": "newGroup"
  }
]
For more information about this command, use the GeoSpock CLI's help command.
Refer to Adding permissions to your ingested data for more information about granting READ permissions, and Managing Administrator Access for more information about administering dataset permissions.
Removing permission to access a dataset from a user group
To remove a group's permission to access a specified dataset, use the following command:
$ geospock dataset-permission-revoke --dataset-name <dataset-name> --group-name <group-name> --grant-type <grant-type>
For example:
$ geospock dataset-permission-revoke --dataset-name nycTaxiData --group-name newGroup --grant-type READ
[]
For more information about this command, use the GeoSpock CLI's help command.
This requires GRANT permissions for the dataset schema; refer to Managing Administrator Access for more information about dataset permissions.
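The grant and revoke commands take the same three options, so a script can build both from one helper. The following is a minimal Python sketch; the helper name permission_args is hypothetical. The list of accepted grant types is an assumption taken from the four grantType values that appear in the dataset-status output (GRANT, MODIFY, READ, VIEW); check the CLI's help for the values that --grant-type actually accepts.

```python
# Assumption: these are the four grantType values seen in dataset-status output.
GRANT_TYPES = ("GRANT", "MODIFY", "READ", "VIEW")


def permission_args(action, dataset_name, group_name, grant_type):
    """Build the argument list for dataset-permission-grant or -revoke.

    `action` is "grant" or "revoke". Hypothetical helper that only uses the
    options shown in the command syntax above.
    """
    if action not in ("grant", "revoke"):
        raise ValueError("action must be 'grant' or 'revoke'")
    if grant_type not in GRANT_TYPES:
        raise ValueError(f"unexpected grant type: {grant_type}")
    return [
        "geospock", f"dataset-permission-{action}",
        "--dataset-name", dataset_name,
        "--group-name", group_name,
        "--grant-type", grant_type,
    ]


print(" ".join(permission_args("grant", "nycTaxiData", "newGroup", "READ")))
```

Validating the arguments before the CLI is called lets a script fail early on a mistyped grant type instead of after a round trip to the database.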
Deleting a dataset
To delete a dataset and its associated data from the GeoSpock database, use the dataset-delete command as follows:
$ geospock dataset-delete --dataset-name nycTaxiData
It takes a short while for the dataset to be deleted. To confirm that it has been removed, list the datasets with the dataset-list command.
This requires MODIFY permissions for the dataset schema; refer to Managing Administrator Access for more information about dataset permissions.
Note that a deletion will fail if the dataset is currently ingesting data. Before using the dataset-delete command, use the dataset-status command to check the status of the most recent operation: if the status is COMPLETED or FAILED, it is safe to delete the dataset.
For more information about this command, use the GeoSpock CLI's help command.
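The status check described above can be scripted so that dataset-delete is only run when the most recent operation has finished. The following is a minimal Python sketch; the function names safe_to_delete and delete_if_idle are hypothetical. It assumes that the geospock executable is on your PATH and that dataset-status prints only the JSON document to standard output, as in the example output shown earlier.

```python
import json
import subprocess

# Statuses of the most recent operation for which deletion is safe.
SAFE_STATUSES = {"COMPLETED", "FAILED"}


def safe_to_delete(status):
    """Return True when the most recent operation is COMPLETED or FAILED.

    `status` is the parsed JSON from `geospock dataset-status`. A missing
    operationStatus is treated as not safe, which is a cautious assumption.
    """
    operation = status.get("operationStatus") or {}
    return operation.get("status") in SAFE_STATUSES


def delete_if_idle(dataset_name):
    """Delete the dataset only if its latest operation has finished.

    Returns True if dataset-delete was run, False if it was skipped.
    """
    result = subprocess.run(
        ["geospock", "dataset-status", "--dataset-name", dataset_name],
        capture_output=True, text=True, check=True,
    )
    if not safe_to_delete(json.loads(result.stdout)):
        return False
    subprocess.run(
        ["geospock", "dataset-delete", "--dataset-name", dataset_name],
        check=True,
    )
    return True
```

For example, delete_if_idle("nycTaxiData") skips the deletion and returns False while an ingest is still in progress.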