Using datasets

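A dataset is a set of indexed data from a single source, such as event data, Point of Interest (POI) data, or sensor data, that has been processed and loaded into the GeoSpock database. You can access and manage your datasets using the GeoSpock CLI. Using this command line tool, you can:

  • list the datasets that you have permission to view
  • get information about a dataset
  • give a user group permission to access a dataset
  • remove a user group's permission to access a dataset
  • delete a dataset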

See The GeoSpock CLI for more information about how to install and use this command line interface.

Listing datasets

To list all the datasets that you have permission to view, use the dataset-list command:

$ geospock dataset-list --page-index <value> --page-size <value>

You can supply the page index (starting at page 0), the number of datasets per page, or both. By default, page 0 is returned with 1000 datasets listed per page.
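As a minimal sketch of how the paging options fit together, the following Python helper builds the argument list for dataset-list using the defaults described above. The function names dataset_list_args and run_dataset_list are hypothetical, and the sketch assumes the geospock executable is on your PATH; it makes no assumption about the format of the command's output.

```python
import subprocess


def dataset_list_args(page_index=0, page_size=1000):
    """Build the argument list for `geospock dataset-list`.

    The defaults mirror the documented behaviour: page 0, 1000 datasets per page.
    """
    if page_index < 0:
        raise ValueError("page_index starts at 0")
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return [
        "geospock", "dataset-list",
        "--page-index", str(page_index),
        "--page-size", str(page_size),
    ]


def run_dataset_list(page_index=0, page_size=1000):
    """Run the command and return its raw standard output.

    Assumes the GeoSpock CLI is installed and configured on this machine.
    """
    result = subprocess.run(
        dataset_list_args(page_index, page_size),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, run_dataset_list(page_index=1, page_size=50) would request the second page of 50 datasets.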

For more information about this command, use the GeoSpock CLI's help command.

Getting information about a dataset

Using the GeoSpock CLI, you can get the following information about a specific dataset:

  • its current status: use the dataset-status command to get information about a specified dataset, including its title, a summary of its contents, any groups that have permission to access it and the status of the most recent operation on that dataset. For example:
$ geospock dataset-status --dataset-name ingesttest1
{
    "id": "ingesttest1",
    "title": "ingesttest1",
    "description": "Ingested from \u201cingesttest1\u201d on 1/23/2020-09:45:49",
    "createdDate": "2020-01-23T09:45:50Z",
    "access": [
        {
            "schemaName": "default",
            "datasetName": "ingesttest1",
            "permissions": [
                {
                    "grantType": "GRANT",
                    "entitiesWithAccess": []
                },
                {
                    "grantType": "MODIFY",
                    "entitiesWithAccess": []
                },
                {
                    "grantType": "READ",
                    "entitiesWithAccess": []
                },
                {
                    "grantType": "VIEW",
                    "entitiesWithAccess": []
                }
            ]
        },
        {
            "schemaName": "default",
            "datasetName": "*",
            "permissions": [
                {
                    "grantType": "GRANT",
                    "entitiesWithAccess": [
                        "root.user@geospock.com"
                    ]
                },
                {
                    "grantType": "MODIFY",
                    "entitiesWithAccess": []
                },
                {
                    "grantType": "READ",
                    "entitiesWithAccess": []
                },
                {
                    "grantType": "VIEW",
                    "entitiesWithAccess": []
                }
            ]
        }
    ],
    "operationStatus": {
        "id": "opr-ingesttest1-7",
        "label": "Data ingested",
        "type": "INGEST",
        "status": "COMPLETED",
        "lastModifiedDate": "2020-01-22T16:57:14.985Z",
        "createdDate": "2020-01-23T09:45:51Z"
    }
}

For more information about this command, use the GeoSpock CLI's help command.

  • its history: use the dataset-operations command to get a history of all the operations that have been performed on that dataset. For example:
$ geospock dataset-operations --dataset-name ingesttest1 
{
    "listInfo": {
        "totalItemCount": 2,
        "pageCount": 1
    },
    "operations": [
        {
            "label": "Data ingested",
            "type": "INGEST",
            "status": "COMPLETED",
            "createdDate": "2020-01-23T09:45:51Z",
            "lastModifiedDate": "2020-01-23T10:00:09.651Z"
        },
     ...

For more information about this command, use the GeoSpock CLI's help command.

  • its data source description: use the dataset-data-source-description command to get the data source description that was used during the ingestion of a specified dataset. For example:
$ geospock dataset-data-source-description --dataset-name ingesttest1 
{
    "data-source-description": {
        "format": "COLUMNAR",
        "columnarFormatSeparator": ",",
        "properties": [
            {
                "id": "longitude",
                "type": "LONGITUDE",
                "sourceFieldIndex": 5
            },
            … (other properties) … 
        ],
        "indexes": [
            {
                "propertyIDs": [
                    "farecategory"
                ]
            }
        ]
    }
}

For more information about this command, use the GeoSpock CLI's help command.
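The JSON returned by dataset-status can be summarised in a script. The sketch below collects, for each grant type, the entities listed under entitiesWithAccess across all access entries. It assumes the output has the shape shown in the dataset-status example above and that the wildcard entry ("datasetName": "*") also applies to the dataset in question. The function name groups_with_access is hypothetical, and the sample is a trimmed copy of that example.

```python
import json


def groups_with_access(status_json):
    """Map each grant type to the sorted entities that hold it.

    Entities are merged across every entry in "access", on the assumption
    that the "*" entry applies to this dataset as well.
    """
    status = json.loads(status_json)
    summary = {}
    for entry in status.get("access", []):
        for permission in entry.get("permissions", []):
            entities = summary.setdefault(permission["grantType"], set())
            entities.update(permission["entitiesWithAccess"])
    return {grant: sorted(entities) for grant, entities in summary.items()}


# Trimmed copy of the dataset-status output shown earlier in this section.
sample = """
{
  "id": "ingesttest1",
  "access": [
    {"schemaName": "default", "datasetName": "ingesttest1",
     "permissions": [{"grantType": "GRANT", "entitiesWithAccess": []},
                     {"grantType": "READ", "entitiesWithAccess": []}]},
    {"schemaName": "default", "datasetName": "*",
     "permissions": [{"grantType": "GRANT",
                      "entitiesWithAccess": ["root.user@geospock.com"]},
                     {"grantType": "READ", "entitiesWithAccess": []}]}
  ]
}
"""

print(groups_with_access(sample))
# {'GRANT': ['root.user@geospock.com'], 'READ': []}
```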

Giving a GeoSpock database user group permission to access the dataset

To give GeoSpock database users access to a dataset, you need to grant access to a group. If you have GRANT permissions for the dataset schema, use the following command to give a group permission to access a specified dataset:

$ geospock dataset-permission-grant --dataset-name <dataset-name> --group-name <group-name> --grant-type <grant-type>

For example:

$ geospock dataset-permission-grant --dataset-name nycTaxiData --group-name newGroup --grant-type READ
[
    {
        "entityId": "newGroup"
    }
]

For more information about this command, use the GeoSpock CLI's help command.

Refer to Adding permissions to your ingested data for more information about granting READ permissions, and to Managing Administrator Access for more information about administering dataset permissions.

Removing permission to access a dataset from a user group

To remove a group's permission to access a specified dataset, use the following command:

$ geospock dataset-permission-revoke --dataset-name <dataset-name> --group-name <group-name> --grant-type <grant-type>

For example:

$ geospock dataset-permission-revoke --dataset-name nycTaxiData --group-name newGroup --grant-type READ
[]

For more information about this command, use the GeoSpock CLI's help command.

Revoking a permission requires GRANT permissions for the dataset schema; refer to Managing Administrator Access for more information about dataset permissions.
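The dataset-permission-grant and dataset-permission-revoke commands take the same three options, so a script can build both from one helper. The following is a sketch only: the function name permission_args is hypothetical, and the set of grant types is an assumption taken from the values that appear in the dataset-status output (GRANT, MODIFY, READ, VIEW), not a confirmed list of what --grant-type accepts.

```python
# Grant types seen in the dataset-status output; assumed (not confirmed)
# to be the values that --grant-type accepts.
GRANT_TYPES = {"GRANT", "MODIFY", "READ", "VIEW"}


def permission_args(action, dataset_name, group_name, grant_type):
    """Build the argument list for dataset-permission-grant or -revoke."""
    if action not in ("grant", "revoke"):
        raise ValueError("action must be 'grant' or 'revoke'")
    if grant_type not in GRANT_TYPES:
        raise ValueError(f"unexpected grant type: {grant_type}")
    return [
        "geospock", f"dataset-permission-{action}",
        "--dataset-name", dataset_name,
        "--group-name", group_name,
        "--grant-type", grant_type,
    ]


# Reproduces the revoke example above as an argument list.
print(" ".join(permission_args("revoke", "nycTaxiData", "newGroup", "READ")))
```

The resulting list can be passed to subprocess.run on a machine where the GeoSpock CLI is installed.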

Deleting a dataset

To delete a dataset and its associated data from the GeoSpock database, use the dataset-delete command as follows:

$ geospock dataset-delete --dataset-name nycTaxiData 

It takes a short while for the dataset to be deleted. You can confirm that it has been removed by listing your datasets with the dataset-list command.

Deleting a dataset requires MODIFY permissions for the dataset schema; refer to Managing Administrator Access for more information about dataset permissions.

Note that a deletion fails if the dataset is currently ingesting data. Before using the dataset-delete command, use the dataset-status command to check the status of the most recent operation: if the status is COMPLETED or FAILED, it is safe to delete the dataset.
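The pre-deletion check can be automated by parsing the dataset-status output. The sketch below assumes the JSON shape shown in the dataset-status example, where the most recent operation is reported under operationStatus. The function name safe_to_delete is hypothetical, and it treats a missing operationStatus as not safe, which is a conservative choice rather than documented behaviour.

```python
import json

# Statuses after which the documentation says deletion is safe.
SAFE_STATUSES = {"COMPLETED", "FAILED"}


def safe_to_delete(status_json):
    """Return True if the most recent operation is COMPLETED or FAILED."""
    status = json.loads(status_json)
    operation = status.get("operationStatus") or {}
    return operation.get("status") in SAFE_STATUSES


# Trimmed copy of the dataset-status example output.
completed = '{"id": "ingesttest1", "operationStatus": {"type": "INGEST", "status": "COMPLETED"}}'
# "IN_PROGRESS" is a hypothetical placeholder for any other status value.
running = '{"id": "ingesttest1", "operationStatus": {"type": "INGEST", "status": "IN_PROGRESS"}}'

print(safe_to_delete(completed), safe_to_delete(running))
```

A script would call this on the output of geospock dataset-status and only run geospock dataset-delete when it returns True.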

For more information about this command, use the GeoSpock CLI's help command.