Dataset administration

The GeoSpock CLI provides a number of commands to administer your datasets after you have ingested data into them.

Listing datasets

To list all the datasets that you have permission to view, use the dataset-list command:

$ geospock dataset-list --page-index <value> --page-size <value>

This command requires the user running the command to have dataset administration VIEW permissions.

You can optionally supply the page index (numbered from 0), the number of datasets per page, or both. By default, page 0 is returned with 1000 datasets listed per page.
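When scripting around the CLI, the paging flags can be assembled programmatically. The following is a minimal sketch: the helper name dataset_list_args is hypothetical, and only the dataset-list command, its --page-index and --page-size flags, and the default values come from this page.

```python
def dataset_list_args(page_index=0, page_size=1000):
    """Build the argument list for `geospock dataset-list` (hypothetical helper)."""
    if page_index < 0:
        raise ValueError("page_index starts at 0")
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return [
        "geospock", "dataset-list",
        "--page-index", str(page_index),
        "--page-size", str(page_size),
    ]


# Second page when listing 50 datasets per page (pages are numbered from 0).
print(" ".join(dataset_list_args(page_index=1, page_size=50)))
```

The returned list can be passed to a process runner such as Python's subprocess.run; the format of the listing that the command prints is not covered here.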

For more information about this command, use the GeoSpock CLI's help command.

Getting information about a dataset

Using the GeoSpock CLI, you can get the following information about a specific dataset.

These commands require the user running them to have dataset administration VIEW permissions.

Dataset status

To report on a dataset's current status, use the dataset-status command. Information provided includes its title, a summary of its contents, any groups that have permission to access it, and the status of the most recent operation on that dataset. For example:

$ geospock dataset-status --dataset-name ingesttest1
{
    "id": "ingesttest1",
    "title": "ingesttest1",
    "description": "Ingested from \u201cingesttest1\u201d on 1/23/2020-09:45:49",
    "createdDate": "2020-01-23T09:45:50Z",
    "access": [
        {
            "schemaName": "default",
            "datasetName": "ingesttest1",
            "permissions": [
                {
                    "grantType": "GRANT",
                    "entitiesWithAccess": []
                },
                {
                    "grantType": "MODIFY",
                    "entitiesWithAccess": []
                },
                {
                    "grantType": "READ",
                    "entitiesWithAccess": []
                },
                {
                    "grantType": "VIEW",
                    "entitiesWithAccess": []
                }
            ]
        },
        {
            "schemaName": "default",
            "datasetName": "*",
            "permissions": [
                {
                    "grantType": "GRANT",
                    "entitiesWithAccess": [
                        "root.user@geospock.com"
                    ]
                },
                {
                    "grantType": "MODIFY",
                    "entitiesWithAccess": []
                },
                {
                    "grantType": "READ",
                    "entitiesWithAccess": []
                },
                {
                    "grantType": "VIEW",
                    "entitiesWithAccess": []
                }
            ]
        }
    ],
    "operationStatus": {
        "id": "opr-ingesttest1-7",
        "label": "Data ingested",
        "type": "INGEST",
        "status": "COMPLETED",
        "lastModifiedDate": "2020-01-22T16:57:14.985Z",
        "createdDate": "2020-01-23T09:45:51Z"
    }
}

For more information about this command, use the GeoSpock CLI's help command.
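Because dataset-status prints JSON, its output can be inspected from a script. The sketch below assumes the output has the shape shown in the example above (an "access" list whose entries carry "permissions" with "grantType" and "entitiesWithAccess"); the function name entities_with_grant is hypothetical and is not part of the GeoSpock CLI.

```python
import json


def entities_with_grant(status_json, grant_type):
    """Return the sorted entities holding `grant_type` on any access entry."""
    status = json.loads(status_json)
    entities = set()
    for entry in status.get("access", []):
        for permission in entry.get("permissions", []):
            if permission.get("grantType") == grant_type:
                entities.update(permission.get("entitiesWithAccess", []))
    return sorted(entities)


# Trimmed-down sample with the same shape as the dataset-status example.
sample = json.dumps({
    "id": "ingesttest1",
    "access": [
        {"schemaName": "default", "datasetName": "*", "permissions": [
            {"grantType": "GRANT", "entitiesWithAccess": ["root.user@geospock.com"]},
            {"grantType": "READ", "entitiesWithAccess": []},
        ]},
    ],
})
print(entities_with_grant(sample, "GRANT"))
```

The helper merges entries across all access rules, so a grant made through the wildcard dataset name "*" is reported alongside grants made on the dataset itself.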

Dataset history

To get a history of all the operations that have been performed on a dataset, use the dataset-operations command. For example:

$ geospock dataset-operations --dataset-name ingesttest1
{
    "listInfo": {
        "totalItemCount": 2,
        "pageCount": 1
    },
    "operations": [
        {
            "label": "Data ingested",
            "type": "INGEST",
            "status": "COMPLETED",
            "createdDate": "2020-01-23T09:45:51Z",
            "lastModifiedDate": "2020-01-23T10:00:09.651Z"
        },
     ...

For more information about this command, use the GeoSpock CLI's help command.
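The operation history can be filtered in the same way. This is a minimal sketch that assumes the output shape shown in the example above (an "operations" list whose items carry a "status" field); operations_with_status is a hypothetical helper name, and the sample contains only the single operation visible in that example.

```python
import json


def operations_with_status(history_json, status):
    """Return the operations in a dataset-operations result that have `status`."""
    history = json.loads(history_json)
    return [op for op in history.get("operations", []) if op.get("status") == status]


# Sample built from the one operation visible in the example output.
sample = json.dumps({
    "operations": [
        {
            "label": "Data ingested",
            "type": "INGEST",
            "status": "COMPLETED",
            "createdDate": "2020-01-23T09:45:51Z",
            "lastModifiedDate": "2020-01-23T10:00:09.651Z",
        },
    ],
})
for operation in operations_with_status(sample, "COMPLETED"):
    print(operation["type"], operation["createdDate"])
```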

Data source description

To get the data source description that was used during the ingestion of a specified dataset, use the dataset-data-source-description command. For example:

$ geospock dataset-data-source-description --dataset-name ingesttest1
{
    "data-source-description": {
        "format": "COLUMNAR",
        "columnarFormatSeparator": ",",
        "properties": [
            {
                "id": "longitude",
                "type": "LONGITUDE",
                "sourceFieldIndex": 5
            },
            … (other properties) …
        ],
        "indexes": [
            {
                "propertyIDs": [
                    "farecategory"
                ]
            }
        ]
    }
}

For more information about this command, use the GeoSpock CLI's help command.

Deleting a dataset

To delete a dataset and its associated data from the GeoSpock database, use the dataset-delete command as follows:

$ geospock dataset-delete --dataset-name nycTaxiData

This command requires the user running the command to have dataset administration MODIFY permissions.

It takes a short while for the dataset to be deleted. You can confirm that it has been removed by listing the datasets again with the dataset-list command.

Note that a deletion fails if the dataset is currently ingesting data. Before using the dataset-delete command, use the dataset-status command to check the status of the most recent operation: if the status is COMPLETED or FAILED, it is safe to delete the dataset.
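The pre-deletion check described above can be sketched as a small function over the JSON printed by dataset-status. It assumes the "operationStatus" object shown in the dataset-status example; the name is_safe_to_delete is hypothetical, and the function only reads output that has already been captured rather than calling the CLI itself.

```python
import json

# Statuses that this page describes as safe for deletion.
SAFE_STATUSES = {"COMPLETED", "FAILED"}


def is_safe_to_delete(status_json):
    """Return True if the most recent operation is COMPLETED or FAILED."""
    status = json.loads(status_json)
    operation = status.get("operationStatus") or {}
    return operation.get("status") in SAFE_STATUSES


# Sample taken from the operationStatus fields of the dataset-status example.
sample = json.dumps({
    "id": "ingesttest1",
    "operationStatus": {"type": "INGEST", "status": "COMPLETED"},
})
print(is_safe_to_delete(sample))
```

Treating a missing "operationStatus" as unsafe is a conservative design choice for this sketch, not documented CLI behaviour.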

For more information about this command, use the GeoSpock CLI's help command.