Generating a data source description

Rather than creating a data source description file from scratch, you can use the GeoSpock CLI data-source-description command to create one. This command parses a sample of the source input data and returns a data source description that fits the sampled data.

Be aware that:

  • this command does not support compressed data, so you should decompress the sample file before running it
  • the sample file must be no larger than 300MB

For example:

$ geospock data-source-description --data-url "s3://path-to-file/nyc-new-taxi-data-Pickups-Snapped-sample.csv"
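
The two restrictions above can be checked on a local copy of the sample file before it is uploaded. The following is a minimal sketch, not part of the GeoSpock CLI: the function name check_sample_file is hypothetical, "300MB" is assumed to mean 300 * 1024 * 1024 bytes, and compression is detected only by looking for the leading bytes of a few common formats.

```python
import os

# Assumption: "300MB" is read here as 300 * 1024 * 1024 bytes.
MAX_SAMPLE_BYTES = 300 * 1024 * 1024

# Leading "magic" bytes of a few common compressed formats.
COMPRESSED_SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
}


def check_sample_file(path, max_bytes=MAX_SAMPLE_BYTES):
    """Return a list of problems found with a local sample file (empty if none)."""
    problems = []

    # Size check against the documented limit.
    size = os.path.getsize(path)
    if size > max_bytes:
        problems.append(f"file is {size} bytes, over the {max_bytes}-byte limit")

    # Compression check: compare the first bytes with known signatures.
    with open(path, "rb") as handle:
        head = handle.read(8)
    for signature, name in COMPRESSED_SIGNATURES.items():
        if head.startswith(signature):
            problems.append(f"file looks {name}-compressed; extract it first")

    return problems
```

An empty list means neither check found a problem. It does not guarantee that the command will accept the file.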

You should review this data source description to make sure that the SQL data types assigned to each field are correct and that the source data will be ingested in the way you want.

Reviewing the data source description

Before using a data source description created by the CLI, you should check that the generated file is correct.

In particular, verify that each field you want to ingest has been described correctly, including:

  • its id: the name of the column in the resulting SQL table. If you have more than one latitude, longitude or timestamp field, it is strongly recommended that you avoid ids that are identical apart from an underscore and a number (such as timestamp_1), as this negatively impacts query optimizations in the GeoSpock database
  • source field: the field in the source input data
  • (optional) purpose: see Special fields for more information about this setting
  • data type: this describes how the field is stored in the GeoSpock database. Make sure it is correct before ingesting your source input data, because you cannot change it once the data has been ingested. See Types of data for more information.
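
The id recommendation above can be checked mechanically once you have the list of ids from the generated description. The following is a minimal sketch: the function name find_clashing_ids is hypothetical, and it takes a plain list of id strings rather than reading the description file, whose layout is not covered here. It flags any ids that become identical once a trailing underscore and number are removed. This is broader than the recommendation, which concerns only latitude, longitude and timestamp fields.

```python
import re
from collections import defaultdict

# Matches a trailing underscore followed by digits, e.g. "_1" in "timestamp_1".
_NUMBERED_SUFFIX = re.compile(r"_\d+$")


def find_clashing_ids(ids):
    """Group ids that are identical once a trailing "_<number>" is removed."""
    groups = defaultdict(list)
    for field_id in ids:
        base = _NUMBERED_SUFFIX.sub("", field_id)
        groups[base].append(field_id)
    # Keep only the groups where two or more ids share the same base name.
    return {base: names for base, names in groups.items() if len(names) > 1}


# Example ids (made up for illustration).
ids = ["pickup_timestamp", "timestamp", "timestamp_1", "latitude", "longitude"]
print(find_clashing_ids(ids))  # {'timestamp': ['timestamp', 'timestamp_1']}
```

Renaming one id in each reported group, for example to dropoff_timestamp, removes the clash.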

Saving the data source description

The data source description generated by the GeoSpock CLI is not persisted, so you must save it either locally or to an S3 bucket for use when you ingest your data.
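
One way to save the description locally is to capture the command's output and write it to a file. The following is a minimal sketch that assumes the CLI prints the description to standard output, which is not confirmed here. The helper name save_command_output and the destination filename are hypothetical.

```python
import subprocess
from pathlib import Path


def save_command_output(command, destination):
    """Run a command and write whatever it prints on stdout to a local file."""
    # check=True raises CalledProcessError if the command exits with an error,
    # so a failed run does not leave a misleading file behind.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    path = Path(destination)
    path.write_text(result.stdout)
    return path


# Usage with the command shown earlier (assumes its output goes to stdout):
#
# save_command_output(
#     [
#         "geospock", "data-source-description", "--data-url",
#         "s3://path-to-file/nyc-new-taxi-data-Pickups-Snapped-sample.csv",
#     ],
#     "nyc-taxi-data-source-description.txt",  # hypothetical filename
# )
```

Review the saved file as described above before using it to ingest data.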