The data source description structure

The data source description contains the following sections:

  • the source data file format definition, which states the format of the source data file; see Specifying the source data file format
  • the event block or location block, which lists the fields in the source data that describe the event or location; see Types of dataset
  • (optional) a properties block, which describes any other fields in the source data that you want to ingest into the GeoSpock database; see Defining other fields

Note that each source field should be referenced only once in the data source description.
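
As a rough illustration of the rule above, the following Python sketch scans a data source description for source fields that are referenced more than once. It is not part of the GeoSpock tooling: the function name duplicated_source_fields is hypothetical, and the sketch assumes a tabular source in which every field is referenced through sourceFieldIndex.

```python
from collections import Counter

def duplicated_source_fields(description):
    """Return the source field indexes referenced more than once across all blocks."""
    counts = Counter(
        field["sourceFieldIndex"]
        for block in ("location", "event", "properties")
        for field in description.get(block, [])
        if "sourceFieldIndex" in field
    )
    return sorted(index for index, n in counts.items() if n > 1)

# Illustrative description: source field 2 is used for both latitude and zip_code.
description = {
    "format": "TABULAR",
    "tabularFormatSeparator": ",",
    "location": [
        {"id": "longitude", "sourceFieldIndex": 1, "purpose": "LONGITUDE", "sqlType": "DOUBLE"},
        {"id": "latitude", "sourceFieldIndex": 2, "purpose": "LATITUDE", "sqlType": "DOUBLE"},
    ],
    "properties": [
        {"id": "zip_code", "sourceFieldIndex": 2, "sqlType": "VARCHAR"},
    ],
}

print(duplicated_source_fields(description))  # [2]
```

An empty list means that no source field index is referenced twice.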

Specifying the source data file format

At the top of the data source description, there is a section that describes the data format of the source file, for example:

{
    "format": "TABULAR",
    "tabularFormatSeparator": "\t",
    "location": [
        {
            "id": "longitude",
    ...

Use the following format and separator properties in the data source description to define the correct source file type:

Source data file format      | Data source description configuration                  | Field object definition
-----------------------------|--------------------------------------------------------|---------------------------------
Comma separated values (csv) | "format": "TABULAR", "tabularFormatSeparator": ",",    | See Describing CSV format data
Tab separated values (tsv)   | "format": "TABULAR", "tabularFormatSeparator": "\t",   | See Describing TSV format data
JSON Lines (jsonl)           | "format": "JSON_LINES",                                | See Describing JSON lines data
Parquet (parquet)            | "format": "PARQUET",                                   | See Describing Parquet data
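
If you generate data source descriptions from a script, the mapping above can be captured in a small lookup. The following Python sketch is only an illustration: FORMAT_SETTINGS and format_settings are hypothetical names, and choosing the format from the file extension is an assumption of this example rather than something the GeoSpock database does for you.

```python
import json

# Format properties for each supported source file type, keyed by file extension.
FORMAT_SETTINGS = {
    "csv": {"format": "TABULAR", "tabularFormatSeparator": ","},
    "tsv": {"format": "TABULAR", "tabularFormatSeparator": "\t"},
    "jsonl": {"format": "JSON_LINES"},
    "parquet": {"format": "PARQUET"},
}

def format_settings(filename):
    """Return the format properties for a source file, based on its extension."""
    extension = filename.rsplit(".", 1)[-1].lower()
    if extension not in FORMAT_SETTINGS:
        raise ValueError(f"unsupported source file type: {extension!r}")
    return dict(FORMAT_SETTINGS[extension])

# json.dumps escapes the tab character, so the separator is written as "\t".
print(json.dumps(format_settings("trips.tsv")))
```

The returned dictionary can then be merged with the location or event block, and any properties block, before the description is written out as JSON.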

Types of dataset

The GeoSpock database supports datasets containing:

  • location-based data
  • event-based data

For each type of dataset, you must provide the details of the fields in the source data that describe either the event or the location:

Location-based data

Provide a location block that includes the following fields:

  • latitude
  • longitude
  • (recommended) source_id
...
"location": [
        {
            "id": "longitude",
            "sourceFieldIndex": 1,
            "purpose": "LONGITUDE",
            "sqlType": "DOUBLE"
        },
        {
            "id": "latitude",
            "sourceFieldIndex": 2,
            "purpose": "LATITUDE",
            "sqlType": "DOUBLE"
        },
        {
            "id": "organization_id",
            "sourceFieldIndex": 8,
            "purpose": "SOURCE_ID",
            "sqlType": "VARCHAR"
        }
    ],
...

A row is considered valid only if none of the values in the LATITUDE and LONGITUDE fields defined in the location block is NULL.
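
To make the validity rule concrete, here is a minimal Python sketch of the check as described. It is not the ingestion code itself: is_valid_location_row is a hypothetical helper, and it assumes that each row has already been parsed into a list in which None stands for NULL.

```python
REQUIRED_PURPOSES = {"LATITUDE", "LONGITUDE"}

def is_valid_location_row(row, location_block):
    """Return True if every LATITUDE and LONGITUDE field in the row has a value."""
    for field in location_block:
        if field.get("purpose") in REQUIRED_PURPOSES:
            if row[field["sourceFieldIndex"]] is None:
                return False
    return True

location_block = [
    {"id": "longitude", "sourceFieldIndex": 1, "purpose": "LONGITUDE", "sqlType": "DOUBLE"},
    {"id": "latitude", "sourceFieldIndex": 2, "purpose": "LATITUDE", "sqlType": "DOUBLE"},
]

print(is_valid_location_row([1200, 0.1218, 52.2053], location_block))  # True
print(is_valid_location_row([1200, 0.1218, None], location_block))     # False
```

Fields without one of these purposes, such as a SOURCE_ID field, do not affect the result in this sketch.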

Event-based data

Provide an event block that includes the following fields:

  • timestamp
  • latitude
  • longitude
  • (recommended) source_id
...
"event": [
    {
        "id": "longitude",
        "sourceFieldIndex": 1,
        "purpose": "LONGITUDE",
        "sqlType": "DOUBLE"
    },
    {
        "id": "latitude",
        "sourceFieldIndex": 2,
        "purpose": "LATITUDE",
        "sqlType": "DOUBLE"
    },
    {
        "id": "date",
        "sourceFieldIndex": 7,
        "sqlType": "TIMESTAMP",
        "sourceFieldFormat": "TIMESTAMP_ISO_8601"
    },
    {
        "id": "taxi-id",
        "sourceFieldIndex": 8,
        "purpose": "SOURCE_ID",
        "sqlType": "VARCHAR"
    }
],
...

A row is considered valid only if none of the values in the TIMESTAMP, LATITUDE and LONGITUDE fields defined in the event block is NULL.

If you have more than one field for latitude, longitude or timestamp, it is strongly recommended that you avoid giving these fields ids that are identical apart from an underscore and a number, such as timestamp_1, because this negatively impacts the query optimizations in the GeoSpock database.
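
One way to catch such ids before ingesting is to group them by their name with any trailing underscore and number removed. The following Python sketch is an illustration only: clashing_ids is a hypothetical helper, and the regular expression reflects one reading of "identical apart from an underscore and a number".

```python
import re
from collections import defaultdict

# Matches a trailing underscore followed by digits, such as "_1" in "timestamp_1".
NUMERIC_SUFFIX = re.compile(r"_\d+$")

def clashing_ids(ids):
    """Return groups of ids that differ only by a trailing underscore and number."""
    groups = defaultdict(list)
    for field_id in ids:
        groups[NUMERIC_SUFFIX.sub("", field_id)].append(field_id)
    return [group for group in groups.values() if len(group) > 1]

print(clashing_ids(["timestamp_1", "timestamp_2", "latitude"]))
# [['timestamp_1', 'timestamp_2']]
print(clashing_ids(["pickup_time", "dropoff_time"]))
# []
```

Descriptive ids such as pickup_time and dropoff_time avoid the pattern altogether.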

Defining other fields

If your source data file contains fields in addition to those that describe the event or location, you can describe these in the properties block:

...
"properties": [
        {
            "id": "catchment_square_meters",
            "sourceFieldIndex": 0,
            "sqlType": "BIGINT"
        },
        {
            "id": "street_number",
            "sourceFieldIndex": 3,
            "sqlType": "INTEGER"
        },
        {
            "id": "category_id",
            "sourceFieldIndex": 11,
            "sqlType": "SMALLINT"
        },
        {
            "id": "num_floors",
            "sourceFieldIndex": 5,
            "sqlType": "TINYINT"
        },
        {
            "id": "zip_code",
            "sourceFieldIndex": 6,
            "sqlType": "VARCHAR"
        },
...

Be aware that the properties block must not include a LATITUDE or LONGITUDE field.
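
The restriction above can be checked with a few lines of Python. This sketch assumes that "a LATITUDE or LONGITUDE field" means a field whose purpose is set to one of those values; misplaced_property_ids is a hypothetical helper, and the dropoff_latitude entry is invented for the example.

```python
RESERVED_PURPOSES = {"LATITUDE", "LONGITUDE"}

def misplaced_property_ids(properties_block):
    """Return the ids of properties whose purpose is LATITUDE or LONGITUDE."""
    return [
        field["id"]
        for field in properties_block
        if field.get("purpose") in RESERVED_PURPOSES
    ]

properties = [
    {"id": "zip_code", "sourceFieldIndex": 6, "sqlType": "VARCHAR"},
    # Invented entry that breaks the rule, for illustration only.
    {"id": "dropoff_latitude", "sourceFieldIndex": 9, "purpose": "LATITUDE", "sqlType": "DOUBLE"},
]

print(misplaced_property_ids(properties))  # ['dropoff_latitude']
```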

For the: