Types of data

In order to get the best results from the GeoSpock database, you will need to choose the correct data types for each of the fields in your source data. The data source description (see Data source description files) specifies the type of data for each field that you want to ingest. This determines how the data is stored and indexed in the GeoSpock database.

It is important that you choose the correct data types for each of the fields in the source data before you ingest it. Once a dataset has been ingested, there is no way to change the data type associated with a field without completely re-ingesting the data with a modified data source description.

You should determine the sqlType by considering:

  • the type of data that is being represented
  • the precision that is needed to satisfy the queries for a given use case

Be aware that fields annotated with a purpose are restricted in the sqlType that you can assign to them (see Special fields).

Data precision

It is recommended that the maximum precision you choose for field is the lowest precision that can fully represent the source input data without loss of information or accuracy.

If, however, you want to improve the performance of a query, you should consider using an even lower precision for the data fields, although this will impact the accuracy of your query results.

Supported data types

The GeoSpock database supports the following types of data:

  • Timestamps
  • Text
  • Integers
  • Floating points

Timestamps

The GeoSpock database stores values representing moments in time in fields with the SQL type TIMESTAMP. During ingest, the GeoSpock database converts timestamps to the UTC time zone for storage, so timestamps returned by queries are in UTC.

A timestamp field is defined by specifying sqlType: TIMESTAMP, for example:

{
    “id”: “event_timestamp”,
    “sourceFieldName”: “date2”,
    “sqlType”: “TIMESTAMP”,
    “sourceFieldFormat”: “TIMESTAMP_EPOCH_MILLIS”,
}

You must specify the format of the timestamp in the source data using sourceFieldFormat. The following time formats are supported:

  • ISO 8601 format timestamps (TIMESTAMP_ISO_8601)
  • SQL format timestamps (TIMESTAMP_SQL)
  • Epoch time format in milliseconds (TIMESTAMP_EPOCH_MILLIS)
  • Epoch time in seconds (TIMESTAMP_EPOCH_SECONDS)
  • Custom time formats (TIMESTAMP_CUSTOM)
  • Parquet time format (TIMESTAMP_PARQUET)

See Timestamp formats for information about each of the supported formats.

Each value in the source data (for this field) will be validated against the specified time format.

Text

A text field contains alphanumeric data and is specified by declaring sqlType: varchar, as follows:

{
    “id”: “taxi_name”,
    “sourceFieldName”: “tid”,
    “sqlType”: “VARCHAR”
}

There is no character length limit on this data type.

All source data values can be represented as text, so no validation rules are applied.

Integers

An integer field is defined by specifying an integer sqlType of the required precision, for example:

{
    “id”: “jouney_count”,
    “sourceFieldName”: “journeys1”,
    “sqlType”: “SMALLINT”
}

The following precisions are supported:

Precision in bits SQL data type
8 TINYINT
16 SMALLINT
32 INTEGER
64 BIGINT

For the data to be ingested, each value in the source data must be representable as an integer, for example:

  • 99
  • 99.000
  • -19
  • -19.0

Floating points

A float field is defined by specifying a float sqlType of the required precision, for example:

{
    “id”: “velocity”,
    “sourceFieldName”: “velo”,
    “sqlType”: “DOUBLE”
}

Float fields can be defined using any of the float sqlTypes: REAL or DOUBLE

Float fields source values are validated if they can be represented as a number, for example:

  • 99
  • 99.00
  • 99.12
  • -99
  • -99.00
  • -99.12