Types of data

You should determine the sqlType by considering:

  • which type of data is being represented
  • the precision that is needed to satisfy the queries for a given use case

The data source description specifies the type of data for each field that you want to ingest (see Creating a data source description for a dataset). Make sure you select the correct field types for your source data before you ingest it; once the data has been ingested, you cannot change its data type.

Be aware that fields annotated with a purpose are restricted in the sqlType that you can assign to them (see Special fields (purpose));

Data precision

It is recommended that the maximum precision you choose for field is the lowest precision that can fully represent the source input data without loss of information or accuracy.

If, however, you want to improve the performance of a query, you should consider using an even lower precision for the data fields, although this will impact the accuracy of your query results.

To successfully analyze your data, you need to make sure that you have selected the correct data types for each of the fields in the source data. How you configure these fields in the data source description determines how the data is stored and indexed in the GeoSpock database. Once a dataset has been ingested, you cannot change the data types associated with the fields. To change this, you would need to edit the data source description and then re-ingest your source input data for that dataset. See Types of data for more information about the data types supported by the Geospock database.

Supported data types

The GeoSpock database supports the following types of data:

Timestamps

The GeoSpock database stores values representing moments in time in fields with the SQL type TIMESTAMP. During ingest, the GeoSpock database converts timestamps to the UTC time zone for storage, so timestamps returned by queries are in UTC.

A timestamp field is defined by specifying sqlType: TIMESTAMP, for example:

{
    “id”: “event_timestamp”,
    “sourceFieldName”: “date2”,
    “sqlType”: “TIMESTAMP”,
    “sourceFieldFormat”: “TIMESTAMP_EPOCH_MILLIS”,
}

You must also specify the format of the timestamp in the source data using sourceFieldFormat. The following time formats are supported:

  • ISO 8601 format timestamps (TIMESTAMP_ISO_8601)
  • SQL format timestamps (TIMESTAMP_SQL)
  • Epoch time format in milliseconds (TIMESTAMP_EPOCH_MILLIS)
  • Epoch time in seconds (TIMESTAMP_EPOCH_SECONDS)
  • Custom time formats (TIMESTAMP_CUSTOM)
  • Parquet time format (TIMESTAMP_PARQUET)

See Timestamp formats (sourceFieldFormat) for information about each of the supported formats.

Each value in the source data (for this field) will be validated against the specified time format.

Text

A text field contains alphanumeric data and is specified by declaring sqlType: varchar, as follows:

{
    “id”: “taxi_name”,
    “sourceFieldName”: “tid”,
    “sqlType”: “VARCHAR”
}

There is no character length limit on this data type.

All source data values can be represented as text, so no validation rules are applied.

Integers

An integer field is defined by specifying an integer sqlType of the required precision, for example:

{
    “id”: “jouney_count”,
    “sourceFieldName”: “journeys1”,
    “sqlType”: “SMALLINT”
}

The following precisions are supported:

Precision in bits SQL data type
8 TINYINT
16 SMALLINT
32 INTEGER
64 BIGINT

For the data to be ingested, each value in the source data must be representable as an integer, for example:

  • 99
  • 99.000
  • -19
  • -19.0

Floating points

A float field is defined by specifying a float sqlType of the required precision, for example:

{
    “id”: “velocity”,
    “sourceFieldName”: “velo”,
    “sqlType”: “DOUBLE”
}

Float fields can be defined using any of the float sqlTypes: REAL or DOUBLE

Float fields source values are validated if they can be represented as a number, for example:

  • 99
  • 99.00
  • 99.12
  • -99
  • -99.00
  • -99.12