TSV format

The GeoSpock ingestor expects TSV format data to comply with the following:

  • Fields must be separated by a Horizontal Tab (character 9)
  • Content that contains Horizontal Tabs must be quoted with double quotes (")
  • Data must be encoded with UTF-8
  • Lines must be terminated by a Line Feed (character 10)
  • File names must be suffixed by .tsv

Be aware that:

  • The ingestor does not trim spaces from fields. If you do not want spaces in the ingested data, remove them from the fields in the source input data before you ingest it
  • The ingestor will strip double quotes that surround a field
  • The ingestor ignores the header line in a file, and does not use it to determine the content of the fields
  • Field ordering must remain the same across all files that are part of the same dataset

Example content

In this example \t indicates a tab character:

2aadb-99d-97943\t42.32365\t44.538375\t12.5\t1041037198
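
The formatting rules above can be met with Python's standard csv module. The following is a minimal sketch, not part of the GeoSpock tooling: the helper name write_tsv is hypothetical, and trimming spaces before writing is optional (it is shown because the ingestor does not trim them). Note that csv.QUOTE_MINIMAL also quotes fields that contain double quotes; how the ingestor treats embedded double quotes is not covered here.

```python
import csv
from pathlib import Path


def write_tsv(path, rows):
    """Write rows to a .tsv file (hypothetical helper, not a GeoSpock API)."""
    path = Path(path)
    if path.suffix != ".tsv":
        raise ValueError("file name must be suffixed by .tsv")
    # newline="" stops Python translating "\n", so every line ends
    # with a bare Line Feed (character 10).
    with path.open("w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(
            handle,
            delimiter="\t",             # Horizontal Tab (character 9)
            quotechar='"',
            quoting=csv.QUOTE_MINIMAL,  # quotes only fields that need it, e.g. those containing tabs
            lineterminator="\n",
        )
        for row in rows:
            # The ingestor does not trim spaces, so remove leading and
            # trailing spaces here if they are not wanted.
            writer.writerow([str(value).strip(" ") for value in row])
```

For example, write_tsv("events.tsv", [["2aadb-99d-97943", 42.32365, 44.538375, 12.5, 1041037198]]) produces a file containing the example line shown above.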

Describing TSV format data

To ingest your source input data, you need to provide a description of the source data for the ingestor. The ingestor uses this data source description to store the ingested data correctly in the GeoSpock database, so that you can run queries and analyze your data. For more information, see Creating a data source description for a dataset.

The following table shows the fields you must provide when describing this format of data in a data source description.

Setting Description
id

The name for the column in the SQL table

Example: event_elevation

The ID specified should contain only numbers, lowercase letters and underscores.

sourceFieldIndex

The column index for this field in the source data

purpose

(Optional) This setting enables you to identify the following fields:

  • latitude
  • longitude
  • elevation
  • source_id

See Special fields (purpose) for more information.

sqlType

The SQL data type for this field. For more information about the data types supported, see Types of data.

Example: DOUBLE

For example:

        {
            "id": "organization_id",
            "sourceFieldIndex": 8,
            "purpose": "SOURCE_ID",
            "sqlType": "VARCHAR"
        }
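
Field entries like the one above can also be generated programmatically. The following is a minimal sketch in Python, assuming only the four settings described in the table; the helper make_field and the pattern ID_PATTERN are hypothetical names and are not part of GeoSpock. The helper checks the id against the character rule stated above and omits purpose when it is not supplied.

```python
import json
import re

# Allowed characters for "id": numbers, lowercase letters and underscores.
ID_PATTERN = re.compile(r"[0-9a-z_]+")


def make_field(field_id, source_field_index, sql_type, purpose=None):
    """Build one field entry (hypothetical helper, not a GeoSpock API)."""
    if not ID_PATTERN.fullmatch(field_id):
        raise ValueError(f"invalid id: {field_id!r}")
    field = {"id": field_id, "sourceFieldIndex": source_field_index}
    if purpose is not None:  # "purpose" is optional
        field["purpose"] = purpose
    field["sqlType"] = sql_type
    return field


# Reproduces the entry shown in the example above.
print(json.dumps(make_field("organization_id", 8, "VARCHAR", purpose="SOURCE_ID"), indent=4))
```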

Data validation

For TSV format data, each row is split according to its tabularFormatSeparator.

Empty values are interpreted as NULL. Whether NULL is valid depends on the field’s specification.

Fields that reference a column index (using sourceFieldIndex) that does not exist are assigned NULL. Whether NULL is valid depends on the field’s specification.
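
The NULL handling described above can be illustrated with a short Python sketch. This is not the ingestor's implementation; it only mirrors the documented behavior so that you can check source files before ingesting them. The function name extract_fields is hypothetical, None stands in for NULL, and the sketch assumes that sourceFieldIndex is zero-based, which you should confirm for your deployment.

```python
import csv


def extract_fields(line, fields):
    """Map one TSV line to {id: value}, using None to stand in for NULL."""
    # csv.reader splits on tabs and strips double quotes that surround a field.
    columns = next(csv.reader([line], delimiter="\t", quotechar='"'))
    record = {}
    for field in fields:
        index = field["sourceFieldIndex"]  # ASSUMPTION: zero-based index
        # A column index that does not exist yields NULL.
        value = columns[index] if index < len(columns) else None
        # An empty value is also interpreted as NULL.
        record[field["id"]] = value if value else None
    return record


fields = [
    {"id": "device_id", "sourceFieldIndex": 0},
    {"id": "event_elevation", "sourceFieldIndex": 3},
    {"id": "organization_id", "sourceFieldIndex": 8},
]
print(extract_fields("2aadb-99d-97943\t42.32365\t44.538375\t\t1041037198", fields))
```

In this usage example the fourth column is empty and the line has no column at index 8, so both event_elevation and organization_id come back as None.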