CSV format

The GeoSpock ingestor expects CSV data to comply with the following rules:

  • Fields must be separated by a comma
  • Data must be encoded with UTF-8
  • Lines must be terminated by a Line Feed (character 10)
  • File names must end with the .csv suffix
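These rules correspond directly to settings of Python's standard csv module. The following is a minimal sketch, not a GeoSpock tool. write_ingestor_csv is a hypothetical helper name, and its header argument is optional: the ingestor ignores the header line, and this section does not say whether one is required.

```python
import csv
from pathlib import Path


def write_ingestor_csv(path, rows, header=None):
    """Write rows as comma-separated, UTF-8, LF-terminated CSV (hypothetical helper)."""
    path = Path(path)
    # File names must end with the .csv suffix
    if path.suffix != ".csv":
        raise ValueError(f"file name must end in .csv: {path.name}")
    # newline="" stops Python from translating line endings, so the writer's
    # lineterminator decides them: "\n" (Line Feed, character 10) rather than
    # the csv module's default "\r\n"
    with path.open("w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter=",", lineterminator="\n")
        if header is not None:
            # Optional in this sketch: the ingestor ignores the header line
            writer.writerow(header)
        writer.writerows(rows)
    return path
```

For example, write_ingestor_csv("events.csv", rows) writes each row in rows as one LF-terminated line. Opening the file with newline="" matters. Without it, Python's text layer could translate line endings on some platforms.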

Be aware that:

  • The ingestor does not trim spaces from fields. If you do not want leading or trailing spaces in the ingested data, remove them from the fields before you ingest the source input data.
  • Content that contains commas must be enclosed in double quotes (").
  • If quoted content itself contains double quotes, escape each one by doubling it, for example:
    "BA ""Bad Attitude"" Baracas"
  • The ingestor ignores the header line in a file and does not use it to determine the content of the fields.
  • Field ordering must remain the same across all files that are part of the same dataset.

For example:

"2aadb-99d-97943",42.32365,44.538375,12.5,1041037198

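The same csv module can apply the trimming and quoting rules. This sketch uses a hypothetical format_row helper. It strips leading and trailing whitespace from every field and lets csv.writer add quotes where needed, so a field that contains a comma or a double quote is wrapped in double quotes, and any embedded quotes are doubled, as in the "Bad Attitude" example.

```python
import csv
import io


def format_row(values):
    """Return one ingestor-ready CSV line (hypothetical helper)."""
    # The ingestor does not trim fields, so strip leading/trailing whitespace here
    cleaned = [str(v).strip() for v in values]
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes only fields that need it (e.g. ones containing a
    # comma or a double quote); doublequote=True escapes an embedded " as ""
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, doublequote=True,
                        lineterminator="\n")
    writer.writerow(cleaned)
    return buf.getvalue()


print(format_row([" 2aadb-99d-97943 ", 'BA "Bad Attitude" Baracas', "Cambridge, UK"]), end="")
# 2aadb-99d-97943,"BA ""Bad Attitude"" Baracas","Cambridge, UK"
```

One way to keep field ordering identical across every file in a dataset is to build each row from a single fixed list of column names, for example format_row([record[c] for c in columns]).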
Describing CSV format data

To ingest your source input data, you need to provide the ingestor with a description of the source data. The ingestor uses this data source description to store the ingested data correctly in the GeoSpock database, so that you can run your queries and do your data analysis. For more information, see Creating a data source description for a dataset.

The following table shows the fields you must provide when describing this format of data in a data source description.

Setting           Description
id                The name for the column in the SQL table.
                  Example: event_elevation
                  The ID specified should contain only numbers, lowercase letters
                  and underscores.
sourceFieldIndex  The column index for this field in the source data.
purpose           (Optional) This setting enables you to identify the following fields:
                    • latitude
                    • longitude
                    • elevation
                    • source_id
                  See Special fields (purpose) for more information.
sqlType           The SQL data type for this field. For more information about the
                  data types supported, see Types of data.
                  Example: DOUBLE

For example:

        {
            "id": "organization_id",
            "sourceFieldIndex": 8,
            "purpose": "SOURCE_ID",
            "sqlType": "VARCHAR"
        }
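If you generate data source descriptions programmatically, a small helper can catch mistakes before you submit them. The sketch below is hypothetical and not part of the GeoSpock tooling. It enforces the id character rule from the table, treats purpose as optional, and compares purpose values case-insensitively. That last choice is an assumption: the table lists the purposes in lower case, while the JSON example uses SOURCE_ID.

```python
import json
import re

# Rule from the table: numbers, lowercase letters and underscores only
ID_PATTERN = re.compile(r"[0-9a-z_]+")
# Special purposes from the table; compared case-insensitively (assumption)
PURPOSES = {"latitude", "longitude", "elevation", "source_id"}


def field_description(field_id, source_field_index, sql_type, purpose=None):
    """Build one field entry for a data source description (hypothetical helper)."""
    if not ID_PATTERN.fullmatch(field_id):
        raise ValueError(f"invalid id {field_id!r}: use 0-9, a-z and _ only")
    if not isinstance(source_field_index, int) or source_field_index < 0:
        raise ValueError("sourceFieldIndex must be a non-negative integer")
    entry = {"id": field_id, "sourceFieldIndex": source_field_index}
    if purpose is not None:  # purpose is optional
        if purpose.lower() not in PURPOSES:
            raise ValueError(f"unknown purpose {purpose!r}")
        entry["purpose"] = purpose
    entry["sqlType"] = sql_type
    return entry


# Reproduces the organization_id example entry
print(json.dumps(field_description("organization_id", 8, "VARCHAR", "SOURCE_ID"), indent=4))
```

The helper does not check sqlType against the list in Types of data. If you need that check, add it from that list rather than from this sketch.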

Data validation

For CSV format data, each row is split into fields according to its tabularFormatSeparator.

Empty values are interpreted as NULL. Whether NULL is valid for a field depends on that field's specification.

If a field references a column index (using sourceFieldIndex) that does not exist in a row, the field's value is set to NULL. Again, whether NULL is valid depends on the field's specification.
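The NULL rules can be modelled in a few lines. This is an illustrative sketch of the behaviour described, not the ingestor's implementation. extract_values is a hypothetical name. The sketch assumes that sourceFieldIndex is zero-based and that quoted fields are honoured for whatever separator is configured. It does not check whether NULL is valid for a field, because that depends on the field specification.

```python
import csv


def extract_values(line, fields, separator=","):
    """Map one CSV line to {id: value}, applying the NULL rules (hypothetical helper).

    fields is a list of dicts shaped like data source description entries.
    Assumes sourceFieldIndex is zero-based.
    """
    # Split the row on the separator, honouring double-quoted fields
    row = next(csv.reader([line], delimiter=separator))
    values = {}
    for field in fields:
        index = field["sourceFieldIndex"]
        # A column that does not exist and an empty value both become NULL (None)
        if index >= len(row) or row[index] == "":
            values[field["id"]] = None
        else:
            values[field["id"]] = row[index]
    return values
```

For example, with the row "2aadb-99d-97943",42.32365,44.538375,,1041037198, a field at index 3 gets NULL because the value is empty. A field at index 8 also gets NULL because the row has only five columns.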