Source input data formats

The GeoSpock ingestor supports the following data formats:

All field values must be string, numeric or boolean.

If your source input data is in a different format, you will have to process it so that it conforms to one of the supported ingest formats.
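As an illustration of the value rule above, the sketch below checks that every field value in a record is a string, number or boolean, and converts one common non-conforming type before ingestion. It is a hypothetical pre-processing helper, not part of the GeoSpock ingestor. The function names are invented for this example, and converting datetimes to ISO 8601 strings is an assumption: check which representation your data source description expects.

```python
from datetime import datetime, timezone
from typing import Any


def is_supported_value(value: Any) -> bool:
    # bool is a subclass of int in Python, so the int check already covers it;
    # it is listed explicitly for readability.
    return isinstance(value, (str, int, float, bool))


def normalise_record(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `record` whose values are all string, numeric or boolean.

    Hypothetical helper: datetimes become ISO 8601 strings (an assumption);
    any other unsupported type raises TypeError instead of being guessed at.
    """
    out: dict[str, Any] = {}
    for name, value in record.items():
        if isinstance(value, datetime):
            out[name] = value.isoformat()
        elif is_supported_value(value):
            out[name] = value
        else:
            raise TypeError(
                f"field {name!r} has unsupported type {type(value).__name__}"
            )
    return out


if __name__ == "__main__":
    event = {"id": 7, "ts": datetime(2020, 1, 1, tzinfo=timezone.utc)}
    print(normalise_record(event))  # prints {'id': 7, 'ts': '2020-01-01T00:00:00+00:00'}
```

Raising on nested values (lists, dicts, None) rather than silently stringifying them keeps the decision about how to flatten such data with you.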

File compression

The files may be uncompressed, or compressed with:

  • bzip2 (with the .bz2 suffix)
  • lzo (with the .lzo suffix)
  • gzip (with the .gz suffix)
  • Snappy (with the .snappy suffix)
  • lz4 (with the .lz4 suffix)
  • deflate
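The list above pairs each compression format with a file suffix. A minimal sketch of that lookup follows, for example to audit files before upload. The names `SUFFIX_TO_CODEC` and `codec_for` are hypothetical. Deflate is omitted because no suffix is given for it here, and the match is case-sensitive because the documentation does not say otherwise.

```python
from pathlib import PurePath
from typing import Optional

# Suffixes as listed in the documentation. Deflate is supported, but no
# suffix is stated for it, so it is deliberately left out of this lookup.
SUFFIX_TO_CODEC = {
    ".bz2": "bzip2",
    ".lzo": "lzo",
    ".gz": "gzip",
    ".snappy": "snappy",
    ".lz4": "lz4",
}


def codec_for(filename: str) -> Optional[str]:
    """Return the codec implied by the file's final suffix, or None if no listed suffix matches."""
    return SUFFIX_TO_CODEC.get(PurePath(filename).suffix)


if __name__ == "__main__":
    print(codec_for("events.csv.gz"))  # prints gzip
    print(codec_for("events.csv"))     # prints None
```

Only the final suffix is inspected, so a name such as `events.csv.gz` resolves to gzip while the inner `.csv` still describes the data format.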

The ingestor does not support split archives (archives divided into multiple parts), so make sure that each data file is small enough to be compressed as a single archive. For further guidance, see the documentation about file size.

Each compressed data file must have the file extension that matches its compression format (for example, .gz for gzip); without it, the ingestor cannot process the data correctly.
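One way to catch compressed files that are missing their extension is to inspect their leading bytes. The sketch below is a hypothetical pre-upload check, not an ingestor feature. It only recognises gzip (magic bytes `1f 8b`) and bzip2 (magic bytes `BZh`), because those formats have well-known file signatures; lzo, Snappy, lz4 and deflate files are not detected by this sketch.

```python
import gzip
import tempfile
from pathlib import Path
from typing import Optional

# Leading "magic" bytes for two of the supported formats.
MAGIC = {
    b"\x1f\x8b": ".gz",  # gzip
    b"BZh": ".bz2",      # bzip2
}


def missing_extension(path: Path) -> Optional[str]:
    """Return the expected suffix if the file looks compressed but lacks it, else None."""
    with path.open("rb") as f:
        head = f.read(3)
    for magic, suffix in MAGIC.items():
        if head.startswith(magic) and path.suffix != suffix:
            return suffix
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "rows.csv"  # gzip data, but no .gz suffix
        sample.write_bytes(gzip.compress(b"id,value\n1,2\n"))
        print(missing_extension(sample))  # prints .gz
```

Renaming the flagged file (here to `rows.csv.gz`) makes the check return None.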

Data validation

During ingestion, the source input data is processed row-wise, based on the format of the source input file.

For all datasets:

  • a row is considered invalid if the value of any field defined in the data source description is invalid
  • an invalid row is excluded from the dataset
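The two rules above amount to a row-by-row filter. The following is a minimal sketch under stated assumptions: the source file is CSV, and the data source description is modelled as a hypothetical mapping from field name to a validator function. The `id`, `latitude` and `longitude` fields and their range checks are invented for illustration and are not the GeoSpock description format.

```python
import csv
import io
from typing import Callable, Iterable, Iterator

Validator = Callable[[str], bool]


def _is_float(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


# Hypothetical data source description: field name -> validator.
DESCRIPTION: dict[str, Validator] = {
    "id": str.isdigit,
    "latitude": lambda v: _is_float(v) and -90.0 <= float(v) <= 90.0,
    "longitude": lambda v: _is_float(v) and -180.0 <= float(v) <= 180.0,
}


def valid_rows(
    rows: Iterable[dict[str, str]], description: dict[str, Validator]
) -> Iterator[dict[str, str]]:
    """Yield only rows in which every described field is present and valid."""
    for row in rows:
        # csv.DictReader fills missing trailing fields with None.
        if all(
            row.get(name) is not None and check(row[name])
            for name, check in description.items()
        ):
            yield row
        # Any other row is invalid and is silently excluded.


if __name__ == "__main__":
    data = "id,latitude,longitude\n1,51.5,-0.1\n2,abc,0.0\n3,95.0,10.0\n"
    kept = list(valid_rows(csv.DictReader(io.StringIO(data)), DESCRIPTION))
    print([r["id"] for r in kept])  # prints ['1']
```

Because a single invalid field rejects the whole row, row 2 (non-numeric latitude) and row 3 (latitude out of range) are both dropped.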