Describing Parquet data

The following table describes the properties you provide in a data source description for each data field in a location, event, or properties block with Parquet format data. All properties are required except purpose.

Setting            Description
id                 The name of the column in the database table.
                   Example: taxi_id
                   The ID should contain only numbers, lowercase letters, and underscores, and should start with a lowercase letter.
sourceFieldName    The name of the field in the Parquet object.
                   Example: "tid"
purpose            (Optional) Identifies this field as one of the following, for specialized indexing:
                   - latitude
                   - longitude
                   - source_id
                   See Special fields for more information.
sqlType            The SQL data type for this field. For more information about the supported data types, see Types of data.
                   Example: VARCHAR
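The naming rule for id can be checked before you submit a description. The following is a minimal sketch that reads the rule as "a lowercase ASCII letter followed by any mix of lowercase letters, digits, and underscores". The function name is_valid_id is hypothetical and not part of the product.

```python
import re

# Assumed reading of the documented rule: the first character is a lowercase
# letter; every remaining character is a lowercase letter, a digit, or "_".
_ID_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_id(candidate: str) -> bool:
    """Return True if candidate satisfies the id naming rule (hypothetical helper)."""
    # fullmatch anchors both ends, so a trailing newline or symbol is rejected.
    return _ID_PATTERN.fullmatch(candidate) is not None


print(is_valid_id("taxi_id"))  # True
print(is_valid_id("Taxi_id"))  # False: starts with an uppercase letter
print(is_valid_id("taxi-id"))  # False: contains a hyphen
```

Using fullmatch rather than a pattern ending in $ avoids accepting a value with a trailing newline, which $ would allow.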

For example:

{
    "id": "taxi_id",
    "sourceFieldName": "tid",
    "purpose": "SOURCE_ID",
    "sqlType": "VARCHAR"
}
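A field description like the one above can be checked for the required properties listed in the settings table (id, sourceFieldName, sqlType; purpose is optional). This is a sketch of a hypothetical pre-flight check, not a validator provided by the product, and it only confirms that the required keys are present.

```python
import json

# Required keys taken from the settings table; "purpose" is optional and
# therefore not checked here. check_field is a hypothetical helper.
REQUIRED_KEYS = {"id", "sourceFieldName", "sqlType"}


def check_field(field: dict) -> list:
    """Return a sorted list of problems found in one field description."""
    present = set(field)
    return [f"missing required key: {key}" for key in sorted(REQUIRED_KEYS - present)]


example = json.loads("""
{
    "id": "taxi_id",
    "sourceFieldName": "tid",
    "purpose": "SOURCE_ID",
    "sqlType": "VARCHAR"
}
""")

print(check_field(example))  # []
```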

Data validation

For Parquet format data, a valid Parquet file contains rows that are consistent with its own internal schema, so your source data is unlikely to be rejected as invalid.

For a given row, if the source field:

  • is referenced but does not exist, the value associated with that field is considered invalid.
  • is an empty string, it is interpreted as an empty string (whether this is valid depends on the field specification).
  • has a null value, it is interpreted as NULL (whether this is valid depends on the field specification).
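The three cases above can be illustrated with a small sketch that models a row as a Python dict and a Parquet null as None. The function classify and its status labels are hypothetical; they only show how a missing field, an empty string, and a null value are told apart, and they do not apply the field specification that decides final validity.

```python
from typing import Any, Mapping, Tuple


def classify(row: Mapping[str, Any], source_field_name: str) -> Tuple[str, Any]:
    """Return (status, value) for one source field of one row (hypothetical labels)."""
    if source_field_name not in row:
        # The referenced field does not exist: the value is invalid.
        return ("invalid", None)
    value = row[source_field_name]
    if value is None:
        # A null is kept as NULL; the field specification decides validity.
        return ("null", None)
    if isinstance(value, str) and value == "":
        # An empty string stays an empty string; the field specification decides validity.
        return ("empty_string", "")
    return ("value", value)


row = {"tid": "a1", "notes": "", "fare": None}
print(classify(row, "tid"))    # ('value', 'a1')
print(classify(row, "notes"))  # ('empty_string', '')
print(classify(row, "fare"))   # ('null', None)
print(classify(row, "zone"))   # ('invalid', None)
```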

Do not attempt to ingest Parquet files that contain columns made up entirely of NULL values; this causes the ingest to fail.
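To catch this before ingesting, you can scan each column for values that are all NULL. The sketch below works on a plain mapping of column name to values; one way to obtain such a mapping is pyarrow's pyarrow.parquet.read_table(path).to_pydict(), which represents Parquet nulls as None. The helper all_null_columns is hypothetical, and it assumes that a column with zero rows should not be flagged, which the rule above does not specify.

```python
from typing import Any, List, Mapping, Sequence


def all_null_columns(columns: Mapping[str, Sequence[Any]]) -> List[str]:
    """Return the names of non-empty columns whose values are all None (NULL)."""
    return [
        name
        for name, values in columns.items()
        # Assumption: zero-row columns are not reported as all-NULL.
        if len(values) > 0 and all(value is None for value in values)
    ]


columns = {"tid": ["a1", "b2"], "fare": [None, None]}
print(all_null_columns(columns))  # ['fare']
```

Any column this reports could be dropped, or populated, before the file is submitted for ingest.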