Describing Parquet data
The following table shows the properties you must provide in a data source description for each data field in a location, event or properties block with Parquet format data.
Setting | Description |
---|---|
id | The name for the column in the SQL table. Example: event_elevation. The ID must contain only digits, lowercase letters, and underscores, and must start with a lowercase letter. |
sourceFieldName | The name of the field in the Parquet object. Example: "height1" |
purpose | (Optional) Identifies the field as one of the following special fields: latitude, longitude, elevation, or source_id. See Special fields for more information. |
sqlType | The SQL data type for this field. For more information about the data types supported, see Types of data. Example: REAL |
For example:
{
"id": "taxi_id",
"sourceFieldName": "tid",
"purpose": "SOURCE_ID",
"sqlType": "VARCHAR"
}
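The rules in the table can be checked before a description is submitted. The following is a minimal sketch, not part of the product: `check_field_description` is a hypothetical helper, the regular expression is derived from the `id` rule above, and matching `purpose` case-insensitively is an assumption (the table lists lowercase names while the example uses `SOURCE_ID`). The sketch also does not check `sqlType` against the supported data types.

```python
import re

# Derived from the id rule in the table: digits, lowercase letters and
# underscores only, starting with a lowercase letter.
ID_PATTERN = re.compile(r"[a-z][a-z0-9_]*")

# Special fields listed in the table. Comparing case-insensitively is an assumption.
KNOWN_PURPOSES = {"LATITUDE", "LONGITUDE", "ELEVATION", "SOURCE_ID"}

# Settings the table does not mark as optional.
REQUIRED_KEYS = ("id", "sourceFieldName", "sqlType")


def check_field_description(field):
    """Return a list of problems found in one field description (hypothetical helper)."""
    problems = []
    for key in REQUIRED_KEYS:
        if not field.get(key):
            problems.append(f"missing required setting: {key}")
    field_id = field.get("id")
    if field_id and not (isinstance(field_id, str) and ID_PATTERN.fullmatch(field_id)):
        problems.append(f"invalid id: {field_id!r}")
    purpose = field.get("purpose")
    if purpose is not None and str(purpose).upper() not in KNOWN_PURPOSES:
        problems.append(f"unknown purpose: {purpose!r}")
    return problems


taxi_field = {
    "id": "taxi_id",
    "sourceFieldName": "tid",
    "purpose": "SOURCE_ID",
    "sqlType": "VARCHAR",
}
print(check_field_description(taxi_field))  # an empty list means no problems were found
```

An id such as `Taxi-ID` would be reported because it contains an uppercase letter and a hyphen.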
Data validation
For Parquet format data, a valid Parquet file has rows that are consistent with its own internal schema. Your source data is therefore unlikely to be rejected because of invalid data.
For a given row, if the source field:
- is referenced but does not exist, the value associated with that field is considered invalid.
- is an empty string, it is interpreted as an empty string (whether this is valid depends on its field specification).
- has a null value, it is interpreted as NULL (whether this is valid depends on its field specification).
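The three cases above can be illustrated with a short sketch. It assumes a row has already been read into a Python dictionary keyed by source field name; `interpret_source_value` and the status labels are hypothetical names used for illustration and do not reflect the actual ingest implementation.

```python
# Sentinel used to tell "field is absent" apart from "field is present but null".
MISSING = object()


def interpret_source_value(row, source_field_name):
    """Classify one source value according to the three cases listed above.

    Returns (status, value). Whether "empty_string" and "null" are accepted
    depends on the field specification, which this sketch does not model.
    """
    value = row.get(source_field_name, MISSING)
    if value is MISSING:
        return ("invalid", None)       # referenced field does not exist
    if value is None:
        return ("null", None)          # interpreted as NULL
    if value == "":
        return ("empty_string", "")    # kept as an empty string
    return ("value", value)


row = {"tid": "abc", "note": "", "height1": None}
for name in ("tid", "note", "height1", "speed"):
    print(name, interpret_source_value(row, name))
```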
Do not attempt to ingest Parquet files with columns made entirely of NULL values, as this will cause the ingest to fail.
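A pre-ingest check for such columns might look like the sketch below. It assumes the file has already been loaded into a mapping of column name to list of values (for example with a Parquet reader of your choice; the loading step is not shown), and `all_null_columns` is a hypothetical helper name.

```python
def all_null_columns(columns):
    """Return the names of columns whose values are all None, in input order.

    `columns` maps each column name to the list of its values.
    Columns with no rows at all are not reported.
    """
    return [
        name
        for name, values in columns.items()
        if len(values) > 0 and all(v is None for v in values)
    ]


# Illustrative data only: "height1" contains nothing but nulls.
sample = {
    "tid": ["a17", "b42", None],
    "height1": [None, None, None],
    "lat": [51.5, 48.9, 40.7],
}
print(all_null_columns(sample))
```

Any column the check reports would need to be dropped or populated before the file is ingested.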