Available Datasets

GeoSpock DB Discovery gives you access to several datasets grouped by geographic region or topic. The datasets and their descriptions are listed below.

Smart Singapore dataset list

We have compiled a set of open datasets for the city-state of Singapore. The date ranges covered by the datasets vary between 2017 and 2020, but the datasets were specifically selected to cover at least the second half of 2019.

In addition, we have generated a synthetic dataset that emulates the data produced by the new Electronic Road Pricing 2.0 system (ERP 2.0). The data was generated assuming 4 million journeys per day over a 12-month period, with active vehicles sampled every second. This dataset, spanning 1.3 trillion rows and 108 TB, enables users to test the scalability of the platform. Note that some datasets are static, because they consist of points of interest (POIs) or polygons.
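As a rough sanity check on the ERP 2.0 figures above, the short Python sketch below derives the implied average journey length and row size. It assumes a 365-day year, one row per active vehicle per second, and decimal terabytes (10^12 bytes); none of these assumptions are stated in the dataset description, so treat the results as order-of-magnitude estimates only.

```python
# Back-of-the-envelope check of the synthetic ERP 2.0 dataset figures.
JOURNEYS_PER_DAY = 4_000_000   # from the dataset description
DAYS = 365                     # assumption: "12-month period" = 365 days
TOTAL_ROWS = 1.3e12            # 1.3 trillion rows, from the description
TOTAL_BYTES = 108e12           # assumption: 108 TB in decimal terabytes

total_journeys = JOURNEYS_PER_DAY * DAYS

# Assumption: one row per active vehicle per second, so rows per journey
# equals the average journey duration in seconds.
rows_per_journey = TOTAL_ROWS / total_journeys
minutes_per_journey = rows_per_journey / 60

bytes_per_row = TOTAL_BYTES / TOTAL_ROWS

print(f"Total journeys:        {total_journeys:,}")
print(f"Rows per journey:      {rows_per_journey:.0f}")
print(f"Avg journey (minutes): {minutes_per_journey:.1f}")
print(f"Raw bytes per row:     {bytes_per_row:.0f}")
```

Under these assumptions, the figures imply an average journey of roughly 15 minutes and a raw row size on the order of 80 bytes.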

No | Table | Description | Raw data size | Type | Date range | Source
1 | sg_air_temperature | Air temperature | 127.8 MB | Time series | 2017-01-01 to 2020-06-30 | source link
2 | sg_erp1_camera_location | ERP 1 gantry locations | 7.3 KB | POI | Static | source link
3 | sg_erp2_synthesis | Synthetic ERP 2.0 | 108 TB | Time series | 2019-01-01 to 2019-12-31 | Generated by GeoSpock
4 | sg_planning_area_census | Planning area census regions | 244.7 KB | Polygon | Static | source link
5 | sg_pm25 | PM 2.5 | 1.1 MB | Time series | 2017-01-01 to 2020-06-30 | source link
6 | sg_psi | PSI | 16.3 MB | Time series | 2017-01-01 to 2020-06-30 | source link
7 | sg_rainfall | Rainfall | 66.1 MB | Time series | 2017-01-01 to 2020-06-30 | source link
8 | sg_region_census | Singapore districts | 170.0 KB | Polygon | Static | source link
9 | sg_relative_humidity | Relative humidity | 218.9 MB | Time series | 2017-01-01 to 2020-06-30 | source link
10 | sg_road_segments | Road segments extracted from traffic speed bands data | 9.8 MB | POI | Static | source link
11 | sg_speed_bands | Traffic speed bands | 25.2 GB | Time series | 2019-07-03 to 2019-12-31 | source link
12 | sg_speed_laser_camera | Speed laser camera | 3.7 KB | POI | Static | source link
13 | sg_subzone_census | Subzone census | 398.6 KB | Polygon | Static | source link
14 | sg_taxi_availability | Taxi availability | 3.0 GB | Time series | 2018-12-31 to 2020-06-30 | source link
15 | sg_traffic_incidents | Traffic incidents | 38.5 MB | Time series | 2019-06-26 to 2020-12-31 | source link
16 | sg_wind_direction | Wind direction | 221.2 MB | Time series | 2017-01-01 to 2020-06-30 | source link
17 | sg_wind_speed | Wind speed | 133.3 MB | Time series | 2017-01-01 to 2020-06-30 | source link

Smart Singapore dataset descriptions

Below you can find the output of the DESCRIBE SQL command for each table, together with a visual representation of each dataset. The code used to create these graphics can be found in our Discovery examples GitHub repository, under sg_python/SmartSingapore-DataExploration.ipynb.
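If you want to reproduce the DESCRIBE output yourself, the following Python sketch generates the statement for every Smart Singapore table. It only builds the SQL strings; how you execute them depends on the SQL client you use to connect, which is not covered here. The helper name describe_statement is hypothetical and introduced purely for illustration.

```python
# Build a DESCRIBE statement for each Smart Singapore table.
CATALOG_SCHEMA = "geospock.default"  # catalog and schema used on this page

SMART_SINGAPORE_TABLES = [
    "sg_air_temperature",
    "sg_erp1_camera_location",
    "sg_erp2_synthesis",
    "sg_planning_area_census",
    "sg_pm25",
    "sg_psi",
    "sg_rainfall",
    "sg_region_census",
    "sg_relative_humidity",
    "sg_road_segments",
    "sg_speed_bands",
    "sg_speed_laser_camera",
    "sg_subzone_census",
    "sg_taxi_availability",
    "sg_traffic_incidents",
    "sg_wind_direction",
    "sg_wind_speed",
]


def describe_statement(table: str) -> str:
    """Return the DESCRIBE statement for one table (hypothetical helper)."""
    return f"DESCRIBE {CATALOG_SCHEMA}.{table}"


statements = [describe_statement(t) for t in SMART_SINGAPORE_TABLES]
for statement in statements:
    print(statement)
```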

1. Dataset: sg_air_temperature

  • DESCRIBE geospock.default.sg_air_temperature returns:
Column Type Extra Comment
air_temperature_station_id varchar Nullable
latitude double LATITUDE index
longitude double LONGITUDE index
month_timestamp timestamp TIME index
station_name varchar Nullable
temperature_value double Nullable
original_timestamp timestamp Non-nullable
date varchar Nullable
  • A visualization of a sample of this dataset looks as follows, where the solid line is the average, and the shaded areas are the 95% confidence interval that is automatically computed by Seaborn:

1_sg_wind_speed_wind_speed_value_original_timestamp
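To make the shaded band in the time series plots on this page more concrete, the sketch below computes a percentile bootstrap 95% confidence interval for a mean using only the Python standard library. To our understanding, Seaborn's line plots estimate the interval by bootstrapping in a similar way, but this is a simplified illustration rather than Seaborn's actual implementation. The temperature readings are made-up values, not taken from sg_air_temperature.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    lower_idx = int((1 - level) / 2 * n_boot)
    upper_idx = int((1 + level) / 2 * n_boot) - 1
    return means[lower_idx], means[upper_idx]


# Hypothetical readings (degrees Celsius) sharing one timestamp on the x-axis.
readings = [26.1, 27.4, 28.0, 26.8, 27.9, 29.2, 27.1, 26.5]

mean = statistics.fmean(readings)   # the solid line
low, high = bootstrap_ci(readings)  # the edges of the shaded band

print(f"mean={mean:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

In the plots, this calculation is repeated for every point on the x-axis, which is what produces a continuous band around the average line.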

2. Dataset: sg_erp1_camera_location

  • DESCRIBE geospock.default.sg_erp1_camera_location returns:
Column Type Extra Comment
lat double LATITUDE index
long double LONGITUDE index
geometry varchar Nullable
typ_cd varchar Nullable
typ_cd_des varchar Nullable
id varchar Nullable
  • A visualization of a sample of this dataset looks as follows:

2_sg_erp1_camera_location

3. Dataset: sg_erp2_synthesis

  • DESCRIBE geospock.default.sg_erp2_synthesis returns:
Column Type Extra Comment
longitude double LONGITUDE index
latitude double LATITUDE index
erp_vehicle_id varchar Nullable
timestamp timestamp TIME index
erp_vehicle_types varchar Nullable
  • A visualization of a sample of this dataset looks as follows:

3_sg_erp2
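Because latitude, longitude and timestamp carry the LATITUDE, LONGITUDE and TIME indexes in sg_erp2_synthesis, queries against this table will typically filter on those three columns. The Python sketch below builds such a query as a string. The function name, the bounding box, the time window and the row limit are all illustrative assumptions, and the identifier quoting and TIMESTAMP literal follow standard SQL conventions that you may need to adapt to your client.

```python
def erp2_window_query(min_lat, max_lat, min_lon, max_lon, start, end, limit=1000):
    """Build a bounding-box and time-window query (hypothetical helper).

    Values are interpolated directly into the SQL text, so only pass
    trusted, validated inputs.
    """
    return (
        "SELECT erp_vehicle_id, erp_vehicle_types, latitude, longitude, "
        '"timestamp"\n'
        "FROM geospock.default.sg_erp2_synthesis\n"
        f"WHERE latitude BETWEEN {min_lat} AND {max_lat}\n"
        f"  AND longitude BETWEEN {min_lon} AND {max_lon}\n"
        f"  AND \"timestamp\" >= TIMESTAMP '{start}'\n"
        f"  AND \"timestamp\" < TIMESTAMP '{end}'\n"
        f"LIMIT {limit}"
    )


# Illustrative values: a small box in central Singapore, one hour in July 2019.
query = erp2_window_query(
    min_lat=1.28, max_lat=1.31,
    min_lon=103.83, max_lon=103.87,
    start="2019-07-01 08:00:00",
    end="2019-07-01 09:00:00",
)
print(query)
```

Given the size of this table, keeping both the spatial and the temporal filter narrow, and adding a LIMIT while exploring, is a sensible starting point.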

4. Dataset: sg_planning_area_census

  • DESCRIBE geospock.default.sg_planning_area_census returns:
Column Type Extra Comment
lat double LATITUDE index
long double LONGITUDE index
geometry varchar Nullable
name varchar Nullable
visibility varchar Nullable
open varchar Nullable
  • A visualization of a sample of this dataset looks as follows:

4_sg_planning_area_census

5. Dataset: sg_pm25

  • DESCRIBE geospock.default.sg_pm25 returns:
Column Type Extra Comment
pm25_sensor_name varchar Nullable
latitude double LATITUDE index
longitude double LONGITUDE index
month_timestamp timestamp TIME index
pm_25_sensor_value double Nullable
original_timestamp timestamp Non-nullable
  • A visualization of a sample of this dataset looks as follows, where the solid line is the average, and the shaded areas are the 95% confidence interval that is automatically computed by Seaborn:

5_sg_pm25_pm_25_sensor_value_original_timestamp

6. Dataset: sg_psi

  • DESCRIBE geospock.default.sg_psi returns:
Column Type Extra Comment
latitude double LATITUDE index
longitude double LONGITUDE index
month_timestamp timestamp TIME index
locname varchar Nullable
sensortype varchar Nullable
sensorvalue double Nullable
timestamp timestamp Non-nullable
  • A visualization of a sample of this dataset looks as follows, where the solid line is the average, and the shaded areas are the 95% confidence interval that is automatically computed by Seaborn:

6_sg_psi_sensorvalue_timestamp

7. Dataset: sg_rainfall

  • DESCRIBE geospock.default.sg_rainfall returns:
Column Type Extra Comment
rainfall_station_id varchar Nullable
latitude double LATITUDE index
longitude double LONGITUDE index
month_timestamp timestamp TIME index
station_name varchar Nullable
rainfall_value double Nullable
original_timestamp timestamp Non-nullable
  • A visualization of a sample of this dataset looks as follows, where the solid line is the average, and the shaded areas are the 95% confidence interval that is automatically computed by Seaborn:

7_sg_rainfall_rainfall_value_original_timestamp

8. Dataset: sg_region_census

  • DESCRIBE geospock.default.sg_region_census returns:
Column Type Extra Comment
longitude double LONGITUDE index
latitude double LATITUDE index
geometry varchar Nullable
name varchar Nullable
  • A visualization of a sample of this dataset looks as follows:

8_sg_region_census

9. Dataset: sg_relative_humidity

  • DESCRIBE geospock.default.sg_relative_humidity returns:
Column Type Extra Comment
longitude double LONGITUDE index
latitude double LATITUDE index
month_timestamp timestamp TIME index
station_id varchar Nullable
device_id varchar Nullable
name varchar Nullable
value double Nullable
timestamp timestamp Non-nullable
date varchar Nullable
  • A visualization of a sample of this dataset looks as follows, where the solid line is the average, and the shaded areas are the 95% confidence interval that is automatically computed by Seaborn:

9_sg_relative_humidity_value_timestamp

10. Dataset: sg_road_segments

  • DESCRIBE geospock.default.sg_road_segments returns:
Column Type Extra Comment
longitude double LONGITUDE index
latitude double LATITUDE index
linkid varchar Nullable
roadname varchar Nullable
roadcategory varchar Nullable
geometry varchar Nullable
  • A visualization of a sample of this dataset looks as follows:

10_sg_road_segments

11. Dataset: sg_speed_bands

  • DESCRIBE geospock.default.sg_speed_bands returns:
Column Type Extra Comment
linkid varchar Nullable
timestamp timestamp TIME index
latitude1 double LATITUDE index
longitude1 double LONGITUDE index
roadname varchar Nullable
roadcategory varchar Nullable
speedband integer Nullable
minimumspeed integer Nullable
maximumspeed integer Nullable
latitude2 double Nullable
longitude2 double Nullable
  • A visualization of a sample of this dataset looks as follows:

11_sg_speed_bands
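The sg_speed_bands schema contains two coordinate pairs per row. Assuming that (latitude1, longitude1) and (latitude2, longitude2) are the two end points of a road segment, which the schema suggests but does not state, the following Python sketch estimates the straight-line length of a segment using the haversine formula. The sample row is hypothetical and the Earth radius is the commonly used mean value.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical row, using the column names from the DESCRIBE output above.
row = {
    "latitude1": 1.3000, "longitude1": 103.8000,
    "latitude2": 1.3010, "longitude2": 103.8010,
}

segment_length_m = haversine_m(
    row["latitude1"], row["longitude1"],
    row["latitude2"], row["longitude2"],
)
print(f"Approximate segment length: {segment_length_m:.0f} m")
```

This gives the straight-line distance between the two points only; a curved road segment will be longer than the value computed here.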

12. Dataset: sg_speed_laser_camera

  • DESCRIBE geospock.default.sg_speed_laser_camera returns:
Column Type Extra Comment
lat double LATITUDE index
long double LONGITUDE index
name varchar Nullable
  • A visualization of a sample of this dataset looks as follows:

12_sg_speed_laser_camera

13. Dataset: sg_subzone_census

  • DESCRIBE geospock.default.sg_subzone_census returns:
Column Type Extra Comment
longitude double LONGITUDE index
latitude double LATITUDE index
geometry varchar Nullable
name varchar Nullable
  • A visualization of a sample of this dataset looks as follows:

13_sg_subzone_census

14. Dataset: sg_taxi_availability

  • DESCRIBE geospock.default.sg_taxi_availability returns:
Column Type Extra Comment
longitude double LONGITUDE index
latitude double LATITUDE index
month_timestamp timestamp TIME index
timestamp timestamp Non-nullable
date varchar Nullable
  • A visualization of a sample of this dataset looks as follows:

14_sg_taxi_availability

15. Dataset: sg_traffic_incidents

  • DESCRIBE geospock.default.sg_traffic_incidents returns:
Column Type Extra Comment
latitude double LATITUDE index
longitude double LONGITUDE index
month_timestamp timestamp TIME index
message varchar Nullable
type varchar Nullable
timestamp timestamp Non-nullable
  • A visualization of a sample of this dataset looks as follows:

15_sg_traffic_incidents

16. Dataset: sg_wind_direction

  • DESCRIBE geospock.default.sg_wind_direction returns:
Column Type Extra Comment
latitude double LATITUDE index
longitude double LONGITUDE index
month_timestamp timestamp TIME index
station_id varchar Nullable
device_id varchar Nullable
name varchar Nullable
value integer Nullable
timestamp timestamp Non-nullable
date varchar Nullable
  • A visualization of a sample of this dataset looks as follows, where the solid line is the average, and the shaded areas are the 95% confidence interval that is automatically computed by Seaborn:

16_sg_wind_direction_value_timestamp

17. Dataset: sg_wind_speed

  • DESCRIBE geospock.default.sg_wind_speed returns:
Column Type Extra Comment
wind_speed_station_id varchar Nullable
latitude double LATITUDE index
longitude double LONGITUDE index
month_timestamp timestamp TIME index
station_name varchar Nullable
wind_speed_value double Nullable
original_timestamp timestamp Non-nullable
  • A visualization of a sample of this dataset looks as follows, where the solid line is the average, and the shaded areas are the 95% confidence interval that is automatically computed by Seaborn:

17_sg_wind_speed_wind_speed_value_original_timestamp