Discover Singapore using Tableau
This page will run you through some sample queries used to extract insights from the Smart Singapore dataset using Tableau Workbooks. The queries reflect some of the data access patterns described in the Introduction to Geospatial Queries.
For these example queries we recommended using a query cluster with 10 worker nodes.
Q1. Station locations
We would like to start by visualizing the location of the sensor stations that generate wind speed, PM 2.5 and rainfall data. In this case, as we do not have a POI dataset with their locations, we extract the locations from the table containing the sensor values.
SELECT pm25_windspeed.sensor_name, pm25_windspeed.latitude, pm25_windspeed.longitude, pm25_windspeed.sensor_type FROM ( SELECT DISTINCT pm25.pm25_sensor_name as sensor_name, pm25.latitude as latitude, pm25.longitude as longitude, 'PM 2.5' as sensor_type FROM geospock.default.sg_pm25 as pm25 UNION SELECT DISTINCT windspeed.station_name as sensor_name, windspeed.latitude as latitude, windspeed.longitude as longitude, 'Wind Speed' as sensor_type FROM geospock.default.sg_wind_speed as windspeed ) AS pm25_windspeed UNION SELECT DISTINCT rainfall.rainfall_station_id, rainfall.latitude as latitude, rainfall.longitude AS longitude, 'Rainfall' as sensor_type FROM geospock.default.sg_rainfall as rainfall
The results can be visualized in Tableau's native map.
Q2. Daily averages across all stations
If we'd like to study the behavior of different sensor signals, we can query the daily averages across all stations:
-- pm25 day average SELECT latitude, longitude, pm25_sensor_name AS singapore_pm_25_sensor_name, DATE_TRUNC('day', pm25.original_timestamp) AS date_by_day, AVG(pm_25_sensor_value) AS avg_pm25 FROM geospock.default.sg_pm25 AS pm25 GROUP BY latitude, longitude, pm25_sensor_name, DATE_TRUNC('day', pm25.original_timestamp) -- rainfall average SELECT latitude, longitude, rainfall_station_id AS rainfall_sensor_id, DATE_TRUNC('day', rainfall.original_timestamp) AS date_by_day, AVG(rainfall_value) AS avg_rainfall FROM geospock.default.sg_rainfall AS rainfall GROUP BY latitude, longitude, rainfall_station_id, DATE_TRUNC('day', rainfall.original_timestamp) -- windspeed average SELECT latitude, longitude, wind_speed_station_id AS singapore_wind_speed_sensor_id, station_name AS singapore_wind_speed_sensor_name, DATE_TRUNC('day', windspeed.original_timestamp) AS date_by_day, AVG(wind_speed_value) AS avg_windspeed FROM geospock.default.sg_wind_speed AS windspeed GROUP BY latitude, longitude, wind_speed_station_id, station_name, DATE_TRUNC('day', windspeed.original_timestamp)
The results, shown below, display a clear maximum around October 2019, most likely due to the tropical forest fires in the neighboring countries that occurred around this time.
Q3. PM 2.5 hourly maximum
Additionally, we could take a look at the hourly maximum to establish if there is any pattern using the following query:
SELECT latitude, longitude, pm25_sensor_name AS singapore_pm_25_sensor_name, HOUR(pm25.original_timestamp) AS hour_of_the_day, MAX(pm_25_sensor_value) AS max_pm25 FROM geospock.default.sg_rainfall AS pm25 GROUP BY latitude, longitude, pm25_sensor_name, HOUR(pm25.original_timestamp)
While some peaks in PM 2.5 are at around 15:00-19:00, it's quite interesting that the same behavior could be seen later at night, around 21:00-23:00. Furthermore the graph shows that the west and south stations seem to provide similar signals.
Q4. Origin and destination vehicle counts
In the next example, we will analyze the ERP 2.0 synthetic dataset, a large dataset (108 TB) of vehicle traces.
The first analysis will focus on finding the origin and destination for trips at rush hour. While we can use Tableau's default density map, we illustrate below a visualization made using Kepler.gl. To enable Kepler in Tableau, please follow the instructions listed here and here. Note that the full query returns 282,839 records and that, given Kepler can only handle 10,000, we sample the result by a factor of 0.035.
WITH trips AS ( SELECT min_by( vehicles.latitude , vehicles.timestamp ) AS startlat, min_by( vehicles.longitude , vehicles.timestamp ) AS startlon, max_by( vehicles.latitude , vehicles.timestamp ) AS endlat, max_by( vehicles.longitude , vehicles.timestamp ) AS endlon, min(vehicles.timestamp) as starttime, arbitrary(vehicles.erp_vehicle_types) as vehicleType, vehicles.erp_vehicle_id FROM geospock.default.sg_erp2_synthesis AS vehicles WHERE vehicles.timestamp BETWEEN TIMESTAMP '2019-01-04 08:00:00' AND TIMESTAMP '2019-01-04 09:00:00' GROUP BY vehicles.erp_vehicle_id ), start_end AS ( SELECT trips.*, geometrystart.name as geometry_start_name, geometryend.name as geometry_end_name FROM trips JOIN geospock.default.sg_planning_area_census AS geometrystart ON ST_Within(ST_Point(trips.startlon, trips.startlat), ST_GeometryFromText(geometrystart.geometry)) JOIN geospock.default.sg_planning_area_census AS geometryend ON ST_Within(ST_Point(trips.endlon, trips.endlat), ST_GeometryFromText(geometryend.geometry)) ) SELECT * FROM start_end TABLESAMPLE BERNOULLI(0.035)
From the graphics we can see that the trips originate from the north, west and center of Singapore. On the other hand, most of trips end at the CBD (central Singapore).
Please note that Kepler has issues with some other type of plots in Tableau. Therefore, for certain types (such as point plots), we would recommend using Tableau's default map visualization.
Q4. Journeys car distributions
We can use the output from the previous query to visualize the distribution of vehicle types represented in the dataset.
Q5. Origin and destination by Singapore districts
Instead of focusing on the exact locations, we may want to look at trip origins and destinations at the district level. The structure of this query is constructed such that we obtain the origin and destination points, then manipulate the results to make it friendlier for Tableau.
WITH trips AS ( SELECT min(vehicles.timestamp) as starttime, arbitrary(vehicles.erp_vehicle_types) as vehicleType, vehicles.erp_vehicle_id, min_by( vehicles.longitude, vehicles.timestamp ) AS min_lon, min_by( vehicles.latitude, vehicles.timestamp ) AS min_lat, max_by( vehicles.longitude, vehicles.timestamp ) AS max_lon, max_by( vehicles.latitude, vehicles.timestamp ) AS max_lat FROM geospock.default.sg_erp2_synthesis AS vehicles WHERE vehicles.timestamp BETWEEN TIMESTAMP '2019-01-04 08:00:00' AND TIMESTAMP '2019-01-04 09:00:00' GROUP BY vehicles.erp_vehicle_id ), trips_geometry AS ( SELECT trips.starttime, trips.vehicleType, trips.erp_vehicle_id, trips.min_lon, trips.min_lat, trips.max_lon, trips.max_lat, CONCAT(geometrystart.name, '-',geometryend.name) as route, geometrystart.name as geometrystart_name, geometryend.name as geometryend_name, geometrystart.lat as geometrystart_lat, geometrystart.long as geometrystart_long, geometryend.lat as geometryend_lat, geometryend.long as geometryend_long FROM trips JOIN geospock.default.sg_planning_area_census AS geometrystart ON ST_Within(ST_Point(trips.min_lon, trips.min_lat), ST_GeometryFromText(geometrystart.geometry)) JOIN geospock.default.sg_planning_area_census AS geometryend ON ST_Within(ST_Point(trips.max_lon, trips.max_lat), ST_GeometryFromText(geometryend.geometry)) ) SELECT 1 as path_order, starttime, vehicleType, erp_vehicle_id, min_lon as longitude, min_lat as latitude, route, geometrystart_name, geometryend_name, geometrystart_long, geometrystart_lat, geometryend_long, geometryend_lat FROM trips_geometry UNION SELECT 2 as path_order, starttime, vehicleType, erp_vehicle_id, max_lon as longitude, max_lat as latitude, route, geometrystart_name, geometryend_name, geometrystart_long, geometrystart_lat, geometryend_long, geometryend_lat FROM trips_geometry
To generate the following visualization you'll have to download the Planning Area Census geometries .shp file that can be found here as Tableau does not work well with Well-Known-Text (WKT) geometries.
This visualization counts the number of trips between different Planning Area census geometries which could help city officials to develop infrastructure to satisfy commuters needs.
Q6. Distance travelled per vehicle inside of a polygon (Downtown)
A use case for the ERP 2.0 dataset is computing distance-based congestion charges. For this we calculate the distance travelled by vehicles within a congestion area, in this case the Downtown Core.
SELECT id, sum(distance) AS distance FROM ( SELECT event.erp_vehicle_id AS id, ST_Distance( to_spherical_geography(ST_Point(event.longitude, event.latitude)), to_spherical_geography(ST_Point( lead(event.longitude, 1) OVER ( PARTITION BY event.erp_vehicle_id ORDER BY event.timestamp ), lead(event.latitude, 1) OVER (PARTITION BY event.erp_vehicle_id ORDER BY event.timestamp) )) ) AS distance FROM geospock.default.sg_erp2_synthesis AS event WHERE ST_Within( ST_Point(event.longitude, event.latitude), ST_GeometryFromText('POLYGON ((103.8594 1.2998, ...))') ) AND event.timestamp BETWEEN timestamp '2019-01-04 00:00:00' AND TIMESTAMP '2019-01-04 23:59:59' ) GROUP BY id ORDER BY distance DESC
The Downtown Core can be seen in the following image:
The results of the query can be seen in the following bar graph. It shows the distance travelled by each individual vehicle and a possible associated congestion fee proportional to that distance.
The following bar graph is a follow up analysis of the distance query that shows the distribution of distance travelled and a possible associated distance based congestion charge across vehicles.
Q7. Vehicle counts by ERP cameras
The following query takes advantage of the 1 second resolution of GPS traces and the location of existing infrastructure by counting vehicles passing through that location in a day. This query takes advantage of the data access pattern spatial join of events vs. POIs and timestamp filtering.
SELECT COUNT(DISTINCT(vehicles.erp_vehicle_id)) AS total, hour(vehicles.timestamp) AS hour, erpcameras.id AS erp_camera_id, arbitrary(erpcameras.long) AS longitude, arbitrary(erpcameras.lat) AS latitude FROM geospock.default.sg_erp2_synthesis AS vehicles JOIN geospock.default.sg_erp1_camera_location AS erpcameras ON ST_Distance( ST_Point(erpcameras.long, erpcameras.lat), ST_Point(vehicles.longitude, vehicles.latitude) ) < 0.0001 WHERE (vehicles.timestamp BETWEEN TIMESTAMP '2019-01-04 00:00:00' AND TIMESTAMP '2019-01-04 23:59:59') GROUP BY 2,3
The screenshot of the animation shows the queries on a day (by camera), which in Tableau can be played as a video.
Q8. Vehicle counts using road segments
Similarly to Q7, we can get an idea of Singapore's traffic by calculating the count of vehicles per road segment.
WITH join_roadsegment_events AS ( SELECT count(1) AS tv, vehicles.erp_vehicle_id AS id, hour(vehicles.timestamp) AS hour, roadsegments.linkid AS singapore_road_segment_link_id , arbitrary(roadsegments.longitude) AS longitude, arbitrary(roadsegments.latitude) AS latitude FROM geospock.default.sg_pm25 AS vehicles JOIN geospock.default.sg_road_segments AS roadsegments ON ST_Distance( ST_Point(roadsegments.longitude, roadsegments.latitude), ST_Point(vehicles.longitude, vehicles.latitude) ) < 0.0001 WHERE (vehicles.timestamp BETWEEN TIMESTAMP '2019-01-04 00:00:00' AND TIMESTAMP '2019-01-04 23:59:59') AND (roadsegments.roadcategory = 'A' ) GROUP BY 2,3,4 ) SELECT sum(tv) AS total, hour, singapore_road_segment_link_id, arbitrary(longitude) AS longitude, arbitrary(latitude) AS latitude FROM join_roadsegment_events GROUP BY 2,3
The results can then be visualized in Tableau as an animation:
We have explored a few example queries you can run using Tableau and the datasets available through Discovery. This is by no means an exhaustive list of the types of analysis you can do – please continue your Discovery journey by trying out new queries, using our examples for inspiration.