Discover Singapore using Tableau

This page will run you through some sample queries used to extract insights from the Smart Singapore dataset using Tableau Workbooks. The queries reflect some of the data access patterns described in the Introduction to Geospatial Queries.

You can find detailed instructions on how to connect to GeoSpock DB Discovery from Tableau here. The Tableau workbook used in this post can be found in the examples GitHub repository.

For these example queries we recommended using a query cluster with 10 worker nodes.

Q1. Station locations

We would like to start by visualizing the location of the sensor stations that generate wind speed, PM 2.5 and rainfall data. In this case, as we do not have a POI dataset with their locations, we extract the locations from the table containing the sensor values.

SELECT
       pm25_windspeed.sensor_name,
       pm25_windspeed.latitude,
       pm25_windspeed.longitude,
       pm25_windspeed.sensor_type
  FROM (
    SELECT DISTINCT pm25.pm25_sensor_name as sensor_name, pm25.latitude as latitude, pm25.longitude as longitude, 'PM 2.5' as sensor_type
      FROM geospock.default.sg_pm25 as pm25
     UNION
    SELECT DISTINCT windspeed.station_name as sensor_name, windspeed.latitude as latitude, windspeed.longitude as longitude, 'Wind Speed' as sensor_type
      FROM geospock.default.sg_wind_speed as windspeed
) AS pm25_windspeed
 UNION
SELECT DISTINCT rainfall.rainfall_station_id, rainfall.latitude as latitude, rainfall.longitude AS longitude, 'Rainfall' as sensor_type
  FROM geospock.default.sg_rainfall as rainfall

The results can be visualized in Tableau's native map.

q1_stationlocations

Q2. Daily averages across all stations

If we'd like to study the behavior of different sensor signals, we can query the daily averages across all stations:

-- pm25 day average
SELECT
         latitude,
         longitude,
         pm25_sensor_name AS singapore_pm_25_sensor_name,
         DATE_TRUNC('day', pm25.original_timestamp) AS date_by_day,
         AVG(pm_25_sensor_value) AS avg_pm25
    FROM geospock.default.sg_pm25 AS pm25
GROUP BY latitude,
         longitude,
         pm25_sensor_name,
         DATE_TRUNC('day', pm25.original_timestamp)

-- rainfall average
SELECT
         latitude,
         longitude,
         rainfall_station_id AS rainfall_sensor_id,
         DATE_TRUNC('day', rainfall.original_timestamp) AS date_by_day,
         AVG(rainfall_value) AS avg_rainfall
    FROM geospock.default.sg_rainfall AS rainfall
GROUP BY
         latitude,
         longitude,
         rainfall_station_id,
         DATE_TRUNC('day', rainfall.original_timestamp)

-- windspeed average
SELECT
         latitude,
         longitude,
         wind_speed_station_id AS singapore_wind_speed_sensor_id,
         station_name AS singapore_wind_speed_sensor_name,
         DATE_TRUNC('day', windspeed.original_timestamp) AS date_by_day,
         AVG(wind_speed_value) AS avg_windspeed
    FROM geospock.default.sg_wind_speed AS windspeed
GROUP BY latitude,
         longitude,
         wind_speed_station_id,
         station_name,
         DATE_TRUNC('day', windspeed.original_timestamp)

The results, shown below, display a clear maximum around October 2019, most likely due to the tropical forest fires in the neighboring countries that occurred around this time.

q2_yearlydayavg

Q3. PM 2.5 hourly maximum

Additionally, we could take a look at the hourly maximum to establish if there is any pattern using the following query:

   SELECT
         latitude,
         longitude,
         pm25_sensor_name AS singapore_pm_25_sensor_name,
         HOUR(pm25.original_timestamp) AS hour_of_the_day,
         MAX(pm_25_sensor_value) AS max_pm25
    FROM geospock.default.sg_rainfall AS pm25
GROUP BY
         latitude,
         longitude,
         pm25_sensor_name,
         HOUR(pm25.original_timestamp)

While some peaks in PM 2.5 are at around 15:00-19:00, it's quite interesting that the same behavior could be seen later at night, around 21:00-23:00. Furthermore the graph shows that the west and south stations seem to provide similar signals.

q3_pm25hourlymax

Q4. Origin and destination vehicle counts

In the next example, we will analyze the ERP 2.0 synthetic dataset, a large dataset (108 TB) of vehicle traces.

The first analysis will focus on finding the origin and destination for trips at rush hour. While we can use Tableau's default density map, we illustrate below a visualization made using Kepler.gl. To enable Kepler in Tableau, please follow the instructions listed here and here. Note that the full query returns 282,839 records and that, given Kepler can only handle 10,000, we sample the result by a factor of 0.035.

WITH trips AS (
  SELECT
          min_by( vehicles.latitude , vehicles.timestamp )  AS startlat,
          min_by( vehicles.longitude , vehicles.timestamp ) AS startlon,
          max_by( vehicles.latitude , vehicles.timestamp ) AS endlat,
          max_by( vehicles.longitude , vehicles.timestamp ) AS endlon,
          min(vehicles.timestamp) as starttime,
          arbitrary(vehicles.erp_vehicle_types) as vehicleType,
          vehicles.erp_vehicle_id
  FROM geospock.default.sg_erp2_synthesis AS vehicles
  WHERE vehicles.timestamp BETWEEN TIMESTAMP '2019-01-04 08:00:00' AND TIMESTAMP '2019-01-04 09:00:00'
  GROUP BY vehicles.erp_vehicle_id
),
start_end AS (
  SELECT
         trips.*,
         geometrystart.name as geometry_start_name,
         geometryend.name as geometry_end_name
    FROM trips
    JOIN geospock.default.sg_planning_area_census AS geometrystart
      ON ST_Within(ST_Point(trips.startlon, trips.startlat), ST_GeometryFromText(geometrystart.geometry))
    JOIN geospock.default.sg_planning_area_census AS geometryend
      ON ST_Within(ST_Point(trips.endlon, trips.endlat), ST_GeometryFromText(geometryend.geometry))
)
SELECT *
FROM start_end TABLESAMPLE BERNOULLI(0.035)

From the graphics we can see that the trips originate from the north, west and center of Singapore. On the other hand, most of trips end at the CBD (central Singapore).

q4_origindestinationcounts

Please note that Kepler has issues with some other type of plots in Tableau. Therefore, for certain types (such as point plots), we would recommend using Tableau's default map visualization.

Q4. Journeys car distributions

We can use the output from the previous query to visualize the distribution of vehicle types represented in the dataset.

q4_carpercentage

Q5. Origin and destination by Singapore districts

Instead of focusing on the exact locations, we may want to look at trip origins and destinations at the district level. The structure of this query is constructed such that we obtain the origin and destination points, then manipulate the results to make it friendlier for Tableau.

WITH trips AS (
  SELECT
         min(vehicles.timestamp) as starttime,
         arbitrary(vehicles.erp_vehicle_types) as vehicleType,
         vehicles.erp_vehicle_id,
         min_by( vehicles.longitude, vehicles.timestamp ) AS min_lon,
         min_by( vehicles.latitude,  vehicles.timestamp ) AS min_lat,
         max_by( vehicles.longitude, vehicles.timestamp ) AS max_lon,
         max_by( vehicles.latitude,  vehicles.timestamp ) AS max_lat
    FROM geospock.default.sg_erp2_synthesis AS vehicles
   WHERE vehicles.timestamp BETWEEN TIMESTAMP '2019-01-04 08:00:00' AND TIMESTAMP '2019-01-04 09:00:00'
  GROUP BY vehicles.erp_vehicle_id
),
trips_geometry AS (
  SELECT
         trips.starttime, trips.vehicleType, trips.erp_vehicle_id,
         trips.min_lon,
         trips.min_lat,
         trips.max_lon,
         trips.max_lat,
         CONCAT(geometrystart.name, '-',geometryend.name) as route,
         geometrystart.name as geometrystart_name,
         geometryend.name   as geometryend_name,
         geometrystart.lat  as geometrystart_lat,
         geometrystart.long as geometrystart_long,
         geometryend.lat    as geometryend_lat,
         geometryend.long   as geometryend_long
    FROM trips
    JOIN geospock.default.sg_planning_area_census AS geometrystart
      ON ST_Within(ST_Point(trips.min_lon, trips.min_lat), ST_GeometryFromText(geometrystart.geometry))
    JOIN geospock.default.sg_planning_area_census AS geometryend
      ON ST_Within(ST_Point(trips.max_lon, trips.max_lat), ST_GeometryFromText(geometryend.geometry))
)
SELECT
  1 as path_order,
  starttime, vehicleType, erp_vehicle_id,
  min_lon as longitude,
  min_lat as latitude,
  route,
  geometrystart_name,
  geometryend_name,
  geometrystart_long, geometrystart_lat,
  geometryend_long, geometryend_lat
FROM trips_geometry
UNION
SELECT
  2 as path_order,
  starttime, vehicleType, erp_vehicle_id,
  max_lon as longitude,
  max_lat as latitude,
  route,
  geometrystart_name,
  geometryend_name,
  geometrystart_long, geometrystart_lat,
  geometryend_long, geometryend_lat
FROM trips_geometry

To generate the following visualization you'll have to download the Planning Area Census geometries .shp file that can be found here as Tableau does not work well with Well-Known-Text (WKT) geometries.

This visualization counts the number of trips between different Planning Area census geometries which could help city officials to develop infrastructure to satisfy commuters needs.

q5_origindestinationdistrict

Q6. Distance travelled per vehicle inside of a polygon (Downtown)

A use case for the ERP 2.0 dataset is computing distance-based congestion charges. For this we calculate the distance travelled by vehicles within a congestion area, in this case the Downtown Core.

SELECT
       id,
       sum(distance) AS distance
 FROM (
       SELECT
              event.erp_vehicle_id AS id,
              ST_Distance(
                  to_spherical_geography(ST_Point(event.longitude, event.latitude)),
                  to_spherical_geography(ST_Point(
                      lead(event.longitude, 1) OVER ( PARTITION BY event.erp_vehicle_id ORDER BY event.timestamp ),
                      lead(event.latitude, 1) OVER (PARTITION BY event.erp_vehicle_id ORDER BY event.timestamp)
                  ))
              ) AS distance
         FROM geospock.default.sg_erp2_synthesis AS event
        WHERE ST_Within(
                     ST_Point(event.longitude, event.latitude),
                     ST_GeometryFromText('POLYGON ((103.8594 1.2998, ...))')
              )
          AND event.timestamp BETWEEN timestamp '2019-01-04 00:00:00' AND TIMESTAMP '2019-01-04 23:59:59'
      )
GROUP BY id
ORDER BY distance DESC

The Downtown Core can be seen in the following image: q6_districts

The results of the query can be seen in the following bar graph. It shows the distance travelled by each individual vehicle and a possible associated congestion fee proportional to that distance.

q6_distancepercar

The following bar graph is a follow up analysis of the distance query that shows the distribution of distance travelled and a possible associated distance based congestion charge across vehicles.

q6_histograms

Q7. Vehicle counts by ERP cameras

The following query takes advantage of the 1 second resolution of GPS traces and the location of existing infrastructure by counting vehicles passing through that location in a day. This query takes advantage of the data access pattern spatial join of events vs. POIs and timestamp filtering.

SELECT
       COUNT(DISTINCT(vehicles.erp_vehicle_id)) AS total,
       hour(vehicles.timestamp) AS hour,
       erpcameras.id AS erp_camera_id,
       arbitrary(erpcameras.long) AS longitude,
       arbitrary(erpcameras.lat) AS latitude
  FROM geospock.default.sg_erp2_synthesis AS vehicles
  JOIN geospock.default.sg_erp1_camera_location AS erpcameras
    ON ST_Distance(
         ST_Point(erpcameras.long, erpcameras.lat),
         ST_Point(vehicles.longitude, vehicles.latitude)
       ) < 0.0001
WHERE (vehicles.timestamp BETWEEN TIMESTAMP '2019-01-04 00:00:00' AND TIMESTAMP '2019-01-04 23:59:59')
GROUP BY 2,3

The screenshot of the animation shows the queries on a day (by camera), which in Tableau can be played as a video.

q7_erpcameras

Q8. Vehicle counts using road segments

Similarly to Q7, we can get an idea of Singapore's traffic by calculating the count of vehicles per road segment.

WITH join_roadsegment_events AS (
  SELECT
         count(1) AS tv,
         vehicles.erp_vehicle_id AS id,
         hour(vehicles.timestamp) AS hour,
         roadsegments.linkid AS singapore_road_segment_link_id ,
         arbitrary(roadsegments.longitude) AS longitude,
         arbitrary(roadsegments.latitude) AS latitude
   FROM  geospock.default.sg_pm25 AS vehicles
   JOIN  geospock.default.sg_road_segments AS roadsegments
     ON  ST_Distance(
            ST_Point(roadsegments.longitude, roadsegments.latitude),
            ST_Point(vehicles.longitude, vehicles.latitude)
         ) < 0.0001
   WHERE (vehicles.timestamp BETWEEN TIMESTAMP '2019-01-04 00:00:00' AND TIMESTAMP '2019-01-04 23:59:59') AND (roadsegments.roadcategory = 'A' )
GROUP BY 2,3,4
)
 SELECT  sum(tv) AS total,
         hour,
         singapore_road_segment_link_id,
         arbitrary(longitude) AS longitude,
         arbitrary(latitude) AS latitude
    FROM join_roadsegment_events
GROUP BY 2,3

The results can then be visualized in Tableau as an animation:

q8_roadsegments

We have explored a few example queries you can run using Tableau and the datasets available through Discovery. This is by no means an exhaustive list of the types of analysis you can do – please continue your Discovery journey by trying out new queries, using our examples for inspiration.