Singapore Tourism Recovery Visual Analytics Prototype
Proposal
1. Overview & Motivation
Singapore’s tourism sector experienced a severe shock during the pandemic and entered an uneven recovery phase in the years that followed. Monthly patterns in visitor arrivals, hotel occupancy, length of stay, and hotel performance indicators all suggest distinct recovery stages. Static charts alone are insufficient for users who need to compare series, detect structural shifts, and assess what may happen next. This project therefore proposes an interactive visual analytics prototype to support exploration, clustering, and forecasting on one shared tourism time-series dataset.
2. Problem Statement
Current discussions on tourism recovery often focus narrowly on whether visitor numbers have returned, without adequately examining seasonality, source-market differences, or the way recovery paths vary across tourism series. In particular, there is a lack of an interactive time-series framework that allows users to compare country-level arrivals, detect recovery regimes, and assess how future demand may evolve.
3. Project Aim
The project aims to develop a visual analytics prototype based on monthly tourism time series, enabling users to understand tourism recovery at descriptive, structural, and forecasting levels. The prototype will integrate time-series exploration, time-series clustering, and forecasting analysis so that users can interpret recovery patterns and estimate how selected demand series may move in the near future.
4. The Data
The project data are sourced from CEIC and Singapore Tourism Board related series, so the data origin is reliable. The current prototype uses two coordinated workbooks and a set of processed CSV outputs, all built from the same tourism-arrivals backbone:
- visitor_arrivals_full_dataset.xlsx for the shared country-arrivals backbone used in clustering and forecasting
- data/processed/arrivals_country_long.csv and data/processed/arrivals_country_wide.csv as reproducible shared country-arrivals outputs for comparative analysis
- tourism_update.xlsx for the EDA/CDA views that compare arrivals, China share, hotel occupancy, and stay indicators in one curated sheet
Across these connected inputs, the analytical focus remains monthly visitor arrivals, arrivals from selected countries and regions such as China, hotel room occupancy rate, monthly average length of stay, number of hotels, and total room revenue. The app uses one shared country-arrivals backbone for clustering and forecasting, while the EDA/CDA pages rely on a curated workbook derived from the same tourism context so that descriptive and confirmatory views still speak to the same recovery story.
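As a rough illustration of how the long and wide country-arrivals outputs relate to each other, the sketch below reshapes a small made-up long table into wide form with tidyr. The column names (month, country, arrivals) and all values are assumptions for illustration only, not the actual schema of the processed files.

```r
library(tidyr)

# Hypothetical long-format arrivals: one row per month and country.
# Column names and values are illustrative, not the real file schema.
arrivals_long <- data.frame(
  month    = rep(as.Date(c("2023-01-01", "2023-02-01")), times = 2),
  country  = rep(c("China", "India"), each = 2),
  arrivals = c(100, 120, 80, 90)
)

# Wide format: one row per month, one column per country.
arrivals_wide <- pivot_wider(
  arrivals_long,
  names_from  = country,
  values_from = arrivals
)
```

The long form suits grouped summaries and plotting, while the wide form gives one column per market, which is the shape a distance-based clustering step typically expects.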
5. Research Questions
The project focuses on the following questions:
- How did Singapore’s tourism market differ across the pre-COVID, shock, and recovery periods?
- Did the recovery of selected source markets move in tandem with the broader tourism market?
- Can country-level arrivals trajectories be grouped into meaningful recovery-pattern clusters?
- How well can selected country-level visitor-arrivals series be forecast using a baseline method and a model-based approach?
6. Methodology and Analytical Approach
The project adopts three main analytical modules:
- Time-series exploration to reveal trends, seasonality, and unusual shifts across target series.
- Time-series clustering to group country-level arrivals trajectories into interpretable recovery-pattern clusters.
- Forecasting to compare a baseline method against a model-based approach on selected country-arrivals series and evaluate holdout accuracy, while interpreting those forecasts against hotel and stay indicators.
These modules are designed to use the same monthly dataset so that users can move from description to structure to prediction without changing data context.
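The exploration step can be sketched with a seasonal-trend decomposition. The example below uses base R's stl() on a synthetic monthly series as a stand-in for the tsibble and feasts workflow named in this proposal. The series, the variable names, and the two-standard-deviation rule for flagging unusual months are all assumptions made for illustration.

```r
# Synthetic monthly series standing in for a visitor-arrivals series:
# upward trend + yearly seasonality + noise. All values are made up.
set.seed(1)
n <- 72
t_idx <- seq_len(n)
y <- 100 + 0.5 * t_idx + 10 * sin(2 * pi * t_idx / 12) + rnorm(n, sd = 2)
arrivals_ts <- ts(y, start = c(2017, 1), frequency = 12)

# STL splits the series into seasonal, trend, and remainder components.
fit <- stl(arrivals_ts, s.window = "periodic")
components <- fit$time.series

# Flag months whose remainder is large relative to its spread; these are
# months that trend and seasonality alone do not explain.
remainder <- components[, "remainder"]
unusual <- which(abs(remainder) > 2 * sd(remainder))
```

On real arrivals data, the pandemic shock would be expected to dominate the remainder and trend components, so any threshold for "unusual" months would need tuning.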
Within the clustering module, the app is designed to move beyond a simple membership table. Users first read a dashboard summarising cluster quality and the dominant trajectory patterns, then inspect the pattern atlas, focus-market placement, and final assignments. This ensures that the clustering output remains interpretable as a recovery story rather than as an isolated machine-learning result.
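A minimal sketch of the clustering idea follows, assuming Ward hierarchical clustering on standardised trajectories and the average silhouette width from the cluster package as the quality summary. The proposal does not fix the clustering algorithm, so this choice, the six market labels, and the synthetic early and late recovery curves are hypothetical.

```r
library(cluster)

# Made-up trajectories for six hypothetical markets (rows = months):
# three recover early, three recover late.
set.seed(42)
months <- 36
t_idx <- seq_len(months)
early <- function() plogis((t_idx - 10) / 2) + rnorm(months, sd = 0.03)
late  <- function() plogis((t_idx - 26) / 2) + rnorm(months, sd = 0.03)
wide <- cbind(M1 = early(), M2 = early(), M3 = early(),
              M4 = late(),  M5 = late(),  M6 = late())

# Standardise each market so clusters reflect trajectory shape, not volume.
scaled <- scale(wide)

# Hierarchical clustering on Euclidean distances between markets.
d <- dist(t(scaled))
hc <- hclust(d, method = "ward.D2")
clusters <- cutree(hc, k = 2)

# Average silhouette width as a simple cluster-quality summary.
sil <- silhouette(clusters, d)
avg_sil <- mean(sil[, "sil_width"])
```

Here clusters holds the final assignments and avg_sil is the kind of single quality figure a summary dashboard could show before users drill into individual patterns.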
7. Forecasting Module
The forecasting module replaces the earlier tree-model direction with a time-series prediction workflow that better fits the tourism dataset. Instead of classifying recovery stages with tree-based models, this module focuses on country-level monthly visitor arrivals and estimates short-term future demand using time-aware forecasting methods.
Following Chapters 19 and 20 of R for Visual Analytics, the module will:
- inspect the selected arrivals series through trend, seasonal, and decomposition views
- create a time-aware train/test split
- compare a baseline seasonal-naive forecast against ETS and ARIMA
- evaluate forecast quality with holdout metrics such as RMSE, MAE, and MAPE
- interpret the demand forecast against supporting tourism indicators such as hotel occupancy, stay length, and room revenue
In the deployed Shiny app, users will be able to choose one country-arrival series at a time, run the forecasting workflow, compare model performance, and review both the holdout forecast and the forward projection within the same page.
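The workflow described in this section can be sketched with the forecast package. The series below is synthetic, and the 12-month holdout and all variable names are assumptions for illustration. The app itself may route the same steps through modeltime rather than calling forecast directly.

```r
library(forecast)

# Synthetic monthly series standing in for one country's arrivals (made up).
set.seed(123)
n <- 96
t_idx <- seq_len(n)
y <- 200 + t_idx + 20 * sin(2 * pi * t_idx / 12) + rnorm(n, sd = 5)
arrivals_ts <- ts(y, start = c(2016, 1), frequency = 12)

# Time-aware split: the last 12 months are held out, never shuffled.
h <- 12
train <- window(arrivals_ts, end = c(2022, 12))
test  <- window(arrivals_ts, start = c(2023, 1))

# Baseline (seasonal naive) and model-based (ETS, ARIMA) holdout forecasts.
forecasts <- list(
  snaive = snaive(train, h = h),
  ets    = forecast(ets(train), h = h),
  arima  = forecast(auto.arima(train), h = h)
)

# Holdout accuracy: keep the "Test set" row of each accuracy table.
metrics <- t(sapply(forecasts, function(fc) {
  accuracy(fc, test)["Test set", c("RMSE", "MAE", "MAPE")]
}))
```

The resulting metrics matrix has one row per method, which maps directly onto a forecast accuracy table. A forward projection would refit the preferred model on the full series and forecast beyond its last observed month.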
8. Data Visualisation Methods
The prototype will include time-series line charts, target-series comparison views, pattern atlases, recovery position maps, cluster profile tables, forecast plots, and forecast accuracy tables. These visualisations are designed not merely to display results, but to reveal seasonal structure, regime shifts, recovery-pattern similarities, and the relative performance of competing forecasting methods.
9. R Packages
| Package | Description |
|---|---|
| tidyverse | Used for data cleaning, transformation, filtering, summarisation, and general data manipulation. |
| readxl | Used to import the original Excel dataset into R. |
| lubridate | Used to process date variables and create time-based fields. |
| ggplot2 | Used to create line charts, clustering plots, and forecast visualisations. |
| plotly | Used to add interactivity to selected visualisations where needed. |
| DT | Used to display interactive data tables in the prototype. |
| cluster | Used for clustering analysis and cluster quality assessment such as silhouette scores. |
| factoextra | Used to visualise clustering results, elbow plots, and cluster profiles. |
| forecast | Used to fit baseline and model-based time-series forecasts such as seasonal naive, ETS, and ARIMA. |
| tidymodels | Used to support time-aware train/test splitting and the forecasting modelling workflow. |
| timetk | Used to support time-series exploration and modeltime-friendly forecasting workflows. |
| modeltime | Used to calibrate, compare, refit, and forecast time-series models in a unified workflow. |
| tsibble | Used to represent time-indexed series for decomposition and temporal analysis. |
| feasts | Used to derive decomposition and seasonal diagnostics before model fitting. |
| patchwork | Used to combine multiple ggplot charts into one display layout. |