Singapore Tourism Recovery Visual Analytics Prototype

Proposal

Author

Group Project Starter

Published

March 29, 2026

1. Overview & Motivation

Singapore’s tourism sector experienced a severe shock during the pandemic and entered an uneven recovery phase in the years that followed. Monthly patterns in visitor arrivals, hotel occupancy, length of stay, and hotel performance indicators all suggest distinct recovery stages. Static charts alone are insufficient for users who need to compare series, detect structural shifts, and assess what may happen next. This project therefore proposes an interactive visual analytics prototype to support exploration, clustering, and forecasting on one shared tourism time-series dataset.

2. Problem Statement

Current discussions on tourism recovery often focus narrowly on whether visitor numbers have returned, without adequately examining seasonality, source-market differences, or the way recovery paths vary across tourism series. What is missing is an interactive time-series framework that lets users compare country-level arrivals, detect recovery regimes, and assess how future demand may evolve.

3. Project Aim

The project aims to develop a visual analytics prototype based on monthly tourism time series, enabling users to understand tourism recovery at descriptive, structural, and forecasting levels. The prototype will integrate time-series exploration, time-series clustering, and forecasting analysis so that users can interpret recovery patterns and estimate how selected demand series may move in the near future.

4. The Data

The project data is drawn from CEIC and Singapore Tourism Board series, so its origin is reliable. The current prototype uses two coordinated workbook layers built from the same tourism-arrivals backbone, together with processed CSV outputs:

visitor_arrivals_full_dataset.xlsx for the shared country-arrivals backbone used in clustering and forecasting
data/processed/arrivals_country_long.csv and data/processed/arrivals_country_wide.csv as reproducible shared country-arrivals outputs for comparative analysis
tourism_update.xlsx for the EDA/CDA views that compare arrivals, China share, hotel occupancy, and stay indicators in one curated sheet

Across these connected inputs, the analytical focus remains monthly visitor arrivals, arrivals from selected countries and regions such as China, hotel room occupancy rate, monthly average length of stay, number of hotels, and total room revenue. The app uses one shared country-arrivals backbone for clustering and forecasting, while the EDA/CDA pages rely on a curated workbook derived from the same tourism context so that descriptive and confirmatory views still speak to the same recovery story.
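
The long and wide country-arrivals files hold the same information in two shapes. The sketch below illustrates that relationship with a small pivot. It is a language-neutral illustration in Python using only the standard library, not the project's R pipeline; the column layout (month, country, arrivals) and all values are hypothetical and are not taken from the project files.

```python
from collections import defaultdict

# Hypothetical long-format rows: (month, country, arrivals).
# Layout and values are illustrative assumptions, not project data.
long_rows = [
    ("2019-01", "China", 100),
    ("2019-01", "India", 40),
    ("2019-02", "China", 120),
    ("2019-02", "India", 35),
]


def long_to_wide(rows):
    """Pivot (month, country, value) rows into {month: {country: value}}."""
    wide = defaultdict(dict)
    for month, country, value in rows:
        wide[month][country] = value
    return dict(wide)


wide = long_to_wide(long_rows)
countries = sorted({country for _, country, _ in long_rows})

# Print one row per month and one column per country.
print("month", *countries)
for month in sorted(wide):
    # A missing month/country pair is shown as None rather than filled in.
    print(month, *(wide[month].get(c) for c in countries))
```

The long shape suits filtering and plotting one series at a time, while the wide shape gives one column per country, which is convenient when trajectories are compared side by side.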

5. Research Questions

The project focuses on the following questions:

How did Singapore’s tourism market differ across the pre-COVID, shock, and recovery periods?
Did the recovery of selected source markets move in tandem with the broader tourism market?
Can country-level arrivals trajectories be grouped into meaningful recovery-pattern clusters?
How well can selected country-level visitor-arrivals series be forecast using a baseline method and a model-based approach?

6. Methodology and Analytical Approach

The project adopts three main analytical modules:

Time-series exploration to reveal trends, seasonality, and unusual shifts across target series.
Time-series clustering to group country-level arrivals trajectories into interpretable recovery-pattern clusters.
Forecasting to compare a baseline method against a model-based approach on selected country-arrivals series and evaluate holdout accuracy, while interpreting those forecasts against hotel and stay indicators.

These modules are designed to use the same monthly dataset so that users can move from description to structure to prediction without changing data context.

Within the clustering module, the app is designed to move beyond a simple membership table. Users first read a dashboard summarising cluster quality and the dominant trajectory patterns, then inspect the pattern atlas, focus-market placement, and final assignments. This ensures that the clustering output remains interpretable as a recovery story rather than as an isolated machine-learning result.
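
One common way to summarise cluster quality is the average silhouette width, which the cluster package listed in Section 9 can report. The sketch below shows how that score is computed, assuming Euclidean distance between trajectories. It is written in standard-library Python purely as an illustration and is not the prototype's R code; the trajectories and the cluster labels "fast" and "slow" are made up for the example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length trajectories."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_silhouette(points, labels):
    """Average silhouette width for a given cluster assignment."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Hypothetical indexed trajectories (pre-shock, trough, latest); not project data.
trajectories = [
    [1.0, 0.2, 0.9],
    [1.0, 0.3, 1.0],
    [1.0, 0.1, 0.3],
    [1.0, 0.2, 0.4],
]
good = mean_silhouette(trajectories, ["fast", "fast", "slow", "slow"])
bad = mean_silhouette(trajectories, ["fast", "slow", "fast", "slow"])
print(f"sensible grouping: {good:.2f}, shuffled grouping: {bad:.2f}")
```

Values close to 1 indicate tight, well-separated clusters, while values near or below 0 suggest that trajectories sit closer to another cluster than to their own.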

7. Forecasting Module

The forecasting module replaces the earlier tree-model direction with a time-series prediction workflow that better fits the tourism dataset. Instead of classifying recovery stages with tree-based models, this module focuses on country-level monthly visitor arrivals and estimates short-term future demand using time-aware forecasting methods.

Following Chapters 19 and 20 of R for Visual Analytics, the module will:

inspect the selected arrivals series through trend, seasonal, and decomposition views
create a time-aware train/test split
compare a baseline seasonal-naive forecast against ETS and ARIMA
evaluate forecast quality with holdout metrics such as RMSE, MAE, and MAPE
interpret the demand forecast against supporting tourism indicators such as hotel occupancy, stay length, and room revenue
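
The split, baseline, and evaluation steps above can be sketched in a few lines. The example below is a minimal illustration in standard-library Python, not the prototype's R implementation, which relies on the packages in Section 9. The function names, the 12-month holdout, and the synthetic series are all assumptions made for the example; ETS and ARIMA are not shown.

```python
import math


def time_split(series, holdout):
    """Chronological split: the last `holdout` points form the test set."""
    if not 0 < holdout < len(series):
        raise ValueError("holdout must be between 1 and len(series) - 1")
    return series[:-holdout], series[-holdout:]


def seasonal_naive(train, horizon, period=12):
    """Repeat the last observed seasonal cycle forward for `horizon` steps."""
    if len(train) < period:
        raise ValueError("need at least one full seasonal cycle")
    last_cycle = train[-period:]
    return [last_cycle[h % period] for h in range(horizon)]


def accuracy(actual, forecast):
    """Holdout metrics: RMSE, MAE, and MAPE (in percent)."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the actual value, so it is undefined if any actual is 0.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape}


# Synthetic monthly series (36 points): a seasonal pattern plus a small yearly step.
# These values are invented for illustration and are not tourism data.
series = [100 + 10 * (m % 12) + 2 * (m // 12) for m in range(36)]

train, test = time_split(series, holdout=12)
baseline = seasonal_naive(train, horizon=len(test), period=12)
metrics = accuracy(test, baseline)
print(metrics)
```

Because the split never shuffles observations, the baseline is scored only on months that come after everything it was fitted on. MAPE should be read with care for series that contain zero or near-zero months, which may occur in country-level arrivals during the shock period.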

In the deployed Shiny app, users will be able to choose one country-arrival series at a time, run the forecasting workflow, compare model performance, and review both the holdout forecast and the forward projection within the same page.

8. Data Visualisation Methods

The prototype will include time-series line charts, target-series comparison views, pattern atlases, recovery position maps, cluster profile tables, forecast plots, and forecast accuracy tables. These visualisations are designed not merely to display results, but to reveal seasonal structure, regime shifts, recovery-pattern similarities, and the relative performance of competing forecasting methods.

9. R Packages

Package	Description
tidyverse	Used for data cleaning, transformation, filtering, summarisation, and general data manipulation.
readxl	Used to import the original Excel dataset into R.
lubridate	Used to process date variables and create time-based fields.
ggplot2	Used to create line charts, clustering plots, and forecast visualisations.
plotly	Used to add interactivity to selected visualisations where needed.
DT	Used to display interactive data tables in the prototype.
cluster	Used for clustering analysis and cluster quality assessment such as silhouette scores.
factoextra	Used to visualise clustering results, elbow plots, and cluster profiles.
forecast	Used to fit baseline and model-based time-series forecasts such as seasonal naive, ETS, and ARIMA.
tidymodels	Used to support time-aware train/test splitting and the forecasting modelling workflow.
timetk	Used to support time-series exploration and modeltime-friendly forecasting workflows.
modeltime	Used to calibrate, compare, refit, and forecast time-series models in a unified workflow.
tsibble	Used to represent time-indexed series for decomposition and temporal analysis.
feasts	Used to derive decomposition and seasonal diagnostics before model fitting.
patchwork	Used to combine multiple ggplot charts into one display layout.

10. Shared UI and Storyboard Direction

The final Shiny application is organised as one integrated workflow with three coordinated tabs:

Time Series Visual Analysis: users choose a tourism series, inspect the recent window, and confirm metadata before switching to deeper analysis.
Time Series Clustering: users select a focused market set, choose the year window and normalisation, run clustering, and then read the result through four pages: Dashboard, Pattern Explorer, Focus Market in Context, and Assignments.
Forecasting: users choose a country-level arrivals series, select the forecast horizon and the model set to compare, and then review forecast, context, seasonality, decomposition, and interpretation views.
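
The normalisation choice in the Time Series Clustering tab matters because source markets differ greatly in size, and unscaled trajectories would cluster mainly by volume. The proposal does not specify which normalisation options the app offers, so the sketch below shows two common candidates as assumptions: z-scoring and indexing to a base period. It uses standard-library Python for illustration only and is not the prototype's R code; the two series are invented.

```python
import statistics


def z_score(series):
    """Centre a series on its mean and scale by its population standard deviation."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    if sd == 0:
        return [0.0] * len(series)  # a constant series carries no shape information
    return [(x - mean) / sd for x in series]


def index_to_base(series, base_index=0, base=100.0):
    """Express each value relative to a chosen base period (base period = 100)."""
    base_value = series[base_index]
    if base_value == 0:
        raise ValueError("base period value must be non-zero")
    return [base * x / base_value for x in series]


# Two hypothetical markets with the same recovery shape but different volumes.
large_market = [1000, 200, 800]
small_market = [10, 2, 8]

print(index_to_base(large_market), index_to_base(small_market))
print(z_score(large_market), z_score(small_market))
```

After either transformation the two markets follow the same trajectory, so a distance-based clustering step would compare recovery shape rather than market size. Indexing to a base period keeps the result readable as a share of the base level, while z-scoring also removes differences in volatility.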

This storyboard direction keeps the interaction path consistent across the project website, the prototype pages, and the deployed Shiny application.