Singapore Tourism Recovery Visual Analytics EDA
pacman::p_load(tidyverse, lubridate, plotly, patchwork, readxl, cellranger, knitr, scales)
Background
Singapore’s inbound tourism landscape is characterised by a concentrated set of major source markets and a geographical distribution that evolves over time. Analysis of visitor arrivals from 2017 to 2025 indicates that Singapore’s tourism demand is anchored primarily in the Asia-Pacific region, while the relative importance of individual source countries shifts across different years. Such variation is visible not only in country rankings and market shares, but also in the wider spatial pattern of visitor origins. These dynamics make it insufficient to rely on static summary figures alone. A more effective approach is to integrate geospatial and comparative visual analysis in order to examine both the global distribution of visitor origins and the changing prominence of leading markets. Accordingly, this project proposes an interactive visual analytics prototype to support a clearer understanding of where Singapore’s visitors come from and how the structure of its source markets changes over time.
Preparation
Package Installation & Loading
- tidyverse: a family of modern R packages designed to support data science, analysis, and communication tasks, including the creation of static statistical graphs.
- plotly: an R library for plotting interactive statistical graphs.
- patchwork: combines multiple ggplot2 graphs into one figure.
- readxl: reads Excel workbooks into tidy analysis workflows.
- lubridate: provides intuitive functions for parsing, manipulating, and performing arithmetic on dates, times, and time zones.
- cellranger: translates spreadsheet cell ranges to rows and columns.
- knitr: provides a general-purpose tool for dynamic report generation in R using literate programming techniques.
- scales: maps data to aesthetics through graphical scales, and provides methods for automatically determining breaks and labels for axes and legends.
Data Exploration & Cleaning
path <- "../data/raw/tourism_update.xlsx"
# Read data starting from the actual observation rows
tour <- read_excel(
path,
sheet = "My Series",
skip = 29,
col_names = FALSE,
.name_repair = "minimal"
)
# Assign proper column names manually
names(tour) <- c(
"Date",
"Visitor Arrivals",
"Tourism Receipts: YTD: YoY",
"Tourist Expenditure Per Capita",
"Visitor Arrivals: China",
"Average Length of Stay",
"Visitor Arrivals: Malaysia",
"Visitor Arrivals: India",
"Visitor Arrivals: Indonesia",
"Visitor Arrivals: Australia",
"Number of Hotels",
"Visitor Days",
"Hotel Room Occupancy Rate",
"Visitor Arrivals: West Asia",
"Visitor Arrivals: Taiwan",
"Total Room Revenue",
"Number of Hotels.1",
"Average Length of Stay.1",
"Visitor Arrivals: Hong Kong SAR (China)",
"Hotel Revenue: per Available Room: Luxury",
"Hotel Revenue: per Available Room: Mid-Tier",
"Visitor Arrivals: ASEAN",
"Visitor Arrivals: Italy",
"Visitor Arrivals: Russian Federal (CIS)",
"Visitor Arrivals: France",
"Visitor Arrivals: Philippines",
"Visitor Arrivals: Spain",
"No of Hotel Room Stock",
"Visitor Arrivals: Thailand",
"Visitor Arrivals: Ireland",
"Visitor Arrivals: United Arab Emirates",
"Visitor Arrivals: United Kingdom",
"Visitor Arrivals: Africa",
"Visitor Arrivals: Bangladesh",
"Visitor Arrivals: Iran",
"Visitor Arrivals: New Zealand",
"Visitor Arrivals: Israel",
"Visitor Arrivals: North Asia",
"Visitor Arrivals: 8-10 Days",
"Visitor Arrivals: 11-14 Days",
"Visitor Arrivals: 15 Days & Over",
"Visitor Arrivals: Americas",
"Visitor Arrivals: Germany",
"Visitor Arrivals: Scandinavia: Sweden",
"Visitor Arrivals: Switzerland",
"Visitor Arrivals: USA",
"Visitor Arrivals: Canada",
"Visitor Arrivals: Mauritius",
"Visitor Arrivals: Kuwait",
"Visitor Arrivals: Egypt",
"Visitor Arrivals: Brunei",
"Visitor Arrivals: Finland",
"Visitor Arrivals: Japan",
"Visitor Arrivals: South Korea",
"Visitor Arrivals: Myanmar",
"Visitor Arrivals: Netherlands",
"Visitor Arrivals: Scandinavia: Norway",
"Visitor Arrivals: Saudi Arabia",
"Visitor Arrivals: Sri Lanka",
"Visitor Arrivals: Vietnam",
"Visitor Arrivals: Pakistan",
"Visitor Arrivals: Republic of South Africa"
)
# Convert date column
tour <- tour %>%
mutate(Date = as.Date(Date))
# Keep only 2017 to 2025
tour_clean <- tour %>%
filter(year(Date) >= 2017 & year(Date) <= 2025)
names(tour)
[1] "Date"
[2] "Visitor Arrivals"
[3] "Tourism Receipts: YTD: YoY"
[4] "Tourist Expenditure Per Capita"
[5] "Visitor Arrivals: China"
[6] "Average Length of Stay"
[7] "Visitor Arrivals: Malaysia"
[8] "Visitor Arrivals: India"
[9] "Visitor Arrivals: Indonesia"
[10] "Visitor Arrivals: Australia"
[11] "Number of Hotels"
[12] "Visitor Days"
[13] "Hotel Room Occupancy Rate"
[14] "Visitor Arrivals: West Asia"
[15] "Visitor Arrivals: Taiwan"
[16] "Total Room Revenue"
[17] "Number of Hotels.1"
[18] "Average Length of Stay.1"
[19] "Visitor Arrivals: Hong Kong SAR (China)"
[20] "Hotel Revenue: per Available Room: Luxury"
[21] "Hotel Revenue: per Available Room: Mid-Tier"
[22] "Visitor Arrivals: ASEAN"
[23] "Visitor Arrivals: Italy"
[24] "Visitor Arrivals: Russian Federal (CIS)"
[25] "Visitor Arrivals: France"
[26] "Visitor Arrivals: Philippines"
[27] "Visitor Arrivals: Spain"
[28] "No of Hotel Room Stock"
[29] "Visitor Arrivals: Thailand"
[30] "Visitor Arrivals: Ireland"
[31] "Visitor Arrivals: United Arab Emirates"
[32] "Visitor Arrivals: United Kingdom"
[33] "Visitor Arrivals: Africa"
[34] "Visitor Arrivals: Bangladesh"
[35] "Visitor Arrivals: Iran"
[36] "Visitor Arrivals: New Zealand"
[37] "Visitor Arrivals: Israel"
[38] "Visitor Arrivals: North Asia"
[39] "Visitor Arrivals: 8-10 Days"
[40] "Visitor Arrivals: 11-14 Days"
[41] "Visitor Arrivals: 15 Days & Over"
[42] "Visitor Arrivals: Americas"
[43] "Visitor Arrivals: Germany"
[44] "Visitor Arrivals: Scandinavia: Sweden"
[45] "Visitor Arrivals: Switzerland"
[46] "Visitor Arrivals: USA"
[47] "Visitor Arrivals: Canada"
[48] "Visitor Arrivals: Mauritius"
[49] "Visitor Arrivals: Kuwait"
[50] "Visitor Arrivals: Egypt"
[51] "Visitor Arrivals: Brunei"
[52] "Visitor Arrivals: Finland"
[53] "Visitor Arrivals: Japan"
[54] "Visitor Arrivals: South Korea"
[55] "Visitor Arrivals: Myanmar"
[56] "Visitor Arrivals: Netherlands"
[57] "Visitor Arrivals: Scandinavia: Norway"
[58] "Visitor Arrivals: Saudi Arabia"
[59] "Visitor Arrivals: Sri Lanka"
[60] "Visitor Arrivals: Vietnam"
[61] "Visitor Arrivals: Pakistan"
[62] "Visitor Arrivals: Republic of South Africa"
head(tour$Date)
[1] "2015-12-01" "2016-12-01" "2017-01-01" "2017-02-01" "2017-03-01"
[6] "2017-04-01"
class(tour$Date)
[1] "Date"
unique(year(tour_clean$Date))
[1] 2017 2018 2019 2020 2021 2022 2023 2024 2025
Note on data preparation:
- The years 2015, 2016, and 2026 were removed because they do not contain a full year of data from January to December.
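The completeness rule behind this filter can be checked directly instead of by eye. The following is a minimal sketch on a toy date series, not the project data; `toy` and `months_per_year` are hypothetical names, and the same grouping could be applied to `tour$Date`.

```r
library(dplyr)
library(lubridate)

# Toy monthly series (hypothetical, not the project data):
# a single December 2016 row followed by all twelve months of 2017.
toy <- tibble(
  Date = c(
    as.Date("2016-12-01"),
    seq(as.Date("2017-01-01"), by = "month", length.out = 12)
  )
)

# Count the distinct calendar months observed in each year and
# flag the years that cover January to December.
months_per_year <- toy %>%
  mutate(Year = year(Date)) %>%
  group_by(Year) %>%
  summarise(Months = n_distinct(month(Date)), .groups = "drop") %>%
  mutate(Complete = Months == 12)

months_per_year
```

Years with `Complete == FALSE` are the ones a filter like the one above would drop.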
country_cols <- c(
"Visitor Arrivals: China",
"Visitor Arrivals: Malaysia",
"Visitor Arrivals: India",
"Visitor Arrivals: Indonesia",
"Visitor Arrivals: Australia",
"Visitor Arrivals: Taiwan",
"Visitor Arrivals: Hong Kong SAR (China)",
"Visitor Arrivals: Italy",
"Visitor Arrivals: Russian Federal (CIS)",
"Visitor Arrivals: France",
"Visitor Arrivals: Philippines",
"Visitor Arrivals: Spain",
"Visitor Arrivals: Thailand",
"Visitor Arrivals: Ireland",
"Visitor Arrivals: United Arab Emirates",
"Visitor Arrivals: United Kingdom",
"Visitor Arrivals: Bangladesh",
"Visitor Arrivals: Iran",
"Visitor Arrivals: New Zealand",
"Visitor Arrivals: Israel",
"Visitor Arrivals: Germany",
"Visitor Arrivals: Scandinavia: Sweden",
"Visitor Arrivals: Switzerland",
"Visitor Arrivals: USA",
"Visitor Arrivals: Canada",
"Visitor Arrivals: Mauritius",
"Visitor Arrivals: Kuwait",
"Visitor Arrivals: Egypt",
"Visitor Arrivals: Brunei",
"Visitor Arrivals: Finland",
"Visitor Arrivals: Japan",
"Visitor Arrivals: South Korea",
"Visitor Arrivals: Myanmar",
"Visitor Arrivals: Netherlands",
"Visitor Arrivals: Scandinavia: Norway",
"Visitor Arrivals: Saudi Arabia",
"Visitor Arrivals: Sri Lanka",
"Visitor Arrivals: Vietnam",
"Visitor Arrivals: Pakistan",
"Visitor Arrivals: Republic of South Africa"
)
tour_long <- tour_clean %>%
select(Date, all_of(country_cols)) %>%
pivot_longer(
cols = -Date,
names_to = "Country",
values_to = "Arrivals"
) %>%
mutate(
Country = str_remove(Country, "^Visitor Arrivals: ")
)
The dataset was reshaped into long format so that each row represents one country’s visitor arrivals for one specific month. The resulting structure contains three main variables: Date, Country, and Arrivals. This format is more suitable for time-series visualization and comparative analysis in R, as it allows countries to be treated as categories within a single variable rather than as separate columns. As a result, it supports cleaner code, easier grouping and filtering, and more effective use of ggplot2 for visualizing changes in visitor arrivals over time.
Three date windows are used throughout the EDA: the start of 2017 to the end of 2019 is treated as the pre_covid period, the start of 2020 to the end of 2021 as the covid_shock period, and the start of 2022 to the end of 2025 as the recovery period.
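One way to make these windows explicit in code is to tag each date with its period label. This is a minimal sketch, assuming the windows are defined on calendar years as in the filters used later; `assign_period` and `example_dates` are hypothetical names that do not appear in the original analysis.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: map a date to one of the three analysis windows.
# Dates outside 2017-2025 get NA.
assign_period <- function(date) {
  yr <- year(date)
  case_when(
    yr >= 2017 & yr <= 2019 ~ "pre_covid",
    yr >= 2020 & yr <= 2021 ~ "covid_shock",
    yr >= 2022 & yr <= 2025 ~ "recovery",
    TRUE ~ NA_character_
  )
}

# Illustrative dates only.
example_dates <- as.Date(c("2016-12-01", "2018-06-01", "2020-04-01", "2023-09-01"))
periods <- assign_period(example_dates)
periods
```

Applied inside `mutate()`, for example `mutate(Period = assign_period(Date))`, the label could replace repeated `year(Date)` range filters with a single `filter(Period == "pre_covid")`.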
EDA 1 - Singapore Visitors
geo_year <- tour_long %>%
mutate(Year = year(Date)) %>%
group_by(Year, Country) %>%
summarise(
Arrivals = sum(Arrivals, na.rm = TRUE),
.groups = "drop"
)
geo_year <- geo_year %>%
mutate(
Country_map = recode(
Country,
"USA" = "United States",
"Russian Federal (CIS)" = "Russia",
"Hong Kong SAR (China)" = "Hong Kong",
"Scandinavia: Sweden" = "Sweden",
"Scandinavia: Norway" = "Norway",
"Republic of South Africa" = "South Africa"
)
)
plot_ly(
data = geo_year,
type = "choropleth",
locations = ~Country_map,
locationmode = "country names",
z = ~Arrivals,
frame = ~Year,
text = ~paste(
"Country:", Country,
"<br>Year:", Year,
"<br>Arrivals:", format(Arrivals, big.mark = ",")
),
hoverinfo = "text",
colorscale = "Reds"
) %>%
layout(
title = "Visitor Arrivals to Singapore by Source Country (2017-2025)",
geo = list(showframe = FALSE, showcoastlines = TRUE)
)
The geospatial results indicate that Singapore’s visitor arrivals are heavily concentrated in Asia, with nearby regional markets contributing the largest share of tourism inflows. This highlights the continued importance of geographical proximity, regional connectivity, and established travel links in shaping visitor demand. The map also shows that arrivals are not evenly distributed across the world, but are instead clustered around a limited number of key source markets.
Over time, the spatial pattern changes noticeably, especially around the pandemic period. Visitor distribution weakens during the years of disruption and gradually strengthens again in the later years, although the pace of recovery appears uneven across countries. This suggests that while Singapore’s tourism sector has recovered, the composition of its visitor markets may not have returned uniformly to its earlier pattern. Overall, the findings show that Singapore’s tourism demand remains regionally concentrated, but also sensitive to major external disruptions.
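The uneven pace of recovery can be made measurable by indexing each country’s yearly arrivals to its own 2019 level. The sketch below uses hypothetical yearly totals for two placeholder markets, not figures from the dataset; `toy_year` and `Index_2019` are illustrative names. The choice of 2019 as the base year is an assumption.

```r
library(dplyr)

# Hypothetical yearly totals for two placeholder markets (not real figures).
toy_year <- tibble(
  Year     = rep(c(2019, 2021, 2023), times = 2),
  Country  = rep(c("Market A", "Market B"), each = 3),
  Arrivals = c(1000, 50, 900, 2000, 40, 1000)
)

# Express each year's arrivals as a percentage of the same country's 2019 total.
recovery_index <- toy_year %>%
  group_by(Country) %>%
  mutate(Index_2019 = Arrivals / Arrivals[Year == 2019] * 100) %>%
  ungroup()

recovery_index
```

Because `toy_year` has the same Year, Country, and Arrivals columns as `geo_year`, the same pipeline should apply to it, and the index could be mapped in place of raw arrivals to compare recovery across markets of very different sizes.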
EDA 2 - Top Visitor Countries Comparison Pre and Post Covid
p1_data <- tour_long %>%
filter(year(Date) >= 2017 & year(Date) <= 2019)
p2_data <- tour_long %>%
filter(year(Date) >= 2022 & year(Date) <= 2025)
p1_total_market <- tour_clean %>%
filter(year(Date) >= 2017 & year(Date) <= 2019) %>%
summarise(
Total_Market_Avg = round(mean(`Visitor Arrivals`, na.rm = TRUE), 0)
) %>%
pull(Total_Market_Avg)
# Country ranking in P1
p1_rank <- p1_data %>%
group_by(Country) %>%
summarise(
Avg_Monthly_Arrivals = round(mean(Arrivals, na.rm = TRUE), 0),
.groups = "drop"
) %>%
mutate(
Overall_Share_Percent = (Avg_Monthly_Arrivals / p1_total_market) * 100
) %>%
arrange(desc(Avg_Monthly_Arrivals)) %>%
mutate(Rank = row_number()) %>%
slice_head(n = 5)
p1_rank
# A tibble: 5 × 4
  Country   Avg_Monthly_Arrivals Overall_Share_Percent  Rank
  <chr>                    <dbl>                 <dbl> <int>
1 China                   285357                 18.7      1
2 Indonesia               252402                 16.5      2
3 India                   114787                  7.51     3
4 Malaysia                101198                  6.62     4
5 Australia                92571                  6.05     5
plot_ly(
data = p1_rank,
x = ~Avg_Monthly_Arrivals,
y = ~reorder(Country, Avg_Monthly_Arrivals),
type = "bar",
orientation = "h",
text = ~paste0(round(Overall_Share_Percent, 1), "%"),
textposition = "outside",
hovertemplate = "Average Monthly Arrivals: %{x:,}<extra></extra>"
) %>%
layout(
title = "P1 (2017-2019): Top 5 Visitor Countries",
xaxis = list(title = "Average Monthly Arrivals"),
yaxis = list(title = "Country")
)
During the pre-COVID period (2017-2019), Singapore’s inbound tourism market was led by China and Indonesia, which recorded average monthly arrivals of 285,357 and 252,402 respectively. Their corresponding overall market shares, 18.66% and 16.51%, show that these two countries formed the dominant core of Singapore’s visitor market before the pandemic. India, Malaysia, and Australia followed at a noticeably lower level, with average monthly arrivals of 114,787, 101,198, and 92,571, accounting for 7.51%, 6.62%, and 6.05% of Singapore’s total inbound visitor arrivals respectively.
These results indicate that Singapore’s pre-pandemic visitor structure was concentrated in a relatively small number of major source markets, with China and Indonesia clearly occupying the most prominent positions. The gap between the top two countries and the remaining three suggests that the market was not evenly distributed even among the leading visitor sources. Instead, a substantial share of total visitor demand was anchored in a narrow group of dominant markets. This pattern points to a relatively concentrated tourism structure prior to COVID-19.
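The degree of concentration can be summarised with cumulative shares. This minimal sketch re-enters the rounded percentages printed in the `p1_rank` output rather than recomputing them from the data; `p1_shares` and `Cumulative_Share` are hypothetical names.

```r
library(dplyr)

# Shares as printed in the p1_rank output (percent of total arrivals, rounded).
p1_shares <- tibble(
  Country = c("China", "Indonesia", "India", "Malaysia", "Australia"),
  Overall_Share_Percent = c(18.7, 16.5, 7.51, 6.62, 6.05)
)

# Running total of market share down the ranking.
p1_concentration <- p1_shares %>%
  mutate(Cumulative_Share = cumsum(Overall_Share_Percent))

p1_concentration
```

On these rounded figures, the top two markets account for roughly 35% of arrivals and the top five for roughly 55%, which is the concentration the paragraph above describes.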
p2_total_market <- tour_clean %>%
filter(year(Date) >= 2022 & year(Date) <= 2025) %>%
summarise(
Total_Market_Avg = round(mean(`Visitor Arrivals`, na.rm = TRUE), 0)
) %>%
pull(Total_Market_Avg)
# Country ranking in P2
p2_rank <- p2_data %>%
group_by(Country) %>%
summarise(
Avg_Monthly_Arrivals = round(mean(Arrivals, na.rm = TRUE), 0),
.groups = "drop"
) %>%
mutate(
Overall_Share_Percent = (Avg_Monthly_Arrivals / p2_total_market) * 100
) %>%
arrange(desc(Avg_Monthly_Arrivals)) %>%
mutate(Rank = row_number()) %>%
slice_head(n = 5)
p2_rank
# A tibble: 5 × 4
  Country   Avg_Monthly_Arrivals Overall_Share_Percent  Rank
  <chr>                    <dbl>                 <dbl> <int>
1 Indonesia               173636                 15.6      1
2 China                   159896                 14.4      2
3 India                    86675                  7.80     3
4 Malaysia                 86258                  7.76     4
5 Australia                85258                  7.67     5
plot_ly(
data = p2_rank,
x = ~Avg_Monthly_Arrivals,
y = ~reorder(Country, Avg_Monthly_Arrivals),
type = "bar",
orientation = "h",
text = ~paste0(round(Overall_Share_Percent, 1), "%"),
textposition = "outside",
hovertemplate = "Average Monthly Arrivals: %{x:,}<extra></extra>"
) %>%
layout(
title = "P2 (2022-2025): Top 5 Visitor Countries",
xaxis = list(title = "Average Monthly Arrivals"),
yaxis = list(title = "Country")
)
During the post-COVID recovery period (2022-2025), Singapore’s visitor market recovered with a structure that remained broadly similar to the pre-pandemic period, but with some notable changes in relative ranking and market share. Indonesia ranked first during this period with an average monthly arrival figure of 173,636, accounting for 15.62% of Singapore’s total inbound visitor arrivals. China followed in second place with 159,896 average monthly arrivals and an overall market share of 14.39%. This indicates that although both countries remained central to Singapore’s tourism recovery, Indonesia overtook China in the post-pandemic period and emerged as the largest individual source market.
India remained in third place with an average monthly arrival level of 86,675, representing 7.80% of Singapore’s overall inbound market. Malaysia and Australia followed closely with 86,258 (7.76%) and 85,258 (7.67%) respectively. Compared with the pre-COVID period, the composition of the top five visitor source countries remained unchanged, suggesting that Singapore’s core visitor base demonstrated a relatively high degree of structural stability. However, the reduced arrival levels across all five countries indicate that recovery remained incomplete in absolute terms, even though the same major source markets continued to dominate the visitor mix.
A clearer view of recovery emerges when the P2 results are compared directly with the pre-COVID baseline. China’s average monthly arrivals declined from 285,357 in P1 to 159,896 in P2, while Indonesia fell from 252,402 to 173,636. Similar reductions are also observed for India, Malaysia, and Australia. These changes suggest that by 2022-2025, Singapore’s tourism sector had largely regained its core market structure, but not yet its earlier scale. In other words, recovery appears to have been stronger in terms of preserving the composition of key source markets than in terms of fully restoring visitor volume.
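The P1-to-P2 comparison can be expressed as a recovery ratio per country. This minimal sketch re-enters the average monthly arrivals from the `p1_rank` and `p2_rank` outputs; `p1_vs_p2`, `Recovery_Ratio`, and `Change_Percent` are hypothetical names.

```r
library(dplyr)

# Average monthly arrivals copied from the p1_rank and p2_rank outputs.
p1_vs_p2 <- tibble(
  Country = c("China", "Indonesia", "India", "Malaysia", "Australia"),
  P1_Avg  = c(285357, 252402, 114787, 101198, 92571),
  P2_Avg  = c(159896, 173636, 86675, 86258, 85258)
)

# Recovery ratio: P2 level as a fraction of the P1 level.
# Change_Percent: relative change from P1 to P2, in percent.
recovery <- p1_vs_p2 %>%
  mutate(
    Recovery_Ratio = P2_Avg / P1_Avg,
    Change_Percent = (P2_Avg - P1_Avg) / P1_Avg * 100
  ) %>%
  arrange(Recovery_Ratio)

recovery
```

On these figures, China’s P2 average is roughly 56% of its P1 level, while Australia’s is roughly 92%, so the gap to the pre-COVID baseline differs widely across the five markets. The P2 window includes 2022, so the ratio describes the whole 2022-2025 window rather than the most recent year.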
The ranking reversal between China and Indonesia is especially important. It suggests that the recovery process was uneven across major source markets. Indonesia’s stronger position in the post-COVID period may reflect a faster rebound in short-haul regional travel, whereas China’s drop from first to second place points to a relatively slower return compared with its pre-pandemic dominance. At the same time, the continued presence of India, Malaysia, and Australia in the top five reinforces the view that Singapore’s tourism recovery remained firmly anchored in established Asia-Pacific markets.
Conclusion
Overall, the analysis shows that Singapore’s visitor market remained strongly concentrated in a relatively small number of major source countries across both the pre-COVID and post-COVID periods. In the pre-pandemic years, China led Singapore’s inbound tourism market with Indonesia close behind, followed by a smaller second tier of India, Malaysia, and Australia. In the post-COVID period, Indonesia overtook China to become the leading source market, while India, Malaysia, and Australia made up the rest of the top five. This indicates that Singapore’s tourism recovery preserved the same core set of dominant source markets, even though their internal ordering changed and average monthly arrivals for each of the five remained below pre-pandemic levels.
At the same time, the results consistently highlight Singapore’s strong dependence on Asia-Pacific markets. The leading visitor source countries in both periods were overwhelmingly drawn from Asia and the broader regional travel network, indicating that proximity, connectivity, and established travel demand remain central to Singapore’s tourism performance. Taken together, these findings suggest that while Singapore’s tourism market has shown resilience in maintaining a concentrated regional core, the recovery process has not fully restored the scale of the pre-pandemic visitor structure, even though the top-five composition remains stable.