Modeling the seasonal pattern of domain counts: Associations with air travel and economic sentiment
ADAM report 1/2025
Code
library(dplyr)
library(readr)
library(ggplot2)
library(plotly)
library(lubridate)
library(tidyr)
library(knitr)
library(kableExtra)
library(formattable)
library(jsonlite)
library(mgcv)
library(gratia)
library(forecast)
library(forcats)
library(colorspace)
library(countrycode)
library(ggthemes)
library(scales)
library(performance)
library(purrr)
library(GGally)
theme_set(theme_minimal())
1 Introduction
Historically, we have observed a distinctive double-peaked seasonal pattern in domain registrations: monthly domain registrations usually increase the most during March and November, with a notable decrease in between and a slight dip at the end of the year (see Domain report 2024). Furthermore, monthly differences in second-level domain counts reveal a similarly shaped depression between the spring and autumn peaks, showing that the monthly increase in total domain counts slows down (and even turns into a decrease) as summer approaches (see Figure 1), with another, slighter dip in December.1 What processes might lie behind such a regular seasonal pattern?
Code
data_cz <- read_csv("data_hotels_cz.csv")

# One line per year, showing the month-over-month change in domain counts
plot_ly(
  data = data_cz,
  type = "scatter",
  mode = "lines",
  x = ~ month,
  y = ~ diff_domains,
  hoverinfo = "text",
  text = paste0(
    "<b>Date</b>: ", data_cz$month, "-", data_cz$year, "<br>",
    "<b>Domains</b>: ", data_cz$domains, "<br>",
    "<b>Domain diff</b>: ", data_cz$diff_domains
  ),
  transforms = list(
    list(
      type = "groupby",
      groups = data_cz$year
    )
  )
) |>
  layout(
    title = "Monthly differences in total domain counts",
    xaxis = list(title = "Month"),
    yaxis = list(title = "Domain difference")
  )
When we observed this pattern in previous analyses, we assumed that the decreases might be associated with seasonal and cultural factors, such as preferences for outdoor activities during the summer months or Christmas and New Year’s Eve celebrations in December. However, we never empirically tested these assertions.
Alternatively, we could hypothesize that domain holders' economic situation fluctuates throughout the year, influencing their willingness to create or hold domains. In other words, if the overall economic situation of domain holders worsens, they should hold fewer domains; conversely, when the economic situation improves, the change in domain counts should be positive.
Therefore, this report aims to investigate two hypotheses which may provide explanations for the seasonal changes in domain counts:
Seasonally, changes in domain counts should be negative when Czech citizens vacation more frequently and positive when Czech citizens vacation less frequently.
Seasonally, changes in domain counts should be positive when the Czech citizens’ economic situation improves.
2 Data exploration
For this analysis, we used the total counts of second-level domains under .CZ from the ADAM project's database. For the vacation hypothesis, we used Eurostat's data on the count of commercial flights (Eurostat 2024a) and data on hotel overnight stays from the Czech Statistical Office. For the economic hypothesis, we used the Economic Sentiment Indicator (Eurostat 2024b) and inflation (Eurostat 2022).
Vacation data
- Count of flights captures scheduled and non-scheduled commercial air flights (passengers, freight, and mail) performed under Instrument Flight Rules (IFR) reported for the Czech Republic (see further details). Note that freight and mail are more or less stable throughout the year (with the exception of the Christmas period), so the seasonality in the count of flights lies in passenger transport.
- Hotel overnight stays capture the occupancy of collective accommodation establishments by residents of the Czech Republic.
Economic data
- Economic Sentiment Indicator (ESI) is calculated from a selection of questions in the industry, services, retail trade, construction and consumer surveys at country level and at aggregate level (EU and euro area) in order to track overall economic activity.
- Inflation is an economic indicator that measures the change of the prices of consumer goods and services acquired by households over time.
For the count of flights, hotel overnight stays, and domain counts, we computed monthly differences (i.e., the values capture the difference from the previous month). The inflation and economic sentiment variables were kept as they were.
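The differencing step described above can be sketched as follows. This is a minimal illustration on made-up numbers, not the report's actual preprocessing code: the data frame `toy` and its values are hypothetical, and only the `diff_*` column naming follows the variables used in this report.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  date    = seq(as.Date("2024-01-01"), by = "month", length.out = 5),
  domains = c(1000, 1012, 1030, 1025, 1027),
  flights = c(200, 210, 260, 300, 280)
)

# Monthly difference = current value minus the previous month's value.
# The first month has no predecessor, so its difference is NA.
toy_diff <- toy |>
  arrange(date) |>
  mutate(
    diff_domains = domains - lag(domains),
    diff_flights = flights - lag(flights)
  )

toy_diff
```

The first row of each differenced column is `NA`, so one observation per series is lost to differencing.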
In the graphs below (Figure 2 & Figure 3), we can observe that monthly differences in domains, hotel overnight stays, and flights exhibit some form of seasonality. However, this seasonality was clearly disrupted by the COVID-19 pandemic in 2020. The pandemic also had a pronounced impact on the values of the Economic Sentiment Indicator and inflation, neither of which appears to exhibit clear seasonality; this is particularly true of inflation.
Code
plot <- data_cz |>
  select(date,
         diff_domains,
         diff_flights,
         diff_hotels) |>
  rename(Domains = diff_domains,
         Flights = diff_flights,
         Hotels = diff_hotels) |>
  # Reshape to long format: one row per date and series
  gather(type, n, Domains,
         Flights,
         Hotels) |>
  ggplot(aes(date, n, group = 1, colour = type,
             text = paste0(
               date, "<b>\n", "Monthly differences", "</b>: ",
               formatC(n,
                       format = "d",
                       big.mark = " "
               )
             )
  )) +
  geom_line() +
  facet_wrap(~type, scales = "free_y", ncol = 2) +
  theme(legend.position = "none") +
  ylab("Monthly differences") +
  xlab("Date")

ggplotly(plot, tooltip = "text")
Code
plot2 <- data_cz |>
  select(date,
         economy_sentiment,
         inflation) |>
  rename(
    "Economic sentiment" = economy_sentiment,
    "Inflation" = inflation
  ) |>
  # Reshape to long format: one row per date and indicator
  gather(type, n,
         "Economic sentiment",
         Inflation) |>
  ggplot(aes(date, n, group = 1,
             colour = type,
             text =
               paste0(
                 date, "<b>\n", "Index value", "</b>: ",
                 formatC(n,
                         format = "d",
                         big.mark = " "
                 )
               )
  )) +
  geom_line() +
  facet_wrap(~type) +
  theme(legend.position = "none") +
  ylab("Index value") +
  xlab("Date")

ggplotly(plot2, tooltip = "text")
In Figure 4, we can see significant correlations between the monthly domain differences on the one hand and the monthly differences in the number of flights, the raw number of flights, and the Economic Sentiment Indicator values on the other. This suggests a possible relationship between domain count trends and broader economic and travel activity within the Czech Republic.
Code
data_cz |>
  select(diff_domains,
         diff_flights,
         flights,
         diff_hotels,
         economy_sentiment,
         inflation
  ) |>
  rename(
    "Domain diff" = diff_domains,
    "Flight diff" = diff_flights,
    "Flights" = flights,
    "Hotel diff" = diff_hotels,
    "Economic sentiment" = economy_sentiment,
    "Inflation" = inflation
  ) |>
  # Pairwise scatterplots, densities, and correlations
  ggpairs(progress = FALSE) +
  theme(axis.text.x =
          element_text(
            angle = 90,
            hjust = 1,
            size = 8)
  )
However, it remains an open question whether these correlations persist once seasonality is taken into account.
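To illustrate why this question matters, the sketch below simulates two series that share a seasonal cycle but are otherwise unrelated. It is not part of the report's analysis: the data are simulated, and subtracting month-specific means is only one simple way to remove seasonality (the models in this report use GAM terms instead). All object names are hypothetical.

```r
set.seed(1)

# Four simulated years of monthly data; x and y share only a seasonal cycle.
n_years <- 4
month   <- rep(1:12, times = n_years)
season  <- sin(2 * pi * month / 12)
x <- season + rnorm(length(month), sd = 0.3)
y <- season + rnorm(length(month), sd = 0.3)

# Raw correlation is driven by the shared seasonal component.
raw_cor <- cor(x, y)

# Remove seasonality by subtracting each calendar month's mean.
x_res <- x - ave(x, month)
y_res <- y - ave(y, month)

# Correlation of what remains after the seasonal means are removed.
adj_cor <- cor(x_res, y_res)

c(raw = raw_cor, deseasonalized = adj_cor)
```

In a setting like this, a high raw correlation reflects only the shared seasonal cycle, so the deseasonalized correlation is much weaker.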
3 Models
Prior to modeling, we removed observations before July 2020 as the values for the flights and the Economic Sentiment Indicator were hugely influenced by the COVID-19 pandemic before this date. We were not interested in such irregularities in this analysis.
Code
data_cz <- data_cz |>
  filter(date >= "2020-07-01")
Because we observed significant correlations between the monthly domain differences and both commercial flight variations and the Economic Sentiment Indicator in Figure 4, we specified a generalized additive model (GAM, see Wood 2017) to test the proposed hypotheses. In an initial model, we predicted the monthly domain differences by a tensor product term interacting the economic sentiment with monthly time-flow, another tensor product term interacting the monthly differences in flights with monthly time-flow, and a smooth term for a yearly trend.
Code
gam_cz <- gam(
  diff_domains ~
    te(economy_sentiment, month, k = c(20, 12)) +
    te(diff_flights, month, k = c(20, 12)) +
    s(year, k = 4),
  data = data_cz,
  method = "REML"
)
saveRDS(gam_cz, file = "gam_cz.rds")
Code
gam_cz <- readRDS(file = "gam_cz.rds")
However, because (a) the model estimated an insignificant interaction between monthly time-flow and the Economic Sentiment Indicator, and (b) a follow-up model, in which we dropped the interaction term in favor of a simple smooth term for the Economic Sentiment Indicator, proved more interesting, we report the initial model in the Appendix (Section 5) and focus on the follow-up model first (Section 3.1).
Note that we did not use the monthly difference in hotel overnight stays as a predictor in the models reported below because it is positively correlated with the monthly difference in flights. Including it alongside the flight variable would cause concurvity issues.
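Concurvity, the GAM analogue of collinearity, can be inspected with `mgcv::concurvity()`. The sketch below uses simulated data rather than the report's variables: `x2` is deliberately constructed as a near-copy of `x1` to mimic two closely related predictors, and all object names are hypothetical.

```r
library(mgcv)

set.seed(42)
n  <- 200
x1 <- runif(n)
x2 <- x1 + rnorm(n, sd = 0.05)  # almost a function of x1 (assumed for illustration)
x3 <- runif(n)                  # unrelated predictor
y  <- sin(2 * pi * x1) + x3 + rnorm(n, sd = 0.3)

fit <- gam(y ~ s(x1) + s(x2) + s(x3), method = "REML")

# Rows: "worst", "observed", "estimate"; columns: one per model term.
# Values near 1 mean a term can be closely approximated by the other terms.
conc <- concurvity(fit, full = TRUE)
round(conc, 2)
```

Values close to 1 for `s(x1)` and `s(x2)` would indicate that their effects cannot be reliably separated, which is the kind of problem that dropping one of two correlated predictors avoids.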
3.1 Model 1
In model 1, we predicted the monthly differences in domain counts by a thin-plate smooth term for the Economic Sentiment Indicator, by a tensor product interaction between the monthly time-flow and the monthly difference in flights, and by a thin-plate smooth term for a yearly trend.
Code
gam_cz_1 <- gam(
  diff_domains ~
    s(economy_sentiment) +
    te(diff_flights, month, k = 12) +
    s(year, k = 4),
  data = data_cz,
  method = "REML"
)
saveRDS(gam_cz_1, file = "gam_cz_1.rds")
Code
gam_cz_1 <- readRDS(file = "gam_cz_1.rds")