Modeling the seasonal pattern of domain counts: Associations with air travel and economic sentiment

ADAM report 1/2025

Author

Maria Quiros Segovia & Dan Řezníček

Published

January 20, 2025

Code
library(dplyr)
library(readr)
library(ggplot2)
library(plotly)
library(lubridate)
library(tidyr)
library(knitr)
library(kableExtra)
library(formattable)
library(jsonlite)
library(mgcv)
library(gratia)
library(forecast)
library(forcats)
library(colorspace)
library(countrycode)
library(ggthemes)
library(scales)
library(performance)
library(purrr)
library(GGally)
theme_set(theme_minimal())

1 Introduction

Historically, we have observed a distinctive double-peaked seasonal pattern in domain registrations: monthly domain registrations usually increase the most during March and November, with a notable decrease in between and a slight dip at the end of the year (see Domain report 2024). Furthermore, monthly differences in second-level domain counts reveal a similarly shaped depression between the spring and autumn peaks, showing that the monthly increase in total domain counts slows down (and even turns into a decrease) as summer approaches (see Figure 1), with another, slighter dip in December.1 What processes might lie behind such a regular seasonal pattern?

Code
data_cz <- read_csv("data_hotels_cz.csv")

plot_ly(
  data = data_cz,
  type = "scatter",
  mode = "lines",
  x    = ~ month,
  y    = ~ diff_domains,
  hoverinfo = "text",
  text       = paste0(
    "<b>Date</b>: ", data_cz$month, "-", data_cz$year, "<br>",
    "<b>Domains</b>: ", data_cz$domains, "<br>",
    "<b>Domain diff</b>: ", data_cz$diff_domains
    ),
  transforms =
    list(
      list(
        type = "groupby",
        groups = data_cz$year
        )
      )
  ) |>
  layout(
    title = "Monthly differences in total domain counts",
    xaxis = list(title = "Month"),
    yaxis = list(title = "Domain difference")
    )
Figure 1: Seasonality of monthly differences in total domain counts.

When we observed this pattern in previous analyses, we assumed that the decreases might be associated with seasonal and cultural factors, such as preferences for outdoor activities during the summer months or Christmas and New Year’s Eve celebrations in December. However, we never empirically tested these assertions.

Alternatively, we could hypothesize that domain holders’ economic situation fluctuates throughout the year, influencing their willingness to create or hold domains. In other words, if the overall economic situation of domain holders worsens, they should hold fewer domains; conversely, when the economic situation improves, the change in domain counts should be positive.

Therefore, this report aims to investigate two hypotheses which may provide explanations for the seasonal changes in domain counts:

  1. Seasonally, changes in domain counts should be negative when Czech citizens vacation more frequently and positive when Czech citizens vacation less frequently.

  2. Seasonally, changes in domain counts should be positive when the Czech citizens’ economic situation improves.

2 Data exploration

For this analysis, we used data on total counts of second-level domains under .CZ from the ADAM project’s database. For the vacation hypothesis, we used Eurostat’s data on the count of commercial flights (Eurostat 2024a) and data on hotel overnight stays from the Czech Statistical Office. For the economic hypothesis, we used the Economic Sentiment Indicator (Eurostat 2024b) and inflation (Eurostat 2022).

Definitions

Vacation data

  • Count of flights captures scheduled and non-scheduled commercial air flights (passengers, freight, and mail) performed under Instrument Flight Rules (IFR) reported for the Czech Republic (see further details). Note that freight and mail are more or less stable throughout the year (with the exception of the Christmas period), so the seasonality in the count of flights lies in passenger transport.
  • Hotel overnight stays capture the occupancy of collective accommodation establishments by residents of the Czech Republic.

Economic data

  • Economic Sentiment Indicator (ESI) is calculated from a selection of questions in the industry, services, retail trade, construction and consumer surveys at country level and at aggregate level (EU and euro area) in order to track overall economic activity.
  • Inflation is an economic indicator that measures the change of the prices of consumer goods and services acquired by households over time.

For the count of flights, hotel overnight stays, and domain counts, we computed monthly differences (i.e., the values capture the difference from the previous month). The inflation and economic sentiment variables were kept as they were.
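The differencing step can be sketched as follows. This is a minimal illustration on a made-up data frame: the column names `domains` and `diff_domains` mirror those used in this report, but the object `toy` and its values are hypothetical and do not come from the ADAM database.

```r
library(dplyr)

# Hypothetical toy data; the values are illustrative, not real domain counts.
toy <- data.frame(
  date    = seq(as.Date("2024-01-01"), by = "month", length.out = 4),
  domains = c(1000, 1030, 1025, 1060)
)

# Monthly difference: the current value minus the previous month's value.
# The first month has no predecessor, so its difference is NA.
toy <- toy |>
  arrange(date) |>
  mutate(diff_domains = domains - lag(domains))

toy$diff_domains
```

A positive value means that the count grew relative to the previous month, and a negative value means that it shrank.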

In the graphs below (Figure 2 & Figure 3), we can observe that monthly differences in domains, hotel overnight stays, and flights exhibit some form of seasonality. However, this seasonality was visibly disrupted in 2020 by the COVID-19 pandemic. The pandemic also had a pronounced impact on the values of the Economic Sentiment Indicator and inflation, neither of which appears to exhibit clear seasonality, inflation in particular.

Code
plot <- data_cz |>
  select(date,
         diff_domains,
         diff_flights,
         diff_hotels) |>
  rename(Domains = diff_domains,
         Flights = diff_flights,
         Hotels  = diff_hotels) |>
  gather(type, n, Domains,
         Flights,
         Hotels) |>
  ggplot(aes(date, n, group = 1, colour = type,
             text = paste0(
               date,
               "<b>\n", "Monthly differences", "</b>: ",
               formatC(n,
                       format = "d",
                       big.mark = " "
                       )
               )
             )
         ) +
  geom_line() +
  facet_wrap(~type, scales = "free_y", ncol = 2) +
  theme(legend.position = "none") +
  ylab("Monthly differences") +
  xlab("Date")

ggplotly(plot, tooltip = "text")
Figure 2: Data seasonality—Vacation patterns data.
Code
plot2 <- data_cz |>
  select(date,
         economy_sentiment,
         inflation) |>
  rename(
    "Economic sentiment" = economy_sentiment,
    "Inflation"          = inflation
  ) |>
  gather(type, n,
         "Economic sentiment",
         Inflation) |>
  ggplot(aes(date, n, group = 1,
             colour = type,
             text =
               paste0(
                 date,
                 "<b>\n", "Index value", "</b>: ",
                 formatC(n,
                         format = "d",
                         big.mark = " "
                         )
                 )
             )
         ) +
  geom_line() +
  facet_wrap(~type) +
  theme(legend.position = "none") +
  ylab("Index value") +
  xlab("Date")

ggplotly(plot2, tooltip = "text")
Figure 3: Data seasonality—Economic indicators data.

In Figure 4, we can see significant correlations between the monthly domain differences on the one hand and three variables on the other: the monthly differences in the number of flights, the raw number of flights, and the Economic Sentiment Indicator. This suggests a possible relationship between domain count trends and broader economic and travel activity within the Czech Republic.

Code
data_cz |>
  select(diff_domains,
         diff_flights,
         flights,
         diff_hotels,
         economy_sentiment,
         inflation
         ) |>
  rename(
    "Domain diff"        = diff_domains,
    "Flight diff"        = diff_flights,
    "Flights"            = flights,
    "Hotel diff"         = diff_hotels,
    "Economic sentiment" = economy_sentiment,
    "Inflation"          = inflation
  ) |>
  ggpairs(progress = FALSE) +
  theme(axis.text.x =
          element_text(
            angle = 90,
            hjust = 1,
            size  = 8)
        )

Figure 4: Correlations.

However, it remains an open question whether these correlations persist once seasonality is taken into account.

3 Models

Prior to modeling, we removed observations before July 2020 because the values for flights and the Economic Sentiment Indicator were strongly affected by the COVID-19 pandemic before this date. Such irregularities were outside the scope of this analysis.

Code
data_cz <- data_cz |>
  filter(date >= "2020-07-01")

Because we observed significant correlations between the monthly domain differences and both commercial flight variations and the Economic Sentiment Indicator in Figure 4, we specified a generalized additive model (GAM, see Wood 2017) to test the proposed hypotheses. In an initial model, we predicted the monthly domain differences by a tensor product term interacting the economic sentiment with monthly time-flow, another tensor product term interacting the monthly differences in flights with monthly time-flow, and a smooth term for a yearly trend.

Code
gam_cz <- gam(
  diff_domains ~
    te(economy_sentiment, month, k = c(20, 12)) +
    te(diff_flights, month,  k = c(20, 12)) +
    s(year, k = 4),
  data   = data_cz,
  method = "REML"
)
saveRDS(gam_cz, file = "gam_cz.rds")
Code
gam_cz <- readRDS(file = "gam_cz.rds")

However, because (a) the model estimated an insignificant interaction between monthly time-flow and the Economic Sentiment Indicator, and (b) a follow-up model, in which we dropped the interaction term in favor of a simple smooth term for the Economic Sentiment Indicator, proved more interesting, we report the initial model in the Appendix (Section 5) and focus on the follow-up model first (Section 3.1).

Generalized additive models (GAMs) are often portrayed as occupying a middle ground between interpretable but often inflexible linear models and flexible but opaque machine learning models. GAMs can model nonlinear relationships (overcoming the limits of linear models) while still providing inferential statistics and explanatory insights (avoiding the black-box nature of predictions made by machine learning models).

To capture these nonlinear relationships, GAMs use smooth functions, each of which is composed of several simpler basis functions. Each basis function captures a small part of the relationship; weighted and summed, the basis functions form the smooth function, which can in turn describe a nonlinear relationship between the variables.
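The way basis functions add up into a smooth can be illustrated on simulated data. The sketch below is not part of the report's analysis: the data frame `sim` and the model `fit_demo` are hypothetical, and the sine-shaped signal is an assumption chosen only to give the smooth something nonlinear to fit. It uses `predict(..., type = "lpmatrix")` from mgcv to extract the evaluated basis functions and checks that their weighted sum reproduces the fitted values.

```r
library(mgcv)

# Simulated data (hypothetical): a noisy sine curve.
set.seed(1)
sim <- data.frame(x = seq(0, 1, length.out = 200))
sim$y <- sin(2 * pi * sim$x) + rnorm(200, sd = 0.2)

# A single smooth of x built from k = 10 basis functions
# (one is absorbed by the identifiability constraint, leaving 9).
fit_demo <- gam(y ~ s(x, k = 10), data = sim, method = "REML")

# Columns of X: the intercept plus the 9 basis functions evaluated at each x.
X    <- predict(fit_demo, type = "lpmatrix")
beta <- coef(fit_demo)

# The fitted curve is the weighted sum of the basis functions.
manual <- as.numeric(X %*% beta)
all.equal(manual, as.numeric(fitted(fit_demo)))
```

The estimated coefficients in `beta` act as the weights, and the penalty applied during fitting controls how wiggly the resulting curve is allowed to be.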

For an introduction to GAMs, we recommend the interactive course by Noam Ross or the introductory text by Michael Clark. Introductory lectures by Noam Ross and Gavin Simpson are also freely available.

Note that we did not use the monthly difference in the number of hotel overnight stays variable as a predictor in the models reported below because of its positive correlation with the monthly difference of flights. Including the hotel overnight stays in the models would cause concurvity issues.

Concurvity is a generalization of collinearity to the framework of generalized additive models. Like collinearity in generalized linear models, concurvity describes a problem that arises in a generalized additive model when one smooth term can be approximated by one or more of the other smooth terms. Concurvity is estimated on a range from 0 (no overlap between the smooth functions) to 1 (complete overlap between the smooth functions). As stated by Simon Wood in the mgcv documentation, concurvity often becomes an issue when “… a smooth of space is included in a model, along with smooths of other covariates that also vary more or less smoothly in space. Similarly it tends to be an issue in models including a smooth of time, along with smooths of other time varying covariates. Concurvity can be viewed as a generalization of co-linearity, and causes similar problems of interpretation. It can also make estimates somewhat unstable (so that they become sensitive to apparently innocuous modelling details, for example).”
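Concurvity can be inspected with `mgcv::concurvity()`. The following sketch uses simulated data rather than the report's variables: the data frame `sim_conc`, the covariates `x1`, `x2`, `x3`, and the model `fit_conc` are all hypothetical. By construction, `x2` is almost a smooth function of `x1`, whereas `x3` is independent of both, so the smooths of `x1` and `x2` should show much higher concurvity than the smooth of `x3`.

```r
library(mgcv)

# Simulated data (hypothetical): x2 is nearly a smooth function of x1,
# while x3 is generated independently of both.
set.seed(42)
n  <- 200
x1 <- runif(n)
x2 <- x1^2 + rnorm(n, sd = 0.02)
x3 <- runif(n)
y  <- sin(2 * pi * x1) + x3 + rnorm(n, sd = 0.3)
sim_conc <- data.frame(y, x1, x2, x3)

fit_conc <- gam(y ~ s(x1) + s(x2) + s(x3),
                data = sim_conc, method = "REML")

# full = TRUE: concurvity of each term with the rest of the model.
# The result has one column per term and three alternative measures as rows.
conc <- concurvity(fit_conc, full = TRUE)
round(conc, 2)
```

Calling `concurvity(fit_conc, full = FALSE)` instead returns pairwise matrices, which help identify which specific pair of terms overlaps.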

3.1 Model 1

In model 1, we predicted the monthly differences in domain counts by a thin-plate smooth term for the Economic Sentiment Indicator, by a tensor product interaction between the monthly time-flow and the monthly difference in flights, and by a thin-plate smooth term for a yearly trend.

Code
gam_cz_1 <- gam(
  diff_domains ~
    s(economy_sentiment) +
    te(diff_flights, month, k = 12) +
    s(year, k = 4),
  data   = data_cz,
  method = "REML"
  )
saveRDS(gam_cz_1, file = "gam_cz_1.rds")
Code
gam_cz_1 <- readRDS(file = "gam_cz_1.rds")

3.1.1 Residuals

Code
appraise(gam_cz_1)