Exploring the distribution of mail services in the CZ domain

ADAM report 1/2023

Authors

Dan Řezníček

Ladislav Lhotka

Published

July 27, 2023

Code
library(dplyr)
library(ggplot2)
library(plotly)
library(lubridate)
library(tidyr)
library(knitr)
library(kableExtra)
library(formattable)
library(jsonlite)
library(forcats)
library(colorspace)
library(readr)
library(brms)
library(bayesplot)
library(tidybayes)
library(future)
library(countrycode)
library(rnaturalearth)
library(sf)
library(lwgeom)
library(maps)
library(units)
library(s2)
library(corrplot)
library(performance)
library(ggthemes)
library(patchwork)
library(units)
library(stringr)
library(stringi)
library(cmdstanr)
library(scales)
theme_set(theme_minimal())
Code
root_endpoint <- "https://stats.nic.cz"

api_endpoint <- function(endpoint, ...) {
  pars <- list(...)
  if (length(pars) == 0) {
    return(paste0(root_endpoint, endpoint))
  }
  pstr <- mapply(function(x, y) paste(x, URLencode(toString(y)), sep = "="),
                 names(pars), pars)
  paste0(root_endpoint, endpoint, "?", paste(pstr, collapse = "&"))
}

Introduction

During the early years of the Internet, all institutions operated their own mail servers. Their MX (mail exchange) resource records in the DNS thus pointed to domain names within the same domain. However, over the past 10 or 15 years, MX records have undergone a process of outsourcing to mail service providers ranging from local ISPs to large national or global technology companies such as Seznam.cz or Google’s GMail. On the other hand, many second-level domains still have MX records pointing to mail servers within the same domain (see Table 1). In this report, we explore MX records of all second-level domains under CZ with the aim of assessing the distribution and accumulation of mail services across countries and autonomous systems. We provide both exploratory descriptive statistics and also some modeling-based insights.

Code
# These values were taken from a full scan, 2023-07-06.
mx_table <- data.frame(
  "Domains with:" =
    c(" - at least one MX record with its own name.",
      " - at least one MX record without its own name.",
      " - all MX records with its own name."),
  Count  =
    c(132142,
      808737,
      116388),
  check.names = FALSE
  )
mx_table |>
  kable("html",
        format.args = list(big.mark = " ")) |>
  kable_styling(
    bootstrap_options = c("striped",
                          "hover",
                          "condensed",
                          "responsive"),
    full_width = TRUE,
    position   = "left")
Table 1: Outsourcing of MX records.
Domains with: Count
- at least one MX record with its own name. 132 142
- at least one MX record without its own name. 808 737
- all MX records with its own name. 116 388

We utilized two primary datasets collected by the CZ.NIC’s DNS crawler, aggregating domain counts by (1) countries and (2) autonomous systems from which the mail servers indicated in MX records are operated. In these datasets, domain names appearing in MX records were resolved to IP address(es) that were then assigned to the corresponding countries or autonomous systems via a geolocation database. Note that a second-level domain may have multiple mail servers operating from various countries and/or autonomous systems. If it is the case, such a domain contributes to the counts of multiple countries or autonomous systems.

The data used in this report are static in the sense that the datasets do not provide a dynamic view of CZ mail server trends (e.g., time series) but just a limited peek into their distributions at a certain date or time period:

  • For the countries, all data correspond to a single day of 2023-06-01.

  • For the autonomous systems, the data belong to an interval from 2023-04-20 to 2023-06-15.

Sections labeled as ▶ Code that are interspersed in the text below can be expanded to reveal the actual R code used for producing the statistics, graphs, and tables.

Country perspective

First, we explored geolocations of mail servers of all second-level domains under CZ, using data provided by DNS crawler endpoint /crawler_mail_cc, which aggregates the count of domains with MX records by countries.

Code
df_cc  <- fromJSON(api_endpoint("/crawler_mail_cc"))
country_flags <- read.csv("CSV/country_flags.csv", sep = ";")
Code
df_cc <- df_cc |>
  mutate(ts = as.Date(ts)) |>
  group_by(cc) |>
  filter(ts == "2023-06-01") |>
  ungroup() |>
  rename("mx" = "domains") |>
  mutate(perc = mx / sum(mx) *100) |>
  replace_na(list(cc = "Unknown"))
country_flags <- country_flags |>
  rename(cc = country_code)
df_cc <- left_join(df_cc, country_flags, by = "cc") |>
  replace_na(list(country_flags = "?"))

Distribution of domains with MX records

Table 2 describes the distribution of second-level domains that have at least one MX record. The first row shows frequencies for all countries; however, as the global distribution is hugely influenced by the CZ frequencies (the maximum value at 1 056 497, 62.96% of all such domains; see also Figure 1 below and Table 4 in the Appendix), the second row presents frequencies for all countries except for the Czech Republic, alleviating the bias stemming from the CZ dominance.

Code
a <- df_cc |>
  summarize(Subset = "All countries",
            Min    = min(mx),
            Mean   = mean(mx),
            Median = median(mx),
            Max    = max(mx),
            Total  = sum(mx)
  ) 
b <- df_cc |>
  filter(cc != "CZ") |>
  summarize(Subset = "No CZ",
            Min    = min(mx),
            Mean   = mean(mx),
            Median = median(mx),
            Max    = max(mx),
            Total  = sum(mx)
  )

descriptives_table <- bind_rows(a, b)
descriptives_table |>
  kable("html",
        digits = 2,
        format.args = list(big.mark = " ")) |>
  kable_styling(
    bootstrap_options = c("striped",
                          "hover",
                          "condensed",
                          "responsive"),
    full_width = TRUE,
    position   = "left")
Table 2: Distribution of domains with MX records.
Subset Min Mean Median Max Total
All countries 1 18 239.73 68 1 056 497 1 678 055
No CZ 1 6 830.31 61 112 651 621 558

To illustrate the absolute count of these domains, we also filtered countries with more than 100 domains and plotted a barchart which, once again, shows the overwhelming and unsurprising dominance of domains with MX records pointing to mail servers within the Czech Republic.

Code
df_cc_100 <- df_cc |>
  filter(mx > 100) |> 
  arrange(mx)
plot_ly(
  data = df_cc_100,
  x    = ~mx,
  type = "bar",
  name = "Domains by country",
  hovertemplate = paste0(
    "<b>", "Country: ", "</b>", df_cc_100$cc,
    " ", df_cc_100$country_flag, "<br>",
    "<b>", "Domain count: ", "</b>", df_cc_100$mx,
    " (", round(df_cc_100$perc, 2), "%)",
    "<extra></extra>"
  ),
  text         = paste0(
    df_cc_100$country_flag, " ", df_cc_100$cc),
  textposition = "auto",
  marker       = list(color = c(rep("#003893", 41), "#F4587A")
  )
) |>
  layout(
    hovermode = "y unified",
    xaxis     = list(title = "Domain count"),
    yaxis = list (
      title          = "Country",
      showticklabels = FALSE
    )
  )
Figure 1: Distribution of domains with MX records by country (>100).

A complete list of all countries with domains that use MX records can be inspected in Table 4 in the Appendix. To provide additional insight into the relative weight of foreign countries, Table 4 also presents a column Domains percentage (no CZ) which specifies the percentage of all domains with MX records excluding those with mail servers located in the Czech Republic.

Spatial distribution

Furthermore, we explored the spatial distribution of mail servers indicated in the domains’ MX records. In Figure 2, we present log-transformed values of the absolute count of domains with mail servers in given countries. To inspect the absolute counts, hover the mouse cursor over the respective countries. Note, however, that the color scale representing log-transformed values is somewhat misleading in that it flattens the differences.

Code
df_cc <- df_cc |>
  mutate(cc_iso = countrycode(cc, origin  = "iso2c",
                              destination = "iso3c")
         ) |>
  replace_na(list(cc = "Unknown"))
df_cc <- df_cc |>
  mutate(mx_log = log(mx)) |>
  replace_na(list(cc = "Unknown"))
plot_ly(
  df_cc,
  type          = "choropleth",
  locations     = df_cc$cc_iso,
  z             = df_cc$mx_log,
  colors        = "Blues",
  hovertemplate = paste0(
    "<b>Count: </b>", df_cc$mx,
    " in ", df_cc$country_flag, " ", df_cc$cc, "<br>",
    "<b>Log: </b>", round(df_cc$mx_log, 2),
    "<extra></extra>"
    )
  ) |>
  colorbar(title = "Log")
Figure 2: Worldwide distribution of domains with MX records (log values).

Modeling mail servers outside of the Czech Republic, part 1

Unsurprisingly, IP addresses of most mail servers listed in MX records are geolocated in the Czech Republic. However, one can ask what factors might lead domain holders to set up mail servers outside of the Czech Republic? To provide a better understanding of such variations and possible influences, we utilized Bayesian regression models to test the associations between the count of domains with MX records that indicate mail servers operated from abroad and five predictors which we hypothesized to associate with the these counts.

Linear regression analysis is used to predict the value of a response variable (i.e., dependent variable) based on the value of another predictor variable (i.e., independent variable). In the Bayesian framework, linear regression formulates probability distributions for such predictions (in comparison to point estimates formulated within the frequentist framework). That is to say, the response variable is not estimated as a single value, but is assumed to be drawn from a probability distribution as Bayesian linear regression does not attempt to find one “best” value of the model parameters, but rather to determine the posterior distribution for the model parameters. When doing so, the analyst can also incorporate prior knowledge about what the parameters should be (or use non-informative priors if one has no guesses about the possible values of the parameters).

The result of a Bayesian linear regression model are distributions of possible model parameters (most notably the values by which the response variable changes depending on the predictor variables) based on the data and the priors. Practically, one can express the relationships between the response and predictors variables by describing these posterior distributions. For example, one can report the mean of the parameter’s posterior distribution and its credible interval, which helps to capture the uncertainty about the possible values of the parameter.

Predictors

Geographic distance from the Czech Republic

First, domain holders might prefer mail servers in countries that are geographically closer to the Czech Republic as they usually provide better round-trip times. We thus hypothesized that there should be a higher count of domains that use MX records in countries closer to the Czech Republic.

To calculate the distances between the Czech Republic and other countries, we first calculated centroids for each country (the geographic point at the center of gravity for a polygon that approximates each country’s borders) and then calculated the distance between the Czech Republic’s centroid coordinates and respective foreign countries’ centroids. As the distribution of these values showed great dispersion, we log-transformed them for the purposes of modeling.

Code
df_cc2 <- df_cc |>
  select(mx, cc_iso)

sf::sf_use_s2(FALSE)
world <- ne_countries(returnclass = "sf", scale = "small")
world <- world |>
  mutate(centroids = st_centroid(geometry))
CZ_polygon <- world |>
  filter(iso_a2 == "CZ")
CZ_centroid <- CZ_polygon |>
  st_centroid(CZ_polygon)
world <- world |>
  mutate(CZ_centroid = CZ_centroid)
world <- world |>
  mutate(distances = st_distance(CZ_centroid,
                                 centroids,
                                 by_element = TRUE)
         ) |>
  select(iso_a3, name, distances) |>
  rename(cc_iso = iso_a3) |>
  st_drop_geometry()
world <- world |>
  mutate(cc_iso = countrycode(name,
                              origin      = "country.name",
                              destination = "iso3c")) |>
  drop_na()

tiny_countries <- ne_countries(type        = "tiny_countries",
                               returnclass = "sf")
tiny_countries <- tiny_countries |>
  mutate(CZ_centroid = CZ_centroid)
tiny_countries <- tiny_countries |>
  mutate(distances = st_distance(CZ_centroid,
                                 geometry,
                                 by_element = TRUE)) |>
  select(iso_a3, name, distances) |>
  st_drop_geometry() |>
  mutate(cc_iso = countrycode(name,
                              origin      = "country.name",
                              destination = "iso3c"))
world <- bind_rows(world, tiny_countries)
df_cc2 <- full_join(df_cc2, world, by = "cc_iso")

df_cc2 <- df_cc2 |>
  filter(cc_iso != "CZE") |>
  select(-iso_a3) |>
  drop_units()

Population size

We also hypothesized that domain holders are more likely to have mail servers in countries with higher populations, as mail service providers are more motivated to deploy their servers in such countries.

The values for population sizes were downloaded from The World Bank. We used the most recent values (year 2021) which we log-transformed.

Code
pop <- read.csv("CSV/POP.csv", sep = ",")
pop <- pop |>
  select(cc, X2021) |>
  rename(population = X2021,
         cc_iso     = cc)
df_cc2 <- full_join(df_cc2, pop, by = "cc_iso")

GDP per capita

Furthermore, we also hypothesized that GDP per capita might increase the count of domains with MX records pointing to mail servers located in foreign countries. Again, the idea was that such countries have a better network infrastructure and may thus be more attractive for server deployments.

Again, the values for GDP per capita were downloaded from The World Bank. We used the most recent values (year 2021, in US dollars) which we log-transformed.

Code
gdp <- read.csv("CSV/GDP.csv", sep = ",")
gdp <- gdp |>
  select(cc, X2021) |>
  rename(gdp    = X2021,
         cc_iso = cc)
df_cc2 <- full_join(df_cc2, gdp, by = "cc_iso")

Export and import value

However, the size of Czech export to foreign countries and foreign countries’ import from the Czech Republic might provide a more granular and trading-network-reflective proxies for international connections than GDP per capita, geographic distance, and population size. Therefore, we hypothesized that the value of Czech Republic’s export to foreign countries would associate with the count of domains that use mail servers operated from abroad. Similarly, we hypothesized that the value of foreign countries’ import from the Czech Republic would associate with the count of these domains.

The values for the size of Czech export and import were downloaded from The Observatory of Economic Complexity (OEC). These values are proxies for a direct trading relationship between the Czech Republic and a given country. We used the most recent freely available values (year 2021) which we log-transformed.

Code
export <- read.csv("CSV/EXPORT.csv", sep = ",")
export <- export |>
  mutate(ISO.3 = replace(ISO.3,
                         Country == "Chinese Taipei",
                         "twn")) |>
  rename(cc_iso = ISO.3,
         export = Trade.Value) |>
  select(cc_iso, export) |>
  mutate(cc_iso = toupper(cc_iso))

df_cc2 <- full_join(df_cc2, export, by = "cc_iso")

import <- read.csv("CSV/IMPORT.csv", sep = ",")
import <- import |>
  mutate(ISO.3 = replace(ISO.3,
                         Country == "Chinese Taipei",
                         "twn")) |>
  rename(cc_iso = ISO.3,
         import = Trade.Value) |>
  select(cc_iso, import) |>
  mutate(cc_iso = toupper(cc_iso))

df_cc2 <- full_join(df_cc2, import, by = "cc_iso")

Upon pairing all the predictor data with our original data on domains using MX records into one dataset, we obtained a number of rows with missing values.

First, we manually adjusted the data for the population size, GDP per capita, and geographic distance for larger countries, filling in the missing values.

Second, as our original dataset does not contain data on countries that do not operate any mail servers under CZ, we assumed that such countries have zero domains using MX records and replaced such missing values with zeros. Similarly, in the case of Czech export and foreign import, we assumed that countries not captured in the OEC datasets do not trade with the Czech Republic, hence we replaced such missing values with zeros.

Lastly, for small (mostly island) countries, we did not attempt to fill in the missing values. Therefore, these countries are omitted in the analysis (as regression models cannot handle rows with missing values) and can be inspected in Table 6 in the Appendix.

Code
df_cc2 <- df_cc2 |>
  mutate(population = replace(population,
                              cc_iso == "TWN",
                              23859912),
         gdp        = replace(gdp,
                              cc_iso == "TWN",
                              33186.34),
         gdp        = replace(gdp,
                              cc_iso == "VEN",
                              2071.6),
         gdp        = replace(gdp,
                              cc_iso == "KWT",
                              28884.3),
         gdp        = replace(gdp,
                              cc_iso == "TKM",
                              10111),
         gdp        = replace(gdp,
                              cc_iso == "YEM",
                              549.49),
         gdp        = replace(gdp,
                              cc_iso == "CUB",
                              7291),
         gdp        = replace(gdp,
                              cc_iso == "GRL",
                              54571),
         gdp        = replace(gdp,
                              cc_iso == "ERI",
                              614),
         gdp        = replace(gdp,
                              cc_iso == "PRK",
                              654),
         distances  = replace(distances,
                              cc_iso == "SYC",
                              7165900),
         distances  = replace(distances,
                              cc_iso == "HKG",
                              8750280),
         distances  = replace(distances,
                              cc_iso == "AND",
                              1330730),
         name       = replace(name,
                              cc_iso == "HKG",
                              "Hong Kong"),
         name       = replace(name,
                              cc_iso == "SYC",
                              "Seychelles")) |>
  mutate(mx        = replace_na(mx, 0),
         export    = replace_na(export,0),
         import    = replace_na(import, 0),
         distances = distances/1000) |>
  mutate(distance_log   = log(distances + 1),
         population_log = log(population + 1),
         gdp_log        = log(gdp + 1),
         export_log     = log(export + 1),
         import_log     = log(import +1)) |>
  dplyr::distinct(cc_iso, .keep_all = TRUE)

df_cc_NAs <- 
  df_cc2 |>
  mutate(mx     = replace_na(mx, 0),
         export = replace_na(export, 0))
df_cc_NAs <- df_cc_NAs[!complete.cases(df_cc_NAs),]

df_cc2 <- df_cc2 |> drop_na()

Passive properties or active channels?

While all of the proposed predictors showed some correlations between each other (see Figure 3 below), they theoretically differ in how they function as proxies for domain holders’ tendency for setting up their mail services outside of the Czech Republic. While geographic distance, population size, and GDP per capita are properties describing respective countries, the values of export and import describe the interaction between the Czech Republic and given countries. In other words, while there is no “human design” behind geographic distance or population size of foreign countries in respect to mail servers indicated in MX records of second-level domains under CZ, both of these variables could influence these counts as a passive side-effect of geographic closeness or high-enough population numbers. Similarly, GDP per capita of foreign countries is an index not directly connected to the Czech Republic but may influence the count of domains as a side-effect of foreign countries’ economic power.

In contrast, Czech export to foreign countries and foreign import from the Czech Republic can be conceptualized as intentionally directed and active networking channels. Therefore, the count of domains with MX records pointing to mail servers in given countries could increase as domain holders might spend resources on developing their own CZ infrastructure and operations in countries where trading with the Czech Republic is more intense.

Code
df_cc2 |>
  mutate(mx_log = log(mx + 1)) |>
  select(mx_log,
         distance_log,
         population_log,
         gdp_log,
         export_log,
         import_log) |>
  drop_units() |>
  drop_na() |>
  cor() |>
  corrplot(method      = "color",
           type        = "lower",
           diag        = FALSE,
           addCoef.col = "Black")

Figure 3: Predictor correlations.

Results

Because the distribution of domain counts was over-dispersed and showed high zero-inflation (i.e., there were many countries with no mail servers indicated in the domains’ MX records), we used zero-inflated negative binomial distribution to model both the count of domains with MX records and the proportion of zeros.

Zero-inflated negative binomial regression can be used for modeling count variables with excessive zeros (i.e., zero-inflation) where the distribution of the count response variable also shows overdispersion (i.e., its variability is greater than would be expected in more classical models like Poisson). Furthermore, the excess zeros can be conceptualized as generated by a separate process from the count values and can be, therefore, modeled independently from the count of the response variable.

Models 1-3

Code
m1 <-
  brm(
    bf(
      mx ~ 
        log(distances + 1) +
        log(population + 1) +
        log(gdp + 1) +
        log(export + 1) +
        log(import + 1),
      zi ~
        log(distances + 1) +
        log(population + 1) +
        log(gdp + 1) +
        log(export + 1) +
        log(import + 1)),
    family = zero_inflated_negbinomial(),
    data   = df_cc2,
    prior  = c(prior(normal(0,2), class = Intercept),
               prior(normal(0,2), class = b)),
    chains = 4,
    iter   = 4000,
    warmup = 1000,
    cores  = parallel::detectCores())
Code
m2 <-
  brm(bf(mx ~ log(distances + 1),
         zi ~ log(distances + 1)),
    family = zero_inflated_negbinomial(),
    data   = df_cc2,
    prior  = c(prior(normal(0,2), class = Intercept),
               prior(normal(0,2), class = b)),
    chains = 4,
    iter   = 4000,
    warmup = 1000,
    cores  = parallel::detectCores())
Code
m3 <-
  brm(bf(mx ~ log(population + 1),
         zi ~ log(population + 1)),
    family = zero_inflated_negbinomial(),
    data   = df_cc2,
    prior  = c(prior(normal(0,2), class = Intercept),
               prior(normal(0,2), class = b)),
    chains = 4,
    iter   = 4000,
    warmup = 1000,
    cores  = parallel::detectCores())

In model m1, we used all five predictors to estimate the count of domains with MX records pointing to mail servers located in foreign countries and the probability that no mail servers are located in the respective countries. However, as we observed severe issues with multicollinearity (when predictors correlate with each other highly, the resulting model parameters become unreliable), we do not discuss model m1 (for the code and results, see the Appendix), and focus on modeling each predictor separately.

For the sake of brevity, we also refer the reader to the Appendix for the results of models m2 (geographic distance) and m3 (population size). While they found associations with the count of domains and their zero-inflation, they performed less efficiently than other models (see below) and found weaker effects than model m4 (GDP per capita).

Model 4: GDP per capita

Code
m4 <-
  brm(bf(mx ~ log(gdp + 1),
         zi ~ log(gdp + 1)),
    family = zero_inflated_negbinomial(),
    data   = df_cc2,
    prior  = c(prior(normal(0,2), class = Intercept),
               prior(normal(0,2), class = b)),
    chains = 4,
    iter   = 4000,
    warmup = 1000,
    cores  = parallel::detectCores())
Code
summary(m4)
 Family: zero_inflated_negbinomial 
  Links: mu = log; shape = identity; zi = logit 
Formula: mx ~ log(gdp + 1) 
         zi ~ log(gdp + 1)
   Data: df_cc2 (Number of observations: 188) 
  Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
         total post-warmup draws = 12000

Population-Level Effects: 
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept      -16.17      2.42   -20.68   -11.21 1.00     6798     7136
zi_Intercept    15.43      5.57     5.80    27.64 1.00     7161     5942
loggdpP1         2.41      0.25     1.90     2.89 1.00     7214     7681
zi_loggdpP1     -1.99      0.69    -3.56    -0.84 1.00     6693     5374

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.11      0.02     0.08     0.15 1.00     7782     7201

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The results summary table above presents a great deal of information. However, only a part of the section titled Population-Level Effects is usually of prime interest.

First, the reader should focus on the columns Estimate (mean of the posterior distribution for respective predictors), l-95% CI (lower bound of the 95% credible interval), and u-95% CI (upper bound of the 95% credible interval). As noted above, these values help us interpret the parameters’ posterior distributions.

  • The mean estimate describes the central tendency of the parameter and can be interpreted as the value where the association between the predictor and the response variable probably lies. Then, it represents how much the values of the response change depending on the values of the predictor. Note that this value is not a single point-estimate, but a mean of many possible regression lines that the model fitted on the data.

  • Furthermore, we are interested in the 95% credible intervals (95% CIs) which capture where 95% of parameters’ posterior distribution lie. CIs are described by their lower (l-95% CI) and upper (u-95% CI) bound (i.e., 95% of the parameter’s distribution lie somewhere between its lower and upper bound). When looking at credible intervals, we are interested whether they contain zero—if they do, we can conclude that there is no relationship between the variables; if they don’t, there possibly is a relationship. For example, a parameter with mean M = 2.41 and 95% credible interval between 1.90 and 2.88 does not contain zero and therefore suggests a positive relationship between the response and predictor variables. However, a parameter with mean M = 2.41 and 95% credible interval between -1.90 and 2.88 contains zero, suggesting a null effect.

Then, in the same section (Population-Level Effects), we are mostly interested in rows describing the parameters for the investigated predictors (here labelled as loggdpP1 and zi_loggdpP1). These two describe the respective relationships between the response and predictor variables. For loggdpP1, we see that the effect is positive as the 95% CI lies above zero, suggesting that GDP per capita increases the count of domains. For zi_loggdpP1 (the zi stand for zero-inflation), we observe a negative effect as the 95% CI lies below zero, suggesting that GDP per capita decreases the probability of observing zero domains with MX records.

However, the reported parameter values are not intuitively interpretable as the predictors have been log-transformed in the formula of the model, the model utilized a log link function to map the predictor values on the response values, and a logit link function to map the predictor values on the zero-inflation probability. In the case of predictors for the count of domains, we can interpret the results as a percent increase. For example, in model m4, 1% increase in GDP per capita associates with 2.4% increase in the count of domains with MX records. For any x percent increase, one has to calculate 1.x to the power of the coefficient, subtract 1, and multiply by 100. For example, a 30% increase in the GDP per capita results in 87.7% increase in the count of domains ((1.30^2.40 - 1) * 100 = 87.7).

Code
m4_plot <- 
  plot(
    conditional_effects(m4,
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "mu"),
    points = TRUE,
    plot   = FALSE) [[1]] +
  ylim(0,1000000) +
  ylab("Domain count") +
  scale_y_continuous(labels = scales::label_number_si()) +
  xlab("GDP per capita (USD)") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))
m4_plot_zi <- 
  plot(
    conditional_effects(m4,
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "zi"),
    plot   = FALSE) [[1]] +
  ylim(0,1) +
  ylab("Zero-inflation probability") +
  xlab("GDP per capita (USD)") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))

m4_plot_zi
m4_plot

(a) Zero-inflation.

(b) Domains’ count.

Figure 4: Associations between GDP per capita and domains with MX records.

Naturally, charts provide an important visual aid for understanding the relationships between the predictor and response variables. In a nutshell, the (blue) regression lines help us understand how the values of the response variable on the y-axis change depending on the values of the predictor on the x-axis (i.e., what is their association).

Note that for the domain count (right subplot), both variables are on their original scale. For the zero-inflation (left subplot), the GDP per capita is on its original scale, while the values for the probability of zero-inflation are mapped between 0 and 1, using a logit link.

In the domain count chart (right), we can also inspect a scatter of the countries in which the mail servers are set up. Here, each dot represents one country positioned accordingly to the country’s GDP per capita and the respective count of domains with MX records. Note that not all countries are represented in these charts, as the y-axis has been cut to focus on the regression lines.

Crucially, as we used Bayesian regression analysis, we have not obtained only one regression line, which would be fitted to the data-points, but a whole distribution of lines. Therefore, we may observe a number of lines that have been sampled from the posterior distribution. This sample helps to visualize the distribution of a given parameter—lines get more dense closer to the middle (where the mean is represented by one thick white line). Furthermore, the sample is also bounded by the 95% CI. Together, they illustrate both the relationship between the variables and the model’s uncertainty about this relationship.

Model m4 estimated that GDP per capita associates negatively with the zero-inflation of domains using MX records and positively with their count. Note that the left subplot presents the zero-inflation (descending lines suggest lower probability of observing zero domains using MX records) and the right subplot the count of domains (rising lines suggest higher counts of domains using MX records).

Model 5: Czech export

Code
m5 <-
  brm(bf(mx ~ log(export + 1),
         zi ~ log(export + 1)),
    family = zero_inflated_negbinomial(),
    data   = df_cc2,
    prior  = c(prior(normal(0,2), class = Intercept),
               prior(normal(0,2), class = b)),
    chains = 4,
    iter   = 4000,
    warmup = 1000,
    cores  = parallel::detectCores())
Code
summary(m5)
 Family: zero_inflated_negbinomial 
  Links: mu = log; shape = identity; zi = logit 
Formula: mx ~ log(export + 1) 
         zi ~ log(export + 1)
   Data: df_cc2 (Number of observations: 188) 
  Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
         total post-warmup draws = 12000

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept         -4.07      1.45    -6.74    -1.13 1.00     7635     6708
zi_Intercept      22.28      4.47    14.57    32.17 1.00     8503     6363
logexportP1        0.60      0.07     0.45     0.74 1.00     8168     7520
zi_logexportP1    -1.31      0.27    -1.91    -0.86 1.00     8611     6199

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.16      0.02     0.12     0.21 1.00     9927     7582

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Code
m5_plot <- 
  plot(
    conditional_effects(m5,
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "mu"),
    points = TRUE,
    plot   = FALSE) [[1]] +
  ylim(0,180000) +
  ylab("Domain count") +
  scale_y_continuous(labels = scales::label_number_si()) +
  xlab("Czech export (USD)") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))

m5_plot_zi <- 
  plot(
    conditional_effects(m5,
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "zi"),
    plot   = FALSE) [[1]] +
  ylim(0,1) +
  ylab("Zero-inflation probability") +
  xlab("Czech export (USD)") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))

m5_plot_zi
m5_plot

(a) Zero-inflation.

(b) Domains’ count.

Figure 5: Associations between Czech export value and domains with MX records.

Model m5 estimated that Czech export associates negatively with the zero-inflation of domains using MX records and positively with their count.

Model 6: Foreign import

Code
m6 <-
  brm(bf(mx ~ log(import + 1),
         zi ~ log(import + 1)),
    family = zero_inflated_negbinomial(),
    data   = df_cc2,
    prior  = c(prior(normal(0,2), class = Intercept),
               prior(normal(0,2), class = b)),
    chains = 4,
    iter   = 4000,
    warmup = 1000,
    cores  = parallel::detectCores())
Code
summary(m6)
 Family: zero_inflated_negbinomial 
  Links: mu = log; shape = identity; zi = logit 
Formula: mx ~ log(import + 1) 
         zi ~ log(import + 1)
   Data: df_cc2 (Number of observations: 188) 
  Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
         total post-warmup draws = 12000

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept         -1.26      1.18    -3.37     1.23 1.00     8881     6271
zi_Intercept      12.82      2.42     8.67    18.19 1.00     9939     7171
logimportP1        0.47      0.06     0.34     0.58 1.00     9347     6882
zi_logimportP1    -0.81      0.16    -1.16    -0.55 1.00     9417     6684

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.15      0.02     0.11     0.20 1.00     9303     8589

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Code
m6_plot <- 
  plot(
    conditional_effects(m6,
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "mu"),
    points = TRUE,
    plot   = FALSE) [[1]] +
  ylab("Domain count") +
  scale_y_continuous(labels = scales::label_number_si()) +
  xlab("Foreign import from CZ (USD)") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))

m6_plot_zi <- 
  plot(
    conditional_effects(m6,
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "zi"),
    plot   = FALSE) [[1]] +
  ylim(0,1) +
  ylab("Zero-inflation probability") +
  xlab("Foreign import from CZ (USD)") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))

m6_plot_zi
m6_plot

(a) Zero-inflation.

(b) Domains’ count.

Figure 6: Associations between foreign import and domains with MX records.

Model m6 estimated that foreign import from the Czech Republic associates negatively with the zero-inflation of domains using MX records and positively with their count.

Model performance comparison

Code
comp1 <- compare_performance(m2, m3, m4, m5, m6,
                             metrics = c("LOOIC", "WAIC"))
comp1
# Comparison of Model Performance Indices

Name |   Model |     ELPD | ELPD_SE | LOOIC (weights) | LOOIC_SE | WAIC (weights)
---------------------------------------------------------------------------------
m2   | brmsfit | -704.135 |  60.949 |  1408.3 (<.001) |  121.899 | 1407.7 (<.001)
m3   | brmsfit | -709.610 |  60.620 |  1419.2 (<.001) |  121.240 | 1419.0 (<.001)
m4   | brmsfit | -670.400 |  59.146 |  1340.8 (0.162) |  118.292 | 1339.9 (<.001)
m5   | brmsfit | -651.562 |  60.243 |  1303.1 (0.838) |  120.487 | 1299.7 (>.999)
m6   | brmsfit | -659.931 |  59.915 |  1319.9 (<.001) |  119.830 | 1317.2 (<.001)

ELPD: Expected log predictive density. Larger ELPD values mean better fit.

LOOIC: Leave-one-out cross-validation (LOO) information criterion. Lower LOOIC values mean better fit.

WAIC: Widely applicable information criterion. Lower WAIC values mean better fit.

When compared, model m5 (using the export predictor, logexportP1) seems to be the best fit for the data. The model m6 (using the import predictor, logimportP1) ranked second, followed by the model m4 (using the GDP per capita predictor, loggdpP1). Models m2 (distance, logdistancesP1) and m3 (population, logpopulationP1) fared worse than models m4, m5, and m6.

Discussion

We investigated what factors might motivate domain holders to set up mail servers outside of the Czech Republic. We explored the influence of (1) foreign countries’ geographic distance from the Czech Republic, (2) the foreign countries’ population size, (3) the foreign countries’ GDP per capita, (4) Czech Republic’s export to the respective foreign countries, and (5) foreign countries’ import from the Czech Republic. We found that all of these predictors are associated with the counts of domains that use MX records pointing to foreign mail servers, however, due to multicollinearity issues, we were not able to estimate them within one model.

As our models predicted both the count of domains and also their zero-inflation (in many foreign countries, there are no mail servers for CZ domains), we found that nearly all of the predictors had complementary associations with both of these processes. For example, as foreign countries’ GDP per capita increased, the count of domains using mail servers abroad increased, while the probability of operating zero mail servers from abroad decreased. In other words, economically stronger countries had lower chances of having no mail servers outsourced from the Czech Republic and also higher count of such mail servers. This complementary trend applies to all of the predictors.

Interestingly, GDP per capita found the strongest associations with the count of domains that use foreign mail server services and their zero-inflation; however, models using the export and import values provided a better fit for the data.

Knowing that these modeling results are correlational associations and cannot suggest causal links, we can summarize that domain holders’ interest in foreign mail server services:

  1. Decreases with rising geographic distance from the Czech Republic.

  2. Increases with rising population size of foreign countries (However, the positive association with the count of domains that use foreign mail servers is not certain.).

  3. Increases with rising GDP per capita of foreign countries.

  4. Increases with rising value of Czech export to foreign countries.

  5. Increases with rising value of foreign import from the Czech Republic.

Still, we may speculate about the causal processes behind the associations observed for export and import values. As we noted above, in comparison to the “passive” proxies of geographic distance, population size, and GDP per capita, export and import values reflect “active”, intentionally directed channels in human networks. Then, intense trading between Czech and foreign companies may motivate these actors to increasingly spend resources on developing larger mail server infrastructure. However, to establish these causal effects, further analysis using time series would be needed (given that natural variation in export and import values would provide sufficient data for such a natural experiment).

Autonomous systems perspective

In this section, we explored domains using MX records in respect to autonomous systems, using data provided by CZ.NIC’s DNS crawler endpoint /crawler_mail_asn. To provide further detail, we also paired these data with public information on autonomous systems holders and their respective domicile.

Code
df_asn <- fromJSON(api_endpoint("/crawler_mail_asn"))
df_asn <- df_asn |>
  mutate(ts = as.Date(ts)) |>
  group_by(asn) |>
  filter(ts <= "2023-06-15") |>
  filter(ts == max(ts)) |>
  ungroup() |>
  rename("mx" = "domains")

asn <-
  read_delim("CSV/asn.txt",
             delim         = "_", #intentionally incorrect
             escape_double = FALSE,
             trim_ws       = TRUE)
asn <- asn |>
  rename(col = "0 -Reserved AS-, ZZ") |>
  mutate(col = sub(" ", "___", col))
asn$col <- sub(",([^,]*)$", " 9199999\\1", asn$col)
asn <- asn |>
  separate_wider_delim(
    col,
    delim    = "___",
    names    = c("asn", "col")
    )
asn <- asn |>
  separate_wider_delim(
    col,
    delim = " 9199999 ",
    names = c("holder", "country"),
    too_few = "debug"
    ) |>
  select(asn, holder, country)

df_asn <- left_join(df_asn, asn, by = "asn") |>
  rename("cc" = "country") |>
  mutate(cc_iso = countrycode(cc,
                              origin = "iso2c",
                              destination = "iso3c"))

df_asn <- left_join(df_asn, country_flags, by = "cc")

Distribution of domains with MX records

Table 3 presents the distribution of domains using MX records in all autonomous systems with at least one domain with MX record. The first row shows statistics for all autonomous systems operated by companies with domicile in the Czech Republic, while the second row specifies values for all autonomous systems with foreign holders.

Code
cz <- df_asn |>
  filter(cc == "CZ") |>
  summarize(Subset  = "CZ",
            Min     = min(mx),
            Mean    = mean(mx),
            Median  = median(mx),
            Max     = max(mx),
            Total   = sum(mx),
            Percent = paste0(
                        round(sum(mx) /
                              sum(df_asn$mx) *
                              100, 2),
                        "%")
  )
foreign <- df_asn |>
  filter(cc != "CZ") |>
  summarize(Subset  = "Foreign",
            Min     = min(mx),
            Mean    = mean(mx),
            Median  = median(mx),
            Max     = max(mx),
            Total   = sum(mx),
            Percent = paste0(
                        round(sum(mx) /
                              sum(df_asn$mx) *
                              100, 2),
                        "%")
    )

descriptives_table <- bind_rows(cz, foreign)
descriptives_table |>
  kable("html",
        digits = 2,
        format.args = list(big.mark = " ")) |>
  kable_styling(
    bootstrap_options = c("striped",
                          "hover",
                          "condensed",
                          "responsive"),
    full_width = TRUE,
    position   = "left")
Table 3: Distribution of domains with MX records across autonomous systems.
Subset Min Mean Median Max Total Percent
CZ 1 2 557.87 39 215 911 1 222 660 75.26%
Foreign 1 126.89 2 58 605 365 050 22.47%

Next, to illustrate the absolute count of domains using MX records across autonomous systems, we filtered autonomous systems with more than 1000 domains with MX records and plotted a barchart (see Figure 7). Note that red-colored bars represent autonomous systems with domicile in the Czech Republic, while the blue-colored bars all autonomous systems with domicile outside of the Czech Republic (with one teal-colored bar for unknown autonomous systems). Similarly as above, the chart illustrates the unsurprising dominance of autonomous systems with domicile in the Czech Republic.

Figure 7: Domains with MX records by ASN (>1000).

A longer list of autonomous systems with more than 500 domains using MX records can be inspected in Table 5 in the Appendix. To provide a separate insight into the representation of foreign domains using MX records, Table 5 also presents a column Domains' percentage (no CZ) which specifies the percentage of all domains with MX records excluding the autonomous systems with domicile within the Czech Republic. Similarly, the table also presents a column Domains' percentage (CZ only) specifying the percentage of all domains using MX records with domicile within the Czech Republic only.

Modeling mail servers outside of the Czech Republic, part 2

As in the country-focused section, we tested the associations between the count of domains with MX records pointing to mail servers outside of the Czech Republic and the same set of predictors. However, the fact that autonomous systems often span across the borders of many countries makes the assumption of domains with mail servers outside of the Czech Republic somewhat complicated. As a workaround, we used the autonomous systems’ holders domicile as a key by which we paired the counts of domains with MX records to the predictor values. While problematic—the location of autonomous systems’ holders domicile may easily differ from the actual geolocations of mail servers’ IP addresses—, this approach enabled us to use the country-specific predictors to model the count of domains.

Code
df_asn2 <- df_asn |>
  select(asn, mx, holder, cc_iso)
df_asn2 <- full_join(df_asn2, world,
                     by           = "cc_iso",
                     multiple     = "all",
                     relationship = "many-to-many",
                     na_matches   = "na")
df_asn2 <- full_join(df_asn2, pop,
                     by           = "cc_iso",
                     multiple     = "all",
                     relationship = "many-to-many",
                     na_matches   = "na")
df_asn2 <- full_join(df_asn2, gdp,
                     by           = "cc_iso",
                     multiple     = "all",
                     relationship = "many-to-many",
                     na_matches   = "na")
df_asn2 <- full_join(df_asn2, export,
                     by           = "cc_iso",
                     multiple     = "all",
                     relationship = "many-to-many",
                     na_matches   = "na")
df_asn2 <- full_join(df_asn2, import,
                     by           = "cc_iso",
                     multiple     = "all",
                     relationship = "many-to-many",
                     na_matches   = "na")
df_asn2 <- df_asn2 |>
  select(-iso_a3) |>
  mutate(mx = replace_na(mx, 0),
         export  = replace_na(export, 0)) |>
  filter(cc_iso != "CZE") |>
  mutate(population = replace(population,
                              cc_iso == "TWN",
                              23859912),
         gdp        = replace(gdp,
                              cc_iso == "TWN",
                              33186.34),
         gdp        = replace(gdp,
                              cc_iso == "VEN",
                              2071.6),
         gdp        = replace(gdp,
                              cc_iso == "KWT",
                              28884.3),
         gdp        = replace(gdp,
                              cc_iso == "TKM",
                              10111),
         gdp        = replace(gdp,
                              cc_iso == "YEM",
                              549.49),
         gdp        = replace(gdp,
                              cc_iso == "CUB",
                              7291),
         gdp        = replace(gdp,
                              cc_iso == "GRL",
                              54571),
         gdp        = replace(gdp,
                              cc_iso == "ERI",
                              614),
         gdp        = replace(gdp,
                              cc_iso == "PRK",
                              654),
         distances  = replace(distances,
                              cc_iso == "SYC",
                              7165900),
         distances  = replace(distances,
                              cc_iso == "HKG",
                              8750280),
         distances  = replace(distances,
                              cc_iso == "AND",
                              1330730),
         name       = replace(name,
                              cc_iso == "HKG",
                              "Hong Kong"),
         name       = replace(name,
                              cc_iso == "SYC",
                              "Seychelles")) |>
  drop_units() |>
  mutate(distances = distances/1000) |>
  mutate(distance_log   = log(distances + 1),
         population_log = log(population + 1),
         gdp_log        = log(gdp + 1),
         export_log     = log(export + 1),
         import_log     = log(import + 1),
         cc_iso         = as.factor(cc_iso)) |>
  drop_na(c(distance_log,
            population_log,
            gdp_log,
            export_log,
            import_log))

Results

Model 7: All predictors

Initially, we ran model m7 (see Appendix) using the negative binomial distribution to predict the count of domains using all five predictors with varying intercepts for the domicile of the autonomous systems’ holders. However, the posterior predictive check revealed that the model struggled to capture the inflation of autonomous systems with only one domain and simultaneously overestimated the count of zeros (see Figure 20 in the Appendix). Therefore, we opted for the use of zero-inflated negative binomial distribution to predict the count of domains in another model m8. However, as the inflation was not observed for zeros but for ones, we discarded zero observations from the dataset (there were 110 observations out of a total of 2898) and subtracted 1 from the count of domains, which enabled us to model the one-inflation of the domains’ count and obtain a better fit for the data. Furthermore, high multicollinearity was detected for the export predictor (and medium multicollinearity values for distance, population, and import predictors); therefore, we omitted the export and also import (as it was highly corelated with the export) predictors in model m8, and modeled them separately in m9 and m10.

A posterior predictive check is a method for comparing the values that the model predicts and the actual values upon which the model was estimated. The goal is to inspect whether the fitted model adequately describes the observed data. Put simply, if the data predicted by the posterior predictive check deviate from the actual observed data, we should worry and try to find a better approach for modeling the relationships.

Model 8: Distance, population, and GDP per capita

Code
df_asn3 <- df_asn2 |>
  mutate(mx = mx - 1) |>
  filter(mx>=0)
m8 <- brm(
  bf(mx ~
       log(distances + 1) +
       log(population + 1) +
       log(gdp + 1) +
       (1|cc_iso),
     zi ~ 
       log(distances + 1) +
       log(population + 1) +
       log(gdp + 1) +
       (1|cc_iso)),
  family  = zero_inflated_negbinomial(),
  data    = df_asn3,
  prior   = c(prior(normal(0,2), class = Intercept),
              prior(normal(0,2), class = b)),
  chains  = 4,
  iter    = 4000,
  warmup  = 1000,
  cores   = parallel::detectCores(),
  backend = "cmdstanr",
  threads = threading(2),
  control = list(adapt_delta = 0.99))
Code
summary(m8)
 Family: zero_inflated_negbinomial 
  Links: mu = log; shape = identity; zi = logit 
Formula: mx ~ log(distances + 1) + log(population + 1) + log(gdp + 1) + (1 | cc_iso) 
         zi ~ log(distances + 1) + log(population + 1) + log(gdp + 1) + (1 | cc_iso)
   Data: df_asn3 (Number of observations: 2860) 
  Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
         total post-warmup draws = 12000

Group-Level Effects: 
~cc_iso (Number of levels: 79) 
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)        1.44      0.18     1.13     1.84 1.00     3521     6638
sd(zi_Intercept)     1.09      0.94     0.04     3.53 1.00     5367     5561

Population-Level Effects: 
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept             -9.82      3.68   -17.00    -2.44 1.00     3473     5630
zi_Intercept         -66.26     32.16  -144.70   -21.68 1.00     4547     3413
logdistancesP1         0.20      0.20    -0.19     0.59 1.00     2690     4941
logpopulationP1        0.09      0.13    -0.17     0.34 1.00     2905     5053
loggdpP1               0.95      0.23     0.49     1.40 1.00     3315     5601
zi_logdistancesP1      1.28      1.18    -1.31     3.50 1.00     4233     4266
zi_logpopulationP1     0.75      0.95    -0.33     3.25 1.00     3843     3475
zi_loggdpP1            3.41      2.33     0.07     9.01 1.00     6174     4339

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.14      0.00     0.13     0.15 1.00    14485     9444

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Code
m8_plot_1 <- 
  plot(
    conditional_effects(m8,
                        effect    = "distances",
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "mu"),
    points = TRUE,
    plot   = FALSE) [[1]] +
  ylab("Domain count") +
  scale_y_continuous(labels = scales::label_number_si()) +
  ylim(0,500) +
  xlab("Geographic distance (km)") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))
m8_plot_2 <- 
  plot(
    conditional_effects(m8,
                        effect    = "population",
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "mu"),
    points = TRUE,
    plot   = FALSE) [[1]] +
  ylab("Domain count") +
  scale_y_continuous(labels = scales::label_number_si()) +
  ylim(0,500) +
  xlab("Population size") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))
m8_plot_3 <- 
  plot(
    conditional_effects(m8,
                        effect    = "gdp",
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "mu"),
    points = TRUE,
    plot   = FALSE) [[1]] +
  ylab("Domain count") +
  scale_y_continuous(labels = scales::label_number_si()) +
  ylim(0,500) +
  xlab("GDP per capita (USD)") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))


m8_plot_1_zi <- 
  plot(
    conditional_effects(m8,
                        effect    = "distances",
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "zi"),
    plot   = FALSE) [[1]] +
  ylim(0,1) +
  ylab("One-inflation probability") +
  xlab("Geographic distance (km)") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))
m8_plot_2_zi <- 
  plot(
    conditional_effects(m8,
                        effect    = "population",
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "zi"),
    plot   = FALSE) [[1]] +
  ylim(0,1) +
  ylab("One-inflation probability") +
  xlab("Population size") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))
m8_plot_3_zi <- 
  plot(
    conditional_effects(m8,
                        effect    = "gdp",
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "zi"),
    plot   = FALSE) [[1]] +
  ylim(0,1) +
  ylab("One-inflation probability") +
  xlab("GDP per capita (USD)") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))

m8_plot_1_zi
m8_plot_1
m8_plot_2_zi
m8_plot_2
m8_plot_3_zi
m8_plot_3

(a) Geographic distance (one-inflation).

(b) Geographic distance (domains’ count).

(c) Population size (one-inflation).

(d) Population size (domains’ count).

(e) GDP per capita (one-inflation).

(f) GDP per capita (domains’ count).

Figure 8: Associations between predictors and domains with MX records.

Model m8 estimated that GDP per capita positively associates with the count of domains and—strangely—also with their “zero”-inflation (which, in fact, was a recoded one-inflation). Other predictors estimated null associations.

The relationships between the predictors and the count of domains are plotted in Figure 8 above. Note that the left column plots the one-inflation (rising lines suggest higher probability of observing one domain using MX records) and the right column the count of domains (rising lines suggest higher counts of domains using MX records, for all values above one).

Model 9: Czech export

Code
m9 <- brm(
  bf(mx ~ log(export + 1) + (1|cc_iso),
     zi ~ log(export + 1) + (1|cc_iso)),
  family  = zero_inflated_negbinomial(),
  data    = df_asn3,
  prior   = c(prior(normal(0,2), class = Intercept),
              prior(normal(0,2), class = b)),
  chains  = 4,
  iter    = 4000,
  warmup  = 1000,
  cores   = parallel::detectCores(),
  backend = "cmdstanr",
  threads = threading(2),
  control = list(adapt_delta = 0.999))
Code
summary(m9)
 Family: zero_inflated_negbinomial 
  Links: mu = log; shape = identity; zi = logit 
Formula: mx ~ log(export + 1) + (1 | cc_iso) 
         zi ~ log(export + 1) + (1 | cc_iso)
   Data: df_asn3 (Number of observations: 2860) 
  Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
         total post-warmup draws = 12000

Group-Level Effects: 
~cc_iso (Number of levels: 79) 
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)        1.58      0.20     1.24     2.00 1.00     2955     5804
sd(zi_Intercept)     1.97      1.20     0.10     4.69 1.00     1733     3195

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept         -2.06      2.37    -6.62     2.57 1.00     2694     4906
zi_Intercept     -17.65     19.11   -67.86     7.06 1.00     5675     5247
logexportP1        0.23      0.11     0.01     0.45 1.00     2412     4624
zi_logexportP1     0.47      0.83    -0.64     2.62 1.00     5387     5279

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.13      0.00     0.13     0.14 1.00     2304     5744

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Code
m9_plot <- 
  plot(
    conditional_effects(m9,
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "mu"),
    points = TRUE,
    plot   = FALSE) [[1]] +
  ylab("Domain count") +
  scale_y_continuous(labels = scales::label_number_si()) +
  ylim(0,500) +
  xlab("Czech export (USD)") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))

m9_plot_zi <- 
  plot(
    conditional_effects(m9,
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "zi"),
    plot   = FALSE) [[1]] +
  ylim(0,1) +
  ylab("One-inflation probability") +
  xlab("Czech export (USD)") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))

m9_plot_zi
m9_plot

(a) One-inflation.

(b) Domains’ count.

Figure 9: Associations between Czech export and domains with MX records.

Model m9 estimated a positive relationship between the Czech export and the count of domains and a null relationship with their one-inflation.

Model 10: Foreign import

Code
m10 <- brm(
  bf(mx ~ log(import + 1) + (1|cc_iso),
     zi ~ log(import + 1) + (1|cc_iso)),
  family  = zero_inflated_negbinomial(),
  data    = df_asn3,
  prior   = c(prior(normal(0,2), class = Intercept),
              prior(normal(0,2), class = b)),
  chains  = 4,
  iter    = 4000,
  warmup  = 1000,
  cores   = parallel::detectCores(),
  backend = "cmdstanr",
  threads = threading(2),
  control = list(adapt_delta = 0.999))
Code
summary(m10)
 Family: zero_inflated_negbinomial 
  Links: mu = log; shape = identity; zi = logit 
Formula: mx ~ log(import + 1) + (1 | cc_iso) 
         zi ~ log(import + 1) + (1 | cc_iso)
   Data: df_asn3 (Number of observations: 2860) 
  Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
         total post-warmup draws = 12000

Group-Level Effects: 
~cc_iso (Number of levels: 79) 
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)        1.61      0.20     1.26     2.04 1.00     2685     4956
sd(zi_Intercept)     1.93      1.16     0.10     4.53 1.00     1635     3188

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept          0.01      2.01    -3.92     3.99 1.00     2781     4714
zi_Intercept     -15.99     15.92   -56.43     4.03 1.00     6404     5153
logimportP1        0.13      0.10    -0.06     0.32 1.00     2333     4539
zi_logimportP1     0.40      0.69    -0.52     2.14 1.00     6209     5155

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.13      0.01     0.13     0.14 1.00     2617     6387

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Code
m10_plot <- 
  plot(
    conditional_effects(m10,
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "mu"),
    points = TRUE,
    plot   = FALSE) [[1]] +
  ylab("Domain count") +
  scale_y_continuous(labels = scales::label_number_si()) +
  ylim(0,500) +
  xlab("Foreign import from CZ (USD)") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))

m10_plot_zi <- 
  plot(
    conditional_effects(m10,
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "zi"),
    plot   = FALSE) [[1]] +
  ylim(0,1) +
  ylab("One-inflation probability") +
  xlab("Foreign import from CZ (USD)") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))

m10_plot_zi
m10_plot

(a) One-inflation.

(b) Domains’ count.

Figure 10: Associations between foreign import and domains with MX records.

Model m10 estimated null relationship between the foreign import from the Czech Republic and the count of domains and also their one-inflation.

Model performance comparison

Code
comp2 <- compare_performance(m8, m9, m10,
                             metrics = c("LOOIC", "WAIC"))
comp2
# Comparison of Model Performance Indices

Name |   Model |      ELPD | ELPD_SE | LOOIC (weights) | LOOIC_SE |  WAIC (weights)
-----------------------------------------------------------------------------------
m8   | brmsfit | -8811.940 | 154.354 | 17623.9 (0.977) |  308.709 | 17616.2 (0.997)
m9   | brmsfit | -8818.239 | 154.085 | 17636.5 (0.023) |  308.171 | 17629.6 (0.001)
m10  | brmsfit | -8822.838 | 155.792 | 17645.7 (<.001) |  311.583 | 17629.1 (0.002)

The models performed similarly well on the data, although m8 fared the best.

Discussion

Similarly as above, we investigated what might motivate domain holders to set up mail servers “outside” of the Czech Republic. However, in comparison to the models reported in the country-focused section (where the count of domains with MX records was aggregated by the respective countries where the mail server services were located), data used in this analysis aggregated the count of domains with MX records by their respective autonomous systems. Furthermore, the predictor variables used in this analysis were paired to the domain counts by the domicile of the holders that operate the autonomous systems.

We found that foreign countries’ GDP per capita and the value of Czech export were associated with the count of domains with MX records pointing to mail servers operated by autonomous systems’ holders with domicile outside of the Czech Republic. Crucially, as the models focusing on autonomous systems did not observe severe multicollinearity issues as models focusing on countries, they provided us with better control over the investigated predictors. Then, we observed that while GDP per capita remained positively associated with the count of domains, the associations with geographic distance and population size disappeared. This difference from previous models seems to make sense as the geographic distance and population size predictors were not paired by the actual geolocation of autonomous systems but by the domicile of the autonomous systems’ holders, making them disconnected from the count of domains in these data.

Although the export and import values had to modeled separately due to multicollinearity issues, we found a positive relationship between the value of Czech export and the count of domains. Therefore, we may suggest that domain holder’s tendency to set up their mail servers “abroad” is positively associated only with economic proxies—at least when using data where the count of domains is aggregated by autonomous systems and the predictors are paired by the domicile of the autonomous systems’ holders.

Modeling domains with MX records among Czech companies

To complement the focus on domains with mail server services set up outside of the Czech Republic, we also modeled the variations in the count of domains within autonomous systems for which their holders have domicile in the Czech Republic. Here, we focused on modeling the relationship between the count of domains with MX records and the financial turnover of autonomous systems’ holders.

To do so, we surveyed the public register—searching for financial statements of respective companies—and created a dataset with values for their turnover. As there were 475 autonomous systems operated by Czech companies, we set an arbitrary limit where we collected turnover data only for companies with more than 100 domains with MX records within their autonomous systems. While we were not able to find turnover values for all of these companies (as the public register does not always provide such data), our final dataset contained turnover values for 117 Czech companies (out of 155 companies with more than 100 domains; ignoring 320 companies with less than 100 domains). The resulting dataset accounted for 975 267 domains with MX records (79.77% of all domains within autonomous systems operated by Czech holders), ignoring 247 321 domains (20.2%) out of which 8 566 domains were operated in autonomous systems with less than 100 domains. Note that such imperfections may bias the results below.

Code
df_companies <- read.csv("CSV/COMPANIES.csv", sep = ",")
df_asn_cz <- df_asn |>
  filter(cc == "CZ")
df_asn_cz |> summarize(sum(mx))
df_asn_cz |> filter(mx>=100) |> summarize(sum(mx))
df_asn_cz |> filter(mx<=100) |> summarize(sum(mx))

df_asn_cz <- left_join(df_companies, df_asn_cz, by = "holder")
df_asn_cz <- df_asn_cz |>
  select(holder, turnover, year, asn, mx) |>
  mutate(holder = sub(" ", "|", holder)) |>
  separate_wider_delim(
    holder,
    delim = "|",
    names = c("AS", "holder")
    )
df_asn_cz |> summarize(sum(mx))

Model 11

To model the relationship between the count of domains with MX records pointing to mail servers operated by holders with domicile in the Czech Republic and the their financial turnover, we used the negative binomial distribution and log-transformed the turnover values. Furthermore, as some companies operated more than one autonomous system, we also specified a varying intercept for the holders of autonomous systems to account for such nested variations.

Code
m11 <-
  brm(mx ~ log(turnover) + (1|holder),
      family = negbinomial(),
      data   = df_asn_cz,
      chains = 4,
      iter   = 4000,
      warmup = 1000)
Code
summary(m11)
 Family: negbinomial 
  Links: mu = log; shape = identity 
Formula: mx ~ log(turnover) + (1 | holder) 
   Data: df_asn_cz (Number of observations: 117) 
  Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
         total post-warmup draws = 12000

Group-Level Effects: 
~holder (Number of levels: 101) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.57      0.16     1.28     1.89 1.00     4636     7036

Population-Level Effects: 
            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept       1.69      1.75    -1.69     5.18 1.00     5825     7470
logturnover     0.30      0.10     0.11     0.49 1.00     5470     7908

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.87      0.15     0.62     1.20 1.00     5001     7603

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Code
m11_plot <- 
  plot(
    conditional_effects(m11,
                        ndraws    = 100,
                        spaghetti = TRUE),
    points = TRUE,
    plot   = FALSE) [[1]] +
  ylab("Domain count") +
  scale_y_continuous(labels = scales::label_number_si()) +
  ylim(0,40000) +
  xlab("CZ companies' turnover (CZK)") +
  scale_x_continuous(labels = scales::label_number_si())
m11_plot

Figure 11: Associations between CZ companies’ turnover and domains with MX records.

Model m11 estimated a positive association between the count of domains and the financial turnover of the Czech companies that operate the respective autonomous systems. The relationship is plotted above in Figure 11.

General discussion

In this ADAM report, we explored the distribution of MX records of all second-level domains under CZ across the world, looking into datasets which aggregated the count of domains with MX records by countries and by autonomous systems. Unsurprisingly, we observed that most domains using MX records use mail servers located within the Czech Republic.

We also explored what lies behind the domain holders’ tendency for setting up their mail servers outside of the Czech Republic. When using data which aggregated the count of domains by countries in which the IP addresses of mail servers were geolocated, geographic distance from the Czech Republic, foreign countries’ population size, foreign countries’ GDP per capita, Czech export value, and foreign import value from the Czech Republic all associated with the count of domains.

However, when we switched the perspective and used data where the count of domains was aggregated by autonomous systems, only GDP per capita and the value of Czech export were estimated to associate with the count of domains. On the basis of observed effect sizes and model performance indices, we may summarize that GDP per capita and the value of Czech export best explain the variances in the domain holders’ tendency to set up their mail servers abroad.

Finally, we also modeled whether the count of domains with MX records within autonomous systems that are operated by companies with domicile in the Czech Republic are connected to the financial turnover of such companies. We found out that they are indeed positively associated.

Appendix

Models

Here, we present posterior predictive checks, multicollinearity checks, and model summaries of models skipped in the main text.

Model 1

Code
color_scheme_set("brightblue")
pp_check(m1, type = "bars", ndraws = 20)
pp_check(m1, type = "bars", ndraws = 20) +
  xlim(-1,20)

(a) Posterior predictive check.

(b) Posterior predictive check (detail).

Figure 12: Model 1 diagnostics.

Code
check_collinearity(m1)
# Check for Multicollinearity

Moderate Correlation

         Term  VIF     VIF 95% CI Increased SE Tolerance Tolerance 95% CI
 log(gdp + 1) 5.67 [ 4.50,  7.25]         2.38      0.18     [0.14, 0.22]

High Correlation

                Term   VIF     VIF 95% CI Increased SE Tolerance
  log(distances + 1) 11.11 [ 8.67, 14.31]         3.33      0.09
 log(population + 1) 29.35 [22.71, 38.03]         5.42      0.03
     log(export + 1) 60.39 [46.57, 78.38]         7.77      0.02
     log(import + 1) 29.05 [22.48, 37.65]         5.39      0.03
 Tolerance 95% CI
     [0.07, 0.12]
     [0.03, 0.04]
     [0.01, 0.02]
     [0.03, 0.04]
Code
summary(m1)
 Family: zero_inflated_negbinomial 
  Links: mu = log; shape = identity; zi = logit 
Formula: mx ~ log(distances + 1) + log(population + 1) + log(gdp + 1) + log(export + 1) + log(import + 1) 
         zi ~ log(distances + 1) + log(population + 1) + log(gdp + 1) + log(export + 1) + log(import + 1)
   Data: df_cc2 (Number of observations: 188) 
  Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
         total post-warmup draws = 12000

Population-Level Effects: 
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept            -28.13      5.33   -38.83   -17.79 1.00     4994     6659
zi_Intercept          -0.85     35.52   -56.43    87.80 1.00     5066     1712
logdistancesP1         1.70      0.61     0.49     2.90 1.00     3135     4607
logpopulationP1       -1.08      0.38    -1.84    -0.36 1.00     3192     4766
loggdpP1               0.71      0.49    -0.26     1.65 1.00     3436     5134
logexportP1            1.73      0.50     0.76     2.70 1.00     3171     4536
logimportP1           -0.10      0.27    -0.62     0.44 1.00     4530     5365
zi_logdistancesP1      0.45      4.62   -14.48     5.50 1.00     2312      910
zi_logpopulationP1     2.98      3.05     0.09    11.74 1.00     1553     1082
zi_loggdpP1            2.19      4.44    -1.33    16.33 1.00     1757     1028
zi_logexportP1        -3.12      3.45   -13.50    -0.21 1.00     1511     1020
zi_logimportP1        -1.27      0.85    -3.53    -0.23 1.00     2247     1380

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.20      0.03     0.15     0.27 1.00     3606     5724

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Model 2

Code
pp_check(m2, type = "bars", ndraws = 20)
pp_check(m2, type = "bars", ndraws = 20) +
  xlim(-1,20)

(a) Posterior predictive check.

(b) Posterior predictive check (detail).

Figure 13: Model 2 diagnostics.

Code
summary(m2)
 Family: zero_inflated_negbinomial 
  Links: mu = log; shape = identity; zi = logit 
Formula: mx ~ log(distances + 1) 
         zi ~ log(distances + 1)
   Data: df_cc2 (Number of observations: 188) 
  Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
         total post-warmup draws = 12000

Population-Level Effects: 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept            12.56      1.87     9.17    16.58 1.00     8570     6588
zi_Intercept        -12.84      3.24   -20.00    -7.43 1.00     8418     5379
logdistancesP1       -0.54      0.23    -1.02    -0.11 1.00     7882     6572
zi_logdistancesP1     1.44      0.36     0.81     2.21 1.00     9023     5858

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.10      0.02     0.06     0.14 1.00     5941     5956

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Code
m2_plot <- 
  plot(
    conditional_effects(m2,
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "mu"),
    points = TRUE,
    plot   = FALSE) [[1]] +
  ylab("Domain count") +
  scale_y_continuous(labels = scales::label_number_si()) +
  xlab("Geographic distance (km)") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))
m2_plot_zi <- 
  plot(
    conditional_effects(m2,
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "zi"),
    plot   = FALSE) [[1]] +
  ylim(0,1) +
  ylab("Zero-inflation probability") +
  xlab("Geographic distance (km)") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))
m2_plot_zi
m2_plot

(a) Geographic distance (zero-inflation).

(b) Geographic distance (domains’ count).

Figure 14: Associations between geographic distance and domains with MX records.

Model m2 estimated that geographic distance associates positively with domains’ zero-inflation and negatively with the count of domains using MX records.

Model 3

Code
pp_check(m3, type = "bars", ndraws = 20)
pp_check(m3, type = "bars", ndraws = 20) +
  xlim(-1,20)

(a) Posterior predictive check.

(b) Posterior predictive check (detail).

Figure 15: Model 3 diagnostics.

Code
summary(m3)
 Family: zero_inflated_negbinomial 
  Links: mu = log; shape = identity; zi = logit 
Formula: mx ~ log(population + 1) 
         zi ~ log(population + 1)
   Data: df_cc2 (Number of observations: 188) 
  Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
         total post-warmup draws = 12000

Population-Level Effects: 
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept              1.54      3.44    -4.98     8.28 1.00     7026     7316
zi_Intercept          10.65      5.84     0.03    23.00 1.00     5131     4995
logpopulationP1        0.40      0.21    -0.01     0.81 1.00     7312     7527
zi_logpopulationP1    -0.87      0.42    -1.80    -0.16 1.00     4676     4541

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.06      0.01     0.04     0.08 1.00     5631     4483

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Code
m3_plot <- 
  plot(
    conditional_effects(m3,
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "mu"),
    points = TRUE,
    plot   = FALSE) [[1]] +
  ylab("Domain count") +
  scale_y_continuous(labels = scales::label_number_si()) +
  xlab("Population size") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))
m3_plot_zi <- 
  plot(
    conditional_effects(m3,
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "zi"),
    plot   = FALSE) [[1]] +
  ylim(0,1) +
  ylab("Zero-inflation probability") +
  xlab("Population size") +
  scale_x_continuous(labels = scales::label_number_si()) +
  theme(text = element_text(size = 20))
m3_plot_zi
m3_plot

(a) Population size (zero-inflation).

(b) Population size (domains’ count).

Figure 16: Associations between population size and domains with MX records.

Model m3 estimated that population size associates negatively with domains’ zero-inflation and positively with the count of domains using MX records (albeit the 95% credible interval contained zero, suggesting uncertainty over the existence of such effect).

Model 4

Code
pp_check(m4, type = "bars", ndraws = 20)
pp_check(m4, type = "bars", ndraws = 20) +
  xlim(-1,20)

(a) Posterior predictive check.

(b) Posterior predictive check (detail).

Figure 17: Model 4 diagnostics.

Model 5

Code
pp_check(m5, type = "bars", ndraws = 20)
pp_check(m5, type = "bars", ndraws = 20) +
  xlim(-1,20)

(a) Posterior predictive check.

(b) Posterior predictive check (detail).

Figure 18: Model 5 diagnostics.

Model 6

Code
pp_check(m6, type = "bars", ndraws = 20)
pp_check(m6, type = "bars", ndraws = 20) +
  xlim(-1,20)

(a) Posterior predictive check.

(b) Posterior predictive check (detail).

Figure 19: Model 6 diagnostics.

Model 7

Code
m7 <-
  brm(
    mx ~ 
      distance_log +
      population_log +
      gdp_log +
      export_log +
      import_log +
      (1|cc_iso),
    family  = negbinomial(),
    data    = df_asn2,
    prior   = c(prior(normal(0,2), class = Intercept),
                prior(normal(0,2), class = b)),
    chains  = 4,
    iter    = 4000,
    warmup  = 1000,
    cores   = parallel::detectCores(),
    backend = "cmdstanr",
    threads = threading(2))
Code
pp_check(m7, type = "bars", ndraws = 20)
pp_check(m7, type = "bars", ndraws = 20) +
  xlim(-1,20)

(a) Posterior predictive check.

(b) Posterior predictive check (detail).

Figure 20: Model 7 diagnostics.

Code
check_collinearity(m7)
# Check for Multicollinearity

Low Correlation

         Term  VIF     VIF 95% CI Increased SE Tolerance Tolerance 95% CI
 distance_log 4.79 [ 4.51,  5.09]         2.19      0.21     [0.20, 0.22]
      gdp_log 3.84 [ 3.62,  4.07]         1.96      0.26     [0.25, 0.28]

Moderate Correlation

           Term  VIF     VIF 95% CI Increased SE Tolerance Tolerance 95% CI
 population_log 7.20 [ 6.76,  7.66]         2.68      0.14     [0.13, 0.15]
     import_log 6.03 [ 5.67,  6.41]         2.46      0.17     [0.16, 0.18]

High Correlation

       Term   VIF     VIF 95% CI Increased SE Tolerance Tolerance 95% CI
 export_log 14.07 [13.19, 15.02]         3.75      0.07     [0.07, 0.08]
Code
summary(m7)
 Family: negbinomial 
  Links: mu = log; shape = identity 
Formula: mx ~ distance_log + population_log + gdp_log + export_log + import_log + (1 | cc_iso) 
   Data: df_asn2 (Number of observations: 2970) 
  Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
         total post-warmup draws = 12000

Group-Level Effects: 
~cc_iso (Number of levels: 187) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.29      0.15     1.03     1.63 1.00     3151     5531

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept        -20.88      2.94   -26.65   -15.19 1.00     3238     6104
distance_log       1.48      0.33     0.84     2.12 1.00     2122     4246
population_log    -0.90      0.24    -1.37    -0.43 1.00     2155     3989
gdp_log            0.03      0.28    -0.53     0.58 1.00     1930     3714
export_log         1.14      0.26     0.64     1.65 1.00     2834     4828
import_log         0.15      0.14    -0.12     0.42 1.00     5095     7073

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.27      0.01     0.26     0.28 1.00    19643     9083

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Model 8

Code
pp_check(m8, type = "bars", ndraws = 20)
pp_check(m8, type = "bars", ndraws = 20) +
  xlim(-1,20)

(a) Posterior predictive check.

(b) Posterior predictive check (detail).

Figure 21: Model 8 diagnostics.

Code
check_collinearity(m8)
# Check for Multicollinearity

Low Correlation

                Term  VIF   VIF 95% CI Increased SE Tolerance Tolerance 95% CI
  log(distances + 1) 1.10 [1.07, 1.15]         1.05      0.91     [0.87, 0.94]
 log(population + 1) 1.12 [1.08, 1.17]         1.06      0.89     [0.85, 0.92]
        log(gdp + 1) 1.08 [1.05, 1.13]         1.04      0.93     [0.88, 0.95]

Model 9

Code
pp_check(m9, type = "bars", ndraws = 20)
pp_check(m9, type = "bars", ndraws = 20) +
  xlim(-1,20)

(a) Posterior predictive check.

(b) Posterior predictive check (detail).

Figure 22: Model 9 diagnostics.

Model 10

Code
pp_check(m10, type = "bars", ndraws = 20)
pp_check(m10, type = "bars", ndraws = 20) +
  xlim(-1,20)

(a) Posterior predictive check.

(b) Posterior predictive check (detail).

Figure 23: Model 10 diagnostics.

Model 11

Code
pp_check(m11, type = "ecdf_overlay", ndraws = 20)

Figure 24: Model 11 diagnostics: Poterior predictive check.

Domains using MX records across countries

Code
df_cc <- df_cc |>
  mutate(perc_nocz = mx /
                     (sum(mx) - 1056497)
                     * 100) |>
  mutate(perc_nocz =
           if_else(perc_nocz > 100, NA, perc_nocz))
df_cc |>
  select(country_flag, cc, mx, perc, perc_nocz) |>
  arrange(desc(mx)) |>
  rename("Flag"                        = "country_flag",
         "Country"                     = "cc",
         "Domains' count"              = "mx",
         "Domains' percentage"         = "perc",
         "Domains' percentage (no CZ)" = "perc_nocz") |>
  kable("html",
        digits = 6,
        format.args = list(big.mark = " ")) |>
  kable_styling(
    bootstrap_options = c("striped",
                          "hover",
                          "condensed",
                          "responsive"),
    full_width = TRUE,
    position   = "left")
Table 4: Domains with MX records (complete country list).
Flag Country Domains' count Domains' percentage Domains' percentage (no CZ)
🇨🇿 CZ 1 056 497 62.959617 NA
🇸🇰 SK 112 651 6.713189 18.123972
🇺🇸 US 95 224 5.674665 15.320211
🇳🇱 NL 71 401 4.254986 11.487424
🇩🇪 DE 49 552 2.952943 7.972225
🇫🇮 FI 47 021 2.802113 7.565022
🇵🇱 PL 43 726 2.605755 7.034903
NA Unknown 43 220 2.575601 6.953494
🇸🇬 SG 29 561 1.761623 4.755952
🇮🇪 IE 28 049 1.671519 4.512692
🇹🇼 TW 27 327 1.628493 4.396533
🇦🇹 AT 21 870 1.303295 3.518578
🇫🇷 FR 15 342 0.914273 2.468313
🇭🇰 HK 7 887 0.470008 1.268908
🇬🇧 GB 5 661 0.337355 0.910776
🇨🇭 CH 2 135 0.127231 0.343492
🇨🇳 CN 2 049 0.122106 0.329655
🇨🇦 CA 2 021 0.120437 0.325151
🇮🇹 IT 1 835 0.109353 0.295226
🇸🇨 SC 1 758 0.104764 0.282838
🇭🇺 HU 1 516 0.090343 0.243903
🇦🇺 AU 1 223 0.072882 0.196764
🇸🇪 SE 1 204 0.071750 0.193707
🇱🇺 LU 1 143 0.068115 0.183893
🇩🇰 DK 1 077 0.064181 0.173274
🇷🇺 RU 1 025 0.061083 0.164908
🇯🇵 JP 683 0.040702 0.109885
🇰🇷 KR 621 0.037007 0.099910
🇧🇪 BE 554 0.033014 0.089131
🇲🇾 MY 495 0.029498 0.079639
🇺🇦 UA 458 0.027294 0.073686
🇪🇸 ES 368 0.021930 0.059206
🇦🇪 AE 364 0.021692 0.058563
🇿🇦 ZA 269 0.016030 0.043278
🇹🇷 TR 266 0.015852 0.042796
🇱🇹 LT 242 0.014421 0.038934
🇮🇳 IN 218 0.012991 0.035073
🇪🇪 EE 196 0.011680 0.031534
🇸🇮 SI 185 0.011025 0.029764
🇧🇬 BG 171 0.010190 0.027512
🇷🇴 RO 155 0.009237 0.024937
🇳🇴 NO 122 0.007270 0.019628
🇻🇬 VG 97 0.005781 0.015606
🇲🇺 MU 92 0.005483 0.014802
🇰🇿 KZ 87 0.005185 0.013997
🇨🇾 CY 75 0.004469 0.012066
🇱🇻 LV 61 0.003635 0.009814
🇭🇷 HR 24 0.001430 0.003861
🇵🇹 PT 23 0.001371 0.003700
🇮🇱 IL 22 0.001311 0.003539
🇬🇪 GE 21 0.001251 0.003379
🇮🇲 IM 21 0.001251 0.003379
🇻🇳 VN 21 0.001251 0.003379
🇮🇸 IS 19 0.001132 0.003057
🇧🇾 BY 17 0.001013 0.002735
🇷🇸 RS 14 0.000834 0.002252
🇱🇮 LI 13 0.000775 0.002092
🇬🇷 GR 9 0.000536 0.001448
🇺🇿 UZ 8 0.000477 0.001287
🇺🇬 UG 8 0.000477 0.001287
🇳🇿 NZ 7 0.000417 0.001126
🇸🇦 SA 7 0.000417 0.001126
🇲🇩 MD 6 0.000358 0.000965
🇱🇧 LB 5 0.000298 0.000804
🇧🇿 BZ 5 0.000298 0.000804
🇮🇩 ID 4 0.000238 0.000644
🇬🇮 GI 4 0.000238 0.000644
🇵🇭 PH 4 0.000238 0.000644
🇦🇲 AM 3 0.000179 0.000483
🇩🇲 DM 3 0.000179 0.000483
🇧🇷 BR 3 0.000179 0.000483
🇸🇲 SM 2 0.000119 0.000322
🇲🇹 MT 2 0.000119 0.000322
🇵🇦 PA 2 0.000119 0.000322
🇮🇶 IQ 2 0.000119 0.000322
🇧🇦 BA 2 0.000119 0.000322
🇨🇷 CR 2 0.000119 0.000322
🇲🇰 MK 2 0.000119 0.000322
🇧🇩 BD 2 0.000119 0.000322
🇹🇭 TH 2 0.000119 0.000322
🇲🇳 MN 1 0.000060 0.000161
🇲🇽 MX 1 0.000060 0.000161
🇦🇱 AL 1 0.000060 0.000161
🇵🇰 PK 1 0.000060 0.000161
🇲🇪 ME 1 0.000060 0.000161
🇹🇳 TN 1 0.000060 0.000161
🇲🇨 MC 1 0.000060 0.000161
🇨🇱 CL 1 0.000060 0.000161
🇮🇷 IR 1 0.000060 0.000161
🇦🇷 AR 1 0.000060 0.000161
🇰🇼 KW 1 0.000060 0.000161
🇻🇨 VC 1 0.000060 0.000161

Domains using MX records across autonomous systems

Code
df_asn <- df_asn |>
  mutate(perc      = mx / sum(mx) * 100,
         perc_nocz = mx / (sum(mx) - 1222588) * 100,
         perc_cz   = mx / (sum(mx) - 364809) * 100) |>
  mutate(perc_nocz =
           if_else(cc == "CZ", NA, perc_nocz),
         perc_cz   =
           if_else(cc != "CZ", NA, perc_cz)) |>
  mutate(perc      = round(perc, 2),
         perc_nocz = round(perc_nocz, 2),
         perc_cz   = round(perc_cz, 2)) |>
  arrange(desc(mx))
df_asn |>
  mutate(holder       = replace_na(holder, "Unknown"),
         country_flag = replace(country_flag,
                                holder == "Unknown",
                                " ")) |>
  select(country_flag,
         cc,
         asn,
         holder,
         mx,
         perc,
         perc_nocz,
         perc_cz) |>
  arrange(desc(mx)) |>
  filter(mx > 500) |>
  rename("Flag"                 = "country_flag",
         "Country"              = "cc",
         "ASN"                  = "asn",
         "AS & holder"          = "holder",
         "Domains"              = "mx",
         "Percentage"           = "perc",
         "Percentage (no CZ)"   = "perc_nocz",
         "Percentage (CZ only)" = "perc_cz") |>
  kable("html",
        digits = 2,
        format.args = list(big.mark = " ")) |>
  kable_styling(
    bootstrap_options = c("striped",
                          "hover",
                          "condensed",
                          "responsive"),
    full_width = TRUE,
    position   = "left")
Table 5: Autonomous systems with >500 domains.
Flag Country ASN AS & holder Domains Percentage Percentage (no CZ) Percentage (CZ only)
🇨🇿 CZ 197019 WEDOS WEDOS Internet, a.s. 215 911 13.29 NA 17.14
🇨🇿 CZ 15685 CASABLANCA-AS CASABLANCA INT a.s. 145 462 8.95 NA 11.55
🇨🇿 CZ 24806 INTERNET-CZ INTERNET CZ, a.s. 140 435 8.64 NA 11.15
🇨🇿 CZ 43541 VSHOSTING VSHosting s.r.o. 117 684 7.24 NA 9.34
🇨🇿 CZ 25234 GLOBE-AS ACTIVE 24, s.r.o. 97 072 5.98 NA 7.71
🇨🇿 CZ 29134 IGNUM-AS Webglobe, s.r.o. 91 286 5.62 NA 7.25
🇨🇿 CZ 24971 MASTER-AS Master Internet s.r.o. 70 591 4.35 NA 5.60
🇸🇰 SK 48689 WEBGLOBE-SK-AS Webglobe, a.s. 58 605 3.61 14.58 NA
🇺🇸 US 15169 GOOGLE 43 248 2.66 10.76 NA
🇺🇸 US 8075 MICROSOFT-CORP-MSN-AS-BLOCK 39 911 2.46 9.93 NA
🇸🇰 SK 51013 WEBSUPPORT-SRO-SK-AS WebSupport s.r.o. 37 964 2.34 9.44 NA
🇨🇿 CZ 43037 SEZNAM-CZ Seznam.cz, a.s. 36 811 2.27 NA 2.92
NA NA Unknown 36 669 2.26 NA NA
🇨🇿 CZ 34222 ZONER-AS ZONER a.s. 34 736 2.14 NA 2.76
🇨🇿 CZ 39392 SuperNetwork SH.cz s.r.o. 33 013 2.03 NA 2.62
🇨🇿 CZ 39790 WEB4U Web4U s.r.o. 28 826 1.77 NA 2.29
🇩🇪 DE 24940 HETZNER-AS Hetzner Online GmbH 26 846 1.65 6.68 NA
🇨🇿 CZ 29208 QUANTCOM-AS Quantcom, a.s. 26 143 1.61 NA 2.08
🇺🇸 US 14061 DIGITALOCEAN-ASN 20 620 1.27 5.13 NA
🇨🇿 CZ 60592 GRANSY Gransy s.r.o. 16 904 1.04 NA 1.34
🇨🇿 CZ 59925 GIGASERVER Seonet Multimedia s.r.o. 16 527 1.02 NA 1.31
🇺🇸 US 13335 CLOUDFLARENET 15 257 0.94 3.80 NA
🇨🇿 CZ 16019 VODAFONE-CZ-AS Vodafone Czech Republic a.s. 14 996 0.92 NA 1.19
🇨🇿 CZ 35592 COOLHOUSING-AS Coolhousing s.r.o. 13 679 0.84 NA 1.09
🇺🇸 US 16509 AMAZON-02 10 961 0.67 2.73 NA
🇫🇷 FR 16276 OVH OVH SAS 10 483 0.65 2.61 NA
🇨🇿 CZ 13036 TMOBILE-CZ T-Mobile Czech Republic a.s. 9 297 0.57 NA 0.74
🇨🇿 CZ 25248 BlueTone-AS RADIOKOMUNIKACE a.s. 8 715 0.54 NA 0.69
🇨🇿 CZ 5610 O2-CZECH-REPUBLIC O2 Czech Republic, a.s. 7 718 0.48 NA 0.61
🇺🇸 US 396982 GOOGLE-CLOUD-PLATFORM 7 526 0.46 1.87 NA
🇸🇰 SK 29405 VNET-AS VNET a.s. 5 470 0.34 1.36 NA
🇸🇬 SG 64050 BCPL-SG BGPNET Global ASN 5 076 0.31 1.26 NA
🇨🇿 CZ 42422 SECURITYNET-AS SecurityNet.cz s.r.o. 5 066 0.31 NA 0.40
🇨🇿 CZ 197013 SPRINTEL-SRO Sprintel s.r.o. 5 008 0.31 NA 0.40
🇨🇿 CZ 24641 FASTER-AS FASTER CZ spol. s r.o. 4 578 0.28 NA 0.36
🇨🇿 CZ 51210 KRAXNET-AS KRAXNET s.r.o. 4 128 0.25 NA 0.33
🇮🇱 IL 58182 wix_com Wix.com Ltd. 4 050 0.25 1.01 NA
🇨🇿 CZ 47317 WEB4CE Web4ce, s.r.o. 3 999 0.25 NA 0.32
🇨🇿 CZ 44984 Fortion Fortion Networks, s.r.o. 3 124 0.19 NA 0.25
🇨🇿 CZ 30764 PODA-AS PODA a.s. 3 108 0.19 NA 0.25
🇸🇬 SG 63949 AKAMAI-LINODE-AP Akamai Connected Cloud 2 551 0.16 0.63 NA
🇩🇪 DE 51167 CONTABO Contabo GmbH 2 469 0.15 0.61 NA
🇨🇿 CZ 21430 WIA-AS WIA spol. s.r.o. 2 006 0.12 NA 0.16
🇩🇪 DE 15598 IPX-AS15598 NorthC Deutschland GmbH 2 004 0.12 0.50 NA
🇨🇿 CZ 12570 ITSELF Nej.cz s.r.o. 1 926 0.12 NA 0.15
🇨🇿 CZ 35236 AS35236 2 connect a.s. 1 792 0.11 NA 0.14
🇨🇿 CZ 196653 ASBESTNETCZ BEST-NET s.r.o. 1 743 0.11 NA 0.14
🇫🇷 FR 29169 GANDI-AS GANDI SAS 1 719 0.11 0.43 NA
🇸🇰 SK 5578 AS-BENESTRA SWAN, a.s. 1 714 0.11 0.43 NA
🇨🇿 CZ 2852 CESNET2 CESNET z.s.p.o. 1 626 0.10 NA 0.13
🇨🇿 CZ 25512 CDT-AS CD-Telematika a.s. 1 550 0.10 NA 0.12
🇺🇸 US 2639 ZOHO-AS 1 456 0.09 0.36 NA
🇨🇿 CZ 198167 AppToCloud Apptc.me s.r.o. 1 453 0.09 NA 0.12
🇩🇪 DE 61969 TEAMINTERNET-AS Team Internet AG 1 326 0.08 0.33 NA
🇨🇿 CZ 206566 SAVANACZ Webglobe, s.r.o. 1 265 0.08 NA 0.10
🇨🇿 CZ 43542 OptoNet-AS OptoNet Communication, spol. s.r.o. 1 249 0.08 NA 0.10
🇨🇿 CZ 56566 SATT-AS SATT a.s. 1 243 0.08 NA 0.10
🇨🇿 CZ 8251 NFX_ZSPO FreeTel, s.r.o. 1 232 0.08 NA 0.10
🇬🇧 GB 48254 TWENTYI 20i Limited 1 203 0.07 0.30 NA
🇨🇿 CZ 52092 ALFSERVIS-AS Alf servis, s.r.o. 1 198 0.07 NA 0.10
🇨🇿 CZ 200828 THOSTING-AS TERMS a.s. 1 194 0.07 NA 0.09
🇨🇿 CZ 202682 g2server Geetoo CZ s.r.o. 1 145 0.07 NA 0.09
🇩🇪 DE 21499 GODADDY-SXB Host Europe GmbH 1 128 0.07 0.28 NA
🇨🇿 CZ 49985 IP4ISP IP4ISP z.s.p.o 1 089 0.07 NA 0.09
🇨🇿 CZ 47232 ISPAlliance ISP Alliance a.s. 1 072 0.07 NA 0.09
🇨🇿 CZ 28725 CETIN-AS CETIN a.s. 1 070 0.07 NA 0.08
🇺🇸 US 14618 AMAZON-AES 1 057 0.07 0.26 NA
🇨🇿 CZ 44770 SAVVY-AS Savvy s.r.o. 1 038 0.06 NA 0.08
🇨🇿 CZ 44489 Starnet STARNET, s.r.o. 1 023 0.06 NA 0.08
🇨🇿 CZ 42306 EDERA_Group Edera Group a.s. 995 0.06 NA 0.08
🇺🇸 US 54113 FASTLY 989 0.06 0.25 NA
🇨🇿 CZ 20723 MGI AVONET, s.r.o. 967 0.06 NA 0.08
🇭🇰 HK 140227 HKCICL-AS-AP Hong Kong Communications International Co., Limited 963 0.06 0.24 NA
🇫🇷 FR 200185 XANDMAIL-ASN Aruba SAS 958 0.06 0.24 NA
🇺🇸 US 714 APPLE-ENGINEERING 948 0.06 0.24 NA
🇨🇿 CZ 206548 TLAP-ZCOM ZCOM.cz s.r.o 894 0.06 NA 0.07
🇭🇰 HK 132325 LEMON-AS-AP LEMON TELECOMMUNICATIONS LIMITED 889 0.05 0.22 NA
🇨🇿 CZ 50695 VLP-AS VLTAVA LABE MEDIA a.s. 881 0.05 NA 0.07
🇭🇰 HK 136950 HIITL-AS-AP Hong Kong FireLine Network LTD 871 0.05 0.22 NA
🇨🇿 CZ 57707 GREENDATA Greendata s.r.o. 852 0.05 NA 0.07
🇺🇸 US 40065 CNSERVERS 845 0.05 0.21 NA
🇸🇰 SK 42005 LIGHTSTORM-COMMUNICATIONS-SRO-SK-AS LightStorm Services s.r.o. 819 0.05 0.20 NA
🇺🇸 US 46606 UNIFIEDLAYER-AS-1 813 0.05 0.20 NA
🇩🇰 DK 51468 ONECOM One.com A/S 811 0.05 0.20 NA
🇨🇿 CZ 15935 HA-VEL-LOCAL-AS ha-vel internet s.r.o. 806 0.05 NA 0.06
🇨🇭 CH 41913 COMPUTERLINE Computerline GmbH 791 0.05 0.20 NA
🇨🇿 CZ 16246 AS16246 Nej.cz s.r.o. 771 0.05 NA 0.06
🇵🇱 PL 12824 HOMEPL-AS home.pl S.A. 770 0.05 0.19 NA
🇲🇾 MY 55720 GIGABIT-MY Gigabit Hosting Sdn Bhd 764 0.05 0.19 NA
🇭🇰 HK 59371 DNC-AS Dimension Network & Communication Limited 742 0.05 0.18 NA
🇱🇺 LU 24611 DCLUX-AS Datacenter Luxembourg S.A. 722 0.04 0.18 NA
🇨🇿 CZ 41453 TRESTEL-CZ-ASN SITEL spol. s r.o. 714 0.04 NA 0.06
🇨🇿 CZ 207167 amccomp-master-brno amccomp s.r.o. 704 0.04 NA 0.06
🇨🇭 CH 62371 PROTON Proton AG 700 0.04 0.17 NA
🇭🇰 HK 136970 YISUCLOUDLTD-AS-AP YISU CLOUD LTD 673 0.04 0.17 NA
🇨🇿 CZ 43070 JAW-AS JAW.cz s.r.o. 665 0.04 NA 0.05
🇦🇺 AU 133618 TRELLIAN-AS-AP Trellian Pty. Limited 626 0.04 0.16 NA
🇺🇸 US 174 COGENT-174 616 0.04 0.15 NA
🇨🇿 CZ 12767 PRAGONET-AS T-Mobile Czech Republic a.s. 605 0.04 NA 0.05
🇨🇿 CZ 25424 INEXT-CZ InterneXt 2000, s.r.o. 605 0.04 NA 0.05
🇫🇷 FR 24935 ATE-AS AVENIR TELEMATIQUE SAS 602 0.04 0.15 NA
🇺🇸 US 19574 CSC 599 0.04 0.15 NA
🇨🇿 CZ 31246 NETBOX-AS Nej.cz s.r.o. 592 0.04 NA 0.05
🇬🇧 GB 34922 NETNAMES CSC Administrative Services Limited 586 0.04 0.15 NA
🇨🇿 CZ 49767 INTERNETPB-AS Nej.cz s.r.o. 584 0.04 NA 0.05
🇨🇿 CZ 15614 Dragon Dragon Internet a.s. 560 0.03 NA 0.04
🇨🇿 CZ 39817 OVANET OvaNet, a.s. 558 0.03 NA 0.04
🇩🇪 DE 3320 DTAG Deutsche Telekom AG 551 0.03 0.14 NA
🇨🇿 CZ 43614 ECONOMIA-CZ Economia a.s. 543 0.03 NA 0.04
🇭🇰 HK 136778 AIJIASU-AS-AP HONGKONG AI JIA SU NETWORK CO.,LIMITED 523 0.03 0.13 NA
🇺🇸 US 19871 NETWORK-SOLUTIONS-HOSTING 522 0.03 0.13 NA

Countries with missing values

Code
df_cc_NAs |>
  arrange(cc_iso) |>
  mutate(name = countrycode(
    cc_iso, origin = "iso3c",
    destination    = "country.name")) |>
  select(mx,
         cc_iso,
         name,
         distances,
         population,
         gdp,
         export,
         import) |>
  rename("Domains"        = "mx",
         "cc"             = "cc_iso",
         "Country"        = "name",
         "Distance"       = "distances",
         "Population"     = "population",
         "GDP per capita" = "gdp",
         "CZ export"      = "export",
         "Foreign import" = "import") |>
  kable("html",
        digits = 2,
        format.args = list(big.mark = " ")) |>
  kable_styling(
    bootstrap_options = c("striped",
                          "hover",
                          "condensed",
                          "responsive"),
    full_width = TRUE,
    position   = "left")
Table 6: Countries with missing values.
Domains cc Country Distance Population GDP per capita CZ export Foreign import
0 ABW Aruba NA 106 537 29 342.10 193 538 5 515
0 AFE NA NA 702 976 832 1 549.77 0 0
0 AFW NA NA 478 185 907 1 757.03 0 0
0 AIA Anguilla NA NA NA 2 554 84 322
0 AND Andorra 1 330.73 79 034 42 137.33 1 893 288 802 964
0 ARB NA NA 456 520 777 6 271.32 0 0
0 ASM American Samoa NA 45 035 15 743.31 0 6 298
0 ATA Antarctica 14 459.66 NA NA 0 0
0 ATF French Southern Territories NA NA NA 161 063 210 451
0 ATG Antigua & Barbuda NA 93 219 15 781.40 159 262 392 345
0 BLM St. Barthélemy NA NA NA 52 121 70 599
0 CCK Cocos (Keeling) Islands NA NA NA 0 24 608
0 CEB NA NA 101 430 997 18 751.03 0 0
0 CHI NA NA 172 683 NA 0 0
0 COK Cook Islands 16 806.80 NA NA 0 89 407
0 CPV Cape Verde NA 587 925 3 293.23 314 433 6 961
0 CSS NA NA 7 481 877 10 063.69 0 0
0 CUW Curaçao NA 152 369 17 717.60 335 455 139
0 CXR Christmas Island NA NA NA 0 29 606
0 CYM Cayman Islands NA 68 136 86 568.77 186 368 53 520
0 CZE Czechia NA 10 505 772 26 821.25 0 0
3 DMA Dominica NA 72 412 7 653.17 299 631 541 919
0 EAP NA NA 2 123 673 456 9 771.65 0 0
0 EAR NA NA 3 411 889 059 3 703.90 0 0
0 EAS NA NA 2 370 204 347 13 041.78 0 0
0 ECA NA NA 401 575 218 8 758.94 0 0
0 ECS NA NA 923 777 214 27 152.47 0 0
0 EMU NA NA 343 067 846 42 450.15 0 0
0 EUU NA NA 447 199 800 38 411.06 0 0
0 FCS NA NA 1 001 462 452 1 796.40 0 0
0 FLK Falkland Islands 13 274.65 NA NA 0 3 687 603
0 FSM Micronesia (Federated States of) NA 113 131 3 571.34 373 213 1 601
4 GIB Gibraltar NA 32 669 NA 617 982 641 292
0 GRD Grenada NA 124 610 9 010.57 51 303 342 465
0 HIC NA NA 1 240 629 858 48 225.24 0 0
0 HPC NA NA 861 156 745 1 035.70 0 0
0 IBD NA NA 4 895 364 297 7 235.96 0 0
0 IBT NA NA 6 695 397 735 5 684.93 0 0
0 IDA NA NA 1 800 033 438 1 465.40 0 0
0 IDB NA NA 595 016 031 1 842.77 0 0
0 IDX NA NA 1 205 017 407 1 281.63 0 0
21 IMN Isle of Man NA 84 263 NA 0 0
0 INX NA NA NA NA 0 0
0 IOT British Indian Ocean Territory NA NA NA 0 63 856
0 KNA St. Kitts & Nevis NA 47 606 18 082.61 2 014 1 232 314
0 LAC NA NA 593 308 001 7 728.03 0 0
0 LCA St. Lucia NA 179 651 9 414.23 19 514 73 362
0 LCN NA NA 654 981 699 8 327.61 0 0
0 LDC NA NA 1 099 568 566 1 164.92 0 0
0 LIC NA NA 709 088 502 793.95 0 0
13 LIE Liechtenstein NA 39 039 NA 0 0
0 LMC NA NA 3 398 187 527 2 572.71 0 0
0 LMY NA NA 6 619 578 961 5 494.40 0 0
0 LTE NA NA 2 321 862 159 11 465.86 0 0
0 MAC Macao SAR China NA 686 607 43 873.59 2 410 625 6 102 867
0 MAF Saint Martin (French part) NA 31 948 NA 295 045 1 425
1 MCO Monaco NA 36 686 234 315.46 0 0
0 MEA NA NA 486 167 363 7 569.09 0 0
0 MIC NA NA 5 901 323 889 6 074.08 0 0
0 MNA NA NA 418 047 201 3 573.17 0 0
0 MNP Northern Mariana Islands 11 332.89 49 481 NA 0 450
0 MSR Montserrat NA NA NA 1 002 8 026
0 NAC NA NA 370 203 720 68 369.66 0 0
0 NFK Norfolk Island NA NA NA 0 6 165
0 NIU Niue NA NA NA 0 229 788
0 OED NA NA 1 372 736 513 42 446.86 0 0
0 OSS NA NA 32 708 003 13 878.81 0 0
0 PCN Pitcairn Islands 15 795.68 NA NA 0 4 579 011
0 PRE NA NA 1 011 043 183 1 418.59 0 0
0 PSS NA NA 2 602 173 3 643.85 0 0
0 PST NA NA 1 116 796 245 49 223.60 0 0
0 PYF French Polynesia NA 304 032 19 914.60 4 106 106 29 908
0 SAS NA NA 1 901 911 604 2 149.82 0 0
0 SHN St. Helena 7 570.16 NA NA 1 462 045 11 852
2 SMR San Marino NA 33 745 NA 3 290 768 2 546 717
0 SPM St. Pierre & Miquelon 5 117.12 NA NA 0 50 748
0 SSA NA NA 1 181 063 481 1 632.09 0 0
0 SSD South Sudan NA 10 748 272 NA 179 826 168 629
0 SSF NA NA 1 181 162 739 1 633.18 0 0
0 SST NA NA 42 792 053 12 588.93 0 0
0 SXM Sint Maarten NA 42 846 NA 0 0
0 SYR Syria 2 499.45 21 324 367 NA 3 311 362 237 638
0 TEA NA NA 2 097 669 023 9 880.10 0 0
0 TEC NA NA 462 341 222 9 842.05 0 0
0 TKL Tokelau NA NA NA 0 36 733
0 TLA NA NA 639 188 695 8 108.92 0 0
0 TMN NA NA 413 124 452 3 572.07 0 0
0 TSA NA NA 1 901 911 604 2 149.82 0 0
0 TSS NA NA 1 181 162 739 1 633.18 0 0
0 UMC NA NA 2 503 136 362 10 828.05 0 0
1 VCT St. Vincent & Grenadines NA 104 332 8 666.39 38 485 50 047
97 VGB British Virgin Islands NA 31 122 NA 2 228 954 673 373
0 VIR U.S. Virgin Islands NA 105 870 NA 0 0
0 WLD NA NA 7 888 408 686 12 236.62 0 0
0 WLF Wallis & Futuna NA NA NA 0 4 386
0 XKX NA NA 1 786 038 5 269.78 0 0
Other ADAM reports » Další reporty »