Exploring the distribution of mail services in the CZ domain

ADAM report 1/2023

Authors

Dan Řezníček

Ladislav Lhotka

Published

July 27, 2023

Code
library(dplyr)
library(ggplot2)
library(plotly)
library(lubridate)
library(tidyr)
library(knitr)
library(kableExtra)
library(formattable)
library(jsonlite)
library(forcats)
library(colorspace)
library(readr)
library(brms)
library(bayesplot)
library(tidybayes)
library(future)
library(countrycode)
library(rnaturalearth)
library(sf)
library(lwgeom)
library(maps)
library(units)
library(s2)
library(corrplot)
library(performance)
library(ggthemes)
library(patchwork)
library(units)
library(stringr)
library(stringi)
library(cmdstanr)
library(scales)
theme_set(theme_minimal())
Code
root_endpoint <- "https://stats.nic.cz"

api_endpoint <- function(endpoint, ...) {
  pars <- list(...)
  if (length(pars) == 0) {
    return(paste0(root_endpoint, endpoint))
  }
  pstr <- mapply(function(x, y) paste(x, URLencode(toString(y)), sep = "="),
                 names(pars), pars)
  paste0(root_endpoint, endpoint, "?", paste(pstr, collapse = "&"))
}

Introduction

During the early years of the Internet, all institutions operated their own mail servers. Their MX (mail exchange) resource records in the DNS thus pointed to domain names within the same domain. However, over the past 10 or 15 years, MX records have undergone a process of outsourcing to mail service providers ranging from local ISPs to large national or global technology companies such as Seznam.cz or Google’s GMail. On the other hand, many second-level domains still have MX records pointing to mail servers within the same domain (see Table 1). In this report, we explore MX records of all second-level domains under CZ with the aim of assessing the distribution and accumulation of mail services across countries and autonomous systems. We provide both exploratory descriptive statistics and also some modeling-based insights.

Code
# These values were taken from a full scan, 2023-07-06.
mx_table <- data.frame(
  "Domains with:" =
    c(" - at least one MX record with its own name.",
      " - at least one MX record without its own name.",
      " - all MX records with its own name."),
  Count  =
    c(132142,
      808737,
      116388),
  check.names = FALSE
  )
mx_table |>
  kable("html",
        format.args = list(big.mark = " ")) |>
  kable_styling(
    bootstrap_options = c("striped",
                          "hover",
                          "condensed",
                          "responsive"),
    full_width = TRUE,
    position   = "left")
Table 1: Outsourcing of MX records.
Domains with: Count
- at least one MX record with its own name. 132 142
- at least one MX record without its own name. 808 737
- all MX records with its own name. 116 388

We utilized two primary datasets collected by the CZ.NIC’s DNS crawler, aggregating domain counts by (1) countries and (2) autonomous systems from which the mail servers indicated in MX records are operated. In these datasets, domain names appearing in MX records were resolved to IP address(es) that were then assigned to the corresponding countries or autonomous systems via a geolocation database. Note that a second-level domain may have multiple mail servers operating from various countries and/or autonomous systems. If it is the case, such a domain contributes to the counts of multiple countries or autonomous systems.

The data used in this report are static in the sense that the datasets do not provide a dynamic view of CZ mail server trends (e.g., time series) but just a limited peek into their distributions at a certain date or time period:

  • For the countries, all data correspond to a single day of 2023-06-01.

  • For the autonomous systems, the data belong to an interval from 2023-04-20 to 2023-06-15.

Sections labeled as ▶ Code that are interspersed in the text below can be expanded to reveal the actual R code used for producing the statistics, graphs, and tables.

Country perspective

First, we explored geolocations of mail servers of all second-level domains under CZ, using data provided by DNS crawler endpoint /crawler_mail_cc, which aggregates the count of domains with MX records by countries.

Code
df_cc  <- fromJSON(api_endpoint("/crawler_mail_cc"))
country_flags <- read.csv("CSV/country_flags.csv", sep = ";")
Code
df_cc <- df_cc |>
  mutate(ts = as.Date(ts)) |>
  group_by(cc) |>
  filter(ts == "2023-06-01") |>
  ungroup() |>
  rename("mx" = "domains") |>
  mutate(perc = mx / sum(mx) *100) |>
  replace_na(list(cc = "Unknown"))
country_flags <- country_flags |>
  rename(cc = country_code)
df_cc <- left_join(df_cc, country_flags, by = "cc") |>
  replace_na(list(country_flags = "?"))

Distribution of domains with MX records

Table 2 describes the distribution of second-level domains that have at least one MX record. The first row shows frequencies for all countries; however, as the global distribution is hugely influenced by the CZ frequencies (the maximum value at 1 056 497, 62.96% of all such domains; see also Figure 1 below and Table 4 in the Appendix), the second row presents frequencies for all countries except for the Czech Republic, alleviating the bias stemming from the CZ dominance.

Code
a <- df_cc |>
  summarize(Subset = "All countries",
            Min    = min(mx),
            Mean   = mean(mx),
            Median = median(mx),
            Max    = max(mx),
            Total  = sum(mx)
  ) 
b <- df_cc |>
  filter(cc != "CZ") |>
  summarize(Subset = "No CZ",
            Min    = min(mx),
            Mean   = mean(mx),
            Median = median(mx),
            Max    = max(mx),
            Total  = sum(mx)
  )

descriptives_table <- bind_rows(a, b)
descriptives_table |>
  kable("html",
        digits = 2,
        format.args = list(big.mark = " ")) |>
  kable_styling(
    bootstrap_options = c("striped",
                          "hover",
                          "condensed",
                          "responsive"),
    full_width = TRUE,
    position   = "left")
Table 2: Distribution of domains with MX records.
Subset Min Mean Median Max Total
All countries 1 18 239.73 68 1 056 497 1 678 055
No CZ 1 6 830.31 61 112 651 621 558

To illustrate the absolute count of these domains, we also filtered countries with more than 100 domains and plotted a barchart which, once again, shows the overwhelming and unsurprising dominance of domains with MX records pointing to mail servers within the Czech Republic.

Code
df_cc_100 <- df_cc |>
  filter(mx > 100) |> 
  arrange(mx)
plot_ly(
  data = df_cc_100,
  x    = ~mx,
  type = "bar",
  name = "Domains by country",
  hovertemplate = paste0(
    "<b>", "Country: ", "</b>", df_cc_100$cc,
    " ", df_cc_100$country_flag, "<br>",
    "<b>", "Domain count: ", "</b>", df_cc_100$mx,
    " (", round(df_cc_100$perc, 2), "%)",
    "<extra></extra>"
  ),
  text         = paste0(
    df_cc_100$country_flag, " ", df_cc_100$cc),
  textposition = "auto",
  marker       = list(color = c(rep("#003893", 41), "#F4587A")
  )
) |>
  layout(
    hovermode = "y unified",
    xaxis     = list(title = "Domain count"),
    yaxis = list (
      title          = "Country",
      showticklabels = FALSE
    )
  )
Figure 1: Distribution of domains with MX records by country (>100).

A complete list of all countries with domains that use MX records can be inspected in Table 4 in the Appendix. To provide additional insight into the relative weight of foreign countries, Table 4 also presents a column Domains percentage (no CZ) which specifies the percentage of all domains with MX records excluding those with mail servers located in the Czech Republic.

Spatial distribution

Furthermore, we explored the spatial distribution of mail servers indicated in the domains’ MX records. In Figure 2, we present log-transformed values of the absolute count of domains with mail servers in given countries. To inspect the absolute counts, hover the mouse cursor over the respective countries. Note, however, that the color scale representing log-transformed values is somewhat misleading in that it flattens the differences.

Code
df_cc <- df_cc |>
  mutate(cc_iso = countrycode(cc, origin  = "iso2c",
                              destination = "iso3c")
         ) |>
  replace_na(list(cc = "Unknown"))
df_cc <- df_cc |>
  mutate(mx_log = log(mx)) |>
  replace_na(list(cc = "Unknown"))
plot_ly(
  df_cc,
  type          = "choropleth",
  locations     = df_cc$cc_iso,
  z             = df_cc$mx_log,
  colors        = "Blues",
  hovertemplate = paste0(
    "<b>Count: </b>", df_cc$mx,
    " in ", df_cc$country_flag, " ", df_cc$cc, "<br>",
    "<b>Log: </b>", round(df_cc$mx_log, 2),
    "<extra></extra>"
    )
  ) |>
  colorbar(title = "Log")
Figure 2: Worldwide distribution of domains with MX records (log values).

Modeling mail servers outside of the Czech Republic, part 1

Unsurprisingly, IP addresses of most mail servers listed in MX records are geolocated in the Czech Republic. However, one can ask what factors might lead domain holders to set up mail servers outside of the Czech Republic? To provide a better understanding of such variations and possible influences, we utilized Bayesian regression models to test the associations between the count of domains with MX records that indicate mail servers operated from abroad and five predictors which we hypothesized to associate with the these counts.

Linear regression analysis is used to predict the value of a response variable (i.e., dependent variable) based on the value of another predictor variable (i.e., independent variable). In the Bayesian framework, linear regression formulates probability distributions for such predictions (in comparison to point estimates formulated within the frequentist framework). That is to say, the response variable is not estimated as a single value, but is assumed to be drawn from a probability distribution as Bayesian linear regression does not attempt to find one “best” value of the model parameters, but rather to determine the posterior distribution for the model parameters. When doing so, the analyst can also incorporate prior knowledge about what the parameters should be (or use non-informative priors if one has no guesses about the possible values of the parameters).

The result of a Bayesian linear regression model are distributions of possible model parameters (most notably the values by which the response variable changes depending on the predictor variables) based on the data and the priors. Practically, one can express the relationships between the response and predictors variables by describing these posterior distributions. For example, one can report the mean of the parameter’s posterior distribution and its credible interval, which helps to capture the uncertainty about the possible values of the parameter.

Predictors

Geographic distance from the Czech Republic

First, domain holders might prefer mail servers in countries that are geographically closer to the Czech Republic as they usually provide better round-trip times. We thus hypothesized that there should be a higher count of domains that use MX records in countries closer to the Czech Republic.

To calculate the distances between the Czech Republic and other countries, we first calculated centroids for each country (the geographic point at the center of gravity for a polygon that approximates each country’s borders) and then calculated the distance between the Czech Republic’s centroid coordinates and respective foreign countries’ centroids. As the distribution of these values showed great dispersion, we log-transformed them for the purposes of modeling.

Code
df_cc2 <- df_cc |>
  select(mx, cc_iso)

sf::sf_use_s2(FALSE)
world <- ne_countries(returnclass = "sf", scale = "small")
world <- world |>
  mutate(centroids = st_centroid(geometry))
CZ_polygon <- world |>
  filter(iso_a2 == "CZ")
CZ_centroid <- CZ_polygon |>
  st_centroid(CZ_polygon)
world <- world |>
  mutate(CZ_centroid = CZ_centroid)
world <- world |>
  mutate(distances = st_distance(CZ_centroid,
                                 centroids,
                                 by_element = TRUE)
         ) |>
  select(iso_a3, name, distances) |>
  rename(cc_iso = iso_a3) |>
  st_drop_geometry()
world <- world |>
  mutate(cc_iso = countrycode(name,
                              origin      = "country.name",
                              destination = "iso3c")) |>
  drop_na()

tiny_countries <- ne_countries(type        = "tiny_countries",
                               returnclass = "sf")
tiny_countries <- tiny_countries |>
  mutate(CZ_centroid = CZ_centroid)
tiny_countries <- tiny_countries |>
  mutate(distances = st_distance(CZ_centroid,
                                 geometry,
                                 by_element = TRUE)) |>
  select(iso_a3, name, distances) |>
  st_drop_geometry() |>
  mutate(cc_iso = countrycode(name,
                              origin      = "country.name",
                              destination = "iso3c"))
world <- bind_rows(world, tiny_countries)
df_cc2 <- full_join(df_cc2, world, by = "cc_iso")

df_cc2 <- df_cc2 |>
  filter(cc_iso != "CZE") |>
  select(-iso_a3) |>
  drop_units()

Population size

We also hypothesized that domain holders are more likely to have mail servers in countries with higher populations, as mail service providers are more motivated to deploy their servers in such countries.

The values for population sizes were downloaded from The World Bank. We used the most recent values (year 2021) which we log-transformed.

Code
pop <- read.csv("CSV/POP.csv", sep = ",")
pop <- pop |>
  select(cc, X2021) |>
  rename(population = X2021,
         cc_iso     = cc)
df_cc2 <- full_join(df_cc2, pop, by = "cc_iso")

GDP per capita

Furthermore, we also hypothesized that GDP per capita might increase the count of domains with MX records pointing to mail servers located in foreign countries. Again, the idea was that such countries have a better network infrastructure and may thus be more attractive for server deployments.

Again, the values for GDP per capita were downloaded from The World Bank. We used the most recent values (year 2021, in US dollars) which we log-transformed.

Code
gdp <- read.csv("CSV/GDP.csv", sep = ",")
gdp <- gdp |>
  select(cc, X2021) |>
  rename(gdp    = X2021,
         cc_iso = cc)
df_cc2 <- full_join(df_cc2, gdp, by = "cc_iso")

Export and import value

However, the size of Czech export to foreign countries and foreign countries’ import from the Czech Republic might provide a more granular and trading-network-reflective proxies for international connections than GDP per capita, geographic distance, and population size. Therefore, we hypothesized that the value of Czech Republic’s export to foreign countries would associate with the count of domains that use mail servers operated from abroad. Similarly, we hypothesized that the value of foreign countries’ import from the Czech Republic would associate with the count of these domains.

The values for the size of Czech export and import were downloaded from The Observatory of Economic Complexity (OEC). These values are proxies for a direct trading relationship between the Czech Republic and a given country. We used the most recent freely available values (year 2021) which we log-transformed.

Code
export <- read.csv("CSV/EXPORT.csv", sep = ",")
export <- export |>
  mutate(ISO.3 = replace(ISO.3,
                         Country == "Chinese Taipei",
                         "twn")) |>
  rename(cc_iso = ISO.3,
         export = Trade.Value) |>
  select(cc_iso, export) |>
  mutate(cc_iso = toupper(cc_iso))

df_cc2 <- full_join(df_cc2, export, by = "cc_iso")

import <- read.csv("CSV/IMPORT.csv", sep = ",")
import <- import |>
  mutate(ISO.3 = replace(ISO.3,
                         Country == "Chinese Taipei",
                         "twn")) |>
  rename(cc_iso = ISO.3,
         import = Trade.Value) |>
  select(cc_iso, import) |>
  mutate(cc_iso = toupper(cc_iso))

df_cc2 <- full_join(df_cc2, import, by = "cc_iso")

Upon pairing all the predictor data with our original data on domains using MX records into one dataset, we obtained a number of rows with missing values.

First, we manually adjusted the data for the population size, GDP per capita, and geographic distance for larger countries, filling in the missing values.

Second, as our original dataset does not contain data on countries that do not operate any mail servers under CZ, we assumed that such countries have zero domains using MX records and replaced such missing values with zeros. Similarly, in the case of Czech export and foreign import, we assumed that countries not captured in the OEC datasets do not trade with the Czech Republic, hence we replaced such missing values with zeros.

Lastly, for small (mostly island) countries, we did not attempt to fill in the missing values. Therefore, these countries are omitted in the analysis (as regression models cannot handle rows with missing values) and can be inspected in Table 6 in the Appendix.

Code
df_cc2 <- df_cc2 |>
  mutate(population = replace(population,
                              cc_iso == "TWN",
                              23859912),
         gdp        = replace(gdp,
                              cc_iso == "TWN",
                              33186.34),
         gdp        = replace(gdp,
                              cc_iso == "VEN",
                              2071.6),
         gdp        = replace(gdp,
                              cc_iso == "KWT",
                              28884.3),
         gdp        = replace(gdp,
                              cc_iso == "TKM",
                              10111),
         gdp        = replace(gdp,
                              cc_iso == "YEM",
                              549.49),
         gdp        = replace(gdp,
                              cc_iso == "CUB",
                              7291),
         gdp        = replace(gdp,
                              cc_iso == "GRL",
                              54571),
         gdp        = replace(gdp,
                              cc_iso == "ERI",
                              614),
         gdp        = replace(gdp,
                              cc_iso == "PRK",
                              654),
         distances  = replace(distances,
                              cc_iso == "SYC",
                              7165900),
         distances  = replace(distances,
                              cc_iso == "HKG",
                              8750280),
         distances  = replace(distances,
                              cc_iso == "AND",
                              1330730),
         name       = replace(name,
                              cc_iso == "HKG",
                              "Hong Kong"),
         name       = replace(name,
                              cc_iso == "SYC",
                              "Seychelles")) |>
  mutate(mx        = replace_na(mx, 0),
         export    = replace_na(export,0),
         import    = replace_na(import, 0),
         distances = distances/1000) |>
  mutate(distance_log   = log(distances + 1),
         population_log = log(population + 1),
         gdp_log        = log(gdp + 1),
         export_log     = log(export + 1),
         import_log     = log(import +1)) |>
  dplyr::distinct(cc_iso, .keep_all = TRUE)

df_cc_NAs <- 
  df_cc2 |>
  mutate(mx     = replace_na(mx, 0),
         export = replace_na(export, 0))
df_cc_NAs <- df_cc_NAs[!complete.cases(df_cc_NAs),]

df_cc2 <- df_cc2 |> drop_na()

Passive properties or active channels?

While all of the proposed predictors showed some correlations between each other (see Figure 3 below), they theoretically differ in how they function as proxies for domain holders’ tendency for setting up their mail services outside of the Czech Republic. While geographic distance, population size, and GDP per capita are properties describing respective countries, the values of export and import describe the interaction between the Czech Republic and given countries. In other words, while there is no “human design” behind geographic distance or population size of foreign countries in respect to mail servers indicated in MX records of second-level domains under CZ, both of these variables could influence these counts as a passive side-effect of geographic closeness or high-enough population numbers. Similarly, GDP per capita of foreign countries is an index not directly connected to the Czech Republic but may influence the count of domains as a side-effect of foreign countries’ economic power.

In contrast, Czech export to foreign countries and foreign import from the Czech Republic can be conceptualized as intentionally directed and active networking channels. Therefore, the count of domains with MX records pointing to mail servers in given countries could increase as domain holders might spend resources on developing their own CZ infrastructure and operations in countries where trading with the Czech Republic is more intense.

Code
df_cc2 |>
  mutate(mx_log = log(mx + 1)) |>
  select(mx_log,
         distance_log,
         population_log,
         gdp_log,
         export_log,
         import_log) |>
  drop_units() |>
  drop_na() |>
  cor() |>
  corrplot(method      = "color",
           type        = "lower",
           diag        = FALSE,
           addCoef.col = "Black")

Figure 3: Predictor correlations.

Results

Because the distribution of domain counts was over-dispersed and showed high zero-inflation (i.e., there were many countries with no mail servers indicated in the domains’ MX records), we used zero-inflated negative binomial distribution to model both the count of domains with MX records and the proportion of zeros.

Zero-inflated negative binomial regression can be used for modeling count variables with excessive zeros (i.e., zero-inflation) where the distribution of the count response variable also shows overdispersion (i.e., its variability is greater than would be expected in more classical models like Poisson). Furthermore, the excess zeros can be conceptualized as generated by a separate process from the count values and can be, therefore, modeled independently from the count of the response variable.

Models 1-3

Code
m1 <-
  brm(
    bf(
      mx ~ 
        log(distances + 1) +
        log(population + 1) +
        log(gdp + 1) +
        log(export + 1) +
        log(import + 1),
      zi ~
        log(distances + 1) +
        log(population + 1) +
        log(gdp + 1) +
        log(export + 1) +
        log(import + 1)),
    family = zero_inflated_negbinomial(),
    data   = df_cc2,
    prior  = c(prior(normal(0,2), class = Intercept),
               prior(normal(0,2), class = b)),
    chains = 4,
    iter   = 4000,
    warmup = 1000,
    cores  = parallel::detectCores())
saveRDS(m1, file = "m1.rds")
Code
m2 <-
  brm(bf(mx ~ log(distances + 1),
         zi ~ log(distances + 1)),
    family = zero_inflated_negbinomial(),
    data   = df_cc2,
    prior  = c(prior(normal(0,2), class = Intercept),
               prior(normal(0,2), class = b)),
    chains = 4,
    iter   = 4000,
    warmup = 1000,
    cores  = parallel::detectCores())
saveRDS(m2, file = "m2.rds")
Code
m3 <-
  brm(bf(mx ~ log(population + 1),
         zi ~ log(population + 1)),
    family = zero_inflated_negbinomial(),
    data   = df_cc2,
    prior  = c(prior(normal(0,2), class = Intercept),
               prior(normal(0,2), class = b)),
    chains = 4,
    iter   = 4000,
    warmup = 1000,
    cores  = parallel::detectCores())
saveRDS(m3, file = "m3.rds")

In model m1, we used all five predictors to estimate the count of domains with MX records pointing to mail servers located in foreign countries and the probability that no mail servers are located in the respective countries. However, as we observed severe issues with multicollinearity (when predictors correlate with each other highly, the resulting model parameters become unreliable), we do not discuss model m1 (for the code and results, see the Appendix), and focus on modeling each predictor separately.

For the sake of brevity, we also refer the reader to the Appendix for the results of models m2 (geographic distance) and m3 (population size). While they found associations with the count of domains and their zero-inflation, they performed less efficiently than other models (see below) and found weaker effects than model m4 (GDP per capita).

Model 4: GDP per capita

Code
m4 <-
  brm(bf(mx ~ log(gdp + 1),
         zi ~ log(gdp + 1)),
    family = zero_inflated_negbinomial(),
    data   = df_cc2,
    prior  = c(prior(normal(0,2), class = Intercept),
               prior(normal(0,2), class = b)),
    chains = 4,
    iter   = 4000,
    warmup = 1000,
    cores  = parallel::detectCores())
saveRDS(m4, file = "m4.rds")
Code
summary(m4)
 Family: zero_inflated_negbinomial 
  Links: mu = log; shape = identity; zi = logit 
Formula: mx ~ log(gdp + 1) 
         zi ~ log(gdp + 1)
   Data: df_cc2 (Number of observations: 188) 
  Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
         total post-warmup draws = 12000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept      -16.15      2.41   -20.65   -11.13 1.00     6951     6444
zi_Intercept    15.49      5.52     6.00    27.51 1.00     6252     4895
loggdpP1         2.41      0.25     1.89     2.89 1.00     7249     7126
zi_loggdpP1     -1.99      0.69    -3.53    -0.86 1.00     5907     4392

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.11      0.02     0.08     0.15 1.00     7377     6768

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The results summary table above presents a great deal of information. However, only a part of the section titled Population-Level Effects is usually of prime interest.

First, the reader should focus on the columns Estimate (mean of the posterior distribution for respective predictors), l-95% CI (lower bound of the 95% credible interval), and u-95% CI (upper bound of the 95% credible interval). As noted above, these values help us interpret the parameters’ posterior distributions.

  • The mean estimate describes the central tendency of the parameter and can be interpreted as the value where the association between the predictor and the response variable probably lies. Then, it represents how much the values of the response change depending on the values of the predictor. Note that this value is not a single point-estimate, but a mean of many possible regression lines that the model fitted on the data.

  • Furthermore, we are interested in the 95% credible intervals (95% CIs) which capture where 95% of parameters’ posterior distribution lie. CIs are described by their lower (l-95% CI) and upper (u-95% CI) bound (i.e., 95% of the parameter’s distribution lie somewhere between its lower and upper bound). When looking at credible intervals, we are interested whether they contain zero—if they do, we can conclude that there is no relationship between the variables; if they don’t, there possibly is a relationship. For example, a parameter with mean M = 2.41 and 95% credible interval between 1.90 and 2.88 does not contain zero and therefore suggests a positive relationship between the response and predictor variables. However, a parameter with mean M = 2.41 and 95% credible interval between -1.90 and 2.88 contains zero, suggesting a null effect.

Then, in the same section (Population-Level Effects), we are mostly interested in rows describing the parameters for the investigated predictors (here labelled as loggdpP1 and zi_loggdpP1). These two describe the respective relationships between the response and predictor variables. For loggdpP1, we see that the effect is positive as the 95% CI lies above zero, suggesting that GDP per capita increases the count of domains. For zi_loggdpP1 (the zi stand for zero-inflation), we observe a negative effect as the 95% CI lies below zero, suggesting that GDP per capita decreases the probability of observing zero domains with MX records.

However, the reported parameter values are not intuitively interpretable as the predictors have been log-transformed in the formula of the model, the model utilized a log link function to map the predictor values on the response values, and a logit link function to map the predictor values on the zero-inflation probability. In the case of predictors for the count of domains, we can interpret the results as a percent increase. For example, in model m4, 1% increase in GDP per capita associates with 2.4% increase in the count of domains with MX records. For any x percent increase, one has to calculate 1.x to the power of the coefficient, subtract 1, and multiply by 100. For example, a 30% increase in the GDP per capita results in 87.7% increase in the count of domains ((1.30^2.40 - 1) * 100 = 87.7).

Code
m4_plot <- 
  plot(
    conditional_effects(m4,
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "mu"),
    points = TRUE,
    plot   = FALSE) [[1]] +
  ylim(0,1000000) +
  ylab("Domain count") +
  scale_y_continuous(labels = scales::label_number(scale_cut = cut_short_scale())) +
  xlab("GDP per capita (USD)") +
  scale_x_continuous(labels = scales::label_number(scale_cut = cut_short_scale())) +
  theme(text = element_text(size = 20))
m4_plot_zi <- 
  plot(
    conditional_effects(m4,
                        ndraws    = 100,
                        spaghetti = TRUE,
                        dpar      = "zi"),
    plot   = FALSE) [[1]] +
  ylim(0,1) +
  ylab("Zero-inflation probability") +
  xlab("GDP per capita (USD)") +
  scale_x_continuous(labels = scales::label_number(scale_cut = cut_short_scale())) +
  theme(text = element_text(size = 20))

m4_plot_zi
m4_plot