Abstract
In this article we describe our measurements and analysis of QNAME minimisation support in the .CZ DNS ecosystem. We present a new machine-learning method for classifying QNAME-minimising resolvers and discuss its advantages and possible problems.
DNS Query Name Minimisation (also called QNAME minimisation) is a technique to improve DNS privacy. It was introduced in the experimental RFC 7816 [1], which was recently obsoleted by RFC 9156 [2]. A resolver using QNAME minimisation sends a truncated (“minimised”) query name when querying an upstream DNS server. As a result, the original query name and type are not disclosed to the upstream server, so the user’s privacy is protected.
The goal of this study was to estimate the number of QNAME-minimising resolvers in the .CZ DNS ecosystem.
What does it mean to use minimised query names? For example, consider a resolver that wants to get the IPv4 address for www.akademie.nic.cz. For simplicity, let’s assume that it already has records for the .CZ DNS servers in its cache. If the resolver is minimising, it would first query the .CZ servers for a minimised name (i.e. for the domain name nic.cz with type NS or A). In contrast, a non-minimising resolver would send a type-A query for www.akademie.nic.cz, and thus disclose the full domain name to the .CZ server.
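The truncation described above can be sketched in a few lines of Python. This is only an illustration, not the logic of any particular resolver: the function name `minimised_qname` and its parameters are hypothetical, and it assumes the resolver already knows the zone it is about to query (here, `cz`).

```python
def minimised_qname(full_qname: str, known_zone: str) -> str:
    """Return the query name a minimising resolver would send to the servers
    of `known_zone`: the zone name plus exactly one more label."""
    full = full_qname.rstrip(".").lower().split(".")
    zone = [label for label in known_zone.rstrip(".").lower().split(".") if label]
    # The full name must lie inside the zone we are about to query.
    if zone and full[-len(zone):] != zone:
        raise ValueError(f"{full_qname} is not inside {known_zone}")
    # Nothing left to hide if the name is not deeper than the zone itself.
    if len(full) <= len(zone):
        return ".".join(full)
    return ".".join(full[-(len(zone) + 1):])


print(minimised_qname("www.akademie.nic.cz", "cz"))
```

With the example from the text, the .CZ servers would only see `nic.cz` instead of the full name.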
The exact behaviour of resolvers may vary depending on the software implementation. For instance, some DNS resolvers switch to the full QNAME after getting an NXDOMAIN answer to a query with a minimised QNAME. This is a fallback for DNS servers that don’t properly handle Empty Non-Terminals and send an incorrect NXDOMAIN answer instead of NODATA (see Section 3 in [1]). These differences make passive analysis complicated.
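To make the difference concrete, the following sketch lists the queries that the servers of one zone would see for a nonexistent name, with and without the NXDOMAIN fallback. It is a simplified model under stated assumptions: the function names, the `fallback` flag and the `probe_type` parameter are hypothetical, the zone's servers are assumed to answer NXDOMAIN to the minimised query, and real implementations differ in detail.

```python
def minimise(full_qname: str, zone: str) -> str:
    """Keep only one label more than the zone being queried."""
    labels = full_qname.rstrip(".").split(".")
    zone_depth = len(zone.rstrip(".").split("."))
    return ".".join(labels[-(zone_depth + 1):])


def queries_seen_by_zone(full_qname, qtype, zone, probe_type="NS", fallback=True):
    """Return the (qname, qtype) pairs sent to the servers of `zone`,
    assuming they answer NXDOMAIN to the minimised query."""
    full_qname = full_qname.rstrip(".")
    minimised = minimise(full_qname, zone)
    if minimised == full_qname:
        # Nothing to minimise: the full name is sent straight away.
        return [(full_qname, qtype)]
    sent = [(minimised, probe_type)]
    if fallback:
        # Resolver distrusts the NXDOMAIN and retries with the full QNAME.
        sent.append((full_qname, qtype))
    return sent


name = "l5.l4.l3.nonexistent-example.cz"  # hypothetical test name
print(queries_seen_by_zone(name, "A", "cz", fallback=True))
print(queries_seen_by_zone(name, "A", "cz", fallback=False))
```

In the fallback case the full name still reaches the upstream server, only later, which is why looking at a single query is not enough to tell the two kinds of resolver apart.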
A common trait of all minimising resolvers is that they should first try to query for a minimised name. In DNS traffic traces from .CZ servers we observe millions of distinct query names daily. Investigating the exact query order for each pair of source IP address and query name would require huge resources, and there would still be no guarantee of precise results (see Remarks section). Therefore, we adopted a different approach.
In order to detect whether a DNS resolver uses QNAME minimisation, we developed a machine learning classifier. For each resolver observed by .CZ DNS servers we compute the following specific features:
- NOERROR answers
- NOERROR answers for A records
- NOERROR answers for NS records
- NOERROR answers for NS records with QNAME consisting of 2 labels
- NS records with QNAME consisting of 2 labels
- NOERROR answers
We used these features as an input for our classifier.
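As an illustration of how per-resolver features of this kind could be derived from a query log, here is a minimal sketch. It rests on assumptions: the feature definitions (shares of a resolver's queries), the feature names and the input tuple layout are all hypothetical and are not necessarily the ones used in the study.

```python
from collections import defaultdict


def label_count(qname: str) -> int:
    """Number of labels in a domain name, ignoring a trailing dot."""
    return len([label for label in qname.rstrip(".").split(".") if label])


def resolver_features(queries):
    """queries: iterable of (src_ip, qname, qtype, rcode) with textual
    qtype/rcode, e.g. ("192.0.2.1", "nic.cz", "NS", "NOERROR")."""
    grouped = defaultdict(list)
    for src, qname, qtype, rcode in queries:
        grouped[src].append((qname, qtype, rcode))

    features = {}
    for src, rows in grouped.items():
        total = len(rows)

        def share(predicate):
            # Fraction of this resolver's queries matching the predicate.
            return sum(1 for row in rows if predicate(*row)) / total

        features[src] = {
            "noerror": share(lambda q, t, rc: rc == "NOERROR"),
            "noerror_a": share(lambda q, t, rc: rc == "NOERROR" and t == "A"),
            "noerror_ns": share(lambda q, t, rc: rc == "NOERROR" and t == "NS"),
            "noerror_ns_2labels": share(
                lambda q, t, rc: rc == "NOERROR" and t == "NS" and label_count(q) == 2
            ),
            "ns_2labels": share(lambda q, t, rc: t == "NS" and label_count(q) == 2),
        }
    return features
```

The intuition is that a minimising resolver sends a noticeably larger share of two-label queries to the .CZ servers than a resolver that always sends the full name.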
To get the ground truth data for our model, we used RIPE Atlas. Each RIPE Atlas probe was employed to query its local resolver for the A record of the following (nonexistent) domain name:
l5.l4.l3.niemanapewnotakiejdomeny001.cz
We subsequently analysed the corresponding DNS traffic captured on .CZ servers. If the first query from a resolver related to that domain was for the full domain name l5.l4.l3.niemanapewnotakiejdomeny001.cz and query type A, then this resolver doesn’t use QNAME minimisation. Conversely, the first query from a minimising resolver was expected to be one of the following variants:
1. niemanapewnotakiejdomeny001.cz and type NS
2. niemanapewnotakiejdomeny001.cz and type A
3. _.niemanapewnotakiejdomeny001.cz and type A
With RIPE Atlas probes we were able to gather data about 8,160 resolvers; 512 of them were considered to do NS-based QNAME minimisation (variant 1 above) and 2,700 A-based QNAME minimisation (variants 2 and 3). We developed a separate machine learning model for each of the two cases. In the course of our work we found out that Google Public DNS resolvers might change IP address during the name resolution process (see the Behaviour of public resolvers section). Therefore, we filtered out Google Public DNS addresses from our ground truth dataset.
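The first-query rule used for labelling the ground truth can be expressed as a small function. This is a sketch of the rule as described in the text, not the code used in the study; the function name and the label strings (`non-qmin`, `qmin-ns`, `qmin-a`, `unknown`) are hypothetical. Names are lower-cased before comparison because some resolvers randomise the letter case of query names.

```python
def label_from_first_query(first_qname: str, first_qtype: str, probe_qname: str) -> str:
    """Label a resolver from the first query it sent for the probe domain."""
    qname = first_qname.rstrip(".").lower()
    probe = probe_qname.rstrip(".").lower()
    # Minimised name as seen by the .CZ servers: the last two labels.
    minimised = ".".join(probe.split(".")[-2:])

    if qname == probe and first_qtype == "A":
        return "non-qmin"   # full name disclosed straight away
    if qname == minimised and first_qtype == "NS":
        return "qmin-ns"    # variant 1
    if first_qtype == "A" and qname in (minimised, "_." + minimised):
        return "qmin-a"     # variants 2 and 3
    return "unknown"        # anything else is left unlabelled


PROBE = "l5.l4.l3.niemanapewnotakiejdomeny001.cz"
print(label_from_first_query("niemanapewnotakiejdomeny001.cz", "NS", PROBE))
```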
To train the ML models, we selected only resolvers for which we captured more than 100 queries per day on .CZ servers (on 21 July 2021). On the testing set we reached 100% accuracy for the NS-based model and 90% accuracy for the A-based model.
We analysed DNS traffic captured on .CZ DNS servers in the last week of July 2021 (from Sunday 25th to Saturday 31st). Each resolver observed by our servers was labelled with one of the following classes:
- low traffic: fewer than 10 DNS queries were observed from that resolver (too few to make predictions)
- non qmin: the resolver was not doing QNAME minimisation
- qmin: the resolver was doing QNAME minimisation
Classification results are shown in Figure 1. Around 16.8% of resolvers were classified as doing QNAME minimisation (1.4% NS-based and 15.4% A-based).
However, if we sum up the queries for each prediction class, it turns out that around 54% of queries originated from resolvers that do QNAME minimisation (Figure 2).
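The gap between the share of resolvers and the share of queries arises because a few busy resolvers send most of the traffic. The sketch below shows the two ways of counting on made-up numbers; the function name and the sample data are hypothetical and do not come from the measurement.

```python
from collections import Counter


def class_shares(resolvers):
    """resolvers: list of (predicted_class, query_count) pairs, one per resolver.
    Returns {class: (share_of_resolvers, share_of_queries)}."""
    resolver_counts = Counter()
    query_counts = Counter()
    for label, queries in resolvers:
        resolver_counts[label] += 1
        query_counts[label] += queries
    total_resolvers = sum(resolver_counts.values())
    total_queries = sum(query_counts.values())
    return {
        label: (resolver_counts[label] / total_resolvers,
                query_counts[label] / total_queries)
        for label in resolver_counts
    }


# One busy minimising resolver and three quiet non-minimising ones (invented data).
sample = [("qmin", 900), ("non qmin", 50), ("non qmin", 30), ("non qmin", 20)]
print(class_shares(sample))
```

In this toy sample the minimising class is a quarter of the resolvers but accounts for nine tenths of the queries.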
Autonomous system 15169 (GOOGLE) was the biggest source of DNS queries from resolvers doing QNAME minimisation: in total, 85.46% of queries from this network originated from resolvers doing QNAME minimisation. Similar results (80-95%) were observed for other big DNS players. The top 10 networks by the number of queries from resolvers doing QNAME minimisation are presented in Table 1.
# | ASN | Total queries | % of queries from qmin resolvers |
---|---|---|---|
1 | AS15169 GOOGLE | 1,513,038,645 | 85.46% |
2 | AS13335 CLOUDFLARENET | 222,833,315 | 99.56% |
3 | AS44489 Starnet STARNET, s.r.o. | 207,547,217 | 99.38% |
4 | AS24940 HETZNER-AS Hetzner Online GmbH | 245,411,334 | 80.67% |
5 | AS14061 DIGITALOCEAN-ASN | 185,057,592 | 81.17% |
6 | AS25192 CZNIC-AS CZ.NIC, z.s.p.o. | 156,284,099 | 92.05% |
7 | AS16509 AMAZON-02 | 451,613,224 | 28.88% |
8 | AS36692 OPENDNS | 132,486,055 | 92.37% |
9 | AS43037 SEZNAM-CZ Seznam.cz, a.s. | 108,605,818 | 99.99% |
10 | AS136907 HWCLOUDS-AS-AP HUAWEI CLOUDS | 109,438,289 | 98.64% |
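Table 1 is ranked by the number of queries from minimising resolvers, which is not listed directly but can be estimated from the two columns shown. The following sketch multiplies total queries by the percentage for each row. Because the published percentages are rounded, the products are approximations; the variable and function names are illustrative.

```python
# (ASN, total queries, % of queries from qmin resolvers), copied from Table 1.
TABLE_1 = [
    ("AS15169", 1_513_038_645, 85.46),
    ("AS13335", 222_833_315, 99.56),
    ("AS44489", 207_547_217, 99.38),
    ("AS24940", 245_411_334, 80.67),
    ("AS14061", 185_057_592, 81.17),
    ("AS25192", 156_284_099, 92.05),
    ("AS16509", 451_613_224, 28.88),
    ("AS36692", 132_486_055, 92.37),
    ("AS43037", 108_605_818, 99.99),
    ("AS136907", 109_438_289, 98.64),
]


def qmin_queries(total: int, percent: float) -> int:
    """Estimated number of queries that came from minimising resolvers."""
    return round(total * percent / 100)


for asn, total, percent in TABLE_1:
    print(f"{asn:>9}  {qmin_queries(total, percent):>13,}")

# Sorting by the estimate reproduces the order of the table.
ranked = sorted(TABLE_1, key=lambda row: qmin_queries(row[1], row[2]), reverse=True)
print([row[0] for row in ranked] == [row[0] for row in TABLE_1])
```

This also explains why AS16509 (AMAZON-02) appears only in seventh place despite having the second-largest total: fewer than a third of its queries came from minimising resolvers.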
The behaviour of popular public DNS resolvers can be easily determined by active measurements. A query for a non-existent .CZ domain sent to a public DNS resolver triggers DNS queries to the .CZ authoritative DNS servers. By analysing the traffic captured on these servers we are able to investigate the behaviour of selected public DNS resolvers.
We tested 8.8.8.8 (Google), 1.1.1.1 (Cloudflare) and 193.17.47.1 (CZ.NIC ODVR) and found that all these resolvers were doing QNAME minimisation, but each in a different manner.
In a prior study, Huston [3] considered Google Public DNS not to do QNAME minimisation. It is possible that this feature was not deployed at that time. In our study, however, most of the traffic from the Google network originated from resolvers classified as qmin. In order to verify our classifier’s prediction, we queried 8.8.8.8 for l5.l4.l3.definitelynosuchdomain6378.cz A. Our experiment confirmed that QNAME minimisation was performed. It also revealed that the resolver IP address may change during the domain resolution process, which is another complication for the classification. Below is the sequence of DNS queries for definitelynosuchdomain6378.cz captured on .CZ servers. We can see that the initial query from the Google resolver was for the minimised QNAME (definitelynosuchdomain6378.cz NS), but after getting an NXDOMAIN answer the Google server promptly queried with the non-minimised QNAME. The source IP address was different for the second query. We could possibly aggregate IP addresses into blocks, but this method wouldn’t work when the resolver switches between IPv4 and IPv6 addresses.
time | src | server | qname | qtype | rcode
-------------------+--------------------------+-----------+----------------------------------------+-------+-------
1636448337.466004 | 2a00:1450:4001:c1b::101 | decix-cz | definitelynosuchdomain6378.cz | 2 | 3
1636448337.517988 | 2a00:1450:4001:c15::105 | decix-cz | l5.l4.l3.definitelynosuchdomain6378.cz | 1 | 3
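The idea of aggregating source addresses into blocks, and its limitation, can be tried with Python's standard `ipaddress` module. This is a sketch only: the prefix lengths (/24 for IPv4, /48 for IPv6) and the function name are assumptions chosen for illustration, not values from the study.

```python
import ipaddress


def aggregation_block(address: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Map an IP address to the enclosing network block (as a string)."""
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its network back.
    return str(ipaddress.ip_network(f"{address}/{prefix}", strict=False))


# The two Google source addresses from the capture above share one block ...
print(aggregation_block("2a00:1450:4001:c1b::101"))
print(aggregation_block("2a00:1450:4001:c15::105"))

# ... but an IPv4 and an IPv6 address of the same resolver service never will.
print(aggregation_block("217.31.204.137"))
print(aggregation_block("2001:1488:800:400::2:137"))
```

With these prefix lengths both Google addresses fall into 2a00:1450:4001::/48, so the two queries could be attributed to one resolver, whereas a pair of queries split across address families cannot be joined this way.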
Cloudflare public DNS demonstrated different behaviour. QNAME minimisation seemed to be deployed in a “strict” mode there: after getting NXDOMAIN from the .CZ authoritative servers, the Cloudflare resolver(s) didn’t send a query with the non-minimised QNAME. We queried 1.1.1.1 for l5.l4.l3.qhu29z3dx9avnbvkha2robs5.cz A and captured just one query for this domain on .CZ authoritative servers:
time | src | server | qname | qtype | rcode
-------------------+--------------------------+-----------+----------------------------------------+-------+-------
1636448941.918191 | 141.101.95.41 | cra-cz | qhu29z3dx9avnbvkha2robs5.cz | 1 | 3
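The qtype and rcode columns in these captures are numeric. In the standard DNS parameter registry, qtype 1 is A and qtype 2 is NS, and rcode 3 is NXDOMAIN. A small helper can turn a captured row into a readable record; the function name and the returned dictionary layout are hypothetical, and only a handful of common codes are mapped.

```python
QTYPES = {1: "A", 2: "NS", 28: "AAAA"}
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def parse_capture_row(line: str) -> dict:
    """Parse one 'time | src | server | qname | qtype | rcode' row."""
    time, src, server, qname, qtype, rcode = [f.strip() for f in line.split("|")]
    return {
        "time": float(time),
        "src": src,
        "server": server,
        # Lower-case the name: some resolvers randomise letter case.
        "qname": qname.lower(),
        "qtype": QTYPES.get(int(qtype), f"TYPE{qtype}"),
        "rcode": RCODES.get(int(rcode), f"RCODE{rcode}"),
    }


row = " 1636448941.918191 | 141.101.95.41 | cra-cz | qhu29z3dx9avnbvkha2robs5.cz | 1 | 3"
print(parse_capture_row(row))
```

Decoded this way, the single Cloudflare query is a type-A query for the minimised name answered with NXDOMAIN, whereas the first Google query shown earlier was of type NS.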
CZ.NIC ODVR demonstrated behaviour similar to Google. Besides the IP address change, we also observed that the second (non-minimised) query ended up at a different server:
time | src | server | qname | qtype | rcode
-------------------+--------------------------+-----------+----------------------------------------+-------+-------
1636449080.394062 | 217.31.204.137 | cra-cz | S3fmT2nWrvcc47343KheK7BA.cZ | 2 | 3
1636449080.401786 | 2001:1488:800:400::2:137 | cecolo-cz | l5.L4.L3.s3fmt2NwRvcc47343Khek7Ba.CZ | 1 | 3
In this report we demonstrated that our ML-based approach is a reliable method for passive detection of QNAME-minimising resolvers. However, under certain circumstances it is not possible to clearly identify whether a resolver supports QNAME minimisation. In particular, the prediction may be distorted if:
- NS/A queries for domain names of 2 labels (typical for scanners; may also be triggered by a user behind a resolver)
However, we believe that the 90% accuracy achieved by our classifier was enough to make inferences.
Our study revealed that QNAME minimisation is widely deployed among many big DNS players. QNAME minimisation is a relatively new feature, but the newest versions of popular DNS software mostly support it, although sometimes it needs to be explicitly activated in the configuration. Knot Resolver, a state-of-the-art DNS resolver developed by CZ.NIC, provides it by default.