QNAME minimisation

DNS Query Name Minimisation (also called QNAME minimisation) is a technique to improve DNS privacy. It was introduced in the experimental RFC 7816 [1], which was recently obsoleted by RFC 9156 [2]. When a resolver that uses QNAME minimisation sends a query to an upstream DNS server, it uses a truncated (“minimised”) query name. As a result, the full query name and the original query type are disclosed only to the server that actually needs them, which improves user privacy.

The goal of this study was to evaluate the number of QNAME-minimising resolvers in the .CZ DNS ecosystem.

Behaviour of minimising resolvers

What does it mean to use minimised query names? For example, consider a resolver that wants to get the IPv4 address for www.akademie.nic.cz. For simplicity, let’s assume that it already has the records for the .CZ DNS servers in its cache. A minimising resolver would first query the .CZ servers for a minimised name (i.e. for the domain name nic.cz with type NS or A). In contrast, a non-minimising resolver would send a type-A query for www.akademie.nic.cz and thus disclose the full domain name to the .CZ server.
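The name progression described above can be sketched in Python. This is a simplified illustration, not any particular resolver's algorithm: the helper `minimised_queries` is hypothetical, it adds one label per step, and it ignores which server each query goes to (only the first one reaches the .CZ servers) as well as the limits RFC 9156 places on the number of steps.

```python
from typing import List, Tuple


def minimised_queries(qname: str, known_zone: str, qtype: str = "A",
                      min_type: str = "NS") -> List[Tuple[str, str]]:
    """Return the (name, type) pairs a simple minimising resolver would send.

    Hypothetical helper: starts one label below `known_zone` (whose servers
    are assumed to be cached) and adds one label per step. Only the last
    query carries the full name and the real query type.
    """
    labels = qname.rstrip(".").split(".")
    zone_len = len(known_zone.rstrip(".").split("."))
    queries = [(".".join(labels[-n:]), min_type)
               for n in range(zone_len + 1, len(labels))]
    queries.append((".".join(labels), qtype))
    return queries


# Only "nic.cz" is revealed to the .CZ servers:
for step in minimised_queries("www.akademie.nic.cz", "cz"):
    print(step)
# ('nic.cz', 'NS')
# ('akademie.nic.cz', 'NS')
# ('www.akademie.nic.cz', 'A')
```

Passing `min_type="A"` models resolvers that use type A instead of NS for the minimised queries.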

The exact behaviour of resolvers may vary depending on the software implementation. For instance, some DNS resolvers switch to the full QNAME after getting an NXDOMAIN answer to a query with a minimised QNAME. This is a fallback for DNS servers that do not handle empty non-terminals properly and send an incorrect NXDOMAIN answer instead of NODATA (see Section 3 in [1]). These differences complicate passive analysis.
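The difference matters for the traffic the .CZ servers see. The following sketch models only the queries a resolver sends to the .CZ servers, for two hypothetical modes: a strict resolver trusts the NXDOMAIN for the minimised name, while a non-strict one falls back to the full QNAME. The function name, the `strict` flag and the `answer` callback are illustrative assumptions; real implementations differ in details such as the type used for the minimised query.

```python
from typing import Callable, List, Tuple

Query = Tuple[str, str]  # (qname, qtype)


def queries_to_cz(qname: str, qtype: str, strict: bool,
                  answer: Callable[[str, str], str]) -> List[Query]:
    """Queries a minimising resolver sends to the .CZ servers (hypothetical model).

    `answer(name, qtype)` stands in for the .CZ server and returns an RCODE
    name such as "NOERROR" or "NXDOMAIN". Referrals to child zones after a
    successful answer are not modelled.
    """
    full = qname.rstrip(".")
    minimised = ".".join(full.split(".")[-2:])  # one label below "cz"
    sent = [(minimised, "NS")]
    if answer(minimised, "NS") == "NXDOMAIN" and not strict and minimised != full:
        # Non-strict fallback: the NXDOMAIN might be a broken answer for an
        # empty non-terminal, so ask again with the full, non-minimised name.
        sent.append((full, qtype))
    return sent


def nonexistent(name: str, qtype: str) -> str:
    return "NXDOMAIN"  # every name under the test domain is missing


print(queries_to_cz("l5.l4.l3.example.cz", "A", strict=True, answer=nonexistent))
# [('example.cz', 'NS')]
print(queries_to_cz("l5.l4.l3.example.cz", "A", strict=False, answer=nonexistent))
# [('example.cz', 'NS'), ('l5.l4.l3.example.cz', 'A')]
```

A non-strict resolver therefore leaks the full name to the .CZ servers whenever the minimised name does not exist, while a strict one never does.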

Setup

A common trait of all minimising resolvers is that they should first try to query for a minimised name. In DNS traffic traces from the .CZ servers, we observe millions of distinct query names daily. Investigating the exact query order for each pair of source IP address and query name would require huge resources, and even then there would be no guarantee of precise results (see Remarks section). Therefore, we adopted a different approach.

To detect whether a DNS resolver uses QNAME minimisation, we developed a machine learning classifier. For each resolver observed by the .CZ DNS servers, we compute the following features:

  • percentage of NOERROR answers
  • percentage of NOERROR answers for A records
  • percentage of NOERROR answers for NS records
  • percentage of NOERROR answers for NS records with QNAME consisting of 2 labels
  • percentage of queries for NS records with QNAME consisting of 2 labels
  • number of queries per domain
  • number of distinct QTYPEs for NOERROR answers
  • average number of QNAME labels

We used these features as an input for our classifier.
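A minimal sketch of the per-resolver feature computation follows. It is an illustration under stated assumptions rather than the code used in the study: each query is a `(qname, qtype, rcode)` tuple, all percentages are taken over all queries from the resolver, and a "domain" is taken to be the last two labels of the QNAME (e.g. nic.cz). The report does not specify these details.

```python
from typing import Callable, Dict, Iterable, Tuple

Record = Tuple[str, str, str]  # (qname, qtype, rcode) for one query


def label_count(qname: str) -> int:
    return len(qname.rstrip(".").split("."))


def resolver_features(queries: Iterable[Record]) -> Dict[str, float]:
    """Compute the eight features listed above for one resolver (assumed definitions)."""
    rows = [(q.lower().rstrip("."), t.upper(), r.upper()) for q, t, r in queries]
    n = len(rows)
    if n == 0:
        raise ValueError("no queries observed for this resolver")

    def pct(pred: Callable[[str, str, str], bool]) -> float:
        # Share of all queries (in %) that satisfy the predicate.
        return 100.0 * sum(1 for row in rows if pred(*row)) / n

    # Assumption: the "domain" is the registered name, i.e. the last two labels.
    domains = {".".join(q.split(".")[-2:]) for q, _, _ in rows}
    return {
        "pct_noerror": pct(lambda q, t, r: r == "NOERROR"),
        "pct_noerror_a": pct(lambda q, t, r: r == "NOERROR" and t == "A"),
        "pct_noerror_ns": pct(lambda q, t, r: r == "NOERROR" and t == "NS"),
        "pct_noerror_ns_2labels": pct(
            lambda q, t, r: r == "NOERROR" and t == "NS" and label_count(q) == 2),
        "pct_ns_2labels": pct(lambda q, t, r: t == "NS" and label_count(q) == 2),
        "queries_per_domain": n / len(domains),
        "distinct_qtypes_noerror": float(len({t for _, t, r in rows if r == "NOERROR"})),
        "avg_labels": sum(label_count(q) for q, _, _ in rows) / n,
    }


sample = [("nic.cz", "NS", "NOERROR"), ("www.nic.cz", "A", "NOERROR"),
          ("foo.cz", "NS", "NXDOMAIN"), ("a.b.foo.cz", "A", "NXDOMAIN")]
print(resolver_features(sample)["pct_ns_2labels"])  # 50.0
```

A minimising resolver tends to produce many two-label NS or A queries, which is why several features focus on them.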

To obtain ground-truth data for our model, we used RIPE Atlas. Each RIPE Atlas probe was used to query its local resolver for the A record of the following (nonexistent) domain name:

l5.l4.l3.niemanapewnotakiejdomeny001.cz

We subsequently analysed the corresponding DNS traffic captured on the .CZ servers. If the first query from a resolver related to that domain was for the full domain name l5.l4.l3.niemanapewnotakiejdomeny001.cz with query type A, the resolver was considered not to use QNAME minimisation. Conversely, the first query from a minimising resolver was expected to be one of the following variants:

  1. domain name niemanapewnotakiejdomeny001.cz and type NS
  2. domain name niemanapewnotakiejdomeny001.cz and type A
  3. domain name _.niemanapewnotakiejdomeny001.cz and type A.
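The labelling rule above can be expressed as a small function. In this sketch (hypothetical names, not the study's code), `captured` holds `(time, source IP, qname, qtype)` rows for the probe domain; only the earliest query from each source address is classified, and a first query matching none of the expected patterns is left unlabelled. Names are compared case-insensitively, as DNS names are case-insensitive. Keying on the source address is exactly what breaks when a resolver changes its IP address during resolution.

```python
from typing import Dict, Iterable, Optional, Tuple

BASE = "niemanapewnotakiejdomeny001.cz"
FULL = "l5.l4.l3." + BASE


def label_first_query(qname: str, qtype: str) -> Optional[str]:
    name, qtype = qname.lower().rstrip("."), qtype.upper()
    if name == FULL and qtype == "A":
        return "non-qmin"
    if name == BASE and qtype == "NS":
        return "qmin-NS"  # variant 1
    if name in (BASE, "_." + BASE) and qtype == "A":
        return "qmin-A"  # variants 2 and 3
    return None  # unexpected first query: not used as ground truth


def ground_truth(captured: Iterable[Tuple[float, str, str, str]]) -> Dict[str, Optional[str]]:
    labels: Dict[str, Optional[str]] = {}
    for _, src, qname, qtype in sorted(captured):  # chronological order
        if src not in labels:  # only the first query from each source counts
            labels[src] = label_first_query(qname, qtype)
    return labels
```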

Using RIPE Atlas probes, we gathered data about 8,160 resolvers: 512 of them were labelled as doing NS-based QNAME minimisation (variant 1 above) and 2,700 as doing A-based QNAME minimisation (variants 2 and 3). We developed a separate machine learning model for each of these two cases. In the course of our work, we found that Google Public DNS resolvers may change their source IP address during name resolution (see the Behaviour of public resolvers section). Therefore, we filtered Google Public DNS addresses out of our ground-truth dataset.

To train the ML models, we selected only resolvers from which we captured more than 100 queries per day on the .CZ servers (on 21 July 2021). On the test set, the NS-based model reached 100% accuracy and the A-based model 90% accuracy.
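The selection and evaluation steps can be sketched as follows. The 100-queries-per-day threshold comes from the text; the 30% test share, the fixed random seed and all function names are assumptions, since the report does not describe how the data were split or which model type was used.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
MIN_DAILY_QUERIES = 100  # keep resolvers with MORE than 100 queries that day


def select_training(resolvers: Dict[str, Tuple[int, str]]) -> List[Tuple[str, str]]:
    """resolvers maps IP -> (queries captured on the day, ground-truth label)."""
    return [(ip, label) for ip, (count, label) in resolvers.items()
            if count > MIN_DAILY_QUERIES]


def train_test_split(items: Sequence[T], test_share: float = 0.3,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    cut = len(shuffled) - round(len(shuffled) * test_share)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of test resolvers whose predicted label matches the ground truth."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)
```

Requiring a minimum volume makes the features (which are mostly percentages) less noisy, at the cost of leaving low-traffic resolvers out of the training data.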

Results

We analysed the DNS traffic captured on the .CZ DNS servers in the last week of July 2021 (from Sunday 25th to Saturday 31st). Each resolver observed by our servers was assigned one of the following classes:

  • low traffic: fewer than 10 DNS queries were observed from the resolver (too few to make a prediction)
  • non qmin: the resolver was not doing QNAME minimisation
  • qmin: the resolver was doing QNAME minimisation

Classification results are shown in Figure 1. Around 16.8% of resolvers were classified as doing QNAME minimisation (1.4% NS-based and 15.4% A-based).

Figure 1: Numbers of resolvers classified in each category during the test period

However, if we sum up the queries for each predicted class, it turns out that around 54% of queries originated from resolvers that do QNAME minimisation (Figure 2).

Figure 2: Numbers of queries received from resolvers in each category during the test period
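The gap between the share of resolvers (about 16.8%) and the share of queries (about 54%) follows from weighting each resolver by its query volume. The sketch below computes both views from hypothetical `(class, query count)` pairs; the toy numbers in the usage lines are made up for illustration and are not measurement data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def class_shares(resolvers: Iterable[Tuple[str, int]]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (% of resolvers per class, % of queries per class)."""
    by_resolvers: Counter = Counter()
    by_queries: Counter = Counter()
    for cls, n_queries in resolvers:
        by_resolvers[cls] += 1
        by_queries[cls] += n_queries
    total_r, total_q = sum(by_resolvers.values()), sum(by_queries.values())
    if total_r == 0 or total_q == 0:
        raise ValueError("no resolvers or no queries")
    return ({c: 100.0 * v / total_r for c, v in by_resolvers.items()},
            {c: 100.0 * v / total_q for c, v in by_queries.items()})


# Toy data: one busy minimising resolver and four quieter non-minimising ones.
res_share, query_share = class_shares([("qmin", 6000)] + [("non qmin", 1000)] * 4)
print(res_share["qmin"], query_share["qmin"])  # 20.0 60.0
```

A few high-volume minimising resolvers (such as large public or ISP resolvers) can thus account for a large part of the traffic even when most resolvers do not minimise.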

Behaviour of public resolvers

The behaviour of popular public DNS resolvers can easily be determined by active measurements. A query for a nonexistent .CZ domain sent to a public DNS resolver triggers DNS queries to the .CZ authoritative DNS servers. By analysing the traffic captured on these servers, we can investigate the behaviour of selected public DNS resolvers.

We tested 8.8.8.8 (Google), 1.1.1.1 (Cloudflare) and 193.17.47.1 (CZ.NIC ODVR) and found that all of these resolvers were doing QNAME minimisation, though not all in the same manner.
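Each such measurement needs a name that no resolver can have cached. A minimal sketch, assuming names of the form l5.l4.l3.<random label>.cz like those used in this report (the 24-character label length and all helper names are our assumptions), generates such a name and encodes a standard DNS query for it in RFC 1035 wire format using only the Python standard library. `send_query` would transmit it to a public resolver over UDP; it is not called here and, being a sketch, does not check the response ID or retry.

```python
import secrets
import socket
import string
import struct

ALPHABET = string.ascii_lowercase + string.digits


def probe_name(tld: str = "cz", length: int = 24) -> str:
    """Fresh, almost certainly nonexistent name, e.g. l5.l4.l3.<random>.cz."""
    label = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"l5.l4.l3.{label}.{tld}"


def build_query(qname: str, qtype: int = 1, query_id: int = 0) -> bytes:
    """Encode a recursive DNS query (QTYPE 1 = A, QCLASS 1 = IN)."""
    # Header: ID, flags with RD=1, QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    encoded = b""
    for label in qname.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        encoded += bytes([len(raw)]) + raw  # length-prefixed label
    return header + encoded + b"\x00" + struct.pack("!HH", qtype, 1)


def send_query(packet: bytes, resolver: str = "8.8.8.8", timeout: float = 2.0) -> int:
    """Send the query over UDP/IPv4 and return the response RCODE (3 = NXDOMAIN)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (resolver, 53))
        response, _ = sock.recvfrom(4096)
    return response[3] & 0x0F  # RCODE is the low 4 bits of the flags field
```

Because the random label is unique, every query for it that shows up on the .CZ servers can be attributed to the one measurement.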

In a prior study, Huston [3] reported that Google Public DNS did not perform QNAME minimisation. It is possible that the feature had not been deployed at that time. In our study, however, most of the traffic from Google's network originated from resolvers classified as qmin. To verify this prediction, we queried 8.8.8.8 for the A record of l5.l4.l3.definitelynosuchdomain6378.cz, and the experiment confirmed that QNAME minimisation was performed.

It also revealed that the resolver's source IP address may change during the resolution of a single name, which is another complication for the classification. IP addresses could possibly be aggregated into blocks, but this method would not work when a resolver switches between IPv4 and IPv6 addresses. The sequence of DNS queries for definitelynosuchdomain6378.cz captured on the .CZ servers is shown below (time is a Unix timestamp; qtype 1 = A, 2 = NS; rcode 3 = NXDOMAIN). The initial query from the Google resolver was for the minimised QNAME (definitelynosuchdomain6378.cz NS), but after getting an NXDOMAIN answer, the resolver promptly queried with the non-minimised QNAME from a different source IP address.

       time        |           src            |  server   |                 qname                  | qtype | rcode 
-------------------+--------------------------+-----------+----------------------------------------+-------+-------
 1636448337.466004 | 2a00:1450:4001:c1b::101  | decix-cz  | definitelynosuchdomain6378.cz          |     2 |     3 
 1636448337.517988 | 2a00:1450:4001:c15::105  | decix-cz  | l5.l4.l3.definitelynosuchdomain6378.cz |     1 |     3 
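The aggregation idea can be sketched with Python's `ipaddress` module. The /24 (IPv4) and /48 (IPv6) block sizes are our assumptions, not values from the study. With /48 blocks, the two Google addresses above fall into the same block, but no prefix length can relate an IPv4 address to an IPv6 one.

```python
import ipaddress


def same_block(a: str, b: str, v4_prefix: int = 24, v6_prefix: int = 48) -> bool:
    """True if both addresses fall into the same (assumed) aggregation block."""
    ip_a, ip_b = ipaddress.ip_address(a), ipaddress.ip_address(b)
    if ip_a.version != ip_b.version:
        return False  # IPv4 and IPv6 addresses never share a prefix
    prefix = v4_prefix if ip_a.version == 4 else v6_prefix
    block = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    return ip_b in block


print(same_block("2a00:1450:4001:c1b::101", "2a00:1450:4001:c15::105"))  # True
print(same_block("2a00:1450:4001:c1b::101", "2a00:1450:4001:c15::105",
                 v6_prefix=64))  # False
```

The choice of prefix length is a trade-off: shorter prefixes merge more of one operator's addresses but also risk merging unrelated resolvers.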
  

Cloudflare's public DNS showed different behaviour. QNAME minimisation seemed to be deployed in a “strict” mode there: after getting NXDOMAIN from the .CZ authoritative servers, the Cloudflare resolver(s) did not send a query with the non-minimised QNAME. We queried 1.1.1.1 for the A record of l5.l4.l3.qhu29z3dx9avnbvkha2robs5.cz and captured just one query for this domain on the .CZ authoritative servers:

       time        |           src            |  server   |                 qname                  | qtype | rcode 
-------------------+--------------------------+-----------+----------------------------------------+-------+-------
 1636448941.918191 | 141.101.95.41            | cra-cz    | qhu29z3dx9avnbvkha2robs5.cz            |     1 |     3 

CZ.NIC ODVR behaved similarly to Google. Besides the change of source IP address (here from IPv4 to IPv6), the second (non-minimised) query also went to a different .CZ server:

       time        |           src            |  server   |                 qname                  | qtype | rcode
-------------------+--------------------------+-----------+----------------------------------------+-------+-------
 1636449080.394062 | 217.31.204.137           | cra-cz    | S3fmT2nWrvcc47343KheK7BA.cZ            |     2 |     3 
 1636449080.401786 | 2001:1488:800:400::2:137 | cecolo-cz | l5.L4.L3.s3fmt2NwRvcc47343Khek7Ba.CZ   |     1 |     3
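The three captures can be compared mechanically. The sketch below parses rows in the table format used above (qtype 1 = A, 2 = NS; rcode 3 = NXDOMAIN) and reports whether the first query was minimised, whether the resolver fell back to the full name, and whether the source address or the .CZ server changed. Names are lowercased before comparison because the ODVR capture shows mixed-case query names, which we assume come from query-name case randomisation. The function names and output keys are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Row(NamedTuple):
    time: float
    src: str
    server: str
    qname: str
    qtype: int
    rcode: int


def parse_rows(table: str) -> List[Row]:
    """Parse data lines of a `time | src | server | qname | qtype | rcode` table."""
    rows = []
    for line in table.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if len(cells) != 6 or not cells[0][:1].isdigit():
            continue  # header, separator or empty line
        t, src, server, qname, qtype, rcode = cells
        rows.append(Row(float(t), src, server, qname.lower().rstrip("."),
                        int(qtype), int(rcode)))
    return sorted(rows)  # chronological order


def describe(rows: List[Row], full_name: str) -> Dict[str, bool]:
    if not rows:
        raise ValueError("no captured queries")
    full_name = full_name.lower().rstrip(".")
    minimised_first = rows[0].qname != full_name
    return {
        "minimised_first": minimised_first,
        "fallback_to_full_name": minimised_first and any(r.qname == full_name for r in rows),
        "source_changed": len({r.src for r in rows}) > 1,
        "server_changed": len({r.server for r in rows}) > 1,
    }


odvr = """
 1636449080.394062 | 217.31.204.137           | cra-cz    | S3fmT2nWrvcc47343KheK7BA.cZ            |     2 |     3
 1636449080.401786 | 2001:1488:800:400::2:137 | cecolo-cz | l5.L4.L3.s3fmt2NwRvcc47343Khek7Ba.CZ   |     1 |     3
"""
print(describe(parse_rows(odvr), "l5.l4.l3.s3fmt2nwrvcc47343khek7ba.cz"))
# all four flags are True for the ODVR capture
```

Applied to the Cloudflare capture, the same check reports a minimised first query with no fallback, which matches the “strict” behaviour described above.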

Conclusions

In this report, we demonstrated that our ML-based approach is a reliable method for passive detection of QNAME-minimising resolvers. However, under certain circumstances it is not possible to clearly identify whether a resolver supports QNAME minimisation. In particular, the prediction may be distorted if:

  • there are many NS/A queries for domain names of 2 labels (typical of scanners, but possibly also triggered by a user behind the resolver)
  • there are many DNS resolvers behind one IP address
  • there is spoofed DNS traffic using a given source IP address
  • there are too few queries from a resolver

Nevertheless, we believe that the accuracy achieved by our classifiers (at least 90% on the test set) is sufficient to draw inferences.

Our study revealed that QNAME minimisation is widely deployed by big DNS operators. QNAME minimisation is a relatively new feature, but the latest versions of most popular DNS resolver software support it, sometimes only after it has been explicitly enabled in the configuration. Knot Resolver, a state-of-the-art DNS resolver developed by CZ.NIC, enables it by default.

References

1. Stéphane Bortzmeyer. 2016. DNS Query Name Minimisation to Improve Privacy. RFC 7816. https://doi.org/10.17487/RFC7816
2. Stéphane Bortzmeyer, Ralph Dolmans, and Paul E. Hoffman. 2021. DNS Query Name Minimisation to Improve Privacy. RFC 9156. https://doi.org/10.17487/RFC9156
3. Geoff Huston. 2020. DNS query privacy revisited. APNIC Blog. https://blog.apnic.net/2020/09/11/dns-query-privacy-revisited/
© CZ.NIC, z.s.p.o., 
ADAM is an R&D project that tries to get the most of the big data generated by DNS and other services operated by CZ.NIC.