First publication of this article on 18 November 2022
The fediverse is supposed to be decentralized. But the fact that the network has a decentralized architecture does not mean that it is perfectly decentralized in practice. We survey here the DNS authoritative servers for the domains used by the fediverse instances. Unfortunately, yes, they are too concentrated.
Mostly because of the incredibly childish behaviour of
Elon Musk, the current wave of migration from
Twitter to the
fediverse is much larger than the previous
ones. As a result, there is a renewed interest in the
fediverse, a system which is far from
recent. The fediverse is made of various
instances, each one being independently
administered, both from a technical point of view, and from a
political one. Nothing new or extraordinary, this is how the
Internet was architectured, and how its
services worked, before a few centralized US-based giants distorted
the vision of many people, making them believe that centralization
is normal. Each fediverse instance has a domain
name and this domain name is hosted on a set of
authoritative name servers, which will reply to
the DNS requests of
the clients, the DNS resolvers. As an example, the instance I use,
mastodon.gougere.fr
is in the domain
gougere.fr
, whose set of authoritative name
servers is currently composed of three
servers. Unix users can for instance
display them with dig :
% dig +short NS gougere.fr nsb.bookmyname.com. nsa.bookmyname.com. nsc.bookmyname.com.
Note you can also perform DNS queries on the fediverse itself, as explained here (see the current result).
A domain "works", for the users, only if the authoritative name servers reply, and reply properly. This makes them critical infrastructure for an instance. If your DNS authoritative servers fail, or lie, the instance is unusable, even if the Pleroma, Mastodon or other server still works fine. So, what are the DNS servers used for this critical role in the current fediverse?
To get data, I started from a list of current instances. I got one
from instances.social
. Note
that, on a decentralized network like the fediverse, you cannot get
an official and complete list of all instances. All lists are
imperfect. But I have to start from somewhere (other possibilities:
http://demo.fedilist.com/instance
or the public
list of peers of a big instance such as https://mastodon.social/api/v1/instance/peers
). So, I created an
authentication token on instances.social
, and
retrieved a list with curl:
% curl -s -H "Authorization: Bearer MY-TOKEN" https://instances.social/api/1.0/instances/list\?count=10000\&include_down=false > list.json
I included only live instances
(include_down=false
) but, still, some instance
names already were missing from the DNS.
The resulting list was in the JSON format and processed by two Python scripts (see the code later). The first one queried the DNS (with the excellent dnspython library) to get information such as the list of authoritative name servers and their IP addresses. The second script processed the JSON data of the first to output meaningful statistics. Of course, like with any quantitative survey, the hard problem is often which questions to ask, rather than the answers.
Let's start with the actual numbers:
%./nameservers-fediverse-analyze.py There are 1596 instances
Remember that the list is incomplete (no one knows how many instances
really live). Some of these instances are grouped into the same
registered domain (the domain you bought from a
registry, may be through a
registrar). I choosed to group the instance
names in the DNS zones (a set of contiguous names hosted on the same
set of name servers). For instance,
medievalist.masto.host
and
piano.masto.host
are in the same zone,
masto.host
, they have the same servers. There
are a bit less zones than instance names:
There are 1596 instances in 1540 zones. The largest zone, masto.host., encompasses 10 instances
Except masto.host
, there are few fediverse
hosters with such subdomains, so no sign of
concentration/centralisation here. There is no giant fediverse
hoster.
Now, the name servers themselves:
There are 2062 nameservers. The largest one, dns2.registrar-servers.com., hosts 85 instances.
Some name servers are more common, such as this
dns2.registrar-servers.com
, used by a big DNS
hoster. But it is still only for a small minority of instances.
But wait, name servers come in groups, called sets. The instance administrator, except if he or she is a DNS fan, typically does not cherry-pick his or her name servers, they use a set indicated by the DNS hoster. Let's study sets instead:
There are 951 nameserver sets. The largest one, dns1.registrar-servers.com.;dns2.registrar-servers.com., hosts 85 instances. The top 10 % hosts 677 instances.
There are less sets than instances (a sign of a small centralisation) but not by a large margin. The biggest set is not so big but we note that the top tenth of the sets hosts more than a third of the instances. There are not so many DNS providers.
But there is a catch. Some DNS hosters use a lot of names so you
may think there are many various sets but they actually depend on
one company, a bad thing for decentralisation. The typical example
is Cloudflare, with a lot of cool names
depending on the domain they host
(eleanor.ns.cloudflare.com
,
sue.ns.cloudflare.com
, etc). So, we have to
group the name servers by company. One of the simplest ways is to find
the registered domain of the name servers' names. To do so, we use
the Public Suffix List,
which is not authoritative and not perfect but sufficient for us. It
will put all the Cloudflare names into one bucket, the registered
domain cloudflare.com
:
There are 667 nameserver's domains. The largest one, cloudflare.com, hosts 366 instances. The top 10 % hosts 1324 instances
This time, we have a clear sign of centralisation. A lot of seemingly independent instances have a shared provider, Cloudflare. If Cloudflare breaks, or does evil things, it will affect many instances. And the top 10 % of hosters serves the overwhelming majority of fediverse instances. For your information, the biggest ones are:
cloudflare.com
: 366registrar-servers.com
: 91gandi.net
: 90domaincontrol.com
: 55googledomains.com
: 49digitalocean.com
: 48linode.com
: 39inwx.de
: 23inwx.eu
: 23
(It can be noticed that, unlike the fediverse itself, which uses a
lot of "new TLDs" such as
.social
, the name servers are faithful to the
old TLDs.) Playing with names, as seen in the Cloudflare case,
complicated things. We have also the case of
AWS which does not appear on the above list
but they should; they use a lot of registered domains
(awsdns-46.co.uk
,
awsdns-20.com
, etc) so they appear as many
different hosters. Also, about Cloudflare, remember that this
survey is only about the DNS: I did not try to
see where the server instance is hosted. I repeat: just DNS, no HTTP
was used or considered here. (One also could note that some
authoritative name servers are installed at one big machine hoster,
which gives Cloudflare an even
more prominent place.)
So, as you can see, but this is hardly a surprise, the fediverse is not perfect and there are signs of centralisation, at least as far as DNS is concerned.
The two programs used for this article are:
nameservers-fediverse-gather.py
:
it gathers information from the DNS and produces a JSON file
storing, for each instance, its DNS zone, the name servers and
their IP addresses (which were not used for this article). To find
the DNS zone, it just climbs the DNS tree until it founds
name server records.nameservers-fediverse-analyze.py
:
it reads the JSON file produced by the first program and displays
statistics.Among the weaknesses of this analysis, you'll note that our
instances are considered equal. But there are very small and very big
instances. It could be useful to tweak results depending on the size
of the instance. instances.social
gives us this
information but it has to be handled with care, it is just a declaration
from the instance (and many accounts may be inactive, anyway).
Version PDF de cette page (mais vous pouvez aussi imprimer depuis votre navigateur, il y a une feuille de style prévue pour cela)
Source XML de cette page (cette page est distribuée sous les termes de la licence GFDL)