If an agent controls a few middle Tor relays (Am of them) and a few exit Tor relays (Ae of them), could they obtain the original traffic of some circuits with reasonable probability?
Let’s assume, without too much loss of generality, that Tor only uses middle-middle-exit circuits, that there are M middle relays and E exit relays in total, and that each relay in a circuit is picked uniformly at random, with the two middle relays being distinct.
The probability that such a circuit consists only of relays this agent controls is then:
P = Am/M * (Am - 1)/(M - 1) * Ae/E
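As a sanity check on the formula, here is a minimal Python sketch of it. The function name `full_circuit_probability` and its parameter names are my own, and it relies on the same simplification as above: uniform selection of two distinct middle relays followed by one exit relay.

```python
from fractions import Fraction


def full_circuit_probability(a_m, a_e, m, e):
    """Probability that a middle-middle-exit circuit is fully attacker-controlled.

    a_m, a_e: attacker-controlled middle and exit relays (Am, Ae)
    m, e:     total middle and exit relays (M, E)
    Assumes uniform selection; the two middle relays are drawn without replacement.
    """
    first_middle = Fraction(a_m, m)
    second_middle = Fraction(a_m - 1, m - 1)
    exit_relay = Fraction(a_e, e)
    return first_middle * second_middle * exit_relay


# Edge cases: owning every relay gives probability 1,
# owning a single middle relay gives 0 (two distinct middles are needed).
print(full_circuit_probability(5, 4, 5, 4))
print(full_circuit_probability(1, 4, 5, 4))
```

Exact fractions are used only to keep the edge cases free of rounding; `float(...)` on the result gives the usual decimal value.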
According to Tor Metrics, there are just short of 7000 relays in total, with almost 2000 of them being exit relays. I will round these figures to 7000 relays in total and 2000 exit relays, which leaves 7000 - 2000 = 5000 middle relays.
Assuming the attacker owns 10 middle relays and 10 exit relays, the probability of them getting to control the whole circuit is
P = 10/5000 * 9/4999 * 10/2000 ~= 1.8e-8
which is very low. However, an enormous number of Tor circuits is being established (I could not find a reliable figure anywhere and will gladly edit one in if someone has it). Once you factor that in, wouldn’t this agent consistently end up with some complete circuits running through their relays and, as a consequence, have complete access to the data those circuits were relaying?
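To make the question concrete, here is a rough Python sketch of what the per-circuit probability implies over many circuits. The circuit counts are purely hypothetical placeholders, since I have no real figure, and the circuits are assumed to be independent of each other.

```python
import math

# Per-circuit probability from the figures above (10 middle, 10 exit relays).
P_FULL_CIRCUIT = 10 / 5000 * 9 / 4999 * 10 / 2000


def expected_compromised(n_circuits, p=P_FULL_CIRCUIT):
    """Expected number of fully attacker-controlled circuits among n_circuits."""
    return n_circuits * p


def prob_at_least_one(n_circuits, p=P_FULL_CIRCUIT):
    """P(at least one fully controlled circuit) = 1 - (1 - p)^n.

    Computed via log1p/expm1 to avoid rounding problems for tiny p.
    """
    return -math.expm1(n_circuits * math.log1p(-p))


# Hypothetical circuit counts, NOT measured values.
for n in (10**6, 10**8, 10**9):
    print(f"{n:>13,d} circuits: expected {expected_compromised(n):8.3f}, "
          f"P(at least one) = {prob_at_least_one(n):.4f}")
```

Under these assumptions the expected number of fully controlled circuits simply grows linearly with the number of circuits built, so a probability that looks negligible per circuit stops being negligible at scale.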
I understand that some of the data through the circuits would also be using TLS, but at least some of it should be plaintext.
It may also be worth pointing out that if this is a really well-funded agent, they might have substantially more than 20 relays at their disposal.
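A short Python sketch of how the per-circuit probability scales with the number of relays the agent runs. The relay counts other than 10 are hypothetical, and for simplicity M = 5000 and E = 2000 are held fixed (i.e. the agent's relays are counted as part of those totals).

```python
M_TOTAL = 5000  # total middle relays (rounded figure from above)
E_TOTAL = 2000  # total exit relays (rounded figure from above)


def full_circuit_probability(a_m, a_e, m=M_TOTAL, e=E_TOTAL):
    """Same uniform-selection formula: Am/M * (Am - 1)/(M - 1) * Ae/E."""
    return a_m / m * (a_m - 1) / (m - 1) * a_e / e


# Hypothetical attacker sizes: the same number of middle and exit relays each.
for relays in (10, 100, 1000):
    p = full_circuit_probability(relays, relays)
    print(f"{relays:>5d} middle + {relays:>5d} exit relays: P = {p:.3e}")
```

Because the probability is a product of three fractions, it grows roughly with the cube of the agent's relay count: ten times as many relays of each kind gives about a thousand times the per-circuit probability.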