I have already read PAC learning of axis-aligned rectangles and understand every other part of the example.

From *Foundations of Machine Learning* by Mohri, 2nd ed., p. 13 (book) or p. 30 (PDF), I am struggling to understand the following sentence of Example 2.4, which is apparently the result of a contrapositive argument:

… if $ R(\text{R}_S) > \epsilon$ , then $ \text{R}_S$ must miss at least one of the regions $ r_i$ , $ i \in [4]$ .

i.e., $ i = 1, 2, 3, 4$ . Could someone please explain why this is the case?

The way I see it is this: given $ \epsilon > 0$ , if $ R(\text{R}_S) > \epsilon$ , then $ \mathbb{P}_{x \sim D}(\text{R}\setminus \text{R}_S) > \epsilon$ . We also know from this stage of the proof that $ \mathbb{P}_{x \sim D}(\text{R}) > \epsilon$ as well. Beyond this, I’m not sure how the sentence above is reached.