Show that if $\mathcal{H}$ is PAC learnable in the standard one-oracle model, then $\mathcal{H}$ is PAC learnable in the two-oracle model


This is Question 9.1 from Chapter 3 of Understanding Machine Learning. It goes like this:

Consider a variant of the PAC model in which there are two example oracles: one that generates positive examples and one that generates negative examples, both according to the underlying distribution $\mathcal{D}$ on $\mathcal{X}$. Formally, given a target function $f : \mathcal{X} \to \{0,1\}$, let $\mathcal{D}^+$ be the distribution over $\mathcal{X}^+ = \{x \in \mathcal{X}: f(x) = 1\}$ defined by $\mathcal{D}^+(A) = \frac{\mathcal{D}(A)}{\mathcal{D}(\mathcal{X}^+)}$ for every $A \subseteq \mathcal{X}^+$. Similarly, $\mathcal{D}^-$ is the distribution over $\mathcal{X}^{-}$ induced by $\mathcal{D}$.

The definition of PAC learnability in the two-oracle model is the same as the standard definition of PAC learnability, except that here the learner has access to $m^{+}_{\mathcal{H}}(\epsilon, \delta)$ i.i.d. examples from $\mathcal{D}^+$ and $m^{-}_{\mathcal{H}}(\epsilon, \delta)$ i.i.d. examples from $\mathcal{D}^{-}$. The learner's goal is to output $h$ such that, with probability at least $1-\delta$ (over the choice of the two training sets, and possibly over the nondeterministic decisions made by the learning algorithm), both $L_{(\mathcal{D}^+,f)}(h) \leq \epsilon$ and $L_{(\mathcal{D}^-,f)}(h) \leq \epsilon$.
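
For concreteness, here is a minimal Python sketch of the two oracles, with $\mathcal{D}^+$ and $\mathcal{D}^-$ simulated from $\mathcal{D}$ by rejection sampling (resample until the label matches); `sample_D` and `f` below are made-up placeholders for illustration, not anything from the book:

```python
import random

def sample_D():
    # Placeholder sampler for the underlying distribution D on X;
    # here X = {0, 1, ..., 9} with a non-uniform distribution.
    return random.choices(range(10), weights=[1, 2, 3, 1, 1, 2, 4, 1, 3, 2])[0]

def f(x):
    # Placeholder target function: label 1 iff x is even.
    return 1 if x % 2 == 0 else 0

def sample_D_plus():
    # D^+ is D conditioned on f(x) = 1, simulated by rejection sampling.
    while True:
        x = sample_D()
        if f(x) == 1:
            return x

def sample_D_minus():
    # D^- is D conditioned on f(x) = 0.
    while True:
        x = sample_D()
        if f(x) == 0:
            return x

# A two-oracle learner receives m_plus draws from D^+ and m_minus draws from D^-.
S_plus  = [sample_D_plus()  for _ in range(50)]
S_minus = [sample_D_minus() for _ in range(50)]
```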

I am trying to prove that if $\mathcal{H}$ is PAC learnable in the standard one-oracle model, then $\mathcal{H}$ is PAC learnable in the two-oracle model. My attempt so far:

Note that $$L_{(\mathcal{D},f)}(h) = \mathcal{D}(\mathcal{X}^+)L_{(\mathcal{D}^+,f)}(h) + \mathcal{D}(\mathcal{X}^{-})L_{(\mathcal{D}^-,f)}(h).$$ Let $d = \min \{ \mathcal{D}(\mathcal{X}^+), \mathcal{D}(\mathcal{X}^-)\}$. Then if $m \geq m_\mathcal{H}(\epsilon d, \delta)$, it is clear that: $$\mathbb{P}[L_{(\mathcal{D},f)}(h)\leq \epsilon d] \geq 1-\delta \implies \mathbb{P}[L_{(\mathcal{D}^+,f)}(h)\leq \epsilon] \geq 1-\delta$$ and $$\mathbb{P}[L_{(\mathcal{D},f)}(h)\leq \epsilon d] \geq 1-\delta \implies \mathbb{P}[L_{(\mathcal{D}^-,f)}(h)\leq \epsilon] \geq 1-\delta.$$
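
Spelling out why these implications hold (this uses only the decomposition above, the fact that $d \leq \mathcal{D}(\mathcal{X}^+)$ and $d \leq \mathcal{D}(\mathcal{X}^-)$, and the assumption that $d > 0$): whenever $L_{(\mathcal{D},f)}(h) \leq \epsilon d$,

$$\mathcal{D}(\mathcal{X}^+)\, L_{(\mathcal{D}^+,f)}(h) \;\leq\; L_{(\mathcal{D},f)}(h) \;\leq\; \epsilon d \;\leq\; \epsilon\, \mathcal{D}(\mathcal{X}^+) \quad\Longrightarrow\quad L_{(\mathcal{D}^+,f)}(h) \leq \epsilon,$$

and the same argument with $\mathcal{X}^-$ in place of $\mathcal{X}^+$ gives $L_{(\mathcal{D}^-,f)}(h) \leq \epsilon$.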

So we know that if we have $m \geq m_{\mathcal{H}}(\epsilon d, \delta)$ samples drawn i.i.d. from $\mathcal{D}$, then we can guarantee $\mathbb{P}[L_{(\mathcal{D}^+,f)}(h)\leq \epsilon] \geq 1-\delta$ and $\mathbb{P}[L_{(\mathcal{D}^-,f)}(h)\leq \epsilon] \geq 1-\delta$.
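
As a quick numerical sanity check of the decomposition itself, here is a toy finite domain with a made-up distribution, target, and hypothesis (all names are illustrative only):

```python
import numpy as np

# Toy finite domain X = {0, ..., 9} with distribution D, target f, and some hypothesis h.
X = np.arange(10)
D = np.array([1, 2, 3, 1, 1, 2, 4, 1, 3, 2], dtype=float)
D /= D.sum()
f = (X % 2 == 0).astype(int)          # target labels
h = (X < 6).astype(int)               # an arbitrary hypothesis

D_plus_mass  = D[f == 1].sum()        # D(X^+)
D_minus_mass = D[f == 0].sum()        # D(X^-)

# True error under D, and conditional errors under D^+ and D^-.
L_D       = D[h != f].sum()
L_D_plus  = D[(h != f) & (f == 1)].sum() / D_plus_mass
L_D_minus = D[(h != f) & (f == 0)].sum() / D_minus_mass

# The decomposition L_D = D(X^+) L_{D^+} + D(X^-) L_{D^-} should hold exactly.
assert np.isclose(L_D, D_plus_mass * L_D_plus + D_minus_mass * L_D_minus)
```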

How do I choose $m_{\mathcal{H}}^+(\epsilon, \delta)$ and $m_{\mathcal{H}}^-(\epsilon, \delta)$ such that, if we have $m^+ \geq m_{\mathcal{H}}^+(\epsilon, \delta)$ samples drawn i.i.d. according to $\mathcal{D}^+$ and $m^- \geq m_{\mathcal{H}}^-(\epsilon, \delta)$ samples drawn i.i.d. according to $\mathcal{D}^{-}$, then we can guarantee $\mathbb{P}[L_{(\mathcal{D}^+,f)}(h)\leq \epsilon] \geq 1-\delta$ and $\mathbb{P}[L_{(\mathcal{D}^-,f)}(h)\leq \epsilon] \geq 1-\delta$?

When is drawing $m^+$ samples according to $\mathcal{D}^+$ and $m^{-}$ samples according to $\mathcal{D}^-$ the same as drawing $(m^+ + m^-)$ samples according to $\mathcal{D}$?
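
One way to build intuition for this last question is to simulate it: draw $m$ points from $\mathcal{D}$ and split them by label; the number of positives is $\mathrm{Binomial}(m, \mathcal{D}(\mathcal{X}^+))$, and conditioned on that count the positives are i.i.d. from $\mathcal{D}^+$ (likewise for the negatives). The sketch below (same made-up toy setup as above) only illustrates this split; it is not the reduction itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Same toy setup as above: finite domain, distribution D, target f.
X = np.arange(10)
D = np.array([1, 2, 3, 1, 1, 2, 4, 1, 3, 2], dtype=float)
D /= D.sum()
f = (X % 2 == 0).astype(int)

m = 100
S = rng.choice(X, size=m, p=D)        # m i.i.d. draws from D
S_plus  = S[f[S] == 1]                # the positive examples in S
S_minus = S[f[S] == 0]                # the negative examples in S

# len(S_plus) ~ Binomial(m, D(X^+)); given its length, S_plus is i.i.d. from D^+.
print(len(S_plus), len(S_minus))
```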