Show that if $\mathcal{H}$ is PAC learnable in the standard one-oracle model, then $\mathcal{H}$ is PAC learnable in the two-oracle model

This is Exercise 9.1 from Understanding Machine Learning, Chapter 3. It goes like this:

Consider a variant of the PAC model in which there are two example oracles: one that generates positive examples and one that generates negative examples, both according to the underlying distribution $$\mathcal{D}$$ on $$\mathcal{X}$$. Formally, given a target function $$f : \mathcal{X} \to \{0,1\}$$, let $$\mathcal{D}^+$$ be the distribution over $$\mathcal{X}^+ = \{x \in \mathcal{X}: f(x) = 1\}$$ defined by $$\mathcal{D}^+(A) = \frac{\mathcal{D}(A)}{\mathcal{D}(\mathcal{X}^+)}$$, for every $$A \subseteq \mathcal{X}^+$$. Similarly, $$\mathcal{D}^-$$ is the distribution over $$\mathcal{X}^{-}$$ induced by $$\mathcal{D}$$.

The definition of PAC learnability in the two-oracle model is the same as the standard definition of PAC learnability except that here the learner has access to $$m^{+}_{\mathcal{H}}(\epsilon, \delta)$$ i.i.d. examples from $$\mathcal{D}^+$$ and $$m^{-}_{\mathcal{H}}(\epsilon, \delta)$$ i.i.d. examples from $$\mathcal{D}^{-}$$. The learner's goal is to output $$h$$ such that, with probability at least $$1-\delta$$ (over the choice of the two training sets, and possibly over the nondeterministic decisions made by the learning algorithm), both $$L_{(\mathcal{D}^+,f)}(h) \leq \epsilon$$ and $$L_{(\mathcal{D}^-,f)}(h) \leq \epsilon$$.
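To check that I am reading the definitions correctly, here is a toy example of my own (not from the book). Take $$\mathcal{X} = \{1,2,3,4\}$$ with $$\mathcal{D}(1)=0.1$$, $$\mathcal{D}(2)=0.2$$, $$\mathcal{D}(3)=0.3$$, $$\mathcal{D}(4)=0.4$$, and $$f(x) = 1$$ iff $$x \in \{1,2\}$$. Then $$\mathcal{X}^+ = \{1,2\}$$ with $$\mathcal{D}(\mathcal{X}^+) = 0.3$$, so $$\mathcal{D}^+(\{1\}) = 1/3$$ and $$\mathcal{D}^+(\{2\}) = 2/3$$; similarly $$\mathcal{X}^- = \{3,4\}$$ with $$\mathcal{D}(\mathcal{X}^-) = 0.7$$, so $$\mathcal{D}^-(\{3\}) = 3/7$$ and $$\mathcal{D}^-(\{4\}) = 4/7$$.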

I am trying to prove that if $$\mathcal{H}$$ is PAC learnable in the standard one-oracle model, then $$\mathcal{H}$$ is PAC learnable in the two-oracle model. My attempt so far:

Note that $$L_{(\mathcal{D},f)}(h) = \mathcal{D}(\mathcal{X}^+)L_{(\mathcal{D}^+,f)}(h) + \mathcal{D}(\mathcal{X}^-)L_{(\mathcal{D}^-,f)}(h).$$ Let $$d = \min \{ \mathcal{D}(\mathcal{X}^+), \mathcal{D}(\mathcal{X}^-)\}$$ and assume $$d > 0$$. Then if $$m \geq m_\mathcal{H}(\epsilon d, \delta)$$, it is clear that $$\mathbb{P}[L_{(\mathcal{D},f)}(h)\leq \epsilon d] \geq 1-\delta \implies \mathbb{P}[L_{(\mathcal{D}^+,f)}(h)\leq \epsilon] \geq 1-\delta$$ and $$\mathbb{P}[L_{(\mathcal{D},f)}(h)\leq \epsilon d] \geq 1-\delta \implies \mathbb{P}[L_{(\mathcal{D}^-,f)}(h)\leq \epsilon] \geq 1-\delta.$$
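To spell out the step I am using (my own reasoning, and it relies on the assumption $$d > 0$$, i.e. both classes have positive mass under $$\mathcal{D}$$): since both conditional losses are nonnegative, $$L_{(\mathcal{D},f)}(h) = \mathcal{D}(\mathcal{X}^+)L_{(\mathcal{D}^+,f)}(h) + \mathcal{D}(\mathcal{X}^-)L_{(\mathcal{D}^-,f)}(h) \geq d \cdot L_{(\mathcal{D}^+,f)}(h),$$ so $$L_{(\mathcal{D},f)}(h) \leq \epsilon d$$ forces $$L_{(\mathcal{D}^+,f)}(h) \leq \epsilon$$, and symmetrically it forces $$L_{(\mathcal{D}^-,f)}(h) \leq \epsilon$$.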

So we know that if we have $$m \geq m_{\mathcal{H}}(\epsilon d, \delta)$$ samples drawn i.i.d. from $$\mathcal{D}$$, then we can guarantee $$\mathbb{P}[L_{(\mathcal{D}^+,f)}(h)\leq \epsilon] \geq 1-\delta$$ and $$\mathbb{P}[L_{(\mathcal{D}^-,f)}(h)\leq \epsilon] \geq 1-\delta$$.

How do I choose $$m_{\mathcal{H}}^+(\epsilon, \delta)$$ and $$m_{\mathcal{H}}^-(\epsilon, \delta)$$ such that if we have $$m^+ \geq m_{\mathcal{H}}^+(\epsilon, \delta)$$ samples drawn i.i.d. according to $$\mathcal{D}^+$$ and $$m^- \geq m_{\mathcal{H}}^-(\epsilon, \delta)$$ samples drawn i.i.d. according to $$\mathcal{D}^{-}$$, then we can guarantee $$\mathbb{P}[L_{(\mathcal{D}^+,f)}(h)\leq \epsilon] \geq 1-\delta$$ and $$\mathbb{P}[L_{(\mathcal{D}^-,f)}(h)\leq \epsilon] \geq 1-\delta$$?

When is drawing $$m^+$$ samples according to $$\mathcal{D}^+$$ and $$m^{-}$$ samples according to $$\mathcal{D}^-$$ the same as drawing $$(m^+ + m^-)$$ samples according to $$\mathcal{D}$$?