I am new in this field. In the book Foundations of Machine Learning by Mehryar Mohri, Afshi Rostamizadeh, and Ameet Talwalkar. The authors define the generalized error of a hypothesis as

$ R(h) = \Pr_{x \sim D}[h(x) \neq c(x)] = E_{x \sim D}[1_{h(x) \neq c(x)}]$

which is a number. However in the Definition 2.3 a concep class $ C$ is said to be PAC-learnable if

$ Pr_{S\sim D^{m}}[R(h_{S}) \leq \varepsilon ] \geq 1 – \delta$

for any $ m \geq poly(1/\varepsilon,1/\delta,n, size(c)) $ . Since by definition $ R(h) = \Pr_{x \sim D}[h(x) \neq c(x)]$ is a number, What does this mean $ Pr_{S\sim D^{m}}[R(h_{S}) \leq \varepsilon ] $ and $ h_{S}$ in formal terms? The authors explain that $ h_{S}$ depends on the sample $ S$ , but the question is how?

Another reason I have this question is due to the fact that every $ h$ is defined as a function $ h: \mathcal{X} \to \mathcal{Y}$ and the distribution $ D$ is defined on $ \mathcal{X}$ , not on the *hypothesis set* $ H$ where for each $ h \in \mathcal{H}$ , $ h: \mathcal{X} \to \mathcal{Y}$ .