In a machine learning system, why use differentially private SGD if our input data is already perturbed by a DP mechanism?

I’m trying to implement my own version of a deep neural network with differential privacy to preserve the privacy of the parties involved in the training dataset.

I’m using the method proposed by Abadi et al. in their seminal paper Deep Learning with Differential Privacy as the basis of my implementation, but I have trouble understanding one thing in this paper. They propose a differentially private SGD optimisation algorithm and use a privacy accountant to keep track of the privacy budget spent in each iteration. All of this makes sense: every time you query the data, you need to add controlled noise to the result to mitigate the risk of leakage. But before they begin the training process, they add a differentially private PCA layer and filter their data through it.
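
For concreteness, here is a minimal NumPy sketch of how I understand a single DP-SGD update: clip each per-example gradient, sum the clipped gradients, add Gaussian noise, then average. The function name dp_sgd_step and its parameters (clip_norm, noise_multiplier, and so on) are my own illustrative choices rather than anything taken from the paper's code, and the privacy accountant is left out entirely.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One noisy gradient step on a batch (illustrative sketch, no accounting).

    per_example_grads: array of shape (batch_size, num_params)
    clip_norm:         L2 bound C applied to each per-example gradient
    noise_multiplier:  sigma; the noise standard deviation is sigma * C
    """
    batch_size = per_example_grads.shape[0]

    # L2 norm of each per-example gradient, shape (batch_size, 1).
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)

    # Scale each gradient down so that its norm is at most clip_norm.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # Add Gaussian noise to the sum of clipped gradients, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean_grad = (clipped.sum(axis=0) + noise) / batch_size

    return params - lr * noisy_mean_grad


# Hypothetical usage with random stand-in gradients.
rng = np.random.default_rng(0)
params = np.zeros(4)
grads = rng.normal(size=(8, 4))
params = dp_sgd_step(params, grads, lr=0.1, clip_norm=1.0,
                     noise_multiplier=1.1, rng=rng)
```

The noise here is added to the gradients computed from the data, not to the input data itself, which is part of why I'm unsure how the DP-PCA step fits in.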

My confusion is about why we need DP-SGD after this (or, the other way around, why we need DP-PCA when we’re already ensuring DP with DP-SGD). By the post-processing principle, if a mechanism is (epsilon)-DP, any function applied to the output of that mechanism is also (epsilon)-DP. Since we’re already applying an (epsilon)-differentially private PCA mechanism to our data, why do we need the whole DP-SGD process after that? I understand the problem with local DP and why it’s much more efficient to apply global DP to the model instead of the training data. But if we’re already applying DP during the training phase, is it really necessary for the PCA to be DP as well, or could we have just used normal PCA or another dimensionality reduction method?