Host filesystem manipulation from Docker vs. a virtual machine

When reading about Docker, I found a part of the documentation describing the attack surface of the Docker daemon. As far as I understand, part of the argument is that it is possible to share (basically arbitrary) parts of the host filesystem with a container, where a privileged user in the container can then manipulate them. This seems to be used as an argument against granting unprivileged users direct access to the Docker daemon (see also this Security SE answer).
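Here is a minimal sketch of the kind of bind mount the documentation is concerned with. It uses the Docker SDK for Python (the `docker` package, assumed to be installed and able to reach the daemon). The `alpine` image, the `/host` mount point, and the helper names are arbitrary choices for illustration. Anyone who can make this call against the daemon can mount the host's `/` into a container that runs as root by default.

```python
def host_mount_spec(host_path: str = "/", container_path: str = "/host",
                    read_only: bool = True) -> dict:
    """Build the `volumes` mapping accepted by docker-py's containers.run()."""
    return {host_path: {"bind": container_path, "mode": "ro" if read_only else "rw"}}


def read_host_file(relative_path: str = "etc/sudoers") -> bytes:
    """Start a throwaway container with the host root bind-mounted and cat a file.

    Requires the `docker` package and access to the Docker daemon socket.
    The alpine image runs as root by default, so (absent user-namespace
    remapping or an LSM policy) host permissions that stop an unprivileged
    host user do not stop the container.
    """
    import docker  # imported lazily so host_mount_spec() works without the SDK

    client = docker.from_env()
    return client.containers.run(
        "alpine",
        ["cat", f"/host/{relative_path}"],
        volumes=host_mount_spec(),
        remove=True,  # delete the container once the command exits
    )
```

Calling `read_host_file().decode()` would print the host's `/etc/sudoers`. Passing `read_only=False` to `host_mount_spec` makes the mount writable, which is the "manipulated" case the documentation warns about.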

Would the same be possible from a virtual machine, e.g. in VirtualBox, running on the host as an unprivileged user?

In a quick test, trying to read /etc/sudoers on a Linux host from a Linux guest running in VirtualBox produced a permission error. However, I am by no means an expert in this area, and the testing was not exhaustive.

Is unary machine code a concept?

Please assume, for the sake of this question, that humans can fluently read and understand machine languages and that time isn't a problem in that regard.

I am not a computer scientist, but I would theorize that a unary machine language is possible, just much "less comfortable" than binary machine language.
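One way to make the idea concrete is the following hypothetical sketch, not an established unary instruction set. With only one symbol there is no separator between instructions, so a unary string can carry information only through its length. A whole binary program can therefore be read as a number N and written as N copies of `1`:

```python
def to_unary(binary_program: str) -> str:
    """Encode a binary string as a run of '1's whose length carries all the information.

    A sentinel '1' is prepended before converting to an integer so that
    leading zeros in the program are not lost.
    """
    if any(c not in "01" for c in binary_program):
        raise ValueError("expected a string of '0' and '1'")
    n = int("1" + binary_program, 2)
    return "1" * n


def from_unary(unary_program: str) -> str:
    """Recover the binary program from the length of the unary string."""
    if set(unary_program) - {"1"}:
        raise ValueError("a unary string may only contain '1'")
    return bin(len(unary_program))[3:]  # drop the '0b' prefix and the sentinel '1'


program = "00101101"  # an arbitrary 8-bit example, not a real instruction encoding
encoded = to_unary(program)
print(len(program), len(encoded))  # 8 symbols in binary vs 301 in unary
assert from_unary(encoded) == program
```

Under this encoding an n-bit binary program becomes a unary string of length between 2^n and 2^(n+1) - 1. That exponential blow-up is one concrete form of "less comfortable".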


In a machine learning system, why use differentially private SGD if our input data is already perturbed by a DP mechanism?

I’m trying to implement my own version of a deep neural network with differential privacy to preserve the privacy of the parties involved in the training dataset.

I'm basing my implementation on the method proposed by Abadi et al. in their seminal paper "Deep Learning with Differential Privacy", but I have trouble understanding one thing in it. They propose a differentially private SGD optimisation function and use an accountant to keep track of the privacy budget spent during each iteration. All of this makes sense: every time you query the data, you need to add controlled noise to mitigate the risk of leakage. But before they begin the training process, they add a differentially private PCA layer and filter their data through it.
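For reference, here is a minimal NumPy sketch of one DP-SGD step in the style described in that paper. Per-example gradients are clipped to an L2 norm of at most C, summed, perturbed with Gaussian noise of standard deviation σC, and averaged over the lot. The logistic-regression loss, the function names, and the default values of C (`clip_norm`) and σ (`noise_multiplier`) are assumptions for illustration. The privacy accountant that tracks the cumulative budget is not shown.

```python
import numpy as np


def per_example_grads(w, X, y):
    """Gradients of the logistic loss for each example separately (shape: n x d)."""
    p = 1.0 / (1.0 + np.exp(-X @ w))  # predicted probabilities
    return (p - y)[:, None] * X       # gradient of the negative log-likelihood, one row per example


def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD update on a lot (X, y)."""
    rng = np.random.default_rng() if rng is None else rng
    g = per_example_grads(w, X, y)
    # Clip each example's gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    g_clipped = g / np.maximum(1.0, norms / clip_norm)
    # Sum, add Gaussian noise scaled to the clipping bound, then average over the lot.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    g_noisy = (g_clipped.sum(axis=0) + noise) / len(X)
    return w - lr * g_noisy
```

The clipping bound is what makes the noise scale meaningful: it caps how much any single training record can move the summed gradient.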

My confusion is about why we need DP-SGD after this (or, the other way around, why DP-PCA when we're already ensuring DP with our DP-SGD method). By the post-processing principle, if a mechanism is ε-DP, any function applied to the output of that mechanism is also ε-DP. Since we're already applying an ε-differentially private PCA mechanism to our data, why do we need the whole DP-SGD process after that? I understand the problem with local DP and why it's much more efficient to apply global DP to the model instead of the training data. But if we're already applying DP during the training phase, is it really necessary for the PCA to be DP as well, or could we have just used normal PCA or another dimensionality reduction method?
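The post-processing property the question relies on can be stated formally as follows. This is the standard textbook statement, and the notation is illustrative rather than taken from the paper. M is a randomized mechanism, D and D' are neighbouring datasets, and f is any (possibly randomized) mapping that sees the data only through M(D):

```latex
% M: randomized mechanism; D \sim D': datasets differing in one record
% f: any (possibly randomized) mapping that accesses D only through M(D)
\Pr[M(D) \in S] \le e^{\varepsilon} \Pr[M(D') \in S]
  \quad \forall S,\ \forall D \sim D'
\;\;\Longrightarrow\;\;
\Pr[f(M(D)) \in T] \le e^{\varepsilon} \Pr[f(M(D')) \in T]
  \quad \forall T,\ \forall D \sim D'
```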

Should you let yourself SSH into every machine in your network?

I am wondering how I should set up my network (AWS) so that I can debug the different problems that might occur. Obviously there's logging, but it seems that at some point you might need to SSH into the machine of interest and look around. If so, it seems you would need to open port 22 on every machine in the network. To make this secure, I would configure the bastion host to accept connections only from my IP address, and every other machine to accept connections only from the bastion host on the internal network. Is this considered bad practice? If so, what is the right way to handle this situation?
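The two rules described above can be written down as the `IpPermissions` structures that boto3's `EC2.Client.authorize_security_group_ingress` accepts. This is a minimal sketch: the helper names, the security group ID, and the CIDR address are placeholders (203.0.113.0/24 is a documentation range). The code only pins down the proposed setup. It does not settle whether that setup is good practice.

```python
def bastion_ingress(admin_cidr: str) -> list[dict]:
    """Bastion host: allow SSH (tcp/22) only from the administrator's own address."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": admin_cidr, "Description": "admin workstation"}],
    }]


def internal_ingress(bastion_sg_id: str) -> list[dict]:
    """Internal hosts: allow SSH only from instances in the bastion's security group."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "UserIdGroupPairs": [{"GroupId": bastion_sg_id, "Description": "from bastion"}],
    }]


BASTION_SG = "sg-0123456789abcdef0"  # placeholder security group ID
ADMIN_CIDR = "203.0.113.7/32"        # placeholder: one address in a documentation range
bastion_rules = bastion_ingress(ADMIN_CIDR)
internal_rules = internal_ingress(BASTION_SG)
```

With credentials configured, each list would be passed as `IpPermissions=` to `boto3.client("ec2").authorize_security_group_ingress(GroupId=...)` for the matching group. Referencing the bastion's security group instead of its private IP keeps the internal rule valid if the bastion is replaced. On the client side, OpenSSH's `-J` (`ProxyJump`) option can hop through the bastion in one command.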

Speedup with a multi-head Turing machine

What sort of speedup can a Turing machine with more than one head give over a one-headed machine? (I do not mean multiple tapes; I mean multiple heads operating on the same string at the same time, making concurrent edits on different parts of the tape.)

I.e., what is the worst-case overhead for a one-head Turing machine to simulate a multi-head Turing machine as the number of heads grows?

This paper seems to say linear time, but the multi-head machines there have the additional property of a one-move shift operation (shift a given head to the position of some other given head).
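To make "overhead" concrete, here is a hypothetical cost model for the textbook naive simulation, not the construction in the paper. The single head keeps every head position marked on the tape. For each simulated step, it sweeps from the leftmost to the rightmost mark to read, then sweeps back to write and move the marks. Each head moves at most one cell per step, so the span grows by at most two cells per step. For a fixed number of heads this gives O(t²) single-head moves for t simulated steps. The function names and the "2 × span" cost are assumptions of the sketch.

```python
def naive_sweep_cost(positions_per_step):
    """Single-head moves needed to simulate each multi-head step by a full sweep.

    positions_per_step[i] lists the head positions before step i. The cost
    model (hypothetical): one pass from the leftmost to the rightmost head to
    read, one pass back to write and move the marks, i.e. 2 * span per step.
    """
    total = 0
    for positions in positions_per_step:
        span = max(positions) - min(positions) + 1
        total += 2 * span
    return total


def diverging_heads(t):
    """A bad case for this model: two heads start together and move apart."""
    return [[-i, i] for i in range(t)]


for t in (10, 100, 1000):
    print(t, naive_sweep_cost(diverging_heads(t)))  # equals 2 * t**2
```

The diverging case costs 2 · (1 + 3 + … + (2t − 1)) = 2t² moves, which is the quadratic gap that cleverer simulations, or extra operations like the one-move shift, aim to close.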