Methods to Prove Data Authenticity from Potentially Compromised Sources?

I’ve been thinking about this problem for some time and I wanted to ask if there are any known methods, or research papers, about how to prove "authenticity" or correctness of data originating from a potentially compromised source (remote server, process, etc). Specifically what I’ve been imagining is say you have service A and service B, service B sources data from A but is worried that A has been compromised such that even if data is signed by A, B can’t trust that it was generated by code written by A‘s developers. Is it possible for B to prove to itself that data from A is authentic, that it was indeed generated by the expected code and not injected or generated by an attacker who has compromised A?

One solution I’ve been thinking about is using a sort of distributed ledger or blockchain so that multiple nodes compute the same data, and in doing so raises the bar such that an attacker would have to compromise N% of the services producing the needed data, this provides naturally replication and I can use an appropriate consensus protocol, but ofc introduces some overhead, efficiency concerns, and I would need to think hard about side-effects being performed more than once.

If there is only one node possible of generating data, such as a sensor node, and it is compromised, I’d imagine all hope is lost, but I also wouldn’t be surprised if there is some clever crypto scheme that attempts to solve this problem as well.

I hope it’s clear as to what the question is, thank you.

How to prove that this problem is NP-complete (or NP-Hard)

I have the following data:

  • A set $ V$ of tasks, the starting time $ s_j$ of each task and the duration $ p_j$ of each task.

  • A set $ K$ of resource, each resource has an availability function $ R_{k}$ that is piecewise constant.That is, for each $ t = 0, .., T-1$ , we precise $ R_{k}(t)$ the number of units available at $ t$ . $ R_k$ is an array of length $ T$ .

  • Each task $ j$ needs $ r_{j,k}$ resources to be processed (it could be zero). This quantity needs to be available during all the processing time starting from $ s_j$ .

For example consider :

  • Task$ A$ has processing time $ 3$ and starts at time period $ t=2$ and needs 2 units of some resource $ k$
  • Task $ B$ has processing time $ 4$ starts at time $ t=3$ and needs 3 units of the same resource $ k$ .

Then if $ R_{k}(t) = [*,*,2,6,6,3,*,*]$ then we are ok since at time $ t=2$ only task $ A$ is active and it requires $ 2$ units, at time $ t=3$ , both tasks are active and the sum of their utilization is $ 2+3 = 5 \leq 6$ ; same at time $ t=4$ . At time $ t=4$ , only task $ B$ is active and it requires $ 3$ units.

However, if $ R_{k}(t) = [*,*,2,4,6,3,*,*]$ , is not ok since at time $ t=3$ , both tasks $ A$ and $ B$ are active and their total use is equal to $ 5$ wheras only $ 4$ units are available.

Here is my attempt to verify that the resource utilization at each $ t$ is no larger than the availability function. So the answer is yes or no (we can say that this is a decision problem).

For each time t in [0,T-1]   For each resource k in K      total_use = 0, active_set = A     for each task j in V       if s_j<=t and s_j+p_j > t and r_{j,k}>0 \if the task is active at time t and it requires positive amount of resource k in order to be processed)         total_use += r_{j,k}         active_set := active_set U {j}        if total_use > R_{k}(t)         print(at time t the usage of resource k exceeds its capacity, active_set)         return False return True 

The algorithm here is pseud-polynomial. Unfortunately, I need to find a polynomial one in order to say that the problem is in $ \mathcal{NP}$ .

How to prove that there is no algorithm with worst-case running time better than this one?

I have the following data:

  • A set $ V$ of tasks, the starting time $ s_j$ of each task and the duration $ p_j$ of each task.

  • A set $ K$ of resource, each resource has an availability function $ R_{k}$ that is piecewise constant.That is, for each $ t = 0, .., T-1$ , we precise $ R_{k}(t)$ the number of units available at $ t$ . $ R_k$ is an array of length $ T$ .

  • Each task $ j$ needs $ r_{j,k}$ resources to be processed (it could be zero). This quantity needs to be available during all the processing time starting from $ s_j$ .

Here is my attempt to verify that the resource utilization at each $ t$ is no larger than the availability function.

Algorithm

How to prove that the dual linear program of the max-flow linear program indeed is a min-cut linear program?

So the wikipedia page gives the following linear programs for max-flow, and the dual program :

enter image description here

While it is quite straight forward to see that the max-flow linear program indeed computes a maximum flow (every feasable solution is a flow, and every flow is a feasable solution), i couldn’t find convincing proof that the dual of the max-flow linear program indeed is the LP of the min-cut problem.

An ‘intuitive’ proof is given on wikipedia, namely : $ d_{uv}$ is 1 if the edge $ (u,v)$ is counted in the cut and else $ 0$ , $ z_u$ is $ 1$ if $ u$ is in the same side than $ s$ in the cut, and $ 0$ if $ u$ is in the same side of the cut than $ t$

But that doesn’t convince me a lot, mainly why should all the variables be integers, while we don’t have integer conditions ?

And in general, do you have a convincing proof that the dual of the max-flow LP indeed is the LP formulation for min-cut ?

How to prove νX. A × X ≅ (μX. 1 + X) -> A?

How can we prove Stream A = νX. A × X is isomorphic to Nat -> A = (μX. 1 + X) -> A ?

In programming sense, Stream A can be seen as a function from Nat to A, and I can write isomorphisms between them. But how can this be proven mathematically?

I would also like to know the conversion mechanism from μ to ν, and vice versa.