Distributed predicate computation on event stream


My question is actually a request for papers, articles, texts or books on the problem that I’m trying to solve on my work.

I’m working on a program that computes a predicate value (true or false) for a given object in a distributed system in which there is a stream of events that can change the object’s attributes and, consequentially, the predicate value. Whenever the predicate value changes, the program must send a notification about this change.

For example, consider that there is an object A which has an attribute called name and consider that there is a predicate P which is true when the object’s name is equal to Jhon. Each event in the stream has a timestamp and a value for the attribute name. So consider the following sequence of events:

e1 = { name: Jhon, timestamp: 1 } e2 = { name: Jhon, timestamp: 2 } e3 = { name: Peter, timestamp: 3 } e4 = { name: Doug, timestamp: 4 } e5 = { name: Jhon, timestamp: 5 } 

Now, the events don’t necessarily show up in the stream in the correct order and, even worst, there are multiple computers parallelly processing this stream of events. However, for simplicity, I’ll go further in this example considering only one computer.

If the events arrive and are processed in the order described above, then the notifications sent should be:

P(A) = true when e1 arrives P(A) = false when e3 arrives P(A) = true when e5 arrives. 

That is the correct sequence of notifications. Now, imagine that the computer receives the events in the following order:

e1, e5, e2, e4, e3 

A naive algorithm which doesn’t consider the event’s timestamp would send an incorrect sequence of notifications:

P(A) = true when e1 arrives P(A) = false when e4 arrives 

The algorithm that I’m working on considers the timestamps and infers when a notification should have been sent but was not. So when e3 arrives it will notice that the notification P(A) = true for e5 was not sent. This feels a bit like reinventing the wheel, though I’m not aware of any reading about this problem. I would like some references to this problem or to something similar, like some papers dealing with this kind of problem.

The real problem is quite more complex since it involves storing the predicate $ \times$ object state in a database that works as a shared state between the computers processing the stream and I’m talking about thousands of events arriving per second so it’s not possible to keep all events stored in some database.