The overhead of OCC validation phase

The validation phase of optimistic concurrency control can run in two directions: backward validation checks for conflicts with previously validated transactions, while forward validation checks for conflicts with transactions that have not yet committed.

Validated transactions install their modifications into the "global" database, so backward validation mainly has to check for conflicts against that global state. Forward validation, however, has to check for conflicts with every running transaction. If the database is multithreaded, this requires expensive communication between threads, and at high concurrency levels it also requires extensive memory reads.
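To make the two checks concrete, here is a minimal sketch, assuming each transaction tracks a read set and a write set of keys. The `Txn` class and both function names are illustrative and not taken from any particular database.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    """Hypothetical transaction record: just the keys it read and wrote."""
    name: str
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def backward_validate(txn, committed_overlapping):
    """Backward validation: compare txn's read set with the write sets of
    transactions that committed while txn was running."""
    return all(txn.read_set.isdisjoint(c.write_set) for c in committed_overlapping)


def forward_validate(txn, active):
    """Forward validation: compare txn's write set with the read sets
    (so far) of transactions that are still running."""
    return all(txn.write_set.isdisjoint(a.read_set) for a in active)


t1 = Txn("t1", read_set={"x"}, write_set={"y"})
t2 = Txn("t2", read_set={"y"}, write_set={"z"})  # still running, has read y
t3 = Txn("t3", read_set={"a"}, write_set={"x"})  # committed while t1 ran

print(backward_validate(t1, [t3]))  # False: t1 read x, t3 wrote x
print(forward_validate(t1, [t2]))   # False: t1 writes y, t2 already read y
```

In this sketch the forward check has to look at the live read sets of other transactions, which is where the cross-thread communication mentioned above comes from.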

As far as I know, forward validation is more widely adopted than backward validation. Why? Which cases suit forward validation, and which suit backward validation?

Overhead cost of spawning child processes

I am curious as to the overhead cost of spawning child processes using fork in a Linux environment. Suppose I have a C program such as

    #include <unistd.h>  /* fork */

    void run_computation(int x);

    int main() {
        for (int k = 0; k < 10; k++) {
            if (fork() == 0) {
                /* Child: do its share of the work, then leave the loop. */
                run_computation(k);
                break;
            }
        }

        return 0;
    }

Assuming that no global state is modified by run_computation, how efficient in terms of overhead is the setting up of the child processes?

To help understand where I’m coming from, I’d like to write my own regex engine (for self-learning) and am imagining using fork to implement the NFA. So, am I incurring an unreasonable cost in such an implementation compared to other methods?
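One way to get a rough number is to time it. The sketch below is in Python rather than C, so it forks a Python interpreter; a small C program would likely be cheaper per fork, and the absolute figures will vary by machine. `time_forks` is a hypothetical helper name, and the code assumes a POSIX system.

```python
import os
import time


def time_forks(n=50):
    """Fork n children that exit immediately; return (mean seconds per
    fork+wait, number of children reaped). POSIX only."""
    reaped = 0
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: exit at once without running cleanup handlers.
            os._exit(0)
        # Parent: wait so the child does not linger as a zombie.
        os.waitpid(pid, 0)
        reaped += 1
    elapsed = time.perf_counter() - start
    return elapsed / n, reaped


if __name__ == "__main__":
    mean, reaped = time_forks()
    print(f"{reaped} children, about {mean * 1e6:.0f} microseconds per fork+wait")
```

Comparing this per-fork figure with the time one NFA step takes gives a sense of whether process creation would dominate the matching work.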

64-byte cache block and memory overhead for a cache line with 7 states (3 bits)

I came across a professor's lecture notes on memory consistency and models. They include an example about memory overhead:

The cache line has 7 states (3 bits): unowned, shared, exclusive, modified, read, update and “uncached read”.

  • For 64 processors with the cache line described above, the memory overhead is 12.5%.
  • For 256 processors with the cache line described above, the memory overhead is 50%.
  • For 1024 processors with the cache line described above, the memory overhead is 200%.

This is the only information in the notes. I tried contacting the professor to ask, but got no answer. How are these overheads calculated?
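One reading reproduces these figures. It assumes, although the notes do not say so, a full bit-vector directory that keeps one presence bit per processor for every 64-byte memory block, with the 3 state bits left out of the percentage. The function name below is illustrative.

```python
BLOCK_BYTES = 64
BLOCK_BITS = BLOCK_BYTES * 8  # 512 data bits per block
STATE_BITS = 3                # enough to encode 7 states


def directory_overhead(processors, include_state=False):
    """Directory bits per block divided by data bits per block.

    Assumption: one presence bit per processor (full bit vector)."""
    bits = processors + (STATE_BITS if include_state else 0)
    return bits / BLOCK_BITS


for p in (64, 256, 1024):
    print(p, f"{directory_overhead(p):.1%}")
# 64 12.5%
# 256 50.0%
# 1024 200.0%
```

Under this assumption the overhead grows linearly with the processor count, and counting the 3 state bits would add less than one percentage point at each size.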

What is more desirable: inefficient code or maintenance overhead?

In our Ruby on Rails app, we have a set of conditions to define the status of a user. Currently, there is logic that checks if those conditions are true given a user. We are in the process of writing code that will fetch a set of users from the database based on these conditions.

A couple of my colleagues believe that we should reuse the logic for checks and loop over the entire set of queried records as opposed to writing a complex query to fetch the set. Their reason is that the definition of conditions would remain in one place and would help avoid maintenance overhead in case the definitions change.

On the other hand, I think the two processes are inverses of each other, and that accepting the maintenance overhead is preferable to the inefficient code approach.

Food Delivery Startup Business For Sale!! Low Overhead and an Excellent Home Based Business.

Food Delivery Startup Business For Sale.

Dish Out


Be in on the New Wave in the
Restaurant Food Delivery Service Industry!

Business Overview:
Deliver meals from local restaurants and receive delivery fee plus percentage of cost of meal. Low overhead and an excellent home based business….


Balancing function call overhead and testability in code that is part of a deep learning model's training loop

I am currently implementing the transformer architecture for sequence-to-sequence problems. A key part of the model is the attention mechanism, which is basically a matrix multiplication followed by a masking operation and a softmax function. My initial thought was to wrap these three steps in a function that looks like this:

    def attention(self, matrix_1, matrix_2, mask=None, trans_1=False, trans_2=False):
        att_stage_1 = F.matmul(matrix_1, matrix_2, transa=trans_1, transb=trans_2)*self.scale_score
        att_stage_2 = F.where(mask, att_stage_1,, 'f')*(-1e9))
        return F.softmax(att_stage_2, axis=3)

I want to write unit tests for this function to check that the output is what I expect. The problem, however, is that this function, as written, performs three separate operations: matmul, masking and softmax. I would prefer to verify that each of these operations produces correct output, but as it stands I can only check the final result. This leads me to a design where I wrap each of these three operations in a separate, dedicated function and test them individually. My concern, however, is that the overhead of Python function calls in code that runs on every forward pass of the training loop may be an unnecessary cost.
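The split design could look roughly like the sketch below. It is not the original model code: plain Python lists stand in for the framework's tensors so that the example is self-contained, and the names `scaled_scores`, `apply_mask` and `softmax_rows` are made up for illustration.

```python
import math


def scaled_scores(q, k_t, scale):
    """Step 1: matrix product of q (n x d) and k_t (d x m), times scale."""
    return [[scale * sum(a * b for a, b in zip(row, col)) for col in zip(*k_t)]
            for row in q]


def apply_mask(scores, mask, fill=-1e9):
    """Step 2: keep scores where mask is True, else use a large negative."""
    return [[s if keep else fill for s, keep in zip(s_row, m_row)]
            for s_row, m_row in zip(scores, mask)]


def softmax_rows(scores):
    """Step 3: numerically stable softmax over the last axis."""
    out = []
    for row in scores:
        peak = max(row)
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q, k_t, mask, scale=1.0):
    """Composition of the three steps, as called from the training loop."""
    return softmax_rows(apply_mask(scaled_scores(q, k_t, scale), mask))


# Each step can be checked on its own, and the composition end to end.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(scaled_scores([[1.0, 2.0]], identity, 0.5))      # [[0.5, 1.0]]
print(attention([[1.0, 0.0]], identity, [[True, False]]))  # [[1.0, 0.0]]
```

The extra Python calls are usually small next to the tensor operations themselves, but that is worth measuring on the real model rather than assuming.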

Thus, the question is, what would be the correct approach to balance design and reliability vs performance in this scenario? Maybe I am missing some obvious approach here.