How to build an execution graph of concurrency accesses

Sorry If I don’t use the right vocabulary, maybe part of my question is due to the fact that I don’t know the name of what I’m searching.

I have a bounded set of operations that need to access a database. I want to create an “execution graph” of these operations that prevents conflicting accesses. For each of this operations, I know the read set and write set.

Let’s say I have four operation, a, b, c and d :

readSet(a) = {k1, k2} writeSet(a) = {}

readSet(b) = {} writeSet(b) = {k1}

readSet(c) = {} writeSet(c) = {k2}

readSet(d) = {} writeSet(d) = {}

So here I can build the graphs whit the following set of edges :

g1 -> { (b, a), (c, a) }

g2 -> { (a, b), (a, c) }

How do I build such a graph that is that is as little deep as possible ? ( i.e that maximize the parallelism ? )

What are the keywords to find litterature on this problem ?

GO – Goroutine and Concurrency

Background:

Threads use pre-emptive scheduling, whereas fibers use cooperative scheduling.

With Pthreads: the current execution path may be interrupted or preempted at any time This means that for threads, data integrity is a big issue because one thread may be stopped in the middle of updating a chunk of data, leaving the integrity of the data in a bad or incomplete state. This also means that the operating system can take advantage of multiple CPUs and CPU cores by running more than one thread at the same time and leaving it up to the developer to guard data access.

Using C,

int pthread_create(pthread_t *thread, const pthread_attr_t *attr,                           void *(*start_routine) (void *), void *arg); 

Using threads, application may have Concurrency,

Properties of Concurrency:

1) Multiple actors

2) Shared resource

3) Rules to access(Atomic/Conditional synchronization)


With C++ fibers: the current execution path is only interrupted when the fiber yields execution This means that fibers always start and stop in well-defined places, so data integrity is much less of an issue. Also, because fibers are often managed in the user space, expensive context switches and CPU state changes need not be made, making changing from one fiber to the next extremely efficient. On the other hand, since no two fibers can run at exactly the same time, just using fibers alone will not take advantage of multiple CPUs or multiple CPU cores.

In Win32, a fiber is a sort of user-managed thread. A fiber has its own stack and its own instruction pointer etc., but fibers are not scheduled by the OS: you have to call SwitchToFiber explicitly. Threads, by contrast, are pre-emptively scheduled by the operation system.

So roughly speaking a fiber is a thread that is managed at the application/runti me level rather than being a true OS thread.

Using C,

void __stdcall MyScheduler(void *param){   .... }  LPVOID *FiberScheduler = CreateFiber(0, MyScheduler, NULL); 

Why fibers?

OS threads give us everything we want, but for a heavy performance penalty: switching between threads involves jumping back and forth from user to kernel mode, possibly even across address space boundaries. These are expensive operations partly because they cause TLB flushes, cache misses and CPU pipelining havoc: that’s also why traps and syscalls can be orders of magnitude slower than regular procedure calls.

In addition, the kernel schedules threads (i.e. assigns their continuation to a CPU core) using a general-purpose scheduling algorithm, which might take into account all kinds of threads, from those serving a single transaction to those playing an entire video.

Fibers, because they are scheduled at the application layer, can use a scheduler that is more appropriate for their use-case. As most fibers are used to serve transactions, they are usually active for very short periods of time and block very often. Their behavior is often to be awakened by IO or another fiber, run a short processing cycle, and then transfer control to another fiber (using a queue or another synchronization mechanism).Such behavior is best served by a scheduler employing an algorithm called “work-stealing”;When fibers behave this way, work-stealing ensures minimal cache misses when switching between fibers.


Fiber does not exploit the power of multiple core, because what OS knows is, single threaded process.

In GO, we invoke goroutines using go keyword

func main(){   go f() // f runs as new go routine   f() } 

Edit

Based on the code of server model ..go/src/net/http, below is my understanding:

enter image description here

My understanding is, Go runtime allocates GOMAXPROCS threads to server_program process, that is using net/http package to serve http requests, where each thread is a user-land thread.

GOMAXPROCS user-land threads are shared between kernel threads provided by OS to server_program process.

Each Goroutine runs on its corresponding user-land thread. Goroutine pre-empts when user-land thread pre-empts.

Question 4

As per above server model,

Unless user-land thread is pre-empted by OS, on what conditions, does server Goroutine gets pre-empted on its own?


Question:

1) Is GO routine(f) a fiber that is non-preemptively scheduled by GO runtime, in user space?

2) If yes, Does Concurrency situation arise in GO environment?

3) Does GO support api for OS level thread?

Approaches to dynamically determining if concurrency needed

There is a series of procedures, each of which falls into one of two general categories.

Most of these are of category 1: it would be best to execute them one at a time – the overhead associated with threading or even using a thread pool negates any gains.

However there are a few of category 2: it would be best to move these into a concurrency scenario, like a different thread, a task, etc. These procedures, unlike the others, do justify the overhead.

The series is made up of an unknown number and unknown order of both categories, other than the fact that most will usually be of category 1.

What are the known approaches to solving this problem? Here is a stab at some approaches, but I’m curious if there are others or if there is a comprehensive survey of the problem.

  1. Start all procedures sequentially, but determine mid-process if it should be moved to a separate concurrent process.
  2. Have a ‘look-ahead’ process running simultaneously that decides what category the procedure belongs to. It is referred to when a procedure is ready to be started.

Both of these ideas have a tradeoff (as I’m sure any approach will): the added intelligence costs something that adds to the overhead.

Understanding connection between different database concurrency control protocol

I have came across following diagram related to timestamp based concurrency control protocols in database systems:

enter image description here

I don’t know whether its correct or not.

As per my interpretation, it means all schedules possible under protocol of particular oval is also possible under protocol of enclosing oval.

However as long as I know, both multiversion timestamp and timestamp ordering protocols ensures conflict serializability, whereas Thomas write rule ensures view serializability.

So I guess, the diagram correctly depicts the relation between Thomas write rule and timestamp ordering protocol. But I am unsure whether it correctly depicts the relation of multiversion timestamp ordering protocol correctly in relation with Thomas write rule. (I believe relation between multiverstion timestamp ordering protocol and timestamp ordering protocol is also correctly shown).

Should it be something like this:

enter image description here

I have came with above diagram because I feel it correctly captures following points:

  • All schedules possible under timestamp ordering are also possible under multiversion timestamp
  • All schedules possible under timesstamp ordering are also possible under thomas write rule
  • Some schedules (specifically view serializable ones) possible under Thomas write rule are not possible under multi version timestamp protocol and vice versa (schedules involving different versions of data items).

Am I right with all of these?

Understanding validation based database concurrency control protocol

Database book by Korth explains validation protocol as follows :

To perform the validation test, we need to know when the various phases of transactions took place. We shall, therefore, associate three different timestamps with each transaction $ T_i$ :

  1. Start($ T_i$ ),the time when $ T_i$ started its execution.

  2. Validation($ T_i$ ), the time when $ T_i$ finished its read phase and started its validation phase.

  3. Finish($ T_i$ ),the time when $ T_i$ finished its write phase.

The validation test for transaction $ T_i$ requires that, for all transactions $ T_k$ with $ TS(T_k) < TS(T_i)$ , one of the following two conditions must hold:

  1. Finish($ T_k$ ) < Start($ T_i$ ). Since $ T_k$ completes its execution before $ T_i$ started, the serializability order is indeed maintained.

  2. The set of data items written by $ T_k$ does not intersect with the set of data items read by $ T_i$ , and $ T_k$ completes its write phase before $ T_i$ starts its validation phase Start($ T_i$ ) $ <$ Finish($ T_k$ ) $ <$ Validation($ T_i$ )). This condition ensures that the writes of $ T_k$ and $ T_i$ do not overlap. Since the writes of $ T_k$ do not affect the read of $ T_i$ , and since $ T_i$ cannot affect the read of $ T_k$ , the serializability order is indeed maintained.

Doubt

How second point says “writes of $ T_k$ do not affect the read of $ T_i$ “?

Start($ T_i$ ) is the timestamp when $ T_i$ started its execution.

Validation ($ T_i$ ): is the timestamp when $ T_i$ finishes its read phase and starts its validation phase.

I believe that mean read phase start at Start($ T_i$ ) and ends at Validation($ T_i$ ) ?

Also Finish($ T_k$ ) is timestamp when $ T_k$ finished its write phase.

So, I believe Start($ T_i$ ) $ <$ Finish($ T_k$ ) $ <$ Validation($ T_i$ ) means write of $ T_k$ overlaps with read of $ T_i$ and hence write of $ T_k$ can affect read of $ T_i$ .

Am I missing something stupid?

information about concurrency control

Hello everyone,

I have some questions about concurrency control in mysql. I was wondering what commando do you use to check wich concurrency control you have or use in innodb. And what are the different types and security levels (for what are they protected and for what they dont give protection? And how can I raise to a higher concurrency control level?

Hopely someone can and wanne help me with this?

Greets
Matthias

Performance of large transactions and concurrency?

If I have a multi-million row table and I run a transaction that updates 50k rows, what are the performance implications of this?

Assuming it’s indexed correctly, it shouldn’t take long, but what rows are locked and how is the usage of that table affected?

  1. Are rows being updated during the transaction able to be read after the transaction starts and before it finishes?
  2. Are rows not being updated during the transaction able to be read after the transaction starts and before it finishes?
  3. If another transaction starts trying to change rows that are being changed by a previously unfinished transaction, will that transaction fail at start or after it tries to commit (assuming conflict)?

My question is for Postgres 9.3; I assume there are variations.