Concurrent Garbage-Collecting/Compacting Memory Allocator

I’m developing an algorithm for concurrent heap garbage collection and compaction. It is intended for low-latency systems that need to scale well to many clients, e.g. web servers.

I have come up with an algorithm that seems suitable, but it has a few flaws. I’m not very good at describing algorithms, so please tell me if my description isn’t clear. Here goes:

  • There are two heaps of equal size.
  • There’s an object handle table, which stores the memory address and the lock of each object.
  • The index into the handle table is the handle of an object.
  • Each heap has a linked list of all its objects, which records the type and size of each object.
  • Objects are copied from one heap to the other.
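To make the layout above more concrete, here is a minimal C++ sketch of how these structures might look. The names `HandleEntry`, `ObjectRecord`, `HandleTable`, `resolve`, `setAddress` and `lockOf` are illustrative assumptions, not part of a finished design, and using `std::mutex` as the per-object lock is just one possible choice.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

using Handle = std::uint32_t;  // a handle is an index into the handle table

// One slot per object: where the object currently lives, plus its lock.
struct HandleEntry {
    std::atomic<unsigned char*> address{nullptr};
    std::mutex lock;  // assumed lock type; taken by writers and by the copy step
};

// Node of the per-heap linked list of objects (type and size of each object).
struct ObjectRecord {
    Handle handle;
    std::uint32_t type;
    std::size_t size;
    ObjectRecord* next = nullptr;
};

class HandleTable {
public:
    explicit HandleTable(std::size_t capacity) : entries_(capacity) {}

    // Translate a handle into the object's current address.
    unsigned char* resolve(Handle h) const {
        assert(h < entries_.size());
        return entries_[h].address.load(std::memory_order_acquire);
    }

    // Publish a new address for the object, e.g. after it has been copied.
    void setAddress(Handle h, unsigned char* p) {
        assert(h < entries_.size());
        entries_[h].address.store(p, std::memory_order_release);
    }

    std::mutex& lockOf(Handle h) {
        assert(h < entries_.size());
        return entries_[h].lock;
    }

private:
    std::vector<HandleEntry> entries_;  // fixed size; slots are never moved
};
```

Every object access goes through `resolve`, which is where the extra indirection (and the cache cost of the handle table) comes from.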

The GC/compaction algorithm is designed to be incremental and concurrent. The pseudocode below runs in every thread from time to time; atomic operations are marked with // atomic.

    // Claim the next object to copy from the from-heap.
    current = heapFrom->currentCopyObj; // atomic
    heapFrom->currentCopyObj = current->Next; // atomic

    // Advance to the first gap in the to-heap that is large enough.
    while (heapTo->currentCopyObj->freeSpaceToNextObj < current->Size) // atomic
    {
        heapTo->currentCopyObj = heapTo->currentCopyObj->Next; // atomic
    }

    size = current->Size;
    oldAddress = current->Address;
    newAddress = heapTo->currentCopyObj->Address;

    // Block writers while the object is being moved.
    handleTable->LockObj(current->Handle); // atomic

    memcpy(newAddress, oldAddress, size);

    heapTo->InsertIntoObjectList(heapTo->currentCopyObj, current); // atomic
    heapFrom->RemoveFromObjectList(current); // atomic

    // Redirect the handle to the new copy, then release the lock.
    handleTable->SetHandleAddress(current->Handle, newAddress); // atomic

    handleTable->UnlockObj(current->Handle); // atomic
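The core of the copy step (lock, copy, redirect the handle, unlock) can be sketched in compilable C++ roughly as follows. This is a simplified sketch under several assumptions: `HandleSlot`, `relocate`, `writeByte` and `readByte` are hypothetical names, the object-list updates are left out, and readers are assumed never to run concurrently with a writer to the same object.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>

// Hypothetical handle table slot: current address plus per-object lock.
struct HandleSlot {
    std::atomic<unsigned char*> address{nullptr};
    std::mutex lock;
};

// Copy one object of `size` bytes to `dest` in the to-heap and redirect
// its handle. Returns the old address so the caller can unlink it from
// the from-heap.
unsigned char* relocate(HandleSlot& slot, unsigned char* dest, std::size_t size) {
    std::lock_guard<std::mutex> guard(slot.lock);  // writers wait here
    unsigned char* old = slot.address.load(std::memory_order_relaxed);
    assert(old != nullptr);
    std::memcpy(dest, old, size);
    // Release store: a reader that sees `dest` also sees the copied bytes.
    slot.address.store(dest, std::memory_order_release);
    return old;
}

// Writers take the object lock, so they never write into a copy that is
// about to become stale.
void writeByte(HandleSlot& slot, std::size_t offset, unsigned char value) {
    std::lock_guard<std::mutex> guard(slot.lock);
    slot.address.load(std::memory_order_relaxed)[offset] = value;
}

// Readers do not lock: they see either the old copy or the new one.
unsigned char readByte(const HandleSlot& slot, std::size_t offset) {
    return slot.address.load(std::memory_order_acquire)[offset];
}
```

One consequence of lock-free reads is that a reader may still hold the old address after `relocate` returns, so the from-heap memory cannot be reused until no reader can still be using it.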

Objects are allocated with a kind of bump allocator that places new objects at the end of each heap. Handles are allocated with an O(1) algorithm that uses either a bump allocator or cached handle slots. This should make allocation quite fast: O(1), at least in theory.
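As a rough illustration of the allocation scheme, here is a single-threaded C++ sketch. `BumpHeap` and `HandleAllocator` are hypothetical names; alignment is ignored, and a concurrent version would need something like an atomic fetch-and-add on the bump pointer, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bump allocator: new objects are placed at the current end of the heap.
class BumpHeap {
public:
    explicit BumpHeap(std::size_t capacity) : storage_(capacity), top_(0) {}

    // O(1): returns nullptr if the request does not fit.
    unsigned char* allocate(std::size_t size) {
        if (size > storage_.size() - top_) return nullptr;
        unsigned char* p = storage_.data() + top_;
        top_ += size;
        return p;
    }

    std::size_t used() const { return top_; }

private:
    std::vector<unsigned char> storage_;
    std::size_t top_;  // offset of the first free byte
};

// Handle allocation: reuse a cached (previously released) slot if there
// is one, otherwise bump to a fresh index.
class HandleAllocator {
public:
    std::uint32_t acquire() {
        if (!freeSlots_.empty()) {
            std::uint32_t h = freeSlots_.back();
            freeSlots_.pop_back();
            return h;
        }
        return next_++;
    }

    void release(std::uint32_t h) {
        assert(h < next_);
        freeSlots_.push_back(h);  // amortized O(1)
    }

private:
    std::vector<std::uint32_t> freeSlots_;  // cached handle slots
    std::uint32_t next_ = 0;                // next never-used handle
};
```

Both operations are constant time apart from the occasional growth of the cached-slot vector, which a fixed-capacity free list would avoid.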

This algorithm has a few flaws though:

  • Object writes have to take a lock; reads can run concurrently with copying
  • It does not achieve very good heap compaction
  • High memory overhead due to the handle table
  • Not very cache-friendly due to the handle table

Would this algorithm work? If it would, how could I solve some of the problems it has? Or is there a better algorithm that does not have these flaws?