Why is MEMORYCLERK_SQLGENERAL using so much memory in SQL Server?

I have a production system running SQL Server 2019 Standard edition. It recently became unresponsive three times in one day, and each time only a reboot would recover it. The errors seem to point to memory limitations. The machine has 32 GB installed and is dedicated to SQL Server, with max server memory set to 26 GB.

The best lead I have so far is the output of DBCC MEMORYSTATUS that was automatically dumped to the log along with a FAIL_PAGE_ALLOCATION error. The full output is attached, but the part below caught my eye. It looks like MEMORYCLERK_SQLGENERAL wanted so much memory that it squeezed normal consumers like the buffer pool and query memory down to uselessly small levels.

I can’t seem to find any good info on what MEMORYCLERK_SQLGENERAL does, let alone why it would want so much memory.

11/18/2020 15:10:48, spid51:

    MEMORYCLERK_SQLGENERAL (node 0)          KB
    ---------------------------------------- ----------
    Pages Allocated                          22821672
    SM Committed                             0
    SM Reserved                              0
    Locked Pages Allocated                   546740
    VM Committed                             75776
    VM Reserved                              12867644

    MEMORYCLERK_SQLBUFFERPOOL (node 0)       KB
    ---------------------------------------- ----------
    Pages Allocated                          3400
    SM Committed                             0
    SM Reserved                              0
    Locked Pages Allocated                   0
    VM Committed                             0
    VM Reserved                              0

    MEMORYCLERK_SQLQUERYPLAN (node 0)        KB
    ---------------------------------------- ----------
    Pages Allocated                          3632
    SM Committed                             0
    SM Reserved                              0
    Locked Pages Allocated                   0
    VM Committed                             0
    VM Reserved                              0

    MEMORYCLERK_SQLQUERYEXEC (node 0)        KB
    ---------------------------------------- ----------
    Pages Allocated                          1128
    SM Committed                             0
    SM Reserved                              0
    Locked Pages Allocated                   0
    VM Committed                             0
    VM Reserved                              0
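For live troubleshooting, the same per-clerk numbers can be pulled from the sys.dm_os_memory_clerks DMV, which is the source behind this section of DBCC MEMORYSTATUS; a query along these lines lists the top consumers:

    SELECT TOP (10)
           [type],
           SUM(pages_kb) / 1024 AS total_mb
    FROM sys.dm_os_memory_clerks
    GROUP BY [type]
    ORDER BY total_mb DESC;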

Should I write custom allocators for STL containers to interface with my memory pool, or just overload the standard new and delete?

I want to write a custom memory allocator for learning. I’m tempted to have a master allocator that requests n bytes of RAM from the heap (via new). This would be followed by several allocator… adaptors? Each would interface with the master, requesting a block of memory to manage; these would be stack, linear, pool, slab allocators, etc.

The problem I have is whether I should write custom allocators (satisfying allocator_traits) to interface with these for the various STL containers, or whether I should just drop the adaptor idea and simply overload new and delete to use a custom pool allocator.

What I’m interested in understanding is what tangible benefit I would gain from having separate allocators for STL containers. The default std::allocator calls new and delete as needed, so if I overload those to request from my big custom memory pool instead, I’d seemingly get all the benefit without the cruft of custom std::allocator code.
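For scale, this is roughly what that "custom std::allocator code" amounts to; a minimal sketch, where MyPool and its acquire/release methods are hypothetical stand-ins for the master allocator, not a real API:

    #include <cstddef>
    #include <vector>

    // Hypothetical master pool; here it just forwards to the global heap.
    // A real pool would carve blocks out of one big allocation.
    struct MyPool {
        void* acquire(std::size_t bytes) { return ::operator new(bytes); }
        void  release(void* p, std::size_t) { ::operator delete(p); }
    };

    template <class T>
    struct PoolAllocator {
        using value_type = T;
        MyPool* pool;

        explicit PoolAllocator(MyPool* p) : pool(p) {}
        template <class U>
        PoolAllocator(const PoolAllocator<U>& o) : pool(o.pool) {}

        T* allocate(std::size_t n) {
            return static_cast<T*>(pool->acquire(n * sizeof(T)));
        }
        void deallocate(T* p, std::size_t n) {
            pool->release(p, n * sizeof(T));
        }
    };

    template <class T, class U>
    bool operator==(const PoolAllocator<T>& a, const PoolAllocator<U>& b) {
        return a.pool == b.pool;
    }
    template <class T, class U>
    bool operator!=(const PoolAllocator<T>& a, const PoolAllocator<U>& b) {
        return !(a == b);
    }

    int main() {
        MyPool pool;
        std::vector<int, PoolAllocator<int>> v(PoolAllocator<int>{&pool});
        v.push_back(42);  // storage comes from the pool, not global new
    }

The practical difference I can see is that per-container allocators are stateful and local: two containers can sit on two different pools with different policies, while replacing global new and delete changes every allocation in the program at once.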

Or is this a case where certain allocator models, like using a stack allocator for a std::deque, would work better than the default allocator? And if so, wouldn’t the normal STL implementation already specialise?

A 32-bit wide main memory unit with a capacity of 1 GB is built using 256M × 4-bit DRAM chips

A 32-bit wide main memory unit with a capacity of 1 GB is built using 256M × 4-bit DRAM chips. The number of rows of memory cells in the DRAM chip is 2^14. The time taken to perform one refresh operation is 50 nanoseconds. The refresh period is 2 milliseconds. The percentage (rounded to the closest integer) of the time available for performing the memory read/write operations in the main memory unit is _______ .

I calculated that the number of DRAM chips needed is 32. Each DRAM chip has 2^14 rows and 2^16 columns. Since we can refresh the rows in parallel, and the time for one memory cell is 50 nanoseconds, for 2^16 columns we would need 2^16 × 50 ns. Is my approach right?

Or, if I instead take 50 nanoseconds to be the time to refresh a complete row, then refreshing everything would need only 50 ns in total, since the rows would refresh in parallel.
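For reference, the usual textbook reading of this problem is that one refresh operation refreshes one whole row, the 2^14 rows of a chip are refreshed one after another, and all chips refresh in parallel, so the chip count drops out. Under that assumption:

    Refresh time per period  = 2^14 rows × 50 ns = 819,200 ns ≈ 0.82 ms
    Fraction lost to refresh = 0.8192 ms / 2 ms ≈ 41%
    Time available           ≈ 100% − 41% = 59%

So the expected answer is 59%.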

memory storage of a program before compiling

Whenever we write code, after compilation it is converted to machine language and stored on the hard disk. But before compiling, the code is still in a high-level language. How and where is memory allocated for the code before it is compiled, while it is still in a high-level language?

I assume that before compiling, the code is stored in RAM, but how? I thought we could only store machine language in RAM.
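One way to see this concretely: before compiling, source code sits in RAM (and on disk) as plain character bytes, not as instructions. A small C++ illustration, purely for demonstration:

    #include <cstdio>

    int main() {
        const char source[] = "int x;";  // source text held in RAM as bytes
        for (int i = 0; source[i] != '\0'; ++i)
            std::printf("'%c' = byte %d\n", source[i], source[i]);
    }

Each character of the high-level code is stored as an ordinary numeric byte (its character code), so RAM never needs to "understand" it; only the compiler later turns it into machine language.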

If there is anything wrong with my question, or with the way I am asking it, please comment below; that would be helpful.

Concurrent Garbage-Collecting/Compacting Memory Allocator

I’m developing an algorithm for concurrent heap garbage collection/compaction. It will be used in low latency systems that need to scale well to a lot of clients, e.g. web servers.

I thought of an algorithm that might be suitable, but it does have a few flaws. I’m not very good at describing algorithms, so please correct me if my description isn’t clear; here goes:

  • There are two heaps of equal size.
  • There is an object handle table, which contains the memory address and lock of each object (sketched below).
  • The index into the handle table is the handle of an object.
  • Each heap keeps a linked list of all its objects, which records each object’s type and size.
  • Objects are copied from one heap to the other.
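To make that concrete, the structures above might look roughly like this; all names are illustrative, not taken from an existing implementation:

    #include <atomic>
    #include <cstddef>
    #include <cstdint>
    #include <mutex>

    struct Object {
        std::uint32_t handle;  // index into the handle table
        std::size_t   size;
        Object*       next;    // per-heap linked list of objects
        // type information and the payload would follow
    };

    struct HandleEntry {
        std::atomic<void*> address;  // current location of the object
        std::mutex         lock;     // held while the object is being moved
    };

    // The handle table is an array of HandleEntry; an object's handle is
    // its index, so translating a handle to an address is O(1).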

The GC/compaction algorithm is designed to be incremental and concurrent. Here is the pseudocode, which runs in every thread from time to time. Atomic operations are marked with // atomic.

    current = heapFrom->currentCopyObj; // atomic
    heapFrom->currentCopyObj = current->Next; // atomic

    while (heapTo->currentCopyObj->freeSpaceToNextObj < current->Size) // atomic
    {
        heapTo->currentCopyObj = heapTo->currentCopyObj->Next; // atomic
    }

    size = current->Size;
    oldAddress = current->Address;
    newAddress = heapTo->currentCopyObj->Address;

    handleTable->LockObj(current->Handle); // atomic

    memcpy(heapTo->currentCopyObj->Address, current->Address, current->Size);

    heapTo->InsertIntoObjectList(heapTo->currentCopyObj, current); // atomic
    heapFrom->RemoveFromObjectList(current); // atomic

    handleTable->SetHandleAddress(current->Handle, newAddress); // atomic
    handleTable->UnlockObj(current->Handle); // atomic

Allocation of objects is done with a sort of bump allocator, which allocates objects at the end of each heap; handle allocation is done with an O(1) algorithm that uses either a bump allocator or cached handle slots. This should make allocation quite fast, theoretically O(1); see the sketch below.
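A minimal sketch of that bump allocation, assuming a fixed 16-byte alignment and with all names illustrative:

    #include <atomic>
    #include <cstddef>

    struct BumpHeap {
        std::byte*               base;      // start of the heap's block
        std::size_t              capacity;  // total bytes in the block
        std::atomic<std::size_t> offset{0}; // next free byte

        // One atomic add per allocation; memory is only reclaimed when the
        // collector evacuates live objects to the other heap.
        void* allocate(std::size_t size) {
            std::size_t aligned = (size + 15) & ~std::size_t{15};
            std::size_t old = offset.fetch_add(aligned); // atomic bump
            if (old + aligned > capacity) return nullptr; // heap exhausted
            return base + old;
        }
    };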

This algorithm has a few flaws though:

  • Object writes have to take the lock; reads can run concurrently with copying.
  • It does not achieve very good heap compaction.
  • High memory overhead due to the handle table.
  • Not very cache friendly due to the handle table.

Would this algorithm work? If it would, how could I solve some of the problems it has? Or is there a better algorithm that does not have these flaws?

Does a technology exist to monitor the memory and filesystem of a VPS?

Hello, I’m looking for a VPS to check the network activity of my untrusted machine. Because I suspect it has spyware, I want to try redirecting all my traffic through the VPS so that I can inspect it. But I want a really safe VPS, for example one that monitors the VPS filesystem and/or memory from the point of view of the hypervisor (that is, from outside the VPS) and tells me if anything suspicious is installed or loaded into memory. A friend told me that many years ago VMware had a technology called VMsafe. Does anyone know if anything like that exists? (I’m not asking about any particular provider because I don’t want to spam.) Please note that I’m not looking for network intrusion detection, because the danger comes from my own machine.

difference between “addressable” and “address” in memory?

I’m struggling with this practice question from this site:

Calculate the number of bits required in the address for a memory of size 16 GB. Assume the memory is 4-byte addressable.

MY QUESTION IS: what is the difference between an "address" and "the memory is 4-byte addressable"?

I understand that an address is a location in memory, represented by bits: there are 2^n possible locations, where n is the number of bits in the address. But I’m confused about "addressable" in this question and how that’s different from "address".

Since 16 GB = 2^34 bytes and each addressable unit is 4 bytes, we need 2^n × 4 bytes = 2^34 bytes, so n = 32. The solution is 32 bits.

Are lines what differentiate machine code programming in a text editor from programming memory directly?

I understand that, at least theoretically, a human could program in a given type of machine code either in a text editor or somehow directly in memory.

I also understand that in computer memory, data is stored sequentially at addresses, each of which holds a word of fixed size that is always filled with bits to full capacity.

I am not sure what would theoretically distinguish machine code programming in a text editor from machine code programming directly in memory; perhaps the very availability of lines in text editors (as opposed to memory’s flat sequence, I guess) is the answer.

Median of distribution with memory constraint


Task

I want to approximate the median of a given distribution $D$ that I can sample from.

A simple algorithm for this, using $n$ samples, is:

    samples = [D.sample() for i in range(n)]  # generate n samples from D
    samples.sort()
    return samples[n // 2]

However, I am looking for an algorithm that requires less than $O(n)$ space.

Ideas

I have looked into these algorithms:

  • Median of medians: Needs $O(n)$ space, so it does not work for me.
  • Randomized median: It seems like this could easily be generalized to an algorithm that uses $O(n^{3/4})$ space.

Are there any other algorithms that use less than $O(n)$ space that could solve my problem? In particular, I was thinking there may be an algorithm that uses $O(m)$ space by generating batches of samples from $D$ of size $m$; a sketch of that batch idea follows.
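As a hedged sketch of that batch idea (not from a reference, with D.sample() assumed as above): draw $k$ batches of $m$ samples, keep only each batch’s median, and return the median of those medians. This uses $O(m + k)$ space, but it is an estimator rather than the exact sample median, so its accuracy would need analysis:

    def batch_median_estimate(D, m, k):
        """Estimate the median of D from k batches of m samples each.

        Space is O(m + k): only the current batch and the batch medians
        are kept. The median of batch medians is generally not the exact
        sample median, only an approximation of the distribution median.
        """
        batch_medians = []
        for _ in range(k):
            batch = sorted(D.sample() for _ in range(m))
            batch_medians.append(batch[m // 2])
        batch_medians.sort()
        return batch_medians[k // 2]

For a $p$-th percentile, the same shape might generalize by taking batch[int(p * m)] inside the loop, though the analysis would change.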

Details

  • Ideally, I am looking for a reference to an algorithm that also includes analysis (success probability, expected runtime, etc.).
  • Actually, I need an algorithm to estimate $D$’s $p$-th percentile for a given $p$, but I am hoping most median-finding algorithms can be generalized to that.