Response time optimization – Getting record count based on Input Parameters

I’m trying to optimize the process of calculating a count of records based on variable input parameters. The whole process spans several queries, functions and stored procedures.

1/ Basically, the front end sends a request to the DB (it calls a stored procedure) with an input parameter (a DataTable). This DataTable (the input parameter collection) contains 1 to X records. Each record corresponds to one specific rule.
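The exact type is not important here, but the rule collection the front end sends has roughly the following shape; the column names and the CSV representation of the lookup values are simplified for illustration, not our real definition.

```sql
-- Simplified shape of the rule TVP sent by the front end (names and CSV format are illustrative)
CREATE TYPE dbo.RuleListType AS TABLE
(
    RuleId       INT           NOT NULL,
    Designation  SYSNAME       NOT NULL,  -- identifies the rule and the function that evaluates it
    LookupValues NVARCHAR(MAX) NULL       -- the rule's lookup values, e.g. N'1,5,7,12'
);
```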

2/ The SP receives the collection of rules (as a custom typed table) and iterates through them one by one. Each rule, apart from other metadata, contains the name of a specific function that should be used to evaluate that rule.

For every rule, the SP prepares a dynamic query in which it calls the mentioned function with 3 input parameters:

a/ a custom-type memory-optimized table (hash index) that holds the result of the previous rules
b/ a collection of lookup values (usually INTs) that the SELECT query uses to filter data, i.e. "get me all records that have fkKey IN (x1, x2, x3)"
c/ a BIT determining whether this is the first rule in the whole process
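For completeness, the memory-optimized type used for a/ (and for @result / @tmpResult further down) is defined along these lines; the column name and the integer-list type for b/ are placeholders, while the bucket count matches the 5 000 000 mentioned below.

```sql
-- Illustrative definition of the memory-optimized type used for a/ (column name is a placeholder)
CREATE TYPE dbo.ResultTableType AS TABLE
(
    RecordId INT NOT NULL,
    INDEX ix_RecordId HASH (RecordId) WITH (BUCKET_COUNT = 5000000)
)
WITH (MEMORY_OPTIMIZED = ON);

-- Illustrative definition of the lookup-value collection used for b/
CREATE TYPE dbo.IntListType AS TABLE
(
    Value INT NOT NULL PRIMARY KEY
);
```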

Each function has an IF statement that, based on the c/ parameter, determines whether it should return "all" records that fulfill the input criteria (b/), or whether it should apply those criteria on top of a join to the result of the previous rule, which is contained in the custom table (a/).
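To make this concrete, here is a minimal sketch of what one such rule function might look like. The function name, dbo.SourceTable and its columns are hypothetical placeholders, and the type names come from the illustrative definitions above.

```sql
-- Minimal sketch of one rule function (table, columns and names are placeholders)
CREATE FUNCTION dbo.Rule1Color
(
    @previousResult dbo.ResultTableType READONLY,  -- a/ result kept by the previous rules
    @lookupValues   dbo.IntListType     READONLY,  -- b/ lookup values to filter on
    @isFirstRule    BIT                            -- c/ is this the first rule in the chain?
)
RETURNS @out TABLE (RecordId INT NOT NULL)
AS
BEGIN
    IF @isFirstRule = 1
    BEGIN
        -- first rule: return all records that fulfill the lookup criteria
        INSERT INTO @out (RecordId)
        SELECT s.pkRecord
        FROM   dbo.SourceTable AS s
        WHERE  s.fkColor IN (SELECT Value FROM @lookupValues);
    END
    ELSE
    BEGIN
        -- later rules: apply the criteria on top of a join to the previous result
        INSERT INTO @out (RecordId)
        SELECT s.pkRecord
        FROM   dbo.SourceTable AS s
        JOIN   @previousResult AS p ON p.RecordId = s.pkRecord
        WHERE  s.fkColor IN (SELECT Value FROM @lookupValues);
    END;

    RETURN;
END;
```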

3/ Once the function is run, its result is INSERTed into a table variable called @tmpResult. @result is then compared to @tmpResult, and records that are not in @tmpResult are DELETEd from @result.

  • @result is a table variable (of the custom memory-optimized table type) that holds the intermediate result during the whole SP execution. It is fully filled on the first rule; every subsequent rule only removes records from it.

4/ The cycle repeats for every rule until all of the rules are done. At the end, a COUNT is taken of the records in @result and returned as the result of the SP.
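Putting steps 2–4 together, the body of the SP looks roughly like the sketch below. The type names, the cursor, the CSV handling via STRING_SPLIT and the assumption that the Designation doubles as the function name are all simplifications for illustration, not our exact code.

```sql
-- Sketch of the SP body (illustrative only; uses the hypothetical types defined above)
-- @rules is assumed to be the SP's TVP parameter of type dbo.RuleListType
DECLARE @result       dbo.ResultTableType;   -- running intermediate result
DECLARE @tmpResult    dbo.ResultTableType;   -- result of the current rule
DECLARE @lookupValues dbo.IntListType;
DECLARE @designation  SYSNAME,
        @lookupCsv    NVARCHAR(MAX),
        @sql          NVARCHAR(MAX),
        @isFirst      BIT = 1;

DECLARE rule_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT Designation, LookupValues FROM @rules ORDER BY RuleId;

OPEN rule_cursor;
FETCH NEXT FROM rule_cursor INTO @designation, @lookupCsv;

WHILE @@FETCH_STATUS = 0
BEGIN
    DELETE FROM @tmpResult;
    DELETE FROM @lookupValues;

    -- b/ lookup values of the current rule (assumes a CSV representation; STRING_SPLIT needs compat level >= 130)
    INSERT INTO @lookupValues (Value)
    SELECT CAST(value AS INT) FROM STRING_SPLIT(@lookupCsv, N',');

    -- step 2: dynamic call of the rule-specific function
    SET @sql = N'SELECT RecordId FROM dbo.' + QUOTENAME(@designation) + N'(@prev, @vals, @first);';

    INSERT INTO @tmpResult (RecordId)
    EXEC sp_executesql @sql,
         N'@prev dbo.ResultTableType READONLY, @vals dbo.IntListType READONLY, @first BIT',
         @prev = @result, @vals = @lookupValues, @first = @isFirst;

    -- step 3: keep only records that the current rule also returned
    IF @isFirst = 1
        INSERT INTO @result (RecordId) SELECT RecordId FROM @tmpResult;
    ELSE
        DELETE r
        FROM   @result AS r
        WHERE  NOT EXISTS (SELECT 1 FROM @tmpResult AS t WHERE t.RecordId = r.RecordId);

    SET @isFirst = 0;
    FETCH NEXT FROM rule_cursor INTO @designation, @lookupCsv;
END;

CLOSE rule_cursor;
DEALLOCATE rule_cursor;

-- step 4: only the final count is returned
SELECT COUNT(*) AS MatchingRecords FROM @result;
```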

A few things to take into account:

  • There are dozens of different types of rules, and the list of rules only grows over time. That’s why a dynamic query is used.
  • The most effective way to temporarily store records between individual rule executions has so far proved to be a custom memory-optimized table type. We tried a lot of things, but this one seems to be the fastest.
  • The number of records usually returned for a single rule is roughly between 100 000 and 3 000 000. That’s why a BUCKET_COUNT of 5 000 000 is used for the HASH indexes on the temporary tables. We also tried a nonclustered index, but it was slower than the HASH.
  • The input collection of rules can vary widely: anything from 1 rule up to dozens of rules can be used at once.
  • Almost every rule is defined with at minimum 2 lookup values, at most with dozens or, in a few cases, even hundreds of values. For a better understanding of the rules, here are some examples:

Rule1Color, {1, 5, 7, 12}
Rule2Size, {100, 200, 300}
Rule3Material, {22, 23, 24}

Basically, every rule is specified by its Designation, which corresponds to a specific function, and by its collection of lookup values. The possible lookup values differ based on the designation.
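Using the hypothetical RuleListType from above, those three example rules would arrive as rows like this:

```sql
-- The three example rules expressed as rows of the (hypothetical) rule TVP
DECLARE @rules dbo.RuleListType;

INSERT INTO @rules (RuleId, Designation, LookupValues)
VALUES (1, N'Rule1Color',    N'1,5,7,12'),
       (2, N'Rule2Size',     N'100,200,300'),
       (3, N'Rule3Material', N'22,23,24');
```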

What we have done to optimize the process so far:

  • Where a big number of records needs to be temporarily stored, we use memory-optimized table variables (we also tried temp tables, but they performed basically the same as the memory-optimized variants).
  • We substantially reduced and optimized the source tables that the SELECT statements run against.

Currently, the overall load is split roughly 50/50 between the I/O cost of the SELECT statements and the manipulation of records between temporary tables. That is frankly not great: ideally, the only bottleneck should be the I/O operations, but so far we have not been able to come up with a better solution, since the whole process has a lot of variability.

I’ll be happy for any idea you can throw my way. Of course, feel free to ask questions if I failed to explain some part of the process adequately.

Thank you