A dozen tables related by foreign key to a central table (call it TableA). All tables have a PK that is INT IDENTITY, and Clustered. All the tables’ FKs are indexed.
This looks like a star configuration. TableA has fairly static personal info such as name, & DOB. The surrounding tables each have lists of items about the person in TableA that change or grow over time: for example, a person might have several emails, several addresses, several phone numbers, etc…
In the unusual event that I want to delete from TableA (test data that gets inserted during performance checks, for example), the FKs all have CASCADE DELETE to handle removing all subordinate data lists if they exist in any of the surrounding tables. I have three environments to play with: DEV, QA, and UAT (well, four if you count PROD, but “play” is not something I would want to do to PROD). DEV has about 27 million people in TableA with various counts upward of 30M in the surrounding tables. QA and UAT are only a few hundred thousand rows.
The simple “delete from TableA where Id = @Id” takes < 1ms on DEV (the big one) and the execution plan looks fine, lots of tiny thin lines and all index seeks… but here’s the rub: infrequently on DEV, and ALWAYS on QA and UAT, the simple delete takes about 1 second and the plan shows almost all the indexes are being scanned, with big fat lines showing the entire row counts.
The delete statement is issued by Entity Framework Core running inside an API so I have limited capability to mess with it (making it into a stored procedure, index hinting, using a different predicate, and other ideas…)
Despite all three environments being identical (same script created all three environments), nothing I have done so far has improved QA and UAT, but DEV is usually fast.
When DEV becomes slow, it remains slow until “something” happens. I haven’t figured out what the “something” is, but when it occurs, the performance reverts to fast again and remains that way for days.
If I catch DEV at a slow time, and use SSMS to manually run a delete statement, the plan is fast (<1ms); but the deletes coming from the API use a slow plan (1s). Entity Framework is (as best I can tell) using sp_executesql to run a parameterized “delete from tableA where Id = @Id”. The manual query is “DELETE FROM TableA WHERE Id = 123456789”
The row being deleted is always a recently-added row, meaning that the Id is right at the “top” and probably not within the range of the index statistics (although I speak from a position of profound ignorance on that topic and probably have my wires crossed…)
What I have tried so far
Reading up on FK cascade delete issues, it seems all the FKs need to be indexed, so I did that.
Rebuild (not just Reorg) every index.
Selectively delete the bad plans from the plan cache using DBCC FREEPROCCACHE (plan handle)
Running the excellent tools from Brent Ozar got me checking that the FKs were all is_not_trusted = 0
Looked at these (and other) previous StackExchange questions:1, 2, 3, 4
Of those, I suspect that the last one, with a description of how the cardinality estimator gets confused, might be pointing to the source of the problem, but I need help figuring out what to do next…
The plan shot below (from ssms) shows the slow plan: some of the FK indexes are being scanned (but not all) and there is an excessive memory grant. The fast plan would show all index seeks. The whole plan is at ShowMyPlan Link
I hope someone can point out what I have missed, or what I can investigate further.