I have a 1 TB read-only database where performance is critical. It’s difficult to predict queries since they are dynamically generated by the users (the whole thing is basically a visualization platform atop a large collection of medical studies, and users select what they want to visualize). Queries can often be complex and involve 10+ joins. I recently learned about the extended statistics feature, but I find little information online about when best to use it (other than what’s in the documentation).
The DB is pretty well normalized, but makes extensive use of materialized views which are de-normalized. Is there any performance penalty or other issue with creating extended statistics (dependency and top n) for all pairwise columns? It would result in, say, 500 statistics on some 70 tables. Time for analyze or inserts is not of relevance, only read performance. Also, is there a tool or code snippet to help me do this?
I’m using Postgresql 12 and it’s optimized as far as possible w.r.t. indexing.