I’ll just show you an example. Here candidates
is a table of 1000000 candidates from 1000 teams and their individual scores. We want a list of all teams and whether the total score of all candidates within each team is within the top 50. (Yeah this is similar to the example from another question, which I encourage you to look at, but I assure you that it is not a duplicate)
Note that all CREATE TABLE results AS ...
statements are identical, and the only difference is the presence of indices. These tables are created (and dropped) to suppress the query results so that they won’t make a lot of noise in the output.
------------ -- Set up -- ------------ .open delete-me.db -- A persistent database file is required .print '' .print '[Set up]' DROP TABLE IF EXISTS candidates; CREATE TABLE candidates AS WITH RECURSIVE candidates(team, score) AS ( SELECT ABS(RANDOM()) % 1000, 1 UNION SELECT ABS(RANDOM()) % 1000, score + 1 FROM candidates LIMIT 1000000 ) SELECT team, score FROM candidates; ------------------- -- Without Index -- ------------------- .print '' .print '[Without Index]' DROP TABLE IF EXISTS results; ANALYZE; .timer ON .eqp ON CREATE TABLE results AS WITH top_teams_verbose(top_team, total_score) AS ( SELECT team, SUM(score) FROM candidates GROUP BY team ORDER BY 2 DESC LIMIT 50 ), top_teams AS ( SELECT top_team FROM top_teams_verbose ) SELECT team, SUM(team IN top_teams) FROM candidates GROUP BY team; .eqp OFF .timer OFF ------------------------------ -- With Single-column Index -- ------------------------------ .print '' .print '[With Single-column Index]' CREATE INDEX candidates_idx_1 ON candidates(team); DROP TABLE IF EXISTS results; ANALYZE; .timer ON .eqp ON CREATE TABLE results AS WITH top_teams_verbose(top_team, total_score) AS ( SELECT team, SUM(score) FROM candidates GROUP BY team ORDER BY 2 DESC LIMIT 50 ), top_teams AS ( SELECT top_team FROM top_teams_verbose ) SELECT team, SUM(team IN top_teams) FROM candidates GROUP BY team; .eqp OFF .timer OFF ----------------------------- -- With Multi-column Index -- ----------------------------- .print '' .print '[With Multi-column Index]' CREATE INDEX candidates_idx_2 ON candidates(team, score); DROP TABLE IF EXISTS results; ANALYZE; .timer ON .eqp ON CREATE TABLE results AS WITH top_teams_verbose(top_team, total_score) AS ( SELECT team, SUM(score) FROM candidates GROUP BY team ORDER BY 2 DESC LIMIT 50 ), top_teams AS ( SELECT top_team FROM top_teams_verbose ) SELECT team, SUM(team IN top_teams) FROM candidates GROUP BY team; .eqp OFF .timer OFF
Here is the output
[Set up] [Without Index] QUERY PLAN |--SCAN TABLE candidates |--USE TEMP B-TREE FOR GROUP BY `--LIST SUBQUERY 3 |--CO-ROUTINE 1 | |--SCAN TABLE candidates | |--USE TEMP B-TREE FOR GROUP BY | `--USE TEMP B-TREE FOR ORDER BY `--SCAN SUBQUERY 1 Run Time: real 0.958 user 0.923953 sys 0.030911 [With Single-column Index] QUERY PLAN |--SCAN TABLE candidates USING COVERING INDEX candidates_idx_1 `--LIST SUBQUERY 3 |--CO-ROUTINE 1 | |--SCAN TABLE candidates USING INDEX candidates_idx_1 | `--USE TEMP B-TREE FOR ORDER BY `--SCAN SUBQUERY 1 Run Time: real 2.487 user 1.108399 sys 1.375656 [With Multi-column Index] QUERY PLAN |--SCAN TABLE candidates USING COVERING INDEX candidates_idx_1 `--LIST SUBQUERY 3 |--CO-ROUTINE 1 | |--SCAN TABLE candidates USING COVERING INDEX candidates_idx_2 | `--USE TEMP B-TREE FOR ORDER BY `--SCAN SUBQUERY 1 Run Time: real 0.270 user 0.248629 sys 0.014341
While the covering index candidates_idx_2
does help, it seems that the single-column index candidates_idx_1
makes the query significantly slower, even after I ran ANALYZE;
. It’s only 2.5x slower in this case, but I think the factor can be made greater if you fine-tune the number of candidates and teams.
Why is it?