Context
I am trying to create a ranked-customer-search that will order results based on "most likely correct". We have several factors we search by, but to keep it simple I will stick with just name, phone number, & email. The goal is that if the customer has an existing account, we use that instead of creating a new account.
It is also worth noting that for this system, a customer account is US state-specific. So it is technically possible for a single person to have 49 existing accounts and still need to have a "new account" created, so there are often many duplicate accounts we can copy information from.
My Attempt
The query below uses binary values to determine the rank of a result. So a match based on a phone number (0100) scores higher than a match based on name (0001) or email (0010). This works pretty well but isn’t quite ideal. For example, an account that matches both email & name will rank lower than an account that matches only a phone number. Unfortunately, this is pretty much my limit when it comes to creating queries.
(The UNION
s below are required because they allow better indexes to be used)
SELECT results.id, SUM(results.score) AS total_score FROM ( ( SELECT c1.id, 4 AS score FROM customers c1 WHERE c1.phone = :phone ) UNION ALL ( SELECT c2.id, 2 AS score FROM customers c2 WHERE c2.email = :email ) UNION ALL ( SELECT c3.id, 1 AS score FROM customers c3 WHERE c3.first_name = :first_name AND c3.last_name = :last_name ) ) results GROUP BY results.id ORDER BY total_score DESC LIMIT 10;
My Question
I am not sure how to change it so that matching multiple less-important factors ranks higher than a single important factor?
Also, since this is my first time creating a search query like this, there is likely a better and/or more standard way of doing this; so any resources you can share related to this would also be greatly appreciated!