I am trying to create a ranked-customer-search that will order results based on "most likely correct". We have several factors we search by, but to keep it simple I will stick with just name, phone number, & email. The goal is that if the customer has an existing account, we use that instead of creating a new account.
It is also worth noting that for this system, a customer account is US state-specific. So it is technically possible for a single person to have 49 existing accounts and still need to have a "new account" created, so there are often many duplicate accounts we can copy information from.
The query below uses binary values to determine the rank of a result. So a match based on a phone number (0100) scores higher than a match based on name (0001) or email (0010). This works pretty well but isn’t quite ideal. For example, an account that matches both email & name will rank lower than an account that matches only a phone number. Unfortunately, this is pretty much my limit when it comes to creating queries.
UNIONs below are required because they allow better indexes to be used)
SELECT results.id, SUM(results.score) AS total_score FROM ( ( SELECT c1.id, 4 AS score FROM customers c1 WHERE c1.phone = :phone ) UNION ALL ( SELECT c2.id, 2 AS score FROM customers c2 WHERE c2.email = :email ) UNION ALL ( SELECT c3.id, 1 AS score FROM customers c3 WHERE c3.first_name = :first_name AND c3.last_name = :last_name ) ) results GROUP BY results.id ORDER BY total_score DESC LIMIT 10;
I am not sure how to change it so that matching multiple less-important factors ranks higher than a single important factor?
Also, since this is my first time creating a search query like this, there is likely a better and/or more standard way of doing this; so any resources you can share related to this would also be greatly appreciated!