I’m looking for an efficient way to calculate user to user compatibility.
A user has various preferences:
- age range (min & max integer)
- various choice based categories (multiple choice)
- distance max
- correlated desire (probably most complex; a user can hate/love/curious about "giving" say a "hug" and is thus looking for a user who love/curious about "receiving" a "hug"; in my actual application it’s not "hug" and some users are only interested in giving and some only in receiving)
Example user preference: age: between 21 and 50, gender: female, hair color: brown, interested in: reading, nature, giving "hugs": loves, receiving "hugs": hates
I’m open to any data structure that would assist the matching, but right now I have tables like "gender", "interests", "desires", and tables joining to users like "users_genders" (user_id, gender_id) to indicate what user is what gender, etc.
Ideally I’m looking to calculate a compatibility % but even just getting a list of users who are compatible enough would be acceptable
My previous approach was to have a function that took two user IDs returning the % compatibility between the two. Then for each user I’d loop through all other users, calling that function to calculate compatibility. Then I’d store this data in a table (user_id1, user_id2, compatibility). This "worked" and resulted in fast lookups that could even be sorted by compatibility (a definite plus but not required) but quickly created a HUGE table that crashed with <100K users. The table crashed because it became extremely large. I’m ideally looking for an approach that doesn’t use N*(N-1) rows for N users.
Doing the math, I see why, since I’m doing N*(N-1) compare & stores for N users.
I suppose I could try to address that by splitting the compatibility table by user so each table would have at most N-1 rows, but this isn’t ideal
The other approach I can think of is to turn the preference criteria into SQL. This wouldn’t easily get me a percentage score but at least I could get all matches compatible enough for a user (maybe removing conditions and trying again if no results)
My current approach is in MySQL. But I’m open to a different DB.
I found these related questions on Stackoverflow (though they really seem they belong more here)
which all seem to indicate that the (user_id1, user_id2, compatibility) table is probably my best bet, but I’m hoping that’s changed since these were asked or db wizards here have better ideas