# Most efficient way to calculate user compatibility from per user

I’m looking for an efficient way to calculate user to user compatibility.

A user has various preferences:

• age range (min & max integer)
• various choice based categories (multiple choice)
• distance max
• correlated desire (probably most complex; a user can hate/love/curious about "giving" say a "hug" and is thus looking for a user who love/curious about "receiving" a "hug"; in my actual application it’s not "hug" and some users are only interested in giving and some only in receiving)

Example user preference: age: between 21 and 50, gender: female, hair color: brown, interested in: reading, nature, giving "hugs": loves, receiving "hugs": hates

I’m open to any data structure that would assist the matching, but right now I have tables like "gender", "interests", "desires", and tables joining to users like "users_genders" (user_id, gender_id) to indicate what user is what gender, etc.

Ideally I’m looking to calculate a compatibility % but even just getting a list of users who are compatible enough would be acceptable

My previous approach was to have a function that took two user IDs returning the % compatibility between the two. Then for each user I’d loop through all other users, calling that function to calculate compatibility. Then I’d store this data in a table (user_id1, user_id2, compatibility). This "worked" and resulted in fast lookups that could even be sorted by compatibility (a definite plus but not required) but quickly created a HUGE table that crashed with <100K users. The table crashed because it became extremely large. I’m ideally looking for an approach that doesn’t use N*(N-1) rows for N users.

Doing the math, I see why, since I’m doing N*(N-1) compare & stores for N users.

I suppose I could try to address that by splitting the compatibility table by user so each table would have at most N-1 rows, but this isn’t ideal

The other approach I can think of is to turn the preference criteria into SQL. This wouldn’t easily get me a percentage score but at least I could get all matches compatible enough for a user (maybe removing conditions and trying again if no results)

My current approach is in MySQL. But I’m open to a different DB.

I found these related questions on Stackoverflow (though they really seem they belong more here)

https://stackoverflow.com/questions/27928548/representing-compatibility-between-two-users-in-mysql

https://stackoverflow.com/questions/31919589/how-to-get-the-best-match-without-looping-through-the-entire-database

which all seem to indicate that the (user_id1, user_id2, compatibility) table is probably my best bet, but I’m hoping that’s changed since these were asked or db wizards here have better ideas