There are thousands (or tens of thousands) of possible movie titles. A new user comes to my website and selects hundreds of titles that he likes.
The sole goal of my website is to output a list of other users who liked the same titles with only very minor differences: say, only 1-5 titles differ between the two users.
The solution I see is straightforward: take the input from the new user, iterate over every row in the database to compare it against every other user, and output the matches.
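To make the straightforward approach concrete, here is a minimal sketch of what I have in mind. It assumes each user's likes are held in memory as a set of integer title IDs, and that "different titles" means titles liked by exactly one of the two users (the symmetric difference). The names `differing_titles`, `find_near_matches` and `max_diff` are made up for illustration, and I treat users with zero differences as matches too.

```python
from typing import Dict, List, Set


def differing_titles(a: Set[int], b: Set[int]) -> int:
    """Count titles liked by exactly one of the two users."""
    return len(a ^ b)  # a ^ b is the symmetric difference of two sets


def find_near_matches(
    new_likes: Set[int],
    all_users: Dict[str, Set[int]],
    max_diff: int = 5,
) -> List[str]:
    """Brute force: compare the new user against every stored user."""
    matches = []
    for user_id, likes in all_users.items():
        if differing_titles(new_likes, likes) <= max_diff:
            matches.append(user_id)
    return matches


if __name__ == "__main__":
    # Tiny made-up data set, only to show the call shape.
    users = {
        "alice": {1, 2, 3, 4},
        "bob": {1, 2, 3, 5},
        "carol": {10, 11},
    }
    print(find_near_matches({1, 2, 3, 4}, users, max_diff=2))
```

Each query touches every user once and does work proportional to the size of the sets being compared, which is the part that worries me at millions of users.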
But I want to solve it with millions (or tens of millions) of users in mind, and the straightforward solution of course doesn't scale: every new query has to scan every existing user.
I feel like my lack of formal education in algorithms and data structures may be limiting my thinking here. How could I implement a more efficient solution?
Note that I don't want to sort users into vague buckets of common interests. I want to match exact titles between users as precisely as possible, so I guess it's not the usual recommendation system.