Let’s say I have a list of n items – i1, i2, …, in.

These items are similar to one another based on some similarity function, say similar_func(item_x,item_y) where item_x,item_y belong to list i1, i2, … in. This function returns a number that ranges from 0 to 1, where 0 means not similar at all and 1 means exactly similar.

Now I would like a create a number of groups, each of which may contain items from the above list. Each group having items that are similar to each other items of the group by let’s say 0.5.

I could run a double loop on the list, comparing each item to one another and creating a group whenever I find two items with similarity more than 0.5. Of course I would need to check if the item already exists in a group already created and with all the other items in that group.

This brute force method is possible and will give the right answer.

But I would like to do this more efficiently. Is there a efficient method for solving this problem?

Edit: There can be any number of groups and an item can be in multiple groups.

If similarity(x,y) = 0.6 and similarity(x,z) = 0.6, but similarity(y,z) = 0.4, then there will be two groups -> [x,y] and [x,z]

Also if x is similar to y AND x is similar to z, then it doesnt mean that y is similar to z. The similarity between two items solely depends on the the return value of similarity function when those items are passed as arguments to it.