I’d like to find a good algorithm for finding items that are similar to a given item. Let me give more details. I have a CSV file. The first column is a key, and the next 10 columns are properties (their order doesn’t matter). Given a key that appears in this CSV file, I’d like to find the other “rows” that share at least 2 properties with it.
K1,P1,P2,P3,P4,P5,P6,P7,P8,P9,P10
K2,P3,P11,P12,P20,P30,P23,P45,P32,P43,P21
K3,P21,P5,P80,P81,P83,P11,P76,P65,P64,P63
...
Given K2, the algorithm should return K3, because they share the properties P21 and P11. It would return more items if other rows in the file also matched.
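To make the expected behaviour concrete, here is a naive brute-force sketch of what I have in mind. It is only an illustration, not a proposed solution: the function name `find_similar` and the `min_shared` parameter are made up for this example, and I am assuming a threshold of 2 shared properties so that it matches the K2/K3 case above. It compares the given row against every other row, which is the part I would like to improve on.

```python
import csv
import io


def find_similar(rows, key, min_shared=2):
    """Return the keys of rows sharing at least `min_shared` properties with `key`.

    `rows` is an iterable of lists: first element is the key, the rest are
    properties. Order of properties is ignored by storing them in a set.
    """
    props = {row[0]: set(row[1:]) for row in rows if row}
    target = props[key]
    # Brute force: intersect the target's properties with every other row.
    return [
        other
        for other, other_props in props.items()
        if other != key and len(target & other_props) >= min_shared
    ]


# The three sample rows from the question, read the same way a file would be.
sample = """K1,P1,P2,P3,P4,P5,P6,P7,P8,P9,P10
K2,P3,P11,P12,P20,P30,P23,P45,P32,P43,P21
K3,P21,P5,P80,P81,P83,P11,P76,P65,P64,P63
"""
rows = list(csv.reader(io.StringIO(sample)))

print(find_similar(rows, "K2"))  # ['K3']
```

With a real file, `rows` would come from `csv.reader(open(path, newline=""))` instead of the in-memory sample.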
Any idea how I can do this? I’d use Python, but I only need to know how to approach it. Thanks.