Suppose I have multiple data points in 2d (x,y) that are either labeled as A, B, C, or D. I find a minimum bounding area for points that are labeled as A and refer to it as cluster A. I can do the same thing for B, C, and D. After doing this, chances are high that cluster A will overlap with cluster B, C, or D. My goal is to find an algorithm to discard points so that
- There will be no more overlaps between clusters
- Discard as little number of points as possible
- The area “shrunk” for each cluster should be as similar as possible. Meaning the old minimum bounding area minus the new minimum bounding area should be similar among different clusters.
We can assume that the initial cluster roughly makes sense, which means the labeling is not random. The clusters have roughly a 20% overlap between each other. To potentially make the problem simpler, we can assume the area of each cluster is convex. Although not 100% of my datasets are convex of each clusters I would be happy if I can solve this problem for them first. If there isn’t an optimal solution for this I am also interested in greedy solutions.
I am also having a hard time to figure out what kind of keyword should I google to find similar problems for the solution. I would appreciate any pointers. Many thanks.