Something weird going on with my algorithm?


I wrote an algorithm for data analysis using the CERN ROOT framework. It takes in three files of sorted UNIX timestamps from a single year, and pairs them up in the closest triplets, where each file contributes a triplet, and each triplet is unique. I know there are some more "well known" algorithms for accomplishing this, however this algorithm completes the task much faster, clocking in at about 20 minutes on my machine, as compared to many, many hours for some of the other algorithms I’ve tried. When complete, the algorithm plots the triplets (of the form {a,b,c]) on a 2-dimensional histogram, where the horizontal axis is a-b, and the vertical axis is a-c.

Problem is, it seems to be acting very weird. Namely, when I feed the algorithm one file of real data (these are timestamps generated by an experiment) and two files of completely random data, I get these weird diagonal lines: https://filebin.net/spbswnhkfy8xdkp8/random_plot.pdf?t=48f2yhjq. When I feed the algorithm three files of real data, a single diagonal line through the middle (and two more lines, running horizontally and vertically) appears if I use enough bins. Any idea what’s going on with my algorithm?

void unbiasedAnalysis(){  TNtupleD *D  = new TNtupleD("D","D","x:y");  ROOT::RDataFrame statn1("D", "./pathtodata"); ROOT::RDataFrame statn2("D", "./pathtodata"); ROOT::RDataFrame statn3("D", "./pathtodata");  vector<double> vec_1, vec_2, vec_3; statn1.Foreach([&](double tstamp){ vec_1.push_back(tstamp); },{"UNIX"}); statn2.Foreach([&](double tstamp){ vec_2.push_back(tstamp); },{"UNIX"}); statn3.Foreach([&](double tstamp){ vec_3.push_back(tstamp); },{"UNIX"});  vector<vector<double>> pairs; for(auto tstamp : vec_1){      double first,second;      //get iterator pointing to closest element greater than or equal to     auto geq = std::lower_bound(vec_2.begin(), vec_2.end(), tstamp);     //get iterator pointing to nearest element less than     auto leq = geq - 1;      double foo = tstamp - *geq;     double bar = tstamp - *leq;      //compare iterators, save the closest      if(dabs(foo) <  dabs(bar)){ first = *geq; }     else { first = *leq; }         //repeat     geq = std::lower_bound(vec_3.begin(), vec_3.end(), tstamp);     leq = geq - 1;      foo = tstamp - *geq;     bar = tstamp - *leq;      if(dabs(foo) < dabs(bar)){ second = *geq; }     else { second = *leq; }      //add to pairs     pairs.push_back({tstamp, first, second, (tstamp-first), (tstamp-second), std::min((tstamp-first), (tstamp-second))});  }  //sort vector of vectors by size of smallest difference std::sort(pairs.begin(), pairs.end(),     [](const vector<double>& A, const vector<double>& B){         return A[5] < B[5]; });  std::set<double> cache;  ROOT::EnableImplicitMT();  for(auto pair : pairs){     //if not in cache, add to TNtuple     if(cache.find(pair[1]) == cache.end() && cache.find(pair[2]) == cache.end()){              D->Fill(pair[3],pair[4]);          //add to cache         cache.insert(pair[1]); cache.insert(pair[2]);     } }  D->Draw("x:y>>htemp(100,-0.02,0.02,100,-0.02,0.02)","","colz"); 

}