## Cylindrical billboarding around an arbitrary axis in geometry shader

I found an answer on this site relating to this question already, but it doesn’t seem applicable in the context of my project.

Basically I’d like to create a method which fits this signature:

``float3x3 AxisBillboard(float3 source, float3 target, float3 axis) ``

That is to say, when given a source point (i.e., an object’s position in world space), a target (the camera position in world space), and an axis (the object’s up vector, which is not necessarily the global y axis), it produces a 3×3 rotation matrix by which I can multiply the vertices of my point so that it’s properly rotated.

I’ve found many solutions and tutorials online which work nicely assuming I only want to rotate around the y axis.

For example, here’s a solution which billboards around the global y axis:

```hlsl
float3 dir = normalize(target - source);

// Angle of the camera direction around the global y axis
float angleY = atan2(dir.x, dir.z);
float c = cos(angleY);
float s = sin(angleY);

float3x3 rotYMatrix;
rotYMatrix[0].xyz = float3(c, 0, s);
rotYMatrix[1].xyz = float3(0, 1, 0);
rotYMatrix[2].xyz = float3(-s, 0, c);
```

For context, I’m working on a grass shader, and each individual blade of grass should be billboarded to face the camera while remaining aligned with the normal of the terrain.
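One common construction for this kind of axis-constrained billboard is sketched below in C++ so that it can be checked on the CPU. The `Vec3` and `Mat3` types, the helper functions, and the name `axisBillboard` are placeholders for illustration, not part of any shader API. The idea is to keep `axis` as the up vector, take `right = axis × (target − source)`, and recover the facing direction as `right × axis`. The fallback for a camera sitting exactly on the axis is an arbitrary choice.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Columns of the rotation matrix: the local x, y, z axes in world space.
struct Mat3 { Vec3 right, up, forward; };

static Vec3 makeVec3(double x, double y, double z)
{
    Vec3 v; v.x = x; v.y = y; v.z = z; return v;
}
static Vec3 sub(const Vec3& a, const Vec3& b)
{
    return makeVec3(a.x - b.x, a.y - b.y, a.z - b.z);
}
static double dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
static Vec3 cross(const Vec3& a, const Vec3& b)
{
    return makeVec3(a.y * b.z - a.z * b.y,
                    a.z * b.x - a.x * b.z,
                    a.x * b.y - a.y * b.x);
}
static Vec3 normalize(const Vec3& v)
{
    const double len = std::sqrt(dot(v, v));
    assert(len > 0.0);
    return makeVec3(v.x / len, v.y / len, v.z / len);
}

// Builds a rotation whose up column is the given axis and whose forward
// column points toward the target as far as the axis constraint allows.
Mat3 axisBillboard(const Vec3& source, const Vec3& target, const Vec3& axis)
{
    const Vec3 up = normalize(axis);
    Vec3 right = cross(up, sub(target, source));
    if (dot(right, right) < 1e-12)
    {
        // The camera lies on the axis, so no facing direction is defined.
        // Arbitrary fallback: pick any direction not parallel to the axis.
        const Vec3 helper = (std::fabs(up.x) < 0.9) ? makeVec3(1, 0, 0)
                                                    : makeVec3(0, 1, 0);
        right = cross(up, helper);
    }
    Mat3 m;
    m.up = up;
    m.right = normalize(right);
    m.forward = cross(m.right, m.up); // unit length, perpendicular to both
    return m;
}

// Rotates a local-space vertex offset into world space.
Vec3 rotate(const Mat3& m, const Vec3& v)
{
    return makeVec3(m.right.x * v.x + m.up.x * v.y + m.forward.x * v.z,
                    m.right.y * v.x + m.up.y * v.y + m.forward.y * v.z,
                    m.right.z * v.x + m.up.z * v.y + m.forward.z * v.z);
}
```

With `axis = (0, 1, 0)` this reduces to the y-axis solution above. How the three columns map onto the rows or columns of a `float3x3` depends on the multiplication order used in the shader.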

## Usefulness of Differential Geometry

I recently came across these books: https://www.springer.com/us/book/9783030460396#aboutBook and https://www.springer.com/gp/book/9783030460464#aboutBook. Their subject matter really intrigues me, as I really enjoy topology/geometry/analysis, but I had not planned to pursue them since I also want to work in an area with very concrete applications. However, I am skeptical. At one point I thought topological data analysis (TDA) was the perfect marriage of my interests, but I have found very little evidence of that field actually being used in computer science, much less in industrial or otherwise more ‘practical’ settings. It seems like TDA makes mathematicians feel more relevant to the data science world, but I’m not convinced that it makes them so (feel free to contradict me if you think I’m wrong on this point, but note that I want a concrete use case, not an abstract argument about its relevance). I have similar stories about coding theory, certain aspects of set theory, etc. They may have theoretical relevance, but is there any situation where, in the process of developing software, one might need to consult these fields? I don’t know of any.

So now my question: is there any practical field of computer science that makes advanced use of differential geometry? Medical imaging, other imaging, computer graphics, virtual reality, and some other fields come to mind as potential application areas. In my (admittedly limited) experience, however, these areas seem to use basic 3D geometry, numerical linear algebra, and sometimes numerical analysis of PDEs. Those are all very nice topics, but they do not require anything as abstract as differential geometry.

## How to solve this analytic geometry problem completely

I want to find a plane that passes through the points `{1,0,0}` and `{0,1,0}` and is tangent to the surface $$z(x,y)=x^{2}+y^{2}$$.

``Solve[{a, b, c}.{1, 0, 0} == d && a*0 + b*1 + c*0 == d &&    a*x0 + b*y0 + c*z0 == d && z0 == x0^2 + y0^2 &&    VectorAngle[{a, b, c}, {-2 x0, -2 y0, 1}] ==     0,(*MatrixRank[{2x0,2y0,1},{a,b,c}]\[Equal]1*){a, b, c, d, x0, y0,    z0}] ``

But I can’t get the answer I want with the above code (the expected answers are $$z=0$$ and $$2x+2y-z=2$$). What should I do?
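As a sanity check on the expected answers, here is a hand derivation. It is a sketch of the underlying geometry, not a fix for the `Solve` call. The tangent plane to the surface at $$(x_0, y_0, x_0^2+y_0^2)$$ is

$$z = 2x_0\,x + 2y_0\,y - (x_0^2 + y_0^2).$$

Requiring that `{1,0,0}` and `{0,1,0}` lie on it gives

$$0 = 2x_0 - (x_0^2 + y_0^2), \qquad 0 = 2y_0 - (x_0^2 + y_0^2).$$

Subtracting the two equations yields $$x_0 = y_0$$, and then $$2x_0 - 2x_0^2 = 0$$, so $$x_0 \in \{0, 1\}$$. The point $$(0,0,0)$$ gives $$z=0$$, and the point $$(1,1,2)$$ gives $$z = 2x + 2y - 2$$, i.e. $$2x+2y-z=2$$.

Note that for the second plane written with normal `{2, 2, -1}`, the vector `{-2 x0, -2 y0, 1}` is `{-2, -2, 1}`, which is antiparallel rather than parallel, so the angle between them is $$\pi$$ and not $$0$$. That may be one reason a condition based on `VectorAngle[...] == 0` is fragile here.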

## Intersection of line segments induced by point sets from fixed geometry

I am reading up on algorithms and at the moment looking at the below problem from Jeff Erickson’s book Algorithms.

I solved (a) by seeing a relationship to the previous problem on computing the number of array inversions. However, I am struggling with problem (b) as I cannot see how to reduce the circle point arrangement to an arrangement of points and lines that would be an input to the problem posed in (a). Assuming I have something for (b), I also cannot see how one might resolve (c).
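The problem figure is not reproduced here, so the following only sketches the inversion-counting building block mentioned for part (a). It assumes part (a) concerns segments joining points on two parallel lines: once the segments are sorted by their endpoint on one line, two of them cross exactly when their endpoints on the other line appear in the opposite order, which is an inversion. The function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts v[lo, hi) in place and returns the number of pairs i < j
// in that range with v[i] > v[j].
static unsigned long sortAndCount(std::vector<int>& v, std::vector<int>& tmp,
                                  std::size_t lo, std::size_t hi)
{
    if (hi - lo < 2)
    {
        return 0;
    }
    const std::size_t mid = lo + (hi - lo) / 2;
    unsigned long count = sortAndCount(v, tmp, lo, mid);
    count += sortAndCount(v, tmp, mid, hi);

    // Merge the two sorted halves. Whenever an element of the right half
    // is placed first, it is inverted with every remaining left element.
    std::size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
    {
        if (v[j] < v[i])
        {
            count += mid - i;
            tmp[k++] = v[j++];
        }
        else
        {
            tmp[k++] = v[i++];
        }
    }
    while (i < mid) tmp[k++] = v[i++];
    while (j < hi) tmp[k++] = v[j++];
    for (k = lo; k < hi; ++k) v[k] = tmp[k];
    return count;
}

// Takes the sequence by value so the caller's data is left untouched.
unsigned long countInversions(std::vector<int> v)
{
    assert(v.size() < 100000000); // keeps the sketch's counter far from overflow
    std::vector<int> tmp(v.size());
    return sortAndCount(v, tmp, 0, v.size());
}
```

This runs in $$O(n \log n)$$ time, which matches the bound usually targeted for the parallel-lines version.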

For part (b), clearly every point $$p = (x, y)$$ satisfies $$x^2 + y^2 = 1$$ but I do not see how I might be able to use this fact to do the reduction. The runtime I am shooting for of $$O(n \log^2 n)$$ also seems to tell me the reduction is going to cost something non-trivial to do.

Does anyone have further hints or insights that might help with part (b), and potentially even part (c)?

## Euclidean geometry theorem proving complexity

Euclidean geometry is complete, so the problem of determining whether a statement $$A$$ is provable is decidable. Do we know its time complexity?

## What is a good Object-Oriented design for geometry objects when building libraries dealing with geometry operations?

I am trying to design an object-oriented library to handle geometric calculations, and I am deliberately pushing it to be strictly object-oriented and to follow the relevant “best practices”. I know that there is no reason to be dogmatic about patterns; I just want to make sure I have not overlooked a better way to do what I am about to ask.

Consider trying to design a path, composed of segments. I consider having a common interface for segments, which offers a method to calculate points of intersection with another segment. In short, the tentative signature of such a method might look like:

```
abstract class Segment {
    ...

    Point[] Intersection(Segment other);

    ...
}
```

However, when implementing such a method, it might be necessary to check what actual implementation lies behind the “other” Segment. This can be done through run-time type checks if the language supports them; otherwise, I have to use some kind of `enum` to differentiate between segments and, potentially, cast the object to call the corresponding methods. In any case, I cannot “extract” a common interface for this kind of design.

I have considered imposing the base assumption that all path segments boil down to sequences of points, and unifying the intersection process as a line-to-line intersection between all sub-segments, but this would rob the design of a very significant (in my opinion) optimization possibility. Considering how pervasive and frequent the geometry-based operations the library will support are, it is very important for performance to take advantage of special shapes that have closed-form intersections (such as a circular arc with a line), instead of testing a multitude of small line-segment pairs to identify intersection points.

Apart from that, if I make the simplifying assumption that paths consist of point sequences, I will have to make another relatively pervasive (for such a library) design choice, that of point density, in order to trace the various segment types. Point density is a reasonable architectural parameter when the end result is a drawing, e.g. on-screen, that must reach a given level of smoothness. However, I feel it is conceptually an unsuitable abstraction for operations between pure geometric abstractions. A circle is not a series of line segments and should not need 5, 10, or 100 coordinate pairs to be represented. It is just a center and a radius.

My question is: is there any other way to be object-oriented when dealing with base classes for geometry entities, or is the “usual” way of doing things an enumeration, with implementations checking the segment type and exploiting specific geometric relations to optimize the procedures?

The reason I am giving this so much thought is that I might find myself having to implement special segment types, such as parametric curves, in the future, or simply having to allow extension of the API from outside the API itself. If I use enum-based, type-checked, everything-with-everything intersection tests (and do so for other spatial operations between segments besides intersection), “outsider” extensions of the API will have to “register” themselves in the segment-types enumeration. That would necessitate changing and rebuilding the original API, providing a mechanism for external registrations, or trying to capture every possible segment form up front.

To make it simple, assume that I implement this only with straight line segments, and then I add a new implementation of a circular arc. I can “teach” the circular arc how to intersect itself with straight line segments (by checking the segment type for “line”, for example), but I cannot “teach” the line segment how to intersect itself with arcs without going back to change the original library.

I understand that there are methods or techniques to provide all this flexibility (I could make segments register special “injected” intersection methods for specific identifiers, which would be determined by the external API extension objects, so that lines will first check whether the object they intersect with is such a “special” type, or simply make intersection methods virtual, so that the developer trying to extend my API will be able to manually “teach” all existing segment implementations how to intersect themselves with my original objects). All I am asking is, is there any other elegant way to tackle this situation?

The top-voted answer to this question suggests excluding the method entirely and delegating it to a different class. This sounds somewhat counter-intuitive, given that segments do know their own geometries. However, segments do not know other segments’ geometries, so “outsourcing” the intersection method appears to be a reasonable design decision, although it still requires knowing the segment type at run-time, whereas I am trying to treat segments as interfaces that are “ignorant” of the underlying implementation. Apart from that, I would not resort to empty marker interfaces to differentiate between classes. An external “intersector”-like class looks interesting, though I would avoid making it static, in order to allow for extensions and changes of strategy (different implementations, optimizing for speed, employing snapping, etc.).
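A minimal sketch of what such a non-static “intersector” could look like, written in C++ for concreteness. Every name here (`kind()`, `Intersector`, `lineCircle`, the string tags) is hypothetical, and `Circle` merely stands in for a curved segment type with a closed-form intersection. The registry still relies on a run-time type tag, so it moves the type knowledge out of the segments rather than eliminating it.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Point { double x, y; };

// Segments expose only a type tag; intersection logic lives elsewhere.
class Segment
{
public:
    virtual ~Segment() {}
    virtual std::string kind() const = 0; // hypothetical lookup key
};

class LineSegment : public Segment
{
public:
    LineSegment(Point a_, Point b_) : a(a_), b(b_) {}
    std::string kind() const { return "line"; }
    Point a, b;
};

class Circle : public Segment
{
public:
    Circle(Point c, double r) : center(c), radius(r) { assert(r >= 0.0); }
    std::string kind() const { return "circle"; }
    Point center;
    double radius;
};

// Non-static registry of pairwise routines; extensions add entries at run time.
class Intersector
{
public:
    typedef std::vector<Point> (*Routine)(const Segment&, const Segment&);

    void add(const std::string& first, const std::string& second, Routine fn)
    {
        table_[std::make_pair(first, second)] = fn;
    }

    std::vector<Point> intersect(const Segment& s, const Segment& t) const
    {
        Table::const_iterator it =
            table_.find(std::make_pair(s.kind(), t.kind()));
        if (it != table_.end()) return it->second(s, t);
        it = table_.find(std::make_pair(t.kind(), s.kind())); // swapped order
        if (it != table_.end()) return it->second(t, s);
        return std::vector<Point>(); // no routine registered for this pair
    }

private:
    typedef std::map<std::pair<std::string, std::string>, Routine> Table;
    Table table_;
};

// Closed-form line-segment/circle intersection. The casts are safe only
// because the routine is registered under the ("line", "circle") key.
std::vector<Point> lineCircle(const Segment& s, const Segment& t)
{
    const LineSegment& l = static_cast<const LineSegment&>(s);
    const Circle& c = static_cast<const Circle&>(t);
    const double dx = l.b.x - l.a.x, dy = l.b.y - l.a.y;
    const double fx = l.a.x - c.center.x, fy = l.a.y - c.center.y;
    const double A = dx * dx + dy * dy;
    const double B = 2.0 * (fx * dx + fy * dy);
    const double C = fx * fx + fy * fy - c.radius * c.radius;
    const double disc = B * B - 4.0 * A * C;
    std::vector<Point> out;
    if (A == 0.0 || disc < 0.0) return out;
    const double root = std::sqrt(disc);
    const double ts[2] = { (-B - root) / (2.0 * A), (-B + root) / (2.0 * A) };
    const int n = (disc == 0.0) ? 1 : 2; // tangent line: one contact point
    for (int i = 0; i < n; ++i)
    {
        if (ts[i] >= 0.0 && ts[i] <= 1.0) // keep hits on the segment itself
        {
            Point p = { l.a.x + ts[i] * dx, l.a.y + ts[i] * dy };
            out.push_back(p);
        }
    }
    return out;
}
```

An extension that adds a new segment type would call `add()` for each pairing it knows how to handle, without touching `LineSegment` or the original library. Returning an empty result for an unregistered pair is a simplification; a real design might fall back to a polyline approximation or report an error instead.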

## Finding paths between triangles efficiently in 3D geometry #2

This post is an update of the one from here. I’ve updated the code and a couple of pieces of the post itself.

I’ve been writing some functions used to find paths between two types of triangles – alphas and betas. Alphas are triangles that have been in a zone we consider important, have an “interesting” value above a given threshold, and are “active”. Betas are essentially anything that isn’t an Alpha.

The position of the zone and the geometry of the model can change between invocations. Thus, both alphas and betas change almost every invocation to some extent. This requires a complete re-computation of the paths between them.

This is written in C++03, compiled into a MEX file (.mexa64) to be executed by MATLAB R2016B on a Linux machine. Those are all hard limits.

This code uses many functions and a good deal of data from external libraries and objects. However, most of the methods used are very simple array lookups, nothing performance-hindering.

Everything works correctly so far in testing, but performance has become a significant problem.

The code:

```cpp
// Doxygen block goes here

// Various includes

// Only needed because ultimately the MATLAB script needs an error code, not a
// C++ exception
#define SUCCESS 0
#define PTHREAD_ERR 1

typedef std::pair<unsigned int, unsigned int> ABPair;

// Useful for multithreading
struct ThreadData
{
  CRayTracer* rt;
  pthread_t threadId;                          // ID returned by pthread_create
  unsigned uThreadID;                          // Index
  std::vector<ABPair> validPathsThread;        // valid pairs that this thread
                                               // found
  unsigned int numTris;                        // Input argument, the number of
                                               // triangles in the mesh
  double distThreshold;                        // Input argument, the maximum
                                               // distance between triangles
};

// Exception for experimentation
class PThreadException: public std::exception
{
  virtual const char* what() const throw()
  {
    return "Exception occurred in a pthread_attr_init or pthread_create\n";
  }
};

// Data about each individual tri, could be brought into a vector of structs
// Needed to check if geometry has changed since last invocation
std::vector<bool> triActive;
// Needed to check if alphas have changed since last invocation
std::vector<bool> validAlphaIndex;
// Needed to keep history of what tris have ever been in the zone, for alphas
std::vector<bool> hasBeenInZone;

// A "map" from a given face to the element it resides in. Many faces may share
// the same element.
std::vector<unsigned int> faceToElementMap;

// Not necessary as a global if it's just getting re-generated each time.
// However, if we do decide to append and cull instead of regenerating, this
// needs to stay.
std::vector<unsigned int> validAlphas;

// All valid paths. Must be maintained, because we don't know if
// findPaths() will be called. It may not be if geometry hasn't changed.
std::vector<ABPair> validPaths;
unsigned int prevPathNum = 0;

// Useful everywhere
CRTWrapper* rayTracer = NULL;
NanoRTWrapper* m_nrt = NULL;

// Function declarations
// Not required, but prevents warnings depending on how functions are ordered
// and call each other
// (Including the mexFunction here would be redundant, as it exists in mex.h)
void exitFcn();
bool isTriInZoneRadius(const unsigned int iTri);
bool checkForModelChanges(const unsigned int numTris,
                          const float* nodeIValues,
                          const double iValueThreshold
                          );
void initialize(const float* elemFace,
                const unsigned int numElems,
                const unsigned int facePerElMax,
                unsigned int* numTri,
                unsigned int* numFace
                );
void* findPathsThread(void *data);
void findPathsThreadSpooler(const unsigned int numTris,
                            const double distThreshold
                            );
void mapFacesToElements(const float* elemFace,
                        const unsigned int numElems,
                        const unsigned int facePerElMax
                        );
bool checkPairValid(const unsigned int i,
                    const unsigned int j,
                    const double distThreshold
                    );
bool isTriAlpha(const unsigned int iTri,
                const float* nodeIValues,
                const double iValueThreshold
                );
int mainFunc(some args gohere);

/**
 * @brief exitFcn - Cleans up malloc'd or calloc'd memory if someone in the
 * MATLAB script calls "clear mexFileName" or "clear all". Does nothing ATM.
 */
void exitFcn()
{
  // mexPrintf("exitFcn() called\n");
  // Does nothing right now, since I don't malloc/calloc anything
}

/**
 * @brief Checks if a given tri is currently in the zone's external radius.
 * @param iTri - The index of the triangle to check
 * @return True if in the radius, false if not
 */
bool isTriInZoneRadius(const unsigned int iTri)
{
  // Omitted, relies on some funky external stuff that'd be hard to explain
  // hasBeenInZone[] gets updated here
}

/**
 * @brief Checks if the model has changed (either in terms of alphas or
 * geometry) and re-generates the vector of alphas
 * @param numTris -       The number of triangles in the mesh
 * @param nodeIValues -   The iValue at each node
 * @param iValueThreshold - The iValue threshold beyond which an alpha
 * is interesting enough to be valid
 * @return True if the list of alphas or the geometry has changed, false if
 * neither have
 */
bool checkForModelChanges(const unsigned int numTris,
                          const float* nodeIValues,
                          const double iValueThreshold
                          )
{
  bool modelChanged = false;
  bool isAlpha;
  bool currentlyActive;

  // Two checks need to happen - geometry changes and for the list of valid
  // alphas to change
  // Also regenerates the vector of valid alphas from scratch as it goes

  for(unsigned int i = 0; i < numTris; i++)
  {
    // Active means it has 1 exposed face, not 2 (internal) or 0 (eroded)
    currentlyActive = m_nrt->getTriActive(i);

    // Has the geometry changed?
    if(currentlyActive != triActive[i])
    {
      modelChanged = true;
      triActive[i] = currentlyActive;
    }

    // Get whether this triangle is an alpha:
    isAlpha = isTriAlpha(i, nodeIValues, iValueThreshold);

    // Triangle is a valid alpha now, but wasn't before
    if((isAlpha == true) && (validAlphaIndex[i] == false))
    {
      validAlphaIndex[i] = true;
      modelChanged = true;
    }
    // Was valid before, is no longer valid now
    else if((isAlpha == false) && (validAlphaIndex[i] == true))
    {
      validAlphaIndex[i] = false;
      modelChanged = true;
      //cullalphasFlag = true;
    }

    // Generating the set of all valid alphas
    if(isAlpha)
    {
      validAlphas.push_back(i);
    }
  }

  return modelChanged;
}

/**
 * @brief Initializes this MEX file for its first run
 * @param rt -            A pointer to the raytracer object
 * @param numTris -       The total number of triangles in the mesh
 * @param numFaces -      The total number of faces in the mesh
 * @param elemFace -      The map of elements to the faces that they have
 * @param numElems -      The number of elements in the mesh
 * @param facePerElMax -  The maximum number of faces per element
 */
void initialize(const float* elemFace,
                const unsigned int numElems,
                const unsigned int facePerElMax,
                unsigned int* numTri,
                unsigned int* numFace
                )
{
  // Fetch number of tris and faces
  // Must be done every time, since we're storing locally and not globally
  // However:
  // They're never modified
  // They never change between calls to rtThermalCalculate()
  // They're used frequently in many functions
  // I think that's a strong candidate for being a global

  unsigned int numTris = m_nrt->getTriCount();
  *numTri = numTris;

  unsigned int numFaces = m_nrt->getFaceCount();
  *numFace = numFaces;

  /*
   * Allocate some space for things we need to be persistent between runs of
   * this MEX file.
   */
  if(triActive.empty())
  {
    triActive.resize(numTris, false);
  }
  if(hasBeenInZone.empty())
  {
    hasBeenInZone.resize(numTris, false);
  }
  if(validAlphaIndex.empty())
  {
    validAlphaIndex.resize(numTris, false);
  }
  if(faceToElementMap.empty())
  {
    faceToElementMap.resize(numFaces);
    mapFacesToElements(elemFace, numElems, facePerElMax);
  }

  return;
}

/**
 * @brief Is something that can be used by pthread_create(). Threads will skip
 * over some of the work, and do isValidPair on others. Thus...multithreading.
 * @param data - The data structure that will hold the results and arguments
 */
void* findPathsThread(void *data)
{
  struct ThreadData* thisThreadsData = static_cast<struct ThreadData*>(data);
  const unsigned uThreadID = thisThreadsData->uThreadID;
  const unsigned uNumThreads = rayTracer->m_uNumThreads;
  const double distThreshold = thisThreadsData->distThreshold;
  const unsigned int numTris = thisThreadsData->numTris;
  unsigned int validI;

  std::vector<ABPair>& validPathsThread = thisThreadsData->validPathsThread;

  // Loop over all valid alphas
  for(unsigned int i = uThreadID; i < validAlphas.size(); i += uNumThreads)
  {
    // Get this to avoid needing to index into the array 4 times total
    // Each time
    validI = validAlphas[i];

    // Loop over all triangles (potential betas)
    for(unsigned int j = 0; j < numTris; j++)
    {
      // Do the easy checks first to avoid function call overhead
      if(!validAlphaIndex[j] && triActive[j])
      {
        if(checkPairValid(validI, j, distThreshold))
        {
          validPathsThread.push_back(std::make_pair(validI, j));
        }
      }
    }
  }
  return NULL;
}

/**
 * @brief Uses the raytracer object's current state as well as arguments to
 * generate pairs of unobstructed paths between alphas and betas. Creates
 * as many threads as the system has available, and then uses pthread_create()
 * to dish out the work of findPaths()
 * @param numTris - The number of triangles in the mesh
 * @param distThreshold - The maximum distance an alpha and beta can be
 * apart
 */
void findPathsThreadSpooler(const unsigned int numTris,
                            const double distThreshold
                            )
{
  std::vector<ThreadData> threadData(rayTracer->m_nProc);
  pthread_attr_t attr;
  int rc;

  // I think this is checking to make sure something doesn't already exist,
  // not sure what though
  if((rc = pthread_attr_init(&attr)))
  {
    throw PThreadException();
  }

  // We know how many threads the system supports
  // So all this does is walk through an array of them and start them up
  for(unsigned uThread = 0; uThread < rayTracer->m_uNumThreads; uThread++)
  {
    ThreadData& data = threadData[uThread];

    data.rt = rayTracer;
    data.uThreadID = uThread;
    data.numTris = numTris;
    data.distThreshold = distThreshold;

    if(rayTracer->m_uNumThreads > 1)
    {
      if((rc = pthread_create(&data.threadId, &attr, &findPathsThread, &data)))
      {
        throw PThreadException();
      }
    }
    else
    {
      findPathsThread(&data);
    }
  }

  // Join all threads
  for(unsigned uThread = 0; uThread < rayTracer->m_uNumThreads; uThread++)
  {
    std::vector<ABPair>& validPathsThread =
        threadData[uThread].validPathsThread;

    if(rayTracer->m_uNumThreads > 1)
    {
      void* res;

      if((rc = pthread_join(threadData[uThread].threadId, &res)))
      {
        throw PThreadException();
      }
    }

    // validPathsThread is the set of ABPairs that this thread found
    // while validPaths is the globally maintained set of valid paths
    // Take each thread's results and merge it into the overall results
    validPaths.insert(validPaths.end(),
                      validPathsThread.begin(),
                      validPathsThread.end());
  }

  // Useful for preallocation next time
  prevPathNum = validPaths.size();

  return;
}

/*
void cullalphas()
{
  for(unsigned int i = 0; i < validAlphas.size(); i++)
  {
    if(!isValidalpha(validAlphas[i]))
    {
      validAlphas.erase(i);
    }
  }
}
*/

/**
 * @brief Determines the elements that each face belongs to
 * @details the MATLAB script maintains a map of all faces per element.
 * This is the opposite of what we want. Accessing it linearly
 * walks by column, not by row. Also, MATLAB stores everything 1-indexed.
 * Finally, the MATLAB script left them stored as the default, which are
 * singles.
 * @param elemFace - A MATLAB facePerElMax by numElems array, storing which
 * faces belong to each element (elements being the row number)
 * @param numElems - The total number of elements (rows) in the array
 * @param facePerElMax - The max number of faces per element (the number of
 * columns)
 */
void mapFacesToElements(const float* elemFace,
                        const unsigned int numElems,
                        const unsigned int facePerElMax
                        )
{
  unsigned int i;

  // elemFace[0] = 1. We don't know how elemFace will be structured precisely,
  // so we need to keep going until we find a face in it that equals our number
  // of faces, since it's 1-indexed.
  for(i = 0; i < (numElems * facePerElMax); i++)
  {
    faceToElementMap[static_cast<unsigned int>(elemFace[i]) - 1] =
        (i % numElems);

    // Is the next face for that element a NaN? If so, we can skip it. Keep
    // skipping until the next element WON'T be NaN.
    // Don't cast here, as NaN only exists for floating point numbers,
    // not integers.
    while(((i + 1) < (numElems * facePerElMax)) && isnan(elemFace[i + 1]))
    {
      i++;
    }
  }
}

/**
 * @brief checkPairValid - Checks if an alpha triangle and a beta triangle
 * form a valid path
 * @param i -             Triangle index of the alpha
 * @param j -             Index into all tris (potential beta)
 * @param distThreshold - The max distance the tri's centers can be apart
 * @return Whether the pair forms a valid path
 */
bool checkPairValid(const unsigned int i,
                    const unsigned int j,
                    const double distThreshold
                    )
{
  double pathDist;
  double alphaCoords[3];
  double betaCoords[3];
  nanort::Ray<double> ray;

  alphaCoords[0] = rayTracer->m_vecTriFixedInfo[i].center.x();
  alphaCoords[1] = rayTracer->m_vecTriFixedInfo[i].center.y();
  alphaCoords[2] = rayTracer->m_vecTriFixedInfo[i].center.z();

  betaCoords[0] = rayTracer->m_vecTriFixedInfo[j].center.x();
  betaCoords[1] = rayTracer->m_vecTriFixedInfo[j].center.y();
  betaCoords[2] = rayTracer->m_vecTriFixedInfo[j].center.z();

  // Determine the distance between alpha and beta
  // sqrt((x2-x1)^2 + (y2-y1)^2 + (z2-z1)^2)
  pathDist = sqrt(pow((betaCoords[0] - alphaCoords[0]), 2)
                + pow((betaCoords[1] - alphaCoords[1]), 2)
                + pow((betaCoords[2] - alphaCoords[2]), 2));

  // Only set up and shoot a ray if the pair is within range
  if(pathDist < distThreshold)
  {
    // Set up a nanort::Ray's origin, direction, and max distance
    ray.org[0] = alphaCoords[0]; // x
    ray.org[1] = alphaCoords[1]; // y
    ray.org[2] = alphaCoords[2]; // z

    ray.dir[0] = (betaCoords[0] - alphaCoords[0]) / pathDist;
    ray.dir[1] = (betaCoords[1] - alphaCoords[1]) / pathDist;
    ray.dir[2] = (betaCoords[2] - alphaCoords[2]) / pathDist;

    // TODO: Subtract some EPSILON here so it doesn't report a hit because it
    // hit the beta itself (assuming that's how it works)
    ray.max_t = pathDist;

    // Call CNmg::ShootRay()'s third form to check if there is a path
    if(!(m_nrt->shootRay(ray)))
    {
      return true;
    }
    else
    {
      // There's no path
      return false;
    }
  }
  else
  {
    // The distance is too far between alpha and beta
    return false;
  }
}

/**
 * @brief Determines if a given triangle is a valid alpha.
 * @param iTri - The triangle index to check
 * @return True if it is an alpha, false if it is not
 */
bool isTriAlpha(const unsigned int iTri,
                const float* nodeIValues,
                const double iValueThreshold
                )
{
  double triAvgIValue;
  const unsigned int* triNodes;

  // Do the simple checks first, as it's more performant to do so
  // alternate consideration for accuracy
  //if(triActive[iTri] && (hasBeenAlpha[iTri] || isTriInZoneRadius(iTri)))
  if(triActive[iTri] && (hasBeenInZone[iTri] || isTriInZoneRadius(iTri)))
  {
    // Retrieve the average iValue of this triangle
    triNodes = m_nrt->getTriNodes(iTri);

    triAvgIValue = (nodeIValues[triNodes[0]]
                  + nodeIValues[triNodes[1]]
                  + nodeIValues[triNodes[2]]) / 3;

    if(triAvgIValue > iValueThreshold)
    {
      return true;
    }
  }

  return false;
}

// Doxygen block, omitted
int mainFunc(args)
{
  // Some local vars, omitted

  // Initialize the program if we're on a first run
  initialize(elemFace, numElems, facePerElMax, &numTris, &numFaces);

  // Need to see if we need to call findPaths
  if(checkForModelChanges(numTris, nodeIValues, iValueThreshold))
  {
    validPaths.clear();
    validPaths.reserve(prevPathNum);

    try
    {
      findPathsThreadSpooler(numTris, distThreshold);
    }
    catch(PThreadException& e)
    {
      return PTHREAD_ERR;
    }
  }

  // Loop over all valid paths, use them to do some more calculations..(omitted)
  // This takes up hundredths of the time findPaths() takes

  // Clear vector of valid alphas, it'll be re-generated from scratch each time
  validAlphas.clear();

  return SUCCESS;
}

// Doxygen block goes here, omitted, specific code also omitted as it's
// irrelevant
void mexFunction(int nlhs,
                 mxArray *plhs[],
                 int nrhs,
                 const mxArray *prhs[]
                 )
{
  // Register exit function

  // Prep for writing out results

  // Checking to make sure # of arguments was right from MATLAB

  // Input argument handling to convert from mxArrays to double*, float*, etc

  // *errcode = mainFunc(some args)

  // retrieve execution time in clock cycles, convert to seconds, print

  // Put the outputs in plhs
}
```

Callgraph(?):

This isn’t exactly a callgraph, but it might be useful to get an idea of the flow of the program.

The Problem: Performance

For medium-size models (104k tris, 204k faces, 51k elems) it can take up to a couple seconds for this to complete, even though the worst of it is multi-threaded on a powerful 4C/8T machine. (roughly 100*104k size loop)

For any models where the number of alphas is very large (50K) it can take up to three minutes for a single execution to complete because of how large that double-nested for loop must become. (50k^2 size loop)

Pushing the list of betas onto their own vector can help in cases like that, but seems to significantly hurt performance of more normal cases.
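One small saving inside the inner loop: `checkPairValid()` computes the full distance with `sqrt` and `pow` for every pair before comparing it to the threshold. The sketch below compares squared quantities instead, so the square root is only needed for the pairs that pass (to normalize the ray direction). `withinThreshold` is a hypothetical helper, not part of the posted code, and how much it helps would have to be measured.

```cpp
#include <cassert>

// Hypothetical helper: true when the two centers are closer than
// distThreshold. Uses plain multiplications, no sqrt() or pow().
bool withinThreshold(const double a[3], const double b[3], double distThreshold)
{
    assert(distThreshold >= 0.0); // squaring is only valid for non-negative limits
    const double dx = b[0] - a[0];
    const double dy = b[1] - a[1];
    const double dz = b[2] - a[2];
    return (dx * dx + dy * dy + dz * dz) < (distThreshold * distThreshold);
}
```

The comparison stays strict (`<`), matching the `pathDist < distThreshold` test in the posted code.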

Possible optimizations:

• Creating a sphere around all alphas to use in culling betas that are outside of the range of any alpha could potentially provide benefit, but it’s an O(alphas^2) operation, and its benefit depends heavily on the geometry.

• Creating a vector of Betas and pushing onto it as the alphas are also created seems to only benefit extreme edge cases like the 50k alpha case. In more “normal” cases of small numbers of alphas, it seems to hurt performance significantly.

• Adding to the list of valid alphas and culling it rather than re-building it each time may be an option, however, this will again be dependent on what % are alphas in the geometry.

• As well, it’s possible something can be done with nanoRT’s BVHs, but I’m not very familiar with BVHs or what they’d let me do in this case.
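Another option in the same spirit as the culling ideas above is a uniform grid (spatial hash) over the triangle centers. This is a different technique from the sphere or BVH approaches listed, and nothing in the post or in nanoRT provides it; all names below (`Center`, `CellKey`, `buildGrid`, `nearby`) are hypothetical. With a cell size of at least `distThreshold`, every beta within range of an alpha must lie in one of the 27 cells around it, so the inner loop no longer has to visit all triangles. The sketch is C++03-compatible and leaves the ray test out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

struct Center { double x, y, z; }; // stand-in for a triangle center

struct CellKey
{
    long ix, iy, iz;
    bool operator<(const CellKey& o) const
    {
        if (ix != o.ix) return ix < o.ix;
        if (iy != o.iy) return iy < o.iy;
        return iz < o.iz;
    }
};

typedef std::map<CellKey, std::vector<unsigned int> > Grid;

static CellKey cellOf(const Center& c, double cellSize)
{
    CellKey k;
    k.ix = static_cast<long>(std::floor(c.x / cellSize));
    k.iy = static_cast<long>(std::floor(c.y / cellSize));
    k.iz = static_cast<long>(std::floor(c.z / cellSize));
    return k;
}

// Bins every candidate (potential beta) by the cell its center falls into.
Grid buildGrid(const std::vector<Center>& centers,
               const std::vector<unsigned int>& candidates,
               double cellSize)
{
    Grid grid;
    for (std::size_t n = 0; n < candidates.size(); ++n)
    {
        grid[cellOf(centers[candidates[n]], cellSize)].push_back(candidates[n]);
    }
    return grid;
}

// Returns the candidates closer than distThreshold to centers[alpha],
// visiting only the 27 cells surrounding the alpha's own cell.
std::vector<unsigned int> nearby(const Grid& grid,
                                 const std::vector<Center>& centers,
                                 unsigned int alpha,
                                 double distThreshold,
                                 double cellSize)
{
    assert(cellSize >= distThreshold); // otherwise 27 cells are not enough
    std::vector<unsigned int> out;
    const Center& a = centers[alpha];
    const CellKey home = cellOf(a, cellSize);
    const double limit2 = distThreshold * distThreshold;
    for (long dx = -1; dx <= 1; ++dx)
    for (long dy = -1; dy <= 1; ++dy)
    for (long dz = -1; dz <= 1; ++dz)
    {
        CellKey k;
        k.ix = home.ix + dx; k.iy = home.iy + dy; k.iz = home.iz + dz;
        Grid::const_iterator it = grid.find(k);
        if (it == grid.end()) continue;
        for (std::size_t n = 0; n < it->second.size(); ++n)
        {
            const Center& b = centers[it->second[n]];
            const double d2 = (b.x - a.x) * (b.x - a.x)
                            + (b.y - a.y) * (b.y - a.y)
                            + (b.z - a.z) * (b.z - a.z);
            if (d2 < limit2) out.push_back(it->second[n]);
        }
    }
    return out;
}
```

The grid would be rebuilt whenever the geometry changes, which costs roughly one map insertion per candidate triangle. Whether that pays off against the current double loop depends on how small `distThreshold` is relative to the model, so it would need measuring on real data.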

Note: How it’s being used:

The MATLAB script will likely call this many times. In small models, it may finish its own loop within tenths of a second and then call ours again. In larger ones, there may be half a second between calls. In total, this may be called hundreds of times.

Note: How it’s being built:

This isn’t built using the `MEX` command in MATLAB, nor by using Visual Studio. Instead, g++ is used to create an object file (.o) and then g++ is used again to create the .mexa64 file in a method I’m not entirely familiar with. (This is also a hard limit I cannot touch.)

I occasionally compiled with very aggressive warnings enabled to catch things like sign conversion, promotions, bad casts, etc.

Profiling:

I would love to be able to profile this code in more depth. However, it seems impossible. MEX files built using the `MEX` command in MATLAB can be profiled. MEX files compiled in Visual Studio can be profiled. But we’re not doing either of those, so when I try to profile with either MATLAB or Visual Studio, it just doesn’t work.

Even if I could, I don’t think it would reveal anything surprising. The numbers we’re working with are large, so the double-nested loop at the core of it grows very large.

I can (and do) measure per-invocation performance and total runtime after the MATLAB script completes. This is mostly stable, ~1% std dev in runtimes.

Final Note:

While performance is my main concern, style improvements are always welcome. I’m more familiar with C than C++, and that bleeds into my code sometimes.

## Finding paths between triangles efficiently in 3D geometry

I’ve been writing some functions used to find paths between two types of triangles – alphas and betas. Alphas are triangles that have been in a zone we consider important, have an “interesting” value above a given threshold, and are “active”. Betas are essentially anything that isn’t an Alpha.

The position of the zone and the geometry of the model can change between invocations.

This is written in C++03, compiled into a MEX file (.mexw64) to be executed by MATLAB R2016B on a Linux machine. Those are all hard limits.

This code uses a good many functions and data from external libraries and objects. However, most of the methods used are very simple array lookups, nothing performance-hindering.

Everything works correctly so far in testing, but performance has become a significant problem.

The code:

``// Doxygen block exists here  // Various includes go here  // Only needed because ultimately MATLAB needs an error code, not a C++ // exception #define SUCCESS 0 #define DYN_ALLOC_ERR 1 #define PTHREAD_ERR 2  /*  * Design notes: I considered having a modified version of the geometry checking  * section that just pushed _new_ alphas onto the global vector of alphas  * and set a flag to cull old ones, but that actually seemed significantly less  * efficient than just making a new one each time since all the same checks need  * to be made regardless however, if validAlphas gets very large, this could  * be highly inefficient due to push_back, so the idea of pushing on  * just new ones and culling may actually be the best solution.  *  * Also: Could maintain a sum of validAlphas and use that to re-size it upon  * re-creation to save some overhead from resizing, maybe?  */  // The indices of two triangles who have a valid alpha and a beta that can // be seen from it. 120k paths * 8 bytes per pair = ~1GB on the heap. // No, ushort won't be enough here. 
struct ABPair {     unsigned int alphaTriIndex;     unsigned int betaTriIndex; };  // Useful for multithreading, stolen and modified from craytracer.h struct ThreadData {   CRayTracer* rt;   pthread_t threadId;                     // ID returned by pthread_create   unsigned uThreadID;                     // Index   std::vector<ABPair*> validPathsThread;  // valid pairs that this thread                                           // found   unsigned int numTris;                   // Input argument, the number of                                           // triangles in the mesh   double distThreshold;                   // Input argument, the maximum                                           // distance between triangles };  // Exceptions for experimentation class PThreadException: public std::exception {   virtual const char* what() const throw()   {     return "Exception occured in a pthread_attr_init or pthread_create\n";   } };  class DynAllocationException: public std::exception {   virtual const char* what() const throw()   {     return "Exception occured when attempting to malloc or calloc\n";   } };  // Note: Globals must exist here so that when the MEX file exits and goes // Back to MATLAB, this information is maintained. // (AFAIK, mexMakeMemoryPersistant() wouldn't make sense here. I can link // to discussions on that topic)  // An indicator for every tri to tell if it has been removed (neccesary for // maintaining previous state to check if we need to call findPaths()) static bool* triActive = NULL;  // A map from a given face to the element it resides in static unsigned int* faceToElementMap = NULL;  // All valid paths. Must be maintained, because we don't know if // findPaths() will be called. It may not be if geometry hasnt changed. static std::vector<ABPair*> validPaths;  // The previous state of what alphas were considered valid. Neccesary to see // if a change has occured that isn't geometry-based. 
static unsigned int* validAlphaIndex;  // Not neccesary as a global if it's just getting re-generated each time. // However, if we do decide to append and cull instead of regenerating, this // needs to stay. static std::vector<unsigned int> validAlphas;  // Needed so we can accurately determine alphas. // I removed this in the past, thinking it wasn't needed. As it turns out, // it's absolutely needed and very helpful. static bool* hasBeenInZone;  // Useful everywhere CRayTracerClass* rayTracer = NULL; NanoRTWrapperClass nanoRTWrapper = NULL;  // Function declarations // Not required, but prevents warnings depending on how functions are ordered // and call each other // (Including the mexFunction here would be redundant, as it exists in mex.h) void exitFcn(); bool isTriInZoneRadius(const unsigned int itri); bool checkForModelChanges(const unsigned int numTris,                           const float* iValues,                           const double iThreshold                           ); void initialize(const float* elemFace,                 const unsigned int numElems,                 const unsigned int facePerElMax,                 unsigned int* numTri,                 unsigned int* numFace                 ); void* findPathsThread(void *data); void findPathsThreadSpooler(const unsigned int numTris,                             const double distThreshold                             ); void mapFacesToElements(const float* elemFace,                         const unsigned int numElems,                         const unsigned int facePerElMax                         ); bool checkPairValid(const unsigned int i,                     const unsigned int j,                     const double distThreshold                     ); bool isTriAlpha(const unsigned int itri,                 const float* iValues,                 const double iThreshold                 ); void findPaths(const unsigned int numTris,                const double distThreshold                ); //mainfunc 
declaration goes here  /**  * @brief exitFcn - Cleans up malloc'd or calloc'd memory if someone in the  * MATLAB script calls "clear mexfilename" or "clear all".  */ void exitFcn() {   //mexPrintf("exitFcn() called\n");    if(triActive)   {     free(triActive);   }   if(faceToElementMap)   {     free(faceToElementMap);   }   if(validAlphaIndex)   {     free(validAlphaIndex);   }   if(hasBeenInZone)   {     free(hasBeenInZone);   }   for(unsigned int i = 0; i < validPaths.size(); i++)   {     free(validPaths[i]);   } }  /**  * @brief Checks if a given tri is currently in the zone's external radius.  * Implementation stolen from CRayTracerClass::trace_inverseray_tri  * Not sure if we need to raytrace, so I omitted it  * @param itri - The index of the triangle to check  * @return True if in the radius, false if not  */ bool isTriInZoneRadius(const unsigned int itri) {   //Omitted }  /**  * @brief Checks if the model has changed (either in terms of alphas or  * geometry) and re-generates the vector of alphas  * @param numTris -     The number of triangles in the finite mesh  * @param iValues -     The ivalue at each node  * @param iThreshold -  The interesting value threshold beyond which an alpha  * is interesting enough to be valid  * @return True if the list of alphas or the geometry has changed, false if  * neither have  */ bool checkForModelChanges(const unsigned int numTris,                           const float* iValues,                           const double iThreshold                           ) {   bool modelChanged = false;   bool isAlpha;   bool currentlyActive;    // Two checks need to happen - geometry changes and for the list of valid   // alphas to change   // Also regenerates the vector of valid alphas from scratch as it goes    for(unsigned int i = 0; i < numTris; i++)   {     // Active means it has 1 exposed face, not 2 (internal) or 0 (gone)     currentlyActive = nanoRTWrapper->getTriActive(i);      // Has the geometry changed?     
if(currentlyActive != triActive[i])     {       modelChanged = true;       triActive[i] = currentlyActive;     }      // Get whether this triangle is an alpha:     isAlpha = isTriAlpha(i, iValues, iThreshold);      // Triangle is a valid alpha now, but wasn't before     if((isAlpha == true) && (validAlphaIndex[i] == false))     {       validAlphaIndex[i] = true;       modelChanged = true;     }     // Was valid before, is no longer valid now     else if((isAlpha == false) && (validAlphaIndex[i] == true))     {       validAlphaIndex[i] = false;       modelChanged = true;       //cullalphasFlag = true;     }      // Generating the set of all valid alphas     if(isAlpha)     {       validAlphas.push_back(i);     }   }    return modelChanged; }  /**  * @brief Initializes this MEX file for its first run  * @param rt -            A pointer to the raytracer object  * @param numTris -       The total number of triangles in the finite mesh  * @param numFaces -      The total number of faces in the finite mesh  * @param elemFace -      The map of elements to the faces that they have  * @param numElems -      The number of elements in the finite mesh  * @param facePerElMax -  The maximum number of faces per element  */ void initialize(const float* elemFace,                 const unsigned int numElems,                 const unsigned int facePerElMax,                 unsigned int* numTri,                 unsigned int* numFace                 ) {   DynAllocationException e;    // Fetch number of tris and faces   // Must be done every time, since we're storing locally and not globally   // However:   // They're never modified   // They never change between calls from the MATLAB script   // They're used frequently in many functions   // I think that's a strong candidate for being a global    unsigned int numTris = nanoRTWrapper->getTriCount();   *numTri = numTris;    unsigned int numFaces = nanoRTWrapper->getFaceCount();   *numFace = numFaces;    /*    * Allocate some space for 
things we need to be persistent between runs of    * this MEX file. And check that the allocation succeeded, of course.    */   if(triActive == NULL)   {     if(NULL ==        (triActive =         static_cast<bool*>(calloc(numTris, sizeof(bool))))        )     {       throw e;     }   }   if(hasBeenInZone == NULL)   {     if(NULL ==        (hasBeenInZone =         static_cast<bool*>(calloc(numTris, sizeof(bool))))        )     {       throw e;     }   }   if(validAlphaIndex == NULL)   {     if(NULL ==        (validAlphaIndex =         static_cast<unsigned int*>(calloc(numTris, sizeof(unsigned int))))        )     {       throw e;     }   }   if(faceToElementMap == NULL)   {     if(NULL ==        (faceToElementMap =         static_cast<unsigned int*>(calloc(numFaces, sizeof(unsigned int))))        )     {       throw e;     }     mapFacesToElements(elemFace, numElems, facePerElMax);   }    return; }  /**  * @brief Is something that can be used by pthread_create(). Threads will skip  * over some of the work, and do isValidPair on others. Thus...multithreading.  
* @param data - The data structure that will hold the results and arguments  */ void* findPathsThread(void *data) {   struct ThreadData* thisThreadsData = static_cast<struct ThreadData*>(data);   const unsigned uThreadID = thisThreadsData->uThreadID;   const unsigned uNumThreads = rayTracer->m_uNumThreads;   const double distThreshold = thisThreadsData->distThreshold;   const unsigned int numTris = thisThreadsData->numTris;    std::vector<ABPair*>& validPathsThread = thisThreadsData->validPathsThread;    // Loop over all valid alphas   for(unsigned int i = 0; i < validAlphas.size(); i++)   {     if ((i % uNumThreads) == uThreadID)     {       // Loop over all triangles (potential betas)       for(unsigned int j = 0; j < numTris; j++)       {         if(checkPairValid(i, j, distThreshold))         {           ABPair* temp =               static_cast<ABPair*>(malloc(sizeof(ABPair)));            temp->alphaTriIndex = validAlphas[i];           temp->betaTriIndex = j;            validPathsThread.push_back(temp);         }       }     }   }   return NULL; }  /**  * @brief Creates as many threads as the system has available, and then uses  * pthread_create() to dish out the work of findPaths()  * @param numTris - The number of triangles in the finite mesh  * @param distThreshold - The maximum distance an alpha and beta can be  * apart  */ void findPathsThreadSpooler(const unsigned int numTris,                             const double distThreshold                             ) {   std::vector<ThreadData> threadData(rayTracer->m_nProc);   pthread_attr_t attr;   int rc;   PThreadException e;    // I think this is checking to make sure something doesn't already exist,   // not sure what though   if((rc = pthread_attr_init(&attr)))   {     throw e;   }    // We know how many threads the system supports   // So all this does is walk through an array of them and start them up   for(unsigned uThread = 0; uThread < rayTracer->m_uNumThreads; uThread++)   {     ThreadData& data = 
threadData[uThread];      data.rt = rayTracer;     data.uThreadID = uThread;     data.numTris = numTris;     data.distThreshold = distThreshold;      if(rayTracer->m_uNumThreads > 1)     {       if((rc = pthread_create(&data.threadId, &attr, &findPathsThread, &data)))       {         throw e;       }     }     else     {       findPathsThread(&data);     }   }    // Join all threads   for(unsigned uThread = 0; uThread < rayTracer->m_uNumThreads; uThread++)   {     std::vector<ABPair*>& validPathsThread =         threadData[uThread].validPathsThread;      if(rayTracer->m_uNumThreads > 1)     {       void* res;        if((rc = pthread_join(threadData[uThread].threadId, &res)))       {         throw e;       }     }      // validPathsThread is the set of ABPairs that this thread found     // while validPaths is the globally maintained set of valid paths     // Take each thread's results and merge it into the overall results     validPaths.insert(validPaths.end(),                       validPathsThread.begin(),                       validPathsThread.end());   }    return; }   /* void cullAlphas() {   for(unsigned int i = 0; i < validAlphas.size(); i++)   {     if(!isValidAlpha(validAlphas[i]))     {       validAlphas.erase(i);     }   } } */  /**  * @brief Determines the elements that each face belongs to  * @details the MATLAB script maintains a map of all faces per element.  * This is the opposite of what we want. Accessing it linearly  * walks by column, not by row. Also, MATLAB stores everything 1-indexed.  * Finally, the MATLAB script left them stored as the default, which are singles.  
* @param elemFace - A MATLAB facePerElMax by numElems array, storing which  * faces belong to each element (elements being the row number)  * @param numElems - The total number of elements (rows) in the array  * @param facePerElMax - The max number of faces per element (the number of  * columns)  */ void mapFacesToElements(const float* elemFace,                         const unsigned int numElems,                         const unsigned int facePerElMax                         ) {   unsigned int i;    // elemFace[0] = 1. We don't know how elemFace will be structured precisely,   // so we need to keep going until we find a face in it that equals our number   // of faces, since it's 1-indexed.   for(i = 0; i < (numElems * facePerElMax); i++)   {     faceToElementMap[static_cast<unsigned int>(elemFace[i]) - 1] =         (i % numElems);      // Is the next face for that element a NaN? If so, we can skip it. Keep     // skipping until the next element WON'T be NaN.     // Don't cast here, as NaN only exists for floating point numbers,     // not integers.     
while(isnan(elemFace[i + 1]) && ((i + 1) < (numElems * facePerElMax)))     {       i++;     }   } }  /**  * @brief checkPairValid - Checks if a pair of an alpha index (of validAlphas),  * beta index form a valid path  * @param i -             Index into validAlphas  * @param j -             Index into all tris (potential beta)  * @param distThreshold - The max distance the tri's centers can be apart  * @return Whether the pair forms a valid path  */ bool checkPairValid(const unsigned int i,                     const unsigned int j,                     const double distThreshold                     ) {   double path_dist_sqrd;   double path_dist;   double alphaCoords[3];   double betaCoords[3];   nanort::Ray<double> ray;    // If they're not an alpha currently, they must be a potential beta,   // must also be alive   if(!validAlphaIndex[j] && triActive[j])   {     alphaCoords[0] =         rayTracer->m_vecTriFixedInfo[validAlphas[i]].center.x();     alphaCoords[1] =         rayTracer->m_vecTriFixedInfo[validAlphas[i]].center.y();     alphaCoords[2] =         rayTracer->m_vecTriFixedInfo[validAlphas[i]].center.z();      betaCoords[0] = rayTracer->m_vecTriFixedInfo[j].center.x();     betaCoords[1] = rayTracer->m_vecTriFixedInfo[j].center.y();     betaCoords[2] = rayTracer->m_vecTriFixedInfo[j].center.z();      // Determine distance squared between alpha and beta     // (x2-x1)^2 + (y2-y1)^2 +(z2-z1)^2     path_dist_sqrd = pow((betaCoords[0] - alphaCoords[0]), 2)                    + pow((betaCoords[1] - alphaCoords[1]), 2)                    + pow((betaCoords[2] - alphaCoords[2]), 2);      // Doing this instead of doing the sqrt to save doing the sqrt when not     // needed for performance     if(path_dist_sqrd <= pow(distThreshold, 2))     {       path_dist = sqrt(path_dist_sqrd);        // Set up a nanort::Ray's origin, direction, and max distance       ray.org[0] = alphaCoords[0]; // x       ray.org[1] = alphaCoords[1]; // y       ray.org[2] = alphaCoords[2]; // z   
     ray.dir[0] = (betaCoords[0] - alphaCoords[0]) / path_dist;       ray.dir[1] = (betaCoords[1] - alphaCoords[1]) / path_dist;       ray.dir[2] = (betaCoords[2] - alphaCoords[2]) / path_dist;        // TODO: Subtract some EPSILON here so it doesn't report a hit because it       // hit the beta itself (assuming that's how it works)       ray.max_t = path_dist;        // Call ShootRay() to check if there is a path (calls nanoRT)       if(!(nanoRTWrapper->shootRay(ray)))       {         return true;       }       else       {         // There's no path         return false;       }     }     else     {       // The distance is too far between alpha and beta       return false;     }   }   else   {     // The beta is either dead or currently an alpha     return false;   } }  /**  * @brief Determines if a given triangle is a valid alpha.  * @param itri - The triangle index to check  * @return True if it is an alpha, false if it is not  */ bool isTriAlpha(const unsigned int itri,                 const float* iValues,                 const double iThreshold                 ) {   double tri_avg_interesting;   const unsigned int* tri_nodes;    // Do the simple checks first, as it's more performant to do so   // alternate consideration (acccuracy, wouldn't affect performance)   //if(triActive[itri] && (hasBeenAlpha[itri] || isTriInZoneRadius(itri)))   if(triActive[itri] && (hasBeenInZone[itri] || isTriInZoneRadius(itri)))   {     // Retrieve the average iValue of this triangle     tri_nodes = nanoRTWrapper->getTriNodes(itri);      tri_avg_interesting = (iValues[tri_nodes[0]]                          + iValues[tri_nodes[1]]                          + iValues[tri_nodes[2]]) / 3;      if(tri_avg_interesting > iThreshold)     {       return true;     }   }    return false; }  /**  * @brief Uses the raytracer object's current state as well as arguments to  * generate pairs of unobstructed paths between alphas and betas.  
* @param numTris - The number of triangles in the finite element mesh  * @param distThreshold - The max distance an alpha and beta pair can be  * apart before not being considered in calculations  */ void findPaths(const unsigned int numTris,                const double distThreshold                ) {   // This function once held more importance, but yes, it could be omitted   // at this point.    // Spool up some threads to take care of the work   try   {     findPathsThreadSpooler(numTris, distThreshold);   }   catch(DynAllocationException& e)   {     throw e;   }    return; }  // Doxygen header (omitted) int mainFunc(args) {   // Initialize the program if we're on a first run   try   {     initialize(elemFace, numElems, facePerElMax, &numTris, &numFaces);   }   catch(DynAllocationException& e)   {     return DYN_ALLOC_ERR;   }    // Need to see if we need to call findPaths   if(checkForModelChanges(numTris, iValues, iThreshold))   {     // Remove old list of valid paths     for(unsigned int i = 0; i < validPaths.size(); i++)     {       free(validPaths[i]);     }     validPaths.clear();      try     {       findPaths(numTris,                 distThreshold);     }     catch(PThreadException& e)     {       return PTHREAD_ERR;     }   }    //mexPrintf("Number of valid paths: %d\n", validPaths.size());    /*    * Walk over all paths and do some calculations    */    return SUCCESS; }  // Doxygen block goes here, omitted void mexFunction(int nlhs,                  mxArray *plhs[],                  int nrhs,                  const mxArray *prhs[]                  ) {   // Register exit function   mexAtExit(exitFcn);    // Prep for writing out results    // Checking to make sure # of arguments was right from MATLAB    // Input argument handling to convert from mxArrays to double*, float*, etc    //*errcode = mainFunc(some args)    // retrieve execution time in clock cycles, convert to seconds, print    // Put the outputs in plhs } ``
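One thing the listing above does in its innermost loop is `malloc` a separate `ABPair` for every valid path, which also means one `free` per pair later. A possible alternative, sketched here under assumptions, is to store the pairs by value in the vector. `collectPairs`, `PairTest`, and `differByTwo` are hypothetical names; `PairTest` is only a stand-in for `checkPairValid`, and unlike the original it takes triangle indices directly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ABPair {
  unsigned int alphaTriIndex;
  unsigned int betaTriIndex;
};

// Hypothetical stand-in for checkPairValid(): takes
// (alpha triangle index, beta triangle index) and returns whether they pair.
typedef bool (*PairTest)(unsigned int, unsigned int);

// Appends valid pairs by value: no per-pair malloc, and no per-pair free
// later, since clearing the vector releases everything at once.
void collectPairs(const std::vector<unsigned int>& alphas,
                  unsigned int numTris,
                  PairTest isValid,
                  std::vector<ABPair>& out) {
  assert(isValid != NULL);
  for (std::size_t i = 0; i < alphas.size(); ++i) {
    for (unsigned int j = 0; j < numTris; ++j) {
      if (isValid(alphas[i], j)) {
        ABPair p;
        p.alphaTriIndex = alphas[i];
        p.betaTriIndex = j;
        out.push_back(p);
      }
    }
  }
}

// Toy validity rule used only to exercise the sketch.
bool differByTwo(unsigned int a, unsigned int b) {
  return (a > b ? a - b : b - a) == 2;
}
```

With this layout the global would become a `std::vector<ABPair>`, and the cleanup loops in `exitFcn()` and `mainFunc()` would reduce to a single `clear()`.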

Callgraph(?):

This isn’t exactly a callgraph, but it might be useful to get an idea of the flow of the program.

The Problem: Performance

For medium-size models (104k tris, 204k faces, 51k elems) it can take up to a couple seconds for this to complete, even though the worst of it is multi-threaded on a powerful 4C/8T machine. (roughly 100*104k size loop)

For any models where the number of alphas is very large (50K) it can take up to three minutes for a single execution to complete because of how large that double-nested for loop must become. (50k^2 size loop)

Possible optimizations:

An optimization that may be worth considering is creating a sphere around all alphas and using its center and radius, together with the threshold distance, to cull the remaining triangles down to a smaller set. However, the benefit of this is extremely variable and may actually be zero for smaller meshes. And to create the sphere, we need to find the two triangle centers that are the furthest apart, an O(alphas^2) operation, which is very slow if there are many. There’s also no easy way to tell in advance whether this technique would be beneficial for the given state of the mesh, so it’s not easy to toggle on and off as needed.
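For culling purposes the sphere probably doesn't need to be the tightest possible one. A conservative sphere, centered on the centroid of the alpha centers with a radius equal to the largest distance from that centroid, can be built in two linear passes. The sketch below assumes that approach; `Vec3`, `Sphere`, `boundingSphere`, and `mayReachAnyAlpha` are hypothetical names, not part of the existing code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };
struct Sphere { Vec3 center; double radius; };

static double distSq(const Vec3& a, const Vec3& b) {
  const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return dx * dx + dy * dy + dz * dz;
}

// Conservative (not minimal) bounding sphere: centroid of the points plus
// the largest distance from it. Two linear passes, so O(n) rather than O(n^2).
Sphere boundingSphere(const std::vector<Vec3>& pts) {
  assert(!pts.empty());
  Sphere s;
  s.center.x = s.center.y = s.center.z = 0.0;
  for (std::size_t i = 0; i < pts.size(); ++i) {
    s.center.x += pts[i].x;
    s.center.y += pts[i].y;
    s.center.z += pts[i].z;
  }
  const double n = static_cast<double>(pts.size());
  s.center.x /= n;
  s.center.y /= n;
  s.center.z /= n;

  double maxSq = 0.0;
  for (std::size_t i = 0; i < pts.size(); ++i) {
    const double d = distSq(s.center, pts[i]);
    if (d > maxSq) maxSq = d;
  }
  s.radius = std::sqrt(maxSq);
  return s;
}

// A triangle center farther than (radius + distThreshold) from the sphere
// center cannot be within distThreshold of any point inside the sphere.
bool mayReachAnyAlpha(const Sphere& s, const Vec3& p, double distThreshold) {
  const double reach = s.radius + distThreshold;
  return distSq(s.center, p) <= reach * reach;
}
```

Because the sphere is only an upper bound, it can reject fewer triangles than a minimal sphere would, but it never rejects a triangle that could form a valid pair.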

As well, it’s possible something can be done with nanoRT’s BVHs, but I’m not very familiar with BVHs or what they’d let me do in this case.
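Without assuming anything about nanoRT's BVH interface, a simpler spatial structure illustrates the same idea: bin the triangle centers into a uniform grid whose cell size equals the distance threshold, so that each alpha only has to look at the 27 cells around it instead of every triangle. This is a sketch of that swapped-in technique, not of a BVH; `Vec3`, `Cell`, and `CenterGrid` are hypothetical names, and the grid only replaces the distance filter (the ray test would still run on the survivors).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

struct Vec3 { double x, y, z; };

// Integer cell coordinates, ordered so they can key a std::map (C++03).
struct Cell {
  long x, y, z;
  bool operator<(const Cell& o) const {
    if (x != o.x) return x < o.x;
    if (y != o.y) return y < o.y;
    return z < o.z;
  }
};

class CenterGrid {
  typedef std::map<Cell, std::vector<unsigned int> > CellMap;

public:
  CenterGrid(const std::vector<Vec3>& centers, double cellSize)
      : centers_(centers), cellSize_(cellSize) {
    assert(cellSize > 0.0);
    for (std::size_t i = 0; i < centers_.size(); ++i) {
      cells_[cellOf(centers_[i])].push_back(static_cast<unsigned int>(i));
    }
  }

  // Indices of all centers within `radius` of `p`. The radius must not
  // exceed the cell size, otherwise the 27-cell neighborhood is too small.
  std::vector<unsigned int> within(const Vec3& p, double radius) const {
    assert(radius >= 0.0 && radius <= cellSize_);
    std::vector<unsigned int> found;
    const Cell c = cellOf(p);
    const double radiusSq = radius * radius;
    for (long dx = -1; dx <= 1; ++dx) {
      for (long dy = -1; dy <= 1; ++dy) {
        for (long dz = -1; dz <= 1; ++dz) {
          Cell n;
          n.x = c.x + dx;
          n.y = c.y + dy;
          n.z = c.z + dz;
          CellMap::const_iterator it = cells_.find(n);
          if (it == cells_.end()) continue;
          const std::vector<unsigned int>& bucket = it->second;
          for (std::size_t k = 0; k < bucket.size(); ++k) {
            const Vec3& q = centers_[bucket[k]];
            const double ex = q.x - p.x, ey = q.y - p.y, ez = q.z - p.z;
            if (ex * ex + ey * ey + ez * ez <= radiusSq) {
              found.push_back(bucket[k]);
            }
          }
        }
      }
    }
    return found;
  }

private:
  Cell cellOf(const Vec3& p) const {
    Cell c;
    c.x = static_cast<long>(std::floor(p.x / cellSize_));
    c.y = static_cast<long>(std::floor(p.y / cellSize_));
    c.z = static_cast<long>(std::floor(p.z / cellSize_));
    return c;
  }

  std::vector<Vec3> centers_;  // copy of the triangle centers
  double cellSize_;
  CellMap cells_;
};
```

The grid would need rebuilding whenever the geometry changes, so it pays off mainly when the number of alphas is large relative to the cost of one pass over all triangles.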

Note: How it’s being used:

The MATLAB script will likely call this many times. In small models, it may finish its own loop within tenths of a second and then call ours again. In larger ones, there may be half a second between calls. In total, this may be called hundreds of times.

Note: How it’s being built:

This isn’t built using the `MEX` command in MATLAB, nor by using Visual Studio. Instead, g++ is used to create an object file (.o) and then g++ is used again to create the .mexw64 file in a method I’m not entirely familiar with. (This is also a hard limit I cannot touch)

I occasionally compiled with very aggressive warnings enabled to catch things like sign conversion, promotions, bad casts, etc.

Profiling:

I would love to be able to profile this code. However, it seems impossible. MEX files built using the `MEX` command in MATLAB can be profiled. MEX files compiled in Visual Studio can be profiled. But we’re not doing either of those, and so when I try to profile with either MATLAB or Visual Studio, it just doesn’t work.

Even if I could, I don’t think it would reveal anything surprising. The numbers we’re working with are large, so the double-nested loop at the core of it grows very large.

If need be, I could stick a ton of clock() calls around the start and end of every method to get some more info, though that wouldn’t be as precise as line-by-line profiling.
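Rather than hand-placing paired clock() calls, the same measurement could be wrapped in a small RAII helper so that each function needs only one extra line. This is a sketch using only the standard library; `TimingEntry`, `TimingTable`, `ScopedTimer`, and `secondsFor` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <map>
#include <string>

// Accumulated clock ticks and call count for one label.
struct TimingEntry {
  std::clock_t ticks;
  unsigned long calls;
  TimingEntry() : ticks(0), calls(0) {}
};

typedef std::map<std::string, TimingEntry> TimingTable;

// Starts timing on construction and records the elapsed ticks under its
// label when it goes out of scope, including on early returns.
class ScopedTimer {
public:
  ScopedTimer(TimingTable& table, const char* label)
      : table_(table), label_(label), start_(std::clock()) {
    assert(label_ != NULL);  // labels are expected to be string literals
  }
  ~ScopedTimer() {
    const std::clock_t elapsed = std::clock() - start_;
    TimingEntry& e = table_[label_];
    e.ticks += elapsed;
    e.calls += 1;
  }

private:
  ScopedTimer(const ScopedTimer&);             // non-copyable (C++03 style)
  ScopedTimer& operator=(const ScopedTimer&);

  TimingTable& table_;
  const char* label_;
  std::clock_t start_;
};

// Total seconds recorded under a label, or 0.0 if it was never timed.
double secondsFor(const TimingTable& table, const std::string& label) {
  TimingTable::const_iterator it = table.find(label);
  if (it == table.end()) return 0.0;
  return static_cast<double>(it->second.ticks) / CLOCKS_PER_SEC;
}
```

Usage would be a line such as `ScopedTimer t(timings, "checkPairValid");` at the top of a function, with `timings` being a table owned by the caller. Two caveats: clock() reports processor time for the whole process, so inside the multithreaded section it can exceed wall-clock time, and the table itself is not thread-safe, so each thread would need its own.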

Final Notes:

I’m fresh out of college and this is my first real production code. I’m more familiar with C than C++, so I’m sure that bled into the code. Suggestions for style are always welcome, though performance is the most major concern here.