I’m working on a class that receives a relational EDM/data model (e.g. a SQL DB schema or an OData API manifest) and performs a couple of tasks on it (not necessarily at the same time; they could be two separate runs). The class/algorithm knows nothing when it starts and discovers everything from the schema data, but the schema doc is static, so it can be re-processed. What I call “sets” are tables, REST API endpoints, etc., i.e. publicly/user-accessible entry points to the data. One of the things I output is a “path” to each type (e.g. A is a set, so just “A”; B is not a set but is a child of A, so “A→B”). Another is whether anything changed between two models (true/false equality of the model pair).
Unfortunately, the input isn’t necessarily “correct”: some related child types are not available on their own (only through their parents), some are missing relations (no ID fields), so I infer the relation from the existence of the property, and some have relation loops (A → B → C → A). Some have child collections (1:N) with no mapping type, so I infer that from the relation property and invent a mapping type.
Anywho, my current solution uses depth-first recursion and keeps track of the types it has already seen, to prevent stack overflow and false negatives. We’ve found some wacky corner cases in addition to the above, such as a set existing for a type whose definition isn’t in the schema. What I have written works for what we’ve tested/seen so far, but I’m curious whether there are other ways to do this sort of traversal or processing, hopefully ones that could be implemented more simply than what I have now, which gets uglier and seems more fragile as more weird cases come up.
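To make the path output concrete, here is a minimal sketch of the depth-first idea with seen-tracking. It is not the real code: `TypePaths`, `Build`, and the `children` dictionary (type name to related child type names) are made-up stand-ins for whatever the schema parser produces, and `TryAdd` assumes .NET Core 2.0 or later.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TypePaths
{
    // sets: names of entry-point types (hypothetical input shape).
    // children: type name -> names of related child types (hypothetical input shape).
    public static Dictionary<string, string> Build(
        IEnumerable<string> sets,
        IReadOnlyDictionary<string, List<string>> children)
    {
        var paths = new Dictionary<string, string>();
        var roots = sets.ToList();

        // Register every set first so a set always keeps its own name as its path,
        // even when it is also reachable as a child of another set.
        foreach (var set in roots) paths[set] = set;
        foreach (var set in roots) Visit(set);
        return paths;

        void Visit(string parent)
        {
            // A type can be referenced without being defined; treat it as a leaf.
            if (!children.TryGetValue(parent, out var kids)) return;

            foreach (var kid in kids)
            {
                // TryAdd returns false for types that already have a path,
                // which is what stops loops such as A → B → C → A.
                if (paths.TryAdd(kid, paths[parent] + "→" + kid))
                    Visit(kid);
            }
        }
    }
}
```

Because this is depth-first, a type reachable through several parents gets whichever path is found first, not necessarily the shortest one.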
Generally, what I do now, single-threaded, is: get the list of sets, find outliers that don’t exist in the opposing model, then, for each set common to the two models, get the type of that set and call CompareType(t1, t2). This compares the properties on those types between the two models and then descends into each related type. If a child has been seen before or is itself a set, I assume it’s OK (skip/return true) because it gets checked on its own.
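Roughly, the steps above look like the sketch below. It is a simplified illustration under assumptions, not the production code: `Model`, `TypeDef`, `ModelComparer`, and `SameEntries` are invented names, types are matched by name across the two models, properties are reduced to name/type-name pairs, and the syntax assumes C# 9 or later.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class TypeDef
{
    public Dictionary<string, string> Properties { get; } = new(); // property name -> type name
    public Dictionary<string, string> Relations { get; } = new();  // relation property -> related type name
}

public sealed class Model
{
    public Dictionary<string, string> Sets { get; } = new();   // set name -> name of its type
    public Dictionary<string, TypeDef> Types { get; } = new(); // type name -> definition
}

public sealed class ModelComparer
{
    private readonly Model _m1, _m2;
    private readonly HashSet<string> _seen = new();

    public ModelComparer(Model m1, Model m2) { _m1 = m1; _m2 = m2; }

    public bool AreEqual()
    {
        // Outliers: a set that exists in only one model makes the pair unequal.
        if (!_m1.Sets.Keys.ToHashSet().SetEquals(_m2.Sets.Keys)) return false;

        foreach (var (setName, typeName) in _m1.Sets)
        {
            if (_m2.Sets[setName] != typeName || !CompareType(typeName)) return false;
        }
        return true;
    }

    private bool CompareType(string name)
    {
        // Seen before (or still in progress): skip, which also breaks relation loops.
        if (!_seen.Add(name)) return true;

        _m1.Types.TryGetValue(name, out var t1);
        _m2.Types.TryGetValue(name, out var t2);
        // Definition missing from a schema: equal only if it is missing from both.
        if (t1 is null || t2 is null) return t1 is null && t2 is null;

        if (!SameEntries(t1.Properties, t2.Properties) ||
            !SameEntries(t1.Relations, t2.Relations)) return false;

        foreach (var child in t1.Relations.Values)
        {
            // A child that is itself the type of a set gets checked from AreEqual.
            if (_m1.Sets.ContainsValue(child)) continue;
            if (!CompareType(child)) return false;
        }
        return true;
    }

    private static bool SameEntries(Dictionary<string, string> a, Dictionary<string, string> b) =>
        a.Count == b.Count && a.All(kv => b.TryGetValue(kv.Key, out var v) && v == kv.Value);
}
```

Each comparison needs a fresh `ModelComparer`, since the seen-set is per run. The inference steps (missing relations, invented mapping types) are left out here; in the real code they are where most of the ugliness lives.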
This is in C#, though I’m not sure that matters much for the general design of the algorithm. I’m curious whether there is a better way to do it, e.g. some sort of token walking like I’ve seen in JSON parsers, an FSM-like solution, or something along those lines.