Best practice for modeling data that is both general (default) and entity-specific

I have tried searching for good guidance on this already, but without much luck. Still, apologies in advance if this is duplicated elsewhere.

The Problem

In a nutshell, we have external contractors that work on cases for our clients. We already have tables with contractor and client information in our SQL Server database. Going forward we’d like to store billing info in there too. Billing rates can differ for each client and contractor, but usually each client has a general “default” pay rate that applies to most contractors.

Option A

The initial proposal was to create a new table with the following basic design:

clientContractorPay

  • clientID – foreign key to client table
  • contractorID – foreign key to contractor table
  • basePay – pay rate for this client-contractor combination
  • ... – several more (10+ and likely to grow) columns with supplemental pay rate details
  • A unique index to help optimize lookup and also prevent multiple rows for a given client-contractor combination.

Contractor-specific pay rates would naturally be linked to the relevant contractor (and client). General (default) pay for a client would be stored in a row where contractorID is NULL. This is to avoid having to duplicate the same default pay for all contractors that don’t have specific exceptions.
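To make the Option A lookup concrete, here is a minimal sketch of the "specific row, else default row" query. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the index syntax is an assumption rather than what we would deploy. The foreign keys and the supplemental pay columns are omitted, and the index names and the base_pay helper are hypothetical. SQLite treats NULLs as distinct in a unique index, which is why the sketch uses two partial indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clientContractorPay (
    clientID     INTEGER NOT NULL,
    contractorID INTEGER,          -- NULL marks the client's default row
    basePay      REAL NOT NULL
);
-- One row per client-contractor pair.
CREATE UNIQUE INDEX ux_pay_pair
    ON clientContractorPay (clientID, contractorID)
    WHERE contractorID IS NOT NULL;
-- At most one default row per client.
CREATE UNIQUE INDEX ux_pay_default
    ON clientContractorPay (clientID)
    WHERE contractorID IS NULL;
""")
conn.executemany(
    "INSERT INTO clientContractorPay VALUES (?, ?, ?)",
    [(1, None, 50.0), (1, 7, 65.0)],  # client 1: default rate plus one exception
)

def base_pay(client_id, contractor_id):
    """Return the contractor-specific rate, falling back to the client default."""
    row = conn.execute(
        """
        SELECT basePay FROM clientContractorPay
        WHERE clientID = ?
          AND (contractorID = ? OR contractorID IS NULL)
        ORDER BY contractorID IS NULL   -- specific row (0) sorts before default (1)
        LIMIT 1
        """,
        (client_id, contractor_id),
    ).fetchone()
    return row[0] if row else None
```

With this shape, every read goes through one query against one table, and adding a new pay column only touches clientContractorPay.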

Option B

However, one of our senior devs has strong reservations about Option A. Their main argument is that using NULL in the contractorID column to mean “this is the default pay rate row” is unintuitive and/or confusing. In other words, it’s bad to assign meaning to NULL values.

Their counter proposal was to duplicate these new pay rate columns in the client table. The data stored there would indicate the default pay for each client, while contractor-specific exceptions would still live in the new table above.

What To Do?

It seems clear both proposals would work just fine, but I have my own reservations about the second. Mainly it seems wrong to store the same type of data (client-contractor pay rate details) in multiple places, not to mention more complex logic to read/write this data. I also don’t like duplicating these new columns in both tables, since it would force us to add any future pay rate columns to both tables.

However, I can see my colleague’s point about potentially misusing NULL in this case. At the very least, it’s not immediately obvious that rows with a NULL contractorID contain default pay rates.

It’s been far too long since my database programming courses, so I’m not sure what the current best practice is for this type of entity relationship. I’m open to whatever is best long term, and would appreciate any expert guidance, especially with links to additional resources.

Thank you in advance!

Modeling a three-way association with optional relation


Business rules

I have three tables (Parties, Categories and Products) which represent the following relationships:

  • A product is classified by zero, one, or many categories
  • A category classifies zero, one, or many products

Then, I have the party relationships:

  • A product is classified by exactly one party
  • A party classifies one or many products

In other words, a product doesn’t have to be assigned a category.

Design proposal

I have based my design on the proposal found here, but it’s not entirely applicable since I want to enforce party_id for both Products and Categories:

How to model a three-way association that involves Product, Category and Label?

Three-way association design proposal

Question

Is the usage of the three-way association table correct in my proposal to avoid the risk of having the application layer assigning a product to a category without enforcing the party_id?
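For reference, one way to have the database itself reject a cross-party assignment is to carry party_id in the association table and use composite foreign keys. The sketch below is only an illustration under assumptions: it uses Python's sqlite3 module, and the table name ProductCategories, the column layout, and the assign helper are hypothetical and not taken from my diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Parties (party_id INTEGER PRIMARY KEY);
CREATE TABLE Products (
    product_id INTEGER PRIMARY KEY,
    party_id   INTEGER NOT NULL REFERENCES Parties,
    UNIQUE (party_id, product_id)       -- target for the composite FK below
);
CREATE TABLE Categories (
    category_id INTEGER PRIMARY KEY,
    party_id    INTEGER NOT NULL REFERENCES Parties,
    UNIQUE (party_id, category_id)      -- target for the composite FK below
);
CREATE TABLE ProductCategories (
    party_id    INTEGER NOT NULL,
    product_id  INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    PRIMARY KEY (product_id, category_id),
    -- The shared party_id column forces product and category to agree on party.
    FOREIGN KEY (party_id, product_id)  REFERENCES Products (party_id, product_id),
    FOREIGN KEY (party_id, category_id) REFERENCES Categories (party_id, category_id)
);
INSERT INTO Parties VALUES (1), (2);
INSERT INTO Products VALUES (10, 1);
INSERT INTO Categories VALUES (100, 1), (200, 2);
""")

def assign(party_id, product_id, category_id):
    """Try to link a product to a category; return True if the database accepts it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO ProductCategories VALUES (?, ?, ?)",
                (party_id, product_id, category_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A product with no category simply has no row in ProductCategories, so the optional relation is preserved.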

Threat modeling for visitor access control

I am trying to understand threat modeling, but it seems too elastic, ranging from restrictive requirements to general requirements.

Now I am trying to understand it with some realistic examples. The first example that comes to mind is physical access control for an office premises in which visitors have preapproved, restricted access and employees have unrestricted access. Each employee and legitimate visitor is given an ID card to enforce this. No official laptop should leave the office without permission. Each laptop has an RFID tag to prevent this.

Can somebody help me understand threat model in this example? Or can somebody point me to where similar analysis has been done?

Modeling cutaway prep scenes in Fate

Many crime stories feature a particular kind of foreshadowing and plot twist: Early in the story, you see a character doing something significant, but the scene cuts away before you find out what it is. You don’t find out what happened until the climax, when the character reveals the crazy plan that they set up after the cutaway. There’s often a brief flashback to the missing part of the establishing scene. It’s a staple of heist stories – Leverage does it almost every episode.

How can I model these cutaway prep scenes in Fate?

Cascading One-to-Many relationships modeling

Are there any drawbacks to, or better alternatives for, this non-relational model? It remains easy to understand, but my concern is with the code that interacts with it.

Since being introduced to the NoSQL world, I have run into one-to-many relationships between entities on many occasions. Today, I have a cascading example that might grow in the future.

Based on functional assumptions, I came up with a simple snowflake model: specifically, cascading one-to-many relationships with some descriptive data.

[User] 1 — * [Session] 1 — * [Execution] 1 — * [Report]

The data model seems easy to deal with at first, but I found that acting on the data using Mongoose (a Node.js library) can become complex and less performant, especially in a web application context (request and response cycle). The first approach is simply to have children refer to their parents, in a normalized fashion. Another way to implement this data model is the document embedding approach: https://docs.mongodb.com/manual/tutorial/model-embedded-one-to-many-relationships-between-documents/ which is easier to interact with because everything is modeled in one entity. However, this comes at the expense of performance, because whenever you load a user, you load all of its sessions, executions and reports with it.

I found a compromise between a normalized model and one using embedded documents, modeled here:

Normalize and embed

The compromise consists of embedding a minimal variant of the child entity (for example, Executions of type ExecutionsMini in Sessions) while keeping the full child entity Executions separate.

The concern grows because other entities might be added between Users and Loggings, in one-to-many relationships or not, and this could further complicate the solution (not the data model).
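As a rough illustration of that compromise, the sketch below uses plain Python dictionaries in place of Mongoose documents. The field names (status, log) and the to_mini helper are hypothetical and only show the shape of the data.

```python
def to_mini(execution):
    """Project a full execution document onto the fields embedded in its session."""
    return {"_id": execution["_id"], "status": execution["status"]}

# Full child documents live in their own collection and point at the parent.
executions = [
    {"_id": "e1", "session_id": "s1", "status": "done", "log": "long output"},
    {"_id": "e2", "session_id": "s1", "status": "failed", "log": "long output"},
]

# The parent embeds only the minimal variant (ExecutionsMini), not the heavy fields.
session = {
    "_id": "s1",
    "user_id": "u1",
    "executions": [to_mini(e) for e in executions if e["session_id"] == "s1"],
}
```

Loading a session then brings along just enough to list its executions, at the cost of keeping the embedded copy in sync whenever an execution changes.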

Modeling and managing attack surface around individual finance

I want to protect myself from fraud and identity theft.

While the Internet offers plenty of arbitrary collections of precautionary tips, I want to make rational, fully informed choices to manage my risk of falling victim to financial crime. (I’m not an unusually valuable target for crime; I just want to make responsible choices.)

Essentially, I want to know how my choices will affect the attack surface around my individual finances and crime-relevant information. Having a good model of this attack surface would allow me to answer, for example, these questions:

  • How do I evaluate a bank or credit union for its information security practices?
  • How do I choose among email service providers and email information security practices?
  • What practices around financial transactions minimize this attack surface?

I’m not looking for answers to these questions in particular, but rather how to model the attack surface they are asking about.

So, my question is:

When a security expert wants to model a complex attack surface across multiple institutions and information systems, how does he or she go about doing it? What steps does he or she go through? Can a technically capable non-expert follow these steps?

Modeling a set of probabilistic concurrent processes

I’m looking into discrete-time Markov chains (DTMCs) for use in analyzing a probabilistic consensus protocol. One basic thing I haven’t been able to figure out is how to model a set of independent processes. Consider $N$ processes. These processes concurrently execute a series of identical instructions labeled $0, 1, 2, 3,$ etc., and all start at instruction $0$. When probability is not involved, modeling this is simple: it’s a state machine which branches nondeterministically off the start state to $N$ different states, where in each of those $N$ states a different process was the first to execute instruction $0$. What do we do when probability is involved? Do we do the same thing with $N$ states branching from the start state, where the probability of transitioning to each state is $\frac{1}{N}$? As in, it’s uniformly random which process was the first one to execute instruction $0$?

Is this like taking the product of the state machines of each process?
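To make the question concrete, here is a small sketch of the uniform-scheduler reading described above. It is one possible modeling choice under my own assumptions, not a claim that this is the right model: a global state is the tuple of per-process program counters, and at each step one not-yet-finished process is picked uniformly at random to execute its next instruction. The function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def step(dist, n_instructions):
    """Advance a distribution over global states by one scheduler choice."""
    nxt = defaultdict(Fraction)
    for state, p in dist.items():
        runnable = [i for i, pc in enumerate(state) if pc < n_instructions]
        if not runnable:              # every process finished: absorbing state
            nxt[state] += p
            continue
        for i in runnable:            # uniform choice among runnable processes
            succ = state[:i] + (state[i] + 1,) + state[i + 1:]
            nxt[succ] += p / len(runnable)
    return dict(nxt)

def distribution_after(n_processes, n_instructions, k):
    """Distribution over program-counter tuples after k scheduling steps."""
    dist = {(0,) * n_processes: Fraction(1)}
    for _ in range(k):
        dist = step(dist, n_instructions)
    return dist
```

For example, distribution_after(3, 2, 1) puts probability 1/3 on each of the three states in which exactly one process has executed instruction 0.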

I’m using a DTMC here; would I gain anything by moving to a CTMC if I don’t care about anything beyond the global order of execution?

Bonus question: assigning probabilities to whichever action (process executing an instruction) is taken first seems like a generalization of the non-probabilistic notion of fairness; if it is, what is the formal definition of this generalized notion of probabilistic fairness?

The Challenges that Practitioners Face in Software Modeling

We have recently released a survey to understand the challenges that practitioners face in their software modeling activities. The survey takes approximately 2-5 minutes to complete.

We would be grateful if you could spare a few minutes to participate in our research.

The survey link: https://docs.google.com/forms/d/e/1FAIpQLScwFMKoOJHDYF6GDETWh5H2w3W57G78Fb2SF0ABM9uTthN_hQ/viewform?usp=sf_link

Cassandra data modeling with mutable attribute rows

We set up a time-series database. Every measurement also has attributes that may change later, after storage (e.g. deviceName may change).

As Cassandra recommends a query-first approach, I thought about packing those attributes into the table, but how can I handle updates?

  • Updating the database may result in inconsistency and heavy code.
  • Joins are not possible in Cassandra.
  • Merging the mutable attributes in the application layer after the database reads seems possible, but is it the best way?
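The application-layer merge from the last bullet could look roughly like the sketch below. It uses plain Python data in place of real Cassandra reads, and the row layout, the device_id key, and the with_attributes helper are assumptions made for illustration.

```python
# Immutable measurement rows, keyed by a stable device id (hypothetical schema).
measurements = [
    {"device_id": "d1", "ts": 1, "value": 20.5},
    {"device_id": "d1", "ts": 2, "value": 21.0},
    {"device_id": "d2", "ts": 1, "value": 18.2},
]

# Mutable attributes kept once per device, e.g. read from a small separate table.
device_attributes = {
    "d1": {"deviceName": "boiler-old"},
    "d2": {"deviceName": "chiller"},
}

def with_attributes(rows, attributes):
    """Attach the current attributes to each measurement row at read time."""
    return [{**row, **attributes.get(row["device_id"], {})} for row in rows]

device_attributes["d1"]["deviceName"] = "boiler-new"   # a rename touches one entry
enriched = with_attributes(measurements, device_attributes)
```

The stored measurements never change; a rename updates a single attribute entry, and every later read picks it up.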

Thanks in advance