Here’s an example for a use-case that matches what I’m trying to better understand. Say I have 3 objects I need to deal with that are similar in many ways, such as invoice, PO, and receipt. They can all have a number, and amount, etc. They can also have many similar relationships, such as line items, images attached, etc. I can think of a number of ways of modeling this.
Make everything separate. Have tables for invoice, invoice_line, invoice_image, then po, po_line, po_image, etc. This means strong referential integrity with foreign keys and least possible number of joins, but a ton of duplication for each table.
Have parent document table with common fields and a type field, then have invoice, po, and receipt table have a foreign key to document_id. I can then have a single document_image table. For the lines, there are again some differences but many similarities between all, so could have a document_line table with a foreign key to document_id and invoice_line, po_line, and receipt_line tables with foreign key to document_id. Here, we have less duplication, keep referential integrity with foreign keys, but start having many more joins to get all the info we need. If I have an invoice line item and wanted to get all the info, I’d need to join invoice_line to document_line, invoice, and document.
Use separate invoice/po/receipt tables, but for image relationship (or any other) add multiple nullable foreign keys, so image would have invoice_id, po_id, and receipt_id nullable foreign keys. With this we can still enforce referential integrity but we have some fields that will often be useless polluting things, plus we now can’t make a necessary field required because they all need to be nullable. We do cut down on duplication.
Use separate invoice/po/receipt tables, but for image relationship (or any other) have a type field and fk_id field. This way I don’t need to have multiple many-to-many tables so it cuts down on duplication, especially if you have lots of these many-to-many, since it would always be 3 tables each time. I like this option the least because you can’t have a foreign key, I pretty much not even considering it as a valid option.
I’m leaning towards option 1 or 2, I think options 3 or 4 seem like bad design and I’d likely only consider if somebody explained that there are major performance benefits. For option 1, even though there’s duplication, you can get around it with code generation, so maybe not so big a deal. But would be interested in knowing if there’s a major advantage to breaking it down like option 2.