I just started learning about dimensional modeling and I am creating a star to analyse email newsletter signups for an online business.
I have a fact table that records the signups and links to a contact dimension, traffic source dimension, page dimension and a junk dimension with various flags.
I also have a degenerate dimension with the signup ID of the source system.
But one of the pieces I can’t figure out where to put is the email address.
I won’t need it for filtering, but it will be good to have for when I dig into the data and show the signups.
The data high cardinality and contains about 70% unique emails and 30% duplicates.
Option 1: Junk Dimension
I considered putting it in the junk dimension, but that would swell the rows from ~1,000 to currently 133,000.
So that’s clearly a bad choice
Option 2: Email Dimension
But having a dimension with just one column doesn’t seem right. It would provide barely any space savings.
Option 3: Degenerate Dimension Another option is to put it as a degenerate dimension like one would do with natural keys. But my understanding is that you are only supposed to use degenerate dimensions for values that have a unique value for each fact row.
On top of that the that textual values should be avoided. Would 70% qualify for this?
Finally, probably the best option would be to record the email through the contact dimension and make it an SCD Type 2.
But I already have a contact dimension that’s an SCD Type 1 and currently don’t have the knowhow to implement and maintain a type 2 properly. On top of that the data is loaded once per day, so it wouldn’t account for a person who signs up with one email and changes it on the same day to another email before the ETL pulls the data.
Anyways, to make my question more clear I’d like to get guidance on…
- Does it make sense to have a dimension table with just one column and almost as many rows as the fact table?
- Is it ok to use degenerate dimensions for values that are not 1-to-1 in the fact table (but still high cardinality)?
- How bad is it to add a text column to a fact table?