What is best practice for referencing data rows within stored procedures – via PK, code column or data value?

Suppose you have a table for colours with columns:

  • id = automatically incrementing integer, primary key
  • code = short code reference for the colour, unique
  • colour = human-readable name of colour, unique

Example values might be:

  • 1, BL, Blue
  • 2, GR, Green

Now imagine you have a stored procedure that, at some point, needs to reference this table. Let’s say the business logic says to obtain the colour "Green". To achieve this, you could have any of the following three WHERE clauses:

  • WHERE id = 2
  • WHERE code = GR
  • WHERE colour = Green

Now, if the system is designed such that it is agreed that a code value, once created, never changes, then, in my view, that is the best column to reference because:

  • It is an alternate key
  • It is human-readable for people who maintain the code
  • It will not be impacted when the business decides to change the colour value to ‘Sea Green’

However, if a legacy table lacks such code values, what, in your opinion, is best practice? To reference the id column, or the colour column?

If you reference the id column, the code is not readable unless you then also add comments – you shouldn’t have to comment simple things like this. It sucks figuring out what statements like WHERE id not in (1, 7, 17, 24, 56) mean.

I’m not sure how often, in reality, the id value might change – but consider if you run a script during development to insert new colours but then delete those and insert some more. If your stored procedure references the id values from that last set of colours inserted but when you create your new colours in your next environment you skip the step that inserted the colours which ended up deleted, then the id values won’t match in that next environment. Bad practice, but it can happen – a developer develops their script on a dev instance not thinking that the id values will conflict with production (which, for example, may have had additional colours created manually by the business before your colour creation script runs).

If you reference the colour column, you run the risk that if the business does ask to update the description from ‘Green’ to ‘Sea Green’, that your procedure will begin to fail.

I suppose a further solution is to implement the code column when you need it, if it isn’t there already – probably the best solution?