It’s been incredibly difficult to do searches and write essays on subjects like these without knowing the proper terminology.
Here’s an example to clarify what I mean:
The four nucleotides – the base units or “letters” of DNA – are Adenine, Cytosine, Guanine, and Thymine. Four possibilities means each nucleotide can be stored in 2 bits (2² = 4).
- A – 00
- C – 01
- G – 10
- T – 11
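To make the example concrete, here is a minimal sketch in Python of what I mean by encoding nucleotides at 2 bits each. The names `pack` and `unpack`, and the choice to put the first letter in the high bits of each byte, are my own assumptions for illustration and not part of any standard format.

```python
# 2-bit codes taken from the table above.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTERS = "ACGT"  # index == 2-bit code


def pack(seq: str) -> bytes:
    """Pack a nucleotide string into bytes, four letters per byte.

    The first letter of each group of four goes in the two highest bits.
    A short final group is padded with zero bits on the right.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for ch in chunk:
            byte = (byte << 2) | CODES[ch]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Reverse of pack(); `length` is the original number of letters."""
    letters = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            letters.append(LETTERS[(byte >> shift) & 0b11])
    return "".join(letters[:length])


packed = pack("GATTACA")
print(packed.hex())       # 8f10
print(unpack(packed, 7))  # GATTACA
```

Seven letters fit in 2 bytes instead of 7. Because the padding bits are indistinguishable from `A`, the letter count has to be stored or transmitted separately, which is exactly the kind of detail such a format has to specify.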
It took forever to find again, but ASN.1 is the kind of “encoding” I’m talking about; see “A Layman’s Guide to a Subset of ASN.1, BER, and DER”.
Elusive terms:
- The general name for this.
  - A search for “byte encoding” returns only UTF-8 and ASCII stuff.
  - The term “serialization” includes string formats like JSON and XML, which are not what I’m talking about.
  - The Wikipedia page for serialization even has to refer to this as “the more compact, byte-stream-based encoding”.
- A term for a single one of these non-human-readable streams of data in a specific format.
  - “File format” comes close, but it doesn’t convey the difference between a JSON file on an FTP server and a “custom-serialized” BLOB on an SQL server.
  - “Codec” describes it well, but it is used almost exclusively for video and audio.
- A term for the domain of possible values to be encoded into bits.
  - For the DNA example above, this would refer to the 4 unique letters; the members of an enum.
  - For a numeric value, the range would determine how many bits are required to represent the number.
  - For a string value, the range of characters available would determine how many bits are required per letter. Variable vs. fixed-length strings would also be a part of it (covered in the ASN.1 guide).
- A name for this field as a whole.
- Any others that could be helpful in the future.
If there aren’t specific terms for these, please suggest alternative ways to convey the concepts.
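To illustrate the “domain of possible values” idea from the list above: for a fixed-width encoding, the number of bits follows directly from how many distinct values the domain contains. A rough sketch in Python, where `bits_needed` is just a name I made up for illustration:

```python
def bits_needed(n_values: int) -> int:
    """Smallest number of bits that can distinguish n_values distinct values.

    Equivalent to ceil(log2(n_values)), computed with integers only.
    """
    if n_values < 1:
        raise ValueError("a domain needs at least one value")
    return (n_values - 1).bit_length()


print(bits_needed(4))    # 2  (the four DNA letters)
print(bits_needed(101))  # 7  (an integer in the range 0..100)
print(bits_needed(26))   # 5  (one letter from a 26-character alphabet)
```

This only covers fixed-width fields. Variable-length values need something extra, such as a length prefix or a terminator, which is the part the ASN.1 guide deals with.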