Should I implement HTML or Markdown for user-generated content?

Technical considerations:

  • HTML takes roughly 3x the space of Markdown for the same content. Markdown would be rendered client-side, so the smaller source is what gets transmitted, but that doesn’t translate into a full 3x saving in payload because gzip would close much of the gap.
  • HTML would also require the text content (textContent) to be stored separately so it can be indexed for full-text search, whereas Markdown’s markup consists only of non-alphanumeric characters, which the tokenizer would drop anyway. This means the database would hold two columns for the body: one with the full HTML to return, and another with all the tags stripped out for indexing.
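
To make the two-column idea concrete, here is a minimal sketch of deriving the tag-stripped text from the HTML body using only Python's standard-library html.parser. The names TextExtractor, html_to_index_text, body_html and body_text are hypothetical, and the sketch assumes the HTML has already been sanitized (no script or style elements), so it is an illustration rather than a definitive implementation.

```python
from html.parser import HTMLParser

# Tags treated as word boundaries, so "<p>a</p><p>b</p>" indexes as "a b".
# This set is an assumption for the sketch, not an exhaustive list.
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "blockquote", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6", "tr", "td", "th"}


class TextExtractor(HTMLParser):
    """Collects text nodes and drops tags, similar in spirit to textContent."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_index_text(html_body):
    """Return the plain text of html_body with whitespace runs collapsed."""
    parser = TextExtractor()
    parser.feed(html_body)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


# Hypothetical row: one column to return to clients, one to index.
body_html = "<p>Hello <strong>world</strong> &amp; friends</p><p>Second paragraph.</p>"
row = {"body_html": body_html, "body_text": html_to_index_text(body_html)}
print(row["body_text"])
```

Inline tags such as strong add no separator, so a word that is only partly emphasized stays a single token, while block-level tags insert a space so adjacent paragraphs do not fuse together.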

UX considerations:

  • IMO Markdown is actually quite frustrating to use. A lot of it has to do with it trying to “help” and getting in the way instead. For example, sometimes I see a giant wall of text on Reddit, but viewing the source reveals that OP actually made paragraphs with single line breaks, which Markdown collapsed into one paragraph.
  • There is no true WYSIWYG editor for Markdown. I’ve only found editors with a live preview that call themselves “WYSIWYG-like”, which IMO is too cumbersome to use. Exhibit 1: people often edit a post on StackExchange just to fix their Markdown.

I don’t know whether it would be feasible to convert from one to the other later on, so I would like to make the right decision at the start.