My end goal is to serve an app where the user can go through a bunch of docs: if they like a document, they see another document similar to the one they liked; if they like that one, they see another one, and so on.
I’m planning to store all of these docs in an Elasticsearch instance. Every day, I’d insert 100 documents via a cron job at a given time.
Upon insertion of a single document, `d`, I would also compute how similar it is to all existing documents. I would create a new field called `most_similar_to` in `d`, and it would be a list of pointers to the other similar documents.
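To make the insertion step concrete, here is a minimal sketch of what I have in mind. It is only an illustration: a plain dict stands in for the Elasticsearch index, cosine similarity over word counts is a placeholder for whatever similarity measure I end up using, and the names `insert_document`, `next_document`, and the cutoff `k` are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts (placeholder representation, an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def insert_document(store, doc_id, text, k=3):
    """Insert one document and cache pointers to its k most similar predecessors.

    `store` is a dict standing in for the index (assumption, not the
    Elasticsearch client API).
    """
    vec = tf_vector(text)
    # Compare the new document against every document already stored.
    scored = [(cosine(vec, other["vector"]), other_id)
              for other_id, other in store.items()]
    scored.sort(reverse=True)
    store[doc_id] = {
        "text": text,
        "vector": vec,
        # Keep only ids with a non-zero score, best match first.
        "most_similar_to": [oid for score, oid in scored[:k] if score > 0],
    }


def next_document(store, liked_id, seen):
    """Read path: return the first cached neighbour the user has not seen yet."""
    for candidate in store[liked_id]["most_similar_to"]:
        if candidate not in seen:
            return candidate
    return None


store = {}
insert_document(store, "a", "cats and dogs")
insert_document(store, "b", "stock market news")
insert_document(store, "c", "dogs and cats playing")
```

Written this way, each insertion is a full scan over the existing documents, and a document's `most_similar_to` list only ever contains documents that existed when it was inserted, since nothing in this sketch goes back to update the older entries.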
The advantage of the above approach is that the end user doesn’t have to “wait” for my app to find similar docs, because the results are already cached.
How can I improve on this design? It’s a simple server, with no concurrency or anything.