You probably know this feature from command-line tools: you enter git cimmot and it says “Did you mean git commit?”. It often works this way (I’m not sure about git specifically): loop through the available commands, calculate the Levenshtein distance (or any other similarity measure) to each one, and pick the best match.
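To make that concrete, here is a minimal sketch of such a loop in Python. The command list, the `suggest` helper and the `max_distance` threshold are made up for illustration; this is not how git itself does it.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def suggest(typed: str, commands: List[str], max_distance: int = 2) -> Optional[str]:
    """Return the closest command, or None if nothing is close enough."""
    scored = [(levenshtein(typed, c), c) for c in commands]
    if not scored:
        return None
    distance, best = min(scored)
    return best if distance <= max_distance else None


# Hypothetical command list, only for the example.
COMMANDS = ["commit", "checkout", "clone", "config", "commit-tree"]
print(suggest("cimmot", COMMANDS))  # prints: commit
```

The threshold matters: without it, every input gets a suggestion, however far off.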
I’d like to implement a similar feature, but with a twist: translation. A user enters a phrase in some language, I have the English versions of the available phrases, and I’d like to know what the user meant.
Ideally, I’d like to have a service that lets me loop through some phrases and ask how likely it is that “Bon jour” is the French translation of each of the English phrases “Hello”, “Goodbye”, “Thanks”.
- likeliness: 0.9853
I’d also appreciate any ideas on how to do this without such a service, as long as it doesn’t take much time to set up: a really quick and dirty solution.
For example, I have an idea: just use Google Translate + synonym dictionaries + minimum Levenshtein distance. It might work, but adding a crude dictionary-based step to the translation process is like parsing HTML with regular expressions. On top of that, I would need to find synonym dictionaries (for a narrow subject) for multiple languages and integrate them.
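Roughly, the translate-then-match part of that idea could look like the sketch below (without the synonym dictionaries). Everything here is an assumption for illustration: `translate` is a placeholder for whatever translation call ends up being used, `fake_translate` is a stub so the example runs offline, and I swapped Levenshtein distance for the similarity ratio from Python’s standard `difflib` module to keep it short.

```python
import difflib
from typing import Callable, Optional, Sequence, Tuple


def match_phrase(
    user_input: str,
    translate: Callable[[str], str],
    phrases: Sequence[str],
    cutoff: float = 0.6,
) -> Optional[Tuple[str, float]]:
    """Translate the input to English, then pick the most similar known phrase."""
    if not phrases:
        return None
    english = translate(user_input).strip().lower()
    scored = [
        (difflib.SequenceMatcher(None, english, p.lower()).ratio(), p)
        for p in phrases
    ]
    score, best = max(scored)
    return (best, score) if score >= cutoff else None


def fake_translate(text: str) -> str:
    """Stub standing in for a real translation service (hypothetical)."""
    table = {"bon jour": "hello", "merci beaucoup": "thanks a lot"}
    return table.get(text.strip().lower(), text)


PHRASES = ["Hello", "Goodbye", "Thanks"]
print(match_phrase("Bon jour", fake_translate, PHRASES))
print(match_phrase("Merci beaucoup", fake_translate, PHRASES))
```

The weak spot is visible even in the stub: the match only works when the translation happens to share characters with the stored phrase, which is exactly where synonyms would be needed.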
An important feature of the problem is that users actually enter data from a limited set of phrases (50+), in several dozen languages. I will collect the input, analyse it manually in the background, and fill in the matching phrases. But I cannot be online 24/7 to react to their input instantly, and ideally users should still get some response.
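The manual part of that workflow could be sketched like this: a table of inputs I have already matched by hand, plus a queue of unseen inputs to review later. The `PhraseBook` class and its method names are hypothetical, and the normalization (Unicode NFKC, case folding, collapsed whitespace) is just one reasonable choice.

```python
import unicodedata
from typing import Dict, List, Optional


def normalize(text: str) -> str:
    """Fold case, unify Unicode forms and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


class PhraseBook:
    """Manually curated mapping from user input to English phrases."""

    def __init__(self) -> None:
        self.known: Dict[str, str] = {}  # normalized input -> English phrase
        self.pending: List[str] = []     # raw inputs awaiting manual review

    def learn(self, raw: str, english: str) -> None:
        """Record a match that was confirmed by hand."""
        self.known[normalize(raw)] = english

    def lookup(self, raw: str) -> Optional[str]:
        """Return the known phrase, or queue the input for review."""
        key = normalize(raw)
        if key in self.known:
            return self.known[key]
        self.pending.append(raw)
        return None


book = PhraseBook()
book.learn("Bon jour", "Hello")
print(book.lookup("  bon   JOUR "))  # prints: Hello
print(book.lookup("Merci"))          # prints: None
print(book.pending)                  # prints: ['Merci']
```

When `lookup` returns None, that is the point where an automatic guess would have to step in so the user still gets some response.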