Data engineering newbie looking for wisdom.
I currently have multiple different databases (mongo, postgres, etc) that each contain information I need to stitch together a final object. Ex) mongo gives me part A, postgres gives me part B, MSSQL gives me part C, all of which get merged into the final object I need to store. The data sources also may contribute overlapping information that must be merged later. This collection of final objects is what my users need to be able to run analytics/exploratory queries on.
Right now I’m solving the problem in code, writing individual services/queries for each schema in each database type and creating the final objects myself. However we are continually linking in more data sources and it is becoming unmaintainable.
Would a solution such as Hive be of benefit here? Anything I propose must also be able to run on-prem on commodity hardware.
Any guidance is much appreciated!