It comes from big data perspective. Basically, many frameworks (like Apache Spark) “compensate” lack of relational operations by providing Functor/Monad-like interfaces and there is a similar movement towards cats-to-SQL conversions (Slick in Scala). For instance, we need natural join (assuming no repetitions on indexes) for element-wise multiplication of vectors from SQL-perspective, which could be considered as
zip + map(multiply) (Spark’s MLib, however, already has
ElementwiseProduct) in Category Theory’s applications.
Simply saying (following examples are in Scala):
the referenced subcase of join can be thought as applicative functor (over sorted collection), which in its turn gives us
List(1,2,3).ap(List(2,4,8).map(a => (b: Int) => a * b))–>
(List(1,2,3) zip List(2,4,8)).map(x => x._1 * x._2). Moreover, we can induce it to some other joins, assuming some preprocessing (
groupByoperator or just surjection, or generally – an epimorphism).
other joins and selection can be thought as monad. For instance,
List(1,2,2,4).flatMap(x => if (x < 3) List(x) else List.empty)–>
List(1,2,2,4).filter(_ < 3)
data itself is just ADT (GADT too?), which in its turn looks like a simple Set-category (or more generally speaking – Cartesian-closed), so it should (I suppose) cover Set-based operations (due to Curry-Howard-Lambek itself) and also operations like
RENAME(at least in practice).
aggregation corresponds to
So, what I’m asking is can we build an isomorphism between (maybe subset of) category theory and (the whole) relational algebra or is there something uncovered? If it works, what exact “subset” of categories is isomorphic to relalgebra?
You can see that my own assumptions are quite broad while formal solutions like Curry-Howard-Lambek correspondence for logic-cats-lambda are more precise – so actually, I’m asking for a reference to an accomplished study (that shows a direct relationship) with more examples in Scala/Haskell.
Edit: the accepted answer made me think that I went too far representing joins and conditions as a monad (especially using an empty value that effectively instantiates FALSE), I think pullbacks should suffice at least for relalgebra subset of SQL. Monads are better for higher order (nesting) stuff like GROUP BY, which isn’t part of relalgebra.