I have an Oracle 12 database with lots of tables, and I am replicating several of the tables (a subset of rows) into a SQL Server 2016 database. The subset of rows can be established with a WHERE clause on the Oracle side.
I have two web services that can expose anything I want to from that data, one on each side.
Do you have a suggestion for an approach of what I can expose, then compare to find out if the data between the systems matches?
I am currently exposing the COUNT(*) from one table, which has a few million rows; that is a no-brainer since it is very efficient. So far, so good.
I’m also exposing the SUM of each of a few NUMBER(18,2) columns and comparing it to the corresponding SUM on the SQL Server side. However, this is problematic, as it has to scan the whole table in SQL Server; it is sometimes blocked, and sometimes might cause other processes to block. I imagine similar problems could occur on the Oracle side too.
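For concreteness, the two checks above are plain aggregates; assuming a hypothetical replicated table ORDERS, filter column REGION, and amount columns AMOUNT and TAX (none of these names are real), they look like:

```sql
-- Cheap check: row count of the replicated subset (fast on both sides).
SELECT COUNT(*) FROM ORDERS WHERE REGION = 'US';

-- Expensive check: column totals. This is the query that has to scan
-- the whole subset and is what sometimes blocks, or gets blocked by,
-- other activity on the SQL Server side.
SELECT SUM(AMOUNT), SUM(TAX)
FROM ORDERS
WHERE REGION = 'US';
```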
Also, the SUM will not tell me whether the rows match; it will only tell me that the totals match. If an amount was improperly added to one row and subtracted from another, I wouldn't catch it.
I’ve pondered whether Oracle’s STANDARD_HASH might help me, but it seems cumbersome or error-prone to try to generate the exact same HASH on the SQL Server side, and also this doesn’t help with the inefficiency/blocking nature of the call.
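To illustrate why matching the hashes is fragile: both sides would have to build a byte-identical canonical string per row before hashing. A rough sketch, again with the hypothetical ORDERS/ID/AMOUNT names, might be:

```sql
-- Oracle: per-row hash of a canonical string.
-- TO_CHAR format must pin the exact numeric rendering.
SELECT ID,
       STANDARD_HASH(ID || '|' || TO_CHAR(AMOUNT, 'FM9999999999999990.00'),
                     'MD5') AS ROW_HASH
FROM ORDERS;

-- SQL Server: the concatenated string must be byte-for-byte identical,
-- which is the error-prone part (number formatting, NULL handling,
-- and character encoding all have to agree).
SELECT ID,
       HASHBYTES('MD5', CONCAT(ID, '|', FORMAT(AMOUNT, '0.00'))) AS ROW_HASH
FROM ORDERS;
```

Even when the strings agree, STANDARD_HASH hashes bytes in the Oracle database character set while HASHBYTES hashes the varchar/nvarchar bytes as stored in SQL Server, so the encodings must also line up — and the per-row query still scans the whole table, so it doesn't solve the blocking problem either.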
So is there any way to have both databases keep track of a hash, checksum, CRC code, or other summary of a column’s data, that is efficient to retrieve, and that I can then use to compare to have some idea whether data is the same on both sides? It need not be a perfect solution–for example, comparing SUMs is close but perhaps not quite good enough.
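One imperfect option I'm aware of on the SQL Server side is an order-independent aggregate checksum; Oracle has no exact analogue, so the values can't be compared across systems directly, only against a previously stored value on the same side (hypothetical names again):

```sql
-- SQL Server: CHECKSUM_AGG is XOR-based and order-independent, so it
-- catches an amount moved between rows, which offsetting SUMs would miss.
-- It still scans the table, though.
SELECT CHECKSUM_AGG(CHECKSUM(ID, AMOUNT, TAX)) FROM ORDERS;

-- Oracle: a rough counterpart using ORA_HASH. The hash algorithms differ
-- from SQL Server's, so this value is only useful for change detection
-- on the Oracle side, not for cross-system comparison.
SELECT SUM(ORA_HASH(ID || '|' || AMOUNT || '|' || TAX)) FROM ORDERS;
```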
As a first stab, I created a "summary" indexed view on the SQL Server side, with columns derived from SUMs. This makes querying the view very fast, but incurs an additional penalty on every write to the large table underneath. Still, I think it will work, but I’d like to improve on it. Other, better ideas?
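A minimal sketch of what that indexed view looks like, with the same hypothetical dbo.ORDERS table and column names as above (SQL Server requires SCHEMABINDING, COUNT_BIG(*), non-nullable SUM arguments, and a GROUP BY for an aggregate view to be indexable):

```sql
CREATE VIEW dbo.OrdersSummary
WITH SCHEMABINDING  -- required for an indexed view
AS
SELECT REGION,                                   -- grouping column
       COUNT_BIG(*)           AS RowCnt,         -- required with GROUP BY
       SUM(ISNULL(AMOUNT, 0)) AS AmountTotal,    -- SUM args must be non-nullable
       SUM(ISNULL(TAX, 0))    AS TaxTotal
FROM dbo.ORDERS                                  -- two-part name is required
GROUP BY REGION;
GO
-- The unique clustered index is what materializes the view; SQL Server
-- then maintains these totals incrementally on every write to dbo.ORDERS,
-- which is the write penalty mentioned above.
CREATE UNIQUE CLUSTERED INDEX IX_OrdersSummary
    ON dbo.OrdersSummary (REGION);
```

Querying the view (with the NOEXPAND hint on editions that need it) then returns the pre-aggregated totals without touching the base table.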