DW design and storage

I am working on new DW design. I have read through many blog posts and training videos regarding DW layers/areas/zones. It seems there are terms that are used or defined differently.

Below I have listed the areas that make sense to me. But I am kind of new and I haven’t built many real-life solutions from scratch.

Would you describe DW zones differently? (image/notes are below)

At my corporation, we have general DW which is conglomerate of tables for old/new system primary capturing history data for loading into SSAS cubes for a small portion of business reporting need (Sales/ Production/ Purchasing). All related to historical transactions aggregation. And it runs nightly to re-load everything anew.

There is a new DW project (or rather an update to an existing solution), that is concerned with future data for Sales/Inventory for projecting demand balance levels and reviewing sales order data. This database is small in size regarding tables/ rowcounts. The current Database size is small as well (RowSizeGB 0.00011, LogSizeGB 0.00030).

What kind of DW solution would this be described as? What kind of storage areas/ process would you recommend for capturing the datasets?

Currently, this planning database refreshes with TSQL scripts on SQL box. We’re looking at plans to update the DW/ETL design.

Any thoughts or recommendations on such a solution?

We are a Microsoft shop. We’re looking to possibly move to cloud (ADF, AAS, ASQL). But that’s a slow transition and this redesign will likely be done for free cost options (SSIS/ TSQL/ TimeXteneder DWA tool we currently used). Unless there is convincing reason to do this in Azure cloud. The corp wants to stop using TXDWA, and possibly moving to Informatica next year. So this whole thing will be scrapped & redone in a year (possibly).

This DW update was requested by my manager. I think he wants to see how I work the project.

Any thoughts or advice?


DW stages/areas/zones

Source. Sources contain the data of interest. Direct connections to OLTP source system is typically scowled upon due to concerns of performance impact, and data sprawl caused be creating too many views.

Operational data store (ODS). A replica of OLTP tables raw system data without transformation. Used while fetching data for loading staging targets. Provides a snapshot of the latest data. Fast load time. (AKA- Operational data exchange, Mirror OLTP).

Configuration (DWConfig). An area for storing data not kept in source system for use in staging. (e.g. AOP, Historical data snapshots, Business cross reference)

Stage (DSA). The data staging area. Data cleansing, scrubbing, merging, and business rules happen here. ETL Transformations. Sourced from the ODX. Referred to as “Integration”.

Data warehouse (DW). A clean, organized, single representation of your data. Data comes from a variety of sources to support strategic and tactical decision making. It’s structured as a schema of views on top of an existing Stage.

Semantic/ OLAP. Category of technology that enables users to their data in a fast, interactive, easy to use manner.

Data Mart. Data mart is usually subject-specific for a specific group. An autonomous subset of the complete data warehouse that is functionally defined by facts and dimensions. The data mart is composed of fact and dimensions that will be derived from the data warehouse but might be slightly different from the original ones.

enter image description here