Testimonial


"We engaged Logimethods to accelerate our vendor selection process for an MDM, BPM and ESB software solution, because time to market was essential to our business transformation strategy. Logimethods immediately understood our needs and using their specialization in this space helped drive the evaluation process to a successful and timely completion. Their unbiased input and strong vendor relationships also contributed uniquely to the process. We are very happy with the value Logimethods brought to YPG and would highly recommend them."


Paul Ryan, CTO
Yellow Pages Group Inc.

Thought Leadership
Hybrid Data Warehouse

The problem most organizations face when developing a data warehouse is how to satisfy a diverse set of information retrieval and Business Intelligence (BI) requirements with just one database. Bill Inmon, often referred to as the “Father of Data Warehousing”, espouses the Corporate Information Factory (CIF) that includes both an Enterprise Data Warehouse (EDW), Operational Data Stores (ODS) along with data marts and exploration data warehouses to support Online Analytical Processing (OLAP) and data mining.


Data changes from source applications are captured, staged, then put through an Extract, Transform and Load (ETL) process that cleanses, maps and transforms the source data to data structures in the EDW and the ODS.


Another ETL process is needed to move data from the EDW to the data marts.

 

 

ODS & EDW


The ODS is used for operational queries and reporting. It stores a minimal amount of historical data, from a few months to a year, and is generally normalized but also contains de-normalized tables to support specific types of queries. The EDW holds the long term memory of the enterprise. It is normalized and generally not used to support ad hoc queries. Datamarts are usually structured to meet the OLAP needs of a specific group of users and have a multi-dimensional structure. The different databases and structures are required because they each serve a different purpose. However, there is a significant cost involved in developing multiple sets of ETL processes. In fact, it has been estimated that ETL accounts for 60-70% of the overall development costs of a data warehouse.


page_features_2


An alternative approach developed by Ralph Kimball, widely regarded as one of the original architects of data warehousing, is focused on designing the data warehouse to be understandable and fast. His methodology, known as dimensional modeling or the Bus architecture, has become the de facto standard model for real-time decision support. Kimball’s star schema connects dimensions to facts, which are stored atomic transactions. A dimension may represent time, geography, product lines, etc

The Hybrid Structure

The Hybrid Data Warehouse (Hybrid) is uniquely suited to support both EDW and datamart applications in one database. It can accommodate large volumes of historical data typically found in the EDW, while also performing well for OLAP queries typically done in datamarts. The Hybrid database structure contains both normalized snowflakes and de-normalized star schemas. The controlled redundancy inherent in this design provides good response time for a variety queries.


The Hybrid architecture can also be used to implement the ODS in the same database, as long as sub-second response times are not a requirement. Because the ODS can be used by operational systems, the response time of the database can become an issue.


Because there is only one database schema, the Hybrid model significantly reduces the cost of developing the ETL processes. Real-time (or near real-time) updates can be supported by pushing data updates out immediately to the Hybrid Warehouse directly from the operational system, or by connecting the ETL engine to an Enterprise Service Bus (ESB).

The Hybrid model was used to develop one of the largest databases in Canada. It includes 34 dimensional roles with multiple hierarchies, has over 1500 attributes, and handles 40 million transactions per day in near real time, which translates into one billion rows per month.


The Hybrid model may not be able to fully replace an ODS requirement for sub-second response time. But it can offer a one stop solution for organizations that have very large data volumes and are looking for a cost effective way to support a variety of BI requirements across the organization.