A Journey from OLTP to OLAP
In this first Part of the series I will discuss the OLTP and OLAP and the transition from OLTP to OLAP in pursuit performance improvements as well as the benefits of the each technology.
Note**: This article is intended for the users who are new to the concepts of the data warehousing.
What is OLTP?
OLTP stands for the Online Transaction Processing which is the SQL Server database engine which has been optimized for the insert updates and deletes. Any kind of the insert, update and delete is termed a transaction – hence the term transaction processing. While designing this kind of a database, relationships are defined between the master and transaction tables to analyze the data and get the proper correlation between the various entities and attributes of the database. The tables should be highly normalized using various normal forms to achieve the query optimization. This relational OLTP database forms the backend of most desktop/web applications.
- Uses E-R Model for the Database design
- Real time and very detailed data analysis
- Used for simple reporting
- Better performance when the database is very small
What is OLAP?
The OLAP stands for Online Analytical Processing. The OLAP database is highly de-normalized and is mostly used for analysis and reporting purposes. OLAP has the capability to enable business users to generate detailed reports without having a dependence on the IT department. OLAP comes with a multi-dimensional model to analyze data using N number of dimensions and measures which are the part of the standard Data warehouse.
- Uses the Star and Snowflake schema for the data warehouse design
- Stores historical data for the trend analysis
- Used for ‘slicing and dicing’ data
- Improved performance as the database is de-normalized
- Data can be integrated from disparate data sources
What it BI?
Business Intelligence is the ability for an organization to take all its capabilities and convert them into knowledge, ultimately, getting the right information to the right people, at the right time, for making strategic decisions.
What is Data Warehousing?
Data Warehousing is the process of creating data marts, by identifying the dimensions and facts which will cater to the need of the organization’s various departments and the key performance indicators (KPI) which are useful in determining the organization’s trend in the real time. Though there is more to Data warehousing than just identifying the dimensions and facts and slicing and dicing of the data, the purpose of this article series is to focus on how to built one.
The following are the properties of the data warehouse:
- Subject-oriented:The data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked
- Time-variant:The changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time
- Non-volatile:Data in the data warehouse is never over-written or deleted. Once committed, the data is static, read-only, and retained for future reporting
- Integrated:The database contains data from most or all of an organization’s operational applications, and that this data is made consistent
Data Warehouse Development Approaches
There two primary models for the development of a Data Warehouse :
Inmon Model: Enterprise Data Warehouse (EDW) – Top down Approach
The Inmon model focuses on the Big bang approach while developing a Data warehouse. The top down approach means the implementation of the data warehouse for an entire organization simultaneously. The data marts are then derived from the data warehouse. The data in the Data Warehouse should be in the 3rd Normal form.
Kimball Model: Data mart approach – Bottom up Approach
The Kimball model is based on the bottom up approach which means the implementation of the data warehouse is by first developing the department specific data marts and then merging them to get one big data warehouse.
Though both the methods have their own pros and cons, There is no one-size-fits-all strategy to data warehousing.
Elements of BI
The elements of BI comprises of the dimensions and the facts. The KPI’s are derived from the dimensions and the facts based on the given business scenarios.
A dimension is any entity which can be used for analyzing facts. Dimensions can be related with the master tables of the OLTP systems
Typical examples of dimensions are : Product, Location, Customers. These are the standard sets of dimensions which you may find across any Industry.
Facts are the quantities and calculated field such as Amounts, counts, quantity etc. which are to be analyzed and can be related with the transaction tables of OLTP
Typicaly examples of Facts are Sales Amount by Product, by Customer, By Year , By Region.
When developing the data warehouse model the following schemas are used to identify the relationship between the facts and the dimensions:
In Star schema, the central fact table is connected to the dimension tables surrounding the fact table in a star-like shape.
In the Snowflake schema, the dimension table is further normalized to capture the details of an entity. The location table is further normalized to detail the state data as shown below:
Thus in this first part of the series we have looked at the basics of the data warehousing, next we will look at how to go about implementing a data warehouse.