A Journey from OLTP to OLAP
In this first Part of the
series I will discuss the OLTP and OLAP and the transition from OLTP to OLAP in pursuit performance improvements as well as the
benefits of the each technology.
Note**: This article
is intended for the users who are new to the concepts of the data warehousing.
What is OLTP?
OLTP stands for the Online Transaction Processing
which is the SQL Server database engine which has been optimized for the insert
updates and deletes. Any kind of the insert, update and delete is termed a
transaction – hence the term transaction processing. While designing this
kind of a database, relationships are defined between the master and
transaction tables to analyze the data and get the proper correlation between
the various entities and attributes of the database. The tables should be
highly normalized using various normal forms to achieve the query optimization.
This relational OLTP database forms the backend of most desktop/web applications.
- Uses E-R Model for the Database design
- Real time and very detailed data analysis
- Used for simple reporting
- Better performance when the database is very small
What is OLAP?
The OLAP stands for Online Analytical Processing.
The OLAP database is highly de-normalized and is mostly used for analysis and
reporting purposes. OLAP has the capability to enable business users to generate detailed reports without having a dependence on the IT department. OLAP comes with a multi-dimensional model to analyze data using N number of dimensions and measures
which are the part of the standard Data warehouse.
- Uses the Star and Snowflake schema
for the data warehouse design
- Stores historical data for the
- Used for ‘slicing and dicing’
- Improved performance as the database
- Data can be integrated from disparate
What it BI?
Business Intelligence is the ability for an organization to take all
its capabilities and convert them into knowledge, ultimately, getting the right
information to the right people, at the right time, for
making strategic decisions.
What is Data Warehousing?
Data Warehousing is the process of
creating data marts, by identifying the dimensions and facts which will cater
to the need of the organization’s various departments and the key performance
indicators (KPI) which are useful in determining the organization’s trend in
the real time. Though there is more to Data warehousing than just
identifying the dimensions and facts and slicing and dicing of the data, the purpose
of this article series is to focus on how to built one.
The following are the properties of the
- Subject-oriented:The data in the data warehouse
is organized so that all the data elements relating to the same real-world
event or object are linked
- Time-variant:The changes to the data in the
database are tracked and recorded so that reports can be produced showing
changes over time
- Non-volatile:Data in the data
warehouse is never over-written or deleted. Once committed, the data is static,
read-only, and retained for future reporting
- Integrated:The database contains
data from most or all of an organization’s operational applications, and that
this data is made consistent
Data Warehouse Development Approaches
There two primary models for the development of a
Data Warehouse :
Enterprise Data Warehouse (EDW) – Top down Approach
The Inmon model focuses on the Big bang
approach while developing a Data warehouse. The top down approach means the
implementation of the data warehouse for an entire organization simultaneously.
The data marts are then derived from the data warehouse. The data in the Data
Warehouse should be in the 3rd Normal form.
Kimball Model: Data mart approach – Bottom
The Kimball model is based on the bottom
up approach which means the implementation of the data warehouse is by first
developing the department specific data marts and then merging them to
get one big data warehouse.
Though both the methods have their own
pros and cons, There is no one-size-fits-all strategy to data warehousing.
Elements of BI
The elements of BI comprises of the
dimensions and the facts. The KPI’s are derived from the dimensions and the
facts based on the given business scenarios.
A dimension is any entity which can be used for
analyzing facts. Dimensions can be related with the master tables of the OLTP systems
Typical examples of dimensions are : Product, Location, Customers. These are the standard sets of dimensions which you may find across any
Facts are the quantities and calculated field such
as Amounts, counts, quantity etc. which are to be analyzed and can be related with the transaction tables of
Typicaly examples of Facts are Sales Amount by Product, by
Customer, By Year , By Region.
When developing the data warehouse
model the following schemas are used to identify the relationship between the
facts and the dimensions:
In Star schema, the central fact table is connected to the dimension tables surrounding
the fact table in a star-like shape.
Snowflake schema, the dimension table is further normalized to capture the
details of an entity. The location table is further normalized to detail the
state data as shown below:
Thus in this
first part of the series we have looked at the basics of the data warehousing, next we will look at how to go about
implementing a data warehouse.