Conceptual question about "ETL" | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Conceptual question about "ETL"

Hi, A simple question but I just want to clarify my understanding: E = Extract
T = Transform
L = Load
I clearly understand the "E" part which simply means extracting data from the data sources e.g. your OLTP systems or whatever into your Data Mart/ Data Warehouse. However I am now fully clear as to what do the "T" and the "L" really represent. Does the "T" (Transform) represent the processing that we do in our Datamart/DW (which some also refer to as the ‘Staging Area’) or ??? Can you please help be conceptually be clear about this? Does the "L" (Load) part represent the process of updating our dimensions and Cubes?
I’ll appreciate your reply. Thanks
Load = insert into mart tables normally. Cube processing I don’t think is part of the concept, but it’s a general term for tools which do things like DTS does. And of cube processing can be scheduled/invoked by a step in a DTS package, and DTS is an E.T.L. tool. T = transformation is only done IF REQUIRED, you may simply be doing a bulk insert from one database to another – if that’s all you need, there’s no need for the ‘T’. E.g. in our load process (when it works!), the transformation parts including the business logic for loading into the Mart tables is all done in SQL Stored procedures, not in DTS, where transformations (esp. active-X steps), can slow things down. we use worktables and temporary tables during our transform section of our load. There are rival ETL products to DTS, but to be honest I don’t think their use is very widespread, because of course you have to PAY for them, whereas if you have an SQL Server, you have DTS for no extra (and in fact there are some DTS bits you can do without even having an SQL Server..)
Tom Pullen
DBA, Oxfam GB
Tom,<br /><br />Thanks for the reply. I’m not so sure if i totally follow the explanation about the "T" part but will go through the reply again…<br /><br />BTW, you have given me an excellent tip regarding how to imporve the speed of my Loading process. I will try to learn and then use Stored Procedures instead of my current approach (DTS) to load the data from the OLTP into the datamart and see how it works. <b>Any suggestions/tips regarding how to start in this direction? </b><br /><br />So many thanks for the reply and especially for the tip above<br /><br />P.S.<br /><br />1) Why do you say "in our load process (when it works!)"<br /><br />2) I am still curious as to why none of us made use of the break that we were loking forward to i.e. the weekend <img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ /> just joking
Within DTS, you can "transform" data in-process as it is being moved from one place to another. This is helpful if, for instance, you column mappings aren’t 1 to 1, or you want to perform manipulation of data (e.g. conditionally change it, perform string manipulation, calculations etc etc etc). In my process the "transformation" I guess I was referring to was the process of turning RAW OLTP data into star-schema Data Warehouse data in the Mart SQL Server tables. This is the process I mean which uses sprocs, worktable and temp tables to morph the raw OLTP data (which is present in a staging database on the Mart server) into the shape it needs to be in the Mart. P.S. 1. My data warehouse is the one which is caused my "bad day at the office", hence the comment about "when it works"! 2. We’re so dedicated, joohz, some of us answer stuff from home at the weekend!!!
Tom Pullen
DBA, Oxfam GB
Tom appreciate your reply and the dedication (really). Working from homw…. tell me about it <img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ /><br /><br />BTW, Tried but could not find your artile/post about "Bad day at office"… can you help me find. Thanks.
It’s here, Joozh, over in the General DBA questions section. http://www.sql-server-performance.com/forum/topic.asp?TOPIC_ID=6559 Tom Pullen
DBA, Oxfam GB
]]>