Hi everybody, I have a DTS package that runs every morning to import data to SQL 2000 tables from Sybase. I'm having problem with one of the package from this friday. The scenario is as follows: The source for the DTS package is a view. The view does the select from the only one table of around 34 million rows and also has a join with the SELECT from same table using AS. It usually took around 2 hours. However, suddenly the job neither succeeded nor failed on this friday. I explicitly executed DTS package and found out that it get stuck on same data flow step which SELECT view without throwing any error. I would appreciate if anybody can throw some light on this issue. Thanks, Rusty
This could be a blocking/locking issue on SYBASE whereby the resources are waiting to clearup in order to import rows? Check on SYBASE side and SQL Server side too for such indications.
To add to what satya mentioned, since this is happening on a self-join, perhaps the table does not have a primary key, or no index on the join column(s), or no clustered index.
Thanks Satya and Adriaan, I really appreciate you guys precious time to help me out. Let me make myself more clear. Actually, I was just explaining about the process. However, the problem is in SQL 2000 since this occurs after everything is loaded on staging table. I checked for the blocking using SP_WHo2 'active' and didn't see any blocking. Adriaan is right the view does SELECT from TableA which is self joined. It has around 32 million rows and Its a heap table. There are some statistics though. I also updated statistisc with FULLScan. It succeeded but I'm still having problem with the SELECT. Thanks in Advance for your help! The script for the view is as follows: SET QUOTED_IDENTIFIER ON GO SET ANSI_NULLS ON GO ALTER VIEW dbo.X AS SELECT TOP 100 PERCENT * FROM (SELECT A.DIMENSION_NAME,A.DIMENSION_VALUE, CONVERT(datetime, CONVERT(char(2), DATEPART(mm, A.CLM_PD_DT)) + '/01/' + CONVERT(char(4), DATEPART(yyyy, A.CLM_PD_DT))) AS CLM_PD_DT, CONVERT(datetime, CONVERT(char(2), DATEPART(mm, B.CLM_PD_DT)) + '/01/' + CONVERT(char(4), DATEPART(yyyy, B.CLM_PD_DT))) AS PREV_CLM_PD_DT, A.RUN_DATE, B.RUN_DATE AS PREV_RUN_DATE, A.SUM_PAID, A.REC_COUNT, A.RUN_COUNT, B.SUM_PAID AS PREV_SUM_PAID, B.REC_COUNT AS PREV_REC_COUNT, A.SUM_PAID - B.SUM_PAID AS PAID_CHANGE, A.REC_COUNT - B.REC_COUNT AS REC_CHANGE FROM TableA A LEFT OUTER JOIN TableA B ON A.CLM_PD_DT = B.CLM_PD_DT AND A.RUN_COUNT = B.RUN_COUNT + 1 AND A.DIMENSION_NAME = B.DIMENSION_NAME AND A.DIMENSION_VALUE = B.DIMENSION_VALUE) tmp WHERE (RUN_COUNT <> '1') AND (CLM_PD_DT NOT IN (SELECT DISTINCT TOP 5 CONVERT(datetime, CONVERT(char(2), DATEPART(mm, CLM_PD_DT)) + '/01/' + CONVERT(char(4), DATEPART(yyyy, CLM_PD_DT))) FROM TableA ORDER BY CONVERT(datetime, CONVERT(char(2), DATEPART(mm, CLM_PD_DT)) + '/01/' + CONVERT(char(4), DATEPART(yyyy, CLM_PD_DT))) DESC )) ORDER BY CLM_PD_DT, RUN_DATE DESC GO SET QUOTED_IDENTIFIER OFF GO SET ANSI_NULLS ON GO
Add an identity column to your heap table, and make that the primary key. With 32M rows, I'd also add some indexes on columns used for filtering. You'll be amazed.
And by the way, with this much processing going on in the view, do not expect it to perform well. If you need to filter out results from this view, you would be better off re-writing this as a stored procedure, adding the filter criteria as part of the query statement.
Thanks Adriann for your precious time! Last time I had this issue I did update statistics and it worked. I'm not sure we have any unique column. So, I'm planning to create non-clustered index on one of the column. I'll update on this post. In the meantime, if anybody has been through this and any brilliant ideas, please keep me posted!~ Thanks, Rusty
The whole idea of adding an identity column is so that your heap table will have a unique column, that you can use as a PK. The values on the identity column will be meaningless, but a PK is just one of the things that SQL needs in order to perform well.
When you are an index just keep in mind that if that is a non-clustered then make sure to include frequently used columns within the WHERE & JOIN clauses of your query, for clustered it is your own decision to make it the order you want.