SQL Server Performance

The curse of NTEXT

Discussion in 'ALL SQL SERVER QUESTIONS' started by Iain Cowden, Nov 8, 2013.

  1. Iain Cowden New Member

    Dear All,

    As part of the Transform stage in our DWH I have to create a clinical notes table. We are sent two data extracts which store the actual note content in ntext columns. The table containing the data has about 5,000,000 rows currently.

    In our existing ETL, the data extracts are stored on one server and we query across linked servers to compile a clinical notes table from the two extracts, inserting note text for any notes which are dated within the last 180 days. i then convert the created ntext column to varchar(max) in order to be able to provide it to a data view in use by a front end portal.

    This takes about 25 minutes.

    We are now revamping the ETL and have a new server which hosts both the data extracts and the transform database. However, rather than a performance improvement from having both the extracts and creation on the same server, I am getting a performance degradation. The same query takes about 50 minutes to run.

    I reviewed the execution plan and the key difference seems to be that on the linked servers, server A has to organise everything before it gets data from Server B. Whereas on the single server C, the execution plan is much more sprawling.

    I have tried rewriting the query to create an index table without the text data (takes a couple of minutes), and then creating a new table based on the index but joined to obtain the ntext columns, and this has improved speed on the single server to around 35 mins, but is longer on the linked server (around 50 minutes). I originally thought about an update, but of course you can't use ntext in an update statement.

    Has anyone got any suggestions as to how I can optimise this query?

    Thanks

    Iain (code samples are attached).

    Attached Files:

  2. davidfarr Member

    I would agree that an 'index table' to stage 'Rank 1' records in Query1.txt is a smart idea, because I think the large 'PN3' table created from the nested query in Query2.txt is an unecessary overhead.

    I have attached a (Query3.txt) which applies a few suggestions as follows;

    Your 'index table' may perform better if it is indexed after creation. Perhaps you already do this but it was not indicated in Query1.txt.

    Instead of using UNION to collate large amounts of data into a single transaction, it may be faster overall to break them down into smaller transaction sections. In my attached suggestion, I have replaced some of the UNIONs with individual transactions.

    You should avoid using data types and conversions that are larger than they need to be; values of 1 and 2 for RioInstance, and values of 0 and 1 for NoteStatus do not need to be INT types, they can be TINYINT and BIT respectively. CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, @DateValue))) may be faster if done as CONVERT(DATE, @DateValue). Smaller data types execute slightly faster, especially on large recordsets.

    Attached Files:

  3. Iain Cowden New Member

    Thanks David,

    That little bit of jiggling seems to have knocked about 10 minutes off, and brought the timing down to around 25 minutes, which is in line with the original time it took to run the original query across linked servers. Thanks very much for that.

    A shame it just won't go any faster!!

    Iain

Share This Page