Concatenating and casting in SQL MGT Studio 2005… | SQL Server Performance Forums
SQL Server Performance Forum – Threads Archive
Is it me, or do query speeds tend to slow wayyyy down when casting and concatenating are involved?
In my job (I'm a data analyst, not a programmer), I have to create unique "keys" quite often.
I find that the following types of queries really slow me down:
where cast(x as varchar) + cast(y as varchar) not in
(cast(x as varchar) + cast(y as varchar))…
Do you guys notice this as well? I can post my entire query, but I don't think it's necessary.
Any help is appreciated!
BTW, if any SQL experts are in the NE area and looking for work, please call my cell phone: 781-775-6393. The job's based in RI. We are in dire need of a SQL expert and we pay very well.
Concatenation and casting are killing your performance because they are used within your filter criteria, where it is always better to compare raw column values. The database engine performs best when it can use indexes and statistics; as soon as a criterion has to manipulate column data with functions, those indexes and statistics become useless for that predicate.
Given your example, you can use a LEFT JOIN with an IS NULL check, like this:
SELECT t1.x, t1.y
FROM t1 LEFT JOIN t2 ON t1.x = t2.x AND t1.y = t2.y
WHERE t2.x IS NULL
Alternatively, you could use a correlated subquery in a NOT EXISTS clause, like this:
SELECT t1.x, t1.y
FROM t1
WHERE NOT EXISTS (SELECT t2.* FROM t2 WHERE t2.x = t1.x AND t2.y = t1.y)
… but NOT EXISTS can be a poor performer – a lot will depend on the actual data (matches vs. non-matches).
In any case, since we know nothing about the actual table design (especially indexes) it is hard to predict if this approach gives you all the improvement that may yet be attainable.
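To make the point about filter criteria concrete, here is a sketch using the t1/t2 names from the examples above (table, column, and index names are illustrative):

```sql
-- Non-sargable: wrapping columns in CAST/concatenation forces the engine
-- to compute the expression for every row, so an index on (x, y) is useless.
SELECT t1.x, t1.y
FROM t1
WHERE CAST(t1.x AS varchar(30)) + CAST(t1.y AS varchar(30))
      NOT IN (SELECT CAST(t2.x AS varchar(30)) + CAST(t2.y AS varchar(30)) FROM t2);

-- Sargable: comparing the raw columns lets the optimizer seek an index on (x, y).
SELECT t1.x, t1.y
FROM t1
WHERE NOT EXISTS (SELECT * FROM t2 WHERE t2.x = t1.x AND t2.y = t1.y);
```

As a side note, CAST(... AS varchar) with no length defaults to varchar(30) in SQL Server, which can silently truncate longer values – one more reason to avoid the concatenated key.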
Thanks Adrian, very useful. My databases usually contain between 2,000,000 and 40,000,000 rows, depending on the size of my client, and about 100 columns, so speed can be an issue.

Please see the query below. I didn't include the loop because I don't think it's necessary. I'm trying to create a targeted sample. The point of the filter (the "not in" portion below) is to ensure that my loop selects a sample that represents my entire population. As you can probably tell, the query looks within the table that it is appending into, to ensure that it doesn't keep selecting the same drug from the same month.

From your suggestion, I gather that I may have benefited from actually creating this "key" field in my specialtyclaims table and indexing it. The only problem is that I would have to write an update query to populate the key field. Please let me know if you agree that the following steps would have helped:
A) Index dateservice
B) Create month field (month(dateservice)) and populate
C) delete dateservice index
D) create new index on month and apcdrugname
E) Create key field using month and apcdrugname
F) index key field
G) change the filter portion of the query below to "…and key not in (select key from specialtysample)"
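If you do go the indexed-column route, SQL Server 2005 lets you skip the manual update query in steps B and E by using a persisted computed column. A sketch, with column names taken from your query and made-up names for the new column and index:

```sql
-- Steps B and E without a separate update query: SQL Server computes and
-- maintains the month value automatically for every row.
ALTER TABLE specialtyclaims
    ADD claim_month AS MONTH(dateservice) PERSISTED;

-- Step D: index the pair directly rather than a concatenated key.
CREATE INDEX IX_specialtyclaims_month_drug
    ON specialtyclaims (claim_month, apcdrugname);
```

Here claim_month and IX_specialtyclaims_month_drug are hypothetical names; adjust to your own conventions.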
insert into sample
select top 1 *,
    month(dateservice) as 'month',
    1 - (ic/awptotal) as 'disct',
    cast(apcdrugname as varchar) + cast(month(dateservice) as varchar) as 'key'
from specialtyclaims
where apcspecialtydrug in ('l','s')
and awptotal > 0
and cast(apcdrugname as varchar) + cast(month(dateservice) as varchar)
    not in (select [key] from specialtysample)
and (drugname like 'NUTROPIN%'
or apcdrugname like '…') -- long list of drugs continues
order by 1 - (ic/awptotal), ic desc
A concatenated key field adds redundant information, which you don't want to do for a multitude of reasons; just use either of the filter methods that I've shown you.
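Applied to your query, the NOT EXISTS form might look like the sketch below. This assumes specialtysample stores (or is altered to store) the drug name and month as separate columns rather than a single concatenated [key]; the column name claim_month is hypothetical:

```sql
-- Same filter as the concatenated-key version, but the sample table is
-- probed on raw columns, so an index on (apcdrugname, claim_month) can be used.
select top 1 c.*
from specialtyclaims c
where c.apcspecialtydrug in ('l', 's')
  and c.awptotal > 0
  and not exists (select *
                  from specialtysample s
                  where s.apcdrugname = c.apcdrugname
                    and s.claim_month = month(c.dateservice))
order by 1 - (c.ic / c.awptotal), c.ic desc
```

Note that month(c.dateservice) here is applied to the outer row, not to the sampled table's indexed columns, so it does not block an index seek on specialtysample.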
I’d be interested to know if item D improves performance – i.e. an index on (dateservice, apcdrugname).
Also, doing a SELECT * statement on a table of 100 columns is bad news, unless you absolutely must retrieve all columns from the source table.
This also depends on the clustered index columns; you can see how the optimizer is treating the index from the execution plan.
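In Management Studio you can check this with Ctrl+L (Display Estimated Execution Plan), or in a query window with SHOWPLAN, for example:

```sql
SET SHOWPLAN_TEXT ON;
GO
-- The statement is not executed; SQL Server returns its plan instead.
SELECT t1.x, t1.y
FROM t1
WHERE NOT EXISTS (SELECT * FROM t2 WHERE t2.x = t1.x AND t2.y = t1.y);
GO
SET SHOWPLAN_TEXT OFF;
GO
-- Look for Index Seek vs. Index Scan / Table Scan operators in the output.
```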