SQL Server Performance Forum – Threads Archive
JOIN Performace Vs. Multiple ResultSetsCan anyone give me some guidelines as to when to chose JOINS over returning multiple resultsets in a strored procedure.. For eample, I have two tables, Orders and OrderDetails, which are linked by a primary key field. There can be orders w/o a corresponding record in orderdetails. 1.) I can return all orders and their details using a stored preocedure that has: SELECT o.order_id as OrderId, o.customername, od.order_id, od.orderdate FROM orders AS o LEFT OUTER JOIN orderdetails AS od ON (o.order_id=od.order_id) 2.) I can do the same by returning two results sets in a different stored procedure: SELECT order_id, customername FROM orders SELECT order_id, orderdate FROM orderdetails I think the client processing time for the second option will be slightly less, because the resultset I need to filter will only be as big as the orederdetails table (it will not include records for orders that have no details). Regardless, I think this would only make for a small performance gain, so if Option 1 is better in Database performace, I would probably go with that. I assume the method to choose also depends on table size and # of JOINS. Any guidance would be appreciated. Thanks, Al
usually, 1 is better than 2. The overhead of joining and maintain separate stored procedure already outweighs any issue you have with first statement. You can add the WHERE if you need to (ie. od.order_id IS NULL). If you don’t need the records without order details, then you should use the INNER JOIN instead of LEFT JOIN. Or you can just query the ORDER table. May the Almighty God bless us all!
RDBMS like SQL Server are highly optimised for JOIN operations. In fact, one might think that the ability to JOIN and relate one table to one or more other(s) is at the core of the relational model and therefore relational database systems. So generally I think you should do such processing at the database level. Actually I also think there is one thing that needs to be carefully evaluated if it has to be done at the database and that is sorting with an ORDER BY. This might be most of the time better done at the client. —
Microsoft SQL Server MVP
Heute schon gebloggt?http://www.insidesql.de/blogs
Thanks for your advice. I just wanted to make myself clear about one thing. Option 2 returns 2 result sets in a single stored procedure (the .NET dataset will have 2 tables in it), so there is no more overhead than there is in option 1. And the processing time is close as well, because even if I return a single resultset with the join, I still have to loop through the records and filter the entire resultset by id during each iteration because the parent and child records will be in the same set. Unless there is a way to get the parent and child records all at once, the processing of a Joined resultset won’t be any easier.
A recordset is the result of a query. A query can take data from one table – which is what you are doing two times. A query can also take data from two tables at the same time – this is really lesson 1 in query basics.<br /><br />You might find it useful to treat yourself to some database-oriented training. It will greatly enhance your skills as a programmer.[<img src=’/community/emoticons/emotion-1.gif’ alt=’‘ />]
Thanks for the suggestion, Adrianne, but I am not actually a novice developer, though I am no expert in database stuff. My post to this forum was meant to get some insight into the relative performance characteristics of two different ways of accomplishing a task. You are right about being able to take data from 2 tables all at once, but if it takes more time to do that, than it does to take the data separately from two tables (again this is done through a single stored procedure…not with two separate calls), wouldn’t that be preferred? I am conserned about the performance of JOIN operations. Perhaps my trivial example wouldn’t warrant much consern, but if I need to do multiple nested joins, then I would like to know if a JOIN is still the prefereed method over multiple resultsets. For instance….in a JOIN operation, the db has to scan the entire right hand table in order to match only those records that coinside with the ON clause parameters. Indexing helps this, but it is non-trivial for very large tables. Conversely, if I just grab all the records in that table, this is a much quicker operation. Now I understand that rdms’ are optimized for JOIN operations, but there has to be a point, determined by number of joins or table size, at which the JOIN operation doesn’t perform as well as returning multiple resultsets. What that point is, is what I am trying to glean from the expertise on this board. Thanks for your reply, though.
I wasn’t <i>questioning</i> your skills as a developer, but your reply again suggests that you will benefit from learning about databases. They’re not scary monsters that you must keep at arm’s length – there’s just a different flow to the whole programming thing. And how many programming projects do you see that do not access at least one database?<br /><br />Don’t assume that retrieving data from two tables will cost more time than separately retrieving data from both tables – it really is quite the other way around.<br /><br />Buy a good book on SQL Server, and increase your market value.[<img src=’/community/emoticons/emotion-1.gif’ alt=’‘ />]
So am I to understand that JOIN will always be faster than retrieving the data in multiple resultsets no matter how nested the JOIN is or how large the tables are? This doesn’t seem to make sense considering I am only calling the db once in both cases, so that overhead is the same.
It’s not so much the overhead, which is minimal, it’s the waste of time, which is avoidable.
Plus your client app will have to be programmed to make the associations that SQL Server would have made for you if you had used a JOIN.
The scenario you posted usually doesn’t make sense unless database is really small, in that case you can use whatever approach you want. The problem is that is not really good idea to return all rows from both tables. Usually you put some criteria that narrows the data set returned. In that case you don’t have a choice but to join tables, because that way you won’t have to access rows you don’t need.
One last question… if anyone is a .NET developer, it was mentioned in this thread that data returned by a JOIN has some inherent relational associations. Is there a way (through the dataset object assumably) to get access to these relations without having to filter the entire result set on the primary key. For instance, I have the two tables in my original example. To get access to the data now, I loop through each record, grab the primary key, create a new dataview object from the reults and filter it on the primary key. That way I get the parent/child relationship. If there were a way to access this directly through the dataset w/o doing the filtering, it would speed up processing time on the client quite a bit.
I’m not sure I understand you. Are you looking for a way to present some kind of hierarchy within your data? If so, google for Trees and hierarchies and SQL:http://www.google.de/search?hl=de&q=trees hierarchies sql&btnG=Suche&meta=lr
Microsoft SQL Server MVP
Heute schon gebloggt?http://www.insidesql.de/blogs
You should check the .Net 101 sample, Data Access – Build a Master Detail Window Form. It will give you an idea how to populate the dataset. You will see in the sample that command text used is the join of product/categories/order detail in the data adapter object. Once you have defined the relationship in the dataset, yes, you can access the data relation collection (dataset.Relations). May the Almighty God bless us all!