SQL Server Performance

Multi-Table Join vs. Single queries - more details

Discussion in 'T-SQL Performance Tuning for Developers' started by IdRatherBeProgramming, Sep 2, 2003.

  1. Three tables (A, B and C) have the same unique primary key.

    From an standard SQL efficiency standpoint (not necessarily Microsoft SQL Server, but across the board as in Oracle, SQL Server, DB2, Access, whatever), which is better to use?

    Multi-Table Join into single Recordset
    set Results = objConn.Execute("SELECT * FROM TableA A, TableB B, TableC C WHERE A.key = 1 AND B.key = 1 and C.key = 1")
    * this example would return 1 row only because of the join by unique primary key

    OR

    3 Single Queries into their own Recordsets
    set Results1 = objConn.Execute("SELECT * FROM TableA WHERE key = 1")
    set Results2 = objConn.Execute("SELECT * FROM TableB WHERE key = 1")
    set Results3 = objConn.Execute("SELECT * FROM TableC WHERE key = 1")
    * this example would have 1 row returned in EACH recordset

    Assumptions:
    1. Primary Key is known up front, it is something like UserID = 1, where other tables are demographic, account settings, etc.
    2. I'm not locking the rows either, this example is simply for reading data from the database across tables
    3. Table size, indexing, database parameters, etc might be a factor, so using execution plan would show skewed results based on the exact details of my test and I'm looking for more of a general principle than a specific example.
    4. I'm looking for something like "It's more efficient to do multi table joins for small table, but if the tables exceed X number of rows, the joins become inefficient, making the single queries the better option"...
    5. Unions would put one row per table in the results set and that is not the desired outcome. Desired outcome is recordsets populated with 1 row so all data is accessible at the same time.

    Also, would having to LEFT OUTER JOIN tables B & C change the answer (because they may not always be populated)?

    Thank you.
  2. gaurav_bindlish New Member

  3. Please remove that thread and make this one the primary. The responses were too vague to help and this question has all of the information right up front.
  4. gaurav_bindlish New Member

    I am locking the other thread.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  5. Assume that network connectivity is also not an issue. Assume the SQL server is being accessed by an ASP server with DS3 connectivity between the two and both servers are PIII 1000mhz quad processors.

    I am looking for an answer limited to how SQL responds to queries on primary keys in the two scenarios.
  6. Assume the ASP page looks like this:

    Section 1: Get DB data from database into 1 row recordset(s)
    Secrion 2: Dump data from the recordset(s) to screen

    The number of recordsets I pull the data into is not an issue. The number of rows per recordset does need to be 1. I want to go through section 2 and simply use the populated recordsets from section 1.

    How I'll use the data if the 1st option is better:
    Username = <%=objRS.Fields("field1")%>
    ActualName = <%=objRS.Fields("field2")%>
    AccountSetting1 = <%=objRS.Fields("field3")%>
    AccountSetting2 = <%=objRS.Fields("field4")%>

    How I'll use the data if the 2nd option is better:
    Username = <%=objRS1.Fields("field1")%>
    ActualName = <%=objRS1.Fields("field2")%>
    AccountSetting1 = <%=objRS2.Fields("field1")%>
    AccountSetting2 = <%=objRS2.Fields("field2")%>

    Hope that clears it up.
  7. bambola New Member

    There is a third option that will not invole joins and will still return you one row as you ask.

    SELECT
    (SELECT field1, field2 FROM TableA WHERE key = 1) a,(SELECT field3 FROM TableB WHERE key = 1) b,
    (SELECT field4 FROM TableC WHERE i = 1) c

    Username = <%=objRS.Fields("field1")%>
    ActualName = <%=objRS.Fields("field2")%>
    AccountSetting1 = <%=objRS.Fields("field3")%>
    AccountSetting2 = <%=objRS.Fields("field4")%>

    Of course, if table depend on each other you would want to use join. inner join will return you rows that
    exists in both (or more) tables. outer join will return (depends how you write it) row existing in one
    but not in the outer.

    So if you don't need rows that exists (or not) in those 3 tables, I wouldn't use join.

    Bambola.
  8. The third option by Bambola is valid. However, it still doesn't answer the question. It only complicates it more. I'm looking to find out which of the original options is more efficient. The 3rd option is certainly valid SQL, but how does it compare for efficiency?

    That's all I'm looking for... efficiency comparisons of SQL on my original 2 code snipets. If known, efficiency comparisons of Bambola's option compared to the original two would also be interesting, but his option seems like it would be the same as option 2 in the original, but having the database also dump all of the data into one recordset (making it a little bit less efficient than original option 2).

    Efficiency...that's what I'm looking for.
  9. gaurav_bindlish New Member

    As I said before, executing statement for three times means 3 network trips and so less performing.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  10. As I stated above, assume network traffic is not an issue. Heck, assume that I have ASP and SQL server installed on the same box and there is no network traffic.

    I can't possibly expect anyone to fully understand the particular network layout, database setup, and all other factors that could effect the answer to this question. I am only looking for what can be answered by a general forum...

    I am only asking for an answer based SOLEY on the efficiency of the SQL Query Engine.
  11. gaurav_bindlish New Member

    The query engine will have to parse, compile and execute the query three times. That's an overhead.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  12. I totally agree that there is additional overhead with accessing the query engine 3 times rather than just once.

    However, Is there more overhead accessing the query engine once with a more complicated query that hits multiple tables or is it more overhead to hit it 3 times with 3 simple queries to single tables?
  13. gaurav_bindlish New Member

    I think the overhead with 3 queries is more as the query optimizer and storage engines are the fastest components of SQL Server and so they should not be a bottleneck.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  14. Argyle New Member

    Agree. In general I would say that you should do a join if your data is related. Let the SQL engine do what it does best, return results from queries. Doesn't matter if they have joins or not. If your data in the tables are related with eachother then do a join and return the related data, instead of relating 3 different recordsets later on in some front end. But if the data isn't realted, say you just want to show X types of data (from different tables) in some front end for a specific user then X queries is what you should do. Don't join the tables if the data isn't related. <br /><br />Maybe it was mentioned before but with for example ADO you can return multiple recorsets with a single query or stored procedure and then step trough them in your code. Sometimes used in login procedures where you might check and return data from tables that aren't related to eachother in itslef but only to the specific user that is logging in etc. <br /><br />Requarding your assumtion:<br />"I'm looking for something like "It's more efficient to do multi table joins for small table, but if the tables exceed X number of rows, the joins become inefficient, making the single queries the better option"<br /><br />I don't see how single queries would be more efficient if your data is related. More efficient compared to what? Some time or later you are going to need to relate the data and that is what SQL does best. And if the data isn't related, then don't do a join [<img src='/community/emoticons/emotion-2.gif' alt=':D' />]<br /><br />/Argyle
  15. joechang New Member

    for a query that returns a single row by index seek, the network overhead, including the interupt to SQL Server, is about 5-6X higher than the index seek cost. One the same system, there is no network overhead, but there is still the interrupt to SQL Server,

    excluding the extra interupts to SQL Server, 3 separate single row queries should be slightly more efficient than 1 single row query involving joins

    efficiency rank:
    1. three-table join (2 joins)
    2. Bambola's (3 joins because of Constant Scan)
    3. three separate calls

    however, a single batch call:
    objConn.Execute("SELECT * FROM TableA WHERE key = 1;SELECT * FROM TableB WHERE key = 1;SELECT * FROM TableC WHERE key = 1;")

    has the benefit of 1) single call to SQL, 2) no joins
  16. Twan New Member

    Well said Joe!

    I'd opt for 3 selects, but either in a batch or in a stored procedure.

    advantages:
    - stored proc, reduces compile time after the first one, and reduces network traffic
    - 3 simple select will be zero cost plans (so SQL sees no benefit in caching the plan for them, i.e. it is faster to recompile these simple queries than it is to cache them...)
    - sanity of the programmer who will support the application

    In any case you'll be talking about a difference measured in perhaps ms, so unless you are talking about it being called very often and/or by loads and loads of people...

    If you want one result set, then using a stored proc which stores the result of each select in a table variable would be efficient too


    Cheers
    Twan
  17. bambola New Member

    You are right about stored procedure Twan, but stored procedure is not an option since it need to work with different databases.
    Personally I'd go with the 3 selects and not with the joins, and if one recordset is that important, I'd store fields in vars and select them as one recordset. Something like

    objConn.Execute("declare @field1 int, @field2 varchar(10), field3 varchar(20); SELECT @field1 = field1 FROM TableA WHERE key = 1;SELECT @field2 = field2 FROM TableB WHERE key = 1;SELECT @field3 = field3 FROM TableC WHERE key = 1;select @field1, @field2, @field3;")

    Bambola.
  18. Twan New Member

    ah ok, didn't read that the first time...<br /><br />Then you will need to do three separate selects or the join as was specified. SQLServer is one of only a few which allows multiple statements to be batched up at the client level... <br /><br />So then you're back to 3 times a simple select versus one time a more complicated select. <br /><br />In this case I would go for 3 times a simple select, since it is much easier to understand and it will behave predictively against all databases. <br /><br />From a straight out efficiency perspective then a single three way join (on 3 primary keys, but without any join criteria) is likely to be faster as there is only a single network roundtrip with all its associated overheads of creating results sets, etc. Although it may work in unexpected ways on different vendor databases (since it is not a normal thing to do... <img src='/community/emoticons/emotion-1.gif' alt=':)' /> )<br /><br />Cheers<br />Twan
  19. joechang New Member

    i did a test once, a stored proc with a single row index seek using the WITH RECOMPILE, this is 10X more expensive than the cache plan (2 CPU-ms versus <0.2 CPU-ms).
    Even though SQL Server can cache plans for queries, it still must parse the query. I saw once multiple table join where simply putting it in a stored proc reduced the Profiler CPU from ~150ms to 90ms.
    So whatever reason one has for not using stored procs, i would suggest re-thinking that decision.
  20. Because there is not a great decrease in performance for the multiple queries and I am not relating tables with 1 to many relationships, I will elect to use the multiple queries into multiple recordsets (option 2).

    Actually, in the specific scenario I am applying this to, there are 5 tables that need to be queried to load the necessary data, and 2 are not always populated (which would required LEFT OUTER JOINS). So, even though the query would not be incredibly difficult to write, designing a single query does not seem to be worth the headache in maintenance it would create considering there would be little or no improvement in performance compared to simply doing 5 direct queries by primary key into 5 separate recordsets.

    I am aware that this will be slightly less efficient across the network, but considering our servers and bandwidth available, this is negligable.

    Thank you for all of your input.

Share This Page