I've been under the impression that joins are costly and should be kept to a minimum. Is this true?

Question

I have a “main” table named Property. This table has approximately 60 columns and over 1 million records (this record count will increase significantly over time). Of these 60 columns, approximately 25 are IDs that reference values in various lookup tables. In any given view, query, or stored procedure, I end up joining 12-25 tables in an effort to return the appropriate data.

I’ve been under the impression that joins are costly and should be kept to a minimum. Is this true? I have tried several different ways of retrieving this information (e.g. user-defined functions, table data types), but nothing is as efficient as joins. I’d like to improve performance if possible. Is there a better way?

Answer

Yes, joins can be costly, but they cannot be avoided in a relational database. As a general goal, I try to avoid queries that join more than 4 tables, although this is not always possible. If you have to join 12-25 tables, it seems to me (and this is just a guess, as I don’t know the database) that the database’s design is less than optimal.

Most likely you cannot change the design, but if you can, that would be a great first step toward reducing the number of tables in your joins. One of the easier ways to change the design is to selectively denormalize the tables so that fewer joins are required. This would require changes to some tables, and perhaps to any applications that access them, so again, it may not be possible.

Another option is to create new denormalized tables that are kept up to date either by triggers (if you need real-time updating) or by DTS (if real time is not needed). This way, you can query one or more denormalized tables for the data. Of course, if the tables are huge, like yours, these options could present performance problems of their own.

Another potential option is to create a data warehouse or cube to summarize the data. This may reduce the amount of data stored (because much of it is summarized), but it would not be real time, although it could be close with regular updates.

Of course, none of the above options may work for you. If that is the case, then joins, with appropriate indexes on the joined columns, are your best bet. Other options, such as temp tables and sub-selects, probably won’t be as fast as a typical join.
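As a rough illustration of two of the options in the answer (an index on the joined column, and a denormalized table kept current by a trigger), here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the trigger syntax differs from T-SQL. The PropertyType lookup table, the PropertyFlat table, the Address column, and the index and trigger names are all hypothetical; only the Property table name comes from the question.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Hypothetical lookup table referenced by an ID column in Property.
CREATE TABLE PropertyType (
    PropertyTypeID INTEGER PRIMARY KEY,
    Name           TEXT NOT NULL
);

-- Heavily simplified version of the "main" table from the question.
CREATE TABLE Property (
    PropertyID     INTEGER PRIMARY KEY,
    Address        TEXT NOT NULL,
    PropertyTypeID INTEGER NOT NULL REFERENCES PropertyType (PropertyTypeID)
);

-- Index on the foreign-key column that the join uses.
CREATE INDEX IX_Property_PropertyTypeID ON Property (PropertyTypeID);

-- Hypothetical denormalized table: the lookup value is stored inline,
-- so reading it requires no join.
CREATE TABLE PropertyFlat (
    PropertyID       INTEGER PRIMARY KEY,
    Address          TEXT NOT NULL,
    PropertyTypeName TEXT NOT NULL
);

-- Trigger that copies each new Property row into PropertyFlat.
CREATE TRIGGER trg_Property_Insert AFTER INSERT ON Property
BEGIN
    INSERT INTO PropertyFlat (PropertyID, Address, PropertyTypeName)
    SELECT NEW.PropertyID, NEW.Address, t.Name
    FROM PropertyType AS t
    WHERE t.PropertyTypeID = NEW.PropertyTypeID;
END;
""")

cur.executemany(
    "INSERT INTO PropertyType (PropertyTypeID, Name) VALUES (?, ?)",
    [(1, "House"), (2, "Condo")],
)
cur.executemany(
    "INSERT INTO Property (PropertyID, Address, PropertyTypeID) VALUES (?, ?, ?)",
    [(1, "12 Oak St", 1), (2, "5 Elm Ave", 2)],
)
conn.commit()

# Normalized read: join Property to its lookup table.
joined = cur.execute("""
    SELECT p.PropertyID, p.Address, t.Name
    FROM Property AS p
    JOIN PropertyType AS t ON t.PropertyTypeID = p.PropertyTypeID
    ORDER BY p.PropertyID
""").fetchall()

# Denormalized read: same data from a single table, no join.
flat = cur.execute("""
    SELECT PropertyID, Address, PropertyTypeName
    FROM PropertyFlat
    ORDER BY PropertyID
""").fetchall()

print(joined)
print(flat)
```

Both queries return the same rows. The sketch handles inserts only; a real design would also need triggers (or a scheduled refresh) for updates and deletes on Property and for changes to the lookup values, which is where the extra write cost on large tables comes from.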
