SQL Server Performance

  • Home
  • Articles
  • Forums
  • Tips
  • Training
  • FAQ's
  • Blogs
  • Software
  • Books
  • About Us
RSS Feeds
Sign in | Join


Article Topics

All Articles
Performance Tuning
Audit
Business Intelligence
Clustering
Reporting Services
SQL Azure
Developer
General DBA
ASP.NET / ADO.NET
SQL Azure

USEFUL SITES :

ASP.NET Tutorials
Windows and SQL Azure Tutorials
Cloud Hosting Magazine
SharePoint Tutorials
Windows Server Help

Write for Us

Share your SQL Server knowledge with others and raise your profile in the community More...
Latest Articles

A High Level Comparison Between Oracle and SQL Server - Part ...
A High Level Comparison Between Oracle and SQL Server - Part ...
A High Level Comparison Between Oracle and SQL Server - Part ...
A High Level Comparison Between Oracle and SQL Server

More     
 
Latest FAQ's

Add Node to A SQL Server failover Cluster failed with invalid ...
SQL Server Destination remote server error
Setting Up Data And Log Files For SQL Server
Will Check Constraints Improve Database Performance?

More     
   
Latest Software Reviews

dbForge Review
Spotlight on ApexSQL Diff - Server-based database comparison tool ...
Spotlight on ApexSQL Data Diff - Server-based database comparison tool ...
Spotlight on ApexSQL Doc 2008

More     

articles >> clustering >> Achieving Massive Scalability with SQL Server ...

Achieving Massive Scalability with SQL Server

By : Doug Kerwin
May 10, 2003

With each release, Microsoft SQL Server continues to gain in performance and scalability. A little known fact buried under Oracle marketing is that SQL Server’s TPC-C score is now even higher than Oracle’s. However, few other than Microsoft have millions of dollars of hardware needed to generate the 709,220 transactions per minute achieved in the published TPC-C test result. So for the rest of us, with down to earth budgets for database licenses and hardware, how can we achieve scalability for extremely high volume database driven applications?

Database scalability is a long thought over and debated topic, and there are no easy answers. There are several ways to achieve more scalability out of your database, however the architecture you choose will greatly depend on your unique needs.

When it comes to scalability, there are two high level ways to achieve it –- scaling up or scaling out. Scaling up is to handle volume on a single high performance server, while scaling out is to distribute load among many low cost servers. When scaling up you add processors to your server, when scaling out, you add servers to your cluster.

 

Scale Up

Scaling up is the easiest fix, where throwing money at the problem will solve it for most, as long as you are willing to scale up your budget. Ultimately, there are levels of volume which even a 32 processor Windows Data Center Server can’t satisfy, and as XML/SOAP Web services usage grows, this kind of super-enterprise scalability will push the most powerful servers to their limits and demand alternate ways of achieving scalability. The following techniques focus on scaling out instead of up.

 

Microsoft Cluster Services

Before discussing any of the real approaches to scaling out with SQL Server, let’s bring light to a common misunderstanding. Microsoft Cluster Service (MSCS) does not do anything to help performance or scalability. Despite its misleading name, it does not enable you to scale out or distribute traffic. Instead, if one server fails, MSCS will fail over to a backup server, which will resume processing –- eventually. In addition to not adding any benefit in performance or scalability, MSCS can take an embarrassing 1 to 5 minutes to transfer its resources to the backup server and complete the failover process, during which time the database is down.

 

Federated Database Design

SQL Server Enterprise Edition comes with a feature called Distributed Partitioned Views (DPV), which enables you to create a federated database, to divide processing of queries between multiple separate database servers. With DPV, the database is partitioned into many member tables, storing each of those tables on separate database servers, and then creating a partitioned view over the set of member tables. DPV can be a useful way to gain scalability, as it is built into SQL Server, and client applications do not need to be aware of the fact that the database has been federated.

However, federated database design (which DPV is based on) has a few drawbacks which may prevent it from truly solving your massive scalability problem. The first flaw in this design is known as the ‘convoy effect’. Because multiple machines are involved in the processing of one query, the complete results of that query cannot be assembled until the slowest database in the federation has completed processing its portion of the view. This means that if you use several quad processor servers and one dual processor server in your federation, your queries may only execute at the speed of the dual processor. It also means that even if identical hardware is used on all servers in the federation, the server with the most popular tables (highest data access volume) may have more load than other servers in the federation, and slow response time while other servers in the federation have CPU cycles to spare.

The other big issue with federated databases is availability. Because the database’s data is physically split between servers in the federation, if any one database server goes down, the entire database system becomes unavailable, as it is no longer able to access part of the data needed to complete processing of queries. To aid this situation, Microsoft Clustering Services can be used on top of DPV, however this is largely cost prohibitive. Both DPV and failover capability via Microsoft Clustering Services are only available with the Enterprise Edition of SQL Server (four times more costly than Standard Edition). A backup database node would be needed for each database server in the federation, doubling the cost of hardware and database licenses, and doing nothing for performance or scalability, as the backup servers do not share in the workload, but sit idle until a failure occurs. That’s not a lot of value for your money.

There are as well, a few development pitfalls to be aware of when considering DPV. The biggest issue is that integrity constraints are not allowed. Integrity constraints can not be implemented because the data is split between separate machines. Writing complex triggers may alleviate some of this problem but adds other problems and additional maintenance of its own.

 

Continuous Replication

A clever way to achieve scalability when certain parts of a client or server application only require read access to the database, is to point those applications to a copy of the database on a separate database server. This can divert volume from the primary database, and can be achieved by using continuous transactional replication or log shipping. The power in this approach is that it is relatively easy to configure, and it allows you a quick solution to load balance your database read access.

The obvious drawback of this approach is that it does not use live data. It must be acceptable for clients accessing this database to use slightly delayed data. Depending on the level of volume, the amount of data which has changed, and the number of servers you are replicating to, replication delays may range from as low as 1 minute to 15 minutes or more. This may be suitable in cases where backend server applications perform complex queries or reporting on the database, and do not require up to the minute data. For example, if you are developing a calendar application, modifications to your calendar will be made to the master database. The application which users interact with to view and modify their calendar must access the master database, so that when a new entry is added, the subsequent calendar view will display the newly added entry. However a background process which calculates unused time segments for all calendar users, could access a replicated copy of the database, since this process does not necessarily require real-time data.

Using replicated database copies is not suitable when you need access to live dynamic data. Let’s say the calendar application above has a web service interface, in which multiple applications can interact with. Perhaps a human resources application which can mark planned vacation days on your calendar, and a conference call application, which can mark your calendar with a reminder about a scheduled conference call, both need update access your calendar. When multiple applications need transactional read and write access to the same data store, they must talk to the master database.

If the human resources application were to check for availability on the replicated copy, it might think the time slot is available when in fact a previous update to the data had not made its way to the replicated copy yet. As applications increasingly interoperate, live access to a single massively scalable data source becomes more and more important.


    Next Page>>    








C# Help and Tutorials | PHP MySQL Tutorial | Sharepoint Tutorial | Azure Tutorial | Cloud Hosting Magazine | ASP.NET Tutorials | Windows Server Help | Windows Phone Pro | Silverlight Ace | Visual Studio Tutorials | Home | Peformance Articles | Audit Articles | Business Intelligence Articles | Clustering Articles | Developer Articles | Reporting Services Articles | DBA Articles | ASP.NET / ADO.NET Articles | SQL Server Training Videos | DBA FAQ's | Developer Peformance FAQ's | DBA Peformance FAQ's | Developer FAQ's | Clustering FAQ's | Error Messages | Audit Tool Reviews | Backup Tool Reviews | Coding Tool Reviews | Compare Tool Reviews | Documentation Tool Reviews | Design Tool Reviews | Monitoring Tool Reviews | Log Tool Reviews | Reporting Tool Reviews | Clustering Tool Reviews | Security Tool Reviews | Change Management Tool Reviews | Remote Access Tool Reviews | Book Reviews | Security Tool Reviews | ADO.NET / ASP.NET | Administration | Analysis/OLAP Services | Application Development | Configuration | Components | ETL | Hardware | High Availability | Hints | Index | Misc | Operating Systems | Performance Tuning | Replication | T-SQL | Views


              © 2010 Jude O'Kelly. All rights reserved