Implementing SQL Server in an OLTP Environment

Entity Design

The creation and design of tables (entities) and their fields (attributes) should be a process driven by meetings with the business user. The business support for the application will be vital to the correct design of tables and relationships in the OLTP data model. At each stage of logical modeling, the business user should be given the opportunity to verify and correct the relationships and table structures presented by the database developers.

Designing entities in SQL Server requires a database developer to have a full understanding of data types and the ramifications of their use. Specific data types can be used in a variety of situations. However, not all data types fit a design situation as readily as others.

Data Type Specifics

VARCHAR vs CHAR

  • A varchar is a variable-length data type that holds character data. It differs from a char because a char is a fixed-length data type. In other words, a varchar(30) holds up to 30 characters, but if a 10-character value is stored in the field, SQL Server uses only about 10 characters of storage (plus a small length overhead). A char(30) containing 10 characters still occupies 30 characters of storage, since the value is padded to the full column width.

  • Varchar should be used when the length of character data in a field is expected to vary. A char data type should be used when consistently expecting data to be the same length.

  • Nvarchar and Nchar data types support Unicode data (international character data) and require twice the space of their Non-Unicode counterparts. These data types should only be used when anticipating Unicode data.
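The choices above can be sketched in a table definition. This is a minimal example against a hypothetical Customer table; the table and column names are illustrative only:

```sql
-- Hypothetical table illustrating character data type choices
CREATE TABLE Customer
(
    CustomerID  int          NOT NULL,
    StateCode   char(2)      NOT NULL,  -- always exactly 2 characters: char fits
    LastName    varchar(30)  NOT NULL,  -- length varies: varchar saves space
    LocalName   nvarchar(30) NULL       -- Unicode, only where international data is expected
)
```

Note that nvarchar(30) reserves up to 60 bytes (two bytes per character), so it is declared only on the column expected to hold international character data.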

INT vs DECIMAL (NUMERIC)

  • The int data type accepts only whole-number numeric data. The decimal data type accepts numeric data with a definable precision and scale. The numeric data type is functionally equivalent to decimal.

  • Use the int data type where appropriate; its storage requirement (4 bytes) is smaller than that of a decimal. The int data type accepts values ranging from -2,147,483,648 to 2,147,483,647. However, int is inappropriate for data such as phone numbers: a phone number of 214-748-3650 (2147483650) would never be accepted into an integer field because it exceeds the maximum value an int can hold. Use a char data type for phone number fields; developers and coders can then search for area codes and other patterns in the field.

Several other considerations should be taken into account when designing the logical and physical models. For example, field names should be logically descriptive and related to their business function and significance.
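The points above can be illustrated with a sketch of a contact table; the table, columns, and sample phone format are hypothetical:

```sql
-- Hypothetical table illustrating numeric and phone-number choices
CREATE TABLE CustomerContact
(
    ContactID   int           NOT NULL,  -- whole numbers only: int (4 bytes)
    CreditLimit decimal(10,2) NOT NULL,  -- fractional values: decimal with defined precision/scale
    Phone       char(12)      NOT NULL   -- e.g. '214-748-3650'; char, never int
)
```

With the phone number stored as character data, a search for an area code becomes a simple pattern match, e.g. `WHERE Phone LIKE '214%'`.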

Field names should contain only alphanumeric (preferably alphabetic) characters. Spaces, control characters, and other symbols force developers and coders to make special allowances in their code to accommodate outlandish field names.

Additionally, field names should be concise and not overly verbose or lengthy. This alleviates the strain placed on developers and coders who must utilize these field names in code.

Design Stored Procedures

The role of stored procedures in a business application is invaluable to the entire solution process. Any business transaction application should make use of stored procedures to execute read and DML statements from SQL Server. Even if the middle business logic tier contains the majority of the business logic, the business layer should use stored procedures to communicate directly with SQL Server. Stored procedures offer a palpable performance advantage even if the procedures are supplying read only data to an application through a business logic tier.

Stored procedures are precompiled, and their execution plans are held in SQL Server's procedure cache. When an application or business tier server product (such as MTS) submits a request to SQL Server, the typical steps involved in ad hoc query processing need not occur. Steps such as syntax checking, object verification, and query plan creation, evaluation, and optimization have already been performed when the procedure was created and compiled, and the results are stored in the procedure cache. The execution plan is read from cache and then executed.
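As a minimal sketch, a read procedure of this kind might look as follows (the procedure and table names are hypothetical):

```sql
-- Hypothetical read procedure; its plan is compiled once
-- and reused from the procedure cache on subsequent calls
CREATE PROCEDURE P_GetCustomer
    @CustomerID int
AS
SELECT CustomerID, LastName, StateCode
FROM Customer
WHERE CustomerID = @CustomerID
```

Each subsequent `EXEC P_GetCustomer @CustomerID = 42` skips the ad hoc parse-and-optimize steps and reuses the cached plan.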

Communicating with Stored Procedures

Data to the Client

Applications utilizing ActiveX Data Objects (ADO) or OLE DB interact with stored procedure output at the recordset level. Any read data required by an application for presentation to the user should be retrieved through a stored procedure. The application using ADO or OLE DB will interpret the results of the SELECT statement in a stored procedure as a recordset. Developers can then use the recordset to populate forms, reports, or other client interfaces.

Data from the Client

Applications can directly make reference to a stored procedure, independent of a business logic tier. Using ADO, an application can collect data from the user interface (ASP forms, VB forms), call the stored procedure and pass form data to a stored procedure as an ADO parameter. The SQL Server stored procedure accepts the input parameters and executes the SQL task.
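On the SQL Server side, the procedure receiving such form data is simply one declared with input parameters; each ADO Parameter object maps to one of them. A hypothetical sketch:

```sql
-- Hypothetical insert procedure; each parameter corresponds
-- to an ADO Parameter supplied by the client application
CREATE PROCEDURE P_AddCustomer
    @LastName  varchar(30),
    @StateCode char(2)
AS
INSERT INTO Customer (LastName, StateCode)
VALUES (@LastName, @StateCode)
```

From Query Analyzer or any T-SQL batch the same call is `EXEC P_AddCustomer @LastName = 'Smith', @StateCode = 'TX'`.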

Business logic tier products such as MTS facilitate the creation of logical objects which the client application references. When the client makes reference to a desired transactional function (i.e., add a new customer), the client application calls an MTS object that then initiates the communication with SQL Server on the client’s behalf. The MTS object calls the stored procedure, which then executes on the SQL Server. Return values from procedure execution can be passed directly to the client.

Stored Procedure Design

Proper stored procedure programming is similar to the concepts of programming T-SQL batches or T-SQL scripts. The contents of a stored procedure consist of any T-SQL that can be written using a batch or script. All SQL programming techniques should be implemented, including maintaining ANSI compatibility.

Stored procedures cannot contain certain Data Definition Language (DDL) statements. Views, defaults, triggers, and other stored procedures are among the objects that a stored procedure cannot create when it is called. However, other database objects can be created, such as tables or temporary tables. Stored procedures can also call other stored procedures.

Error handling should be an integral component of a stored procedure. Using error handling techniques such as the RAISERROR statement, T-SQL programmers can include custom-defined error messages that can be returned to and interpreted by the client. The client can then present the user with the exact SQL Server error message or further customize it based on programming in the client presentation layer. Integrating custom error messages with the sysmessages table using the sp_addmessage system stored procedure allows for ease of message management and consistency.
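A minimal sketch of this technique follows; the message number and text are hypothetical (user-defined messages must use numbers of 50001 and above):

```sql
-- Register a custom message once in sysmessages
EXEC sp_addmessage @msgnum = 50001, @severity = 16,
     @msgtext = 'Customer %d does not exist.'
```

Inside a stored procedure, the registered message can then be raised for the client to interpret, with the argument substituted into the `%d` placeholder:

```sql
RAISERROR (50001, 16, 1, @CustomerID)
```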

Input parameters in a stored procedure should default to NULL. Using this method, the stored procedure allows for a condition where the client application does not pass all expected input parameters. The stored procedure can still execute by evaluating the parameters that were supplied.
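A sketch of this defaulting pattern, using a hypothetical update procedure, where any parameter the client omits leaves the corresponding column unchanged:

```sql
-- Hypothetical procedure; parameters default to NULL so the
-- client may supply any subset of them
CREATE PROCEDURE P_UpdateCustomer
    @CustomerID int,
    @LastName   varchar(30) = NULL,
    @StateCode  char(2)     = NULL
AS
UPDATE Customer
SET LastName  = COALESCE(@LastName, LastName),    -- keep old value if not passed
    StateCode = COALESCE(@StateCode, StateCode)
WHERE CustomerID = @CustomerID
```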

Naming conventions for stored procedures should follow a logical and consistent formula. Similar to a table object naming convention, stored procedure names should not contain control characters, spaces or symbols. Stored procedure names should also avoid utilizing the ‘sp_’ prefix as this prefix is used by SQL Server system stored procedures. A more appropriate methodology might be to name a procedure ‘P_’ + object name.

Transactional Consistency

Stored procedures should enforce transactional consistency. In other words, if a business transaction, initiated by the client and passed via a business tier server object (MTS) to SQL Server, requires several updates, deletes, and inserts against multiple tables to complete, all of the required SQL DML statements should execute. Otherwise, if a catastrophic event occurs, all SQL statements defining the transaction should be rolled back.

Using this method, transactions requiring multiple DML statements will complete in their entirety, or not at all. This will increase data integrity in situations where a multipart transaction begins, completes half of the required SQL DML statements and is interrupted (due to power outages or catastrophic hardware or software failure).

When SQL Server is recovered or the server reboots and the SQL Server service restarts, the transaction log is scanned for uncommitted transactions. If the transaction is not committed by the stored procedure, all of the SQL DML initiated in the transaction will be rolled back. The transaction must then be executed again and complete in its entirety.
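The pattern described above can be sketched with explicit transaction control and @@ERROR checks after each DML statement (the procedure and tables are hypothetical):

```sql
-- Hypothetical multi-statement transaction: both statements
-- succeed together or the whole unit is rolled back
CREATE PROCEDURE P_TransferOrder
    @OrderID int,
    @NewCustomerID int
AS
BEGIN TRANSACTION

UPDATE Orders SET CustomerID = @NewCustomerID WHERE OrderID = @OrderID
IF @@ERROR <> 0
BEGIN
    ROLLBACK TRANSACTION
    RETURN 1
END

INSERT INTO OrderAudit (OrderID, CustomerID)
VALUES (@OrderID, @NewCustomerID)
IF @@ERROR <> 0
BEGIN
    ROLLBACK TRANSACTION
    RETURN 1
END

COMMIT TRANSACTION
RETURN 0
```

If the server fails before the COMMIT is reached, recovery rolls the open transaction back from the transaction log exactly as described above.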

Construct an Indexing Strategy

In order to increase efficiency, concurrency, and performance of a business application based in SQL Server, indexing of tables is required. Indexes allow for faster searching and faster location of the rows targeted by insert, delete, and update statements in an OLTP schema (although each additional index adds some maintenance overhead to writes). By utilizing SQL Server clustered and nonclustered indexes, a database developer can decrease the time and resources required to gather data from SQL Server and report it to a client as well as implement DML statements.

Indexing directly affects the time required to physically access the rows in a specific table. The indexing structure itself exists as a separate balanced tree structure (B-Tree) on disk. A clustered index physically restructures the records in a given table to conform to the index structure. As a result, only one clustered index can exist per table.

A nonclustered index does not physically alter the sequence of rows in a table, but contains pointers to the table data from its organized B-Tree structure. If a nonclustered index exists on a table that has a clustered index, the nonclustered index stores the clustered index key at its B-Tree leaf level. Hence, any value searched in a field with a nonclustered index is ultimately located through the clustered index. If a nonclustered index exists on a table without a clustered index, the leaf level of the nonclustered index contains pointers to the rows in the form of row identifiers (RIDs).
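The two index types are created as follows; the index and column names here are hypothetical:

```sql
-- One clustered index per table: physically orders the rows
CREATE CLUSTERED INDEX IX_Customer_CustomerID
    ON Customer (CustomerID)

-- Nonclustered indexes (up to 249 per table in this era of SQL Server)
-- point back to the clustered key from their leaf level
CREATE NONCLUSTERED INDEX IX_Customer_LastName
    ON Customer (LastName)
```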

It is important for database developers and application designers to realize that index strategy development and implementation is an ongoing process. Well after the initial test and production implementation of the business application, developers and database administrators should be examining the production database activity.

Using tools such as the SQL Server performance monitor counters and the SQL Server Profiler, developers should obtain a good sense of where the hotspots in the OLTP schema are occurring and where indexing should be implemented. Using the SQL Server Index Tuning Wizard, developers can allow SQL Server to analyze production database activity and recommend indexes from a sample trace file.

Index Determinants

Database developers should conduct a thorough examination of the logical model prior to production implementation to determine the optimal configuration of indexes based on an analysis of anticipated database ‘hotspots’. Hotspots (centers of database activity due to the transactional nature of the application) should be eliminated through accurate database design. However, due to the transactional nature of a business, many of these hotspots cannot be reduced or removed. Indexing represents an alternative method of resolving hotspots, but efficient database design is the primary method of relieving intensified localized database activity.

Due to the characteristics of an OLTP environment, a large amount of activity will be centered on searching tables. Even DML statements must search tables for specific values when a WHERE clause is specified. Because of the typically complex relationships existing between entities in an application’s data model, indexing will optimize read/write time.

An index should be applied where rapid row lookup is required. Because an OLTP environment has a large number of relationships, much of the lookup activity is done via primary keys and foreign keys. These key fields should be integer values (int data type) and are primary candidates for indexes. A clustered index is best applied to a primary key, since the primary key will often be utilized in looking up values. Nonclustered indexes will enhance join performance when placed on foreign key columns.
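Applied to a hypothetical Customer/Orders pair, the strategy above looks like this:

```sql
-- PRIMARY KEY on the parent creates a clustered index by default
ALTER TABLE Customer
    ADD CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerID)

-- Nonclustered index on the child's foreign key column to speed joins
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON Orders (CustomerID)
```

A join such as `Customer INNER JOIN Orders ON Customer.CustomerID = Orders.CustomerID` can then seek both sides rather than scanning the Orders table.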

Indexes derive their size from the type of data housed in the indexed column. Clustered index key size should be kept as small as possible, primarily because every nonclustered index on the same table stores the clustered index key in its leaf nodes.

Continues…
