VARCHAR (MAX), NVARCHAR (MAX) and NTEXT Data Types

Despite NTEXT and TEXT being deprecated in SQL Server for some time they are still both used in production systems. In this article, I will briefly demonstrate the difference between a VARCHAR (MAX),
NVARCHAR (MAX) and the NTEXT data types, and the impact on performance from using NTEXT/TEXT.

Let’s create a table named Student using the below T-SQL.

create table student

(
student_id int identity(1,1),
student_name varchar(max)
)

Let’s insert a few records into the table using the below T-SQL.

insert student(student_name)
select 'abc'

union
select 'pqr'

union
select 'xyz'

Now let’s have a look at the query execution plan:

It is obvious from this that you do not have a Clustered Index on the
table. Let’s create it using the below T-SQL:

create clustered
index clu_student_stuid on student(student_id)

Now look again at the query execution plan :

Now you can see that the
Table Scan gets converted to a Clustered Index Scan which means that the base
table is completely scanned by the Clustered Index. Now let’s try to improve
it’s performance by creating a Non Clustered Index on the column named
Student_name using the below T-SQL.

As you can see, a Non Clustered index isn’t allowed to be created on a column
using VARCHAR(MAX) as a data type. This will hurt the performance when
the there is a large volume of data inside the table and hence this is one of the major
disadvantages of using the VARCHAR(MAX) data type.

A SQL Data row has a max limited
size of 8000 bytes. Therefore a VARCHAR(MAX) which can be up to 2 GB in size cannot
be stored in the data row and so SQL Server stores it "Out of row". VARCHAR(MAX) is overkill for many types of data such as person’s names, locations, colleges etc. Therefore instead of
using VARCHAR(MAX) such such types of data, one should limit the Varchar size
to a reasonable number of characters which will greatly improve performance.

If you have variable
length Unicode string data then you can go either for an NVARCHAR or NTEXT but note the difference between these:

Let’s create a table named
Student using the below T-SQL.

create table student

(
student_id int,
student_name ntext
)

Now we can insert a few records :

insert student(student_name)

select 'abc'

union
select 'pqr'

union
select 'xyz'

Now let’s have a look at the
Statistic count using the below T-SQL:

set statistics io on
select * from student
set statistics io off

Note that
since the data type of the student_name column is NTEXT the
query optimizer is treating the data in that column as a Large Object.

Now let’s change the data type
from NTEXT to NVARCHAR(MAX) :

alter table student

alter column student_name nvarchar(max)

Now, look at the
Statistics count again :

set statistics io on

select * from student

set statistics io off

Still not good as despite changing the
data type of the column, the LOB Logical Read count is still the same (ie 3).
This is because SQL Server has maintained the data in the Large Object
structure and now uses a pointer as a reference to extract the text from this
column. In order to fix this problem, execute the below T-SQL.

update student

set student_name = student_name

Now let’s have a look at
the Statistics count again:

Now the query optimizer
does not treat is as a LOB Data. Therefore, from a performance standpoint NVARCHAR data type is
always superior to NTEXT.

Please let us know if you have any comments or suggestions. Alternatively you
can email me at singhsatnam84@yahoo.com ..




Array

No comments yet... Be the first to leave a reply!