VARCHAR(MAX), NVARCHAR(MAX) and NTEXT Data Types

Despite NTEXT and TEXT having been deprecated in SQL Server for some time, they are both still used in production systems. In this article, I will briefly demonstrate the differences between the VARCHAR(MAX), NVARCHAR(MAX) and NTEXT data types, and the performance impact of using NTEXT/TEXT.

Let’s create a table named Student using the below T-SQL.

create table student
(
student_id int identity(1,1),
student_name varchar(max)
)

Let’s insert a few records into the table using the below T-SQL.

insert student(student_name)
select 'abc'
union
select 'pqr'
union
select 'xyz'

Now let’s have a look at the query execution plan:
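For example, a simple lookup against the table (this particular query is just an illustration) can be run like this; at this stage its plan shows a Table Scan, because the table has no index at all:

select * from student
where student_name = 'abc'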

The Table Scan makes it obvious that there is no Clustered Index on the table. Let's create one using the below T-SQL:

create clustered index clu_student_stuid on student(student_id)

Now look again at the query execution plan:

Now you can see that the Table Scan has been converted to a Clustered Index Scan, which means the base table is scanned in full via the Clustered Index. Let's try to improve its performance by creating a Non Clustered Index on the student_name column using the below T-SQL.
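For example, an attempt along these lines (the index name is just illustrative):

create nonclustered index ncl_student_stuname on student(student_name)

-- this fails with an error similar to:
-- Column 'student_name' in table 'student' is of a type that is
-- invalid for use as a key column in an index.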

As you can see, a Non Clustered Index cannot be created with a VARCHAR(MAX) column as a key column (at best the column can be added as an included column). This hurts performance when there is a large volume of data in the table, and it is one of the major disadvantages of the VARCHAR(MAX) data type.

A SQL Server data row is limited to 8,060 bytes. A VARCHAR(MAX) value can be up to 2 GB in size, so when it does not fit in the row, SQL Server stores it "out of row" as LOB data. VARCHAR(MAX) is overkill for many types of data such as people's names, locations, colleges etc. Instead of using VARCHAR(MAX) for such data, limit the VARCHAR size to a reasonable number of characters, which will greatly improve performance.
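For example (the table and index names and the 100-character limit below are just assumed values; pick whatever fits your data), a bounded column can be indexed normally:

create table student_v2
(
student_id int identity(1,1),
student_name varchar(100)   -- bounded length: stored in-row and valid as an index key
)

create nonclustered index ncl_studentv2_stuname on student_v2(student_name)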

If you have variable-length Unicode string data, you can go for either NVARCHAR or NTEXT, but note the difference between them:

Let's drop the earlier table and recreate Student, this time with an NTEXT column, using the below T-SQL.

drop table student

create table student
(
student_id int,
student_name ntext
)

Now we can insert a few records:

insert student(student_name)
select N'abc'
union
select N'pqr'
union
select N'xyz'

Now let's have a look at the Statistics count using the below T-SQL:

set statistics io on
select * from student
set statistics io off
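The key line of the STATISTICS IO output reads roughly as follows (exact counts may vary; the LOB columns are the ones to watch):

Table 'student'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 3, lob physical reads 0, lob read-ahead reads 0.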

Note that because the data type of the student_name column is NTEXT, SQL Server treats the data in that column as a Large Object (LOB), which is why the output reports LOB logical reads.

Now let's change the data type from NTEXT to NVARCHAR(MAX):

alter table student
alter column student_name nvarchar(max)

Now, look at the Statistics count again:

set statistics io on
select * from student
set statistics io off

Still not good: despite changing the data type of the column, the LOB logical read count is still the same (i.e. 3). This is because SQL Server has kept the existing data in the Large Object structure and uses a pointer to reach the text in this column. To fix this, execute the below T-SQL, which rewrites each value so that small values are moved back in-row.

update student
set student_name = student_name

Now let’s have a look at the Statistics count again:
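Re-running the same check as before (for these small values, the LOB logical reads should now drop to 0):

set statistics io on
select * from student
set statistics io off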

Now SQL Server no longer treats it as LOB data. Therefore, from a performance standpoint the NVARCHAR data type is superior to NTEXT.

Please let us know if you have any comments or suggestions. Alternatively, you can email me at singhsatnam84@yahoo.com.

