VARCHAR (MAX), NVARCHAR (MAX) and NTEXT Data Types

Despite NTEXT and TEXT being deprecated in SQL Server for some time they are still both used in production systems. In this article, I will briefly demonstrate the difference between a VARCHAR (MAX), NVARCHAR (MAX) and the NTEXT data types, and the impact on performance from using NTEXT/TEXT.

Let’s create a table named Student using the below T-SQL.

create table student
(
student_id int identity(1,1),
student_name varchar(max)
)

Let’s insert a few records into the table using the below T-SQL.

insert student(student_name)
select 'abc'
union
select 'pqr'
union
select 'xyz'

Now let’s have a look at the query execution plan:

It is obvious from this that you do not have a Clustered Index on the table. Let’s create it using the below T-SQL:

create clustered
index clu_student_stuid on student(student_id)

Now look again at the query execution plan :

Now you can see that the Table Scan gets converted to a Clustered Index Scan which means that the base table is completely scanned by the Clustered Index. Now let’s try to improve it’s performance by creating a Non Clustered Index on the column named Student_name using the below T-SQL.

As you can see, a Non Clustered index isn’t allowed to be created on a column using VARCHAR(MAX) as a data type. This will hurt the performance when the there is a large volume of data inside the table and hence this is one of the major disadvantages of using the VARCHAR(MAX) data type.

A SQL Data row has a max limited size of 8000 bytes. Therefore a VARCHAR(MAX) which can be up to 2 GB in size cannot be stored in the data row and so SQL Server stores it "Out of row". VARCHAR(MAX) is overkill for many types of data such as person’s names, locations, colleges etc. Therefore instead of using VARCHAR(MAX) such such types of data, one should limit the Varchar size to a reasonable number of characters which will greatly improve performance.

If you have variable length Unicode string data then you can go either for an NVARCHAR or NTEXT but note the difference between these:

Let’s create a table named Student using the below T-SQL.

create table student
(
student_id int,
student_name ntext
)

Now we can insert a few records :

insert student(student_name)
select 'abc'
union
select 'pqr'
union
select 'xyz'

Now let’s have a look at the Statistic count using the below T-SQL:

set statistics io on
select * from student
set statistics io off

Note that since the data type of the student_name column is NTEXT the query optimizer is treating the data in that column as a Large Object.

Now let’s change the data type from NTEXT to NVARCHAR(MAX) :

alter table student
alter column student_name nvarchar(max)

Now, look at the Statistics count again :

set statistics io on
select * from student
set statistics io off

Still not good as despite changing the data type of the column, the LOB Logical Read count is still the same (ie 3). This is because SQL Server has maintained the data in the Large Object structure and now uses a pointer as a reference to extract the text from this column. In order to fix this problem, execute the below T-SQL.

update student
set student_name = student_name

Now let’s have a look at the Statistics count again:

Now the query optimizer does not treat is as a LOB Data. Therefore, from a performance standpoint NVARCHAR data type is always superior to NTEXT.

Please let us know if you have any comments or suggestions. Alternatively you can email me at singhsatnam84@yahoo.com ..




Related Articles :

15 Responses to “VARCHAR (MAX), NVARCHAR (MAX) and NTEXT Data Types”

  1. Good and Knowledgeable Article.. Thanks!!!

  2. Hey let loot at this from other side. Blob data always stores out side of the main data. it means it will be better solution when you hardly ever want to get blob data without filter by this data. So using of blob data are not bad solution sometime.

  3. But sometimes is good for VARCHAR(MAX) or NVARCHAR(MAX) to be treated as LOB data.
    When the list queries don’t require the field, just used on the form, having the table split into two data pages is good since it has less reads to get the data without the LOB fields.. this should also be taken in concern…

  4. But what about fragmentation? Text is a consistent size (a pointer to another location), and thus fragmentation is lower than when using Varchar, assuming you will never want to index the Text. Is this correct?

  5. I have to tell you that if I ever caught a developer making a Student_Name column using any form of blob, I’d have to chase them until their heels smoked.

    Perhaps a more appropriate article would be on the extreme advantages of “right sizing”. Just having a BLOB in a table prevents ONLINE indexing of the clustered index in the Enterprise Edition even if the column IS NOT one of the key columns.

  6. Nice one !!!

  7. Great article, nice and easy to understand for a any novice!!!

  8. Nice article.

  9. Very intuitive article, thank you!

  10. Perfect article!!! Thank you!

  11. sometimes I create nvarchar(max) columns but I index on a calculated column of the first N positions: fits many query needs while matching the size of a string in the application – if you say nvarchar(200), you can be sure you will need 205 one of these days etc

  12. Hi,

    Can you clear one point. What is the benefit of executing the below statement ?

    update student
    set student_name = student_name

    Thanks
    Sham

  13. bPU3mJ This is one awesome blog post. Really Cool.

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |