Use nvarchar() to be on the safe-side? | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Use nvarchar() to be on the safe-side?

Hi, we have an application that must support English and French. The application currently runs with a 1-byte character set, but the developers chose nchar, nvarchar and ntext just to be on the safe side. I would like to convert these columns to char, varchar and text, mainly to increase the maximum length (varchar can go up to 8000 characters, while nvarchar tops out at 4000), and secondarily to save some storage space. Later on, when we really need Unicode support, I plan to convert to 2-byte Unicode encoding on a column-by-column basis (i.e. only where needed). Q1: Does this sound like a reasonable plan, or should I just go with Unicode for all text-based data? Q2: Is there any difficulty in converting Unicode <-> non-Unicode data, given that we only have English and French text? Thanks very much in advance.
In your case, Unicode is the suitable choice for dealing with French text.
Unicode data is stored using the nchar, nvarchar and ntext data types in SQL Server; use these data types for columns that store characters from more than one character set. Refer to the MSDN and TechNet articles on dealing with Unicode data in SQL Server. Satya SKJ
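To make the point concrete, here is a minimal sketch of a table mixing both kinds of columns; the table and column names are made up. Note the N'' prefix on the literal: without it, SQL Server passes the string through the connection's code page before storing it, which can mangle accented characters.

```sql
-- Hypothetical table: one Unicode column, one non-Unicode column.
CREATE TABLE dbo.Customer (
    CustomerId  int IDENTITY(1,1) PRIMARY KEY,
    DisplayName nvarchar(100) NOT NULL,  -- Unicode: 2 bytes per character
    Notes       varchar(100)  NULL       -- non-Unicode: 1 byte, code-page dependent
);

INSERT INTO dbo.Customer (DisplayName, Notes)
VALUES (N'François Lefèvre', 'plain text');  -- N'' keeps accents intact on any code page
```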
Moderator
http://www.SQL-Server-Performance.Com/forum
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
All the French accented and cedilla characters are in the extended ASCII range (below 255). Doesn't that mean they have the same code point regardless of code page/collation? I'm not entirely sure about the values for accented capitals, though I don't think they are commonly used in French.
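One way to test that assumption on real data is a round-trip check: convert the Unicode value to varchar and back, and flag rows that change. Characters missing from the column's code page come back as '?', so any mismatch marks data that an nvarchar -> varchar conversion would lose. Table and column names here are hypothetical.

```sql
-- Sketch: find rows whose Unicode text would not survive a round trip
-- through varchar under the column's default collation.
SELECT CustomerId, DisplayName
FROM dbo.Customer
WHERE DisplayName <> CONVERT(nvarchar(100),
                     CONVERT(varchar(100), DisplayName));
-- Zero rows returned means the conversion would be lossless for this data.
```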
Although this is no answer to your question, this site might turn out to be useful:
http://www.unicode.org/ —
Frank
http://www.insidesql.de

English and French display fine using the same 1-byte code page, so there is no need for Unicode. I have read the SQL 2000 Books Online and know the advantages and disadvantages of Unicode. My question was about the design approach, which I can summarize as: "Because Unicode covers all text encoding scenarios, let's use it even if we don't need it right now." The obvious disadvantages I've found so far are more storage space (not really a concern) and a maximum column length of 4000 instead of 8000. I would like to know from your experience, where you have used Unicode, whether you ran into maintenance issues such as slow SELECTs, data conversion problems, bloated storage space, etc.
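For the column-by-column conversion plan described in the original question, the statement is the same in both directions; a minimal sketch, assuming the hypothetical dbo.Customer table (adjust names and lengths to your schema):

```sql
-- non-Unicode -> Unicode: always lossless.
ALTER TABLE dbo.Customer
    ALTER COLUMN Notes nvarchar(200) NULL;

-- Unicode -> non-Unicode: silently replaces characters outside the
-- column's code page with '?', so verify the data first (e.g. with a
-- round-trip CONVERT check) before running:
-- ALTER TABLE dbo.Customer
--     ALTER COLUMN Notes varchar(200) NULL;
```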
I have never noticed such performance issues in one of our applications where we use Unicode to store data in all the European languages. Both Unicode and non-Unicode sorting in a Windows collation are compatible with the string comparison rules of a particular version of Windows, which provides consistency across data types in SQL Server. In a SQL collation, however, SQL Server defines different comparison semantics for non-Unicode data, based on a SQL "sort order." For a mapping of sort orders to SQL collations, see the "SQL Collation Name" topic in SQL Server Books Online. One of the KB articles says:
Generally, the degree of performance difference between the Windows and the SQL collations will not be significant. The difference only appears if a workload is CPU-bound (as opposed to being constrained by I/O or by network speed), and the majority of this CPU burden is caused by the overhead of string manipulation or comparisons performed in SQL Server. An example of an application where the performance difference might be pronounced is a system where an application passes a long string value to a SQL Server stored procedure. The stored procedure then parses the string through extensive use of Transact-SQL string manipulation functions like CHARINDEX or PATINDEX. If the workload is fairly one-dimensional and it is dominated by executions of this string parsing stored procedure, the difference in performance between a SQL collation and a Windows collation might be noticeable. However, the design of most applications does not lead to a situation where the performance difference is significant. If you are using Unicode data types (nchar, nvarchar, ntext), there is no difference in the sorting behavior for the SQL and the Windows collations. Both will use Unicode sorting rules.
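An illustrative sketch of the kind of string-parsing procedure the KB article describes: splitting a long delimited nvarchar with CHARINDEX in a loop. Procedure and parameter names are made up; only a workload dominated by calls like this would make collation choice visible in profiling.

```sql
CREATE PROCEDURE dbo.ParseList
    @list nvarchar(4000)
AS
BEGIN
    DECLARE @pos int, @item nvarchar(100);
    SET @pos = CHARINDEX(N',', @list);
    WHILE @pos > 0
    BEGIN
        SET @item = LEFT(@list, @pos - 1);
        -- ... do something with @item ...
        SET @list = SUBSTRING(@list, @pos + 1, LEN(@list));
        SET @pos = CHARINDEX(N',', @list);
    END
END
```

Since the parameter is nvarchar, SQL and Windows collations sort and compare it identically, which is why the KB article says the difference disappears for Unicode types.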
So if the hardware can cope with such stressed queries, I don't see any particular issue to fine-tune.
HTH Satya SKJ
Moderator
http://www.SQL-Server-Performance.Com/forum
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
Hi Satya, Thank you very much for your quick and thorough answer. Our hardware is OK to sustain the load. We’ll go with Unicode then.