SQL Server Performance

Analysis of char field allocation and usage

Discussion in 'Contribute Your SQL Server Scripts' started by Chappy, Jul 4, 2003.

  1. Chappy New Member

    Some time ago, I was asked to help improve a friends database which was suffering from poor query performance. <br />The database was in a bit of a state, especially the field types which were very badly designed, resulting in <br />a database of over 1.3G. The database was created by a developer who just wanted to use sql purely to store <br />data, and not really concerned about performance until it became a problem, and consequently it had a lot <br />of char(255) fields, and lots of other excessive data.<br /><br />By querying lots of fields and their usage, I managed to redesign the database down to under 100 meg, <br />but this was a very laborious task.<br /><br />As a bit of an exercise, I decided to write a script which would help highlight badly designed character <br />fields. The script is far from complete, and only makes suggestions for fields to review, they are not <br />intended to be hard and fast recommendations. It is mainly just for a bit of fun, and to see whats possible. <br /><br />Im sure the algorithms could use a lot of fine tuning, and so if anyone wants to improve on it, please do so. <br />I plan to add support to suggest when varchar fields might benefit from being straight chars, and maybe even <br />deduce when nvarchar or nchar is being used without good reason (if thats even possible to discover!)<br /><br />Also, the script is very slow due to the use of cursors. Be warned! You may want to add a TOP clause to <br />the cursors source query, by default it looks at all char based fields in all user tables, in the current <br />database.<br /><br /><pre><br />DECLARE @tableName varchar(12<img src='/community/emoticons/emotion-11.gif' alt='8)' /><br />DECLARE @colName varchar(12<img src='/community/emoticons/emotion-11.gif' alt='8)' /><br />DECLARE @dataType varchar(12<img src='/community/emoticons/emotion-11.gif' alt='8)' /><br />DECLARE @maxCapacity int<br />DECLARE @objid int<br />DECLARE @qry nvarchar(4000)<br />SET NOCOUNT ON<br /><br />DECLARE curs_cols CURSOR LOCAL FAST_FORWARD FOR<br />SELECT<br />TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, id <br />from <br />sysobjects<br />INNER JOIN INFORMATION_SCHEMA.COLUMNS ON (TABLE_NAME = name)<br />where <br />xtype = 'U' and <br />DATA_TYPE in ('char', 'nchar', 'nvarchar', 'varchar')<br /><br />OPEN curs_cols<br />FETCH FROM curs_cols into @tableName, @colName, @dataType, @maxCapacity, @objId<br /><br />CREATE TABLE #res (<br />R_TableName varchar(12<img src='/community/emoticons/emotion-11.gif' alt='8)' />, <br />R_ColName varchar(12<img src='/community/emoticons/emotion-11.gif' alt='8)' />, <br />R_datatype varchar(12<img src='/community/emoticons/emotion-11.gif' alt='8)' />,<br />R_maxcapacity int,<br />R_MaxLen int, <br />R_MinLen int,<br />R_AvgLen int,<br />R_RecCount int)<br /><br />WHILE (@@FETCH_STATUS = 0) <br />BEGIN<br />SELECT @qry = <br />N'SET NOCOUNT ON ' + CHAR(13) + <br />'DECLARE @avgLen int ' + CHAR(13) + <br />'DECLARE @maxLen int ' + CHAR(13) + <br />'DECLARE @minLen int ' + CHAR(13) + <br />'DECLARE @recCount int ' + CHAR(13) + <br />'SELECT @minLen=MIN(ISNULL(LEN(' + @colName + '), 0)), ' + CHAR(13) + <br />'@maxLen=MAX(ISNULL(LEN(' + @colName + '), 0)), ' + CHAR(13) + <br />'@avgLen=AVG(ISNULL(LEN(' + @colName + '), 0)), ' + CHAR(13) +<br />'@recCount=COUNT(*) FROM ' + @tableName + CHAR(13) +<br /><br />'IF (@recCount &gt; 0) ' + <br />'INSERT INTO #res (R_Tablename, R_ColName, R_DataType, R_MaxCapacity, R_MaxLen, R_MinLen, R_AvgLen, R_RecCount) ' + CHAR(13) + <br />'Values (''' + @tableName + ''', ' + <br />'''' + @colName + ''', ' + <br />'''' + @dataType + ''', ' + <br />CONVERT(nvarchar, @maxCapacity) + ', ' + <br />'CONVERT(nvarchar, @maxLen), ' + <br />'CONVERT(nvarchar, @minLen), ' + <br />'CONVERT(nvarchar, @avgLen), ' + <br />'CONVERT(nvarchar, @recCount))'<br />exec sp_executesql @qry<br />FETCH FROM curs_cols into @tableName, @colName, @dataType, @maxCapacity, @objId<br />END<br /><br />-- try to guess if char fields might perform better, or save significant space as <br />-- varchars, or a smaller capacity<br />select <br />R_TableName,<br />R_ColName,<br />R_Datatype,<br />R_Maxcapacity,<br />R_MaxLen,<br />R_MinLen,<br />R_AvgLen,<br />R_RecCount,<br />"Suggestion" = <br />CASE <br />WHEN (R_MaxLen = 0) THEN <br />'This field appears to be unused. Drop field?'<br />WHEN ((R_MaxLen &gt; CEILING(1.1*R_AvgLen)) or (R_MinLen &lt; FLOOR(0.9*R_AvgLen))) and (R_MaxLen &lt; R_MaxCapacity) THEN<br />'This '+R_DataType+'('+CONVERT(VARCHAR,R_MaxCapacity)+') is under utilised, but variable. Consider a varchar('+CONVERT(VARCHAR,R_MaxLen)+') ?'<br />WHEN (R_MaxLen = R_MinLen) THEN<br />'This '+R_DataType+'('+CONVERT(VARCHAR,R_MaxCapacity)+') is under utilised but content is apparently fixed length. Consider a char('+CONVERT(VARCHAR,CEILING(R_MaxLen))+') ?'<br />WHEN (R_MaxLen &lt; FLOOR(0.9*R_MaxCapacity)) THEN<br />'This '+R_DataType+'('+CONVERT(VARCHAR,R_MaxCapacity)+') is under utilised but reasonably consistent. Consider a char('+CONVERT(VARCHAR,CEILING(R_MaxLen*1.1))+') ?'<br />ELSE<br />'Seems Ok.'<br />END,<br /><br /> -- This field is VERY approximate. Tries to count chars, not bytes. <br /> -- Also, makes wild assumptions about the distribution of string lengths <br /> -- in the field<br />"Total chars saved (Approx/Avg)" = <br />CASE <br />WHEN (R_MaxLen = 0) THEN <br />R_RecCount*R_MaxCapacity<br />WHEN ((R_MaxLen &gt; CEILING(1.1*R_AvgLen)) or (R_MinLen &lt; FLOOR(0.9*R_AvgLen))) and (R_MaxLen &lt; R_MaxCapacity) THEN<br />R_RecCount*(R_MaxLen-R_AvgLen)<br />WHEN (R_MaxLen = R_MinLen) THEN<br />R_RecCount*(R_MaxCapacity-R_MaxLen)<br />WHEN (R_MaxLen &lt; FLOOR(0.9*R_MaxCapacity)) THEN<br />R_RecCount*(FLOOR(0.9*R_MaxCapacity) - CEILING(R_MaxLen*1.1))<br />ELSE<br />0<br />END <br />from <br />#res <br />where <br />R_DataType in ('nchar', 'char') and<br />(R_MaxLen &lt; FLOOR(R_MaxCapacity*0.<img src='/community/emoticons/emotion-11.gif' alt='8)' />) and<br />(R_MaxCapacity &gt; 10)<br /><br />-- #TODO - if a varchar field does not deviate much, it may perform better as a char field<br /><br />DROP TABLE #res <br />CLOSE curs_cols<br />DEALLOCATE curs_cols<br /><br />SET NOCOUNT OFF<br /></pre><br /><br />Heres what sort of output you can expect...<br /><br /><pre><br />R_TableName R_ColName R_Datatype R_Maxcapacity R_MaxLen R_MinLen R_AvgLen R_RecCount Suggestion Total chars saved (Approx/Avg) <br />---------------------------------- ----------- ------------- ----------- ----------- ----------- ----------- ----------------------------------------------------------------------------------------------------------------------------- <br />RHTEMP RH_Name char 255 74 49 56 15 This char(255) is under utilised, but variable. Consider a varchar(74) ? 270<br />RHTEMP RH_Going char 255 12 4 5 15 This char(255) is under utilised, but variable. Consider a varchar(12) ? 105<br />RHTEMP RH_Type char 50 10 4 5 15 This char(50) is under utilised, but variable. Consider a varchar(10) ? 75<br />RHTEMP RH_PenaltyValue char 50 8 7 7 15 This char(50) is under utilised but reasonably consistent. Consider a char(9) ? 540<br />RHTEMP RH_RaceType char 50 0 0 0 15 This field appears to be unused. Drop field? 750<br />RHTEMP RH_Classification char 50 8 6 7 15 This char(50) is under utilised but reasonably consistent. Consider a char(9) ? 540<br />Analysis A_Speed char 255 0 0 0 360 This field appears to be unused. Drop field? 91800<br />Analysis A_Other char 100 0 0 0 360 This field appears to be unused. Drop field? 36000<br />Analysis A_RaceType char 50 18 6 7 360 This char(50) is under utilised, but variable. Consider a varchar(1<img src='/community/emoticons/emotion-11.gif' alt='8)' /> ? 3960<br />Analysis A_Date char 20 10 10 10 360 This char(20) is under utilised but content is apparently fixed length. Consider a char(10) ? 3600<br />Analysis A_Time char 20 4 4 4 360 This char(20) is under utilised but content is apparently fixed length. Consider a char(4) ? 5760<br />Analysis A_RaceName char 255 96 29 51 360 This char(255) is under utilised, but variable. Consider a varchar(96) ? 16200<br />Analysis A_Draw char 15 2 0 1 360 This char(15) is under utilised but reasonably consistent. Consider a char(3) ? 3600<br />Analysis A_ChkDistance char 20 0 0 0 360 This field appears to be unused. Drop field? 7200<br />Analysis A_Going char 30 6 4 4 360 This char(30) is under utilised, but variable. Consider a varchar(6) ? 720<br />RH RH_Name char 255 115 26 58 44734 This char(255) is under utilised, but variable. Consider a varchar(115) ? 2549838<br />stats Stat_RatingType char 100 49 5 26 223 This char(100) is under utilised, but variable. Consider a varchar(49) ? 5129<br />stats Stat_FlatJump char 100 5 4 4 223 This char(100) is under utilised but reasonably consistent. Consider a char(6) ? 18732<br />RHead RHD_RaceName char 255 96 29 51 360 This char(255) is under utilised, but variable. Consider a varchar(96) ? 16200<br /><br /></pre><br /><br /><br /><br /><br /><br />
  2. Chappy New Member

    PS: Ignore the CHAR(13) carriage return appends. This was primarily used for debugging the dynamic SQL
  3. vbkenya New Member

    Pretty cool script on its way to becoming a great tool.

    I haven't noticed any speed issues on my small-to-medium-sized databases. But if you are still worried, would a set based approach improve the speed?

    Nathan H.O.
  4. Chappy New Member

    Hi vbkenya, thanks for the feedback. I dont think its possible to do with a set based approach, as I need to be able to dynamically pass the table and field names to the query which gets the min/max/avg length etc. Or did you mean try to query many of the fields in one table ina single query ?

    I also considered querying system tables or column statistics to see if min/max length was held anywhere but it appears not.
  5. vbkenya New Member

    I was just thinking about replacing cursor usage with temporary tables. Very unfortunate about the min/max situation - one would think MS should try to implement this kind of thing somewhere.

    Nathan H.O.

Share This Page