Hello guys, I would like to open a thread here at sql-server-performance about the advantages and performance impacts of 2 different ways to bring xml data in a relational format. Until now I used openxml and beside the fact that it is write-intensive and felt not really "natural" to me, it just worked fine. Now I wanted to switch to the new xmldata.nodes() method, but I read many posts about a really bad performance of this method. What do you think about? What are your experiences? Abstract Example: sp_xmlpreparedocument @doc output, @xmldata ... OPENXML () sp_xmlremovedocument VERSUS ... INNER JOIN (select POS.attr.value('@ID', 'char(32)') [UID], POS.attr.value('@AddressType', 'smallint') [ATyp], POS.attr.value('@Language', 'char(2)') [LangID] from @XMLData.nodes('/Shop') POS(attr)) XD Cheers P.S.: Sorry for my stressed post, If you do not get the point, please let me know
In my experience and documentation suggest that OPENXML () is more efficient to shred large xml document into relational data, following blogs should get you more information http://sqlblogcasts.com/blogs/simons/1302.aspx & http://www.perfectxml.com/articles/xml/openxml.asp as I have thin exp. on using XML.
I have noticed that inserting from XML is faster when using openxml and slower when using nodes(). Here is a script you can run to see the difference. The difference in performance is even more pronounced when the table is wider. Also, the execution plan for the openxml case is far simpler than for the nodes() case. I've read about supposed memory issues when using sp_xml_preparedocument (openxml), but I have not been able to force any bad memory issues (by running several scripts similar to this in parallel). I am on SQL 2005. I would love to know the results of this script on SQL 2008. Or any comments anyone has in general. --------------------------------------------- -- Set the number of rows declare @N int; set @N=10; -- Set the number of test repetitions declare @Reps int; set @Reps = 100; -- create the test table declare @TempTable table ( a int, b uniqueidentifier, c nvarchar(100), d datetime, e bit ); -- Create some data while (@N>0) begin insert into @TempTable (a, b, c, d, e) values (rand(1000), newid(), newid(), getdate(), 0); set @N = @N - 1; end declare @Xml xml; set @Xml = (select * from @TempTable for xml raw('ITEM'), type, elements, root('INSERTFROMXML'), binary base64); declare @StartTime datetime; declare @Counter int; declare @Elapsed1 int; declare @Elapsed2 int; -- Test Method 1: delete from @TempTable; set @Counter = 0; set @StartTime = getdate(); while (@Counter < @Reps) begin -- The slow part: insert into @TempTable select T.c.value('(a)[1]','int'), T.c.value('(b)[1]','uniqueidentifier'), T.c.value('(c)[1]','nvarchar(100)'), T.c.value('(d)[1]','datetime'), T.c.value('(e)[1]','bit') from @Xml.nodes('/INSERTFROMXML/ITEM') T(c) set @Counter = @Counter + 1; end set @Elapsed1 = datediff(ms, @StartTime, getdate()); -- Test Method 2: delete from @TempTable; set @Counter = 0; set @StartTime = getdate(); while (@Counter < @Reps) begin -- Not so slow any more: declare @idoc int; exec sp_xml_preparedocument @idoc output, @Xml; insert into @TempTable select * from openxml(@idoc, '/INSERTFROMXML/ITEM', 2) with ( a int, b uniqueidentifier, c nvarchar(100), d datetime, e bit ); exec sp_xml_removedocument @idoc; set @Counter = @Counter + 1; end set @Elapsed2 = datediff(ms, @StartTime, getdate()); -- Results: select 'Method 2 speedup: ' + convert(varchar, round(convert(float, @Elapsed1) / @Elapsed2, 1)) + 'x';
Hi,ran your script on mt SQL 2008 Dev box.@Reps = 100 gives 1.1@Reps = 100000 gives 4.1xSo thanks, Microsoft, for introducing another easy-to-use programming method that doesn't bloddy scale.I see more refactoring in my future, what fun.
Pete Welcome to the forums. I suggest to input your comments and suggestions for such feature using Microsoft Connect site - http://connect.microsoft.com and please do search if a similar suggestion has been proposed. Hope this helps.