SQL Server Performance

What Settings Will Speed This Up?

Discussion in 'Performance Tuning for DBAs' started by peridian, Jun 9, 2008.

  1. peridian New Member


    This has been driving me nuts for a few days now. I'm more of a developer than a DBA, but I suspect the answer lies in the server configuration. Apologies for the length of this, but I have tried a lot of things.
    We get a monthly database dump in the form of a 1.5 GB CSV file, comma delimited with quote text qualifiers and no headers. This is too much for Access to handle, so I decided we needed an SQL Server backend. The only licensed install we have available for this is SQL Server 2000 (SP3).
    There are two other large(ish) files that we need to import and then compare against the large file to flag certain records. The whole routine needs to be automated and made simple enough for anybody without SQL skills to run.
    I place the three source data files on a shared folder on the same box as the SQL Server instance. As I needed a front end for the user, as well as an environment that would be long-term maintainable by anybody who takes over, I decided on using an MS Access and VBA solution to act as the front end and central processing point.
    It appears that the BULK INSERT command does not have the ability to interpret text qualifiers (please do let me know if this is incorrect). To allow automation of importing the data, I decided to place an Access database on the file share, and simply create linked tables to the source files. I then create a linked server on the SQL Server instance to that Access database.
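    For what it's worth, one workaround I have seen suggested (but have not tried here) is a bcp format file that treats the quote-comma-quote sequence as the field terminator, so BULK INSERT effectively strips the qualifiers itself. The three-column layout, column names, and paths below are made up for illustration:

```sql
-- monthly.fmt (a bcp 8.0 format file), shown here as comments:
-- 8.0
-- 3
-- 1  SQLCHAR  0  0    "\""      0  LeadingQuote ""
-- 2  SQLCHAR  0  255  "\",\""   1  AccountRef   ""
-- 3  SQLCHAR  0  255  "\"\r\n"  2  Amount       ""

BULK INSERT dbo.MonthlyData
FROM '\\server\share\monthly.csv'
WITH (FORMATFILE = '\\server\share\monthly.fmt', TABLOCK);
```

    Field 1 consumes the opening quote of each row and maps to server column 0 (i.e. it is discarded); the remaining terminators swallow the closing quote, comma, and reopening quote between fields.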
    As SQL Server cannot 'see' linked tables, I create a simple SELECT * query for the linked table, which the SQL Server can see as a View. This allows me to query the text files as though they were tables in a database. As I understand it, this is simply a long winded way of doing this, as SQL Server uses the Jet Database Connection Provider anyway when accessing text files. The file specification for the link is stored in the Access database.
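    The linked-server setup looks roughly like this (the server name, file path, and query name are placeholders; 'Admin' is the default Jet user):

```sql
EXEC sp_addlinkedserver
    @server = N'ACCESS_BRIDGE',
    @srvproduct = N'Access',
    @provider = N'Microsoft.Jet.OLEDB.4.0',
    @datasrc = N'D:\Share\Bridge.mdb';

EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'ACCESS_BRIDGE',
    @useself = 'false',
    @rmtuser = N'Admin',
    @rmtpassword = NULL;

-- The saved SELECT query in the .mdb can then be queried like a view:
SELECT * FROM ACCESS_BRIDGE...qryMonthly;
```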
    This way, the user can open the Access database, get a form with the various controls on to run the processes, and select what they want to do. This database will connect to the server, and pass instructions on importing and processing the data. Although the user is opening the Access database across the network, no data transfer takes place, as the server will access the database locally, acting as another user, and using the database as a 'bridge' to the text files. It also makes the whole process back-end independent, as all the code and statements to create the database and objects are called from the front-end.
    It's worth mentioning that the live server is a dual-core, 4Gb RAM machine. Also, all the development of these routines and SQL statements is actually being done on a stand-alone install of SQL Server 2005 Express Edition. Although there are some differences in the SQL statements between the two servers, they are minimal. I knew that performance on the Express edition would be limited, even though the machine it is on is more powerful, due to the limitations in Express (i.e. 1 core, 1Gb of RAM).

    Dynamic SQL vs Stored Procedures:
    Initially, I wrapped the commands to transfer the data inside stored procedures (something I don't like doing). I went with an INSERT INTO command over SELECT INTO. Although SELECT INTO is supposedly minimally logged under the Bulk-Logged recovery model, I found very little difference in the times between the two. I also prefer being able to explicitly create the tables and their column types. Finally, I wanted to retain the previous contents of the table, instead of having them wiped out each time.
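    Side by side, the two forms look like this (a hypothetical two-column layout, and my understanding is that SELECT INTO only gets minimal logging when the statement itself creates the table):

```sql
-- Explicit table kept across runs, loaded with a fully logged insert:
CREATE TABLE dbo.MonthlyData (
    AccountRef varchar(50) NOT NULL,
    Amount     varchar(20) NULL
);

INSERT INTO dbo.MonthlyData (AccountRef, Amount)
SELECT AccountRef, Amount
FROM ACCESS_BRIDGE...qryMonthly;

-- SELECT INTO alternative: column types are inferred, the target table
-- must not already exist, and prior contents cannot be retained:
SELECT AccountRef, Amount
INTO dbo.MonthlyStaging
FROM ACCESS_BRIDGE...qryMonthly;
```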
    When I ran this on SQL Server 2000, it took approximately 8-10 minutes. This was both when called through the Enterprise Manager, and when called through the ADODB connection created in the VBA code within the MS Access front end. Great.
    When run on SQL Server 2005, about 12-13 minutes. I then found out that SQL Server has a similar plan-caching approach to Oracle, so in theory it should not matter whether I called dynamically generated SQL or a stored procedure. I tested this on 2005, both through ADODB and in the Management Studio, and found that the times varied a bit more, but it did indeed make no difference which I did. Both took about 13 minutes.
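    As I understand it, an ad-hoc statement is only matched in the plan cache on its exact text (plus the session's SET options), whereas sp_executesql parameterises the statement so one cached plan is reused across different values. A hypothetical sketch (table and parameter names invented):

```sql
-- One plan is cached for the parameterised text and reused for any @limit:
EXEC sp_executesql
    N'INSERT INTO dbo.Flags (AccountRef)
      SELECT AccountRef
      FROM dbo.MonthlyData
      WHERE Amount > @limit',
    N'@limit int',
    @limit = 100;
```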
    When I ported all this back to SQL Server 2000, the ADODB connection was suddenly timing out after 15 minutes. I extended the timeout on the connection, and found it was now taking around 40-50 minutes to do something that previously took 10.
    When I copied the statement into the Enterprise Manager, it still only took 10 minutes.

    Proposed Solution:
    I have since discovered all the articles online about the ARITHABORT setting. Apparently it can have this effect on SQL Server 2000 installations. It gets turned on when you turn ANSI_WARNINGS on, which I had to do in the stored procedure because I was getting a compile error without it. Hence the stored procedures did not suffer, but my ad-hoc statement did.
    So I tried adding SET ARITHABORT ON as the first command called when the connection is created in VBA. I tried calling EXEC ('SET ARITHABORT ON'). I tried using ALTER DATABASE DatabaseName SET ARITHABORT ON. I tried a combination.
    This had no effect. I then tried including ANSI_WARNINGS, and even ANSI_NULLS, in all this. Nothing. I then tried setting these options using the sp_configure command. Here I think I screwed my server over. I tried to set all the standard options as per Microsoft's MSDN site, to no effect. If anything it got slower.
    So I decided to set them all back to nothing again. This seems to have turned something else on or off, and now it takes even longer. The call to this statement in Enterprise Manager takes about 28-30 minutes (whether as a direct SQL statement or wrapped in a stored procedure), and VBA just dies.
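    Going back to the EXEC attempt above: one gotcha I have since found documented is that a SET statement inside EXEC('...') only lasts until that inner batch ends, so EXEC('SET ARITHABORT ON') is a no-op for anything that follows it. Also, the database-level option can apparently be overridden by whatever the client library sets at connect time. If that is right, the option has to be executed as its own command on the same connection, before the big statement:

```sql
-- Run as a separate command on the SAME ADODB connection, before the
-- import statement; SET options issued this way persist for the session:
SET ARITHABORT ON;
SET ANSI_WARNINGS ON;
SET ANSI_NULLS ON;
```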
    Execution Plan:
    I decided to take a look at the execution plans of these statements, as at one point I was experimenting with using a temporary table initially, then normalising the flat table, to see if using a temporary table might skip a step somewhere. At another point I had an additional column in the table being inserted into that was just a record ID column with an IDENTITY clause on it (with or without this did not seem to affect performance on SQL Server 2005, couldn't tell with 2000).
    About 20% of the cost is in the initial query to the external database, and 33% in the actual INSERT command. The remaining 47% is in an Index Spool/Eager Spool operation (no, there are no indexes on any of the tables, as I delete them before any operations are carried out and recreate them after everything is done). The only thing I've managed to change is at what point in the execution the Spool occurs, which differs depending on whether the target table has the IDENTITY column or not.
    What now?:
    So what do I try next? I guess I would definitely like to know how to get my server's settings and configuration back to the defaults that it should be set to. At least then I can try from there.
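    On the defaults question, my understanding (please correct me if this is wrong) is that the server-wide default for the session options controlled through sp_configure is 'user options' = 0, i.e. no SET options forced on at the server level:

```sql
-- Restore the default (no server-wide SET options) and apply it:
EXEC sp_configure 'user options', 0;
RECONFIGURE;

-- Then list everything to compare against the documented defaults:
EXEC sp_configure;
```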
    If it really is the ARITHABORT flag that was causing this, I've tried all the settings to sort this out. If there's some other setting I have overlooked, I would like to hear about it.
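    One diagnostic that might settle it: run this from both Enterprise Manager and over the ADODB connection and compare the two lists; any option present in one but not the other (arithabort in particular) would explain why the same statement gets different plans.

```sql
-- Lists only the SET options currently active for this session:
DBCC USEROPTIONS;
```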
    All the other statements run in the overall process run without problem or taking any noticeable length of time. It is just this import of data from the big file.
    All help is greatly appreciated, as I've just about given up on this.
  2. MohammedU New Member

    It is too much to digest in one or two readings... but here are a few things I can point out...
    Bulk insert: You can use the BULK INSERT command to import the data from text files, and it has an option to specify the delimiter... check BOL for details...
    Once you have loaded the data, have you created the indexes? If not, create them based on your WHERE clause.
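    Something like this, after the load finishes (table and column names are just examples):

```sql
-- Create after the bulk load so the insert itself is not slowed down:
CREATE INDEX IX_MonthlyData_AccountRef
    ON dbo.MonthlyData (AccountRef);
```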
