SQL Server Performance

slow bulk insert perf on 2.5 gb txt file

Discussion in 'Performance Tuning for DBAs' started by sajosh, Dec 6, 2002.

  1. sajosh New Member

    a new bulk insert project (project2) is taking far too long to complete. we're loading 2.5 gigs from a tilde-delimited txt file, and it's taking 2.5 hrs. as a benchmark, we have an existing project (project1) that loads 2.7 gigs in 40 mins.

    we suspect the slowness is due to project2 having more columns, the large decimal data type, and/or fewer rows fitting per page. we're trying to test this.

    does anyone have any clues about this slowness, or any ideas on the three suspects in the previous paragraph?

    here is a checklist of the stats (project1 vs. project2) ...

         item                        project1          project2
    1.   server                      same              same
    2.   bulk insert from drive      same (D drive)    same (D drive)
    3.   bulk insert into drive      same (G drive)    same (G drive)
    4.   data file size              2.7 gb            2.5 gb
    5.   stored proc to run BI       same              same
    6.   index creation with BI      no                no
    7.   BI parameters               same              same
    8.   primary key created by      bulk insert       bulk insert
    9.   logging                     off               off
    10.  constraint checking        off               off
    11.  data file pre-ordered      yes               yes
    12.  tempdb growth              normal            high (up to 10 gb)
    13.  columns                    16                34
    14.  predominant data types     smallint          decimal (19,2)
    15.  LOAD TIME                  40 MINS           2.5 HRS
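
    for reference, here is a minimal sketch of the kind of BULK INSERT statement involved (the table name, file path, and key column are placeholders, not our real ones; the hints correspond to items 7-11 above):

    ```sql
    -- hypothetical sketch: object names and paths are made up
    BULK INSERT dbo.Project2Staging
    FROM 'D:\data\project2.txt'
    WITH (
        FIELDTERMINATOR = '~',   -- tilde-delimited source file
        ROWTERMINATOR   = '\n',
        TABLOCK,                 -- table lock allows a minimally logged load
        ORDER (LoadKey ASC)      -- data file is pre-ordered on the key (item 11)
    );
    ```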

  2. sajosh New Member

    oh, there's actually another problem: tempdb keeps growing in size until the dba clears it. any ideas why?
  3. sqljunkie New Member

    There are two Performance Monitor counters:

    Bulk Copy Rows/sec
    Bulk Copy Throughput/sec

    They are located in the Databases object.

    What do they report for the two different executions?
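
    If it's easier than watching perfmon live, the same numbers can also be read from inside SQL Server; a sketch, assuming the SQL Server 2000 master..sysperfinfo view:

    ```sql
    -- sample the bulk copy counters (SQLServer:Databases object) via T-SQL
    SELECT instance_name, counter_name, cntr_value
    FROM   master..sysperfinfo
    WHERE  counter_name IN ('Bulk Copy Rows/sec', 'Bulk Copy Throughput/sec');
    ```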

    What do you mean by #8, "primary key created by: bulk insert"?
  4. bradmcgehee New Member

    Based on what you described, and as you are suggesting, I would not expect to see this much difference between the two bulk loads, assuming that virtually everything else is identical. Obviously there must be some difference, and I don't think the different data types or the difference in the number of columns can explain most of the time gap. The goal now is to find the difference between the two bulk inserts, as there must be one.

    As rortloff has suggested, you need more data to help you find out what is going on. I would try Performance Monitor first, using the counters already suggested plus the standard counters used to look for hardware bottlenecks, and see if there are any obvious differences. If that doesn't work, I would run a Profiler trace of both bulk inserts to check for differences. One or the other of these will give you a good clue to what is happening.

    Also, tempdb should clear itself: temporary objects are dropped automatically when the connection that created them disconnects. Be sure to check that the connection used for the load is properly closed.
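
    If tempdb has already ballooned and you need the disk space back before the next server restart, one option (a sketch, not something to run routinely) is to shrink it manually:

    ```sql
    -- shrink tempdb, leaving roughly 10% free space in its files
    USE tempdb;
    DBCC SHRINKDATABASE (tempdb, 10);
    ```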

    Let us know what you find out.

    Brad M. McGehee
