SQL Server Performance Forum – Threads Archive

problems with import from access

Hi, I’m not entirely sure this is the right part of the forum, but here we go. I’m using a SQL Server 2005 Express and i have to import a large number of Access Databases on a regular basis. I have no control over the access databases, no control over data format, no control over indices, nothing. Most of the databases are small and no problem, but one of them is very large, around 1.5 GB. It’s all in 3 tables, one with 10 rows (a tag-table), one with ~4 million rows and the largest has around 27 million rows. What i need is to extract the data and transfer it to the sql server. Lots of transformations will have to be done, cast() and convert(). I do know that express has a limited capacity, this is also out of my control and it has to fit in there. Regular shrinking and truncation of the log is planned. The tables in access has the format: tagtable:
tagindex stringtable:
tagindex
value
datetime floattable:
tagindex
value
datetime This is the result of a system that outputs 17 pieces of information at the same time, so that i will have 17 rows in access, each with its own value and tagindex and a shared datetime. What i need to end up with is a row in sql server that has 17 fields and a datetime. I run this statement (names changed to protect the innocent!): INSERT INTO
DataLogTemp
SELECT
*, DATEADD(DAY, DATEDIFF(DAY, 0, Import.DateAndTime), 0) AS Dato
FROM
OPENROWSET(‘Microsoft.Jet.OLEDB.4.0’,
‘C

rojectscustomer estdatadatalogDatalogdatalog.mdb’;’Admin’;”,
‘SELECT
TagName, s1.TagIndex, TagType, TagDataType, DateAndTime, Millitm, Val, Status, Marker
FROM
(TagTable
LEFT OUTER JOIN (SELECT * FROM StringTable UNION ALL SELECT * FROM FloatTable) s1 ON TagTable.TagIndex = s1.TagIndex)
WHERE
s1.DateAndTime > DateAdd("yyyy", -1, Date())’) Import
WHERE
Import.DateAndTime > ISNULL((SELECT MAX(DateAndTime) FROM DataLog), ‘1-1-1900’)
This chokes on full load. I’ve tried doing it in 2 statements (taking out the union), with not much of an improvement. I plan on running this statement after (runs on less than full load): INSERT INTO
DataLog
SELECT DISTINCT
CONVERT(real, REPLACE(d0.val, ‘,’, ‘.’)),
CONVERT(real, REPLACE(d1.val, ‘,’, ‘.’)),
CONVERT(real, REPLACE(d2.val, ‘,’, ‘.’)),
CONVERT(nvarchar(100), REPLACE(d3.val, ‘,’, ‘.’)),
CONVERT(real, REPLACE(d4.val, ‘,’, ‘.’)),
CONVERT(real, REPLACE(d5.val, ‘,’, ‘.’)),
CONVERT(real, REPLACE(d6.val, ‘,’, ‘.’)),
CONVERT(real, REPLACE(d7.val, ‘,’, ‘.’)),
CONVERT(real, REPLACE(d8.val, ‘,’, ‘.’)),
CONVERT(real, REPLACE(d9.val, ‘,’, ‘.’)),
CONVERT(real, REPLACE(d10.val, ‘,’, ‘.’)),
CONVERT(real, REPLACE(d11.val, ‘,’, ‘.’)),
CONVERT(real, REPLACE(d12.val, ‘,’, ‘.’)),
CONVERT(real, REPLACE(d13.val, ‘,’, ‘.’)),
CONVERT(real, REPLACE(d14.val, ‘,’, ‘.’)),
CONVERT(real, REPLACE(d15.val, ‘,’, ‘.’)),
CONVERT(real, REPLACE(d16.val, ‘,’, ‘.’)),
d0.DateAndTime,
”,
DATEADD(DAY, DATEDIFF(DAY, 0, d0.DateAndTime), 0)
FROM
DataLogTemp d0
INNER JOIN DataLogTemp d1 ON d0.DateAndTime = d1.DateAndTime
INNER JOIN DataLogTemp d2 ON d0.DateAndTime = d2.DateAndTime
INNER JOIN DataLogTemp d3 ON d0.DateAndTime = d3.DateAndTime
INNER JOIN DataLogTemp d4 ON d0.DateAndTime = d4.DateAndTime
INNER JOIN DataLogTemp d5 ON d0.DateAndTime = d5.DateAndTime
INNER JOIN DataLogTemp d6 ON d0.DateAndTime = d6.DateAndTime
INNER JOIN DataLogTemp d7 ON d0.DateAndTime = d7.DateAndTime
INNER JOIN DataLogTemp d8 ON d0.DateAndTime = d8.DateAndTime
INNER JOIN DataLogTemp d9 ON d0.DateAndTime = d9.DateAndTime
INNER JOIN DataLogTemp d10 ON d0.DateAndTime = d10.DateAndTime
INNER JOIN DataLogTemp d11 ON d0.DateAndTime = d11.DateAndTime
INNER JOIN DataLogTemp d12 ON d0.DateAndTime = d12.DateAndTime
INNER JOIN DataLogTemp d13 ON d0.DateAndTime = d13.DateAndTime
INNER JOIN DataLogTemp d14 ON d0.DateAndTime = d14.DateAndTime
INNER JOIN DataLogTemp d15 ON d0.DateAndTime = d15.DateAndTime
INNER JOIN DataLogTemp d16 ON d0.DateAndTime = d16.DateAndTime
WHERE
d0.DateAndTime > ISNULL((SELECT MAX(DateAndTime) FROM DataLog), ‘1-1-1900’)
AND d0.TagIndex = 0
AND d1.TagIndex = 1
AND d2.TagIndex = 2
AND d3.TagIndex = 3
AND d4.TagIndex = 4
AND d5.TagIndex = 5
AND d6.TagIndex = 6
AND d7.TagIndex = 7
AND d8.TagIndex = 8
AND d9.TagIndex = 9
AND d10.TagIndex = 10
AND d11.TagIndex = 11
AND d12.TagIndex = 12
AND d13.TagIndex = 13
AND d14.TagIndex = 14
AND d15.TagIndex = 15
AND d16.TagIndex = 16 Any kind of help will be much appreciated… I’m going nuts here :/

Perhaps you can drop indexes on the DataLogTemp table before inserting, then add them again after the insertion is finished. Second, why not do the inserts in batches of 1,000 rows, and repeat until all rows have been inserted? For this, you do need the proper index for the lookup on the target table. Third, why not link the target table from within Access, and let Access chew on it?

Hmm, i’ll try doing it in batches.

And also watch the transaction log contention in this case and keeping the update in batches will avoid such issues. Satya SKJ
Microsoft SQL Server MVP
Contributing Editor & Forums Moderator
http://www.SQL-Server-Performance.Com
This posting is provided AS IS with no rights for the sake of knowledge sharing.

Just an update. I rewrote the UNION ALL in the INSERT SELECT into SELECT
TagName, StringTable.TagIndex, TagType, TagDataType, DateAndTime, Millitm, Val, Status, Marker
FROM
TagTable
INNER JOIN StringTable ON TagTable.TagIndex = StringTable.TagIndex
UNION ALL
SELECT
TagName, FloatTable.TagIndex, TagType, TagDataType, DateAndTime, Millitm, Val, Status, Marker
FROM
TagTable
INNER JOIN FloatTable ON TagTable.TagIndex = FloatTable.TagIndex It appears that Access completely botched it and went utterly berserk on my disk drive, the new version doesn’t do that. What i also did was rewrite it as a SELECT INTO instead of an INSERT SELECT. The transaction log was _not_ a happy camper with the INSERT SELECT. I also set recovery to simple, which resultet in the INSERT SELECT not using the transaction log at all. With simple recovery and the INSERT SELECT it was generating roughly 12 bytes of log for every byte of data entered. Ouch. The second SELECT INTO is running as i’m writing this, i did some index juggling and the execution plan looks ok, if it isn’t done by tomorrow, i’ll have to look into that some more i guess.

Also keep an eye on TEMPDB usage in this case, it varies. Satya SKJ
Microsoft SQL Server MVP
Contributing Editor & Forums Moderator
http://www.SQL-Server-Performance.Com
This posting is provided AS IS with no rights for the sake of knowledge sharing.

]]>