SQL Server Performance

Merge Replication failed with error "Could not allocate space for object 'dbo.SORT temporary run storage......"

Discussion in 'SQL Server 2008 Replication' started by luckyjason, Mar 13, 2011.

  1. luckyjason New Member

    I am frastrated by this issue, hope here you guys can help me to figure out the reason and the solution. Thanks first, any information and reply are welcome.
    This problem and error happened in one of our client environments during the snapshot initialization for a merge replication subscription.
    The Hardware and Platform:
    4 X Intel Xeon 2.4GHz quad core CPUs
    24GB RAM
    Windows Server 2008 R2 Enterprise 64-bit
    SQL Server 2008 R2
    Disk Driver N is a SAN disk with 200GB space.

    We create database data and log file on disk N, create subscription for Merge Repication with Web Synchronization.

    The database need to be replicated are around 95GB.
    We started the replication agent job, all went well for the first 2 hours, we see snapshot files are applying, data has been bulk copied to subscriber database, later the agent job applied scripts in *.dri files to each tables (these *.dri files contains index creation scripts to run on tables). Some tables are very large that have 9.7GB size for data and 100M rows.
    Finally the replication failed during snapshot propagation. We have collect the verbose log file. The error shows:
    2011-03-11 00:12:50.707 OLE DB Subscriber 'ORANTESTSQL1TESTSQL2008': CREATE NONCLUSTERED INDEX [IDX_Device_XXXX] ON [dbo].[Device]
    (
    [XXXX] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
    2011-03-11 00:22:18.415 OLE DB Subscriber 'XXXXTESTSQL1TESTSQL2008': sp_MSreleasesnapshotdeliverysessionlock
    2011-03-11 00:22:18.562 The schema script 'Device_25.dri' could not be propagated to the subscriber.
    2011-03-11 00:22:18.836 OLE DB Subscriber 'XXXXTESTSQL1TESTSQL2008': {call sys.sp_MSadd_merge_history90 (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)}
    2011-03-11 00:22:18.928 Percent Complete: 100
    2011-03-11 00:22:18.934 The schema script 'Device_25.dri' could not be propagated to the subscriber.
    2011-03-11 00:22:19.075 Percent Complete: 0
    2011-03-11 00:22:19.081 Category:NULL
    Source: Merge Replication Provider
    Number: -2147201001
    Message: The schema script 'Device_25.dri' could not be propagated to the subscriber.
    2011-03-11 00:22:19.088 Percent Complete: 0
    2011-03-11 00:22:19.095 Category:NULL
    Source: Microsoft SQL Server Native Client 10.0
    Number: 1105
    Message: Could not allocate space for object 'dbo.SORT temporary run storage: 423234189459456' in database 'XXXXMirror' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.
    2011-03-11 00:22:19.308 Disconnecting from OLE DB Subscriber 'ORANTESTSQL1TESTSQL2008'
    Just after it failed, we check the N drive, it seems no free space cause this issue.
    The database data file size are around 70GB, the log file size is around 120GB.
    Based on the error message in log file, we can say it failed because the space on N driver has been used off. The last thing replication agent did is trying to create nonclustered index on table Device, this table is 9.7GB and 100M row count.
    The question is:

    Why the log file increased to nearly 120GB to create that index. You know, we also have another 2 environments have been setup follow the same steps and procedures. They both finished sucessfully, after done, the database file size is 95GB, the log file size is 23GB, we didn't do any shink on the log file. But why this client environment failed and the log file increased to 120GB, this is the point I cannot understand.
    In the log file, before create the index, We can find many other indexes have been created sucessfully, some of them are also on big tables. Each statement of creating indexes have "GO" between them. According to my understanding, the space of the log file should be reclaimed after each index being created.
    Anyone can tell me what's the reason cause the log file increase to 120GB in client environment while in our environment it is just 23GB.
    Is it possible the client SAN disk has wrong configuration? Showing 200GB free space while it is not?
    I really have no idea and don't know where to start checking this issue.
  2. ghemant Moderator

  3. luckyjason New Member

    Thanks ghemant for the link about the disk space requirement for creating index, it is helpful information to refresh my knowledge.
    The point of my question is why client environment's log size increase to 120GB while our environment's one never exceed 23GB, and all the procedures to setup the subscription and doing the initialization and replication are the same.
    Even the device table is big (9.7GB and 100M rows), but all the indexes (4 or 5) on this table only take no more than 12GB on our sucessful deployed environment). So why the client log file reach 120GB when only create one of the index on this table.
    And the database is new created on only for this replication subscription.
  4. satya Moderator

    There are multiple issues here as I can see:
    • Are you performing REINDEXING for all the tables in the database?
    • Is this a third party application database?
    • Do you know what are the active tables in terms of INSERt/DELETE/UPDATE operations?
    • What is the SLA for this application?
    • When was the last time the reindex job has completed without any errors?
    • When did this actual issue happened?
    If you are able to explain these pointers it will ideal to differentiate the issue to granular level.
  5. luckyjason New Member

    [quote user="satya"]
    There are multiple issues here as I can see:
    • Are you performing REINDEXING for all the tables in the database?
      • Is this a third party application database?
        • Do you know what are the active tables in terms of INSERt/DELETE/UPDATE operations?
          • What is the SLA for this application?
            • When was the last time the reindex job has completed without any errors?
              • When did this actual issue happened?
            • If you are able to explain these pointers it will ideal to differentiate the issue to granular level.
              [/quote]
              • Are you performing REINDEXING for all the tables in the database?
                -- This is new created database just for replication subscription setup, but I am not sure whether the client's SQL server has some job to reindexing this new create database ( I am not a database administrator, I am only a developer, so not very good at the server admin stuff. Is it possible there an schedule job to reindex on the new create database automatically?)
                • Is this a third party application database?
                  -- This is totally refresh and self-owned database just create before replication subscription setup.
                  • Do you know what are the active tables in terms of INSERt/DELETE/UPDATE operations?
                    -- What do you mean, the large script running on this database by the replication agent is creating an index on device table. There should not be Insert/Delete/Update operation on this database at that moment.
                    • What is the SLA for this application?
                      -- You mean SLA = Service Level Agreement. Not sure about this.
                      • When was the last time the reindex job has completed without any errors?
                        -- In fact, we are not doing reindex job, we are configuring replication subscription and doing the first initialization and sychronization. Those index creating script is totally from snapshot files generated by SQL server, not us.
                        • When did this actual issue happened?
                          -- When we are configuring replication subscription and doing the first initialization and sychronization, and after the agent job started running for nearly 3 hours. According to the log timestamp, it failed after start creating the index on device table and running for 9.5mins about that index creating statement.
  6. Adriaan New Member

    Db probably has "SIMPLE" recovery mode. As suggested, set up regular backups of the log, with truncation.

Share This Page