SQL Server Performance

SQL Server 2008 - multiple import processes simultaneously

Discussion in 'SQL Server 2008 General DBA Questions' started by Diamond2016, Aug 19, 2011.

  1. Diamond2016 New Member

    My question is regarding data modeling for the core functionality of my application.

    I have a scenario where multiple users will be running import processes, but each of them will be working for a different client. At the same time, a client's data could be displayed to a user and modified/inserted as well, while an import process for the same or a different client is running.

    I have 2 core tables which get the most hits whenever import processes run. I have 2 options now:
    1. To have 2 core tables and run the imports sequentially by queueing the import processes.
    Table 1 columns: ID, ClientID, SourceID, Count, AnotherCol1, AnotherCol2, AnotherCol3

    Table 2 columns: ID, ClientID, OrderID, Count, AnotherCol4, AnotherCol5, AnotherCol6


    2. To have 1,000 core tables, 2 for each client (I may have a maximum of 500 clients); this will allow the users to run their import processes simultaneously without waiting for one another.
    More information about the import process:
    1. These tables are not going to be used in any reporting.
    2. Each import process will insert 20k-30k records (7 columns) into each of these tables, and there will be around 40-50 such imports in a day.
    3. While an import process is going on, data could be retrieved from these tables by some other user and inserted or updated too.
    4. These are going to be among the most heavily used tables in the application.
    5. BULK INSERT will be used for insertion.
    6. The clustered index is on the primary key, which is an identity column (a rough DDL sketch follows after this list).
    7. We are considering table partitioning too.
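
    For illustration, here is a minimal DDL sketch of how Table 1 might look with that clustered identity primary key; the table name ImportTable1 and all data types are placeholders of my own, not the real schema:

        CREATE TABLE dbo.ImportTable1
        (
            ID          INT IDENTITY(1,1) NOT NULL,   -- identity primary key
            ClientID    INT          NOT NULL,
            SourceID    INT          NOT NULL,
            [Count]     INT          NOT NULL,
            AnotherCol1 VARCHAR(100) NULL,
            AnotherCol2 VARCHAR(100) NULL,
            AnotherCol3 VARCHAR(100) NULL,
            CONSTRAINT PK_ImportTable1 PRIMARY KEY CLUSTERED (ID)
        );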

    Can you please suggest which option is better and why?
    Also, if you suggest to go with option 2, then would it not be a performance hit to create so many tables in the database? Should we create a separate database for these 1000 tables in this case?
  2. Diamond2016 New Member

    Can someone please answer these questions?
  3. satya Moderator

    Welcome to the forums.
    What will the database growth be in the next 2 years (or 1 year)?
    Is there any provision for data archiving?
    Is this data used for any SELECT purpose?

    Rather than creating many tables, why not group the clients and create a table per group to make it more manageable? Table partitioning is a good idea for achieving scalability, and you must ensure the data is inserted with the partitioning column. By and large, having 1000+ tables for data insertion is a big problem to manage, unless you have no other way to group the data from those clients.
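
    As a rough sketch of what client-based partitioning could look like (the object names, boundary values, and filegroup mapping below are placeholders only):

        -- Boundary values are illustrative; pick them from real ClientID ranges
        CREATE PARTITION FUNCTION pfClientRange (INT)
            AS RANGE LEFT FOR VALUES (100, 200, 300, 400);

        -- Map all partitions to PRIMARY here for simplicity
        CREATE PARTITION SCHEME psClientRange
            AS PARTITION pfClientRange ALL TO ([PRIMARY]);

        -- The partitioning column must be part of the clustered key
        CREATE TABLE dbo.ImportTable1_Partitioned
        (
            ID       INT IDENTITY(1,1) NOT NULL,
            ClientID INT NOT NULL,
            CONSTRAINT PK_ImportTable1_Partitioned
                PRIMARY KEY CLUSTERED (ClientID, ID)
        ) ON psClientRange (ClientID);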
  4. Diamond2016 New Member

    The database will grow with each import process, which will insert 20k-30k records (7 columns) into each of these tables, and there will be around 40-50 such imports in a day.
    We have not thought of data archiving yet.
    Yes, the data is used for SELECT purposes. Please read the first couple of paragraphs in my post.

    Please let me know if this provided information helps in clearing my requirements.
  5. satya Moderator

    How about grouping those clients for table creation?
    From your explanation I don't see a problem with the data inserts; however, if the same data is being modified at the same time there might be a performance lag due to locking. You can address this by going through the locking hints; also, the inserts are handled by BULK INSERT, which is better for performance.
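
    For example, a minimal sketch of the two usual ways to reduce reader/writer blocking; the database name ImportsDb, table, and values are placeholders, and which option fits depends on your workload:

        -- Option A: readers see the last committed row version instead of blocking
        -- (the ALTER waits until no other connections are active in the database)
        ALTER DATABASE ImportsDb SET READ_COMMITTED_SNAPSHOT ON;

        -- Option B: a query-level hint; READPAST skips rows locked by the import
        DECLARE @ClientID INT = 42;   -- example value
        SELECT ID, ClientID, SourceID, [Count]
        FROM dbo.ImportTable1 WITH (READPAST)
        WHERE ClientID = @ClientID;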
  6. Diamond2016 New Member

    Hi Satya

    What I realized is that importing 40k-50k records (with 10-15 columns) into a staging table using BULK INSERT takes 2-3 seconds.

    Once the data is in staging, I can transform it and then use SQL Server 2008's MERGE statement to apply the changes to my main table, which also does not take long (2-3 seconds) even on a heavily loaded database.

    Hence, BULK INSERT->Transformations->MERGE would work for me.

    Anyway, thanks for giving attention to my problem.

    You can still provide your comments on my above method.
  7. Diamond2016 New Member

    Hi Satya

    To give you more insight into my database schema so that you can address the issue properly, please have a look at the details below:


    1. Firstly, please have a look at the database design below.
    [image: database design diagram]

    · MS_HospitalTrusts - This refers to the actual trust.
    · MS_Hospitals - TrustID is the foreign key in this table. There will be many hospitals under one trust, hence the 1-n relationship.
    · MS_HospitalUnits - HospitalID is the foreign key in this table. There will be 3-4 units under one hospital, hence the 1-n relationship.
    · MS_HospitalDepartments - UnitID is the foreign key in this table. There will be 30-40 departments under one unit, hence the 1-n relationship.
    · MS_HospitalPatients - DepartmentID is the foreign key in this table. This is the table into which the actual import has to happen. There will be 20K-30K records/import, i.e. per department, hence the 1-n relationship. (A minimal DDL sketch of this hierarchy follows point 2 below.)


    2. Secondly, the data in the MS_HospitalPatients table is going to be of the same format for all trusts.
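
    To make the hierarchy concrete, here is a minimal DDL sketch of these tables showing only the key columns; PatientID and the data types are my placeholders, and the real tables have more columns:

        CREATE TABLE dbo.MS_HospitalTrusts      (TrustID      INT PRIMARY KEY);
        CREATE TABLE dbo.MS_Hospitals           (HospitalID   INT PRIMARY KEY,
            TrustID      INT NOT NULL REFERENCES dbo.MS_HospitalTrusts(TrustID));
        CREATE TABLE dbo.MS_HospitalUnits       (UnitID       INT PRIMARY KEY,
            HospitalID   INT NOT NULL REFERENCES dbo.MS_Hospitals(HospitalID));
        CREATE TABLE dbo.MS_HospitalDepartments (DepartmentID INT PRIMARY KEY,
            UnitID       INT NOT NULL REFERENCES dbo.MS_HospitalUnits(UnitID));
        -- Import target: 20k-30k rows per department per import
        CREATE TABLE dbo.MS_HospitalPatients    (PatientID    INT IDENTITY(1,1) PRIMARY KEY,
            DepartmentID INT NOT NULL REFERENCES dbo.MS_HospitalDepartments(DepartmentID));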

    Main Problem
    As there will be multiple imports running simultaneously, the system will at the same time be reading data (dirty reads) and displaying it in the UI, where users can make changes to the data. All of this will happen on the MS_HospitalPatients table.

    My Analysis
    What I analyzed is that importing 40k-50k records into a staging table using BULK INSERT takes 2-3 seconds.

    Once the data is in staging, I can transform it and then use SQL Server 2008's MERGE statement to apply the changes to my main table, i.e. MS_HospitalPatients, which should not take long even on a heavily loaded database.

    Hence, BULK INSERT->Transformations->MERGE seems to be the solution.
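
    As a rough sketch of that flow (the staging table, file path, natural key PatientNumber, and column SomeValue are placeholders, not the real names):

        -- 1. Bulk load the raw file into a staging table
        BULK INSERT dbo.Staging_HospitalPatients
        FROM 'C:\imports\patients.csv'
        WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2, TABLOCK);

        -- 2. After transformations, merge staging into the main table
        MERGE dbo.MS_HospitalPatients AS target
        USING dbo.Staging_HospitalPatients AS source
            ON  target.DepartmentID  = source.DepartmentID
            AND target.PatientNumber = source.PatientNumber
        WHEN MATCHED THEN
            UPDATE SET target.SomeValue = source.SomeValue
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (DepartmentID, PatientNumber, SomeValue)
            VALUES (source.DepartmentID, source.PatientNumber, source.SomeValue);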

    Questions for you
    1. Can you please comment on my proposed solution explained above? Should I go with one common table, i.e. MS_HospitalPatients, for all hospitals/units/departments, OR should I create a separate MS_HospitalPatients table for each trust (created when a new trust is added) and then use dynamic queries to insert/update/delete/select from that trust-specific table?


    2. Data from the MS_HospitalPatients table will be accessed by UnitID most of the time and by DataID sometimes. So, should the MS_HospitalPatients table be de-normalized by adding the UnitID field to it, to avoid joins with the MS_HospitalDepartments table which contains the UnitID? In either case, what should the clustered index on the MS_HospitalPatients table be, and why?


    3. On what basis (columns) should table partitioning be implemented on the MS_HospitalPatients table? Would it be a good idea to add another column, HospitalID, to the MS_HospitalPatients table and to partition on that column?


    4. What other techniques (like indexing or table partitioning) can be implemented to make the database design efficient and robust so that there are NO performance issues later on?

    Kindly provide your reply at the earliest.
  8. preethi Member

    In an earlier post, you mentioned that the data being inserted belongs to different clients. I am not clear where that fits into the actual scenario: TrustID, HospitalID, DepartmentID, or UnitID?
  9. Diamond2016 New Member

    Hi Preethiviraj

    Thanks for taking an interest in it. BTW, I have got the answer now.
  10. preethi Member

    Cool. Why don't you share it for the benefit of others?
