SQL Server Performance

Partitioned Views

Discussion in 'Performance Tuning for DBAs' started by ssmoot, Jun 2, 2005.

  1. ssmoot New Member

    Does a non-distributed paritioned view make sense?

    Say for example you have a table with a billion Orders. Orders have a DateTime, Description, Id, etc. There are say, 500 million records in the Orders table.

    The Orders table is clustered on DateTime, Id. Will splitting it up into seperate tables on the same server, in the same database and horizontally partitioning on the DateTime by month-year increase performance? (Say the month-year partitioning results in member tables that are each 50 million records in size.)

    If you could go into detail of why/why-not I'd appreciate it.

    My hunch is no. That as long as the DateTime is the first thing in your Where clause (SQL Server 2000 btw), the partitioned view is nothing more than a "second order cluster" since all source tables are on the same server, in the same database.

    Am I wrong?
  2. It will definitely increase the performance as the Cost of Data accession will be drastically reduced. Moreover I beleve the partition view is used among the linked Servers only. What you are doing here is Purging the Database in two or more than two parts to manage the Data accessability only.

    But by the process of splitting tables within database will result the occurences of number of sub tables which are also required to be tuned up with same index gear and Modification in all the related objects specially Stored Procedures and Views for data integrity will be a major issue. Beside manageability of all the newly created tables too will be some extra works.

    Regards.

    Arindam Ganguly
  3. ssmoot New Member

    quote:Originally posted by gangulyarindam

    It will definitely increase the performance as the Cost of Data accession will be drastically reduced. Moreover I beleve the partition view is used among the linked Servers only. What you are doing here is Purging the Database in two or more than two parts to manage the Data accessability only.

    This is actually concerning only a Partitioned View whose source tables are all on the same server, in the same database. This is not concerning Distributed Partitioned Views.

    Are you still of the same opinion? If so, why?

    Thanks.
  4. satya Moderator

    Check the constraints and rules that associated in the underlying tables for this view.

    Satya SKJ
    Moderator
    http://www.SQL-Server-Performance.Com/forum
    This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
  5. ssmoot New Member

    I think maybe the question is being misinterpreted. I'm not asking how to create a partitioned view, or wether a distributed partitioned view makes sense, but wether a non-distributed paritioned view make sense?
  6. tjjfv New Member

    quote:Originally posted by ssmoot

    Does a non-distributed paritioned view make sense?

    Yes, in some scenarios.

    The underlying separate tables can be placed into separate files, which can be placed on different storage systems. Thus a scenario might benefit from partitioning the orders into a recent and archive table, with the recent table on faster hardware (for example raid 1 for increased write performance) and with the archive file elsewhere (for example raid 5 for increased storage space).

    One important consideration (I believe the following reasoning is sound, though I may be in error): all of the logical indices on the logical table represented by the partitioned view would be comprised of indices on the underlying constrained tables. Thus all of the indices on the logical table would be inherently prefixed by the 'year-month'. If there is a need for indexing the logical table based upon a list of fields which does not include the inherent 'year-month', this would need to be achieved through other means, such as an indexed view.

    In general, I do not think simply dividing a table into subtables and creating a partitioned view would increase performance, and I think it could decrease performance, so there would most likely need to be a further justification for doing so, especially as there is additional development and managment complexity required for partitioned views.




  7. satya Moderator

    Exactly and also the constraints on the underlying tables have the consequences of reduced performance.

    Satya SKJ
    Moderator
    http://www.SQL-Server-Performance.Com/forum
    This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
  8. mmarovic Active Member

    Actually, if you successed to have index b-tree depth in partitiones lower then in original table, you may gain performance. Also, you can schedule maintenance tasks (e.g. index defragmentation) on each partition at different time which you possible may not perform during maintenence window or frequently enough on orignial table. That would indirectly help performance. However, it is not easy task to design indexes, queries and overall process to take advantage of partitioning.
  9. akus New Member

    I tested similar:

    table with 10 million records
    which is using datetime as partition criteria (approgimately 1M recorde per year)
    all same indexes are applied to each partitioned table. all partitioned tables are within same database.

    TEST 1.

    SELECT SUM(kmstravelled)
    FROM LOADS
    WHERE internaloperatortime < '2004-11-21'
    AND internaloperatortime > '2004-01-21'

    38 seconds

    SELECT SUM(kmstravelled)
    FROM LOADS_partitioned
    WHERE internaloperatortime < '2004-11-21'
    AND internaloperatortime > '2004-01-21'

    20 seconds

    TEST 2.

    running Report which is not fitered by date.
    Running without partition: 1.14
    Running using partition: 2.20!!

    So using the partition for query with result solely from one fo tha partitions is 50% quicker. BUT if you are running queries which don't take advantage of the partition criteria there is a HEAVY penalty (in our case it means that we cannot use this stragety).



  10. tjjfv New Member

    quote:Originally posted by akus@mail.com

    I tested similar:
    table with 10 million records
    which is using datetime as partition criteria (approgimately 1M recorde per year)
    all same indexes are applied to each partitioned table. all partitioned tables are within same database.

    ...

    So using the partition for query with result solely from one fo tha partitions is 50% quicker.
    The improvement in peformance between your two tests seems to high to represent solely the improvement of a lower index depth as a result of partitioning. How many partitions did you use? Was there an index on the search criteria on the original and partitioned tables? Did you clear the data cache before each execution?
  11. mmarovic Active Member

    quote:for example raid 1 for increased write performance
    Can you please elaborate that? I believe raid 1 (mirroring) improves read performance. Raid 0 is used to improve write performance.
  12. tjjfv New Member

    quote:Originally posted by mmarovic


    quote:for example raid 1 for increased write performance
    Can you please elaborate that? I believe raid 1 (mirroring) improves read performance. Raid 0 is used to improve write performance.
    I was specifically referring to Raid 1 versus Raid 5, as they both involve redundancy for the protection of to stored data.

    Raid 0, which has no redundancy, has twice the write performance of Raid 1, but only equivalent read performance (assuming the same number of disks are used). Raid 5 has equivalent read performance as 0 and 1 (again, assuming the same number of disks are sued), but has less write performance than Raid 1 (due to the need to read/write parity).
  13. akus New Member

    tjjfj,

    -data is spread to 10 partitioned tables

    -non-clustered index exists on both partitioned and non partitioned tables for the filter column

    -cache not refreshed between runs (took average of 3 runs which each were within 1 second)

    Only explanation I have is that the old table is less sequential and new ones (partitioned ones) have been created more sequentially (in bulk for this test purpose) - I mean less disk i/o to get to data. Would you say this is valid??

    Regardless, the performance of joining column other than the date, even when join column have indexes on them, is ridiculously poor. Maybe in your case it makes sense as your recordsets are considerably larger.

    Do you have any test results you can share? If not, are you going to upgrade without testing this first? If so, I recon you have considerably larger balls than me!!

  14. mmarovic Active Member

    I was refering to different raid solution against "non-raid". Raid 1 improves reads because data can be read from either mirrored disk. Writting cost is the same as non-raid solution, so it doesn't improve write performance. Raid 5 should have worst write performance because more data are written (it has to maintain parrity). Raid 0 writes chunks of data on several disks in parallel so it improves write performance.
  15. mmarovic Active Member

    Akus, I have implemented partitioned views 5 years ago. The problem was that we had to import daily 30 million rows. That was a hell of work. We didn't have large enough maintenance window for any maintenance task. Data were used for fraud detection, they were source for data marts and they were even used in call centre. I don't remember many details anymore. We successed because we designed queries to allways include data range in conditions. We also restricted maximal data range that can be used. Data were allways added to the last partition. When the new partition had to be added we created new table, recompiled partition view to include new one and exlude the oldest one, oldest one was then moved to archive diskspace. Krajdba (I think) did it a few months ago, I guess he still administers that server so he can possibly share much more details.
  16. tjjfv New Member

    quote:Originally posted by mmarovic

    I was refering to different raid solution against "non-raid". Raid 1 improves reads because data can be read from either mirrored disk. Writting cost is the same as non-raid solution, so it doesn't improve write performance. Raid 5 should have worst write performance because more data are written (it has to maintain parrity). Raid 0 writes chunks of data on several disks in parallel so it improves write performance.
    If you are comparing the raid solutions against non-raid, then both Raid 0 and Raid 5 also have better read performance than non-raid as reads can be done in parallel from the multiple disks used.

    However, if the raid solutions are being compared against non-raid solutions that involve the use of multiple disks (either spanning, or multiple filesets), then assuming a fully distributed load, the non-raid solutions will have equivalent read performance to the raid solutions, though often loads are not going to be fully distributed, and thus the non-raid solutions will not acheive equivalent read performance.
  17. mmarovic Active Member

    Yes, Raid 0 does read faster because of the reason you mentioned. However, I'm not quite sure about raid 5. I believe read performance is pretty much the same. When you read data from two disks it is not because you read in parallel parts of information, it is because you read parity data. However I'm not quite sure how it actually works, so I would like to hear from someone who knows for sure either because he tested or read somewhere test results or because he knows better how raid 5 works.
  18. mmarovic Active Member

    I overlooked it reads from multilple disks. If it is the way it works I can imagine the more disks are involved read is faster.
  19. tjjfv New Member

    quote:Originally posted by mmarovic

    However, I'm not quite sure about raid 5. I believe read performance is pretty much the same. When you read data from two disks it is not because you read in parallel parts of information, it is because you read parity data.

    quote:The parity data is not read, as it is not needed.

    However I'm not quite sure how it actually works, so I would like to hear from someone who knows for sure either because he tested or read somewhere test results or because he knows better how raid 5 works.
    A very good refernce in which you can confirm this is "Microsoft SQL Server 2000(TM) Performance Tuning Technical Reference" http://www.amazon.com/exec/obidos/ASIN/0735612706/ref=sib_rdr_dp/103-0063622-2766248

    You can view the pages on comparing raid level performance: http://www.amazon.com/gp/reader/073...oxTa7f+W2WatQj3odG+lCyFEMhbnOqPc=#reader-page


Share This Page