SQL Server Performance

paging and multithreading for faster search

Discussion in 'ALL SQL SERVER QUESTIONS' started by aiza, Oct 25, 2013.

  1. aiza New Member

    hi,i am new to database..i am using sql server 2008 R2 for my project that will have 1 terabyte data..now i am trying to search some records from this huge data. As i want to extract data from a large table in small time,i have learnt that I would first have to look at paging as i would have to split work among threads or runnable interfaces. So Thread 1 handles pages 0 to 10, Thread 2 handles pages 11 to 20, etc..I want to know exactly how i can i use this approach for my search query..how to page database and how to make threads to do searching in different page ranges?Small example would be appreciated..Thanks in advance
  2. davidfarr Member

    Your requirement of "extract data from a large table in small time" is not uncommon, most database users and admins require this every day.
    For this reason; multithreading and paging is already integrated into the SQL Server Query engine.

    As a database admin, you have the option to set the degree of multithreading in all query executions;
    exec sp_configure 'max degree of parallelism'
    As a user; you have the option of adding a query hint to each specific query such as;
    OPTION (MAXDOP 2) ...to specify 2 threads when executing the query, up to a maximum of 1024 threads. The query engine will never use more threads than it needs. Even if you specify 1024 threads in a query execution, the engine may determine that only 8 are necessary.

    However, users and developers typically should not concern themselves with this feature because the SQL Server query engine by default will use as many threads as it needs to optimize the execution performance of each query.
    SQL Server does not make provision for manually specifying which thread should execute which data page. Such things are inherently handled by the query engine.

    If you are executing the queries from a programmed client application, such as something written in .NET, then it is up to you if you want to attempt asynchronous data queries from within that client-side code. For example; you may have your application executing one query for all financial transactions for the month of July while another query simultaneously executes for the month of August.

    As far as SQL Server itself goes; you can have limited control over multithreading (and I advise that you allow the query engine to determine the threading) and no control over which data pages each thread will process.
  3. aiza New Member

    I have integrated netbeans 7.2 with sql server..currently "search query" written in netbeans retrieves required results from sql table(whole)and displays them in netbeans console.
    now the below query written in sql is used for paging:
    declare @MaxThreadCount INT;
    declare @CurrentThreadIndex INT;
    set @MaxThreadCount=3;
    set @CurrentThreadIndex=0;
    WHILE(@CurrentThreadIndex<=@MaxThreadCount)
    BEGIN
    set @CurrentThreadIndex=@CurrentThreadIndex+1;
    declare @rowsPerPage as bigint;
    declare @pageNum as bigint;
    declare @RowCount as bigint;
    SELECT @RowCount = COUNT(*)
    FROM cdr
    SET @rowsPerPage = ((@RowCount / @MaxThreadCount) + 1);
    set @pageNum=@MaxThreadCount;

    With SQLPaging As (
    Select Top(@rowsPerPage * @pageNum) ROW_NUMBER() OVER (ORDER BY ID asc)
    as resultNum, *
    FROM cdr )
    SELECT *
    FROM SQLPaging
    WHERE resultNum BETWEEN (@CurrentThreadIndex - 1) * @rowsPerPage + 1
    AND @CurrentThreadIndex * @rowsPerPage
    ORDER BY ID ASC

    END

    The last select statement gets some part of table..now i want that each of these parts to be handled by a different thread to execute that "search query" and so that at the end whole table is searched(all of that should happen in netbeans code)..i want to know if this can work?will this improve performance and if yes..how to include threads here?
  4. davidfarr Member

    You are using a strange definition of "paging".
    In SQL server terms; a single data page is only 8kb in size, and typically carries less than 4 data rows per page, depending on the data types within the row. A large data table has millions of pages, not 3.

    In your query example, you appear to be simply dividing the total table rows by 3 and expecting that 3 queries will simultaneously and independently search each "chunk" without contesting each other. This is unlikely to offer any performance improvement. It may even be slower due to disk I/O contention (locks, latches).
    SQL Server disk I/O reads 8kb pages from disk, not individual rows. Some of the data rows from your "Page 1" might exist in the same 8kb data page as rows from your "Page 2" or "Page 3", and so running 3 queries together can cause disk I/O contention (locking, latches) as all 3 queries attempt to read from the same 8kb pages, thereby slowing performance on all queries.

    In my opinion; You are overthinking something that has already been anticipated and catered for by the designers of SQL Server. You are not the first person who has needed to query a large table.
    The key to fast single query performance on a large table is adequate server RAM, fast RAID (striped) disks, correct and unfragmented indexes, correct (small) data types in the table design, and sometimes transaction isolation to minimize locks and blocks. Partitioning might also be a consideration on very large tables.
    The rest is up to the query engine to optimize the I/O process; to determine the required number of threads for the query, and to determine which data pages to read.
    In my opinion, your "3 chunk - 3 threads" idea above will not improve overall query performance.
  5. aiza New Member

    public List <String> called=new ArrayList();
    System.out.println("Enter your required number: ");
    caller = user_input.next( );
    String SQLL = "SELECT called_id FROM cdr WHERE primary_caller_id = ?";

    PreparedStatement stmts =con.prepareStatement(SQLL);
    stmts.setString(1,caller);
    rs=stmts.executeQuery();
    while (rs.next()) {

    called.add(rs.getString("called_id"));
    }

    cdr //table of 1 terabyte data
    So do you mean if i want to get the results from this query written in netbeans, this simple way is the fastest one and what really matters is the changes in hardware? or if there is some better way can u guide me towards that in detail please?thanx
  6. davidfarr Member

    I'm not much of an expert on netbeans, but I do believe that a simple, straight-up query on the [cdr] table is the better way to go.

    Your example query above is;
    SELECT called_id FROM cdr WHERE primary_caller_id = ?;
    The column for [primary_caller_id] should therefore be indexed in the table (if not already the primary key), and if that table has daily inserts or updates of new data then that column index should undergo a periodic (weekly) defrag or reindexing to keep it optimized.

    Only change hardware if its practical and affordable for you to do so. SQL Server queries will still execute on basic hardware (meeting minimum requirements for installation) , but will naturally be faster on better hardware, especially a fast disk system. The balance between hardware expense/effort and expected query performance is yours to decide.

    It may also be a good idea to read up on table partitioning;
    http://technet.microsoft.com/en-us/library/ms178148(v=sql.105).aspx
    These are methods of optimizing maintenance and performance on very large tables by splitting them into 'horizontal' or 'vertical' sections.
    Queries on smaller tables are generally faster.

    To simulate horizontal partitioning, you could split the [cdr] table into [cdr1], [cdr2], [cdr3], etc.
    and have [primary_caller_id] values partitioned between them based on their value range.
    When you run a search query on a [primary_caller_id] value, you could do something similar to;
    SELECT CASE primary_caller_id
    CASE <= 10000000
    SELECT called_id FROM [cdr1] WHERE primary_caller_id = ?;
    CASE 10000001 to 20000000
    SELECT called_id FROM [cdr2] WHERE primary_caller_id = ?;
    CASE 20000001 to 30000000
    SELECT called_id FROM [cdr3] WHERE primary_caller_id = ?;
    etc.....

Share This Page