Indexes and inserts on a large flat table

Last post 09-03-2008 8:52 AM by martins. 7 replies.
Page 1 of 1 (8 items)
Active Topics My Discussions Unanswered Sort Posts: Previous Next
  • 09-03-2008 6:45 AM

    Indexes and inserts on a large flat table

    I've a question or two about performance issues with a large flat table.

    It looks something like this:
    id | col2 | col3 | col4 | data

    id is unique. col2/3/4 will be used as select constraints.

    The only select query that will be run is SELECT data FROM table WHERE col2=a AND col3=b AND col4=c.

    1) Is there any reason not to index col2/3/4?

    2) When inserting I will have a lot of duplicate data, based on the id, which I don't want inserted (and can't be because id is constrained as unique). Is the fastest way to insert a load of data, which may contain unwanted duplicates, to just go and try and insert all the data and let it reject data which has the same id?
    Or would it be better to first check if the row exists and then attempt to insert it? ie.
    SELECT COUNT(*) FROM table WHERE id = x
    [if result = 0]
    INSERT ...
    [else]
    move on to next.

    Typically I will have say, 100k rows to insert, 30k of which are unwanted duplicates.

    3) The table will be in the order of 60million+ rows. col2 will only have maybe 5 different values. Provided it is indexed, will it be just as fast to leave all the data in that single table over creating 5 different tables and choosing the correct table programmatically before running a query? Basically, even if the table were 200mill rows, will SELECT (*) FROM table WHERE col2 = x LIMIT 1 take a miniscule amount of time, if there are only 5 values of col2?
  • 09-03-2008 7:56 AM In reply to

    • martins
    • Top 75 Contributor
    • Joined on 08-06-2007
    • South Africa
    • Posts 258

    Re: Indexes and inserts on a large flat table

    Hi,

    To answer your questions:

    1) I don't think there will be any reason not to index col2/3/4...unless you want to scan the whole table everytime the query is run.

    2) You should definitely handle unwanted duplicates. If they are true duplicates, a "select distinct" will help to eliminate. Otherwise it really depends on where your data comes from and whether you are inserting one row at a time (hope not)

    3) If col2 will only have 5 distinct values, filtering based on that column might also take a while to complete because there will be many records with that value for col2. Splitting it out into separate tables might be an option, but I think views (indexed views if you can) might be a better choice. Have a look at performance with an index on col2 before creating views though. If performance is good enough then it will not be necessary to even bother with views.

    Hope this helps.

  • 09-03-2008 8:09 AM In reply to

    Re: Indexes and inserts on a large flat table

    Thanks for that.

    What is the problem letting the database handle rejecting duplicate inserts? Shouldn't it internally see that there is a unique constraint on the id column and itself search to see if the row exists before inserting? Is it just bad DB design if it does not do this or is there some other reason that I'm missing?

  • 09-03-2008 8:14 AM In reply to

    • martins
    • Top 75 Contributor
    • Joined on 08-06-2007
    • South Africa
    • Posts 258

    Re: Indexes and inserts on a large flat table

    Unless I am mistaken, the whole insert statement will be rolled back if one row fails based on a constraint...meaning that no rows will be inserted depending on how you are currently inserting the records. That is the reason why you should handle it...

  • 09-03-2008 8:24 AM In reply to

    Re: Indexes and inserts on a large flat table

    Ok. Each row is inserted by seperate insert statement. That being the case do you see any downside to letting the DB hande the duplicates?

     Regarding inserting multiple rows in one statement. Do you mean to achieve this by using UNION? I've never got this to work for large inserts (more than a few hundred rows) so tend to stay away from it.

  • 09-03-2008 8:32 AM In reply to

    • martins
    • Top 75 Contributor
    • Joined on 08-06-2007
    • South Africa
    • Posts 258

    Re: Indexes and inserts on a large flat table

    If you have separate insert statements for every record, then there is no downside to letting the database handle it.

    Using a UNION will not achieve anything different with what you are doing now (and I won't recommend it), but inserting with separate insert statements will take some time especially if you have a lot of data to insert.

    Why not use a bulk insert or SSIS package to insert the data from its source? This all depends however on what your source is...

  • 09-03-2008 8:43 AM In reply to

    Re: Indexes and inserts on a large flat table

     Great, thanks for your help.

  • 09-03-2008 8:52 AM In reply to

    • martins
    • Top 75 Contributor
    • Joined on 08-06-2007
    • South Africa
    • Posts 258

    Re: Indexes and inserts on a large flat table

    cool man, no problem.

Page 1 of 1 (8 items)
Active Topics   My Discussions    Unanswered Posts


© 2000 - 2007 vDerivatives Limited All Rights Reserved.