DB in Memory Only? (Architecture Q) | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

DB in Memory Only? (Architecture Q)

Hi All, I was wondering if you might so kind to give me your thoughts on an SQL Server setup and performance issue? I#%92d be very grateful for anything you can contribute. Hopefully it’ll be an interesting thread.
The situation is this: My company is becoming an XML web service gateway for travel agents to search, book and package travel product from our 50+ suppliers. We have a travel agents website where they login and perform searches. Our middle tier receives their search criteria (dates, destinations, star rating, car hire and flight requirements) and then uses a multi threaded core to connect to all 50+ suppliers simultaneously. The challenge is, that all suppliers use incredibly different APIs and data structures (the travel industry is full of I.T. cowboys and no standards). To unify, compare, filter and sort this mish mash of data our application cleans and restructures this data and inserts this into our ‘standard#%92 table structure. These inserts vary a lot, when we get a good quality feed we usually do an xml insert into a temp table then select into our table structure. Some times we require a cursor and other times the data is so poor the middle tier has to be extracted using xml DOM, xPath, string pattern matching and then inserted row by row from the middle tier – great! (If it were simply well formed xml I#%92d consider BizTalk) Search wise, we estimate we#%92ll get about 200-2000 searches a minute which equals a lot of threads and a lot of inserts etc. Each search takes anything between 5-120 seconds as we#%92re only as quick as a slowest supplier so I also have to worry about total number of threads used at the middle tier. In SQL config terms parts of the database we want to be persisted (bookings info and searches made), and others are purely there as a search cache that needs to last as long as the users search session (5 mins ish). This brings me to my first specific question: would it be better performance wise to: a) use temp tables for each search (big over head of creating, populating, indexing and filtering the tables 2000 times a minute?)
b) use fixed shared cache tables (current config) in a separate database (separate to bookings tables that would be written to disk and backup etc) with minimal logging etc (is it possible to create an in memory database)? SQL version wise we have an old sql 2000 Ent Edition server licence (open licence Open Licence 5) which is valuable as we can use as many cpus as we want without extra license fees. Many thanks in advance, I look forward to hearing from you.

Hi ya, first things first, in terms of licensing I think you’ll find that any sql database which is exposed via a website needs to ensure that either you have a valid SQL client license for every possible client, OR you need a per processor license. Pretty certain that an OL5 doesn’t cut it… in terms of a front end for searching one way may be to have a OLAP database for your insertion of new products/updating existing products and a separate DS database for searching. This database can be created/updated periodically, as data is changed in the OLAP database whichever is appropriate for your need SQL does allow pinning of objects, but it is generally not recommended. The caching algorithm within SQL are about the best you’ll find… HOWEVER if your frontend has regular searches then it can sometimes make sense to save the result sets in an xml or html form and have the web application use that rather than going back to the database for every request and creating the xml/html itself. This depends largely on the number of repeated requests for exactly the same data. In .NET there are relatively easy ways to achieve this sort of thing Cheers
Twan
quote:SQL does allow pinning of objects, but it is generally not recommended. The caching algorithm within SQL are about the best you’ll find…
Yes, you’re likely to find that your "hot" pages are in the buffer pool anyway. —
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
Heute schon gebloggt?http://www.insidesql.de/blogs

Hi Guys, Thanks for the info. I had a reposnse from Tony from the UK SQL server user group too, so I’m going to post that, and then my response and have a thinak about them all.. Thanks again. Tony says: The shared table approach is probably the way to go, however, you will have blocking contention there, so may be partition it up so you have multiple tables and balance the load across; that will reduce locking contention and also give you the ability to put tables on separate disks if necessary etc… If they are truly temporary then I#%92d probably have them as permanent tables in tempdb to take advantage of tempdb#%92s logging (less that a normal database), you will need to recreate the tables on SQL starting up which can be done using a start up stored procedure for instance. Going out to all those feeds must be horrendous, is it true that if I do a query then all ‘x#%92 feeds are checked each time or do you cache results in your database and just pull data across in intervals? Tony.
Tony Rogerson SQL Server MVP

Hi Tony, Many thanks for the response. On the license front, I only mentioned it as I was not sure about what a colleague told me. My colleague thought we purchased 2 ‘server#%92 (‘is this called ‘per seat#%92?) licences for SQL2000 Ent edition (on ‘open licence 5#%92) before MS introduced CPU licensing. However I#%92ve just made him dig out the docs and it seems they are CPU licences – good job we only have 2 CPUs. Was there ever such a thing as ‘per seat#%92 licensing with unlimited CPUs? I see your point about the shared tables and blocking. How does the use of TempDB compare to creating a user database and setting the recovery model to ‘simple#%92 or ‘bulk-logged#%92? I take it ‘simple#%92 recovery model means the least logging and thus best performance? Would this be faster, slower or the same as tempdb assuming they were configured to use the same disks? If I used tempDB I#%92d have to create the structure several times with prefixed, or suffixed names to reduce blocking contention. But I#%92d be stuck with using one disk config for all tables as they#%92d all be on one database right? Our server has two volumes, one 5 disk raid 5, and one 4 disk raid 1-0. Currently our fully logged busy databases use the raid5 for data(mdf) and mirror-stripe for log (ldf). Our tempDB uses the mirror-stripe for both ldf and mdf. Is all of tempdb lost on service restart? Touch wood our server is very stable and has not had a restart in over a year. Would it be advisable to put temp db on a 4 disk stripe or RAM disk? I heard a rumour about RAM disks, do they exist and become a small but very fast hard disk? On the nightmare front – yes it is! The feeds provide live availability which is important. We currently cache a few airlines for 1 year a head and it takes 10Gb of data to be updated several times a day. To cache so many of the worlds flights and hotels etc would be too much data changing too frequently. I am however considering using the cached results where we get similar searches in a short time frame. Hope this isn#%92t a questions overload! Many thanks again, Andrew

Rereading closely through your thread again, I think you should consider hiring a consultant with proven experience in SQL Server (I think, Tony and Twan (in no particular order, except alphabetical) do consulting). Such questions can’t satisfyingly be answered in an online community. —
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
Heute schon gebloggt?http://www.insidesql.de/blogs

agree with Frank on this one… If your data is ultimately only used for searching then it may even be worth revisiting your db structure to ensure that it is optimised for searching. I’ve often come across situations where feeds are taken from various sources, put into a fully normalised db only to be pulled out in a severely denormalised fashion. In those cases it is sometimes worth storing the data in the denormalised form in the first place… Cheers
Twan
About blocking: You can use nolock hint on search tables anyway, so I don’t see it as a problem. If the same query is frequently executed multiple times during the day or some shorter period you can also think about caching query results in sql table or elsewhere.
Hi Guys, Thanks again. MMarovic: Yeah I’m using with nolock hints everywhere possible.
FrankKalis: I agree it’s quite wide open. I said to Tony I’d be happy to share my experiences with the user group and hopefully it will benefit others.
Twan: I see your point, but in my case the data from individual sources is awful. I need to get it into a good hierarchical structure just to work with it and make a good front end. Tony has added some more, so I#%92ll paste it in again with my latest response.

From Tony: Hi Andrew, For tempdb the transaction logging process does not need to log redo records because on restart of SQL Server there is no recovery of tempdb (unlike the other databases) so that reduces some of the load on the tempdb log; you could use file groups to place objects (on file groups) but in the same database… Create table x_1 ( myblah int ) on flg_1
Create table x_2 ( myblah int ) on flg_2 Etc… Sounds like you are ok with the licence. There are companies that do RAM disks; I#%92ve asked MS for what the support considerations are because I#%92ve not done it before. You have to consider caching, if you use # or preferably table variables then it#%92s not likely that the data will get written to disk anyway (if you have enough RAM on your machine), I think your temp tables are short lived yes? The memory used to create a RAM disk might well be best spent assigned to SQL Server anyway. How much memory is in your machine? And how is SQL Server using of it (through task monitor)? Tony. Tony Rogerson SQL Server MVP Hi Andrew, Here is what came back from Redmond…. – The contention for temp table scenarios like this is usually on the
catalog pages or allocation pages that are already in the buffer pool, so
not clear what they are hoping to gain by moving what is likely already in
memory to a RAM disk. They should try to find some way to test that it
actually helps their problem first. Might be that they just need more tempdb
files. Flushes of the log and data pages for tempdb are all async, so
ideally nobody is waiting on writes. – There are some sort and workfile scenarios where tempdb in RAM can help as
there are some synchronous reads/writes in those cases. Ideally, the SQL
engine will work to better understand these scenarios and just make better
use of caches in future releases so that RAM drives are not required, but
there will likely always be some extreme cases. If there are other
scenarios where the RAM disks seem to help a lot, we would like to better
understand them as it may just be a bug in our code where some function is
not special casing a tempdb IO to be async. – Make sure they have a disaster plan to move the tempdb files back off the
RAM disk if it fails or is unavailable at startup time for some reason. Hope that helps, I read it as try and stick with the cache, alright for some stuff the ramdisk will be better but that won#%92t always be the case in the future. Tony.
Hi Tony and the guys, Thanks for all the info! On the RAM disk front I was actually thinking more along the solid state disk route with only one temp db on that solid state volume. This way logging and data writes are very fast. I get what you mean about the file groups, and by the sounds of it tempDB has no recovery model, which has to be quicker than ‘simple#%92 or ‘bulk logged#%92. My server has 4GB of memory and is using about 2.7GB of it last check (looked on SQLCentral#%92s spotlight). So I think we#%92ve decided tempDB should be quicker than a user db with ‘simple#%92 recovery model. Just to clarify (I#%92ve only ever used tempDB indirectly by using #tables or tableVars), did you mean open tempDB and create our table structure in there, or did you mean use TempDB by way of #tables and tableVars? If you ment the former, is there a difference? I assume if you use a #table and you have plenty of available memory the table will only ever exist in memory and have no I/O overhead? If we use #tables and table vars will their being scoping issues? For example can we create all our tables as #temp (there are about 25 tables) create foreign key constraints, indexes and keys etc, the have the whole relational structure available to all connections until the server restarts? I’d keep an identical replica in a user db and copy bits over where logging, audeting and reporting my be required. Cheers, Andrew

Thanks for the feedback from Tony and Microsoft. You should really go for some consulting as in your case all aspects of the system must fit together. You cannot optimise the database, the hardware and the application isolated, but only integrated. And this is far beyond the scope of an online community. —
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
Heute schon gebloggt?http://www.insidesql.de/blogs

quote:Originally posted by knottronix
For tempdb the transaction logging process does not need to log redo records because on restart of SQL Server there is no recovery of tempdb (unlike the other databases) so that reduces some of the load on the tempdb log; you could use file groups to place objects (on file groups) but in the same database…
.
It is true that transaction logging process does not need to log redo records, but the question is, if it is really implemented differently then for common databases with a simple recovery model? I didn’t just get from this thread what is information, speculation or opinion. Anyway if there is a source (an article or a book) describing tempDB internals I would appriciate if anyone is willing to share it. I would really like to better understand how it works.
quote:
Anyway if there is a source (an article or a book) describing tempDB internals I would appriciate if anyone is willing to share it. I would really like to better understand how it works.
I guess the most you can, again, get out of Inside SQL Server. The rest, and I guess the more interesting part, is not available for us mere mortals, AFAIK. —
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
Heute schon gebloggt?http://www.insidesql.de/blogs

]]>