Memory leak using SQL FileStream - sql-server

I have an application that uses a SQL FILESTREAM to store images. I insert a LOT of images (several million images per day).
After a while, the machine stops responding and seems to be out of memory... Looking at the memory usage of the PC, we don't see any process taking a lot of memory (neither SQL nor our application). We tried killing our process and it didn't restore the machine... We then killed the SQL services and that didn't restore the system either. As a last resort, we even killed all processes (except the system ones) and the memory usage still remained high (we are looking at the Task Manager's Performance tab). Only a reboot does the job at that point. We have tried on Win7, WinXP, and Win2K3 Server, always with the same results.
Unfortunately, this isn't a one-shot deal, it happens every time.
Has anybody seen that kind of behaviour before? Are we doing something wrong using the SQL FILESTREAMS?

You say you insert a lot of images per day. What else do you do with the images? Do you update them, many reads?
Is your file system optimized for FILESTREAMs?
How do you read out the images?
If you do a lot of updates, remember that SQL Server will not modify the filestream object but create a new one and mark the old for deletion by the garbage collector. At some time the GC will trigger and start cleaning up the old mess. The problem with FILESTREAM is that it doesn't log a lot to the transaction log and thus the GC can be seriously delayed. If this is the problem it might be solved by forcing GC more often to maintain responsiveness. This can be done using the CHECKPOINT statement.
UPDATE: You shouldn't use FILESTREAM for small files (less than 1 MB). Millions of small files will cause problems for the filesystem and the Master File Table. Use varbinary instead. See also Designing and implementing FILESTREAM storage
UPDATE 2: If you still insist on using the FILESTREAM for storage (you shouldn't for large amounts of small files), you must at least configure the file system accordingly.
Optimize the file system for a large number of small files (use these as tips and make sure you understand what they do before you apply them):
Change the Master File Table reservation to maximum in the registry (FSUTIL.exe behavior set mftzone 4)
Disable 8.3 file names (fsutil.exe behavior set disable8dot3 1)
Disable last access update (fsutil.exe behavior set disablelastaccess 1)
Reboot and create a new partition
Format the storage volumes using a block size that will fit most of the files (2k or 4k depending on your image files).
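The three fsutil settings listed above could be collected in a small script so they are applied consistently across machines. This is only a sketch: the function name apply_ntfs_tuning and the dry-run default are assumptions made for illustration, and the commands themselves are exactly the ones quoted in the answer. Running them for real needs an elevated prompt on Windows.

```python
import subprocess

# The three settings quoted in the answer above; nothing else is assumed.
FSUTIL_COMMANDS = [
    ["fsutil.exe", "behavior", "set", "mftzone", "4"],
    ["fsutil.exe", "behavior", "set", "disable8dot3", "1"],
    ["fsutil.exe", "behavior", "set", "disablelastaccess", "1"],
]


def apply_ntfs_tuning(dry_run=True):
    """Return the command lines; execute them only when dry_run is False."""
    issued = []
    for cmd in FSUTIL_COMMANDS:
        issued.append(" ".join(cmd))
        if not dry_run:
            # Raises CalledProcessError if fsutil reports a failure.
            subprocess.run(cmd, check=True)
    return issued


if __name__ == "__main__":
    # Dry run: only print what would be executed.
    for line in apply_ntfs_tuning(dry_run=True):
        print(line)
```

The dry-run default is deliberate, since these settings affect the whole machine and only take full effect after the reboot and reformat steps mentioned above.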

In memory databases with LMDB

I have a project which uses BerkeleyDB as a key value store for up to hundreds of millions of small records.
The way it's used is all the values are inserted into the database, and then they are iterated over using both sequential and random access, all from a single thread.
With BerkeleyDB, I can create in-memory databases that are "never intended to be preserved on disk". If the database is small enough to fit in the BerkeleyDB cache, it will never be written to disk. If it is bigger than the cache, then a temporary file will be created to hold the overflow. This option can speed things up significantly, as it prevents my application from writing gigabytes of dead data to disk when closing the database.
I have found that the BerkeleyDB write performance is too poor, even on an SSD, so I would like to switch to LMDB. However, based on the documentation, it doesn't seem like there is an option for creating a non-persistent database.
What configuration/combination of options should I use to get the best performance out of LMDB if I don't care about persistence or concurrent access at all? i.e. to make it act like an "in-memory database" with temporary backing disk storage?
Just use MDB_NOSYNC and never call mdb_env_sync() yourself. You could also use MDB_WRITEMAP in addition. The OS will still eventually flush dirty pages to disk; you can play with /proc/sys/vm/dirty_ratio etc. to control that behavior.
From this post: https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/
vm.dirty_ratio is the absolute maximum amount of system memory that can be filled with dirty pages before everything must get committed to disk. When the system gets to this point all new I/O blocks until dirty pages have been written to disk.
If the dirty ratio is too small, then you will see frequent synchronous disk writes.

TempDB performance crawling; should we reboot?

A bit of background: We have 17 different TempDB database files and 6 TempDB log files on the server. These are spread out on different drives, but are hosted on 2 drive arrays.
I’m seeing Disk IO response times exceeding the recommended limits. Typically you want your disks to respond in 5-10ms, with nothing going over 200ms. We’re seeing random spikes up to 800ms on the TempDB files, but only on one drive array.
Proposed solution: Restart SQL server. While SQL server is shut down, reboot the drive array hosting the majority of the TempDB files. In addition, while SQL is down, redo the network connection to bypass the network switch in an attempt to eliminate any source of slowness on the hardware.
Is this a good idea or a shot in the dark? Any ideas?
Thanks in advance.
17? Who came up with that number? Please read this and this - very few scenarios where > 8 files will help, particularly if you only have 2 underlying arrays/controllers. Some suggestions:
Use an even number of files. Most folks start with 4 or 8, and increase beyond that only when they've proven that they still have contention (and also know that their underlying I/O can actually handle more files and scale with them; in some cases it will have no effect or the exact opposite effect - a different drive letter does not necessarily mean better I/O pathing).
Make sure all of the data files are sized the same, and have identical autogrow settings. Having 17 files with different sizes and autogrowth settings will defeat round robin - in a lot of cases only one file will be used due to the way SQL Server performs proportional fill. And having an odd number just seems... well, odd to me.
Get rid of your 5 extra log files. They are absolutely useless.
Use trace flag 1117 to make sure that all the data files grow at the same time and (because the files are sized identically, as recommended above) at the same rate. Note though that this trace flag applies to all databases, not just tempdb. More info here.
You can also consider trace flag 1118 to change allocation, but please read this first.
Make sure instant file initialization is on, so that the file doesn't have to be zeroed out when it expands.
Pre-size your tempdb files so that they don't have to grow during normal day-to-day activity. Do not shrink tempdb files because they suddenly got big - this is just a rinse and repeat operation, since if they got that big once, they'll get that big again. It's not like you can lease out the recovered space in the meantime.
When possible, perform DBCC CHECKDB elsewhere. If you're running CHECKDB regularly, yay! Pat yourself on the back. However this can take a toll on tempdb - please see this article on optimizing this operation and pulling it away from your production instance where feasible.
Finally, validate what type of contention you're seeing. You say that tempdb performance crawls, but in what way? How are you measuring this? Some info on determining the exact nature of tempdb bottlenecks here and here and here and here and here.
Have you considered making less explicit use of tempdb (fewer #temp tables, @table variables, and static cursors - or cursors altogether)? Are you making heavy use of RCSI, or MARS, or LOB-type local variables?
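To make the "same size, same autogrow" advice concrete, here is a minimal sketch that generates one ALTER DATABASE statement per tempdb data file. The logical file names, the 4096 MB size and the 512 MB growth increment are all hypothetical; look up the real logical names in sys.master_files and pick sizes that match your own workload.

```python
def tempdb_resize_statements(logical_names, size_mb, growth_mb):
    """Build one ALTER DATABASE statement per tempdb data file,
    giving every file the same size and the same growth increment."""
    return [
        f"ALTER DATABASE tempdb MODIFY FILE "
        f"(NAME = N'{name}', SIZE = {size_mb}MB, FILEGROWTH = {growth_mb}MB);"
        for name in logical_names
    ]


if __name__ == "__main__":
    # Hypothetical logical file names for a 4-file tempdb.
    for stmt in tempdb_resize_statements(
        ["tempdev", "temp2", "temp3", "temp4"], size_mb=4096, growth_mb=512
    ):
        print(stmt)
```

Generating the statements from one pair of numbers makes it hard to end up with files that differ in size or growth, which is the condition that defeats proportional fill.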

What is faster: database queries or file writing/reading?

I know that in normal cases it is faster to read/write from a file, but if I created a chat system:
Would it be faster to write and read from a file, or to insert/select data in a db and cache the results?
Database is faster. AND importantly for you, deals with concurrent access.
Do you really want a mechanical disk action every time someone types? Writing to disk is a horrible idea. Cache messages in memory. Clear the message once it is sent to all users in the room. The cache will stay small, most of the time empty. This is your best option if you don't need a history log.
But if you need a log....
If you write a large amount of data in 1 pass, I guarantee the file will smoke database insert performance. A bulk insert feature of the database may match the file, but it requires a file data source to begin with. You would need to queue up a lot of messages in memory, then periodically flush to the file.
For many small writes the gap will close and the database will pull ahead. Indexes will influence the insert speed. If thousands of users are inserting to a heavily indexed table you may have problems.
Do your own tests to prove what is faster. Simulate a realistic load, not a 1 user test.
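The "queue up messages in memory, then periodically flush to the file" idea could look roughly like this. It is a sketch only: the class name BufferedMessageLog and the flush_every threshold are invented for illustration, and it assumes messages contain no newlines and that losing the unflushed tail on a crash is acceptable.

```python
import os
import tempfile


class BufferedMessageLog:
    """Keep chat messages in memory and append them to a file in batches."""

    def __init__(self, path, flush_every=100):
        self.path = path
        self.flush_every = flush_every
        self.pending = []

    def add(self, message):
        self.pending.append(message)
        if len(self.pending) >= self.flush_every:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # One open/write/close per batch instead of one per message.
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write("".join(m + "\n" for m in self.pending))
        self.pending.clear()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "chat.log")
    log = BufferedMessageLog(path, flush_every=3)
    for i in range(7):
        log.add(f"message {i}")
    log.flush()  # write whatever is still buffered
    with open(path, encoding="utf-8") as fh:
        return fh.read().splitlines()
```

A larger flush_every means fewer disk writes but more messages at risk if the process dies, so the threshold is a trade-off to test under realistic load.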
Databases by far.
Databases are optimized for data storage which is constantly updated and changed as in your case. File storage is for long-term storage with few changes.
(even if files were faster I would still go with databases because it's easier to develop and maintain)
Since I presume your system would write/read data continuously (as people type their messages), writing them to a file would take longer time because of the file handling procedure, i.e.
open file for writing
lock file
write & save
unlock file
I would go with db.

Advice on using a web server as a cache

I'd like advice on the following design. Is it reasonable? Is it stupid/insane?
Requirements:
We have some distributed calculations that work on chunks of data that are sometimes up to 50 MB in size.
Because the calculations take a long time, we like to parallelize the calculations on a small grid (around 20 nodes)
We "produce" around 10000 of these "chunks" of binary data each day - and want to keep them around for up to a year... Most of the items aren't 50 MB in size though, so the total daily space requirement is more like 5 GB... But we'd like to keep stuff around for as long as possible (a year or more)... But hey, you can get 2 TB hard disks nowadays.
Although we'd like to keep the data around, this is essentially a "cache". It's not the end of the world if we lose data - it just has to get recalculated, which just takes some time (an hour or two).
We need to be able to efficiently get a list of all "chunks" that were produced on a particular day.
We often need to, from a support point of view, delete all chunks created on a particular day or remove all chunks created within the last hour.
We're a Windows shop - we can't easily switch to Linux/some other OS.
We use SQLServer for existing database requirements.
However, it's a large and reasonably bureaucratic company that has some policies that limit our options: for example, conventional database space using SQLServer is charged internally at extremely expensive prices. Allocating 2 terabytes of SQL Server space is prohibitively expensive. This is mainly because our SQLServer instances are backed up, archived for 7 years, etc. etc. But we don't need this "gold-plated" functionality because we can just recreate the stuff if it goes missing. At heart, it's just a cache, that can be recreated on demand.
Running our own SQLServer instance on a machine that we maintain is not allowed (all SQLServer instances must be managed by a separate group).
We do have fairly small transactional requirement: if a process that was producing a chunk dies halfway through, we'd like to be able to detect such "failed" transactions.
I'm thinking of the following solution, mainly because it seems like it would be very simple to implement:
We run a web server on top of a windows filesystem (NTFS)
Clients "save" and "load" files by using HTTP requests, and when processes need to send blobs to each other, they just pass the URLs.
Filenames are allocated using GUIDS - but have a directory for each date. So all of the files created on 12th November 2010 would go in a directory called "20101112" or something like that. This way, by getting a "directory" for a date we can find all of the files produced for that date using normal file copy operations.
Indexing is done by a traditional SQL Server table, with a "URL" column instead of a "varbinary(max)" column.
To preserve the transactional requirement, a process that is creating a blob only inserts the corresponding "index" row into the SQL Server table after it has successfully finished uploading the file to the web server. So if it fails or crashes halfway, such a file "doesn't exist" yet because the corresponding row used to find it does not exist in the SQL server table(s).
I like the fact that the large chunks of data can be produced and consumed over a TCP socket.
In summary, we implement "blobs" on top of SQL Server much the same way that they are implemented internally - but in a way that does not use very much actual space on an actual SQL server instance.
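A minimal sketch of the "write the file first, insert the index row last" ordering described above. To stay self-contained it uses a local directory in place of the web server and sqlite3 in place of SQL Server; the table name chunk_index, its columns, and the function names are all hypothetical.

```python
import sqlite3
import tempfile
import uuid
from datetime import date
from pathlib import Path


def save_chunk(root, db, data, day):
    """Write the blob first, then publish it by inserting its index row."""
    day_dir = Path(root) / day.strftime("%Y%m%d")  # e.g. "20101112"
    day_dir.mkdir(parents=True, exist_ok=True)
    path = day_dir / f"{uuid.uuid4()}.bin"
    # A crash during this write leaves an orphan file but no index row,
    # so readers never see a half-written chunk.
    path.write_bytes(data)
    with db:  # commits on success, rolls back on an exception
        db.execute(
            "INSERT INTO chunk_index (day, url) VALUES (?, ?)",
            (day.isoformat(), path.as_posix()),
        )
    return path


def chunks_for_day(db, day):
    rows = db.execute(
        "SELECT url FROM chunk_index WHERE day = ? ORDER BY url",
        (day.isoformat(),),
    )
    return [r[0] for r in rows]


def demo():
    root = tempfile.mkdtemp()
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE chunk_index (day TEXT, url TEXT)")
    day = date(2010, 11, 12)
    saved = save_chunk(root, db, b"payload", day)
    return saved, chunks_for_day(db, day)
```

Orphan files left by crashed producers are never referenced by the index, so a periodic sweep that deletes files with no matching row would be enough to reclaim their space.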
So my questions are:
Does this sound reasonable? Is it insane?
How well do you think that would work on top of a typical windows NT filesystem? - (5000 files per "dated" directory, several hundred directories, one for each day). There would eventually be many hundreds of thousands of files, (but not too many directly underneath any one particular directory). Would we start to have to worry about hard disk fragmentation etc?
What about if 20 processes are all, via the one web server, trying to write 20 different "chunks" at the same time - would that start thrashing the disk?
What web server would be the best to use? It needs to be rock solid, run on Windows, and be able to handle lots of concurrent users.
As you might have guessed, outside of the corporate limitations, I would probably set up a SQLServer instance and just have a table with a "varbinary(max)" column... But given that is not an option, how well do you think this would work?
This is all somewhat out of my usual scope so I freely admit I'm a bit of a Noob in this department. Maybe this is an appalling design... but it seems like it would be very simple to understand how it works, and to maintain and support it.
Your reasons behind the design are insane, but they're not yours :)
NTFS can handle what you're trying to do. This shouldn't be much of a problem. Yes, you might eventually have fragmentation problems if you run low on disk space, but make sure that you have copious amounts of space and you shouldn't have a problem. If you're a Windows shop, just use IIS.
I really don't think you will have much of a problem with this architecture. Just keep it simple like you're doing and things should be fine.

performance of web app with high number of inserts

What is the best IO strategy for a high traffic web app that logs user behaviour on a website and where ALL of the traffic will result in an IO write? Would it be to write to a file and overnight do batch inserts to the database? Or to simply do an INSERT (or INSERT DELAYED) per request? I understand that to consider this problem properly much more detail about the architecture would be needed, but a nudge in the right direction would be much appreciated.
By writing to the DB, you allow the RDBMS to decide when disk IO should happen - if you have enough RAM, for instance, it may be effectively caching all those inserts in memory, writing them to disk when there's a lighter load, or on some other scheduling mechanism.
Writing directly to the filesystem is going to be bandwidth-limited more-so than writing to a DB which then writes, expressly because the DB can - theoretically - write in more efficient sizes, contiguously, and at "convenient" times.
I've done this on a recent app. Inserts are generally pretty cheap (esp if you put them into an unindexed hopper table). I think that you have a couple of options.
As above, write data to a hopper table. If your application framework supports batched inserts, use them, as this will speed things up. Then every x requests, do a merge (via an SP call) into a master table, where you can normalize off data that has low entropy. For example, if you are storing the HTTP type of the request (get/post/etc), this can only ever be a couple of types, so it is better to store it as an Int and get improved I/O + query performance. Your master tables can also be indexed as you would normally do.
If this isn't good enough, then you can stream the requests to files on the local file system, and then have an out-of-band process (i.e. separate from the webserver) suck these files up and BCP them into the database. This will be at the expense of more moving parts and, potentially, a greater delay between receiving requests and them finding their way into the database.
Hope this helps, Ace
When working with an RDBMS, the most important thing is optimizing write operations to disk. Something somewhere has to flush() to persistent storage (disk drives) to complete each transaction, which is VERY expensive and time consuming. Minimizing the number of transactions and maximizing the number of sequential pages written is key to performance.
If you are doing inserts, sending them in bulk within a single transaction will lead to more efficient write behavior on disk, reducing the number of flush operations.
My recommendation is to queue the messages and periodically (say, every 15 seconds or so) start a transaction, send all queued inserts, and commit the transaction.
If your database supports sending multiple log entries in a single request/command, doing so can have a noticeable effect on performance when there is some network latency between the application and the RDBMS, by reducing the number of round trips.
Some systems support bulk operations (BCP), providing a very efficient method for bulk loading data which can be faster than the use of "insert" queries.
Sparing use of indexes and selection of sequential primary keys help.
Making sure multiple instances either coordinate write operations or write to separate tables can improve throughput in some instances by reducing concurrency management overhead in the database.
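The "queue, then commit everything in one transaction every 15 seconds" recommendation could be sketched like this. sqlite3 is used as a stand-in database, and the class name TimedInsertQueue, the events table, and the injectable clock parameter are all assumptions made for the example.

```python
import sqlite3
import time


class TimedInsertQueue:
    """Queue rows in memory and write them in one transaction per interval."""

    def __init__(self, db, interval=15.0, clock=time.monotonic):
        self.db = db
        self.interval = interval
        self.clock = clock  # injectable so the behavior can be tested
        self.pending = []
        self.last_flush = clock()
        self.commits = 0

    def add(self, user_id, action):
        self.pending.append((user_id, action))
        if self.clock() - self.last_flush >= self.interval:
            self.flush()

    def flush(self):
        self.last_flush = self.clock()
        if not self.pending:
            return
        with self.db:  # a single transaction, so a single commit
            self.db.executemany(
                "INSERT INTO events (user_id, action) VALUES (?, ?)",
                self.pending,
            )
        self.commits += 1
        self.pending.clear()


def demo():
    now = [0.0]  # fake clock: one request per simulated second
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (user_id TEXT, action TEXT)")
    queue = TimedInsertQueue(db, interval=15.0, clock=lambda: now[0])
    for i in range(40):
        now[0] += 1.0
        queue.add(f"user{i}", "click")
    queue.flush()  # drain what is left at shutdown
    rows = db.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    return rows, queue.commits
```

In this sketch the interval is only checked when a new row arrives, so a real service would also want a background timer calling flush() to cover quiet periods.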
Write to a file and then load later. It's safer to be coupled to a filesystem than to a database. And the database is more likely to fail than your filesystem.
The only problem with using the filesystem to back writes is how you extend the log.
A poorly implemented logger will have to open the entire file to append a line to the end of it. I witnessed one such case where the person logged to a file in reverse order, so that the most recent entries came out first, which required loading the entire file into memory, writing 1 line out to the new file, and then writing the original file contents after it.
This log eventually exceeded PHP's memory limit, and as such, bottlenecked the entire project.
If you do it properly however, the filesystem reads/writes will go directly into the system cache, and will only be flushed to disk every 10 or more seconds, ( depending on FS/OS settings ) which has a negligible performance hit compared to writing to arbitrary memory addresses.
Oh yes, and whatever system you use, you'll need to think about concurrent log appending. If you use a database, a high insert load can cause you to have deadlock conditions, and on files, you need to make sure that you're not going to have 2 concurrent writes cancel each other out.
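For contrast with the reverse-order logger described above, this is roughly what appending "properly" looks like: open in append mode so existing content is never read, and serialize concurrent writers. The class name AppendLog is invented, and the lock here only coordinates threads inside one process; separate processes would need OS-level file locking instead.

```python
import os
import tempfile
import threading


class AppendLog:
    """Append lines to a log file without ever reading what is already there."""

    def __init__(self, path):
        self.path = path
        self.lock = threading.Lock()

    def write(self, line):
        # Mode "a" positions every write at the end of the file, so the cost
        # of adding a line does not grow with the size of the log.
        with self.lock, open(self.path, "a", encoding="utf-8") as fh:
            fh.write(line + "\n")


def demo(threads=4, per_thread=50):
    path = os.path.join(tempfile.mkdtemp(), "app.log")
    log = AppendLog(path)

    def worker(n):
        for i in range(per_thread):
            log.write(f"t{n} line {i}")

    pool = [threading.Thread(target=worker, args=(n,)) for n in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    with open(path, encoding="utf-8") as fh:
        return fh.read().splitlines()
```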
The insertions will generally impact the (read/update) performance of the table. Perhaps you can do the writes to another table (or database) and have a batch job that processes this data. The advantage of the database approach is that you can query/report on the data, and all the data is logically in a relational database and may be easier to work with. Depending on how the data is logged to a text file, you could open up more possibilities for corruption.
My instinct would be to only use the database, avoiding direct filesystem IO at all costs. If you need to produce some filesystem artifact, then I'd use a nightly cron job (or something like it) to read DB records and write to the filesystem.
ALSO: Only use "INSERT DELAYED" in cases where you don't mind losing a few records in the event of a server crash or restart, because some records almost certainly WILL be lost.
There's an easier way to answer this. Profile the performance of the two solutions.
Create one page that performs the DB insert, another that writes to a file, and another that does neither. Otherwise, the pages should be identical. Hit each page with a load tester (JMeter for example) and see what the performance impact is.
If you don't like the performance numbers, you can easily tweak each page to try and optimize performance a bit or try new solutions... everything from using MSMQ backed by MSSQL to delayed inserts to shared logs to individual files with a DB background worker.
That will give you a solid basis to make this decision rather than depending on speculation from others. It may turn out that none of the proposed solutions are viable or that all of them are viable...
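The same "measure three variants" idea can be tried locally before setting up pages and a load tester. This is a rough single-threaded sketch, not a substitute for the JMeter test described above: sqlite3 stands in for the real database, and the function names and the hits table are hypothetical.

```python
import os
import sqlite3
import tempfile
import time


def time_it(fn, n):
    """Call fn(i) n times and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    for i in range(n):
        fn(i)
    return time.perf_counter() - start


def compare(n=200):
    tmp = tempfile.mkdtemp()
    db = sqlite3.connect(os.path.join(tmp, "log.db"))
    db.execute("CREATE TABLE hits (n INTEGER)")
    log_path = os.path.join(tmp, "hits.log")

    def db_insert(i):
        with db:  # one transaction (and commit) per request
            db.execute("INSERT INTO hits VALUES (?)", (i,))

    def file_append(i):
        with open(log_path, "a") as fh:
            fh.write(f"{i}\n")

    def baseline(i):
        pass  # the "does neither" page

    results = {
        "db_insert": time_it(db_insert, n),
        "file_append": time_it(file_append, n),
        "baseline": time_it(baseline, n),
    }
    db.close()
    return results


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.4f}s")
```

The numbers depend heavily on the disk, the database and its durability settings, so treat them as a starting point for the realistic multi-user test rather than an answer.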
Hello from left field, but no one asked (and you didn't specify) how important is it that you never, ever lose data?
If speed is the problem, leave it all in memory, and dump to the database in batches.
Do you log more than what would be available in the webserver logs? It can be quite a lot, see Apache 2.0 log information for example.
If not, then you can use the good old technique of buffering then batch writing. You can buffer at different places: in memory on your server, then batch insert them in db or batch write them in a file every X requests, and/or every X seconds.
If you use MySQL there are several different options/techniques to load efficiently a lot of data: LOAD DATA INFILE, INSERT DELAYED and so on.
Lots of details on insertion speeds.
Some other tips include:
splitting data into different tables per period of time (ie: per day or per week)
using multiple db connections
using multiple db servers
have good hardware (SSD/multicore)
Depending on the scale and resources available, it is possible to go different ways. So if you give more details, I can give more specific advice.
If you do not need to wait for a response such as a generated ID, you may want to adopt an asynchronous strategy using either a message queue or a thread manager.
