Storing a file in a database as opposed to the file system? - sql-server

Generally, how bad of a performance hit is storing a file in a database (specifically mssql) as opposed to the file system? I can't come up with a reason outside of application portability that I would want to store my files as varbinaries in SQL Server.

Have a look at this answer:
Storing Images in DB - Yea or Nay?
Essentially, the space and performance hit can be quite big, depending on the number of users. Also, keep in mind that Web servers are cheap and you can easily add more to balance the load, whereas the database is the most expensive and hardest to scale part of a web architecture usually.
There are some opposite examples (e.g., Microsoft Sharepoint), but usually, storing files in the database is not a good idea.
Unless possibly you write desktop apps and/or know roughly how many users you will ever have, but on something as random and unexpectable like a public web site, you may pay a high price for storing files in the database.

If you can move to SQL Server 2008, you can take advantage of the FILESTREAM support which gives you the best of both - the files are stored in the filesystem, but the database integration is much better than just storing a filepath in a varchar field. Your query can return a standard .NET file stream, which makes the integration a lot simpler.
Getting Started with FILESTREAM Storage

I'd say, it depends on your situation. For example, I work in local government, and we have lots of images like mugshots, etc. We don't have a high number of users, but we need to have good security and auditing around the data. The database is a better solution for us since it makes this easier and we aren't going to run into scaling problems.

What's the question here?
Modern DBMS SQL2008 have a variety of ways of dealing with BLOBs which aren't just sticking in them in a table. There are pros and cons, of course, and you might need to think about it a little deeper.
This is an interesting paper, by the late (?) Jim Gray
To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem

In my own experience, it is always better to store files as files. The reason is that the filesystem is optimised for file storeage, whereas a database is not. Of course, there are some exceptions (e.g. the much heralded next-gen MS filesystem is supposed to be built on top of SQL server), but in general that's my rule.

While performance is an issue, I think modern database designs have made it much less of an issue for small files.
Performance aside, it also depends on just how tightly-coupled the data is. If the file contains data that is closely related to the fields of the database, then it conceptually belongs close to it and may be stored in a blob. If it contains information which could potentially relate to multiple records or may have some use outside of the context of the database, then it belongs outside. For example, an image on a web page is fetched on a separate request from the page that links to it, so it may belong outside (depending on the specific design and security considerations).
Our compromise, and I don't promise it's the best, has been to store smallish XML files in the database but images and other files outside it.

We made the decision to store as varbinary for http://www.freshlogicstudios.com/Products/Folders/ halfway expecting performance issues. I can say that we've been pleasantly surprised at how well it's worked out.

I agree with #ZombieSheep.
Just one more thing - I generally don't think that databases actually need be portable because you miss all the features your DBMS vendor provides. I think that migrating to another database would be the last thing one would consider. Just my $.02

The overhead of having to parse a blob (image) into a byte array and then write it to disk in the proper file name and then reading it is enough of an overhead hit to discourage you from doing this too often, especially if the files are rather large.

Not to be vague or anything but I think the type of 'file' you will be storing is one of the biggest determining factors. If you essentially talking about a large text field which could be stored as file my preference would be for db storage.

Interesting topic.
There is no absolutely one correct answer to this question.
There are few key elements to consider:
What’s your database engine?
What’s the route of file from database to end user and/or backwards?
What are the security requirements?
If files are meant for public audience and accessible via website, you shouldn’t even consider storing files in database. Use some smart indexing for files instead.
If files are containing highly sensitive information, then it might be worth of storing these into database. But you have to implement proper safe gateways too.
If performance is crucial, it’s better do not store files in database.
Backup and restoring and migrating of database might become a nightmare if database grows big just because of files. If you are DBA, then you would like to kill the person who “invented” an idea to put files into database.
I recommend to use storing files into database at last option, when there is absolutely no any better alternative available.

Related

Should I use file system files or a database for files such as images?

I recently had an interview. In the context of my mini project explanation, the interviewer asked me "where did you store the users uploaded image files?".
I replied that I used the file system rather than database. I also added that doing so wouldn't degrade the performance.
(Those images can be viewed by anyone, so there might be accesses from many users for an image).
To my surprise, he said that when your file is accessed by many, then you should use the database.
I argued with him that when the data has no concerns with atomicity, consistency, isolation, durability then why use a database.....
But he kept saying that when multiple users are accessing same file, then a database is to be used. He also said that many organizations do the same and that-why would there be a blob type in databases if storing files is not done in database.
I nodded to him; I didn't want any further debating.
So, now I'm confused. Is what he said true? Should one use a database for files when there are multiple accesses? Do many organizations do that?
If yes, please explain me why is that so.
As my understanding, you should store file path in DB to improve performance, BUT absolutely different opinion storing image data in DB, this is completely wrong. normally database should keep high efficiency, reliability, so don't make database file is too large, I am sure my concept is definitely true. so your understanding is true, in order to improve performance when many users access one file at same time, you have to use other cache solution.
The issue for me is two fold
A) backup - if images in db then the db will get much larger, if images in filestore then need to have two backup schemes - DB and filestore
B) retrieval - getting blob/binary datatypes in and out of DB is no as simple as plain datatypes and some ORM tools can not handle this.
extra considerations is cost of storage, can have very fast disks on db server and slower but very much bigger ones on a file server - get return on investment.
Personally i store images in a fileserver and link in the database

database vs. flat files

The company I work for is trying to switch a product that uses flat file format to a database format. We're handling pretty big files of data (ie: 25GB/file) and they get updated really quick. We need to run queries that randomly access the data, as well as in a contiguous way. I am trying to convince them of the advantages of using a database, but some of my colleagues seem reluctant to this. So I was wondering if you guys can help me out here with some reasons or links to posts of why we should use databases, or at least clarify why flat files are better (if they are).
Databases can handle querying
tasks, so you don't have to walk
over files manually. Databases can
handle very complicated queries.
Databases can handle indexing tasks,
so if tasks like get record with id
= x can be VERY fast
Databases can handle multiprocess/multithreaded access.
Databases can handle access from
network
Databases can watch for data
integrity
Databases can update data easily
(see 1) )
Databases are reliable
Databases can handle transactions
and concurrent access
Databases + ORMs let you manipulate
data in very programmer friendly way.
This is an answer I've already given some time ago:
It depends entirely on the
domain-specific application needs. A
lot of times direct text file/binary
files access can be extremely fast,
efficient, as well as providing you
all the file access capabilities of
your OS's file system.
Furthermore, your programming language
most likely already has a built-in
module (or is easy to make one) for
specific parsing.
If what you need is many appends
(INSERTS?) and sequential/few access
little/no concurrency, files are the
way to go.
On the other hand, when your
requirements for concurrency,
non-sequential reading/writing,
atomicity, atomic permissions, your
data is relational by the nature etc.,
you will be better off with a
relational or OO database.
There is a lot that can be
accomplished with SQLite3, which
is extremely light (under 300kb), ACID
compliant, written in C/C++, and
highly ubiquitous (if it isn't already
included in your programming language
-for example Python-, there is surely one available). It can be useful even
on db files as big as 140 terabytes, or 128 tebibytes (Link to Database Size), possible
more.
If your requirements where bigger,
there wouldn't even be a discussion,
go for a full-blown RDBMS.
As you say in a comment that "the system" is merely a bunch of scripts, then you should take a look at pgbash.
Don't build it if you can buy it.
I heard this quote recently, and it really seems fitting as a guide line. Ask yourself this... How much time was spent working on the file handling portion of your app? I suspect a fair amount of time was spent optimizing this code for performance. If you had been using a relational database all along, you would have spent considerably less time handling this portion of your application. You would have had more time for the true "business" aspect of your app.
They're faster; unless you're loading the entire flat file into memory, a database will allow faster access in almost all cases.
They're safer; databases are easier to safely backup; they have mechanisms to check for file corruption, which flat files do not. Once corruption in your flat file migrates to your backups, you're done, and you might not even know it yet.
They have more features; databases can allow many users to read/write at the same time.
They're much less complex to work with, once they're setup.
What types of files is not mentioned. If they're media files, go ahead with flat files. You probably just need a DB for tags and some way to associate the "external BLOBs" to the records in the DB. But if full text search is something you need, there's no other way to go but migrate to a full DB.
Another thing, your filesystem might provide the ceiling as far as number of physical files are concerned.
Databases all the way.
However, if you still have a need for storing files, don't have the capacity to take on a new RDBMS (like Oracle, SQLServer, etc), than look into XML.
XML is a structure file format which offers you the ability to store things as a file but give you query power over the file and data within it. XML Files are easier to read than flat files and can be easily transformed applying an XSLT for even better human-readability. XML is also a great way to transport data around if you must.
I strongly suggest a DB, but if you can't go that route, XML is an ok second.
What about a non-relational (NoSQL) database such as Amazon's SimpleDB, Tokio Cabinet, etc? I've heard that Google, Facebook, LinkedIn are using these to store their huge datasets.
Can you tell us if your data is structured, if your schema is fixed, if you need easy replicability, if access times are important, etc?
Difference between database and flat files are given below:
Database provide more flexibility whereas flat file provide less flexibility.
Database system provide data consistency whereas flat file can not provide data consistency.
Database is more secure over flat files.
Database support DML and DDL whereas flat files can not support these.
Less data redundancy in database whereas more data redundancy in flat files.
SQL ad hoc query abilities are enough of a reason for me. With a good schema and indexing on the tables, this is fast and effective and will have good performance.
Unless you are loading the files into memory each time you boot, use a database. Simple as that.
That is assuming that your colleges already have the program to handle queries to the files. If not, then use a database.
Although other answers are good, I would like to emphasize a point that was not really well talked about:
The developer's ease of use. databases are much simpler to work with! If you don't have any strong reason(s) for using files, use a database.

Caching moderate amounts of data in a web app - DB or flat files?

A web app I'm working on requires frequent parsing of diverse web resources (HTML, XML, RSS, etc). Once downloaded, I need to cache these resources to minimize network load. The app requires a very straightforward cache policy: only re-download a cached resource when more than X minutes have passed since the access time.
Should I:
Store both the access time (e.g. 6/29/09 at 10:50 am) and the resource itself in the database.
Store the access time and a unique identifier in the database. The unique identifier is the filename of the resource, stored on the local disk.
Use another approach or third party software solution.
Essentially, this question can be re-written as, "Which is better for storing moderate amounts of data - a database or flat files?"
Thanks for your help! :)
NB: The app is running on a VPS, so size restrictions on the database/flat files do not apply.
To answer your question: "Which is better for storing moderate amounts of data - a database or flat files?"
The answer is (in my opinion) Flat Files. Flat files are easier to backup, and easier to remove.
However, you have extra information that isn't encapsulated in this question, mainly the fact that you will need to access this stored data to determine if a resource has gone stale.
Given this need, it makes more sense to store it in a database. Flat Files do not lend themselves well for random access, and search, compared to a relational DB.
Depends on the platform, IF you use .NET
The answer is 3, use Cache object, ideally suited for this in ASP.NET
You can set time and dependency expiration,
this doc explains the cache object
https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-5034946.html
Neither.
Have a look at memcached to see if it works with your server/client platform. This is easier to set up and performs much better than filesystem/rdbms based caching, provided you can spare the RAM needed for the data being cached.
All of the proposed solutions are reasonable. However, for my particular needs, I went with flat files. Oddly enough, though, I did so for reasons not mentioned in some of the other answers. It doesn't really matter to me that flat files are easier to backup and remove, and both DB and flat-file solutions allow for easy checking of whether or not the cached data has gone stale. I went with flat files first and foremost because, on my mid-sized one-box VPS LAMP architecture, I think it will be faster than a third-party cache or DB-based solution.
Thanks to all for your thoughts! :)

Storing a small number of images: blob or fs?

I'm adding some functionality to my site so that users can upload their own profile pictures, so I was wondering about whether to store them in the database as a BLOB, or put them in the file system.
I found a question similar to this here: Storing images in DB: Yea or Nay, but the answers given were geared more towards people expecting many many thousands or even millions of images, whereas I'm more concerned about small images (JPEGs up to maybe 150x150 pixels), and small numbers of them: perhaps up to one or two thousand.
What are the feelings about DB BLOB vs Filesystem for this scenario? How do clients go with caching images from the DB vs from the filesystem?
If BLOBs stored in the DB are the way to go - is there anything I should know about where to store them? Since I imagine that a majority of my users won't be uploading a picture, should I create a user_pics table to (outer) join to the regular users table when needed?
Edit: I'm reopening this question, because it's not a duplicate of those two you linked to. This question is specifically about the pros/cons of using a DB or FS for a SMALL number of images. As I said above, the other question is targeted towards people who need to store thousands upon thousands of large images.
To answer parts of your question:
How do clients go with caching images from the DB vs from the filesystem?
For a database: Have a last_modified field in your database. Use the Last-Modified HTTP header so the client's browser can cache properly. Be sure to send the appropriate responses when the browser requests for an image "if newer" (can't recall what it's called; some HTTP request header).
For a filesystem: Do the same thing, but with the file's modified time.
If BLOBs stored in the DB are the way to go - is there anything I should know about where to store them? Since I imagine that a majority of my users won't be uploading a picture, should I create a user_pics table to (outer) join to the regular users table when needed?
I would put the BLOB and related metadata in its own table, with some kind of relation between it and your user table. Doing this will make it easier to optimize the table storage method for your data, makes things tidier, and leaves room for expandability (e.g. a general "files" table).
I once faced a similar question with a small DMS for pdf files. The scenario was different from yours: A maximum of may be 100 files with sizes up to 10 MB each - not what you expect for profile pictures. But the answer a friend gave me back then applies to your case as well:
Use each storage system for what it is designed to do.
Store data in a database. Store files in a file system.
This is not the ultimate answer(*), but its a good rule of thumb for starters.
I have never heard of the Windows FS being slow and sometimes unreliable, as Aaron Digulla states in his answer. If there are such problems, this certainly needs to be factored in. But for avatar pictures, it does not strike me as important.
(*) I know, I know, 42...
DB is optimized for latency, transactions, etc.
Image storage is optimized for read latency, storage cost, etc.
A blob store is ideal for storing millions of images. I work on SeaweedFS. It was based on Facebook's design for storing their user photos.
What would be more convenient, from the perspective of serving them, writing the code to serve them, backup procedures, etc.? You want the right answer for you, not the right answer for someone else.
From my point of view anything what may be left outside of database should stay outside. It may be file system or separate tables which you do not replicate or backup every day. It makes database much lighter, it grows slower and it easier to understand and maintain.
If you are on MSSQL make sure that blobs are stored in separate data file. Not in PRIMARY as everything else.
On Windows, put as much as you can in the database. The filesystem is somewhat slow and sometimes even unreliable.
On Linux, you have more options. Here, you should consider moving big files into a filesystem and just keep the name in the DB. If you use a modern filesystem like Ext3 or ReiseFS, you can even create many small files with pretty good performance.
You also need to take into account how you can access the data. If you have everything in the DB, you have one access path, need not worry about another set of permissions, but you have to deal with the extra complexity of reading/writing BLOBs. In many DBs, BLOBs can't be searched.
On the filesystem, you can run other tools on your data which isn't possible if the files are stored in a DB.
I would store them in the database:
Backup/restore is easy (if you backup files and also the database, point-in-time recovery is more complicated)
Transactions in the db mean you should never end up pointing at a file-name that is not there
Less chance someone is going to figure out a sneaky way of putting a script onto your server via a dodgy image upload hack
Since you are talking about a small number of images, ease of use/administration should take preference over performance issues which are debated in the linked questions.
I think there is a managability advantage storing them in the database; they can be backed up and restored consistently with the other data - you won't forget to delete obsolete ones (well, you might, but it's a bit less likely), and if you migrate the database to another machine, the images go with it.

Storing Images in DB - Yea or Nay?

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
So I'm using an app that stores images heavily in the DB. What's your outlook on this? I'm more of a type to store the location in the filesystem, than store it directly in the DB.
What do you think are the pros/cons?
I'm in charge of some applications that manage many TB of images. We've found that storing file paths in the database to be best.
There are a couple of issues:
database storage is usually more expensive than file system storage
you can super-accelerate file system access with standard off the shelf products
for example, many web servers use the operating system's sendfile() system call to asynchronously send a file directly from the file system to the network interface. Images stored in a database don't benefit from this optimization.
things like web servers, etc, need no special coding or processing to access images in the file system
databases win out where transactional integrity between the image and metadata are important.
it is more complex to manage integrity between db metadata and file system data
it is difficult (within the context of a web application) to guarantee data has been flushed to disk on the filesystem
As with most issues, it's not as simple as it sounds. There are cases where it would make sense to store the images in the database.
You are storing images that are
changing dynamically, say invoices and you wanted
to get an invoice as it was on 1 Jan
2007?
The government wants you to maintain 6 years of history
Images stored in the database do not require a different backup strategy. Images stored on filesystem do
It is easier to control access to the images if they are in a database. Idle admins can access any folder on disk. It takes a really determined admin to go snooping in a database to extract the images
On the other hand there are problems associated
Require additional code to extract
and stream the images
Latency may be
slower than direct file access
Heavier load on the database server
File store. Facebook engineers had a great talk about it. One take away was to know the practical limit of files in a directory.
Needle in a Haystack: Efficient Storage of Billions of Photos
This might be a bit of a long shot, but if you're using (or planning on using) SQL Server 2008 I'd recommend having a look at the new FileStream data type.
FileStream solves most of the problems around storing the files in the DB:
The Blobs are actually stored as files in a folder.
The Blobs can be accessed using either a database connection or over the filesystem.
Backups are integrated.
Migration "just works".
However SQL's "Transparent Data Encryption" does not encrypt FileStream objects, so if that is a consideration, you may be better off just storing them as varbinary.
From the MSDN Article:
Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data.
FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.
File paths in the DB is definitely the way to go - I've heard story after story from customers with TB of images that it became a nightmare trying to store any significant amount of images in a DB - the performance hit alone is too much.
In my experience, sometimes the simplest solution is to name the images according to the primary key. So it's easy to find the image that belongs to a particular record, and vice versa. But at the same time you're not storing anything about the image in the database.
The trick here is to not become a zealot.
One thing to note here is that no one in the pro file system camp has listed a particular file system. Does this mean that everything from FAT16 to ZFS handily beats every database?
No.
The truth is that many databases beat many files systems, even when we're only talking about raw speed.
The correct course of action is to make the right decision for your precise scenario, and to do that, you'll need some numbers and some use case estimates.
In places where you MUST guarantee referential integrity and ACID compliance, storing images in the database is required.
You cannot transactionaly guarantee that the image and the meta-data about that image stored in the database refer to the same file. In other words, it is impossible to guarantee that the file on the filesystem is only ever altered at the same time and in the same transaction as the metadata.
As others have said SQL 2008 comes with a Filestream type that allows you to store a filename or identifier as a pointer in the db and automatically stores the image on your filesystem which is a great scenario.
If you're on an older database, then I'd say that if you're storing it as blob data, then you're really not going to get anything out of the database in the way of searching features, so it's probably best to store an address on a filesystem, and store the image that way.
That way you also save space on your filesystem, as you are only going to save the exact amount of space, or even compacted space on the filesystem.
Also, you could decide to save with some structure or elements that allow you to browse the raw images in your filesystem without any db hits, or transfer the files in bulk to another system, hard drive, S3 or another scenario - updating the location in your program, but keep the structure, again without much of a hit trying to bring the images out of your db when trying to increase storage.
Probably, it would also allow you to throw some caching element, based on commonly hit image urls into your web engine/program, so you're saving yourself there as well.
Small static images (not more than a couple of megs) that are not frequently edited, should be stored in the database. This method has several benefits including easier portability (images are transferred with the database), easier backup/restore (images are backed up with the database) and better scalability (a file system folder with thousands of little thumbnail files sounds like a scalability nightmare to me).
Serving up images from a database is easy, just implement an http handler that serves the byte array returned from the DB server as a binary stream.
Here's an interesting white paper on the topic.
To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem
The answer is "It depends." Certainly it would depend upon the database server and its approach to blob storage. It also depends on the type of data being stored in blobs, as well as how that data is to be accessed.
Smaller sized files can be efficiently stored and delivered using the database as the storage mechanism. Larger files would probably be best stored using the file system, especially if they will be modified/updated often. (blob fragmentation becomes an issue in regards to performance.)
Here's an additional point to keep in mind. One of the reasons supporting the use of a database to store the blobs is ACID compliance. However, the approach that the testers used in the white paper, (Bulk Logged option of SQL Server,) which doubled SQL Server throughput, effectively changed the 'D' in ACID to a 'd,' as the blob data was not logged with the initial writes for the transaction. Therefore, if full ACID compliance is an important requirement for your system, halve the SQL Server throughput figures for database writes when comparing file I/O to database blob I/O.
One thing that I haven't seen anyone mention yet but is definitely worth noting is that there are issues associated with storing large amounts of images in most filesystems too. For example if you take the approach mentioned above and name each image file after the primary key, on most filesystems you will run into issues if you try to put all of the images in one big directory once you reach a very large number of images (e.g. in the hundreds of thousands or millions).
Once common solution to this is to hash them out into a balanced tree of subdirectories.
Something nobody has mentioned is that the DB guarantees atomic actions, transactional integrity and deals with concurrency. Even referentially integrity is out of the window with a filesystem - so how do you know your file names are really still correct?
If you have your images in a file-system and someone is reading the file as you're writing a new version or even deleting the file - what happens?
We use blobs because they're easier to manage (backup, replication, transfer) too. They work well for us.
The problem with storing only filepaths to images in a database is that the database's integrity can no longer be forced.
If the actual image pointed to by the filepath becomes unavailable, the database unwittingly has an integrity error.
Given that the images are the actual data being sought after, and that they can be managed easier (the images won't suddenly disappear) in one integrated database rather than having to interface with some kind of filesystem (if the filesystem is independently accessed, the images MIGHT suddenly "disappear"), I'd go for storing them directly as a BLOB or such.
At a company where I used to work we stored 155 million images in an Oracle 8i (then 9i) database. 7.5TB worth.
Normally, I'm storngly against taking the most expensive and hardest to scale part of your infrastructure (the database) and putting all load into it. On the other hand: It greatly simplifies backup strategy, especially when you have multiple web servers and need to somehow keep the data synchronized.
Like most other things, It depends on the expected size and Budget.
We have implemented a document imaging system that stores all it's images in SQL2005 blob fields. There are several hundred GB at the moment and we are seeing excellent response times and little or no performance degradation. In addition, fr regulatory compliance, we have a middleware layer that archives newly posted documents to an optical jukebox system which exposes them as a standard NTFS file system.
We've been very pleased with the results, particularly with respect to:
Ease of Replication and Backup
Ability to easily implement a document versioning system
If this is web-based application then there could be advantages to storing the images on a third-party storage delivery network, such as Amazon's S3 or the Nirvanix platform.
Assumption: Application is web enabled/web based
I'm surprised no one has really mentioned this ... delegate it out to others who are specialists -> use a 3rd party image/file hosting provider.
Store your files on a paid online service like
Amazon S3
Moso Cloud Storage
Another StackOverflow threads talking about this here.
This thread explains why you should use a 3rd party hosting provider.
It's so worth it. They store it efficiently. No bandwith getting uploaded from your servers to client requests, etc.
If you're not on SQL Server 2008 and you have some solid reasons for putting specific image files in the database, then you could take the "both" approach and use the file system as a temporary cache and use the database as the master repository.
For example, your business logic can check if an image file exists on disc before serving it up, retrieving from the database when necessary. This buys you the capability of multiple web servers and fewer sync issues.
I'm not sure how much of a "real world" example this is, but I currently have an application out there that stores details for a trading card game, including the images for the cards. Granted the record count for the database is only 2851 records to date, but given the fact that certain cards have are released multiple times and have alternate artwork, it was actually more efficient sizewise to scan the "primary square" of the artwork and then dynamically generate the border and miscellaneous effects for the card when requested.
The original creator of this image library created a data access class that renders the image based on the request, and it does it quite fast for viewing and individual card.
This also eases deployment/updates when new cards are released, instead of zipping up an entire folder of images and sending those down the pipe and ensuring the proper folder structure is created, I simply update the database and have the user download it again. This currently sizes up to 56MB, which isn't great, but I'm working on an incremental update feature for future releases. In addition, there is a "no images" version of the application that allows those over dial-up to get the application without the download delay.
This solution has worked great to date since the application itself is targeted as a single instance on the desktop. There is a web site where all of this data is archived for online access, but I would in no way use the same solution for this. I agree the file access would be preferable because it would scale better to the frequency and volume of requests being made for the images.
Hopefully this isn't too much babble, but I saw the topic and wanted to provide some my insights from a relatively successful small/medium scale application.
SQL Server 2008 offers a solution that has the best of both worlds : The filestream data type.
Manage it like a regular table and have the performance of the file system.
It depends on the number of images you are going to store and also their sizes. I have used databases to store images in the past and my experience has been fairly good.
IMO, Pros of using database to store images are,
A. You don't need FS structure to hold your images
B. Database indexes perform better than FS trees when more number of items are to be stored
C. Smartly tuned database perform good job at caching the query results
D. Backups are simple. It also works well if you have replication set up and content is delivered from a server near to user. In such cases, explicit synchronization is not required.
If your images are going to be small (say < 64k) and the storage engine of your db supports inline (in record) BLOBs, it improves performance further as no indirection is required (Locality of reference is achieved).
Storing images may be a bad idea when you are dealing with small number of huge sized images. Another problem with storing images in db is that, metadata like creation, modification dates must handled by your application.
I have recently created a PHP/MySQL app which stores PDFs/Word files in a MySQL table (as big as 40MB per file so far).
Pros:
Uploaded files are replicated to backup server along with everything else, no separate backup strategy is needed (peace of mind).
Setting up the web server is slightly simpler because I don't need to have an uploads/ folder and tell all my applications where it is.
I get to use transactions for edits to improve data integrity - I don't have to worry about orphaned and missing files
Cons:
mysqldump now takes a looooong time because there is 500MB of file data in one of the tables.
Overall not very memory/cpu efficient when compared to filesystem
I'd call my implementation a success, it takes care of backup requirements and simplifies the layout of the project. The performance is fine for the 20-30 people who use the app.
Im my experience I had to manage both situations: images stored in database and images on the file system with path stored in db.
The first solution, images in database, is somewhat "cleaner" as your data access layer will have to deal only with database objects; but this is good only when you have to deal with low numbers.
Obviously database access performance when you deal with binary large objects is degrading, and the database dimensions will grow a lot, causing again performance loss... and normally database space is much more expensive than file system space.
On the other hand having large binary objects stored in file system will cause you to have backup plans that have to consider both database and file system, and this can be an issue for some systems.
Another reason to go for file system is when you have to share your images data (or sounds, video, whatever) with third party access: in this days I'm developing a web app that uses images that have to be accessed from "outside" my web farm in such a way that a database access to retrieve binary data is simply impossible. So sometimes there are also design considerations that will drive you to a choice.
Consider also, when making this choice, if you have to deal with permission and authentication when accessing binary objects: these requisites normally can be solved in an easier way when data are stored in db.
I once worked on an image processing application. We stored the uploaded images in a directory that was something like /images/[today's date]/[id number]. But we also extracted the metadata (exif data) from the images and stored that in the database, along with a timestamp and such.
In a previous project i stored images on the filesystem, and that caused a lot of headaches with backups, replication, and the filesystem getting out of sync with the database.
In my latest project i'm storing images in the database, and caching them on the filesystem, and it works really well. I've had no problems so far.
Second the recommendation on file paths. I've worked on a couple of projects that needed to manage large-ish asset collections, and any attempts to store things directly in the DB resulted in pain and frustration long-term.
The only real "pro" I can think of regarding storing them in the DB is the potential for easy of individual image assets. If there are no file paths to use, and all images are streamed straight out of the DB, there's no danger of a user finding files they shouldn't have access to.
That seems like it would be better solved with an intermediary script pulling data from a web-inaccessible file store, though. So the DB storage isn't REALLY necessary.
The word on the street is that unless you are a database vendor trying to prove that your database can do it (like, let's say Microsoft boasting about Terraserver storing a bajillion images in SQL Server) it's not a very good idea. When the alternative - storing images on file servers and paths in the database is so much easier, why bother? Blob fields are kind of like the off-road capabilities of SUVs - most people don't use them, those who do usually get in trouble, and then there are those who do, but only for the fun of it.
Storing an image in the database still means that the image data ends up somewhere in the file system but obscured so that you cannot access it directly.
+ves:
database integrity
its easy to manage since you don't have to worry about keeping the filesystem in sync when an image is added or deleted
-ves:
performance penalty -- a database lookup is usually slower that a filesystem lookup
you cannot edit the image directly (crop, resize)
Both methods are common and practiced. Have a look at the advantages and disadvantages. Either way, you'll have to think about how to overcome the disadvantages. Storing in database usually means tweaking database parameters and implement some kind of caching. Using filesystem requires you to find some way of keeping filesystem+database in sync.

Resources