Database storage engine for images - database

My application uses a database for (among other things) storing scanned documents. These are usually JPEG images, and mostly just text.
I know it's recommended to use files for images and link to them from the database, but it's just easier to retrieve them this way. The application uses one single server and multiple clients, and storing files would waste a lot of hard drive space (overhead, sector allocations, etc.) and require the use of shared folders and mapped drives and stuff. Slow, and less secure.
Anyway, some of our customers have been using our application for a few years, and their databases have grown into the tens of gigabytes, mostly because of these images. At the moment they're stored in an InnoDB table (MySQL).
Question is: is there a better way?
The data itself is write-once, and not normally deleted (but theoretically possible), so something that is slow to store (within reason), unchangeable and fast to access would be perfect.
I'm thinking of making my own storage engine that would include compression, indexing, and caching that only allows two columns (bigint, blob) per table. Does something like this already exist?

SQL Server 2008 introduced FILESTREAM, which lets you store pointers in the database but keep the blobs out of the database and in a directory (as they should be). The neat thing is that when you back up the database, the blobs get backed up as well. Best of both worlds. The directories are access-restricted as well; typically only SQL Server itself gets access.
Store the images as files, keep the pointers in the database.
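For what it's worth, here is a minimal sketch of the "files on disk, pointers in the database" idea in Python, with SQLite standing in for the real database. The `documents` table, the function names, and the UUID-based file naming are all assumptions made for illustration; nothing here comes from the original application.

```python
import sqlite3
import uuid
from pathlib import Path


def init_db(conn: sqlite3.Connection) -> None:
    # Only the pointer (a relative path) lives in the database, not the image bytes.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, rel_path TEXT NOT NULL)"
    )


def store_document(conn: sqlite3.Connection, root: Path, data: bytes) -> int:
    """Write the bytes to a file under root and record its path; return the row id."""
    rel_path = f"{uuid.uuid4().hex}.jpg"  # hypothetical naming scheme
    root.mkdir(parents=True, exist_ok=True)
    (root / rel_path).write_bytes(data)
    cur = conn.execute("INSERT INTO documents (rel_path) VALUES (?)", (rel_path,))
    conn.commit()
    return cur.lastrowid


def load_document(conn: sqlite3.Connection, root: Path, doc_id: int) -> bytes:
    """Look up the pointer in the database, then read the file it names."""
    row = conn.execute(
        "SELECT rel_path FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    return (root / row[0]).read_bytes()
```

Storing a path relative to a configurable root means the files can later be moved to another drive without touching the rows.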

Related

Store images in database or on file system [duplicate]

Possible Duplicate:
Storing Images in DB - Yea or Nay?
Is it faster and more reliable to store images in the file system or should I store them in a database?
Let's say the images will be no larger than 200 MB. The objective is fast, reliable access.
In general, how do people decide between storing files (e.g., images, PDFs) in the file system or in the database?
Personal opinion: I ALWAYS store images on the file system, and only store a file path in the database. In many situations, databases are kept on fast (read: expensive) storage, such as 15k RPM or SSD drives. Images or other files can typically be stored on slower (read: cheaper, larger) storage, such as 7.2k RPM drives.
I find this to be the best, since it allows for the database to remain small in size. In general, databases store "data" well. They can search and retrieve small bits of data fast. File Systems store "files" well, they are optimized to find and retrieve larger bits of data fast.
Obviously there are tradeoffs to both approaches, and there isn't going to be a one-size-fits-all answer. There may be some use cases where storing images in the database is a good thing: if they are all quite small, you don't anticipate very many of them, and your database is on the same storage medium as your file share, then it probably makes sense to drop the images directly into the database.
As a side note, SQL Server 2008 R2 has a FILESTREAM field type, which can provide the best of both worlds. I have not used it yet, so I can't speak to how well it works.
Store files/images in the database if you require following:
Access Control
Versioning
Checkin/Check out
Searching based on metadata
That has been the design of major CMSs like SharePoint.
However, if your content is much more static and not going to change over time, you can go with the file system and enable optimizations/caching on the web server.
With the database approach, you only have one thing to connect to. If your users are distributed, that might be simpler. Note that if the images are not accessed too frequently, the efficiency issues might not matter.
First of all, be aware that databases can limit how large a stored file may be (I think the limit is 8 MB). Leaving that aside, I think it is better to store files on the file system, and only small images in the database.

Best way storing binary or image files

What is the best way storing binary or image files?
Database System
File System
Would you please explain, why?
There is no real best way, just a bunch of trade offs.
Database Pros:
1. Much easier to deal with in a clustering environment.
2. No reliance on additional resources like a file server.
3. No need to set up "sync" operations in load balanced environment.
4. Backups automatically include the files.
Database Cons:
1. Size / Growth of the database.
2. Depending on DB Server and your language, it might be difficult to put in and retrieve.
3. Speed / Performance.
4. Depending on DB server, you have to virus scan the files at the time of upload and export.
File Pros:
1. For single web/single db server installations, it's fast.
2. Well understood ability to manipulate files. In other words, it's easy to move the files to a different location if you run out of disk space.
3. Can virus scan when the files are "at rest". This allows you to take advantage of scanner updates.
File Cons:
1. In multi web server environments, requires an accessible share. Which should also be clustered for failover.
2. Additional security requirements to handle file access. You have to be careful that the web server and/or share does not allow file execution.
3. Transactional Backups have to take the file system into account.
The above said, SQL 2008 has a thing called FILESTREAM which combines both worlds. You upload to the database and it transparently stores the files in a directory on disk. When retrieving you can either pull from the database; or you can go direct to where it lives on the file system.
Pros of Storing binary files in a DB:
Some decrease in complexity since the data access layer of your system need only interface to a DB and not a DB + file system.
You can secure your files using the same comprehensive permissions-based security that protects the rest of the database.
Your binary files are protected against loss along with the rest of your data by way of database backups. No separate filesystem backup system required.
Cons of Storing binary files in a DB:
Depending on size/number of files, can take up significant space, potentially decreasing performance (depending on whether your binary files are stored in a table that is queried for other content often or not) and making for longer backup times.
Pros of Storing binary files in file system:
This is what file systems are good at. File systems will handle defragmenting well, and retrieving files (say, to stream a video file through a web server) will likely be faster than with a DB.
Cons of Storing binary files in file system:
Slightly more complex data access layer.
Needs its own backup system.
Need to consider referential integrity issues (e.g. a deleted pointer in the database will need to result in deletion of the file so as to not have 'orphaned' files in the filesystem).
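To make the 'orphaned' files point concrete, here is a rough Python sketch, assuming a hypothetical `files` table with a `rel_path` column and SQLite as a stand-in database. It deletes the pointer first and the file second, and includes a sweep that finds files no row points to.

```python
import sqlite3
from pathlib import Path


def find_orphans(conn: sqlite3.Connection, root: Path) -> list[Path]:
    """Return files under root that no row in the (hypothetical) files table points to."""
    referenced = {row[0] for row in conn.execute("SELECT rel_path FROM files")}
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.relative_to(root).as_posix() not in referenced
    )


def delete_file_record(conn: sqlite3.Connection, root: Path, rel_path: str) -> None:
    """Delete the database pointer first, then the file it pointed to."""
    conn.execute("DELETE FROM files WHERE rel_path = ?", (rel_path,))
    conn.commit()
    # If the process dies here the file is orphaned; find_orphans can detect it later.
    (root / rel_path).unlink(missing_ok=True)
```

Deleting the row before the file means a crash leaves a harmless orphan rather than a pointer to a missing file; the opposite order would trade one problem for the other.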
On balance I would use the file system. In the past, using SQL Server 2005 I would simply store a 'pointer' in db tables to the binary file. The pointer would typically be a GUID.
Here's the good news if you are using SQL Server 2008 (and maybe others - I don't know): there is built-in support for a hybrid solution with the new VARBINARY(MAX) FILESTREAM data type. These behave logically like VARBINARY(MAX) columns, but behind the scenes SQL Server 2008 will store the data in the file system.
There is no best way.
What? You need more info?
There are three ways I know of... One, as byte arrays in the database. Two, as a file with the path stored in the database. Three, as a hybrid (only if DB allows, such as with the FileStream type).
The first is pretty cool because you can query and get your data in the same step. Which is always nice. But what happens when you have LOTS of files? Your database gets big. Now you have to deal with big database maintenance issues, such as the trials of backing up databases that are over a terabyte. And what happens if you need outside access to the files? Such as type conversions, mass manipulation (resize all images, apply watermarks, etc)? It's much harder to do than when you have files.
The second is great for somewhat large numbers of files. You can store them on NAS devices, back them up incrementally, keep your database small, etc etc. But then, when you have LOTS of files, you start running into limitations in the file system. And if you spread them over the network, you get latency issues, user rights issues, etc. Also, I take pity on you if your network gets rearranged. Now you have to run massive updates on the database to change your file locations, and I pity you if something screws up.
Then there's the hybrid option. It's almost perfect--you can get your files via your query, yet your database isn't massive. Does this solve all your problems? Probably not. Your database isn't portable anymore; you're locked to a particular DBMS. And this stuff isn't mature yet, so you get to enjoy the teething process. And who says this solves all the different issues?
Fact is, there is no "best" way. You just have to determine your requirements, make the best choice depending on them, and then suck it up when you figure out you did the wrong thing.
I like storing images in a database. It makes it easy to switch from development to production just by changing databases (no copying files). And the database can keep track of properties like created/modified dates just as well as the File System.
I personally never store images IN the database, for performance reasons. In all of my sites I have a "/files" folder where I can put sub-folders based on what kind of images I'm going to store. Then I name them by convention.
For example, if I'm storing a profile picture, I'll store it in "/files/profile/" as profile_2.jpg (if 2 is the ID of the account). I always make it a rule to resize the image on the server to the largest size I'll need, and then to smaller ones if I need them. So I'd save "profile_2_thumb.jpg" and "profile_2_full.jpg".
By creating rules for yourself, you can simply call img src="/files/profile_<id>_thumb.jpg" in the code.
That's how I do it anyway!
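A naming convention like this can be captured in one small helper. The sketch below is Python; the function name and the `variant` parameter are made up for illustration, and the directory follows the "/files/profile/" location mentioned in the answer.

```python
def profile_image_url(account_id: int, variant: str = "thumb") -> str:
    """Build the conventional URL for a profile picture; variant is 'thumb' or 'full'."""
    if variant not in ("thumb", "full"):
        raise ValueError(f"unknown variant: {variant}")
    return f"/files/profile/profile_{account_id}_{variant}.jpg"
```

Keeping the convention in one function means a later change of layout touches one place instead of every template.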

Storing Images : DB or File System

I read some posts in this regard, but I still don't understand what the best solution is in my case.
I'm starting to write a new web app, and the backend is going to provide about 1-10 million images (average size 200-500 kB per image).
My site will provide content and images to 100-1000 users at the same time.
I'd like also to keep Provider costs as low as possible (but this is a secondary requirement).
I'm thinking that File System space is less expensive if compared to the cost of DB size.
Personally I like the idea of having all my images in the DB but any suggestion will be really appreciated :)
Do you think that in my case the DB approach is the right choice?
Putting all of those images in your database will make it very, very large. This means your DB engine will be busy caching all those images (a task it's not really designed for) when it could be caching hot application data instead.
Leave the file caching up to the OS and/or your reverse proxy - they'll be better at it.
Some other reasons to store images on the file system:
Image servers can run even when the database is busy or down.
File systems are made to store files and are quite efficient at it.
Dumping data in your database means slower backups and other operations.
No server-side code needed to serve up an image, just plain old IIS/Apache.
You can scale up faster with dirt-cheap web servers, or potentially to a CDN.
You can perform related work (generating thumbnails, etc.) without involving the database.
Your database server can keep more of the "real" table data in memory, which is where you get your database speed for queries. If it uses its precious memory to keep image files cached, that buys you hardly anything speed-wise versus having more of the photo index in memory.
Most large sites use the filesystem.
See Store pictures as files or in the database for a web app?
When dealing with binary objects, follow a document-centric approach to the architecture and do not store documents like PDFs and images in the database; you will eventually have to refactor them out when you start seeing all kinds of performance issues with your database. Just store the file on the file system and keep the path in a table of your database. There is also a physical limit on the size of the data type that you would use to serialize and save it in the database. Just store it on the file system and access it.
Your first sentence says that you've read some posts on the subject, so I won't bother putting in links to articles that cover this. In my experience, and based on what you've posted as far as the number of images and sizes of the images, you're going to pay dearly in DB performance if you store them in the DB. I'd store them on the file system.
What database are you using? MS SQL Server 2008 provides FILESTREAM storage, which allows storage of and efficient access to BLOB data using a combination of SQL Server 2008 and the NTFS file system. The source article covers choices for BLOB storage, configuring Windows and SQL Server for using FILESTREAM data, considerations for combining FILESTREAM with other features, and implementation details such as partitioning and performance.
We use FileNet, a server optimized for imaging. It's very expensive. A cheaper solution is to use a file server.
Please don't consider storing large files on a database server.
As others have mentioned, store references to the large files in the database.

Store images(jpg,gif,png) in filesystem or DB? [duplicate]

Possible Duplicates:
Which is more secure: filesystem or database?
User images - database vs. filesystem storage
store image in database or in a system file ?
I can't decide which one I should follow. Can you guys give some opinions? Should I store my images in the file-system or DB? (I would like to prevent others from stealing my images)
When you answer this question, please include comparisons of the security, performances etc.
Thanks.
Exact Duplicate: User Images: Database or filesystem storage?
Exact Duplicate: Storing images in database: Yea or nay?
Exact Duplicate: Should I store my images in the database or folders?
Exact Duplicate: Would you store binary data in database or folders?
Exact Duplicate: Store pictures as files or in the database for a web app?
Exact Duplicate: Storing a small number of images: blob or fs?
Exact Duplicate: store image in filesystem or database?
Moving your images into a database and writing the code to extract the image may be more hassle than it's worth. It will all go back to the business requirements surrounding the need to protect the images, or the requirement for performance.
I'd suggest sticking to the tried and true system of storing the filepath or directory location in the DB, and keeping the files on disk. Here's why:
A filesystem is easier to maintain. Some thought has to be put into the structure and organization of the images. i.e. a directory for each customer, and a subdirectory for each [Attribute-X] and another subfolder for each [Attribute-Y]. Keeping too many images in one directory will end up slowing down the file access (i.e. hundreds of thousands)
If the idea of storing in a DB is a counter-measure against filesystem loss (i.e. a disk goes down, or a directory is deleted by accident), then I'd counter with the suggestion that when you use source control, it's no problem to retrieve any lost/missing/deleted files.
If you ever need to scale and move to a content distribution scenario, you'd have to move back out to the filesystem or perform a big extract to push out to the providers.
It also goes with the saying: "keep structured data in a database". Microsoft has an article on Managing Unstructured Data.
If security is an issue to be addressed, the filesystem has a whole structure with ACLs. Reinventing the wheel on security may be out of scope in the business requirements.
A large amount of discussion for this topic is also found at:
Question 3748
Question 561447
Storing your images as varbinary or a blob of some kind (depending on your platform) is, I'd suggest, more hassle than it's worth. The effort that you'll need to expend means more code that you'll have to maintain, unit test, and defend against defects.
If your environment can support SQL Server 2008, you can have the best of both worlds with their new FileStream datatype in SQL 2008.
An MSDN article is touting the FileStream datatype in SQL 2008 as high performance.
SQL Skills has a great article with some SQL 2008 Filestream performance measurements.
Here is an article addressing varbinary vs. FileStream and performance of both datatypes.
If you are a SQL Mag subscriber, you can see a great article at SQL Mag on SQL 2008 FileStream.
Microsoft Research article: To Blob or Not To Blob
I'd love to see studies in real-world scenarios with large user bases like Flickr or Facebook.
Again, it all goes back to your business requirements. Good luck!
It doesn't matter where you store them in terms of preventing "theft". If you deliver the bytestream to a browser, it can be intercepted and saved by anyone. There's no way to prevent that (I'm assuming you're talking about delivering the images to a browser).
If you're just talking about securing images on the machine itself, it also doesn't matter. The operating system can be locked down as securely as the database, preventing anyone from getting at the images.
In terms of performance (when presenting images to a browser), I personally think it'll be faster serving from a filesystem. You have to present the images in separate HTTP transactions anyway, which would almost certainly require multiple trips to the database. I suspect (although I have no hard data) that it would be faster to store the image URLs in the database which point to static images on the file system - then the act of getting an image is a simple file open by the web server rather than running code to access the database.
You're probably going to get a whole ton of "but the filesystem is a DB" answers. This isn't one of them.
The filesystem option depends on many factors. For example, does the server have write permissions to the directory? (And yes, I have seen servers where Apache couldn't write to DocumentRoot.)
If you want 100% cross-compatibility across platforms, then the Database option is the best way to go. It'll also let you store image-related metadata such as a user ID, the date uploaded, and even alternate versions of the same image (such as cached thumbnails).
On the down side, you need to write custom code to read images from the DB and serve them to the user, while any standard web server would just let you send the images as they are.
When it comes to the bottom line, though, you should just choose the option that fits your project, and your server configuration.
Store them in FileSystem, store the file path in the DB.
Of course you can make this scalable and distributed, you just need to keep the images dirs synched between them (for JackM). Or use a shared storage connected to multiple web frontend servers.
Anyway, the stealing part was covered in your other question and is basically impossible. People that can access the images will always be able (with more or less work) to save them locally ... even if it means "print-screen" and paste into photoshop and saving.
It depends on how many images you expect to handle, and what you have to do with them. I have an application that needs to temporarily store between 100K and several million images a day. I write them in 2gb contiguous blocks to the filesystem, storing the image identifier, filename, beginning position and length in a database.
For retrieval I just keep the indices in memory, the files open read only and seek back and forth to fetch them. I could find no faster way to do it. It is far faster than finding and loading an individual file. Windows can get pretty slow once you've dumped that many individual files into the filesystem.
I suppose it could contribute to security, since the images would be somewhat difficult to retrieve without the index data.
For scalability, it would not take long to put a web service in front of it and distribute it across several machines.
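The packed-block idea described above could look roughly like the following Python sketch. The class and method names are invented for illustration, a plain dict stands in for the database table that holds the identifier, position and length, and rolling over to a new file at 2 GB is left out.

```python
import os
from typing import BinaryIO, Dict, Tuple


class PackedImageStore:
    """Append images to one large file and remember (offset, length) per image id."""

    def __init__(self, path: str) -> None:
        # In the answer this index lives in a database; a dict stands in for it here.
        self._index: Dict[str, Tuple[int, int]] = {}
        # Append mode creates the file if it does not exist yet.
        self._writer: BinaryIO = open(path, "ab")
        # A separate read-only handle is kept open for seeking back and forth.
        self._reader: BinaryIO = open(path, "rb")

    def put(self, image_id: str, data: bytes) -> None:
        offset = self._writer.seek(0, os.SEEK_END)
        self._writer.write(data)
        self._writer.flush()  # make the bytes visible to the reader handle
        self._index[image_id] = (offset, len(data))

    def get(self, image_id: str) -> bytes:
        offset, length = self._index[image_id]
        self._reader.seek(offset)
        return self._reader.read(length)

    def close(self) -> None:
        self._writer.close()
        self._reader.close()
```

Because the file is append-only, a read never sees a half-rewritten image, which fits the "temporarily store, then fetch" workload described in the answer.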
For a web application I look after, we store the images in the database, but make sure they're well cached in the filesystem.
A request from one of the web server frontends for an image requires a quick memcache check to see if the image has changed in the database and, if not, serves it from the filesystem. If it has changed it fetches it from the central database and puts a copy in the filesystem.
This gives most of the advantages of storing them in the filesystem while keeping some of the advantages of database - we only have one location for all the data which makes backups easier and means we can scale across quite a few machines without issue. It also doesn't put excessive load on the database.
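A simplified Python sketch of this "database as source of truth, filesystem as cache" pattern follows. It is not the poster's implementation: a cheap `version` lookup in a hypothetical `images` table replaces the memcache check, and SQLite stands in for the central database.

```python
import sqlite3
from pathlib import Path


def get_image(conn: sqlite3.Connection, cache_dir: Path, image_id: int) -> bytes:
    """Serve from the local file cache unless the database holds a newer version."""
    # Cheap query: fetch only the version number, not the blob itself.
    row = conn.execute(
        "SELECT version FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        raise KeyError(image_id)
    cached = cache_dir / f"{image_id}_v{row[0]}.bin"
    if cached.exists():
        return cached.read_bytes()
    # Cache miss or stale copy: fetch the blob and refresh the cache.
    data = conn.execute(
        "SELECT data FROM images WHERE id = ?", (image_id,)
    ).fetchone()[0]
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)
    return data
```

Putting the version in the cache file name avoids comparing timestamps, at the cost of leaving old versions on disk until something cleans them up.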
If you want your application to be scalable, do not use a file system on the actual web servers. You can store the location of files in a persistent datastore such as a database or a NoSQL solution.
For an AWS solution to this for example you should:
Store the images on S3
Save the S3 key to the database
Serve your images on S3 through CloudFront (Amazon's CDN)
Saving your files to the DB will provide some security, in that another user would need access to the DB in order to retrieve the files. As far as efficiency goes, though, it means a SQL query for every image loaded, leaving all the load on the server side. Do yourself a favor and find a way to protect your images inside the filesystem; there are many.
The biggest out-of-the-box advantage of a database is that it can be accessed from anywhere on the network, which is essential if you have more than one server.
If you want to access a filesystem from other machines you need to set up some kind of sharing scheme, which could add complexity and could have synchronization issues.
If you do go with storing images in the database, you can still use local (unshared) disks as caches to reduce the strain on the DB. You'd just need to query the DB for a timestamp or something to see if the file is still up-to-date (if you allow files that can change).
If the issue is scalability you'll take a massive loss by moving things into the database. You can round-robin webservers via DNS but adding the overhead of both a CGI process and a database lookup to each image is madness. It also makes your database that much harder to maintain and your images that much harder to process.
As to the other part of your question, you can secure access to a file as easily as a database record, but at the end of the day as long as there is an URL that returns a file you have limited options to prevent that URL being used (at least without making cookies and/or javascript compulsory).
Store files in a file server, and store primitive data in a database. While file servers (especially HTTP-based) scale well, database servers do not. Don't mix them together.
If you need to edit, manage, or otherwise maintain the images, you should store it outside the database.
Also, the filesystem has many security features that a database does not.
The database is good for storing pointers (file paths) to the actual data.

Storing Images in DB - Yea or Nay?

So I'm using an app that stores images heavily in the DB. What's your outlook on this? I'm more of a type to store the location in the filesystem, than store it directly in the DB.
What do you think are the pros/cons?
I'm in charge of some applications that manage many TB of images. We've found storing file paths in the database to be best.
There are a couple of issues:
database storage is usually more expensive than file system storage
you can super-accelerate file system access with standard off the shelf products
for example, many web servers use the operating system's sendfile() system call to asynchronously send a file directly from the file system to the network interface. Images stored in a database don't benefit from this optimization.
things like web servers, etc, need no special coding or processing to access images in the file system
databases win out where transactional integrity between the image and metadata are important.
it is more complex to manage integrity between db metadata and file system data
it is difficult (within the context of a web application) to guarantee data has been flushed to disk on the filesystem
As with most issues, it's not as simple as it sounds. There are cases where it would make sense to store the images in the database.
You are storing images that change dynamically, say invoices, and you want to get an invoice as it was on 1 Jan 2007.
The government wants you to maintain 6 years of history
Images stored in the database do not require a different backup strategy. Images stored on filesystem do
It is easier to control access to the images if they are in a database. Idle admins can access any folder on disk. It takes a really determined admin to go snooping in a database to extract the images
On the other hand, there are problems associated with it:
Requires additional code to extract and stream the images
Latency may be higher than with direct file access
Heavier load on the database server
File store. Facebook engineers had a great talk about it. One take away was to know the practical limit of files in a directory.
Needle in a Haystack: Efficient Storage of Billions of Photos
This might be a bit of a long shot, but if you're using (or planning on using) SQL Server 2008 I'd recommend having a look at the new FileStream data type.
FileStream solves most of the problems around storing the files in the DB:
The Blobs are actually stored as files in a folder.
The Blobs can be accessed using either a database connection or over the filesystem.
Backups are integrated.
Migration "just works".
However SQL's "Transparent Data Encryption" does not encrypt FileStream objects, so if that is a consideration, you may be better off just storing them as varbinary.
From the MSDN Article:
Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data.
FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.
File paths in the DB is definitely the way to go - I've heard story after story from customers with TB of images that it became a nightmare trying to store any significant amount of images in a DB - the performance hit alone is too much.
In my experience, sometimes the simplest solution is to name the images according to the primary key. So it's easy to find the image that belongs to a particular record, and vice versa. But at the same time you're not storing anything about the image in the database.
The trick here is to not become a zealot.
One thing to note here is that no one in the pro file system camp has listed a particular file system. Does this mean that everything from FAT16 to ZFS handily beats every database?
No.
The truth is that many databases beat many files systems, even when we're only talking about raw speed.
The correct course of action is to make the right decision for your precise scenario, and to do that, you'll need some numbers and some use case estimates.
In places where you MUST guarantee referential integrity and ACID compliance, storing images in the database is required.
You cannot transactionally guarantee that the image and the meta-data about that image stored in the database refer to the same file. In other words, it is impossible to guarantee that the file on the filesystem is only ever altered at the same time and in the same transaction as the metadata.
As others have said SQL 2008 comes with a Filestream type that allows you to store a filename or identifier as a pointer in the db and automatically stores the image on your filesystem which is a great scenario.
If you're on an older database, then I'd say that if you're storing it as blob data, then you're really not going to get anything out of the database in the way of searching features, so it's probably best to store an address on a filesystem, and store the image that way.
That way you also save space on your filesystem, as you are only going to save the exact amount of space, or even compacted space on the filesystem.
Also, you could decide to save with some structure or elements that allow you to browse the raw images in your filesystem without any db hits, or transfer the files in bulk to another system, hard drive, S3 or another scenario - updating the location in your program, but keep the structure, again without much of a hit trying to bring the images out of your db when trying to increase storage.
Probably, it would also allow you to throw some caching element, based on commonly hit image urls into your web engine/program, so you're saving yourself there as well.
Small static images (not more than a couple of megs) that are not frequently edited, should be stored in the database. This method has several benefits including easier portability (images are transferred with the database), easier backup/restore (images are backed up with the database) and better scalability (a file system folder with thousands of little thumbnail files sounds like a scalability nightmare to me).
Serving up images from a database is easy, just implement an http handler that serves the byte array returned from the DB server as a binary stream.
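As a rough illustration of such a handler, here is a sketch using Python's standard `http.server` with SQLite as the database. The `images` table, the `/images/<id>` URL shape, the `DB_PATH` file name, and the fixed `image/jpeg` content type are all assumptions.

```python
import sqlite3
from http.server import BaseHTTPRequestHandler

DB_PATH = "images.db"  # hypothetical database file


def fetch_image(conn: sqlite3.Connection, image_id: int):
    """Return the image bytes for image_id, or None if there is no such row."""
    row = conn.execute(
        "SELECT data FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    return None if row is None else row[0]


class ImageHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Expect URLs of the form /images/<id>.
        try:
            image_id = int(self.path.rsplit("/", 1)[-1])
        except ValueError:
            self.send_error(400, "bad image id")
            return
        conn = sqlite3.connect(DB_PATH)
        try:
            data = fetch_image(conn, image_id)
        finally:
            conn.close()
        if data is None:
            self.send_error(404, "no such image")
            return
        # Write the byte array straight to the response as a binary stream.
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

To try it, one would pass `ImageHandler` to `http.server.HTTPServer(("127.0.0.1", 8000), ImageHandler)` and call `serve_forever()`. Note that every image request costs a database round trip here, which is the overhead other answers warn about.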
Here's an interesting white paper on the topic.
To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem
The answer is "It depends." Certainly it would depend upon the database server and its approach to blob storage. It also depends on the type of data being stored in blobs, as well as how that data is to be accessed.
Smaller sized files can be efficiently stored and delivered using the database as the storage mechanism. Larger files would probably be best stored using the file system, especially if they will be modified/updated often. (blob fragmentation becomes an issue in regards to performance.)
Here's an additional point to keep in mind. One of the reasons supporting the use of a database to store the blobs is ACID compliance. However, the approach that the testers used in the white paper, (Bulk Logged option of SQL Server,) which doubled SQL Server throughput, effectively changed the 'D' in ACID to a 'd,' as the blob data was not logged with the initial writes for the transaction. Therefore, if full ACID compliance is an important requirement for your system, halve the SQL Server throughput figures for database writes when comparing file I/O to database blob I/O.
One thing that I haven't seen anyone mention yet but is definitely worth noting is that there are issues associated with storing large amounts of images in most filesystems too. For example if you take the approach mentioned above and name each image file after the primary key, on most filesystems you will run into issues if you try to put all of the images in one big directory once you reach a very large number of images (e.g. in the hundreds of thousands or millions).
One common solution to this is to hash them out into a balanced tree of subdirectories.
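One way to do that hashing, sketched in Python: take a hash of the primary key and use its first hex digits as directory names. The function name, the choice of SHA-1, and the two-level default are illustrative assumptions, not a recommendation from the answer.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(image_id: int, levels: int = 2, ext: str = "jpg") -> PurePosixPath:
    """Map an id to a path like ab/cd/<id>.jpg using the first hex digits of its hash."""
    digest = hashlib.sha1(str(image_id).encode("ascii")).hexdigest()
    # Each level uses two hex digits, i.e. at most 256 subdirectories per level.
    parts = [digest[2 * i : 2 * i + 2] for i in range(levels)]
    return PurePosixPath(*parts, f"{image_id}.{ext}")
```

Hashing the key rather than using its leading digits spreads sequential ids evenly, so no single directory fills up first. With two levels there are 65,536 leaf directories, so a few million images come to a few dozen files per directory.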
Something nobody has mentioned is that the DB guarantees atomic actions and transactional integrity, and deals with concurrency. Even referential integrity is out of the window with a filesystem - so how do you know your file names are really still correct?
If you have your images in a file-system and someone is reading the file as you're writing a new version or even deleting the file - what happens?
We use blobs because they're easier to manage (backup, replication, transfer) too. They work well for us.
The problem with storing only filepaths to images in a database is that the database's integrity can no longer be enforced.
If the actual image pointed to by the filepath becomes unavailable, the database unwittingly has an integrity error.
Given that the images are the actual data being sought after, and that they can be managed easier (the images won't suddenly disappear) in one integrated database rather than having to interface with some kind of filesystem (if the filesystem is independently accessed, the images MIGHT suddenly "disappear"), I'd go for storing them directly as a BLOB or such.
At a company where I used to work we stored 155 million images in an Oracle 8i (then 9i) database. 7.5TB worth.
Normally, I'm strongly against taking the most expensive and hardest to scale part of your infrastructure (the database) and putting all the load onto it. On the other hand, it greatly simplifies backup strategy, especially when you have multiple web servers and need to somehow keep the data synchronized.
Like most other things, it depends on the expected size and budget.
We have implemented a document imaging system that stores all its images in SQL2005 blob fields. There are several hundred GB at the moment and we are seeing excellent response times and little or no performance degradation. In addition, for regulatory compliance, we have a middleware layer that archives newly posted documents to an optical jukebox system which exposes them as a standard NTFS file system.
We've been very pleased with the results, particularly with respect to:
Ease of Replication and Backup
Ability to easily implement a document versioning system
If this is a web-based application then there could be advantages to storing the images on a third-party storage delivery network, such as Amazon's S3 or the Nirvanix platform.
Assumption: Application is web enabled/web based
I'm surprised no one has really mentioned this ... delegate it out to others who are specialists -> use a 3rd party image/file hosting provider.
Store your files on a paid online service like
Amazon S3
Moso Cloud Storage
Another StackOverflow thread talks about this here.
This thread explains why you should use a 3rd party hosting provider.
It's so worth it. They store it efficiently. No bandwidth is used on your own servers to answer client requests, etc.
If you're not on SQL Server 2008 and you have some solid reasons for putting specific image files in the database, then you could take the "both" approach and use the file system as a temporary cache and use the database as the master repository.
For example, your business logic can check if an image file exists on disc before serving it up, retrieving from the database when necessary. This buys you the capability of multiple web servers and fewer sync issues.
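A rough sketch of that read path, assuming a table named images with columns id and data and using SQLite only to keep the example self-contained. The function name get_image and the cache file naming are hypothetical:

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def get_image(conn: sqlite3.Connection, cache_dir: Path, image_id: int) -> bytes:
    """Serve from the disk cache if present, otherwise load from the database."""
    cached = cache_dir / f"{image_id}.jpg"
    try:
        return cached.read_bytes()
    except FileNotFoundError:
        pass  # cache miss: fall through to the master copy in the database

    row = conn.execute(
        "SELECT data FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        raise KeyError(image_id)
    data = bytes(row[0])

    # Write to a temporary file and rename it into place, so a concurrent
    # reader never sees a half-written cache entry.
    cache_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.replace(tmp_name, cached)
    return data
```

Because the database stays the master copy, each web server can keep its own cache directory and the cache can be deleted at any time without losing data. If images can be updated, the cache entry also has to be invalidated on write, which this sketch does not cover.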
I'm not sure how much of a "real world" example this is, but I currently have an application out there that stores details for a trading card game, including the images for the cards. Granted, the record count for the database is only 2851 records to date, but given the fact that certain cards are released multiple times and have alternate artwork, it was actually more efficient sizewise to scan the "primary square" of the artwork and then dynamically generate the border and miscellaneous effects for the card when requested.
The original creator of this image library created a data access class that renders the image based on the request, and it does it quite fast for viewing an individual card.
This also eases deployment/updates when new cards are released: instead of zipping up an entire folder of images, sending those down the pipe and ensuring the proper folder structure is created, I simply update the database and have the user download it again. This currently sizes up to 56MB, which isn't great, but I'm working on an incremental update feature for future releases. In addition, there is a "no images" version of the application that allows those on dial-up to get the application without the download delay.
This solution has worked great to date since the application itself is targeted as a single instance on the desktop. There is a web site where all of this data is archived for online access, but I would in no way use the same solution for this. I agree the file access would be preferable because it would scale better to the frequency and volume of requests being made for the images.
Hopefully this isn't too much babble, but I saw the topic and wanted to provide some of my insights from a relatively successful small/medium scale application.
SQL Server 2008 offers a solution that has the best of both worlds: FILESTREAM storage (an attribute on varbinary(max) columns).
Manage it like a regular table and have the performance of the file system.
It depends on the number of images you are going to store and also their sizes. I have used databases to store images in the past and my experience has been fairly good.
IMO, the pros of using a database to store images are:
A. You don't need an FS structure to hold your images
B. Database indexes perform better than FS trees when a large number of items are stored
C. A well-tuned database does a good job of caching query results
D. Backups are simple. It also works well if you have replication set up and content is delivered from a server near to user. In such cases, explicit synchronization is not required.
If your images are going to be small (say < 64k) and the storage engine of your db supports inline (in record) BLOBs, it improves performance further as no indirection is required (Locality of reference is achieved).
Storing images in the db may be a bad idea when you are dealing with a small number of huge images. Another problem with storing images in the db is that metadata like creation and modification dates must be handled by your application.
I have recently created a PHP/MySQL app which stores PDFs/Word files in a MySQL table (as big as 40MB per file so far).
Pros:
Uploaded files are replicated to backup server along with everything else, no separate backup strategy is needed (peace of mind).
Setting up the web server is slightly simpler because I don't need to have an uploads/ folder and tell all my applications where it is.
I get to use transactions for edits to improve data integrity - I don't have to worry about orphaned and missing files
Cons:
mysqldump now takes a looooong time because there is 500MB of file data in one of the tables.
Overall not very memory/cpu efficient when compared to filesystem
I'd call my implementation a success, it takes care of backup requirements and simplifies the layout of the project. The performance is fine for the 20-30 people who use the app.
In my experience I have had to manage both situations: images stored in the database, and images on the file system with the path stored in the db.
The first solution, images in the database, is somewhat "cleaner" as your data access layer will have to deal only with database objects; but this is good only when you have to deal with low numbers.
Database access performance degrades when you deal with binary large objects, and the database size grows a lot, causing further performance loss... and normally database space is much more expensive than file system space.
On the other hand having large binary objects stored in file system will cause you to have backup plans that have to consider both database and file system, and this can be an issue for some systems.
Another reason to go for the file system is when you have to share your image data (or sounds, video, whatever) with third parties: these days I'm developing a web app that uses images that have to be accessed from "outside" my web farm, in such a way that database access to retrieve binary data is simply impossible. So sometimes there are also design considerations that will drive you to a choice.
Consider also, when making this choice, whether you have to deal with permissions and authentication when accessing binary objects: these requirements can normally be met more easily when the data is stored in the db.
I once worked on an image processing application. We stored the uploaded images in a directory that was something like /images/[today's date]/[id number]. But we also extracted the metadata (exif data) from the images and stored that in the database, along with a timestamp and such.
In a previous project I stored images on the filesystem, and that caused a lot of headaches with backups, replication, and the filesystem getting out of sync with the database.
In my latest project I'm storing images in the database, and caching them on the filesystem, and it works really well. I've had no problems so far.
Second the recommendation on file paths. I've worked on a couple of projects that needed to manage large-ish asset collections, and any attempts to store things directly in the DB resulted in pain and frustration long-term.
The only real "pro" I can think of regarding storing them in the DB is the potential for easy security of individual image assets. If there are no file paths to use, and all images are streamed straight out of the DB, there's no danger of a user finding files they shouldn't have access to.
That seems like it would be better solved with an intermediary script pulling data from a web-inaccessible file store, though. So the DB storage isn't REALLY necessary.
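Such an intermediary could look roughly like the sketch below. The names read_asset, AccessDenied and the user_may_read callback are hypothetical, and the permission check itself is left to the application:

```python
from pathlib import Path
from typing import Callable


class AccessDenied(Exception):
    """Raised when a request must not be served."""


def read_asset(
    store_root: Path,
    relative_name: str,
    user_may_read: Callable[[str], bool],
) -> bytes:
    """Return file contents from a store that the web server does not expose."""
    root = store_root.resolve()
    target = (root / relative_name).resolve()

    # Reject names such as "../secret.txt" or absolute paths that would
    # resolve to a location outside the store.
    if root not in target.parents:
        raise AccessDenied(relative_name)

    # Application-specific permission check (an assumption of this sketch).
    if not user_may_read(relative_name):
        raise AccessDenied(relative_name)

    return target.read_bytes()
```

The web handler then calls read_asset with the logged-in user's permission check and streams the returned bytes, so no file is ever reachable by URL alone.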
The word on the street is that unless you are a database vendor trying to prove that your database can do it (like, let's say Microsoft boasting about Terraserver storing a bajillion images in SQL Server) it's not a very good idea. When the alternative - storing images on file servers and paths in the database is so much easier, why bother? Blob fields are kind of like the off-road capabilities of SUVs - most people don't use them, those who do usually get in trouble, and then there are those who do, but only for the fun of it.
Storing an image in the database still means that the image data ends up somewhere in the file system but obscured so that you cannot access it directly.
+ves:
database integrity
it's easy to manage since you don't have to worry about keeping the filesystem in sync when an image is added or deleted
-ves:
performance penalty -- a database lookup is usually slower than a filesystem lookup
you cannot edit the image directly (crop, resize)
Both methods are common and practiced. Have a look at the advantages and disadvantages. Either way, you'll have to think about how to overcome the disadvantages. Storing in the database usually means tweaking database parameters and implementing some kind of caching. Using the filesystem requires you to find some way of keeping filesystem+database in sync.
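As one illustration of the sync problem, a periodic consistency check might look like the sketch below. It assumes a table named images with a path column holding paths relative to the image root; both names, and the function find_mismatches, are made up for this example:

```python
import sqlite3
from pathlib import Path
from typing import Set, Tuple


def find_mismatches(conn: sqlite3.Connection, root: Path) -> Tuple[Set[str], Set[str]]:
    """Compare the paths recorded in the database with the files on disk.

    Returns (missing_files, orphan_files):
      missing_files - referenced in the database but absent on disk
      orphan_files  - present on disk but not referenced by any row
    """
    in_db = {row[0] for row in conn.execute("SELECT path FROM images")}
    on_disk = {
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    }
    return in_db - on_disk, on_disk - in_db
```

A check like this only detects drift after the fact; it does not give the transactional guarantees that storing the blob in the database does.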

Resources