Save image files using hibernate - database

I searched for this but only found ways to save the image as a blob to the database.
What I would like to do is to save the image location in the database and then retrieve the file automatically from the location instead of saving it as a blob into the database.
Does this make any sense at all? Or is it better and faster and cheaper to just save the image files into the database as a blob?
Depending on the locale i might have to get a different picture.
Thanks for your help!

I have experience with both (saving an image as a blob in a RDBMS) and only storing the link to it in a filesystem/url manner. What I have come to realize is the first approach is plain no scalable.
Here is a rather biased list of things about each approach.
Approach 1. Saving images as blobs:
Cons:
When the number of images increase, so does your database size and
you are limited to the filesystem your RDBMS engine runs on.
When you want to retrieve a large number of these blobs, and if they
are big in size, you waste IO/bandwdith and put a strain on your
RDBMS engine. You ideally want it to have short queries that execute
fast and move a little amount of bytes around. You just can't get
that if you save the data as a BLOB in your relational database.
While some might argue that for repeatable queries caching will
help, I will argue that I if those huge chunks of data weren't there
in the first place, I wouldn't have to put them in cache.
There is no reliable way for a db admin/ content manager to easily
retrieve the contents a blob that is in db, for example, to verify
if an image is broken. He would have to connect to the db and
extract the BLOB bytes in some format and then view it. Or
alternatively you can build some page to do that for him but that
would be a badly put together gimmicks in my honest opinion.
Pros:
You don't have to rely on file systems being available or external
systems on which you host your images to be available. You would
probably write a bit less code and you will have more control over
your code since all the stuff you want is in your RDBMS.
Approach 2. Saving images as a link to a filesystem/urls
Pros:
Greatly alleviates performance strain on your RDBMS engine.
If you store the images as links, a system admin/ content manager
can easily check them by just copying the link in a browser and
verifying it renders properly.
If you don't use an external image hosting service but rather an
internal, you still retain a great amount of control while having
the possibility in future to add more image hosting servers/
filesystems.
If you have a large amount of pictures being retrieved and they are
not hosted by you, you can distribute a lot of network load thus
making load times snappier.
Cons:
Things will be a bit decentralized adding some complexity to your application. If you are using an external hosting service, it might be down and you can have no control over it.
In conclusion, I wholeheartedly recommend using the second approach.

In general I agree with #baba's answer.
However it really depends on the number of and sizes of the images. If all the images are small thumbnails then I would store them in the database, so that everything is in one place.
It's also possible to do both...as long as the storage space is available for both the database and filesystem. This gives you the best of both worlds and a built-in backup.

Related

Design Solution For Storing-Fetching Images

This is a design doubt am facing, I have a collection of 1500 images which are to be displayed on an asp.net page, the images to be displayed differ from one page to another, the count of these images will increase in the time to come,
a.) is it a good idea to have the images on the database, but the round trip time to fetch the images from the database might be high.
b.) is it good to have all the images on a directory, and have a virtual file system over it, and the application will access the images from the directory
Do we have in particular any design strategy in a traditional database for fetching images with the least round trip time, does any solution other than usage of a traditional database exists?
Edit 1:
Each image is replaced by its new entry for every 12 hours, so having them on the database might not be a good idea as far I can think of, but how better will it be to use a data-store and index these images?
Edit 2:
Yes, we are planning to run the application on a cluster. And if we try to use a datastore (if it is a good option to go with) then is it compatible with C# & ASP.NET?
ps: I use SQL Server to store these images.
All of the previous comments are really good... In the absence of very specific requirements we have to make broad generalizations to illustrate your options. Here are a few examples.
If raw speed is what you need, then flat files are a clear winner. Whether your using Apache or IIS, they're both optimized to serve static file-based content very fast. High performance sites all know this, and will store much of their content with dynamic handling in mind but will then "publish" select pieces of their dynamic content; in static versions; to their web farm on a periodic or event-driven basis. Although it requires a bit of orchestration, it can be done cheaply and can really reduce the load of your database server, backend network, etc. As a simple example, publish to a folder with the root of that structure being dynamically assessed. when you're ready to publish updates, write a new folder, and then change the root path. No down time, cheap, and easy. On a related note, pulling all this information from a backend store will require you to load these things into memory. This will ultimately translate to more time in Garbage Collection and consequently will mean your application is slower, even if you're using multi-processor/core gardening.
If you need fine-grained control over how images are organized/exposed, then folders may not be the most appropriate. If for example, you need to organize an individual users images and you need to track a lot of meta data around the images then a database may be a good fit. With that said, your database team will probably hate you for it, because this presents a number of challenges from a database management perspective. Additionally, if you're using an ORM you may have some trouble making this work and may find your memory footprint grows to unacceptable levels due to hidden proxy objects, second-level caching, etc. This can all be mitigated so just watch out and make sure you profile your application. With that said, a structured store (like a DB) is more ideal for this use case.
Considering security... Depending on what these images represent, flat files inevitably lead to concerns about canonicalization attacks, brute-force enumeration of browsable folder structures, replaying cookies, urls, viewstate, etc. If you're using a custom Role-based or Claims-based security model you may find using flat files becomes somewhat painful since you'll have to map filesystem security constraints to logical/contextual security constraints. Issues like these often lead me to favoring a structured store.
The aformentioned cache idea is a good one and can help create some middle-ground with respect to how often you actually hit your database, although it will not help with concerns related to memory consumption, GC, etc... You could employ built-in caching mechanisms although a Cache/Grid that supports backing stores would be much better if you can afford it (Ex. NCache, ScaleOut, etc.). These provide nice scalability/redundency, and can also be used to offload storage of session state, viewstate, and a lot more.
Hope this helps.
You basically have 2 options
1) Store the binary in the database. VARBINARY(MAX) field will be a good choice of datatype.
2) Store the path to the image stored on disk in the database. NVARCHAR(MAX) will be a good choice for datatype.
There are of course pro's and con's of both solutions. Without knowing more about your requirements Its hard to advise which is the best way.
I prefer not to store images in the database - Instead just store a link(path/filename/id etc) to the correct image.
Then if you implement a HttpHandler to serve up the images, you can store them in whatever location you like. Heres a very basic implementation:
public class myPhototHandler: IHttpHandler
{
public bool IsReusable {
get { return true; }
}
public void ProcessRequest(System.Web.HttpContext context)
{
if (Context.User.Identity.IsAuthenticated) {
var filename = context.Request.QueryString("f") ?? String.Empty;
string completePath = context.Server.MapPath(string.Format("~/App_Data/Photos/{0}", filename));
context.Response.ContentType = "image/jpeg";
context.Response.WriteFile(completePath);
}
}
}
For a great resource on setting up a handler check out this blog post and the other related posts.
I wouldn't overcomplicate your solution. The downside to storing images in a database is the database bloat and storage requirements for backups, especially if the images are only good for 12 hrs. If in doubt, keep it simple, so when requirements change, you haven't invested that much time anyways.
This is how I did it on my site.
Store the images in a folder.
If you want control over the filename the user sees, or employ conditional logic, use an HttpHandler to serve up the image, otherwise just use its full path and filename in an img tag.
If you are talking high-volume mega site, perhaps consider using a content delivery network.
Have you considered caching your images to mitigate the round-trip time to SQL server? Caching might be appropriate at the browser (via HTTP Headers) and/or the HTTP handler serving the image (via System.Web.Caching).
Storing images in SQL server can be convenient because you don't have to worry about maintaining pointers to the file system. However, the size of your database will obviously be much larger which can make backups and maintenance more complex. You might consider using a different file group within your database for your image tables, or a separate database altogether, so that you can maintain row data separate from image data.
Using SQL server also means you'll have easy options for concurrency control, partitioning and replication, should they be appropriate to your application.

Storing Images : DB or File System -

I read some post in this regard but I still don't understand what's the best solution in my case.
I'm start writing a new webApp and the backend is going to provide about 1-10 million images. (average size 200-500kB for a single image)
My site will provide content and images to 100-1000 users at the same time.
I'd like also to keep Provider costs as low as possible (but this is a secondary requirement).
I'm thinking that File System space is less expensive if compared to the cost of DB size.
Personally I like the idea of having all my images in the DB but any suggestion will be really appreciated :)
Do you think that in my case the DB approach is the right choice?
Putting all of those images in your database will make it very, very large. This means your DB engine will be busy caching all those images (a task it's not really designed for) when it could be caching hot application data instead.
Leave the file caching up to the OS and/or your reverse proxy - they'll be better at it.
Some other reasons to store images on the file system:
Image servers can run even when the database is busy or down.
File systems are made to store files and are quite efficient at it.
Dumping data in your database means slower backups and other operations.
No server-side coded needed to serve up an image, just plain old IIS/Apache.
You can scale up faster with dirt-cheap web servers, or potentially to a CDN.
You can perform related work (generating thumbnails, etc.) without involving the database.
Your database server can keep more of the "real" table data in memory, which is where you get your database speed for queries. If it uses its precious memory to keep image files cached, that doesn't buy you hardly anything speed-wise versus having more of the photo index in memory.
Most large sites use the filesystem.
See Store pictures as files or in the database for a web app?
When dealing with binary objects, follow a document centric approach for architecture, and not store documents like pdf's and images in the database, you will eventually have to refactor it out when you start seeing all kinds of performance issues with your database. Just store the file on the file system and have the path inside a table of your databse. There is also a physical limitation on the size of the data type that you will use to serialize and save it in the database. Just store it on the file system and access it.
Your first sentence says that you've read some posts on the subject, so I won't bother putting in links to articles that cover this. In my experience, and based on what you've posted as far as the number of images and sizes of the images, you're going to pay dearly in DB performance if you store them in the DB. I'd store them on the file system.
What database are you using? MS SQL Server 2008 provides FILESTREAM storage
allows storage of and efficient access to BLOB data using a combination of SQL Server 2008 and the NTFS file system. It covers choices for BLOB storage, configuring Windows and SQL Server for using FILESTREAM data, considerations for combining FILESTREAM with other features, and implementation details such as partitioning and performance.
details
We use FileNet, a server optimized for imaging. It's very expensive. A cheaper solution is to use a file server.
Please don't consider storing large files on a database server.
As others have mentioned, store references to the large files in the database.

Saving images: files or blobs?

When you save your images (supose you have lots of them) do you store then as blobs in your Database, or as files? Why?
Duplicate of: Storing Images in DB - Yea or Nay?
I usually go with storing them as files, and store the path in the database. To me, it's a much easier and more natural approach than pushing them into the database as blobs.
One argument for storing them in the database: much easier to do full backups, but that depends on your needs. If you need to be able to easily take full snapshots of your database (including the images), then storing them as blobs in the database is probably the way to go. Otherwise you have to pair your database backup with a file backup, and somehow try to associate the two, so that if you have to do a restore, you know which pair to restore.
It depends on the size of the image.
Microsoft Research has an interesting document on the subject
I've tried to use the db (SQL Server and MySQL) to store medium (< 5mb) files, and what I got was tons of trouble.
1) Some DBs (SQL Server Express) have size limits;
2) Some DBs (MySQL) become mortally slow;
3) When you have to display a list of object, if you inadvertedly do SELECT * FROM table, tons of data will try to go up and down from the db, resulting in a deadly slow response or memory fail;
4) Some frontends (ruby ActiveRecord) have very big troubles handling blobs.
Just use files. Don't store them all in the same directory, use some technique to put them on several dirs (for instance, you could use last two chars of a GUID or last two digits of an int id) and then store the path on db.
The performance hit of a database server is a moot issue. If you need the performance benefits of a file system, you simply cache it there on the first request. Subsequent requests can then be served directly from the file system by a direct link (which, in case of a web app, you could rewrite the HTML with before flushing the output buffer).
This provides the best of both worlds:
The authoritative store is the
database, keeping transactional and
referential integrity
You can deploy all user data by
simply deploying the database
Emptying this cache (e.g. by adding a
web server) would only cause a
temporary performance hit while it is
refilled automatically.
There is no need to constantly hammer the database for things that won't change all the time, but the important thing is that the user data is all there and not scattered around different places, making multi-server operation and deployment a total mess.
I'm always advocating the "database as the user data store, unless" approach, because it is better architecturally, and not necessarily slower with effective caching.
Having said that, a good reason to use the file system as the authoritative store would be when you really need to use external independent tools for accessing it, e.g. SFTP and whatnot.
Given that you might want to save an image along with a name, brief description, created date, created by, etc., you might find it better to save in a database. That way, everything is together. If you saved this same info and stored the image as a file, you would have to retrieve the whole "image object" from two places...and down the road, you might find yourself having syncing issues (some images not being found). Hopefully this makes sense.
By saving you mean to use them to show in a webpage or something like that?
If it's the case, the better option will be to use files, if you use a database it will be constantly hammered with the request for photos. And it's a situation that doesn't scale too well.
The question is, does your application handle BLOBS or other files like other application data? Do your users upload images alongside other data? If so, then you ought to store the BLOBs in the database. It makes it easier to back up the database and, in the event of a problem, to recover to a transactionally consistent state.
But if you mean images which are part of the application infratstructure rather than user data then probably the answer is, No.
If I'm running on one web server and will only ever be running on one web server, I store them as files. If I'm running across multiple webheads, I put the reference instance of the image in a database BLOB and cache it as a file on the webheads.
Blobs can be heavy on the db/scripts, why not just store paths. The only reason we've ever used blobs is if it needs to be merge replicated or super tight security for assets (as in cant pull image unless logged in or something)
I would suggest to go for File systems. First, let's discuss why not Blob? So to answer that, we need to think what advantages DB provides us over File system?
Mutability: We can modify the data once stored. Not Applicable in case of images. Images are just a series of 1s and 0s. Whenever we changes an image, it wouldn't be a matter of few 1s and 0s altered and hence, modifying the same image content doesn't make sense. It's better to delete the old one, and store new.
Indexing: We can create indexes for faster searching. But it doesn't apply on images as images are just 1s and 0s and we can't index that.
Then why File systems?
Faster access: If we are storing images in Blob inside our DB, then a query to fetch the complete record (select *) will result in a very poor performance of the query as a lots and lots of data will be going to and from the DB. Instead if we just store the URL of images in DB and store images in a distributed file system (DFS), it will be much faster.
Size limit: If DBs are storing images, a lot and lot of images then it might face performance issues and also, reach its memory limit (few DBs do have it).
Using file System is better as the basic feature you would be provided with while storing images as a blob would be
1. mutability which is not needed for an image as we won't be changing the binary data of images, we will be removing images as whole only
2. Indexed searching :which is not needed for image as the content of images can't be indexed and indexed searching searches the content of the BLOB.
Using file system is beneficial here because
1. its cheaper
2. Using CDN for fast access
hence one way forward could be to store the images as a file and provide its path in database

How important is a database in managing information?

I have been hired to help write an application that manages certain information for the end user. It is intended to manage a few megabytes of information, but also manage scanned images in full resolution. Should this project use a database, and why or why not?
Any question "Should I use a certain tool?" comes down to asking exactly what you want to do. You should ask yourself - "Do I want to write my own storage for this data?"
Most web based applications are written against a database because most databases support many "free" features - you can have multiple webservers. You can use standard tools to edit, verify and backup your data. You can have a robust storage solution with transactions.
The database won't help you much in dealing with the image data itself, but anything that manages a bunch of images is going to have meta-data about the images that you'll be dealing with. Depending on the meta-data and what you want to do with it, a database can be quite helpful indeed with that.
And just because the database doesn't help you much with the image data, that doesn't mean you can't store the images in the database. You would store them in a BLOB column of a SQL database.
If the amount of data is small, or installed on many client machines, you might not want the overhead of a database.
Is it intended to be installed on many users machines? Adding the overhead of ensuring you can run whatever database engine you choose on a client installed app is not optimal. Since the amount of data is small, I think XML would be adequate here. You could Base64 encode the images and store them as CDATA.
Will the application be run on a server? If you have concurrent users, then databases have concepts for handling these scenarios (transactions), and that can be helpful. And the scanned image data would be appropriate for a BLOB.
You shouldn't store images in the database, as is the general consensus here.
The file system is just much better at storing images than your database is.
You should use a database to store meta information about those images, such as a title, description, etc, and just store a URL or path to the images.
When it comes to storing images in a database I try to avoid it. In your case from what I can gather of your question there is a possibilty for a subsantial number of fairly large images, so I would probably strong oppose it.
If this is a web application I would use a database for quick searching and indexing of images using keywords and other parameters. Then have a column pointing to the location of the image in a filesystem if possible with some kind of folder structure to help further decrease the image load time.
If you need greater security due to the directory being available (network share) and the application is local then you should probably bite the bullet and store the images in the database.
My gut reaction is "why not?" A database is going to provide a framework for storing information, with all of the input/output/optimization functions provided in a documented format. You can go with a server-side solution, or a local database such as SQLite or the local version of SQL Server. Either way you have a robust, documented data management framework.
This post should give you most of the opinions you need about storing images in the database. Do you also mean 'should I use a database for the other information?' or are you just asking about the images?
A database is meant to manage large volumes of data, and are supposed to give you fast access to read and write that data in spite of the size. Put simply, they manage scale for data - scale that you don't want to deal with. If you have only a few users (hundreds?), you could just as easily manage the data on disk (say XML?) and keep the data in memory. The images should clearly not go in to the database so the question is how much data, or for how many users are you maintaining this database instance?
If you want to have a structured way to store and retrieve information, a database is most definitely the way to go. It makes your application flexible and more powerful, and lets you focus on the actual application rather than incidentals like trying to write your own storage system.
For individual applications, SQLite is great. It fits right in an app as a file; no need for a whole DRBMS juggernaut.
There are a lot of factors to this. But, being a database weenie, I would err on the side of having a database. It just makes life easier when things changes. and things will change.
Depending on the images, you might store them on the file system or actually blob them and put them in the database (Not supported in all DBMS's). If the files are very small, then I would blob them. If they are big, then I would keep them on he file system and manage them yourself.
There are so many free or cheap DBMS's out there that there really is no excuse not to use one. I'm a SQL Server guy, but f your application is that simple, then the free version of mysql should do the job. In fact, it has some pretty cool stuff in there.
Our CMS stores all of the check images we process. It uses a database for metadata and lets the file system handle the scanned images.
A simple database like SQLite sounds appropriate - it will let you store file metadata in a consistent, transactional way. Then store the path to each image in the database and let the file system do what it does best - manage files.
SQL Server 2008 has a new data type built for in-database files, but before that BLOB was the way to store files inside the database. On a small scale that would work too.

Storing Images in DB - Yea or Nay?

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
So I'm using an app that stores images heavily in the DB. What's your outlook on this? I'm more of a type to store the location in the filesystem, than store it directly in the DB.
What do you think are the pros/cons?
I'm in charge of some applications that manage many TB of images. We've found that storing file paths in the database to be best.
There are a couple of issues:
database storage is usually more expensive than file system storage
you can super-accelerate file system access with standard off the shelf products
for example, many web servers use the operating system's sendfile() system call to asynchronously send a file directly from the file system to the network interface. Images stored in a database don't benefit from this optimization.
things like web servers, etc, need no special coding or processing to access images in the file system
databases win out where transactional integrity between the image and metadata are important.
it is more complex to manage integrity between db metadata and file system data
it is difficult (within the context of a web application) to guarantee data has been flushed to disk on the filesystem
As with most issues, it's not as simple as it sounds. There are cases where it would make sense to store the images in the database.
You are storing images that are
changing dynamically, say invoices and you wanted
to get an invoice as it was on 1 Jan
2007?
The government wants you to maintain 6 years of history
Images stored in the database do not require a different backup strategy. Images stored on filesystem do
It is easier to control access to the images if they are in a database. Idle admins can access any folder on disk. It takes a really determined admin to go snooping in a database to extract the images
On the other hand there are problems associated
Require additional code to extract
and stream the images
Latency may be
slower than direct file access
Heavier load on the database server
File store. Facebook engineers had a great talk about it. One take away was to know the practical limit of files in a directory.
Needle in a Haystack: Efficient Storage of Billions of Photos
This might be a bit of a long shot, but if you're using (or planning on using) SQL Server 2008 I'd recommend having a look at the new FileStream data type.
FileStream solves most of the problems around storing the files in the DB:
The Blobs are actually stored as files in a folder.
The Blobs can be accessed using either a database connection or over the filesystem.
Backups are integrated.
Migration "just works".
However SQL's "Transparent Data Encryption" does not encrypt FileStream objects, so if that is a consideration, you may be better off just storing them as varbinary.
From the MSDN Article:
Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data.
FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.
File paths in the DB is definitely the way to go - I've heard story after story from customers with TB of images that it became a nightmare trying to store any significant amount of images in a DB - the performance hit alone is too much.
In my experience, sometimes the simplest solution is to name the images according to the primary key. So it's easy to find the image that belongs to a particular record, and vice versa. But at the same time you're not storing anything about the image in the database.
The trick here is to not become a zealot.
One thing to note here is that no one in the pro file system camp has listed a particular file system. Does this mean that everything from FAT16 to ZFS handily beats every database?
No.
The truth is that many databases beat many files systems, even when we're only talking about raw speed.
The correct course of action is to make the right decision for your precise scenario, and to do that, you'll need some numbers and some use case estimates.
In places where you MUST guarantee referential integrity and ACID compliance, storing images in the database is required.
You cannot transactionaly guarantee that the image and the meta-data about that image stored in the database refer to the same file. In other words, it is impossible to guarantee that the file on the filesystem is only ever altered at the same time and in the same transaction as the metadata.
As others have said SQL 2008 comes with a Filestream type that allows you to store a filename or identifier as a pointer in the db and automatically stores the image on your filesystem which is a great scenario.
If you're on an older database, then I'd say that if you're storing it as blob data, then you're really not going to get anything out of the database in the way of searching features, so it's probably best to store an address on a filesystem, and store the image that way.
That way you also save space on your filesystem, as you are only going to save the exact amount of space, or even compacted space on the filesystem.
Also, you could decide to save with some structure or elements that allow you to browse the raw images in your filesystem without any db hits, or transfer the files in bulk to another system, hard drive, S3 or another scenario - updating the location in your program, but keep the structure, again without much of a hit trying to bring the images out of your db when trying to increase storage.
Probably, it would also allow you to throw some caching element, based on commonly hit image urls into your web engine/program, so you're saving yourself there as well.
Small static images (not more than a couple of megs) that are not frequently edited, should be stored in the database. This method has several benefits including easier portability (images are transferred with the database), easier backup/restore (images are backed up with the database) and better scalability (a file system folder with thousands of little thumbnail files sounds like a scalability nightmare to me).
Serving up images from a database is easy, just implement an http handler that serves the byte array returned from the DB server as a binary stream.
Here's an interesting white paper on the topic.
To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem
The answer is "It depends." Certainly it would depend upon the database server and its approach to blob storage. It also depends on the type of data being stored in blobs, as well as how that data is to be accessed.
Smaller sized files can be efficiently stored and delivered using the database as the storage mechanism. Larger files would probably be best stored using the file system, especially if they will be modified/updated often. (blob fragmentation becomes an issue in regards to performance.)
Here's an additional point to keep in mind. One of the reasons supporting the use of a database to store the blobs is ACID compliance. However, the approach that the testers used in the white paper, (Bulk Logged option of SQL Server,) which doubled SQL Server throughput, effectively changed the 'D' in ACID to a 'd,' as the blob data was not logged with the initial writes for the transaction. Therefore, if full ACID compliance is an important requirement for your system, halve the SQL Server throughput figures for database writes when comparing file I/O to database blob I/O.
One thing that I haven't seen anyone mention yet but is definitely worth noting is that there are issues associated with storing large amounts of images in most filesystems too. For example if you take the approach mentioned above and name each image file after the primary key, on most filesystems you will run into issues if you try to put all of the images in one big directory once you reach a very large number of images (e.g. in the hundreds of thousands or millions).
Once common solution to this is to hash them out into a balanced tree of subdirectories.
Something nobody has mentioned is that the DB guarantees atomic actions, transactional integrity and deals with concurrency. Even referentially integrity is out of the window with a filesystem - so how do you know your file names are really still correct?
If you have your images in a file-system and someone is reading the file as you're writing a new version or even deleting the file - what happens?
We use blobs because they're easier to manage (backup, replication, transfer) too. They work well for us.
The problem with storing only filepaths to images in a database is that the database's integrity can no longer be forced.
If the actual image pointed to by the filepath becomes unavailable, the database unwittingly has an integrity error.
Given that the images are the actual data being sought after, and that they can be managed easier (the images won't suddenly disappear) in one integrated database rather than having to interface with some kind of filesystem (if the filesystem is independently accessed, the images MIGHT suddenly "disappear"), I'd go for storing them directly as a BLOB or such.
At a company where I used to work we stored 155 million images in an Oracle 8i (then 9i) database. 7.5TB worth.
Normally, I'm storngly against taking the most expensive and hardest to scale part of your infrastructure (the database) and putting all load into it. On the other hand: It greatly simplifies backup strategy, especially when you have multiple web servers and need to somehow keep the data synchronized.
Like most other things, It depends on the expected size and Budget.
We have implemented a document imaging system that stores all it's images in SQL2005 blob fields. There are several hundred GB at the moment and we are seeing excellent response times and little or no performance degradation. In addition, fr regulatory compliance, we have a middleware layer that archives newly posted documents to an optical jukebox system which exposes them as a standard NTFS file system.
We've been very pleased with the results, particularly with respect to:
Ease of Replication and Backup
Ability to easily implement a document versioning system
If this is web-based application then there could be advantages to storing the images on a third-party storage delivery network, such as Amazon's S3 or the Nirvanix platform.
Assumption: Application is web enabled/web based
I'm surprised no one has really mentioned this ... delegate it out to others who are specialists -> use a 3rd party image/file hosting provider.
Store your files on a paid online service like
Amazon S3
Moso Cloud Storage
Another StackOverflow threads talking about this here.
This thread explains why you should use a 3rd party hosting provider.
It's so worth it. They store it efficiently. No bandwith getting uploaded from your servers to client requests, etc.
If you're not on SQL Server 2008 and you have some solid reasons for putting specific image files in the database, then you could take the "both" approach and use the file system as a temporary cache and use the database as the master repository.
For example, your business logic can check if an image file exists on disc before serving it up, retrieving from the database when necessary. This buys you the capability of multiple web servers and fewer sync issues.
I'm not sure how much of a "real world" example this is, but I currently have an application out there that stores details for a trading card game, including the images for the cards. Granted the record count for the database is only 2851 records to date, but given the fact that certain cards have are released multiple times and have alternate artwork, it was actually more efficient sizewise to scan the "primary square" of the artwork and then dynamically generate the border and miscellaneous effects for the card when requested.
The original creator of this image library created a data access class that renders the image based on the request, and it does it quite fast for viewing and individual card.
This also eases deployment/updates when new cards are released, instead of zipping up an entire folder of images and sending those down the pipe and ensuring the proper folder structure is created, I simply update the database and have the user download it again. This currently sizes up to 56MB, which isn't great, but I'm working on an incremental update feature for future releases. In addition, there is a "no images" version of the application that allows those over dial-up to get the application without the download delay.
This solution has worked great to date since the application itself is targeted as a single instance on the desktop. There is a web site where all of this data is archived for online access, but I would in no way use the same solution for this. I agree the file access would be preferable because it would scale better to the frequency and volume of requests being made for the images.
Hopefully this isn't too much babble, but I saw the topic and wanted to provide some my insights from a relatively successful small/medium scale application.
SQL Server 2008 offers a solution that has the best of both worlds : The filestream data type.
Manage it like a regular table and have the performance of the file system.
It depends on the number of images you are going to store and also their sizes. I have used databases to store images in the past and my experience has been fairly good.
IMO, Pros of using database to store images are,
A. You don't need FS structure to hold your images
B. Database indexes perform better than FS trees when more number of items are to be stored
C. Smartly tuned database perform good job at caching the query results
D. Backups are simple. It also works well if you have replication set up and content is delivered from a server near to user. In such cases, explicit synchronization is not required.
If your images are going to be small (say < 64k) and the storage engine of your db supports inline (in record) BLOBs, it improves performance further as no indirection is required (Locality of reference is achieved).
Storing images may be a bad idea when you are dealing with small number of huge sized images. Another problem with storing images in db is that, metadata like creation, modification dates must handled by your application.
I have recently created a PHP/MySQL app which stores PDFs/Word files in a MySQL table (as big as 40MB per file so far).
Pros:
Uploaded files are replicated to backup server along with everything else, no separate backup strategy is needed (peace of mind).
Setting up the web server is slightly simpler because I don't need to have an uploads/ folder and tell all my applications where it is.
I get to use transactions for edits to improve data integrity - I don't have to worry about orphaned and missing files
Cons:
mysqldump now takes a looooong time because there is 500MB of file data in one of the tables.
Overall not very memory/cpu efficient when compared to filesystem
I'd call my implementation a success, it takes care of backup requirements and simplifies the layout of the project. The performance is fine for the 20-30 people who use the app.
Im my experience I had to manage both situations: images stored in database and images on the file system with path stored in db.
The first solution, images in database, is somewhat "cleaner" as your data access layer will have to deal only with database objects; but this is good only when you have to deal with low numbers.
Obviously database access performance when you deal with binary large objects is degrading, and the database dimensions will grow a lot, causing again performance loss... and normally database space is much more expensive than file system space.
On the other hand having large binary objects stored in file system will cause you to have backup plans that have to consider both database and file system, and this can be an issue for some systems.
Another reason to go for file system is when you have to share your images data (or sounds, video, whatever) with third party access: in this days I'm developing a web app that uses images that have to be accessed from "outside" my web farm in such a way that a database access to retrieve binary data is simply impossible. So sometimes there are also design considerations that will drive you to a choice.
Consider also, when making this choice, if you have to deal with permission and authentication when accessing binary objects: these requisites normally can be solved in an easier way when data are stored in db.
I once worked on an image processing application. We stored the uploaded images in a directory that was something like /images/[today's date]/[id number]. But we also extracted the metadata (exif data) from the images and stored that in the database, along with a timestamp and such.
In a previous project i stored images on the filesystem, and that caused a lot of headaches with backups, replication, and the filesystem getting out of sync with the database.
In my latest project i'm storing images in the database, and caching them on the filesystem, and it works really well. I've had no problems so far.
Second the recommendation on file paths. I've worked on a couple of projects that needed to manage large-ish asset collections, and any attempts to store things directly in the DB resulted in pain and frustration long-term.
The only real "pro" I can think of regarding storing them in the DB is the potential for easy of individual image assets. If there are no file paths to use, and all images are streamed straight out of the DB, there's no danger of a user finding files they shouldn't have access to.
That seems like it would be better solved with an intermediary script pulling data from a web-inaccessible file store, though. So the DB storage isn't REALLY necessary.
The word on the street is that unless you are a database vendor trying to prove that your database can do it (like, let's say Microsoft boasting about Terraserver storing a bajillion images in SQL Server) it's not a very good idea. When the alternative - storing images on file servers and paths in the database is so much easier, why bother? Blob fields are kind of like the off-road capabilities of SUVs - most people don't use them, those who do usually get in trouble, and then there are those who do, but only for the fun of it.
Storing an image in the database still means that the image data ends up somewhere in the file system but obscured so that you cannot access it directly.
+ves:
database integrity
its easy to manage since you don't have to worry about keeping the filesystem in sync when an image is added or deleted
-ves:
performance penalty -- a database lookup is usually slower that a filesystem lookup
you cannot edit the image directly (crop, resize)
Both methods are common and practiced. Have a look at the advantages and disadvantages. Either way, you'll have to think about how to overcome the disadvantages. Storing in database usually means tweaking database parameters and implement some kind of caching. Using filesystem requires you to find some way of keeping filesystem+database in sync.

Resources