How can I go about storing images in a database?

I'm wondering how I could go about storing images in a database. Is there a way to store the actual image in a SQL or NoSQL database and then just query the image, or would I need to store only the path of an image, keep all images in a folder, and point the image tag at that path? Are there any technologies that will allow me to put the actual image in the cloud and make a GET call to it?

Some general direction: if you are already using a SQL database, don't want to change the database architecture, and everything else about that database is fine, then you may want to use one of a few options (remember, this is if you are not using a NoSQL DB):
a table of paths to the location of the image on your server
a table of URLs to static URLs of where the images are being stored
If you are using a NoSQL database, then storing compressed images in it may be fine. Some things to keep in mind:
when storing images directly in your database, you can start to add a lot of load on your engine
images are large (> 1 MB average) and will easily start to occupy space fast
There are many other practices to keep in mind when choosing. I personally would use some cloud service to reference a static URL that links to my images. This creates a bit more portability and reduces the total size of the table greatly. Check out this discussion for some more details.
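As a rough illustration of the "table of paths" option, here is a minimal sketch in Python using SQLite. The table name `images`, the column `path`, and the helper names `save_image` and `load_image` are all assumptions made for the example; the same shape should carry over to any SQL database.

```python
import sqlite3
import tempfile
from pathlib import Path


def save_image(conn: sqlite3.Connection, image_dir: Path, name: str, data: bytes) -> int:
    """Write the bytes to disk and record only the path in the database."""
    image_dir.mkdir(parents=True, exist_ok=True)
    path = image_dir / name
    path.write_bytes(data)
    cur = conn.execute("INSERT INTO images (path) VALUES (?)", (str(path),))
    conn.commit()
    return cur.lastrowid


def load_image(conn: sqlite3.Connection, image_id: int) -> bytes:
    """Look up the path by id, then read the file from disk."""
    row = conn.execute("SELECT path FROM images WHERE id = ?", (image_id,)).fetchone()
    if row is None:
        raise KeyError(image_id)
    return Path(row[0]).read_bytes()


def demo() -> bytes:
    # Hypothetical schema: one row per image, holding only its location.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT NOT NULL)")
    with tempfile.TemporaryDirectory() as tmp:
        image_id = save_image(conn, Path(tmp) / "images", "logo.png", b"\x89PNG fake bytes")
        return load_image(conn, image_id)
```

In a web app the stored path (or URL) would normally be handed straight to the image tag, so the database never touches the image bytes at all.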

Save image files using hibernate

I searched for this but only found ways to save the image as a blob to the database.
What I would like to do is to save the image location in the database and then retrieve the file automatically from the location instead of saving it as a blob into the database.
Does this make any sense at all? Or is it better and faster and cheaper to just save the image files into the database as a blob?
Depending on the locale I might have to get a different picture.
Thanks for your help!
I have experience with both approaches: saving an image as a blob in an RDBMS, and storing only a link to it (a filesystem path or URL). What I have come to realize is that the first approach is plainly not scalable.
Here is a rather biased list of things about each approach.
Approach 1. Saving images as blobs:
Cons:
When the number of images increases, so does your database size, and
you are limited to the filesystem your RDBMS engine runs on.
When you want to retrieve a large number of these blobs, and if they
are big in size, you waste IO/bandwidth and put a strain on your
RDBMS engine. You ideally want it to have short queries that execute
fast and move a small number of bytes around. You just can't get
that if you save the data as a BLOB in your relational database.
While some might argue that for repeatable queries caching will
help, I will argue that if those huge chunks of data weren't there
in the first place, I wouldn't have to put them in cache.
There is no reliable way for a db admin/content manager to easily
retrieve the contents of a blob that is in the db, for example, to
verify if an image is broken. He would have to connect to the db and
extract the BLOB bytes in some format and then view it. Or
alternatively you can build some page to do that for him, but that
would be a badly put-together gimmick in my honest opinion.
Pros:
You don't have to rely on file systems being available or external
systems on which you host your images to be available. You would
probably write a bit less code and you will have more control over
your code since all the stuff you want is in your RDBMS.
Approach 2. Saving images as a link to a filesystem/urls
Pros:
Greatly alleviates performance strain on your RDBMS engine.
If you store the images as links, a system admin/ content manager
can easily check them by just copying the link in a browser and
verifying it renders properly.
If you don't use an external image hosting service but rather an
internal, you still retain a great amount of control while having
the possibility in future to add more image hosting servers/
filesystems.
If you have a large amount of pictures being retrieved and they are
not hosted by you, you can distribute a lot of network load thus
making load times snappier.
Cons:
Things will be a bit decentralized, adding some complexity to your application. If you are using an external hosting service, it might be down and you have no control over it.
In conclusion, I wholeheartedly recommend using the second approach.
In general I agree with @baba's answer.
However it really depends on the number of and sizes of the images. If all the images are small thumbnails then I would store them in the database, so that everything is in one place.
It's also possible to do both...as long as the storage space is available for both the database and filesystem. This gives you the best of both worlds and a built-in backup.

Storing Images: DB or File System

I read some posts in this regard but I still don't understand what's the best solution in my case.
I'm starting to write a new web app and the backend is going to provide about 1-10 million images (average size 200-500 kB for a single image).
My site will provide content and images to 100-1000 users at the same time.
I'd like also to keep Provider costs as low as possible (but this is a secondary requirement).
I'm thinking that File System space is less expensive if compared to the cost of DB size.
Personally I like the idea of having all my images in the DB but any suggestion will be really appreciated :)
Do you think that in my case the DB approach is the right choice?
Putting all of those images in your database will make it very, very large. This means your DB engine will be busy caching all those images (a task it's not really designed for) when it could be caching hot application data instead.
Leave the file caching up to the OS and/or your reverse proxy - they'll be better at it.
Some other reasons to store images on the file system:
Image servers can run even when the database is busy or down.
File systems are made to store files and are quite efficient at it.
Dumping data in your database means slower backups and other operations.
No server-side code needed to serve up an image, just plain old IIS/Apache.
You can scale up faster with dirt-cheap web servers, or potentially to a CDN.
You can perform related work (generating thumbnails, etc.) without involving the database.
Your database server can keep more of the "real" table data in memory, which is where you get your database speed for queries. If it uses its precious memory to keep image files cached, that buys you hardly anything speed-wise versus having more of the photo index in memory.
Most large sites use the filesystem.
See Store pictures as files or in the database for a web app?
When dealing with binary objects, follow a document-centric approach to architecture and do not store documents such as PDFs and images in the database; you will eventually have to refactor them out when you start seeing all kinds of performance issues with your database. Just store the file on the file system and keep the path in a table in your database. There is also a physical limit on the size of the data type that you would use to serialize and save the file in the database. Just store it on the file system and access it from there.
Your first sentence says that you've read some posts on the subject, so I won't bother putting in links to articles that cover this. In my experience, and based on what you've posted as far as the number of images and sizes of the images, you're going to pay dearly in DB performance if you store them in the DB. I'd store them on the file system.
What database are you using? MS SQL Server 2008 provides FILESTREAM storage, which allows storage of and efficient access to BLOB data using a combination of SQL Server 2008 and the NTFS file system. The linked write-up covers choices for BLOB storage, configuring Windows and SQL Server for using FILESTREAM data, considerations for combining FILESTREAM with other features, and implementation details such as partitioning and performance.
We use FileNet, a server optimized for imaging. It's very expensive. A cheaper solution is to use a file server.
Please don't consider storing large files on a database server.
As others have mentioned, store references to the large files in the database.

What storage location, SQL Server or file system, would result in better performance in saving tiff images?

Our system needs to store tiff images of ~3k in size. We receive ~300 images at a given time and need to quickly process them. Once ~100,000 images are received, the images are transferred off our system to another archival system or purged.
I am looking for best performance in regards to the initial save of the image files. The task of transferring the images for archival is less performance critical.
What storage location, SQL Server or file system, would result in better performance in saving tiff images?
Are there any other considerations or gotchas to be aware of?
Storing the images in the filesystem will give you better performance. You just need to put an entry into a relevant database table for the tiff image attachments - and use that to get the path of the image on the filesystem.
You might want to further boost performance by hosting the images on a web server, such as IIS (if relevant), and have your client applications (again, if relevant) retrieve them directly from there instead.
In my experience SQL Server has been decent with storing blobs into the database. As long as I follow Best Practices related to queries, normalization, etc. I have found them to work well.
For some reason, I personally do not want to store huge PDF and DOC and JPG files in my database, but then, that is exactly what Microsoft SharePoint does, and does well.
I'd definitely consider putting blobs in my db.
SQL Server 2008 has a new feature called FILESTREAM. Part of the documentation also has a section on best practices, in which the MS folks state that FILESTREAM should come into play if the BLOB objects are typically larger than 1 MB.
That MSDN page states:
When to Use FILESTREAM
If the following conditions are true, you should consider using FILESTREAM:
- Objects that are being stored are, on average, larger than 1 MB. For smaller objects, storing varbinary(max) BLOBs in the database often provides better streaming performance.
So I guess with a 3 KB TIFF, you could store that nicely inside a VARBINARY(MAX) field in your SQL Server 2005 table. Since it's even smaller than the 8k page size for SQL Server, that'll fit nicely!
You might also want to consider putting your BLOBs into their own table and reference your "base" data row from there. That way, if you only need to query the base data (your ints, varchars etc.), your query won't be bogged down by BLOBs being stored intermingled with other stuff.
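The "BLOBs in their own table" layout can be sketched as follows. This uses SQLite from Python purely as a stand-in for SQL Server, and the table and column names (`document`, `document_image`, `tiff`) are hypothetical; the point is only that listing queries never touch the table that holds the image bytes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Narrow "base" table: only small scalar columns.
    CREATE TABLE document (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        received_at TEXT NOT NULL
    );
    -- Separate table holding the image bytes, one row per document.
    CREATE TABLE document_image (
        document_id INTEGER PRIMARY KEY REFERENCES document(id),
        tiff BLOB NOT NULL
    );
""")
conn.execute(
    "INSERT INTO document (id, title, received_at) VALUES (1, 'scan-0001', '2009-01-01')"
)
conn.execute(
    "INSERT INTO document_image (document_id, tiff) VALUES (1, ?)",
    (b"II*\x00" + b"\x00" * 3000,),  # placeholder bytes, roughly 3 KB
)

# Listing queries read only the base table.
titles = [row[0] for row in conn.execute("SELECT title FROM document ORDER BY id")]

# The BLOB is fetched only when one specific image is needed.
(tiff,) = conn.execute(
    "SELECT tiff FROM document_image WHERE document_id = ?", (1,)
).fetchone()
```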
Marc
The satellite catalog system at INPE/Brazil stores references to TIFF images that are kept in the filesystem. But the images are a little bigger: +/- 100 MB. If a file must be displayed in the browser, the PHP code reads the TIFF content from disk and draws it.

Store images(jpg,gif,png) in filesystem or DB? [duplicate]

Possible Duplicates:
Which is more secure: filesystem or database?
User images - database vs. filesystem storage
store image in database or in a system file ?
I can't decide which one I should follow. Can you guys give some opinions? Should I store my images in the file-system or DB? (I would like to prevent others from stealing my images)
When you answer this question, please include comparisons of security, performance, etc.
Thanks.
Exact Duplicate: User Images: Database or filesystem storage?
Exact Duplicate: Storing images in database: Yea or nay?
Exact Duplicate: Should I store my images in the database or folders?
Exact Duplicate: Would you store binary data in database or folders?
Exact Duplicate: Store pictures as files or in the database for a web app?
Exact Duplicate: Storing a small number of images: blob or fs?
Exact Duplicate: store image in filesystem or database?
Moving your images into a database and writing the code to extract the image may be more hassle than it's worth. It will all go back to the business requirements surrounding the need to protect the images, or the requirement for performance.
I'd suggest sticking to the tried and true system of storing the filepath or directory location in the DB, and keeping the files on disk. Here's why:
A filesystem is easier to maintain, although some thought has to be put into the structure and organization of the images, e.g. a directory for each customer, a subdirectory for each [Attribute-X], and another subfolder for each [Attribute-Y]. Keeping too many images (i.e. hundreds of thousands) in one directory will end up slowing down file access.
If the idea of storing in a DB is a countermeasure against filesystem loss (e.g. a disk goes down, or a directory is deleted by accident), then I'd counter with the suggestion that when you use source control, it's no problem to retrieve any lost, missing, or deleted files.
If you ever need to scale and move to a content distribution scenario, you'd have to move back out to the filesystem or perform a big extract to push out to the providers.
It also goes with the saying: "keep structured data in a database". Microsoft has an article on Managing Unstructured Data.
If security is an issue to be addressed, the filesystem has a whole structure with ACLs. Reinventing the wheel on security may be out of scope in the business requirements.
A large amount of discussion for this topic is also found at:
Question 3748
Question 561447
As for having your images stored as varbinary or a blob of some kind (depending on your platform), I'd suggest it's more hassle than it's worth. The effort that you'll need to expend means more code that you'll have to maintain, unit test, and defend against defects.
If your environment can support SQL Server 2008, you can have the best of both worlds with their new FileStream datatype in SQL 2008.
An MSDN article is touting the FileStream datatype in SQL 2008 as high performance.
SQL Skills has a great article with some SQL 2008 Filestream performance measurements.
Here is an article addressing varbinary vs. FileStream and performance of both datatypes.
If you are a SQL Mag subscriber, you can see a great article at SQL Mag on SQL 2008 FileStream.
Microsoft Research article: To Blob or Not To Blob
I'd love to see studies in real-world scenarios with large user bases like Flickr or Facebook.
Again, it all goes back to your business requirements. Good luck!
It doesn't matter where you store them in terms of preventing "theft". If you deliver the bytestream to a browser, it can be intercepted and saved by anyone. There's no way to prevent that (I'm assuming you're talking about delivering the images to a browser).
If you're just talking about securing images on the machine itself, it also doesn't matter. The operating system can be locked down as securely as the database, preventing anyone from getting at the images.
In terms of performance (when presenting images to a browser), I personally think it'll be faster serving from a filesystem. You have to present the images in separate HTTP transactions anyway, which would almost certainly require multiple trips to the database. I suspect (although I have no hard data) that it would be faster to store the image URLs in the database which point to static images on the file system - then the act of getting an image is a simple file open by the web server rather than running code to access the database.
You're probably going to get a whole ton of "but the filesystem is a DB" answers. This isn't one of them.
The filesystem option depends on many factors; for example, does the server have write permissions to the directory? (And yes, I have seen servers where Apache couldn't write to DocumentRoot.)
If you want 100% cross-compatibility across platforms, then the Database option is the best way to go. It'll also let you store image-related metadata such as a user ID, the date uploaded, and even alternate versions of the same image (such as cached thumbnails).
On the down side, you need to write custom code to read images from the DB and serve them to the user, while any standard web server would just let you send the images as they are.
When it comes to the bottom line, though, you should just choose the option that fits your project, and your server configuration.
Store them in FileSystem, store the file path in the DB.
Of course you can make this scalable and distributed, you just need to keep the images dirs synched between them (for JackM). Or use a shared storage connected to multiple web frontend servers.
Anyway, the stealing part was covered in your other question and is basically impossible. People that can access the images will always be able (with more or less work) to save them locally ... even if it means "print-screen" and paste into photoshop and saving.
It depends on how many images you expect to handle, and what you have to do with them. I have an application that needs to temporarily store between 100K and several million images a day. I write them in 2 GB contiguous blocks to the filesystem, storing the image identifier, filename, beginning position and length in a database.
For retrieval I just keep the indices in memory, the files open read only and seek back and forth to fetch them. I could find no faster way to do it. It is far faster than finding and loading an individual file. Windows can get pretty slow once you've dumped that many individual files into the filesystem.
I suppose it could contribute to security, since the images would be somewhat difficult to retrieve without the index data.
For scalability, it would not take long to put a web service in front of it and distribute it across several machines.
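A minimal sketch of the packed-file idea described above, under a few assumptions: the class name `ImagePack` is made up, an in-memory dict stands in for the database table of (identifier, position, length), and the 2 GB block rollover is left out. It also reopens the file on each read, whereas the approach above keeps the files open read-only.

```python
import os
import tempfile
from pathlib import Path


class ImagePack:
    """Append images to one large file and remember where each one lives."""

    def __init__(self, pack_path: Path) -> None:
        self.pack_path = pack_path
        # Stand-in for the database table: image id -> (offset, length).
        self.index: dict[str, tuple[int, int]] = {}
        pack_path.touch()

    def append(self, image_id: str, data: bytes) -> None:
        with open(self.pack_path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # current end of the pack file
            f.write(data)
        self.index[image_id] = (offset, len(data))

    def read(self, image_id: str) -> bytes:
        offset, length = self.index[image_id]
        with open(self.pack_path, "rb") as f:
            f.seek(offset)
            return f.read(length)


def demo() -> tuple[bytes, bytes]:
    with tempfile.TemporaryDirectory() as tmp:
        pack = ImagePack(Path(tmp) / "images.pack")
        pack.append("a", b"first image")
        pack.append("b", b"second")
        # Reads can happen in any order because each one seeks directly.
        return pack.read("b"), pack.read("a")
```

The trade-off is that individual images can no longer be inspected or deleted with ordinary file tools, which fits the temporary-storage use case described here better than a long-lived image library.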
For a web application I look after, we store the images in the database, but make sure they're well cached in the filesystem.
A request from one of the web server frontends for an image requires a quick memcache check to see if the image has changed in the database and, if not, serves it from the filesystem. If it has changed, it fetches it from the central database and puts a copy in the filesystem.
This gives most of the advantages of storing them in the filesystem while keeping some of the advantages of the database: we only have one location for all the data, which makes backups easier and means we can scale across quite a few machines without issue. It also doesn't put excessive load on the database.
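The database-as-master, filesystem-as-cache arrangement might look roughly like this. It is a sketch under assumptions: a `version` column queried from SQLite stands in for the memcache "has it changed?" check, and the table layout and the name `fetch_image` are invented for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def fetch_image(conn: sqlite3.Connection, cache_dir: Path, image_id: int) -> bytes:
    """Serve from the local file cache unless the database copy is newer."""
    # Cheap check first: only the small version number is read.
    row = conn.execute("SELECT version FROM images WHERE id = ?", (image_id,)).fetchone()
    if row is None:
        raise KeyError(image_id)
    cached = cache_dir / f"{image_id}.v{row[0]}"
    if cached.exists():
        return cached.read_bytes()
    # Cache miss or stale copy: pull the bytes from the database once.
    (data,) = conn.execute("SELECT data FROM images WHERE id = ?", (image_id,)).fetchone()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)
    return data


def demo() -> list[bytes]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images (id INTEGER PRIMARY KEY, version INTEGER NOT NULL, data BLOB NOT NULL)"
    )
    conn.execute("INSERT INTO images VALUES (1, 1, ?)", (b"old",))
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        results = [fetch_image(conn, cache, 1), fetch_image(conn, cache, 1)]
        conn.execute("UPDATE images SET version = 2, data = ? WHERE id = 1", (b"new",))
        results.append(fetch_image(conn, cache, 1))
        return results
```

Old cached versions are never cleaned up in this sketch; a real deployment would need some eviction policy.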
If you want your application to be scalable, do not use a file system on the actual web servers. You can store the location of files in a persistent datastore such as a database or a NoSQL solution.
For an AWS solution to this for example you should:
Store the images on S3
Save the S3 key to the database
Serve your images on S3 through CloudFront (Amazon CDN)
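The three steps above could be wired together roughly as follows. This is a sketch, not a definitive implementation: `upload_and_record` and `cdn_url` are hypothetical helpers, the `images` table is invented, and the S3 client is passed in so the block runs without AWS credentials. With boto3 the client would come from `boto3.client("s3")`, whose `put_object` call accepts the `Bucket`, `Key`, `Body`, and `ContentType` arguments used here.

```python
import sqlite3
from typing import Any
from urllib.parse import quote


def upload_and_record(
    s3_client: Any,
    conn: sqlite3.Connection,
    bucket: str,
    key: str,
    data: bytes,
    content_type: str,
) -> int:
    """Store the bytes in S3 and keep only the object key in the database."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=data, ContentType=content_type)
    cur = conn.execute("INSERT INTO images (s3_key) VALUES (?)", (key,))
    conn.commit()
    return cur.lastrowid


def cdn_url(cdn_domain: str, key: str) -> str:
    """Build the public URL, assuming the CDN distribution fronts the bucket root."""
    return f"https://{cdn_domain}/{quote(key)}"
```

Storing the key rather than a full URL keeps the rows valid if the bucket or CDN domain changes later.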
Saving your files to the DB will provide some security, in that another user would need access to the DB in order to retrieve the files. As far as efficiency goes, however, it means a SQL query for every image loaded, leaving all the load on the server side. Do yourself a favor and find a way to protect your images inside the filesystem; there are many.
The biggest out-of-the-box advantage of a database is that it can be accessed from anywhere on the network, which is essential if you have more than one server.
If you want to access a filesystem from other machines you need to set up some kind of sharing scheme, which could add complexity and could have synchronization issues.
If you do go with storing images in the database, you can still use local (unshared) disks as caches to reduce the strain on the DB. You'd just need to query the DB for a timestamp or something to see if the file is still up-to-date (if you allow files that can change).
If the issue is scalability you'll take a massive loss by moving things into the database. You can round-robin webservers via DNS but adding the overhead of both a CGI process and a database lookup to each image is madness. It also makes your database that much harder to maintain and your images that much harder to process.
As to the other part of your question, you can secure access to a file as easily as a database record, but at the end of the day, as long as there is a URL that returns a file, you have limited options to prevent that URL being used (at least without making cookies and/or JavaScript compulsory).
Store files in a file server, and store primitive data in a database. While file servers (especially HTTP-based) scale well, database servers do not. Don't mix them together.
If you need to edit, manage, or otherwise maintain the images, you should store them outside the database.
Also, the filesystem has many security features that a database does not.
The database is good for storing pointers (file paths) to the actual data.

Saving images: files or blobs?

When you save your images (suppose you have lots of them), do you store them as blobs in your database, or as files? Why?
Duplicate of: Storing Images in DB - Yea or Nay?
I usually go with storing them as files, and store the path in the database. To me, it's a much easier and more natural approach than pushing them into the database as blobs.
One argument for storing them in the database: much easier to do full backups, but that depends on your needs. If you need to be able to easily take full snapshots of your database (including the images), then storing them as blobs in the database is probably the way to go. Otherwise you have to pair your database backup with a file backup, and somehow try to associate the two, so that if you have to do a restore, you know which pair to restore.
It depends on the size of the image.
Microsoft Research has an interesting document on the subject
I've tried to use the db (SQL Server and MySQL) to store medium (< 5 MB) files, and what I got was tons of trouble.
1) Some DBs (SQL Server Express) have size limits;
2) Some DBs (MySQL) become mortally slow;
3) When you have to display a list of objects, if you inadvertently do SELECT * FROM table, tons of data will try to go up and down from the db, resulting in a deadly slow response or memory failure;
4) Some frontends (Ruby ActiveRecord) have very big troubles handling blobs.
Just use files. Don't store them all in the same directory, use some technique to put them on several dirs (for instance, you could use last two chars of a GUID or last two digits of an int id) and then store the path on db.
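One way to spread files across directories using the last two characters of a GUID, as suggested above, is sketched below. The function name `sharded_path` and the root directory are illustrative only. With hexadecimal GUIDs this gives 256 subdirectories.

```python
import uuid
from pathlib import PurePosixPath


def sharded_path(root: str, image_id: uuid.UUID, extension: str) -> PurePosixPath:
    """Place each file in a subdirectory named after the last two hex characters of its id."""
    hex_id = image_id.hex
    return PurePosixPath(root) / hex_id[-2:] / f"{hex_id}.{extension}"


example = sharded_path(
    "/srv/images", uuid.UUID("12345678-1234-5678-1234-5678123456ab"), "jpg"
)
```

The resulting path is what would be stored in the database; the directory still has to be created before the file is written.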
The performance hit of a database server is a moot issue. If you need the performance benefits of a file system, you simply cache it there on the first request. Subsequent requests can then be served directly from the file system by a direct link (which, in case of a web app, you could rewrite the HTML with before flushing the output buffer).
This provides the best of both worlds:
The authoritative store is the database, keeping transactional and referential integrity
You can deploy all user data by simply deploying the database
Emptying this cache (e.g. by adding a web server) would only cause a temporary performance hit while it is refilled automatically.
There is no need to constantly hammer the database for things that won't change all the time, but the important thing is that the user data is all there and not scattered around different places, making multi-server operation and deployment a total mess.
I'm always advocating the "database as the user data store, unless" approach, because it is better architecturally, and not necessarily slower with effective caching.
Having said that, a good reason to use the file system as the authoritative store would be when you really need to use external independent tools for accessing it, e.g. SFTP and whatnot.
Given that you might want to save an image along with a name, brief description, created date, created by, etc., you might find it better to save in a database. That way, everything is together. If you saved this same info and stored the image as a file, you would have to retrieve the whole "image object" from two places...and down the road, you might find yourself having syncing issues (some images not being found). Hopefully this makes sense.
By saving you mean to use them to show in a webpage or something like that?
If that's the case, the better option is to use files. If you use a database, it will be constantly hammered with requests for photos, and that is a situation that doesn't scale too well.
The question is, does your application handle BLOBS or other files like other application data? Do your users upload images alongside other data? If so, then you ought to store the BLOBs in the database. It makes it easier to back up the database and, in the event of a problem, to recover to a transactionally consistent state.
But if you mean images which are part of the application infrastructure rather than user data, then the answer is probably no.
If I'm running on one web server and will only ever be running on one web server, I store them as files. If I'm running across multiple webheads, I put the reference instance of the image in a database BLOB and cache it as a file on the webheads.
Blobs can be heavy on the db/scripts, so why not just store paths? The only reasons we've ever used blobs are when the data needs to be merge replicated or when assets need super tight security (as in, you can't pull an image unless logged in or something).
I would suggest going with file systems. First, let's discuss why not blobs. To answer that, we need to think about what advantages a DB provides over a file system.
Mutability: We can modify the data once stored. This is not applicable to images. Images are just a series of 1s and 0s. Whenever we change an image, it is not a matter of a few 1s and 0s being altered, so modifying the same image content in place doesn't make sense. It's better to delete the old one and store the new one.
Indexing: We can create indexes for faster searching. But this doesn't apply to images, as images are just 1s and 0s and we can't index that.
Then why file systems?
Faster access: If we store images as blobs inside our DB, then a query that fetches the complete record (select *) will perform very poorly, as lots of data will be going to and from the DB. If we instead store just the URL of each image in the DB and store the images in a distributed file system (DFS), it will be much faster.
Size limit: If a DB stores a very large number of images, it might face performance issues and also reach its size limit (a few DBs do have one).
Using a file system is better, as the basic features you would get by storing images as blobs would be:
1. Mutability, which is not needed for an image, as we won't be changing the binary data of images; we will only be removing images as a whole.
2. Indexed searching, which is not needed for images, as the content of an image can't be indexed and indexed searching searches the content of the BLOB.
Using a file system is beneficial here because:
1. It's cheaper.
2. A CDN can be used for fast access.
Hence one way forward could be to store the images as files and store their paths in the database.
