This question already has answers here:
Storing Images in DB - Yea or Nay?
I am building a social media application that uses multiple user-uploaded images. I was told that the best tool for handling user-uploaded images is Cloudinary, but if possible I want to store images directly in my database. I heard that databases scale poorly horizontally, which is why solutions like Cloudinary are pushed. Is it true that storing images in MongoDB is a bad idea? I do not want to use other APIs like Cloudinary.
Usually the image URL is stored in the database, because it always takes less space than the whole image data. For example, WordPress stores images on the server in an uploads folder, and in the database you will only find the URL plus title, type, and some additional data, but not the whole image file.
You don't have to use cloud services to store your images, but they can be faster than loading images from your own server. Saving every image in the database is definitely not a good idea.
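As a rough illustration of the "file on disk, path and metadata in the database" layout described above, here is a minimal Python sketch. It uses SQLite only so that it runs with the standard library; the table name `images`, its columns, and the helper names `save_image` and `load_image` are assumptions made for the example, not anything WordPress or MongoDB prescribes.

```python
import sqlite3
import uuid
from pathlib import Path


def save_image(conn: sqlite3.Connection, upload_dir: Path, data: bytes,
               title: str, mime_type: str) -> int:
    """Write the image bytes to disk and record only the path and metadata."""
    # Hypothetical schema: the database row holds no image bytes at all.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "id INTEGER PRIMARY KEY, path TEXT NOT NULL, "
        "title TEXT, mime_type TEXT, size_bytes INTEGER)"
    )
    upload_dir.mkdir(parents=True, exist_ok=True)
    # A random file name avoids collisions between uploads with the same name.
    path = upload_dir / f"{uuid.uuid4().hex}.bin"
    path.write_bytes(data)
    cur = conn.execute(
        "INSERT INTO images (path, title, mime_type, size_bytes) "
        "VALUES (?, ?, ?, ?)",
        (str(path), title, mime_type, len(data)),
    )
    conn.commit()
    return cur.lastrowid


def load_image(conn: sqlite3.Connection, image_id: int) -> bytes:
    """Look up the stored path, then read the bytes from the file system."""
    row = conn.execute(
        "SELECT path FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        raise KeyError(image_id)
    return Path(row[0]).read_bytes()
```

The same idea should carry over to a document store: the document would keep the path or URL and the metadata, while the bytes live on disk or in object storage.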
Storing images or any other BLOB in the database looks attractive during development, because the data stays organised alongside its associated records and transactions. However, there are caveats to storing a BLOB (binary data, images, ...) in a database:
Non-indexable.
Non-searchable.
Non-compressible.
Not easy to manipulate.
You can still store images this way, but it will perform poorly at scale, and in that case you should avoid storing images in the database.
Storing images in a cloud provider's CDN service then becomes the best choice for performance, and it brings other features such as image optimisation.
I'm developing an application that extracts some information from every image in a dataset of images and stores that data for future use. The problem I have is how to store the data properly. Is it better to create a single annotation file (I use JSON files) for each image in the dataset, or to create one big file that contains all the extracted data?
The kind of information I'm extracting is similar from image to image but not identical. The dataset of images can be huge, > 1 million images.
If relevant, I'm using Python on Linux or MacOS.
I would use a single document (file or in a NoSQL database) per dataset.
If you have > 1 million images, a single file per image means > 1 million files/documents, which will not be easy to manage or manipulate.
A single file/document is much easier to manage and search.
I'd also consider using a NoSQL database to store the JSON documents.
EDIT:
After considering the comments, I'd have to say that you might need to cut off a JSON file at a certain amount of data, resulting in a few files per dataset.
As for files getting corrupted, that's a risk you run with any storage, even database files; that's why we have backups and replicas.
You can always run a NoSQL database locally, but again, this will need some computing resources.
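One way the "few files per dataset" idea could look in Python is sketched below. It swaps the single large JSON document for JSON Lines (one JSON object per line), which is my substitution rather than something the answer specifies, because it allows appending and streaming without parsing a whole file. The file naming scheme and the `per_file` threshold are arbitrary assumptions.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, List


def write_shards(records: Iterable[dict], out_dir: Path,
                 per_file: int = 100_000) -> List[Path]:
    """Write records as JSON Lines, starting a new file every `per_file` records."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    handle = None
    try:
        for count, record in enumerate(records):
            if count % per_file == 0:
                # Close the previous shard (if any) and open the next one.
                if handle is not None:
                    handle.close()
                path = out_dir / f"annotations-{len(paths):05d}.jsonl"
                handle = path.open("w", encoding="utf-8")
                paths.append(path)
            handle.write(json.dumps(record) + "\n")
    finally:
        if handle is not None:
            handle.close()
    return paths


def read_shards(out_dir: Path) -> Iterator[dict]:
    """Stream every record back in shard order, one line at a time."""
    for path in sorted(out_dir.glob("annotations-*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                yield json.loads(line)
```

With one million images and the default threshold this would produce ten files, and a corrupted shard would only affect the records it contains.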
The project is required to store large weather data sets (http://www1.ncdc.noaa.gov/pub/data/igra/)
in the file system with JPA. I mean disk files.
How should we store the data? For example, how should the files be organized so that we can retrieve them later?
I had a quick look at the data and the description of what it contains. I don't think it's practical to keep the data in disk files if you want to extract information from it. You'd probably be better off designing a couple of simple database tables to store the data, and querying that database to get the data for your calculations, maybe with JPA.
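To make "a couple of simple database tables" a little more concrete, here is a small sketch in Python with SQLite, used only for illustration; the same tables could be mapped to JPA entities. The table and column names (`station`, `observation`, `pressure`, `temperature`) are placeholders I made up and are not taken from the IGRA format description, so they would need to be adapted to the real fields.

```python
import sqlite3

# Hypothetical schema: one row per station, one row per observation.
SCHEMA = """
CREATE TABLE IF NOT EXISTS station (
    station_id TEXT PRIMARY KEY,
    name       TEXT
);
CREATE TABLE IF NOT EXISTS observation (
    id          INTEGER PRIMARY KEY,
    station_id  TEXT NOT NULL REFERENCES station(station_id),
    observed_at TEXT NOT NULL,   -- ISO 8601 timestamp, sorts chronologically
    pressure    REAL,
    temperature REAL
);
CREATE INDEX IF NOT EXISTS idx_observation_station_time
    ON observation (station_id, observed_at);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the tables if they do not exist yet."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def mean_temperature(conn: sqlite3.Connection, station_id: str,
                     start: str, end: str):
    """Average temperature for one station in the half-open range [start, end)."""
    row = conn.execute(
        "SELECT AVG(temperature) FROM observation "
        "WHERE station_id = ? AND observed_at >= ? AND observed_at < ?",
        (station_id, start, end),
    ).fetchone()
    return row[0]
```

Once the files are parsed into rows like these, a calculation becomes a query rather than a scan over disk files.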
A library that may help you parse the data files is JFileHelpers; it makes working with fixed-width and delimited files a lot easier.
Hope this helps to get you started.
I have a program that saves unique images, each related to a unique database field. Should I save them in the database as binary fields, or in folders as image files?
It depends on what you need to do with the images, how often you access them and how often they change.
There is no right answer for this one - it really depends on what you are trying to achieve.
I would rather use a separate key/value store if one is available. This can be self-hosted or something like Amazon S3. If not, it is better to save the images as files and keep the metadata and path information in your database.
When you save your images (suppose you have lots of them), do you store them as blobs in your database, or as files? Why?
Duplicate of: Storing Images in DB - Yea or Nay?
I usually go with storing them as files, and store the path in the database. To me, it's a much easier and more natural approach than pushing them into the database as blobs.
One argument for storing them in the database: much easier to do full backups, but that depends on your needs. If you need to be able to easily take full snapshots of your database (including the images), then storing them as blobs in the database is probably the way to go. Otherwise you have to pair your database backup with a file backup, and somehow try to associate the two, so that if you have to do a restore, you know which pair to restore.
It depends on the size of the image.
Microsoft Research has an interesting document on the subject
I've tried to use the db (SQL Server and MySQL) to store medium-sized (< 5 MB) files, and what I got was tons of trouble.
1) Some DBs (SQL Server Express) have size limits;
2) Some DBs (MySQL) become painfully slow;
3) When you have to display a list of objects, if you inadvertently do SELECT * FROM table, tons of data will travel back and forth from the db, resulting in a deadly slow response or an out-of-memory failure;
4) Some frontends (Ruby ActiveRecord) have big trouble handling blobs.
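Point 3 can be softened, if blobs do end up in the database, by never selecting the blob column in list queries. The sketch below shows the idea in Python with SQLite; the `images` table, its columns, and the function names are assumptions for the example, and it does not address the other three points.

```python
import sqlite3


def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical table: small metadata columns next to one large BLOB column.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "size_bytes INTEGER NOT NULL, data BLOB NOT NULL)"
    )


def add_image(conn: sqlite3.Connection, name: str, data: bytes) -> int:
    cur = conn.execute(
        "INSERT INTO images (name, size_bytes, data) VALUES (?, ?, ?)",
        (name, len(data), data),
    )
    return cur.lastrowid


def list_images(conn: sqlite3.Connection) -> list:
    """Rows for a listing page: the BLOB column is not selected, so its
    bytes are not returned to the application."""
    return conn.execute(
        "SELECT id, name, size_bytes FROM images ORDER BY id"
    ).fetchall()


def fetch_image(conn: sqlite3.Connection, image_id: int) -> bytes:
    """Load the bytes of a single image only when it is actually needed."""
    row = conn.execute(
        "SELECT data FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        raise KeyError(image_id)
    return row[0]
```

Naming the columns explicitly is what matters here; an ORM that expands to SELECT * by default would need the blob column excluded or moved to its own table.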
Just use files. Don't store them all in the same directory; use some technique to spread them over several dirs (for instance, you could use the last two chars of a GUID or the last two digits of an int id) and then store the path in the db.
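The directory-spreading trick can be sketched in a few lines of Python. The function names and the zero-padding of very short ids are my own choices for the example.

```python
from pathlib import Path


def sharded_path(root: Path, file_id: str, suffix: str = "") -> Path:
    """Build root/<last two chars of the id>/<id><suffix>."""
    # Ids shorter than two characters are left-padded so that every
    # directory name has the same length (e.g. "7" goes into "07").
    shard = file_id[-2:].lower().rjust(2, "0")
    return root / shard / f"{file_id}{suffix}"


def store_file(root: Path, file_id: str, data: bytes, suffix: str = "") -> Path:
    """Write the bytes under the sharded path and return it for storing in the db."""
    path = sharded_path(root, file_id, suffix)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path
```

For an integer id, pass `str(image_id)`: two decimal digits give at most 100 directories, while two hex characters from a GUID give at most 256.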
The performance hit of a database server is a moot issue. If you need the performance benefits of a file system, you simply cache the file there on the first request. Subsequent requests can then be served straight from the file system through a direct link (in a web app, you could rewrite the HTML to use that link before flushing the output buffer).
This provides the best of both worlds:
The authoritative store is the database, keeping transactional and referential integrity
You can deploy all user data by simply deploying the database
Emptying this cache (e.g. by adding a web server) would only cause a temporary performance hit while it is refilled automatically.
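A minimal sketch of this cache-on-first-request idea in Python follows, assuming a table `images(id, data)` holds the authoritative copy and that only one process fills a given cache entry at a time. The table, the cache file naming, and the function name are illustrative assumptions.

```python
import sqlite3
from pathlib import Path


def cached_image_path(conn: sqlite3.Connection, cache_dir: Path,
                      image_id: int) -> Path:
    """Return a file path for the image, filling the cache from the database on a miss."""
    path = cache_dir / f"{image_id}.bin"
    if path.exists():
        return path  # cache hit: the database is not touched
    row = conn.execute(
        "SELECT data FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        raise KeyError(image_id)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name and rename it afterwards, so that a reader
    # does not pick up a partially written cache file.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(row[0])
    tmp.replace(path)
    return path
```

Deleting the cache directory is harmless in this scheme, since the next request for each image repopulates it. Invalidation when an image changes is left out of the sketch.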
There is no need to constantly hammer the database for things that won't change all the time, but the important thing is that the user data is all there and not scattered around different places, making multi-server operation and deployment a total mess.
I'm always advocating the "database as the user data store, unless" approach, because it is better architecturally, and not necessarily slower with effective caching.
Having said that, a good reason to use the file system as the authoritative store would be when you really need to use external independent tools for accessing it, e.g. SFTP and whatnot.
Given that you might want to save an image along with a name, brief description, created date, created by, etc., you might find it better to save in a database. That way, everything is together. If you saved this same info and stored the image as a file, you would have to retrieve the whole "image object" from two places...and down the road, you might find yourself having syncing issues (some images not being found). Hopefully this makes sense.
By saving, do you mean storing them so they can be shown on a web page or something like that?
If that's the case, the better option is to use files. If you use a database, it will be constantly hammered with requests for photos, and that is a situation that doesn't scale too well.
The question is, does your application handle BLOBS or other files like other application data? Do your users upload images alongside other data? If so, then you ought to store the BLOBs in the database. It makes it easier to back up the database and, in the event of a problem, to recover to a transactionally consistent state.
But if you mean images which are part of the application infrastructure rather than user data, then the answer is probably no.
If I'm running on one web server and will only ever be running on one web server, I store them as files. If I'm running across multiple webheads, I put the reference instance of the image in a database BLOB and cache it as a file on the webheads.
Blobs can be heavy on the db/scripts, so why not just store paths? The only reasons we've ever used blobs are when the data needs to be merge replicated or when assets need super tight security (as in, you can't pull an image unless you are logged in or something).
I would suggest going with the file system. First, let's discuss why not a blob. To answer that, we need to think about what advantages a DB gives us over a file system.
Mutability: We can modify the data once it is stored. This is not applicable to images. Images are just a series of 1s and 0s. Whenever we change an image, it is not a matter of altering a few 1s and 0s, so modifying the stored image content in place doesn't make sense. It's better to delete the old one and store the new one.
Indexing: We can create indexes for faster searching. But this doesn't apply to images, as images are just 1s and 0s and we can't index that.
Then why file systems?
Faster access: If we store images as blobs inside our DB, then a query that fetches the complete record (select *) will perform very poorly, because lots and lots of data will be going to and from the DB. If we instead store just the URL of each image in the DB and store the images in a distributed file system (DFS), it will be much faster.
Size limit: If the DB stores a large number of images, it might face performance issues and also reach its size limit (a few DBs do have one).
Using the file system is better, because the basic features you would gain by storing images as blobs would be:
1. Mutability, which is not needed for an image, as we won't be changing the binary data of images; we will only be removing images as a whole.
2. Indexed searching, which is not needed for an image, as the content of images can't be indexed and indexed searching searches the content of the BLOB.
Using the file system is beneficial here because:
1. It's cheaper.
2. You can use a CDN for fast access.
Hence one way forward could be to store each image as a file and keep its path in the database.