Why not store files in the database?

In an app that will be deployed on Heroku, I need to allow users to upload thumbnail images.
A Heroku-deployed app, of course, has no persistent local file storage.
From googling around, the typical approach seems to be storing the files in Amazon S3, or possibly other AWS-hosted cloud storage.
But what if I just stick the images in a Postgres blob column?
What are the downsides of doing this? The upsides are that I don't have to pay for other storage, and I don't have to deal with an additional external system with more opportunities for bugs and outages. But there must be some good reasons nobody seems to do this. What are they?

A database and S3 are two different storage mechanisms. How are they different?
About S3
Amazon S3 is a highly specialized file storage system. It limits what you can do to basic read/write/delete operations on whole files and optimizes caching for serving entire stored files.
About Postgres
Postgres is a SQL relational database with massive flexibility for storing and indexing data for a variety of operations. You can certainly store binary image data in a row in Postgres (for example, in a bytea column).
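To make the Postgres option concrete, here is a minimal sketch of a thumbnail table with a binary column. It uses Python's built-in sqlite3 purely as a stand-in so it runs without a server; the table and function names are hypothetical. In Postgres the column would be bytea and you would use a driver such as psycopg2.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    # sqlite3 is only a stand-in so the sketch runs without a server.
    # In Postgres the "data" column would be declared as bytea.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS thumbnails ("
        " id INTEGER PRIMARY KEY,"
        " content_type TEXT NOT NULL,"
        " data BLOB NOT NULL)"
    )
    return conn


def save_thumbnail(conn: sqlite3.Connection, data: bytes, content_type: str) -> int:
    # Every upload pushes the full image bytes through the database connection.
    cur = conn.execute(
        "INSERT INTO thumbnails (content_type, data) VALUES (?, ?)",
        (content_type, data),
    )
    conn.commit()
    return cur.lastrowid


def load_thumbnail(conn: sqlite3.Connection, thumb_id: int):
    # Every page view that shows the image costs a database query, too.
    row = conn.execute(
        "SELECT content_type, data FROM thumbnails WHERE id = ?", (thumb_id,)
    ).fetchone()
    if row is None:
        return None
    return row[0], bytes(row[1])
```

With psycopg2 the placeholders would be %s instead of ?, and bytea values come back as memoryview objects, which is why the sketch converts with bytes() before returning.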
Comparing Cost/Scalability
Why would you choose S3 over Postgres? Cost and scalability. Postgres is an expensive, highly skilled generalist. On Heroku, running a Postgres database could cost you hundreds or thousands of dollars per month, depending on the amount of data and the scale of traffic.
Amazon S3 is an inexpensive and highly scalable solution that perfectly matches your needs (write an image, serve up the image in the context of a web page).
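The "write an image, serve up the image" flow can be sketched as follows, assuming a boto3 S3 client is passed in. The key layout, bucket name, and function names are hypothetical; only the small key string would be stored in Postgres.

```python
import mimetypes
import os
import uuid


def thumbnail_key(user_id: int, filename: str) -> str:
    # Hypothetical key layout: one prefix per user plus a random name,
    # keeping the (lower-cased) extension of the uploaded file.
    ext = os.path.splitext(filename)[1].lower()
    return f"thumbnails/{user_id}/{uuid.uuid4().hex}{ext}"


def upload_thumbnail(s3_client, bucket: str, user_id: int, filename: str, data: bytes) -> str:
    """Write the image to S3 and return its key (store the key in Postgres)."""
    key = thumbnail_key(user_id, filename)
    content_type = mimetypes.guess_type(key)[0] or "application/octet-stream"
    # ContentType lets browsers render the object directly when it is served.
    s3_client.put_object(Bucket=bucket, Key=key, Body=data, ContentType=content_type)
    return key


def thumbnail_url(s3_client, bucket: str, key: str, expires_in: int = 3600) -> str:
    # A time-limited URL the web page can use in an <img> tag without
    # routing the image bytes through the Heroku dyno.
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


# Usage (needs boto3 and AWS credentials; bucket name is hypothetical):
#   import boto3
#   s3 = boto3.client("s3")
#   key = upload_thumbnail(s3, "my-app-thumbnails", 42, "avatar.png", image_bytes)
#   url = thumbnail_url(s3, "my-app-thumbnails", key)
```

Passing the client in keeps the functions testable with a fake object; presigned URLs are one option, and a public-read bucket behind a CDN is another.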
TL;DR: Amazon S3 is highly optimized for files like images, highly scalable, and relatively inexpensive.

Related

Storage for pdf, docx, jpg

I have an application with a monolithic architecture and PostgreSQL as the main storage. There are two Docker images, one for the database and one for the application server. There is a high probability that the application will be split into a few services in the near future and evolve into a microservice architecture. There is also a high probability that the solution will be part of a private cloud. Currently, there are requirements to read and store different files through the application, such as PDF, JPG, DOCX, etc., and I am at a crossroads about what to choose as file storage in the current situation.
I see a few options at the moment:
Object Storage Server (For instance: MinIO which is compatible with Amazon S3 cloud storage service)
PostgreSQL (To store files as BLOB)
File System (To store files on the host machine of the Docker containers)
I have read multiple posts comparing a database solution with the file system, but I could not find any comparison that took an object storage server into account.
https://dba.stackexchange.com/questions/2445/should-binary-files-be-stored-in-the-database
What is difference between storing data in a blob, vs. storing a pointer to a file?
Please advise which option would be a good choice, or point me to a post where this comparison has already been made.
The future direction you mentioned will benefit from having storage as a service, where multiple containers can access the same files. It will also give you flexibility if you need write/update operations in the future.
Some points for trade-off:
If you go with the database, you will have to write that service yourself (1), and it will be a custom one rather than a widely used one like S3 (2). On the other hand, if you allow direct SQL access to the database for the files, your solution becomes brittle because of the lack of encapsulation (3). Blob storage in the database works (you get ACID operations), but I have seen database storage management become a hassle for DBAs (4).
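The encapsulation point (3) can be sketched as a small storage interface that the rest of the application depends on, so a MinIO/S3 or PostgreSQL backend could later implement the same methods without callers changing. This is a minimal sketch; the class and method names are hypothetical, and only a file-system backend is shown.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class FileStore(ABC):
    """Hypothetical storage-service interface: callers never see the backend."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class LocalFileStore(FileStore):
    """File-system backend, e.g. a volume mounted into the Docker container."""

    def __init__(self, root: str):
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        path = (self.root / key).resolve()
        # Reject keys such as "../etc/passwd" that would escape the root directory.
        if self.root not in path.parents:
            raise ValueError(f"invalid key: {key!r}")
        return path

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def delete(self, key: str) -> None:
        self._path(key).unlink(missing_ok=True)
```

An S3- or MinIO-backed class could implement the same three methods with a boto3 client's put_object, get_object, and delete_object calls, pointed at the MinIO endpoint when running in a private cloud.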

What is a cost-effective solution to store PDF files in the cloud when I am using AWS infrastructure?

I am specifically looking to host it on AWS infrastructure. I have already been using AWS S3 and Postgres. Will either of these serve the purpose without costing too much for storage?
Much depends on the total size of the PDF files. If it is minuscule compared to your total database size, go for the database; if it is sizable compared to the database size, opt for AWS S3. In that scenario, you can evaluate and set an object lifecycle policy based on usage. You can also choose an appropriate storage class, such as Standard or Standard-IA, or use Intelligent-Tiering to further reduce cost.
Adding another AWS product will add another cost, so evaluate whether it can be done with Postgres alone.
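A lifecycle policy like the one mentioned above can be sketched as a configuration dict in the shape that boto3's put_bucket_lifecycle_configuration accepts. The prefix, rule ID, and day count are placeholders, not recommendations.

```python
# Minimum object age S3 accepts before a transition to STANDARD_IA.
MIN_DAYS_BEFORE_IA = 30


def pdf_lifecycle_config(prefix: str = "pdfs/", ia_after_days: int = 30) -> dict:
    """Lifecycle rule moving objects under `prefix` to Standard-IA after N days.

    The prefix and day count are placeholders; pick them from your access pattern.
    """
    if ia_after_days < MIN_DAYS_BEFORE_IA:
        raise ValueError(
            "S3 requires objects to be at least 30 days old "
            "before transitioning to STANDARD_IA"
        )
    return {
        "Rules": [
            {
                "ID": "pdf-to-standard-ia",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"}
                ],
            }
        ]
    }


# Applying it (needs boto3 and credentials; bucket name is hypothetical):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-pdf-bucket",
#       LifecycleConfiguration=pdf_lifecycle_config(),
#   )
# Alternatively, upload with StorageClass="INTELLIGENT_TIERING" in put_object
# and let S3 move objects between access tiers automatically.
```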
Hope it helps.
S3 will always be a cheaper option than a database. To answer your question, continue dumping the PDF files in S3.

How to store high volume images on cloud?

I need to store a large number of images in the cloud (Amazon EC2). They are already stored on NFS (as a prototype). However, my questions are:
Is it better to store them in a database (e.g. NoSQL), or is NFS a good option? (Is it easily scalable?)
I need to query these images based on their metadata and make them accessible for users based on query results. Can you compare db and NFS based on accessibility and performance?
Is there any appropriate db for this purpose?
You probably want to store your images in Amazon S3 if you want them accessible in the cloud. Databases, either SQL or NoSQL, are generally not a common choice for storing images.
SQL or NoSQL databases are generally used to store data or "metadata", so you could store your images (JPG, GIF, TIFF, bitmap) on Amazon S3 and the metadata that points to each image file in a SQL or NoSQL database. As another option, you could also store your metadata in files on Amazon S3 if all your metadata is already in files.
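The "images in S3, metadata plus pointer in a database" split can be sketched like this, using Python's sqlite3 as a stand-in for whichever SQL store you pick. The metadata columns are hypothetical; the point is that queries touch only small rows and return S3 keys, not image bytes.

```python
import sqlite3


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " s3_key TEXT PRIMARY KEY,"  # pointer to the object in the S3 bucket
        " camera TEXT,"              # hypothetical metadata fields
        " taken_on TEXT,"            # ISO date string, e.g. '2014-05-01'
        " width INTEGER,"
        " height INTEGER)"
    )
    # Index the fields you filter on so metadata queries stay fast.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_images_taken_on ON images(taken_on)")
    return conn


def add_image(conn, s3_key: str, camera: str, taken_on: str, width: int, height: int) -> None:
    conn.execute(
        "INSERT INTO images VALUES (?, ?, ?, ?, ?)",
        (s3_key, camera, taken_on, width, height),
    )
    conn.commit()


def find_keys(conn, camera: str, since: str) -> list:
    # Only metadata rows are scanned; the image bytes stay in S3.
    rows = conn.execute(
        "SELECT s3_key FROM images WHERE camera = ? AND taken_on >= ? ORDER BY taken_on",
        (camera, since),
    )
    return [r[0] for r in rows]
```

The returned keys can then be turned into URLs or fetched from S3 only for the images the user actually requests.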
NFS across servers in a shared LAN has decent performance, but it really depends on how much time and money you want to spend on making your storage system reliable, scalable, etc. (and on whether you want to convert it into some sort of object storage mechanism like Amazon S3). Why reinvent the wheel initially when Amazon S3 provides that for you? As your data grows, you can experiment with a custom storage solution.
Hope this helps.
I had to do exactly this for a job a year ago, and we decided to store the images in S3 and then store their metadata (including the S3 link to the image) in a datastore like DynamoDB (if you won't query on arbitrary metadata) or SimpleDB (if you want to query on any metadata field).
An S3 bucket's size is theoretically unlimited, so you will never run out of space. But by storing the metadata in a faster data store, you can write more expressive queries, get better performance, and limit the costs of your S3 downloads.

Using Amazon S3 for image storage. Where should I store my Metadata?

Background:
I am new to cloud computing and large-scale DB design. I have to find a storage facility for a large number of images that have a lot of metadata associated with each image. I am going to use Amazon S3 to store my image files, and I need a cloud-based database solution to store the metadata and a reference to each image. I need this so I can query a DB for customer requests, pull images and their metadata, and insert new data via web and mobile application interfaces I will create.
Research done:
I found that S3 is a raw data storage solution. I found many good discussions here on bucket naming conventions, and I see many people use S3 as binary storage and a DB for metadata. I've done some research on MongoDB, DynamoDB, and other database solutions.
Question:
I need direction on where I can find an inexpensive and reliable database that works well with Amazon S3 and is suited to storing a large amount of metadata.
Well, if you are not looking for a relational DB, why not try http://aws.amazon.com/simpledb/
and if you want an RDBMS, how about http://aws.amazon.com/rds/

Storing Images: DB or File System

I've read some posts on this, but I still don't understand what the best solution is in my case.
I'm starting to write a new web app, and the backend is going to provide about 1-10 million images (average size 200-500 kB per image).
My site will provide content and images to 100-1000 users at the same time.
I'd also like to keep provider costs as low as possible (but this is a secondary requirement).
I think file system space is less expensive than database storage.
Personally I like the idea of having all my images in the DB but any suggestion will be really appreciated :)
Do you think that in my case the DB approach is the right choice?
Putting all of those images in your database will make it very, very large. This means your DB engine will be busy caching all those images (a task it's not really designed for) when it could be caching hot application data instead.
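To put "very, very large" in numbers using the question's own figures (decimal units, 1 kB = 1000 bytes), the raw image payload alone would be roughly:

```latex
\begin{aligned}
\text{lower bound} &= 10^{6} \times 200\,\text{kB} = 2 \times 10^{8}\,\text{kB} = 200\,\text{GB},\\
\text{upper bound} &= 10^{7} \times 500\,\text{kB} = 5 \times 10^{9}\,\text{kB} = 5\,\text{TB}.
\end{aligned}
```

Every full database backup would have to copy that payload as well.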
Leave the file caching up to the OS and/or your reverse proxy - they'll be better at it.
Some other reasons to store images on the file system:
Image servers can run even when the database is busy or down.
File systems are made to store files and are quite efficient at it.
Dumping data in your database means slower backups and other operations.
No server-side code needed to serve up an image, just plain old IIS/Apache.
You can scale up faster with dirt-cheap web servers, or potentially to a CDN.
You can perform related work (generating thumbnails, etc.) without involving the database.
Your database server can keep more of the "real" table data in memory, which is where you get your database speed for queries. If it uses its precious memory to keep image files cached, that buys you hardly anything speed-wise compared with having more of the photo index in memory.
Most large sites use the filesystem.
See Store pictures as files or in the database for a web app?
When dealing with binary objects, follow a document-centric approach to architecture and don't store documents like PDFs and images in the database; you will eventually have to refactor them out when you start seeing all kinds of performance issues with your database. Just store the file on the file system and keep its path in a table in your database. There is also a physical limit on the size of the data type you would use to serialize and save the file in the database. Just store it on the file system and access it there.
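A minimal sketch of "store the file on the file system and keep its path in a table", assuming Python's sqlite3 as the database and a hypothetical documents table. The SHA-256-based directory layout is an addition, not part of the original answer: it is one common way to avoid putting millions of files into a single directory.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents (name TEXT PRIMARY KEY, path TEXT NOT NULL)"
    )
    return conn


def store_file(root: Path, conn: sqlite3.Connection, name: str, data: bytes) -> str:
    # Content-addressed file name: identical uploads share one file on disk.
    digest = hashlib.sha256(data).hexdigest()
    # Two levels of 2-hex-character directories (256 * 256 buckets) keep any
    # single directory from holding millions of entries.
    rel_path = Path(digest[:2]) / digest[2:4] / digest
    full = root / rel_path
    full.parent.mkdir(parents=True, exist_ok=True)
    if not full.exists():
        full.write_bytes(data)
    # Only the relative path goes into the database, never the bytes.
    conn.execute(
        "INSERT INTO documents (name, path) VALUES (?, ?)", (name, rel_path.as_posix())
    )
    conn.commit()
    return rel_path.as_posix()


def read_file(root: Path, conn: sqlite3.Connection, name: str) -> bytes:
    row = conn.execute("SELECT path FROM documents WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise KeyError(name)
    return (root / row[0]).read_bytes()
```

Storing a relative path keeps the table valid if the storage root is later moved or mounted elsewhere.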
Your first sentence says that you've read some posts on the subject, so I won't bother putting in links to articles that cover this. In my experience, and based on what you've posted as far as the number of images and sizes of the images, you're going to pay dearly in DB performance if you store them in the DB. I'd store them on the file system.
What database are you using? MS SQL Server 2008 provides FILESTREAM storage, which allows storage of and efficient access to BLOB data using a combination of SQL Server 2008 and the NTFS file system. The linked details cover choices for BLOB storage, configuring Windows and SQL Server for using FILESTREAM data, considerations for combining FILESTREAM with other features, and implementation details such as partitioning and performance.
We use FileNet, a server optimized for imaging. It's very expensive. A cheaper solution is to use a file server.
Please don't consider storing large files on a database server.
As others have mentioned, store references to the large files in the database.
