I am taking the GCP fundamentals course, and I am confused about the difference between Storage and Databases in cloud services. Since data can also be considered a kind of file, can data also be stored in Storage?
Could you help explain the difference between these two concepts with an example? Thank you so much.
Storage is for file storage, such as images and PDFs.
A database is essentially storage too, but it holds data records that can be queried using a query language.
So in storage you keep files such as images, while in a database you keep values such as users, usernames, and passwords. You can then store the link used to access each file in the database. Imagine a database with users: you would have the username and password as text, plus another column for the image. The image itself is stored in storage; in the database you just keep the link as the image value, which points you to the location of the stored image.
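A minimal sketch of that pattern, with a plain dict standing in for an object-storage bucket and SQLite standing in for the database (the bucket name and URL are made up for illustration):

```python
import sqlite3
import hashlib

# Stand-in for an object-storage bucket (e.g. Cloud Storage or S3):
# a flat mapping from object key to raw bytes.
bucket = {}

def upload(key, data):
    """'Upload' a file to the bucket and return its access link."""
    bucket[key] = data
    return f"https://storage.example.com/my-bucket/{key}"

# The database stores structured records, not the file bytes themselves.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT PRIMARY KEY,"
           " password_hash TEXT, avatar_url TEXT)")

# Store the image in object storage, and only its link in the database.
link = upload("avatars/alice.png", b"\x89PNG...raw image bytes...")
pw_hash = hashlib.sha256(b"s3cret").hexdigest()
db.execute("INSERT INTO users VALUES (?, ?, ?)", ("alice", pw_hash, link))

# A query returns the record; the link tells us where the image lives.
row = db.execute("SELECT avatar_url FROM users WHERE username = ?",
                 ("alice",)).fetchone()
print(row[0])  # the object-storage link, not the image itself
```

The point is the division of labour: the bucket holds the bytes, the database holds the queryable record that references them.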
What should the ideal database schema be if you are building an OAuth provider and need to store session timings?
https://auth0.com/blog/how-we-store-data-in-the-cloud-at-auth0/ Auth0 uses MongoDB.
I assume it has everything to do with the "more reads, fewer writes" pattern that characterizes NoSQL, plus many Mongo-specific features like replication, which help preserve the integrity of user data.
For logging specifically, Microsoft recommends Azure Blob Storage (ABS) for .NET, alongside providers like Serilog.
NoSQL databases like MongoDB don't enforce a rigid schema, so logs could be stored as unstructured data; the modern approach, though, is object storage, which most cloud providers offer. Unstructured data simply means storing data without a schema, relations, or a fixed structure. Object storage stores the same things in a flat namespace, without folders or subdirectories, and attaches metadata to each object, which makes it a better fit for logs such as session timings.
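A minimal in-memory sketch of the object-storage model described above: a flat key space (a "/" in a key is just part of the name, not a directory) plus user-defined metadata on each object. The class and keys are hypothetical, for illustration only:

```python
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)

# One flat namespace: key -> object. No folders, no subdirectories.
store = {}

def put_object(key, data, metadata=None):
    store[key] = StoredObject(data, metadata or {})

# A session log stored as an unstructured blob, with queryable
# metadata attached to the object itself.
put_object(
    "logs/session-42.json",
    b'{"events": []}',
    metadata={"user": "alice",
              "session-start": "2024-01-01T10:00:00Z",
              "session-end": "2024-01-01T10:45:00Z"},
)

# Metadata can be inspected without touching the object body.
meta = store["logs/session-42.json"].metadata
print(meta["session-end"])
```

Real object stores (S3, Azure Blob Storage, GCS) expose the same shape: put/get by key, with a small metadata map per object.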
I have an application with a monolithic architecture and PostgreSQL as the main storage. There are two Docker images, one for the database and one for the application server. There is a high probability that the application will be split into a few services in the near future, evolving into a microservice architecture. There is also a high probability that the solution will become part of a private cloud. Currently, there is a requirement to read/store different kinds of files through the application, like PDF, JPG, DOCX, etc. So I am at a crossroads over which file storage would be better to choose in the current situation.
I see a few options at the moment:
Object Storage Server (For instance: MinIO which is compatible with Amazon S3 cloud storage service)
PostgreSQL (To store files as BLOB)
File System (To store files on host machine of docker containers)
I have read multiple posts comparing the database solution with the file system, but I cannot find any comparison that also takes an object storage server into account.
https://dba.stackexchange.com/questions/2445/should-binary-files-be-stored-in-the-database
What is difference between storing data in a blob, vs. storing a pointer to a file?
Please advise which option would be good to choose, or point me to a comparison post where this has already been asked.
The future direction you mentioned will benefit from having storage as a service, where multiple containers can access the same files. It will also give you flexibility if you need write/update operations in the future.
Some points for trade-off:
If you go with the database, you will have to write that file service yourself, and it will be a custom one rather than a widely used interface like S3. If instead you allow direct SQL access to the database for the files, the lack of encapsulation makes your solution brittle. BLOB storage in a database does work (you get ACID operations), but I have seen database storage management become a hassle for DBAs.
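For reference, the database route is mechanically simple: files go in as BLOBs and come back byte-for-byte, inside ordinary transactions. A sketch with SQLite for brevity; in PostgreSQL you would use a `bytea` column the same way:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE documents (name TEXT PRIMARY KEY, content BLOB)")

pdf_bytes = b"%PDF-1.7 ... raw file contents ..."

# The write is transactional: it either fully commits or not at all (ACID).
with db:
    db.execute("INSERT INTO documents VALUES (?, ?)",
               ("report.pdf", pdf_bytes))

# Reading returns the exact bytes that were stored.
(content,) = db.execute("SELECT content FROM documents WHERE name = ?",
                        ("report.pdf",)).fetchone()
assert content == pdf_bytes
```

The trade-off in the answer above is not whether this works, but the operational cost: backups, replication, and vacuuming all grow with the BLOBs.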
How big is too big for data to go into a database?
If optimised, database access can be faster than simple file system access.
But assuming the server is running on either:
a conventional budget home server
AWS
Is there any reason to use a database for storing things larger than short articles?
The power of X-Sendfile made me decide to move some data to the filesystem, but how far should I take this?
The only data I wouldn't store in a database are files whose content I do not need to search, like images and videos. Any other data, regardless of size, goes into a database.
If I have a JSON file, it goes into a NoSQL database that can search and index JSON. If I have gigabytes of data on anything at all, it still goes into a database.
Databases have much better mechanisms than anything anyone can rig for files in a filesystem.
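As a small illustration of the JSON point: a database that understands JSON can query inside documents, which a bare file on disk cannot do without you scanning and parsing every file yourself. SQLite's built-in JSON functions stand in here for a document store like MongoDB:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
db.executemany("INSERT INTO docs (body) VALUES (?)", [
    ('{"user": "alice", "plan": "pro"}',),
    ('{"user": "bob", "plan": "free"}',),
])

# Filter by a field *inside* the JSON document.
rows = db.execute(
    "SELECT json_extract(body, '$.user') FROM docs "
    "WHERE json_extract(body, '$.plan') = 'pro'"
).fetchall()
print(rows)  # [('alice',)]
```

A document database goes further by letting you index such fields, so the query does not degrade into a full scan as the data grows.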
Your question lacks context for me to give a better answer.
I need to store a large number of images in the cloud (Amazon EC2). They are currently stored on NFS (as a prototype). My questions are:
Is it better to store them in a database (e.g. NoSQL), or is NFS a good option? (Is it easily scalable?)
I need to query these images based on their metadata and make them accessible to users based on the query results. Can you compare a database and NFS in terms of accessibility and performance?
Is there an appropriate database for this purpose?
You probably want to store your images in Amazon S3 if you want them accessible in the cloud. Databases, whether SQL or NoSQL, are generally not a common option for storing images.
SQL and NoSQL databases are generally used to store data or "metadata", so you could keep your images (JPG, GIF, TIFF, bitmap) on Amazon S3 and put the metadata that points to each image file in a SQL or NoSQL database. As another option, you could also store your metadata in files on Amazon S3, if all of it already lives in files.
NFS across servers on a shared LAN has decent performance, but it really depends on how much time and money you want to spend making your storage system reliable, scalable, and so on (and on whether you want to convert it into some sort of object-storage mechanism like Amazon S3). Why reinvent the wheel initially when Amazon S3 provides that for you? As your data grows, you can experiment with a custom storage solution.
Hope this helps.
I had to do exactly this for a job a year ago, and we decided to store the images in S3 and their metadata (including the S3 link to each image) in a datastore like DynamoDB (if you won't query on arbitrary metadata) or SimpleDB (if you want to query on any metadata field).
An S3 bucket's size is theoretically unlimited, so you will never run out of space. But by keeping the metadata in a faster data store, you can write more expressive queries, get better performance, and limit the cost of your S3 downloads.
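A sketch of that pattern, with a dict standing in for the S3 bucket and SQLite standing in for the metadata store (keys and fields are made up for illustration): query the cheap metadata store first, then fetch only the matching objects from S3.

```python
import sqlite3

# Stand-in for the S3 bucket: key -> image bytes.
bucket = {
    "img/001.jpg": b"jpeg-bytes-1",
    "img/002.jpg": b"jpeg-bytes-2",
}

# Fast metadata store mapping queryable fields to each object's S3 key.
meta = sqlite3.connect(":memory:")
meta.execute("CREATE TABLE images (s3_key TEXT PRIMARY KEY,"
             " camera TEXT, taken TEXT)")
meta.executemany("INSERT INTO images VALUES (?, ?, ?)", [
    ("img/001.jpg", "drone", "2024-05-01"),
    ("img/002.jpg", "handheld", "2024-05-02"),
])

# Query metadata first; download from S3 only the matches,
# which keeps transfer costs down.
keys = [k for (k,) in meta.execute(
    "SELECT s3_key FROM images WHERE camera = ?", ("drone",))]
images = [bucket[k] for k in keys]
print(keys)  # ['img/001.jpg']
```

In production the dict becomes real S3 calls and the SQLite table becomes DynamoDB or SimpleDB, but the division of the query path is the same.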
Background:
I am new to cloud computing and large-scale DB design. I have to find a storage facility for a large number of images, each with a lot of associated metadata. I am going to use Amazon S3 to store the image files, and I need a cloud-based database solution to hold the metadata and a reference to each image. I need this so I can query the DB for customer requests and pull images with their metadata, as well as insert new data, via web and mobile application interfaces I will create.
Research done:
I found that S3 is a raw data storage solution. There are many good discussions here on bucket naming conventions, and I see many people use S3 for binary storage with a DB for the metadata. I've done some research on MongoDB, DynamoDB, and other database solutions.
Question:
I'm looking for direction toward an inexpensive and reliable database that works well with Amazon S3 and is suited to storing a large amount of metadata.
Well, if you are not looking for a relational DB, why not try http://aws.amazon.com/simpledb/
And if you want an RDBMS, how about http://aws.amazon.com/rds/