Hyperspectral image storage - database

I would like to save hyperspectral images using Python, but I don't know where to persist the data. I have thought about HDFS. I need to do it on my local server, without using cloud providers.
Is there a way to make this easy, and do you recommend any particular database?

HDFS generally requires you to build and administer your own Hadoop cluster, which is a lot of operational overhead just for storing images.
Object storage is a better fit. Since you need to stay on your own server, a self-hosted S3-compatible store such as MinIO works; if managed cloud storage such as AWS S3 or Google Cloud Storage ever becomes an option, it offers the same API plus the following:
Relatively cheap
Fully managed
No restriction on file size or number of files
Easy Python APIs
Durable - data can be replicated across multiple regions (all handled automatically for you), so you don't need to worry about losing anything if a server dies.
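As a sketch of what the Python side can look like, here is a minimal example using boto3 against an S3-compatible endpoint. The endpoint URL, bucket name, and credentials are placeholder assumptions; pointing endpoint_url at a self-hosted MinIO server keeps everything local, and omitting it targets AWS S3.

    # Minimal sketch: persist a hyperspectral cube as a .npy object in an
    # S3-compatible store. Endpoint, bucket, and credentials are placeholders.
    import io

    import boto3
    import numpy as np

    s3 = boto3.client(
        "s3",
        endpoint_url="http://localhost:9000",  # e.g. a local MinIO server
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    cube = np.random.rand(100, 100, 224).astype(np.float32)  # rows x cols x bands

    # Serialize the array to bytes and upload it as a single object.
    buf = io.BytesIO()
    np.save(buf, cube)
    s3.put_object(Bucket="hyperspectral", Key="scene-001.npy", Body=buf.getvalue())

    # Read it back.
    obj = s3.get_object(Bucket="hyperspectral", Key="scene-001.npy")
    restored = np.load(io.BytesIO(obj["Body"].read()))
    assert restored.shape == cube.shape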


Storage for pdf, docx, jpg

I have an application with a monolithic architecture and PostgreSQL as the main storage. There are two Docker images, one for the database and one for the application server. There is a high probability that the application will be split into a few services in the near future, evolving into a microservice architecture. There is also a high probability that the solution will become part of a private cloud. Currently, there is a requirement to read/store different kinds of files through the application: pdf, jpg, docx, etc. I am at a crossroads as to what would be the better choice for file storage in this situation.
I see a few options at the moment:
Object Storage Server (For instance: MinIO which is compatible with Amazon S3 cloud storage service)
PostgreSQL (To store files as BLOB)
File System (To store files on host machine of docker containers)
I have read multiple posts where the database solution was compared with the file system, but I could not find any comparison that also takes an object storage server into account.
https://dba.stackexchange.com/questions/2445/should-binary-files-be-stored-in-the-database
What is difference between storing data in a blob, vs. storing a pointer to a file?
Please advise which option would be good to choose, or point me to a comparison post where this was already discussed.
The future direction you mentioned will benefit from having storage as a service, where multiple containers might access the same files. It will also give you flexibility if you need write/update operations in the future.
Some points for the trade-off:
If you go with the database, you will have to write that file service yourself, and it will be a custom one rather than a widely used API like S3.
If you instead allow direct SQL access to the database for the files, the lack of encapsulation makes your solution brittle.
Blob storage in the database does work (you get ACID operations), but I have seen database storage management become a hassle for DBAs.
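To make the trade-off concrete, here is a minimal sketch of the pointer pattern: the file goes to an S3-compatible object store (MinIO here) and PostgreSQL keeps only the object key. The endpoints, credentials, and the documents table schema are illustrative assumptions.

    # Sketch: store the file in MinIO, keep only a pointer (the object key)
    # in PostgreSQL. Endpoints, credentials, and schema are placeholders.
    import boto3
    import psycopg2

    s3 = boto3.client(
        "s3",
        endpoint_url="http://minio:9000",  # MinIO speaks the S3 API
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    conn = psycopg2.connect(host="db", dbname="appdb", user="app", password="change-me")

    def save_document(doc_id: str, filename: str, payload: bytes) -> None:
        key = f"documents/{doc_id}/{filename}"
        s3.put_object(Bucket="files", Key=key, Body=payload)
        with conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO documents (id, object_key) VALUES (%s, %s)",
                (doc_id, key),
            )

    def load_document(doc_id: str) -> bytes:
        with conn, conn.cursor() as cur:
            cur.execute("SELECT object_key FROM documents WHERE id = %s", (doc_id,))
            (key,) = cur.fetchone()
        return s3.get_object(Bucket="files", Key=key)["Body"].read()

Note that the two writes are not atomic: if the INSERT fails you are left with an orphaned object in MinIO, which a periodic cleanup job can reclaim.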

Is it possible to run Postgres in Google App Engine Flexible?

Is it possible to run postgres (essentially, a non-HTTP service) in a custom Google App Engine Flexible container? Or will I be forced to use Google's Cloud SQL solution?
TL;DR: You could do that, but don’t. It’s better to externalize the persistent data storage.
Yes, it is possible to run a PostgreSQL database as a microservice (called simply a 'service' in Google Cloud Platform) in a custom Google App Engine Flexible container. However, that raises another important question: why would you want to run an SQL database inside a container? This is a risky approach unless you are perfectly sure about what you are doing and how to manage it.
Typical container orchestration is built around stateless services, which are not intended to store persistent data. Such containers sometimes do have some form of storage, like NoSQL databases for caches or user session information, but this data is not persistent; it can be lost during restarts or destruction of instances in an agile containerized environment. PostgreSQL databases, by contrast, are stateful services and do not suit that model. Putting such a database into a container, you can run into problems like data corruption or concurrent access to a shared data directory. Also, in Google App Engine Flexible it is not possible to add a shared persistent disk; volumes are attached to instances and destroyed together with them. A much safer solution is to keep the SQL database in external, durable storage, such as the Cloud SQL you mentioned. There are numerous blog posts and articles that elaborate on this stateless/stateful distinction, like this one.
It should be mentioned that if you are using the container in a local environment or for test/development (and are not looking for a durable database state), putting PostgreSQL inside a container is perfectly fine. Likewise, if you design a special way of splitting your data across instances this can work, as the guys did with their MySQL servers in this article. So once again, the idea of putting a PostgreSQL database in a container should be carefully thought out, especially since there are so many options for safely externalizing such a service.
And just as a side note, you are not forced to use Cloud SQL. The database can be hosted on Compute Engine, on another cloud provider, on premises, or managed by a third-party vendor. If you host it on Compute Engine, the application can communicate with the database inside the same project using the internal IP of the Compute Engine instance. Using Cloud Launcher you can quickly deploy PostgreSQL and other popular databases to Compute Engine. Check these Google docs for more information about using third-party databases.
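As an illustration of that externalized setup, here is a minimal connection sketch; the host, database name, and credentials are placeholders, and the internal IP only works from within the same project/network:

    # Sketch: the application connects to Postgres running outside its own
    # container, e.g. on a Compute Engine instance. All values are placeholders.
    import psycopg2

    conn = psycopg2.connect(
        host="10.128.0.5",     # internal IP of the Compute Engine instance
        dbname="appdb",
        user="appuser",
        password="change-me",  # in practice, read this from an env var or secret store
    )

    with conn, conn.cursor() as cur:
        cur.execute("SELECT version();")
        print(cur.fetchone()[0])
    conn.close()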

Why not store files in the database?

In an app that will be deployed on Heroku....
I need to allow users to upload thumbnail images.
A Heroku-deployed app of course has no persistent local file storage.
The typical thing to do here, judging from a bit of googling, seems to be storing the files in Amazon S3, or possibly other AWS-hosted cloud storage.
But what if I just stick the images in a Postgres blob column?
What are the downsides of doing this? The upsides are that I don't have to pay for separate storage and don't have to deal with an additional external system with more opportunities for bugs and outages. But there must be some good reasons nobody seems to do this; what are they?
A database and S3 are two different storage mechanisms. How are they different?
About S3
Amazon S3 is a highly specialized file storage system. It restricts you to basic read/write/delete file operations and is optimized for caching and serving whole files.
About Postgres
Postgres is a SQL relational database with massive flexibility for storing and indexing data for a variety of operations. You can very well cram binary image data into a row within Postgres.
Comparing Cost/Scalability
Why would you choose S3 over Postgres? Cost and scalability. Postgres is an expensive, highly skilled generalist. On Heroku, running a Postgres database could cost you hundreds or thousands of dollars per month, depending on the amount of data and the scale of traffic.
Amazon S3 is an inexpensive and highly scalable solution that perfectly matches your needs (write an image, serve up the image in the context of a web page).
TL;DR: Amazon S3 is highly optimized for files like images, highly scalable, and relatively inexpensive.
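For contrast, the blob-column approach from the question is only a few lines; a minimal sketch, assuming a thumbnails table with a bytea column (all names are illustrative):

    # Sketch of the blob-column approach: image bytes go straight into a
    # Postgres bytea column. Table and column names are placeholders.
    import psycopg2

    conn = psycopg2.connect("dbname=appdb user=app password=change-me host=localhost")

    with conn, conn.cursor() as cur:
        cur.execute(
            "CREATE TABLE IF NOT EXISTS thumbnails (id serial PRIMARY KEY, image bytea)"
        )
        with open("thumb.png", "rb") as f:
            cur.execute(
                "INSERT INTO thumbnails (image) VALUES (%s) RETURNING id",
                (psycopg2.Binary(f.read()),),
            )
        row_id = cur.fetchone()[0]

    # Every read now pulls the full image through the database connection,
    # which is exactly the cost/scalability pressure described above.
    with conn, conn.cursor() as cur:
        cur.execute("SELECT image FROM thumbnails WHERE id = %s", (row_id,))
        image_bytes = bytes(cur.fetchone()[0])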

Ways to use own data-storage in popular BaaS

We are developing an application which requires the client (mobile device) to send files of size 5MB or more to the server component for processing and we would like some advice on the following:
Is there any way to combine a Backend-as-a-Service (BaaS) platform with our own data-storage (hosted in our particular case in AWS)? We essentially would prefer if the files from the client are sent directly to our own database in the cloud rather than be stored in the BaaS servers.
In other words, we need a BaaS platform or a solution that allows unbundling/bypassing its data-storage feature so that we can use the BaaS only for the rest of its facilities (such as the client authentication, the REST API etc).
We have our own servers in EC2 which are needed for the main processing part of the files and only need the BaaS platform for conveniences that will kick-start our application in a short amount of time. Pulling the files from the BaaS platform's own data-storage to the EC2 servers would induce overall latency overhead as well as extra bandwidth cost in most cases.
I faced a similar dilemma while building my app. In my case, I had to upload and store photos uploaded by users somewhere, and I didn't want to build a backend myself. So I decided to use Amazon S3 to store the photos and used SimpleDB, as it offered me greater flexibility and ease of use than a MySQL backend. Now, obviously, SimpleDB is not a Backend-as-a-Service platform, but I was looking for the same convenience as you are.
So what I'm suggesting is that you use a Backend-as-a-Service platform like Parse (which has an excellent freemium model), CloudMine (another great service, but with tight limits on the freemium model, i.e. only 500 free users/month) or Kinvey (which markets itself as the first BaaS platform; I don't have much information about it, but it's definitely worth a look), and use S3 for your data storage. This way you can use the BaaS for client authentication, the REST API and so on, as you mentioned, and continue using S3 for the files. All you need to do is create an appropriate naming scheme for your S3 buckets and objects so that you can easily identify which object belongs to which user; this is easily done with a prefix-based naming scheme (seeing as S3 doesn't offer the ability to create sub-folders in buckets). Whenever you need to pull some client information you make a call to your BaaS with the authenticated client details, and whenever you need access to your data storage you make a call to S3, using the Android SDK provided by AWS, to retrieve the objects that belong to that particular user. Seeing as you plan on using EC2 to process those files, transferring them from S3 to EC2 should not cost you any extra bandwidth (I might be wrong here because I haven't looked into it, but as far as I can remember data transfer within AWS is free).
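To illustrate the prefix-based naming scheme (sketched here in Python with boto3 rather than the AWS Android SDK; the bucket and key layout are placeholder assumptions):

    # Sketch of a prefix-based naming scheme: each user's objects share a key
    # prefix, simulating per-user folders and making per-user listing cheap.
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-app-uploads"  # placeholder bucket name

    def upload_user_file(user_id: str, filename: str, payload: bytes) -> str:
        key = f"users/{user_id}/{filename}"
        s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
        return key

    def list_user_files(user_id: str) -> list:
        # List everything stored under this user's prefix.
        resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=f"users/{user_id}/")
        return [obj["Key"] for obj in resp.get("Contents", [])]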
Do let me know if you have additional questions.

What's a good cloud based file storage platform to use with Silverlight?

I'm working on a Silverlight app that would allow a user to upload a few gigs of files to a hypothetical cloud-based file store, then let the user view some data about those files later (more functionality than a plain file store). Ideally I'd like to use a free, per-user store such as SkyDrive, but I can't seem to find an API for that service (and read elsewhere on Stack Overflow that programmatic access violates their TOS). Do any services fit this bill? I've heard of Amazon S3, but I understand that'll cost some money - is anything free?
EDIT: Could Mesh be an option?
What is LiveMesh Object and its connection with Silverlight 3.0
You could look at using Azure, as it offers blob and table storage cloud infrastructure and will happily run Silverlight applications in an Azure web role. Currently there is no cost, but this will change once it RTWs.
More info at http://www.azure.com/
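A Silverlight client would use the .NET storage API, but as a sketch of what the blob storage workflow looks like (shown with the azure-storage-blob Python SDK; the connection string, container, and blob names are placeholders):

    # Sketch: upload and download a blob with the azure-storage-blob SDK.
    # Connection string, container, and blob names are placeholders.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<connection string>")
    blob = service.get_blob_client(container="uploads", blob="bigfile.bin")

    with open("bigfile.bin", "rb") as f:
        blob.upload_blob(f, overwrite=True)  # the SDK chunks large uploads

    data = blob.download_blob().readall()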
AFAIK, nothing in this world is free when you're dealing with gigabytes of storage, plus the bandwidth to put them in the cloud.
Amazon S3 is quite reasonable on its pricing.
