Cloud solutions (e.g. Snowflake) for server data filling up fast - snowflake-cloud-data-platform

Firstly, I'm new to development, and currently I have a problem with server data filling up rapidly. I'm looking at solutions such as watcher programs to help me detect when the server data is reaching its limit, but I wanted to know if cloud solutions could help in this regard. Additionally, I wanted to know whether companies such as Snowflake can help handle fast-growing data, in what way a developer can use them, and whether this approach would be too costly from an enterprise point of view.
I have tried to look up Snowflake's documentation, but I am unable to reach any conclusion as to whether it can help me. I could only find articles about storage saying that they store data by compressing it, but I wanted more clarity on this solution.

Snowflake stores its data using cloud storage services (AWS S3, Google Cloud Storage, or Microsoft Azure), so under normal conditions you can't fill up the server storage (I have never heard of S3 being full in any region).
Check the pricing page to see if it will be costly for you (or not):
https://www.snowflake.com/pricing/
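If you go this route and still want the "watcher" behavior you describe, you can poll Snowflake's own storage metering instead of watching a disk. A minimal sketch, assuming the snowflake-connector-python package and a role that can read the SNOWFLAKE.ACCOUNT_USAGE schema (the connection values are placeholders):

```python
# Sketch: report recent account-level storage usage from Snowflake's
# ACCOUNT_USAGE metering view. Assumes snowflake-connector-python is
# installed; account, user, and password below are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account",    # placeholder
    user="your_user",          # placeholder
    password="your_password",  # placeholder
)
try:
    cur = conn.cursor()
    cur.execute("""
        SELECT usage_date,
               storage_bytes  / POWER(1024, 3) AS storage_gb,
               failsafe_bytes / POWER(1024, 3) AS failsafe_gb
        FROM snowflake.account_usage.storage_usage
        ORDER BY usage_date DESC
        LIMIT 7
    """)
    for usage_date, storage_gb, failsafe_gb in cur:
        print(usage_date, round(storage_gb, 2), round(failsafe_gb, 2))
finally:
    conn.close()
```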

Related

Kubernetes and Cloud Databases

Could someone explain the benefits/issues with hosting a database in Kubernetes via a persistent volume claim combined with a storage volume over using an actual cloud database resource?
It's essentially a trade-off: convenience vs control. Take a concrete example: let's say you pay Amazon money to use Athena, which is really just a nicely packaged version of Facebook's Presto that AWS kindly operates for you in exchange for $$$. You could run Presto on EKS yourself, but why would you?
Now, let's say you want or need to use Apache Drill or Apache Impala. Amazon doesn't offer them. Nor do any of the other big public cloud providers at the time of writing, as far as I know.
Another thought: what if you want to migrate off of AWS? Your data has gravity as well.
Could someone explain the benefits/issues with hosting a database in Kubernetes ... over using an actual cloud database resource?
As the previous excellent answer noted:
It's essentially a trade-off: convenience vs control
In addition to the previous example (Athena), take a look at RDS as well and see what you would need to handle yourself (why would you, as said already):
Automatic backups
Multizone deployments
Snapshots
Engine upgrades
Read replicas
and other bells and whistles that come with managed service opposed to self-hosted/managed one.
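To make that list concrete, here is a rough boto3 sketch showing that each of those features is a parameter or a single API call on the managed side (assuming configured AWS credentials; all identifiers and sizes are illustrative placeholders, not a production setup):

```python
# Sketch: the managed-service features listed above as RDS parameters.
# Assumes boto3 is installed and AWS credentials are configured.
import boto3

rds = boto3.client("rds")

rds.create_db_instance(
    DBInstanceIdentifier="example-db",  # placeholder
    DBInstanceClass="db.t3.medium",
    Engine="postgres",
    MasterUsername="example_user",      # placeholder
    MasterUserPassword="change-me",     # placeholder
    AllocatedStorage=100,               # GB
    MultiAZ=True,                       # multizone deployment
    BackupRetentionPeriod=7,            # automatic backups, in days
    AutoMinorVersionUpgrade=True,       # engine upgrades
)

# Read replicas and snapshots are likewise single API calls:
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="example-db-replica",
    SourceDBInstanceIdentifier="example-db",
)
rds.create_db_snapshot(
    DBSnapshotIdentifier="example-db-snapshot",
    DBInstanceIdentifier="example-db",
)
```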
But there is more to it than just convenience vs control, which this post is trying to shed light on:
Kubernetes adds another layer of abstraction (pods, services...), and depending on the way storage (persistent volumes) is handled, you have two additional considerations:
Access speed (depending on your use case, this can be negligible or a show-stopper).
The storage you have at hand might not be optimized for relational-database I/O (or might restrict how efficiently you can schedule pods), for the very same reasons you are not advised to run a database on NFS, for example.
Several recent conference talks on Kubernetes point out that databases are a big no-no for Kubernetes (although this is highly opinionated; we do run average-load MySQL and PostgreSQL databases in k8s), and heavy load/fast I/O is somewhat of a challenge to get right on k8s, as opposed to a managed cloud solution where somebody has already fine-tuned everything for you.
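To illustrate the persistent-volume consideration, here is a minimal sketch with the official kubernetes Python client of the claim a database pod would mount; the storage class name is a placeholder, and whether it is actually tuned for database I/O is exactly the question raised above:

```python
# Sketch: a PersistentVolumeClaim for a database pod, created via the
# official `kubernetes` Python client. Assumes a valid kubeconfig; the
# storage class name is a placeholder.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
v1 = client.CoreV1Api()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="postgres-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],  # single-node attach, as a DB wants
        storage_class_name="fast-ssd",   # placeholder; must suit DB-style I/O
        resources=client.V1ResourceRequirements(
            requests={"storage": "100Gi"}
        ),
    ),
)
v1.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```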
In conclusion:
It is all about convenience, control, and capabilities.

Load data from firebase to amazon redshift

I have around 500 MB of data in Firebase and I want to move it to Amazon Redshift on a daily basis. What is the best way to solve this problem?
Thanks in advance.
What is "the best way" depends on your criteria, and often highly subjective. But a few pointers may help you get started:
don't download the entire data set with a single ref.once('value') call. Loading that much data will take time, and all your regular users will be blocked while your read is being fulfilled.
do consider using Firebase's private backups. These come out of a different data stream, so they will not interfere with your regular users. The downside is that you'll need a paid plan to be able to use this feature.
do consider how you can make your backup process streaming, instead of daily. Firebase is a real-time database, and it typically works best when you consider the data flow to be real-time too (see the sketch below).
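To sketch one possible shape of such a pipeline (an assumption-laden illustration, not the only way to do it): firebase-admin's listen() provides the change feed, S3 acts as a staging area, and a Redshift COPY loads each batch. Every name, URL, credential, and the IAM role below are placeholders, and checkpointing/error handling are omitted:

```python
# Sketch: stream Firebase changes into Redshift via S3 staging.
# Assumes firebase-admin, boto3, and psycopg2 are installed.
import json

import boto3
import firebase_admin
import psycopg2
from firebase_admin import credentials, db

firebase_admin.initialize_app(
    credentials.Certificate("service-account.json"),     # placeholder
    {"databaseURL": "https://your-app.firebaseio.com"},  # placeholder
)

s3 = boto3.client("s3")
batch = []

def flush_to_redshift():
    """Stage the current batch in S3, then COPY it into Redshift."""
    key = "firebase/batch.json"  # placeholder; a real pipeline would version keys
    body = "\n".join(json.dumps(rec) for rec in batch)
    s3.put_object(Bucket="my-staging-bucket", Key=key, Body=body)
    conn = psycopg2.connect(
        "dbname=dev host=example-cluster user=u password=p port=5439"  # placeholder DSN
    )
    with conn, conn.cursor() as cur:
        cur.execute(
            "COPY events FROM 's3://my-staging-bucket/" + key + "' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopy' "  # placeholder
            "FORMAT AS JSON 'auto'"
        )
    conn.close()
    del batch[:]

def on_change(event):
    # Each event carries the changed path and the new data.
    batch.append({"path": event.path, "data": event.data})
    if len(batch) >= 1000:  # flush in chunks instead of once a day
        flush_to_redshift()

db.reference("/").listen(on_change)  # streams changes instead of one big read
```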

Is an intermediate server for communication between Cloudant server and mobile device advisable?

I am new to servers and online databases, so please bear with me.
I have a question regarding database server communication on mobile devices as follows:
I am currently developing a game application on iOS. I have set up a non-SQL database on Cloudant and I would like to access that data on my iOS device. I have to update multiple database entries each time I complete a round, and I also need to read multiple entries from my database to refresh the leaderboard. I have tried to access multiple entries on Cloudant individually from the device before, but most of the requests timed out.
Thus, right now I have written several PHP scripts on my application server so that my device only needs to access a script once, and it does multiple updates on my database or filters through the data I require from Cloudant. However, this means I need an additional server, meaning higher costs. I feel there should be a better or more elegant solution out there, so I would like to ask everybody here for help. Is it better to do all the updates directly from the device, or to enlist the help of a third party?
Thanks for your time!
For security reasons alone, it is necessary to use a server in front of the Cloudant database. I assume you don't want every user of your app to be able to access the whole database. Also, the reasons you gave seem valid to me: it's generally a good idea to reduce the number and size of requests for a mobile application. Finally, this might allow you to do some caching on the PHP server, ultimately reducing your costs.
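For illustration, here is roughly what the batching side of such a server looks like in Python/Flask (rather than the PHP you are using): the device makes one request, and the server turns it into a single bulk write against Cloudant's _bulk_docs endpoint. The account name and credentials are placeholders:

```python
# Sketch: an intermediate server that batches a round's updates into one
# Cloudant _bulk_docs call. Assumes Flask and requests are installed;
# URL and credentials are placeholders.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

CLOUDANT_DB_URL = "https://ACCOUNT.cloudant.com/scores"  # placeholder
AUTH = ("api_key", "api_secret")                         # placeholder

@app.route("/round-finished", methods=["POST"])
def round_finished():
    docs = request.get_json()["updates"]  # all updates for one round
    # One HTTP round trip to Cloudant instead of one per document.
    resp = requests.post(
        CLOUDANT_DB_URL + "/_bulk_docs",
        json={"docs": docs},
        auth=AUTH,
        timeout=10,
    )
    return jsonify(resp.json()), resp.status_code

if __name__ == "__main__":
    app.run()
```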

GoogleApps Datastore Cons and Pros

I've been reading more about Google AppEngine, and I learned Python in the past couple of weeks, including working with MongoDB. What I need most is a scalable database solution. Before discovering Google AppEngine, the only three DB solutions I found useful were DynamoDB, MongoDB, and BigCouch.
I found out that I really like the Python language, and as someone coming from ASP.NET development, I've decided to switch and develop my app using Python. My first choice was to develop my application using Python + Bottle + MongoDB. The problem is that DynamoDB is very expensive, and the lack of easy-to-use backup/restore options made me pass on Amazon's offering.
Google AppEngine's Datastore is much more affordable. However, I still can't find information regarding some specific questions on Google's website.
Here are some of the questions I need answers to:
Does Google Datastore support backup/restore within the administration console?
If I want to back up/restore 50 TB of data, how long does it take to back up/restore the data? Where is it stored? What are the costs?
How long does it take to back up 1 TB of data, for example?
Does the Datastore support caching in the database layer?
Any cons that I should be aware of?
Those are some of the questions I need to get answers to. MongoDB is an excellent product, and developing a web app using Mongo + Python + Bottle is fun, fun, fun. However, I prefer a fully hosted DB solution like the one offered by Google. But before I make that move, I need to be sure that I'm not missing anything.
Here are some of the questions I need answers to:
Does Google Datastore support backup/restore within the administration console?
Yes. You can back up and restore data from within the Administration Console by enabling datastore_admin for an application (thanks to Idan Shechter for pointing this out!). More info can be found here: https://developers.google.com/appengine/docs/adminconsole/datastoreadmin
You can also download the data through the command line. See: https://developers.google.com/appengine/docs/python/tools/uploadingdata
If I want to back up/restore 50 TB of data, how long does it take to back up/restore the data?
It depends on where you back the data up to. Backing up to the Blobstore or Google Cloud storage will probably take much less time than backing up to your local machine. Transferring 50TBs to your local machine will take a long time and depend on many factors including network speed.
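As a rough back-of-the-envelope illustration (assuming the nominal line rate is sustained, which it rarely is):

```python
# Back-of-the-envelope transfer times for 50 TB at a few sustained rates.
# Real-world throughput will be lower than the nominal line rate.
TB = 10 ** 12  # bytes

def days_to_transfer(size_bytes, mbit_per_s):
    seconds = size_bytes * 8 / (mbit_per_s * 10 ** 6)
    return seconds / 86400.0

for rate in (100, 1000, 10000):  # Mbit/s
    print("%5d Mbit/s: %5.1f days" % (rate, days_to_transfer(50 * TB, rate)))
# -> roughly 46 days at 100 Mbit/s, 4.6 days at 1 Gbit/s, 0.5 days at 10 Gbit/s
```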
Where is it stored?
If you use the Datastore Administration, you can back up to the Blobstore or to Google Cloud Storage. If you use the command-line tools, the data will be stored wherever you choose to download it to.
What are the costs?
The Blobstore costs $0.13/GB/month and gives you 5 GB free. Google Cloud Storage is $0.12 per GB/month up to the first TB. You can see more pricing info for Cloud Storage here:
https://developers.google.com/storage/docs/pricingandterms
Bandwidth costs are $0.12 per GB (The first GB is free). More details on pricing can be seen here:
https://cloud.google.com/pricing/
How long does it take to back up 1 TB of data, for example?
Again, it depends on where you back up to and your transfer speeds.
Does the Datastore support caching in the database layer? Any cons that I should be aware of?
No, it does not support database layer caching.
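The usual workaround is to cache at the application layer with App Engine's built-in memcache API. A minimal sketch (Python runtime; the model and key names are illustrative placeholders):

```python
# Sketch: application-layer caching on App Engine, since the Datastore
# itself does not cache reads. Uses the built-in memcache API.
from google.appengine.api import memcache
from google.appengine.ext import db

class Score(db.Model):
    points = db.IntegerProperty()

def get_score(key_name):
    cache_key = "score:" + key_name
    score = memcache.get(cache_key)
    if score is None:  # cache miss: read from the Datastore
        score = Score.get_by_key_name(key_name)
        if score is not None:
            memcache.set(cache_key, score, time=60)  # cache for 60 seconds
    return score
```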

Back up AppEngine database (Google cloud storage?)

I have an AppEngine application that currently has about 15GB of data, and it seems to me that it is impractical to use the current AppEngine bulk loader tools to back up datasets of this size. Therefore, I am starting to investigate other ways of backing up, and would be interested in hearing about practical solutions that people may have used for backing up their AppEngine Data.
As an aside, I am starting to think that the Google Cloud Storage might be a good choice. I am curious to know if anyone has experience using the Google Cloud Storage as a backup for their AppEngine data, and what their experience has been, and if there are any pointers or things that I should be aware of before going down this path.
No matter which solution I end up with, I would like a backup solution to meet the following requirements:
1) Reasonably fast to back up, and reasonably fast to restore (i.e. if a serious error/data deletion/malicious attack hits my website, I don't want to have to bring it down for multiple days while restoring the database - by fast I mean hours, as opposed to days).
2) A separate location and account from my AppEngine data - i.e. I don't want someone with admin access to my AppEngine data to necessarily have write/delete access to the backup data location. For example, if my AppEngine account were compromised by a hacker, or if a disgruntled employee were to decide to delete all my data, I would like to have backups that are separate from the AppEngine administrator accounts.
To summarize, given that getting the data out of the cloud seems slow/painful, what I would like is a cloud-based backup solution that emulates the role that tape backups would have served in the past - if I were to have a backup tape, nobody else could modify the contents of that tape - but since I can't get a tape, can I store a secure copy of my data somewhere, that only I have access to?
Kind Regards
Alexander
There are a few options here, though none are (currently) quite what you're looking for.
With the latest release of version 1.5.5 of the SDK, we now support interfacing with Google Storage directly - you can see how here. With this you can write data to Google Storage, but to the best of my knowledge there's no way to write a file that the app will then be unable to delete.
To actually gather the data, you could use the App Engine mapreduce API. It has built in support for writing to the App Engine blobstore; writing to Google Storage would require you to implement your own output writer, currently.
Another option, as WoLpH suggests, is to use the Datastore Admin tool to back up data to another app. With a little extra effort you could modify the remote_api stub to prohibit deletes to the target (backup) app.
One thing you should definitely do regardless is to enable two-factor authentication for your Google account; this makes it a lot harder for anyone to get control of your account, even if they discover your password.
The bulkloader is probably one of the fastest ways to back up/restore your data.
The problem with App Engine is that you have to do everything through views, so you have the restrictions that views have... the result is that a fast backup/restore still has to use the same APIs as the rest of your app. So the bulkloader (possibly with a few modifications) is definitely your best option here.
Perhaps, though... (I haven't tried it yet), you can use the new Datastore Admin to copy the data to another app, one which only you control. That way you can copy it back from the other app when needed.
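For the "separate account" requirement, one sketch of the remote_api approach is below: pull the entities out from a machine whose credentials the production admins don't share. This assumes the era-appropriate Python 2 SDK on the path; the app id, kind, and output file are placeholders:

```python
# Sketch: export entities through remote_api so the copy lives outside
# the production admins' control. Python 2 / old SDK style, matching the
# App Engine of this era; all names are placeholders.
import getpass
import pickle

from google.appengine.ext import db
from google.appengine.ext.remote_api import remote_api_stub

def auth_func():
    return raw_input("Email: "), getpass.getpass("Password: ")

remote_api_stub.ConfigureRemoteApi(
    None, "/_ah/remote_api", auth_func, "your-app.appspot.com"  # placeholder
)

class Score(db.Expando):  # Expando reads entities without the full model class
    pass

with open("scores-backup.pkl", "wb") as out:
    for entity in Score.all():  # iterated in batches via the remote API
        pickle.dump(db.to_dict(entity), out)
```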
