Filer for Google Cloud Computing - filesystems

I am looking for a filer for my GCE project. It should serve as a shared storage provider for some of my VMs.
Of course it should be fail-safe, H/A, high-performance, … ;-)
I was reading through https://cloud.google.com/solutions/filers-on-compute-engine, but that document is quite vague. I was hoping to find some best practices or recommendations on the web, but found nothing.

A file server, or "storage filer" as the linked document calls it, is a program you can install and run on Google Compute Engine, since GCE gives you a full Linux environment.
You often don't need to go that far, though: Cloud Storage should suit your needs for long-term storage of data.
If the data needs to be mounted as a disk on every VM (read-only is preferred), you can use a persistent disk.
The docs include a flowchart for picking the best option for your use case; you can find it on the storage options page.
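If Cloud Storage fits your use case, here is a minimal sketch of sharing a file between VMs through a bucket instead of a filer, assuming the google-cloud-storage Python client; the bucket name and file paths are placeholders:

    # pip install google-cloud-storage
    from google.cloud import storage

    # Assumes the VM's service account can access the bucket;
    # "my-shared-bucket" is a placeholder name.
    client = storage.Client()
    bucket = client.bucket("my-shared-bucket")

    # One VM uploads a file...
    bucket.blob("reports/output.txt").upload_from_filename("/tmp/output.txt")

    # ...and any other VM with read access downloads the same object.
    bucket.blob("reports/output.txt").download_to_filename("/tmp/output-copy.txt")

This gives you shared, durable storage without running a filer VM, at the cost of object semantics instead of a POSIX filesystem.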

Related

Google Cloud Machine Learning with Decision Tree

We have a Google App Engine application consisting of several modules, and we store our users' data in the Google Cloud Datastore.
Now we want to run some machine learning on this data, specifically a decision tree algorithm.
We're looking to solve this with one of the following approaches:
Export the data in the Datastore to a CSV file so we can use tools like Weka.
Process the data in the Datastore and run Google Cloud's machine learning tools on it. (But when I looked at the Google Cloud ML documentation, I couldn't find anything about running a decision tree on Datastore data.)
Does anyone know whether either of these approaches is possible on Google Cloud? If so, can you point me to specific documentation or describe how to do it?
Based on your use case, I would say the best approach for your scenario is to use the new Beta release of Cloud ML Engine for scikit-learn. As you may already know, scikit-learn is a Machine Learning library for Python, and among its wide variety of possibilities, it includes Decision Trees. Note that this is a Beta release and therefore there may still be some rough edges, but I definitely think it should be a good option for you.
Cloud ML Engine has a tight integration with Google Cloud Storage, as it is the required storage option for input and output data, models, etc. That is why, regarding where your data is stored, I would say the first option you mentioned ("Export the data in the Datastore to a CSV file so we can use tools like Weka") is the most suitable one. You will have to export your data to CSV files, upload them to Cloud Storage, and use them from ML Engine.
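To make that first option concrete, here is a minimal sketch of training a decision tree with scikit-learn, assuming your Datastore export is a CSV file named export.csv with a label column called "label" (both names are placeholders); the resulting model.joblib file is what you would upload to Cloud Storage for ML Engine:

    # pip install pandas scikit-learn joblib
    import joblib
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    # "export.csv" and the "label" column are placeholders for your Datastore export.
    data = pd.read_csv("export.csv")
    X = data.drop(columns=["label"])
    y = data["label"]

    clf = DecisionTreeClassifier(max_depth=5)
    clf.fit(X, y)

    # ML Engine's scikit-learn runtime looks for a model.joblib file in Cloud Storage.
    joblib.dump(clf, "model.joblib")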
Finally, let me share with you some additional documentation pages that may be helpful to start working with ML Engine and scikit-learn:
Cloud ML Engine and scikit-learn quickstart
Using scikit-learn pipelines
scikit-learn documentation page

Is it better to store the image itself in the Datastore or a link to it?

I need to store images and I have 2 options:
Store the image in the GAE Datastore.
Store the image somewhere else (maybe on Dropbox or another website) and store its link in the GAE Datastore.
What's the best practice for storing an image in the DB, assuming each image maps one-to-one to a specific Datastore entity?
I think it depends heavily on the use case.
I have a small company website running on App Engine where the content images are all stored in the Datastore, and for that application it works well (they are all relatively small images).
If you have a high-traffic site, you may find that storing them in GCS, or some other mechanism that supports a more cost-effective CDN, is more appropriate.
If the images are large (more than 1 MB, the Datastore entity size limit), the Datastore isn't a practical solution.
There is no hard and fast rule. Understand your use cases, your cost structure, and how complex the solution will be to manage, then choose the most appropriate option.
Neither of the above. Google's cloud platform includes a service specifically for storing files, Google Cloud Storage, which is well integrated into GAE. You should use that.
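As a rough sketch of that suggestion (assuming the first-generation Python runtime, the ndb client, and the GoogleAppEngineCloudStorageClient library; the bucket and model names are made up), you would write the image bytes to Cloud Storage and keep only the object path in the Datastore entity:

    import cloudstorage as gcs  # GoogleAppEngineCloudStorageClient
    from google.appengine.ext import ndb

    BUCKET = "/my-app-images"  # placeholder bucket name

    class Product(ndb.Model):
        # Keep only the Cloud Storage object path in the Datastore, not the bytes.
        name = ndb.StringProperty()
        image_path = ndb.StringProperty()

    def save_product_image(product, image_bytes, filename):
        path = "%s/%s" % (BUCKET, filename)
        with gcs.open(path, "w", content_type="image/jpeg") as f:
            f.write(image_bytes)
        product.image_path = path
        product.put()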

Google App Engine Datastore Cons and Pros

I've been reading more about Google App Engine and learning Python over the past couple of weeks, including working with MongoDB. What I need most is a scalable database solution. Before discovering Google App Engine, the only three DB solutions I found useful were DynamoDB, MongoDB, and BigCouch.
I found that I really like the Python language, and coming from ASP.NET development, I've decided to switch and develop my app in Python. My first choice was to build my application with Python + Bottle + MongoDB. The problem is that DynamoDB is very expensive, and the lack of easy-to-use backup/restore options made me pass on Amazon's offering.
The Google App Engine Datastore is much more affordable. However, I still can't find answers to some specific questions on Google's website.
Here are some of the questions I need answers to:
Does Google Datastore support backup/restore within the administration console?
If I want to back up/restore 50 TB of data, how long does the backup/restore take? Where is it stored? What are the costs?
How long does it take to back up 1 TB of data, for example?
Does the Datastore support caching in the database layer?
Any cons that I should be aware of?
Those are some of the questions I need answers to. MongoDB is an excellent product and developing a web app with Mongo + Python + Bottle is fun, but I prefer a fully hosted DB solution like the one Google offers. Before I commit to that, I need to be sure I'm not missing anything.
Here are some of the questions I need answers to:
Does Google Datastore support backup/restore within the administration console?
Yes. You can back up and restore data from within the Administration Console by enabling datastore_admin for the application (thanks to Idan Shechter for pointing this out). More info can be found here: https://developers.google.com/appengine/docs/adminconsole/datastoreadmin
You can also download the data through the command line. See: https://developers.google.com/appengine/docs/python/tools/uploadingdata
If I want to back up/restore 50 TB of data, how long does the backup/restore take?
It depends on where you back the data up to. Backing up to the Blobstore or Google Cloud Storage will probably take much less time than backing up to your local machine. Transferring 50 TB to your local machine will take a long time and depends on many factors, including network speed.
Where is it stored?
If you use the Datastore Administration, you can backup to the Blobstore or to Google Cloud Storage. If you use the command line tools, it will be stored where you choose to download the data to.
What are the costs?
The Blobstore costs $0.13/GB/month and gives you 5 GB free. Google Cloud Storage is $0.12/GB/month for the first TB. You can see more pricing info for Cloud Storage here:
https://developers.google.com/storage/docs/pricingandterms
Bandwidth costs are $0.12 per GB (The first GB is free). More details on pricing can be seen here:
https://cloud.google.com/pricing/
How long does it take to back up 1 TB of data, for example?
Again, it depends on where you back up to and your transfer speeds.
Does the Datastore support caching in the database layer? Any cons that I should be aware of?
No, it does not support database layer caching.
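Caching therefore has to happen at the application layer; a minimal sketch using App Engine's memcache API (the model, key format, and 10-minute expiry are arbitrary choices for this illustration):

    from google.appengine.api import memcache
    from google.appengine.ext import ndb

    class Article(ndb.Model):  # placeholder model for this sketch
        title = ndb.StringProperty()
        body = ndb.TextProperty()

    def get_article(article_id):
        # The Datastore service itself does no caching, so check memcache first.
        cache_key = "article:%d" % article_id
        article = memcache.get(cache_key)
        if article is None:
            article = Article.get_by_id(article_id)
            # Cache for 10 minutes; the expiry is an arbitrary choice here.
            memcache.set(cache_key, article, time=600)
        return article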

Back up AppEngine database (Google cloud storage?)

I have an AppEngine application that currently has about 15GB of data, and it seems to me that it is impractical to use the current AppEngine bulk loader tools to back up datasets of this size. Therefore, I am starting to investigate other ways of backing up, and would be interested in hearing about practical solutions that people may have used for backing up their AppEngine Data.
As an aside, I am starting to think that the Google Cloud Storage might be a good choice. I am curious to know if anyone has experience using the Google Cloud Storage as a backup for their AppEngine data, and what their experience has been, and if there are any pointers or things that I should be aware of before going down this path.
No matter which solution I end up with, I would like a backup solution to meet the following requirements:
1) Reasonably fast to back up, and reasonably fast to restore (i.e. if a serious error, data deletion, or malicious attack hits my website, I don't want to have to bring it down for multiple days while restoring the database; by fast I mean hours, as opposed to days).
2) A separate location and account from my AppEngine data, i.e. I don't want someone with admin access to my AppEngine data to necessarily have write/delete access to the backup location. For example, if my AppEngine account were compromised by a hacker, or if a disgruntled employee decided to delete all my data, I would like to have backups that are separate from the AppEngine administrator accounts.
To summarize, given that getting the data out of the cloud seems slow/painful, what I would like is a cloud-based backup solution that emulates the role that tape backups would have served in the past - if I were to have a backup tape, nobody else could modify the contents of that tape - but since I can't get a tape, can I store a secure copy of my data somewhere, that only I have access to?
Kind Regards
Alexander
There are a few options here, though none are (currently) quite what you're looking for.
With the latest release of version 1.5.5 of the SDK, we now support interfacing with Google Storage directly - you can see how, here. With this you can write data to Google Storage, but to the best of my knowledge there's no way to write a file that the app will then be unable to delete.
To actually gather the data, you could use the App Engine mapreduce API. It has built in support for writing to the App Engine blobstore; writing to Google Storage would require you to implement your own output writer, currently.
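As a rough illustration of the gathering step (this is not the mapreduce API; it is a simplified sketch assuming the first-generation Python runtime, the GoogleAppEngineCloudStorageClient library, and a made-up ndb model), you could walk the entities and write them to Google Storage as CSV:

    import csv
    import cloudstorage as gcs  # GoogleAppEngineCloudStorageClient
    from google.appengine.ext import ndb

    class Customer(ndb.Model):  # placeholder model for this sketch
        name = ndb.StringProperty()
        email = ndb.StringProperty()

    def export_customers():
        # "/my-backup-bucket" is a placeholder; for a large dataset this loop
        # would have to run from a task queue or the mapreduce API instead.
        with gcs.open("/my-backup-bucket/customers.csv", "w",
                      content_type="text/csv") as f:
            writer = csv.writer(f)
            writer.writerow(["key", "name", "email"])
            for customer in Customer.query():
                writer.writerow([customer.key.urlsafe(), customer.name, customer.email])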
Another option, as WoLpH suggests, is to use the Datastore Admin tool to back up data to another app. With a little extra effort you could modify the remote_api stub to prohibit deletes to the target (backup) app.
One thing you should definitely do regardless is to enable two-factor authentication for your Google account; this makes it a lot harder for anyone to get control of your account, even if they discover your password.
The bulkloader is probably one of the fastest ways to back up/restore your data.
The problem with App Engine is that you have to do everything through views (request handlers), so you have the restrictions that views have; the result is that a fast backup/restore still has to use the same APIs as the rest of your app. So the bulkloader (possibly with a few modifications) is definitely your best option here.
Perhaps, though (I haven't tried it yet), you can use the new Datastore Admin to copy the data to another app, one which only you control. That way you can copy it back from the other app when needed.

What's a good cloud based file storage platform to use with Silverlight?

I'm working on a Silverlight app that would allow a user to upload a few gigs of files to a hypothetical cloud based file store, then allow the user to view some data about those files later (more functionality than a file store). Ideally I'd like to use a free, per-user store such as SkyDrive but I can't seem to find an API for that service (and read elsewhere on stack overflow that programmatic access violates their TOS). Do any services fit this bill? I've heard of Amazon S3 but I understand that'll cost some money - is anything free?
EDIT: Could Mesh be an option?
What is LiveMesh Object and its connection with Silverlight 3.0
You could look at using Azure, as it offers blob and table storage cloud infrastructure and will happily run Silverlight applications in an Azure web role. Currently there is no cost, but this will change once it RTWs (releases to the web).
More info at http://www.azure.com/
AFAIK, nothing in this world is free when you're dealing with gigabytes of storage, plus the bandwidth to put them in the cloud.
Amazon S3 is quite reasonable on its pricing.
