Google Cloud Machine Learning with Decision Tree - google-app-engine

We have a Google App Engine application consisting of several modules, and we store our users' data in Google Cloud Datastore.
We now want to run machine learning algorithms on this data, starting with a decision tree.
We are considering one of the following approaches:
1. Export the data in Datastore to a CSV file so we can use tools like Weka.
2. Process the data in Datastore directly with Google Cloud's machine learning services. (When I looked through the Google Cloud ML documentation, I couldn't find anything about running a decision tree on Datastore data.)
Does anyone know whether either approach is possible on Google Cloud? If so, can you point me to specific documentation or describe how to do it?

Based on your use case, I would say the best approach for your scenario is to use the new Beta release of Cloud ML Engine for scikit-learn. As you may already know, scikit-learn is a Machine Learning library for Python, and among its wide variety of possibilities, it includes Decision Trees. Note that this is a Beta release and therefore there may still be some rough edges, but I definitely think it should be a good option for you.
Cloud ML Engine is tightly integrated with Google Cloud Storage, which is the required storage option for input and output data, models, etc. For that reason, of the two options you mentioned, the first one ("1. Export the data in the Datastore to CSV file so we can use tools like Weka") is the most suitable. You will have to export your data to CSV files, upload them to Cloud Storage, and then use ML Engine.
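As a rough illustration of the export step only, here is a minimal sketch using just the Python standard library. The records, field names, and the `entities_to_csv` helper are hypothetical stand-ins: fetching the real entities from Datastore and uploading the resulting file to Cloud Storage are left out, and your actual schema will differ.

```python
import csv
import io


def entities_to_csv(entities, fieldnames):
    """Serialize a list of dict-like records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for entity in entities:
        # Keep only the chosen columns; a missing property becomes an empty cell.
        writer.writerow({name: entity.get(name, "") for name in fieldnames})
    return buffer.getvalue()


# Hypothetical records standing in for entities read from Datastore.
sample_entities = [
    {"age": 34, "plan": "free", "churned": 0},
    {"age": 51, "plan": "paid", "churned": 1},
]

csv_text = entities_to_csv(sample_entities, ["age", "plan", "churned"])
print(csv_text)
```

Once a file like this is in a Cloud Storage bucket, a scikit-learn training script can read it and fit a decision tree; the exact job setup is covered in the quickstart linked below.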
Finally, let me share with you some additional documentation pages that may be helpful to start working with ML Engine and scikit-learn:
Cloud ML Engine and scikit-learn quickstart
Using scikit-learn pipelines
scikit-learn documentation page

Related

Filer for Google Cloud Computing

I am looking for a filer for my GCE project. It should serve as a shared storage provider for some of my VMs.
Of course it should be fail-safe, H/A, high-performance, … ;-)
I was reading through https://cloud.google.com/solutions/filers-on-compute-engine, but this document is very vague. I was hoping to find some kind of best practices or recommendation on the web - but found nothing.
A file server (a "storage filer" in the linked document) is a program you install and run yourself on Google Compute Engine, which is possible because the VMs give you a full Linux environment.
To use cloud storage you don't need to go that far: Cloud Storage itself should suit your needs for long-term storage of data.
If the data needs to be mounted as a disk on every VM (preferably read-only), you can use a persistent disk.
In the docs there is a flowchart to pick the best option for your use case, you can find it on the storage options page.

How to choose between Google Cloud Functions and Google App Engine?

Google Cloud Functions seems very interesting as it is a serverless, zero-maintenance solution. But when is it appropriate to use Google Cloud Functions over Google App Engine?
Update:
As of June 12, 2018, Node.js 8.x is supported in Google App
Engine Standard environment along with the Flexible Environment.
Short answer: It depends upon your need.
Long answer: Here's the checklist
Runtime
Cloud Functions supports only Node.js at the moment and there aren't any plans, as far as I know, to introduce new runtimes there. If you're good with that, you can put Cloud Functions in your options.
App Engine does support Node.js, although it's only available in the Flexible environment. App Engine Standard Environment supports Python 2.7, Java 8, Java 7, PHP 5.5, Go 1.8 and 1.6, while App Engine Flexible Environment supports Python, Java, Node.js, Go, Ruby, PHP, or .NET. You can also provide your own runtime using a Dockerfile in the Flexible environment. So if you want to develop your application in anything other than Node.js, App Engine is the better option.
Serverless Architecture
Are you looking for a serverless architecture? Are you frustrated with managing instances and having them scale up or down? Do you want to spend no time managing servers? Go for Cloud Functions if you answer yes to all of these questions.
Are you looking for fine-grained control over the number of instances and how they are billed? Do you want separate versions and better control over them? Look at App Engine in this case.
Microservice
Can you break your code into smaller independent functions? Go for Cloud Functions.
App Engine does support a microservice architecture, using the same code base with different yaml files to split the services, but it's up to you whether to break your app into services or not. We have been running all our code as one monolithic application for the last few years, and it's still working well on App Engine.
Database
Is your app data stored in Firebase? Then Cloud Functions can be used easily there. If not, App Engine is the better alternative. App Engine can connect to Firebase too, in case you're wondering.
There are other things to consider too, such as pricing and whether you're migrating an existing application or writing things from scratch. You can, in fact, use both options. We use the App Engine (Python) Standard Environment for our application, but we recently migrated a few of our long-running tasks to Cloud Functions and they are working amazingly.
In my opinion App Engine is the answer to most things, whereas Cloud Functions are made for specific requirements.
Use Cloud Functions when all you want is to execute a function (some piece of logic) in response to an event originating in the cloud, and you don't want to build (and be billed for) a full web application just for that.
From Product Overview:
Cloud computing has made possible fully serverless models of computing
where logic can be spun up on-demand in response to events originating
from anywhere. Construct applications from bite-sized business logic
billed to the nearest 100 milliseconds, only while your code is
running. Serve users from zero to planet-scale, all without managing
any infrastructure.
From What are Google Cloud Functions?
Google Cloud Functions is a serverless execution environment for
building and connecting cloud services. With Cloud Functions you write
simple, single-purpose functions that are attached to events emitted
from your cloud infrastructure and services. Your Cloud Function is
triggered when an event being watched is fired. Your code executes in
a fully managed environment. There is no need to provision any
infrastructure or worry about managing any servers.
If you already have a GAE app related to the piece of logic you want to implement, it's probably simpler to just do it inside the app :)

Google cloud architecture for new project

I am working on a project that we are going to put on Google Cloud.
There will be a membership requirement, so there are logins and profiles to store. Members will create projects that are linked to their accounts, and other members can join these projects, etc. It's not overly complex, but I need it to be fast and scalable from the off.
I have a few (simple) questions about the best setup to go for.
Do I have a PHP front end if PHP is only in beta? Do I just use Python for the front end? Is there a better framework than others to use?
Do I create an App Engine API for the front end to call using Python or Java or something else?
Which database do I use? Do I go down the Compute Engine/MongoDB approach or just go straight for Google datastore? (MySQL is disregarded here)
Do I use a shared memcache or get a dedicated one?
Those sorts of things. Using Google Cloud appears 'fairly' straightforward, but I would appreciate some pointers from those in the know who have already gotten their hands dirty, in a virtual sense of course!
Many thanks in advance
You appear to have four many-faceted Qs -- and apparently you aren't taking them to Google Groups so let me do my best here.
Do I have a PHP front end if PHP is only in beta? Do I just use Python
for the front end? Is there a better framework than others to use?
For guaranteed solidity use Python or Java - PHP and Go aren't quite as mature yet. Many Python frameworks are fine, from the very lightweight webapp2 that comes with App Engine, through intermediate-weight ones such as "flask", all the way to the rich "django". I'm personally a "frameworks should stay out of my way!" guy, so webapp2 is my own favorite.
Do I create an App Engine API for the front end to call using Python
or Java or something else?
Python and Java are both fully supported and stable. I personally of course prefer Python, but, hey!, that's just me! Endpoints, if that's what you mean by "an App Engine API", is also equally well supported each way, with Python perhaps a tad ahead in integration with the datastore thanks to https://github.com/GoogleCloudPlatform/endpoints-proto-datastore/tree/master/endpoints_proto_datastore .
Which database do I use? Do I go down the Compute Engine/MongoDB
approach or just go straight for Google datastore? (MySQL is
disregarded here)
I think the GAE datastore (with add-ons as needed, e.g. shunting images and videos off to Cloud Storage, or sending structured data that needs search, including geo functionality, to the Search API) is going to serve you fine.
Do I use a shared memcache or get a dedicated one?
Start with the shared (free) variety. Then, once you have it all working, design and run stress/load tests and check how they perform with that vs a dedicated (paid) version. Make data-based decisions -- let the numbers guide you: how much better are you getting by paying $X/month for a dedicated cache? Decide accordingly!-)

Geospatial Database Cloud Server

Are there any cloud hosting solutions for geospatial data? I am currently writing a directory style app where businesses can sign up and then users can find nearby ones.
I am considering Google App Engine for this, but from what I can tell the GeoModel code is quite expensive (up to tens of thousands of dollars a year) to run since Google updated the pricing of App Engine. It doesn't seem like App Engine's database is really suited to this kind of query (though the SQL solution may be an answer).
I was hoping to find a service where I could send an HTTP request to add data (a business's id, name and icon URL) to a database, and then another one to find a list of businesses near a given point. A managed service is preferable, as this is work done for a client and we would like the solution to need as little interaction from us or the client as possible.
EDIT:
I just found cartodb.com which uses PostgreSQL and is reasonably priced. Are there any other alternatives?
The App Engine Search API (currently in Experimental) supports GeoPoints and geosearch, and is great for exactly the kind of query that you describe.
See the Google Developers Academy (GDA) App Engine Search API classes for a bit more info and an example as well.
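To make the "businesses near a given point" query concrete, here is a minimal, service-independent sketch of what such a geosearch computes: great-circle (haversine) distance plus a radius filter. It is not the Search API itself; the `nearby` helper, the record layout, and the sample coordinates are all assumptions for illustration, and a real service would use an index instead of scanning every record.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(businesses, lat, lon, radius_km):
    """Return the businesses within radius_km of (lat, lon), nearest first."""
    hits = []
    for business in businesses:
        distance = haversine_km(lat, lon, business["lat"], business["lon"])
        if distance <= radius_km:
            hits.append((distance, business))
    hits.sort(key=lambda pair: pair[0])  # sort by distance only
    return [business for _, business in hits]


# Hypothetical directory entries (id, name, location).
businesses = [
    {"id": 2, "name": "Cafe B", "lat": 51.5200, "lon": -0.1000},
    {"id": 1, "name": "Shop A", "lat": 51.5080, "lon": -0.1280},
    {"id": 3, "name": "Bistro C", "lat": 48.8566, "lon": 2.3522},
]

close_by = nearby(businesses, 51.5074, -0.1278, radius_km=5)
print([b["name"] for b in close_by])
```

A linear scan like this is fine for a few thousand records; beyond that, a managed geosearch or a spatial index is the more realistic choice.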
http://www.iriscouch.com/ is a cloud-based host for CouchDB and they support the geocouch extensions for CouchDB to store geoJSON data and perform spatial queries.
We have decided to go with cartodb.com because it looks like they have a good price to ease of use ratio.
You mentioned going with CartoDB, which is a good choice with a nice UI.
Just adding: if you were only looking for a scalable backend, you could use StormDB. It is a cloud-hosted SQL database with geospatial extensions. Your data is automatically distributed among multiple nodes for write, read, and parallel query scalability.

Google App Engine and key-value stores

I am looking at various options for developing a web app in the cloud and have been looking at GAE using Python. It has everything I need to develop the application, but I cannot find a key-value store for it, in particular a key-value store that can scale. I have been looking at Redis but it is used for rails....
I have 2 questions regarding this.
1) Is a key-value store really needed for a high performance web app running on GAE?
2) Are there any well supported key-value stores for GAE?
All help will be greatly appreciated!
Google App Engine comes with a high performance, scalable Datastore implementation.
It also has support for a Memcached (key-value) implementation. Which of the two options you choose depends on your application's requirements for scale and functionality.
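The two are often combined rather than chosen between. Below is a minimal sketch of that cache-aside pattern, with plain dictionaries standing in for Memcache and the Datastore; the `CacheAsideStore` class and its counter are hypothetical and are not part of any App Engine API.

```python
class CacheAsideStore:
    """Key-value access that checks a fast cache before a durable store."""

    def __init__(self, cache, datastore):
        self.cache = cache            # fast but volatile (stand-in for Memcache)
        self.datastore = datastore    # durable (stand-in for the Datastore)
        self.datastore_reads = 0      # counts cache misses, for illustration

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from the durable store and remember the result.
        self.datastore_reads += 1
        value = self.datastore.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Write to the durable store first, then drop any stale cached copy.
        self.datastore[key] = value
        self.cache.pop(key, None)


store = CacheAsideStore(cache={}, datastore={"user:1": "Ada"})
first = store.get("user:1")   # miss: served from the datastore
second = store.get("user:1")  # hit: served from the cache
print(first, second, store.datastore_reads)
```

Invalidating on write keeps the sketch simple; whether to invalidate or update the cached value is a design choice that depends on your read/write mix.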

Resources