Google cloud architecture for new project - google-app-engine

I am working on a project that we are going to put on Google Cloud.
The site will have members, so there are logins and profiles to store. Members will create projects linked to their accounts, and other members can join those projects, etc. It's not overly complex, but I need it to be fast and scalable from the start.
I have a few (simple) questions about the best setup to go for.
Do I have a PHP front end if PHP is only in beta? Do I just use Python for the front end? Is there a better framework than others to use?
Do I create an App Engine API for the front end to call using Python or Java or something else?
Which database do I use? Do I go down the Compute Engine/MongoDB approach or just go straight for Google datastore? (MySQL is disregarded here)
Do I use a shared memcache or get a dedicated one?
That sort of thing. Using Google Cloud appears to be 'fairly' straightforward, but I would appreciate some pointers from those in the know who have already got their hands dirty, in a virtual sense of course!
Many thanks in advance

You appear to have four many-faceted Qs -- and apparently you aren't taking them to Google Groups so let me do my best here.
Do I have a PHP front end if PHP is only in beta? Do I just use Python
for the front end? Is there a better framework than others to use?
For guaranteed solidity use Python or Java - PHP and Go aren't quite as mature yet. Many Python frameworks are fine, from the very lightweight webapp2 that comes with App Engine, through intermediate-weight ones such as "flask", all the way to the rich "django". I'm personally a "frameworks should stay out of my way!" guy, so webapp2 is my own favorite.
Do I create an App Engine API for the front end to call using Python
or Java or something else?
Python and Java are both fully supported and stable. I personally of course prefer Python, but, hey!, that's just me! Endpoints, if that's what you mean by "an App Engine API", is also equally well supported each way, with Python perhaps a tad ahead in integration with the datastore thanks to https://github.com/GoogleCloudPlatform/endpoints-proto-datastore/tree/master/endpoints_proto_datastore .
Which database do I use? Do I go down the Compute Engine/MongoDB
approach or just go straight for Google datastore? (MySQL is
disregarded here)
I think the GAE datastore (with add-ons as needed, e.g. Cloud Storage for images and videos, or the Search API for searchable structured data, including geo functionality) is going to serve you fine.
Do I use a shared memcache or get a dedicated one?
Start with the shared (free) variety. Then, once you have it all working, design and run load tests and compare how the app performs with shared memcache vs a dedicated (paid) one. Make data-based decisions -- let the numbers guide you: how much improvement are you getting by paying $X/month for a dedicated cache? Decide accordingly!-)
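To make "let the numbers guide you" concrete, here is a minimal sketch of how the two load-test runs could be compared. It is only an illustration: the function names, the latency samples and the monthly cost are all hypothetical, and it assumes you have already collected per-request latencies (in milliseconds) from each run.

```python
import statistics


def summarize(latencies_ms):
    """Return the median and 95th-percentile latency for one load-test run."""
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return {"median": statistics.median(ordered), "p95": ordered[rank - 1]}


def compare(shared_ms, dedicated_ms, dedicated_cost_per_month):
    """Compare two runs and relate the p95 improvement to the monthly cost."""
    shared = summarize(shared_ms)
    dedicated = summarize(dedicated_ms)
    p95_gain = shared["p95"] - dedicated["p95"]
    return {
        "shared": shared,
        "dedicated": dedicated,
        "p95_gain_ms": p95_gain,
        # None means the dedicated cache did not improve tail latency at all.
        "cost_per_ms_gained": (
            dedicated_cost_per_month / p95_gain if p95_gain > 0 else None
        ),
    }


# Hypothetical samples from two load-test runs (illustration only).
shared_run = [10, 12, 11, 40, 13, 12, 11, 10, 14, 12]
dedicated_run = [8, 9, 8, 10, 9, 8, 9, 10, 9, 20]
report = compare(shared_run, dedicated_run, dedicated_cost_per_month=50)
print(report)
```

Whether a given cost per millisecond gained is worth it depends on your application; the point is only to put the two runs side by side before paying.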

Related

Google Cloud Machine Learning with Decision Tree

We have a Google App Engine application consisting of several modules, and we store our users' data in Google Cloud Datastore.
Now we are going to run some machine learning algorithms on this data, and we want to use a decision tree algorithm.
We're looking to solve this with one of the methods below:
1. Export the data in the Datastore to a CSV file so we can use tools like Weka.
2. Process the data in the Datastore and run Google Cloud's machine learning techniques. (But when I looked at the Google Cloud ML documentation, I couldn't find anything about running a decision tree on Datastore data.)
Does anyone know whether it is possible to accomplish either of the above in Google Cloud? If it is, can you point me to specific documentation or describe how to do it?
Based on your use case, I would say the best approach for your scenario is to use the new Beta release of Cloud ML Engine for scikit-learn. As you may already know, scikit-learn is a Machine Learning library for Python, and among its wide variety of possibilities, it includes Decision Trees. Note that this is a Beta release and therefore there may still be some rough edges, but I definitely think it should be a good option for you.
Cloud ML Engine is tightly integrated with Google Cloud Storage, which is the required storage option for input and output data, models, etc. That is why, given where your data is stored, I would say that the first option you mentioned, "1. Export the data in the Datastore to CSV file so we can use tools like Weka", is the most suitable one. You will have to export your data to CSV files, upload them to Cloud Storage, and use ML Engine.
Finally, let me share with you some additional documentation pages that may be helpful to start working with ML Engine and scikit-learn:
Cloud ML Engine and scikit-learn quickstart
Using scikit-learn pipelines
scikit-learn documentation page
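As a rough illustration of the export step only, here is a minimal sketch that turns entities into CSV text using the Python standard library. It assumes the entities have already been fetched from the Datastore as plain dicts; the function name and the property names (age, plan, churned) are hypothetical, and uploading the result to Cloud Storage is not shown.

```python
import csv
import io


def entities_to_csv(entities, fieldnames):
    """Serialize dict-like entities to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # drop properties not listed in fieldnames
        lineterminator="\n",
    )
    writer.writeheader()
    for entity in entities:
        # Properties missing from an entity are written as empty cells.
        writer.writerow(entity)
    return buffer.getvalue()


# Hypothetical entities, standing in for the result of a Datastore query.
rows = [
    {"age": 31, "plan": "free", "churned": 0},
    {"age": 45, "churned": 1, "internal_note": "not exported"},
]
csv_text = entities_to_csv(rows, ["age", "plan", "churned"])
print(csv_text)
```

Datastore entities of the same kind do not have to share the same properties, which is why the sketch tolerates both missing and extra fields.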

What shall I use: Google Datastore or Cloud SQL?

I am developing a project in my final year at uni, and it will be an Android application.
Basically, the "company" updates the database with jobs to be done around the country. Its field workers will use the app to display the jobs available in their location. Workers then select the jobs they are committing to do and send the selection back to database.
I would like to use Google App Engine for that and I am just studying it at the moment.
I came across two ways to store data on GAE: Datastore and Cloud SQL.
Personally, I would like to use the NoSQL Datastore in order to experiment with it and learn it.
Which would you suggest for my use case?
What are the pros and cons of using both mentioned methods?
If I go with Google Datastore, is this guide good for me to start with? https://developers.google.com/appengine/docs/java/datastore/
I would say both will work. If you want to discover the Google Datastore then go for it.
But I would suggest you have a look at Objectify; this library makes working with the Datastore much easier.
Go with the Google App Engine Datastore. It's very efficient to use. Yes, that document is enough to start with.

Grails on google app engine

What is the current status of Grails deployment on Google App Engine? I am new to App Engine and wonder whether it is worth exploring. Some specific questions:
1. Does the latest plugin, which has a high user rating, have any restrictions, or does it work seamlessly with all GORM features?
2. Is there any issue with high startup time for Grails applications? How does it behave in real-world scenarios (with typical small and large-scale applications)?
3. What about other Grails plugins (such as Shiro, Joda-Time, Nimble, etc.)? I guess they won't play well, so is using those libraries directly the better option?
4. If I decide to give up Google App Engine as a deployment option, how easy is it to switch to a normal environment? Does the JPA support ensure compatibility with other traditional DBs?
I'm not sure what the other major issues are. Hopefully this is the foundation for a good discussion.
Thanks.
I got a few good responses from the Grails mailing list, and the conclusion matches David's comment. See the thread here.
Couple of relevant responses:
From Tomas Lin:
I would suggest looking into Gaelyk if you really want to build a
project on the App Engine. It is built from the ground up with the App
Engine as the target engine, so it can bypass problems like long
loadtimes due to Spring and Hibernate. The newly introduced plugin
mechanism guarantees that your Gaelyk applications can be extended in
a way guaranteed to work on GAE.
Gaelyk has its own native entity persistence DSL, which is a little
cleaner than the JPA/JDO abstractions on top of the App Engine.
I currently see many HardDeadlineExceeded exceptions with the App
Engine and Grails. It is just not designed to work well with Spring
right now. Hopefully this will improve with the later releases of
Groovy, Grails and the Spring / Google partnership for GAE for
business, but I wouldn't consider Grails on GAE production ready.
Even with Gaelyk, there are reports of slow performance. So imagine
the difficulties that arise with the much bigger Grails stack.
The App Engine comes with its own implementation of a user / security
management system based on GMail accounts. If you just want to provide
an admin / non-admin implementation, this is supported in the
appengine configuration. Cannot comment on Shiro.
Be aware that one of the major restrictions of the App Engine is the
inability to write a file, so even basic file uploading in Spring
becomes problematic since the default mechanism writes to a temporary
file. I would imagine that most of the plugins would not work out of
the box without digging into their code and changing it.
I think the biggest issue here is lack of support for native JDBC. JPA
is not as well supported as plain JDBC GORM, things like named queries
would probably not work out of the box without retrofitting. If you
want to use the latest and greatest parts of Grails, it might be
worthwhile to consider other hosting solutions.
From Aaron Eischeid
1. The GAE plugin and the JPA-GORM plugin combined do not get you all GORM features seamlessly. You should get basics like .save(), .delete(), and maybe .list(), but the dynamic finders etc. are going to be out (at least for now). I could be way off here, but I think most or all Hibernate-dependent features are out or replaced by something else (since Hibernate relies on SQL under the hood and GAE doesn't currently have an SQL-based DB), so, for example, criteria builders are a no-go. It is unclear to me how much dot drilling you can do on objects. For example, I'm not sure whether you could do something like:
def b = new Book()
def stores = b.authors.publishers.bookstores
One place I could use some pointers is how to use JPA in the domain classes. I am sure there is good info out there, but I just haven't found it yet.
2. Unsure.
3. Grails plugins that include domain classes or manipulate your current domain classes are bound to have issues, since you have to construct your domain classes differently to play nice with JPA, which is necessary because Google's Datastore isn't quite like a relational DB. On the flip side, you can use Google's built-in security, so you shouldn't necessarily need plugins like Acegi or Shiro.
4. This probably boils down to the different levels of GORM that you can use in controllers and services, and the different ways you define domain classes. Some refactoring seems inevitable unless JPA plays just as nicely with SQL DBs as it does with Google's Datastore. If JPA can move like that, then transferring should be easy, but by using JPA-GORM you give up some features you would probably want if you weren't on GAE.
Eager to hear what others have to say,
Aaron

How difficult is it to migrate away from Google App Engine?

I am thinking of making an (initially) small Web Application, which would eventually have a potential to grow. All things considered Google App Engine seems like a very attractive option. Say, user base and complexity grows and for one or other reason I needed to leave GAE behind. How difficult would it be to migrate away?
1) Does GAE provide a way to export the database? What format would it be? Would it be difficult to put it under MySQL (or similar)?
2) In which areas (ex. database access, others?) would I have to use GAE API? I.e. which parts of implementation would have to be abstracted away / interfaced?
Edit: 3) Alternatively, is it even worth abstracting away the GAE API?
For question #1: I don't know if GAE specifically supports exports of a database but you can always roll your own, worst case scenario. If you are in a position where you need to, you'll probably have the resources to do it, too.
For question #2: You can and should always encapsulate those kinds of outside dependencies anyway. It doesn't matter whether or not they provide interfaces. Coupling to those interfaces should be kept to an absolute minimum.
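A minimal sketch of what that encapsulation could look like, written in Python purely for illustration. All names here (ProjectStore, InMemoryProjectStore, rename_project) are hypothetical. The idea is that application code depends only on a small interface, and a GAE-backed implementation would be one class behind it that could later be swapped for, say, a MySQL-backed one.

```python
from abc import ABC, abstractmethod


class ProjectStore(ABC):
    """The only persistence interface the rest of the app is allowed to see."""

    @abstractmethod
    def save(self, project_id, data):
        """Store the project's data under project_id."""

    @abstractmethod
    def load(self, project_id):
        """Return the project's data as a dict, or None if it does not exist."""


class InMemoryProjectStore(ProjectStore):
    """Stand-in implementation; a datastore-backed class would mirror this."""

    def __init__(self):
        self._rows = {}

    def save(self, project_id, data):
        self._rows[project_id] = dict(data)

    def load(self, project_id):
        row = self._rows.get(project_id)
        return dict(row) if row is not None else None


def rename_project(store, project_id, new_name):
    """Application logic: knows about ProjectStore, not about any vendor API."""
    project = store.load(project_id)
    if project is None:
        raise KeyError(project_id)
    project["name"] = new_name
    store.save(project_id, project)
    return project


store = InMemoryProjectStore()
store.save("p1", {"name": "Draft", "owner": "alice"})
print(rename_project(store, "p1", "Launch plan"))
```

With this shape, migrating away means writing one new ProjectStore implementation rather than touching every caller.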
For question #3: This question is not really super-clear so I cannot answer it.
I'm speaking strictly from a Java webapp point of view...
Google App Engine for Python has a backup/restore utility:
http://code.google.com/appengine/articles/gae_backup_and_restore.html
There is huge interest in porting this to the Java flavor.
You can use the higher-level standard database APIs (JDO/JPA) to make it possible to move your app away from Google's database services. I suggest purchasing the DataNucleus tools in order to smooth the transition from Bigtable to something like MySQL or Oracle.
The packaged services GAE provides are enumerated at
http://code.google.com/appengine/docs/java/javadoc/
The stock JRE should handle porting of the urlfetch, mail, and memcache api packages.
You'll have to find a substitute technology for the users, blobstore, xmpp, and taskqueue packages.

Web Scraping with Google App Engine

I am trying to scrape a website and republish the data as an RSS feed. How hard is this to set up with Google App Engine? What are the advantages and disadvantages of using GAE? Any recommendations and guidelines are greatly appreciated!
Google AppEngine offers much more functionality (and complexity) than you will need if truly all you will want to do is republish some structured data as RSS.
Personally, I would use something like Yahoo Pipes for a task like this.
That being said... if you want/need to get your feet wet with GAE, go for it!
Working with Google App Engine is pretty straightforward. I would recommend going through the Getting Started guide. It's short and simple and touches on the essential GAE topics. There are more pros and cons than I will list here.
Pros:
In general, App Engine is designed for high-traffic web applications that need to scale. Furthermore, it is designed from a programmer's perspective: many of the scalability issues (database optimization, server administration, etc.) are dealt with by Google. I find it to be a nice platform. It is still being actively developed by Google engineers, and scheduling of tasks (a long-requested feature) is on the current road map.
Cons:
Perhaps the biggest downsides right now are, again, the lack of official scheduling support and the quota limits currently set for free accounts. However, you can't complain much if it's free. Currently it only supports Python as a programming language (although a new language [Java, I predict] is coming soon). Furthermore, Python 2.6 (and 3.0 for that matter) are not yet supported. In addition, Django 1.0 is not officially supported in App Engine (although you can package Django 1.0 with your application).
Harder than it would be in most other technologies.
GAE can sort of do scheduled batch stuff like this now, but it's really not intended for that type of thing. Pick pretty much any other language and platform for this particular task, and you'll make your life a lot easier.
I think BeautifulSoup could run on GAE, so all your scraping needs are handled :D
Also, GAE has a URL fetch service (urlfetch). The only problem I think you might have is not having enough time to get the data (there is a 30-second limit).
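For the republishing half of the task, here is a minimal sketch of building an RSS 2.0 document with the Python standard library. It assumes the scraping step has already produced a list of items with a title and a link; the function name, the feed metadata and the example.com URLs are placeholders, and fetching and parsing the source pages are not shown.

```python
import xml.etree.ElementTree as ET


def build_rss(title, link, description, items):
    """Build a minimal RSS 2.0 document from already-scraped items."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        node = ET.SubElement(channel, "item")
        # ElementTree escapes characters such as & and < on serialization.
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")


# Placeholder items, standing in for whatever the scraper extracted.
scraped = [
    {"title": "A & B", "link": "http://example.com/1"},
    {"title": "Second", "link": "http://example.com/2"},
]
feed = build_rss(
    "Scraped headlines", "http://example.com/", "Republished items", scraped
)
print(feed)
```

A request handler would then return this string with an XML content type; caching the generated feed keeps each request well under the time limit mentioned above.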
I am working on a similar project, and I've decided that it's easier to prepare the data on another server and push it to GAE.
You might also want to look into Yahoo! Query Language (YQL)
