Is there a detailed analysis of Google App Engine's datastore architecture somewhere? What I am looking for is a resource that can help me:
Understand why a particular restriction exists in the datastore (e.g., transactions requiring the same entity group)
Build a deeper understanding of the storage mechanics, to help me mentally visualize the efficiency of a particular data model.
The GAE documentation has some good individual articles, but I am looking for a more detailed treatment, a book perhaps.
You could start with this presentation from Google I/O:
App Engine Datastore Under The Covers
The presentation on Appstats also gives some insight into how datastore (and RPC calls in general) work, as well as giving some optimization tips.
For those who integrated drools on GAE, can you please give some feedback on
the memory consumption (does it work with an F1 instance)?
startup time (initialization of the KnowledgeBase)?
do you serialize your KnowledgeBase objects to the datastore?
do they fit in 1MB blobs?
And more generally, I'd like to know if it's a good idea to use Drools on GAE.
Drools seems to run on GAE with some modifications as described here: https://code.google.com/p/red-piranha/wiki/ModifyDroolsRunInGoogleAppEngine#Modifications_to_allow_Drools_to_run_in_Google_App_Engine
For the statefulSession persistence, JPA seems to do the job, but http://blog.athico.com/2013/05/creating-your-own-drools-and-jbpm.html explains the structure in case I ever need to adapt it (optimize for reading).
Finally, KnowledgeBase is serializable and can be saved as a blob as a last resort.
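To get a feel for the "does it fit in a 1MB blob?" question, here is a minimal sketch of the size check and a chunking fallback. It is in Python with `pickle` purely as a stand-in for Java serialization of a KnowledgeBase; the `knowledge_base` object, the helper names, and the idea of storing each chunk as its own entity are all assumptions for illustration, not anything Drools or App Engine provides.

```python
import pickle

# Assumption: 1 MB is the blob size limit mentioned in the question.
MAX_BLOB_BYTES = 1024 * 1024


def split_into_blobs(payload, limit=MAX_BLOB_BYTES):
    """Split serialized bytes into chunks no larger than `limit`."""
    return [payload[i:i + limit] for i in range(0, len(payload), limit)]


def join_blobs(chunks):
    """Reassemble chunks produced by split_into_blobs, in order."""
    return b"".join(chunks)


# Hypothetical stand-in for a compiled rule base.
knowledge_base = {"rules": ["rule-%d" % i for i in range(200000)]}

payload = pickle.dumps(knowledge_base, protocol=pickle.HIGHEST_PROTOCOL)
fits_in_one_blob = len(payload) <= MAX_BLOB_BYTES

# If it does not fit, each chunk could be stored separately and
# concatenated again before deserializing.
chunks = split_into_blobs(payload)
restored = pickle.loads(join_blobs(chunks))
```

Measuring the serialized size of your real rule base this way (in Java, by writing it to a byte array) answers the question directly, and the chunk count tells you how many reads a cold start would need.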
I have a traditional RDBMS-based PHP app that I need to convert over to GAE, and I would like to properly learn how BigTable works before doing this. However, I'd kinda like to do it through sample problems or examples that show the best way to think about and utilize a non-RDBMS platform such as BigTable...
It seems this would be a better route than jumping in with both feet and screwing things up with the one-to-one conversion that would likely result.
Anyone able to recommend a good starting path that perhaps helped you or something of this nature that will properly initiate someone with App Engine and BigTable?
A good way is to read the source code of good projects running on GAE, like jaikuengine and rietveld.
For articles, the Google I/O 2009 and 2010 talks and the GAE articles are good resources.
You can also read about column-oriented databases on Wikipedia and look at other projects like Cassandra...
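One mental model that may help when coming from an RDBMS: Bigtable is essentially a sorted map from row key to columns, so the cheap operations are lookups by key and scans over a contiguous key range. The toy class below is only a sketch of that idea in plain Python; `SortedTable`, the key layout, and the sample data are all made up for illustration and are not App Engine or Bigtable APIs.

```python
import bisect


class SortedTable:
    """Toy model of a Bigtable-style table: rows kept sorted by key."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {column name: value}

    def put(self, key, columns):
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].update(columns)

    def get(self, key):
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        """Return (key, row) pairs whose key starts with `prefix`, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            result.append((key, self._rows[key]))
        return result


table = SortedTable()
table.put("user:alice:post:002", {"title": "Second post"})
table.put("user:bob:post:001", {"title": "Hello"})
table.put("user:alice:post:001", {"title": "First post"})
table.put("user:alice:profile", {"name": "Alice"})

# "All of Alice's posts" is one contiguous range scan, not a join.
alice_posts = [key for key, _ in table.scan_prefix("user:alice:post:")]
```

The design point is that you choose keys (and, on App Engine, indexes) so that each query you need becomes one such scan, instead of normalizing first and joining later.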
I would recommend having a play with the App Engine Cookbook to see how things work. It has some really good examples and has helped me a lot when trying to understand the datastore.
http://appengine-cookbook.appspot.com/cat/?id=ahJhcHBlbmdpbmUtY29va2Jvb2tyFwsSCENhdGVnb3J5IglEYXRhc3RvcmUM
It's been noted that Google App Engine is moving its datastore implementation from BigTable to MegaStore. What's the difference between the two?
As this article explains, "Megastore is a transactional indexed record manager built by Google on top of BigTable".
What Megastore adds on top of BigTable, again according to the URL I gave (of course I cannot discuss anything that Google hasn't yet made public!), is stuff that might not be easy to see from an App Engine app's viewpoint, depending on what App Engine may have already added on its own on top of BigTable. E.g., Megastore adds entity groups for transactional behavior... but App Engine has had those for a while. Do you really care how App Engine internally implements, or will implement in the future, identical APIs...?
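To make the entity group idea concrete: as I understand the public Megastore material, each entity group has its own transaction log, which is why a transaction is confined to one group. Here is a toy Python sketch of that rule, where a key is a path of (kind, id) pairs and the group is identified by the root of the path. The function names and the key representation are assumptions for illustration, not the App Engine API.

```python
def entity_group(key_path):
    """Return the root of a key path; entities sharing a root share a group.

    A key path is modelled here as a tuple of (kind, id) pairs, root first.
    """
    return key_path[0]


def check_single_group(key_paths):
    """Raise ValueError unless all keys belong to one entity group."""
    groups = {entity_group(path) for path in key_paths}
    if len(groups) != 1:
        raise ValueError("transaction spans %d entity groups" % len(groups))
    return groups.pop()


alice = (("User", "alice"),)
alice_post = (("User", "alice"), ("Post", 1))
bob = (("User", "bob"),)

# Both keys are rooted at User "alice", so they share a group.
same_group = check_single_group([alice, alice_post])

# Keys with different roots are rejected by the check.
try:
    check_single_group([alice, bob])
    cross_group_allowed = True
except ValueError:
    cross_group_allowed = False
```

From the app's viewpoint this is the practical consequence: entities that must change together atomically need a common ancestor in their keys.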
Megastore supports schemas... but who's to know whether they'll be made available to App Engine apps (so that puts with the wrong combination of types raise exceptions instead of silently succeeding). So far, App Engine apps have always been schemaless, except for whatever you yourself, or Google's open-source app-level code, implemented at the application level.
Now more details of Megastore have emerged, including James Hamilton's summary which links to the CIDR 2011 paper.
the existing answers have described the differences between bigtable and megastore pretty well. i'll just add one thing: app engine isn't moving from bigtable to megastore. it's been on megastore since the beginning. (ok, well, at least since very very early in development, years before it launched publicly.)
for example, see this sept 2009 app engine blog post about megastore replication.
Any suggestions on where to find examples, tutorials, and more thorough documentation on how to use the Google App Engine Datastore "Low Level Api" for Java?
I know this basic documentation page exists, but it just tells me what's in the API and doesn't say much about how to actually use it:
http://code.google.com/appengine/docs/java/javadoc/com/google/appengine/api/datastore/package-summary.html
Thanks!
Chris
Two nice articles I found:
Introduction to working with App Engine’s low-level datastore API and
Issuing App Engine datastore queries with the Low-Level API
I don't know if these articles fit here, but they cover the datastore in a very technical way.
There are many pages at: http://code.google.com/appengine/docs/java/datastore/
For example the following pages are all low-level API:
Entities,
Queries,
Transactions,
Metadata Queries,
Statistics,
Async API
Slim3 is supposed to be a thin wrapper around the low level datastore.
Since this is open source, you could either study it for the code or as a wrapper.
I am trying to scrape a website and republish the data as an RSS feed. How hard is this to set up with Google App Engine? What are the advantages and disadvantages of using GAE? Any recommendations and guidelines greatly appreciated!
Google App Engine offers much more functionality (and complexity) than you need if all you truly want to do is republish some structured data as RSS.
Personally, I would use something like Yahoo Pipes for a task like this.
That being said... if you want/need to get your feet wet with GAE, go for it!
Working with Google App Engine is pretty straightforward. I would recommend going through the Getting Started guide. It's short and simple and touches on essential GAE topics. There are more pros and cons than I will list here.
Pros:
In general, App Engine is designed for high-traffic web applications that need to scale. Furthermore, it is designed from a programmer's perspective. Many of the scalability issues (database optimization, server administration, etc.) are dealt with by Google. Having said that, I find it to be a nice platform. It is still being actively developed by Google engineers, and scheduling of tasks (a feature that has long been requested) is on the current road map.
Cons:
Perhaps the biggest downside right now is, again, the lack of official scheduling support and the quota limits currently set for free accounts. However, you can't complain much if it's free. Currently it only supports Python as a programming language (although a new language [Java, I predict] is coming soon). Furthermore, Python 2.6 (and 3.0 for that matter) is not yet supported. In addition, Django 1.0 is not officially supported in App Engine (although you can package Django 1.0 with your application).
Harder than it would be in most other technologies.
GAE can sort of do scheduled batch stuff like this now, but it's really not intended for that type of thing. Pick pretty much any other language and platform for this particular task, and you'll make your life a lot easier.
I think BeautifulSoup could run on GAE, so all your scraping needs are handled :D
Also, GAE has a URL Fetch service (urlfetch). The only problem I think you might have is not having enough time to get the data (30-second limit).
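For a rough idea of the scrape-and-republish step itself, here is a minimal sketch using only the Python standard library. It uses `html.parser` as a stand-in for BeautifulSoup and a hard-coded `SAMPLE_HTML` string instead of a real fetch; the class name, function name, and sample page are assumptions for illustration, and this is written for a current Python rather than for the App Engine runtime.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from <a> tags that have an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_rss(title, link, description, items):
    """Build an RSS 2.0 document with one <item> per (href, text) pair."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for href, text in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")


# Hypothetical page content; a real app would get this from an HTTP fetch.
SAMPLE_HTML = """
<ul>
  <li><a href="http://example.com/a">First story</a></li>
  <li><a href="http://example.com/b">Second <b>story</b></a></li>
</ul>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
feed_xml = build_rss(
    "Scraped feed",
    "http://example.com/",
    "Links scraped from a sample page",
    collector.links,
)
```

Given the time limit mentioned above, it probably makes sense to do the fetch and parse in a background step, store `feed_xml`, and have the request handler only serve the stored feed.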
I am working on a similar project and I've decided that it's easier to prepare the data on another server and push it to GAE.
You might also want to look into Yahoo! Query Language (YQL)