pros and cons of db and ndb in google app engine - google-app-engine

I have seen a little of this on Stack Overflow, but I am wondering if there is any reason to use the db entity model, and what the specific pros and cons of using one or the other are.
I have read that ndb is a little faster and that it helps with caching. The docs have a good bit of info, but they don't come straight out and say that ndb is better. At least I haven't found that yet.

As far as I can tell, ndb is an evolution of db, kept separate to maintain compatibility.
Have a look at the cheat sheet; it details the main differences:
https://docs.google.com/document/d/1AefylbadN456_Z7BZOpZEXDq8cR8LYu7QgI7bt5V0Iw/mobilebasic
It does not, however, mention other features such as computed properties.
If you are starting a new project, I see no reason not to use ndb, and every reason to use it.
EDIT: Alt link for document: https://docs.google.com/document/d/1AefylbadN456_Z7BZOpZEXDq8cR8LYu7QgI7bt5V0Iw/edit#
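The caching is the headline win. As a rough illustration (pure Python, not the actual GAE API; the class and variable names here are invented), ndb's in-context cache means repeated gets of the same entity only hit the datastore once:

```python
# Illustrative sketch only: mimics ndb's in-context cache with plain
# Python. db.get() would hit the datastore every time; ndb checks a
# per-request cache first.

class FakeDatastore:
    """Stand-in for the real datastore; counts physical reads."""
    def __init__(self):
        self.rows = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

class CachingClient:
    """Mimics ndb's automatic in-context caching (names are invented)."""
    def __init__(self, store):
        self.store = store
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self.store.get(key)
        return self._cache[key]

store = FakeDatastore()
store.rows["Book:1"] = {"title": "NDB"}
client = CachingClient(store)
client.get("Book:1")
client.get("Book:1")   # second call is served from the cache
print(store.reads)     # 1
```

With the real libraries the call sites look almost the same, but only ndb gives you this per-request caching (plus memcache integration) for free.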

Related

Ideal database for a minimalist blog engine

So I'm designing this blog engine and I'm trying to just keep my blog data, without considering comments, a membership system, or any other type of multi-user data.
The blog itself revolves around two types of data. The first is the actual blog post entry, which consists of: title, post body, and metadata (mostly dates and statistics), so it's really simple and can be represented by a plain JSON object. The second type of data is the blog admin configuration and personal information. Comments and the like will be implemented using Disqus.
My main concern here is the ability of such an engine to scale under spikes in visits (I know you might argue this, but let's take it for granted). Since starting this project I've been moving along well with the rest of my stack, except the data layer. I've been having this dilemma choosing the database: I considered MongoDB, but some reviews and benchmarking articles suggested slow reads once collections reach a certain size. Next I looked at Redis and its persistence features, RDB and AOF; while Redis is good at both fast reading and writing, I'm wary of using it because I'm not familiar with it. And the whole search keeps leading to things like "PostgreSQL 9.4 is now faster than MongoDB for storing JSON documents", etc.
So is there any way I can settle this issue for good, considering that I only need to represent my data in a key/value structure, only require fast reads (not writes), and need fault tolerance?
Thank you
If I were you I would start small and not try to optimize for big data just yet. A lot of the blog posts you read about the downsides of a NoSQL solution are about large data sets, or about people trying to do relational things with a database designed for de-normalized data.
My list of databases to consider:
Mongo. It has huge community support and, based on recent funding, it's going to be around for a while. It runs very well on a single instance and in a basic replica set. It's easy to set up and free, so it's worth spending a day or two running your own tests to settle the issue once and for all. Don't trust a blog.
Couchbase. Supports key/value storage and also has persistence to disk. http://www.couchbase.com/couchbase-server/features Also has had some recent funding so hopefully that means stability. =)
CouchDB/PouchDB. You can use PouchDB purely on the client side and it can connect to a server side CouchDB. CouchDB might not have the same momentum as Mongo or Couchbase, but it's an actively supported product and does key/value with persistence to disk.
Riak. http://basho.com/riak/. Another NoSQL that scales and is a key/value store.
You can install and run a proof-of-concept on all of the above products in a few hours. I would recommend this for the following reasons:
A given database might scale and tick all your boxes, yet still be unpleasant to use. Consider picking a database that feels fun! Sort of akin to picking Ruby/Python over Java because the syntax is nicer.
Your use case and domain will be fairly unique. Worth testing various products to see what fits best.
Each database has quirks, and you won't find them until you actually try one. One might have quirks that are passable; another will have quirks that are a show-stopper.
The benefit of trying all of them is that they all support schemaless data, so if you write JSON, you can use all of them! No need to create objects in your code for each database.
If you abstract the database correctly in code, swapping out data stores won't be that painful. In other words, your code will be happier if you make it easy to swap out data stores.
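To make that last point concrete, here is a minimal sketch of such an abstraction in Python (the interface and class names are invented for illustration); each backend you trial just needs its own subclass:

```python
import json

# Hypothetical sketch: code the blog engine against a tiny key/value
# interface so the backend (Mongo, Redis, Couchbase...) can be swapped
# out later without touching the rest of the code.

class BlogStore:
    """Abstract interface the blog engine codes against."""
    def save_post(self, slug, post):
        raise NotImplementedError
    def load_post(self, slug):
        raise NotImplementedError

class MemoryStore(BlogStore):
    """In-memory backend, useful for tests and prototyping."""
    def __init__(self):
        self._data = {}

    def save_post(self, slug, post):
        # posts are plain JSON objects, so serialization is trivial
        self._data[slug] = json.dumps(post)

    def load_post(self, slug):
        raw = self._data.get(slug)
        return json.loads(raw) if raw is not None else None

store = MemoryStore()
store.save_post("hello-world", {"title": "Hello", "body": "First post"})
print(store.load_post("hello-world")["title"])   # Hello
```

Swapping in a Mongo- or Redis-backed implementation then means writing one new class, not rewriting the engine.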
This is only an option for really simple CMSes, but it sounds like that's what you're building.
If your blog is super-simple as you describe and your main concern is very high traffic then the best option might be to avoid a database entirely and have your CMS generate static files instead. By doing this, you eliminate all your database concerns completely.
It's not the best option if you're doing anything dynamic or complex, but in this small use case it might fit the bill.
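A rough sketch of the static-file idea in Python (the template and function names are my own): render the HTML once at publish time, and let the web server serve plain files afterwards:

```python
import os
import tempfile

# Illustrative sketch: each post is rendered to a static HTML file when
# it is published, so serving traffic never touches a database.

TEMPLATE = "<html><head><title>{title}</title></head><body>{body}</body></html>"

def publish(posts, out_dir):
    """Write one HTML file per post into out_dir."""
    for slug, post in posts.items():
        path = os.path.join(out_dir, slug + ".html")
        with open(path, "w") as f:
            f.write(TEMPLATE.format(**post))

out = tempfile.mkdtemp()
publish({"hello": {"title": "Hello", "body": "<p>First post</p>"}}, out)
with open(os.path.join(out, "hello.html")) as f:
    print(f.read())
```

In practice you would use a real templating engine or an off-the-shelf static site generator such as Jekyll or Pelican, but the principle is the same: reads never hit a data store.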

Create new tables in Coldfusion ORM

I'm pretty new to object-relational mapping and, in general, to ColdFusion. I'm developing an application that will use a different table for each user, so is there a way to generate new ones every time a user registers (without using ORMReload() or restarting the whole ColdFusion service)?
Alternatively, since I don't need any complex relationships between my tables, should I use old-fashioned cfquery tags, or do I get better performance by using ORM to read and update my database?
The first step of the "solution" here is for you to read up on how databases work.
Your question - as observed by the comments attached to it - demonstrates that you've got a fundamental gap in your understanding of how to approach this problem; and the gap is sufficiently broad as to not be the sort of thing a Q&A website like this is suited for.
I don't mean this to sound blunt (or unhelpful!), sorry.

Why Objectify instead of JDO?

I am approaching to Gwt + Gae world.
My essential need is to send over the Gwt-Rpc wire my Entity classes, without duplicating them into DTOs.
Objectify promises to do that pretty well.
It claims it will hide all the "JDO complexity".
I have never worked with JPA or JDO.
Where's all the complexity?
I mean, can you provide some simple examples of tasks that are complex in JDO but made trivial by Objectify?
Maybe relationships?
I think JDO/JPA are easy to play with at the "Hello World" level, but that changes as soon as you need something more real, such as composite keys, multiple relationships between entities, and so on. The JDO GAE implementation is quite complex and hard to grasp for beginners, partly due to unsupported features, workarounds, and extensions.
JDO is designed to work "everywhere", which means it is highly abstracted and very general in nature. Great for portability, but it also means it might not be a perfect match for a specific engine like GAE, with its quite unique datastore.
The DataNucleus/JDO/JPA jars are quite big (~2.3 MB in total), while Objectify's jar is pretty small. JDO/JPA might also perform classpath scanning at startup to find and register your entities, which can add to the load time; the time spent is proportional to the number of classes in your project.
As for examples: in terms of sheer amount of code, a JDO/JPA sample may look simpler than a pile of DAO classes for Objectify, but in practice Objectify code will be easier to maintain, because you don't have to walk through a minefield wondering what you might break in JDO :)
One example of JDO's complexity is seeing how many different states an entity can be in. As an example of how this can be overwhelming at first, scroll to the bottom of this page and look at that state diagram. Objectify does not need such a state diagram.
Another tricky part of JDO is all the 'magic' that happens behind the scenes, which can make debugging difficult. Of course it is not actually magic, just bytecode rewriting, but that alone is tricky enough.
Finally, JDO is a generic API. It is designed to work with object stores, SQL databases, and who knows what else. The connection between a certain JDO concept and what will actually be happening in the datastore is sometimes difficult to see. Objectify's API is closely aligned with the datastore, making it easier to understand what is going on.

Accessing the GAE Datastore: Use JDO, JPA or the low-level API?

Any recommendations on how to best access the Google App Engine Datastore? Via JDO, JPA or the native API?
The obvious advantages of JDO/JPA are portability to other database engines, but apart from that, any reason not to work directly with the Datastore API?
I don't know much about JPA, but I went with JDO, and if you're new to it I can say that it has a pretty steep learning curve and a lot of extraneous stuff that doesn't apply to GAE. What you do win is owned relationships, which allow you to have classes with actual references to each other instead of just datastore IDs. There are also some useful things JDO does via annotations, such as the @Element(dependent = "true") annotation, which saves you quite a bit of work: it allows you to delete a parent object and have JDO delete all its children. In general, the GAE docs miss a lot of things that you need to know to use JDO effectively, so I would say that it's crucial to read the DataNucleus docs, and pay particular attention to fetch groups.
You can also find a large collection of succinct examples for JDO and JPA that address almost every conceivable scenario here.
Finally, I would look at Objectify and Twig, two apparently popular alternative frameworks, which were mentioned in a question I asked when I was also trying to make this decision.
On a side note, as for portability to other databases, I think worrying about portability on GAE is a bit misguided. As much as Google wants us to think that GAE code is portable, I think it's a pipe dream. You will end up coding against the particular mix of APIs that Google offers (a mix you probably won't see anywhere else), and coding around GAE's many limitations and idiosyncrasies, so I would forget about portability as a factor in settling on a data-access API. In fact, if I could remake my decision on this matter, I would use a data-access framework built specifically for GAE, such as Objectify.
The low-level Datastore API is not designed to be used directly, but rather to provide an API for other frameworks to interact with the datastore.
This package contains a low-level API to the datastore that is intended primarily for framework authors. Applications authors should consider using either the provided JDO or JPA interfaces to the datastore.
(source)
One such framework is Objectify, a much simpler interface for the datastore than JDO or JPA, and one that's designed with only the datastore in mind.
I guess it is a matter of taste. An ORM solution (JDO/JPA) is usually the more comfortable one. On the other hand, the low-level API allows full flexibility; you are not constrained by any limitations of the ORM. Of course you need to write more code, and you might want to write your own datastore abstraction layer, but this can come in handy if you need to optimize certain things later on.
But of course you can start using JDO/JPA and if you recognize that you need more flexibility you can still refactor certain parts of your code to use the low-level functions. As tempy mentioned, internally references are saved as IDs (same for keys of course).
In general (in the SQL world), a lot of people say that by using the low-level stuff you learn more about your database and therefore get a better feel for optimizations. There are lots of people who use ORMs very inefficiently because they think the ORM does all the work for them, and so they run into performance or maintenance issues.
In the end I think either solution is a proper choice if you are not sure. But you should really check out the docs available and read (blog) articles to learn about the best practices, whether you choose JDO/JPA or low-level.
Philip

Google App Engine - Using managed relations

I am experimenting with App Engine. One of my stumbling blocks has been the support for managed relations, or lack thereof; this is further compounded by the lack of join support.
Without getting into details of the issues I have run into (which I will post under different topic), I would like to ask two things.
1. Has anyone of you used managed relations in something substantial? If so, can you share some best practices that would help?
2. Are there any good, comprehensive examples that you have come across that you can point me at?
Thanks in advance.
I think this answer might disappoint you, but before you develop on the app engine you should read it anyway, and confirm this in the docs.
No. No one on App Engine has used managed relations for anything 'substantial', simply because Bigtable is not built for managed relations. It is a sharded and sorted array, and as such is a very different kind of data structure from what you would normally use.
Now there are attempts to set up managed relationships - the GAE/Java team is pushing JDO features that come close to this, and there's more info on this blog, but this simply isn't the natural state of things on the app engine, and you'll very quickly run into problems if you decide to spend too much time wrapping yourself in a leaky abstraction.
It's a lot easier to actually look at what Bigtable really is (there are a ton of videos on the Google I/O pages for 2010 and 2009 that do a fantastic job of explaining it) and then figure out ways to map your problem to the capabilities of the datastore. It may sound unreasonable, but think about it: GAE is a tool that can do certain things exceedingly well, and if you can frame your problem in terms of ideas like object stores, sets, merge joins, task queues, pre-computation and caching, then you can use this tool to kick ass.
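To make the "map your problem to the datastore" point concrete, here is a tiny Python sketch (plain dicts stand in for entities; all names are invented) of the key-reference pattern that replaces a join:

```python
# Illustrative sketch of the "store keys, not joins" pattern: the child
# entity carries the parent's key, and a second keyed lookup replaces
# what a SQL join would do.

authors = {"author:1": {"name": "Grace"}}
posts = {
    "post:1": {"title": "Hello", "author_key": "author:1"},
    "post:2": {"title": "Again", "author_key": "author:1"},
}

def post_with_author(post_key):
    """Fetch a post, then resolve its author with a second keyed get."""
    post = dict(posts[post_key])
    post["author"] = authors[post["author_key"]]["name"]
    return post

print(post_with_author("post:1")["author"])   # Grace
```

Two cheap keyed gets instead of one join; and if even that is too slow, you denormalize and copy the author's name onto the post at write time.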
