Why Objectify instead of JDO? - google-app-engine

I am approaching to Gwt + Gae world.
My essential need is to send over the Gwt-Rpc wire my Entity classes, without duplicating them into DTOs.
Objectify promise to do that pretty well.
It claims it will hide all the "Jdo complexity".
I never worked with Jpa or Jdo technologies.
Where's all the complexity?
I mean, can you provide me some simple examples about complex tasks in JDO, made trivial by Objectify?
Maybe relationships?

I think JDO/JPA are easy to play with on "Hello World" level. But it changes as soon as you need something more real such as composite keys, multiply relationships between entities and etc. JDO GAE implementation is quite complex and hard to grasp for beginners, partly due to unsupported features, workarounds and extensions. JDO is designed to work "everywhere", which means it is highly abstracted and very general in its nature. Great for portability, but it also means that it might not be a perfect match for a specific engine like GAE with its quite unique datastore. The Datanucleus/JDO/JPA jars are quite big (~2.3 mb in total), while Objectify's jar is pretty small. JDO/JPA might perform class path scanning at startup to find and register your entities, which could add to the load time. The time spent would be proportional to the number of classes in your project.
As per example I think in terms of amount of code JDO/JPA sample will appear simpler than lots of DAO classes for Objectify, but in general, maintenance of Objectify code will be easier for an engineer because you don't need to walk through minefield thinking what you can break in JDO :)

One example of JDO's complexity is seeing how many different states an entity can be in. As an example of how this can be overwhelming at first, scroll to the bottom of this page and look at that state diagram. Objectify does not need such a state diagram.
Another tricky part of JDO is all the 'magic' that happens behind the scenes, which sometimes made debugging difficult. Of course this is not actually magic, just bytecode rewriting, but that along is tricky enough.
Finally, JDO is a generic API. It is designed to work with object stores, SQL databases, and who knows what else. The connection between a certain JDO concept and what will actually be happening in the datastore is sometimes difficult to see. Objectify's API is closely aligned with the datastore, making it easier to understand what is going on.

Related

What is a good lightweight ORM for my need using Kotlin?

Scenario :
I am having an application where I am using AWS Lambdas which are written in Kotlin to query data from a relational DB residing in AWS.
--
My problem is that I want to use an ORM for firing these queries. I dont want to use hibernate as it is too heavy and takes too long to setup, and I need a solution that would take up the least time in setting up and firing from the Lambdas. I have looked upon multiple ORMSs like Exposed, Requery, Jooq, Ktorm and Squash.
Is anybody out there having experience with any of these libraries in the serverless context? What are your experiences with them and what would you suggest using in my scenario?
You can have a look at exposed, https://github.com/JetBrains/Exposed
I have been using Squash with the Hikari connection pool for some large projects and I have been very happy with it. I like that is is very extensible and my team has been able to solve any issues that come up, implementing extensions to the dialect and the simplicity of defining TableDefinition classes makes it work well for generating code. It is also very self contained with very few dependencies and light on reflection, so should be good for serverless though I have not personally used it for that.
Squash is less an ORM than an sql abstraction / translation layer that ties into entities and it doesn't try to solve all the problems that something like hibernate does. In my experience ORMs start as simple, efficient, and powerful projects and grow to heavyweight libraries that try to do too much and their complexity begins to cause issues when the developer cannot easily see what's going on in the chain from usage through to the database / storage mechanism.
One negative about squash that deserves mention is that, while it is a jetlbrains official library and created by a kotlin developer, support is limited as orangy, the creator, is quite busy and I have feature pull requests outstanding, with many more of them backed up currently. I chose it because I favored it's simplicity and extensibility among a small but advanced team of developers all capable of improving upon it.
Which ever library you choose I hope these factors assist you in making your decision at the least.

Ideal database for a minimalist blog engine

So I'm designing this blog engine and I'm trying to just keep my blog data without considering comments or membership system or any other type of multi-user data.
The blog itself is surrounded around 2 types of data, the first is the actual blog post entry which consists of: title, post body, meta data (mostly dates and statistics), so it's really simple and can be represented by simple json object. The second type of data is the blog admin configuration and personal information. Comment system and other will be implemented using disqus.
My main concern here is the ability of such engine to scale with spiked visits (I know you might argue this but lets take it for granted). So since I've started this project I'm moving well with the rest of my stack except the data layer. Now I've been having this dilemma choosing the database, I've considered MongoDB but some reviews and articles/benchmarking were suggesting slow reads after collections read certain size. Next I was looking at Redis and using its persistence features RDB and AOF, while Redis is good at both fast reading/writing I'm afraid of using it because I'm not familiar with it. And this whole search keeps going on to things like "PostgreSQL 9.4 is now faster than MongoDB for storing JSON documents" etc.
So is there any way I can settle this issue for good? considering that I only need to represent my data in key,value structure and only require fast reading but not writing and the ability to be fault tolerant.
Thank you
If I were you I would start small and not try to optimize for big data just yet. A lot of blogs you read about the downsides of a NoSQL solution are around large data sets - or people that are trying to do relational things with a database designed for de-normalized data.
My list of databases to consider:
Mongo. It has huge community support and based on recent funding - it's going to be around for a while. It runs very well on a single instance and a basic replica set. It's easy to set up and free, so it's worth spending a day or two running your own tests to settle the issue once and for all. Don't trust a blog.
Couchbase. Supports key/value storage and also has persistence to disk. http://www.couchbase.com/couchbase-server/features Also has had some recent funding so hopefully that means stability. =)
CouchDB/PouchDB. You can use PouchDB purely on the client side and it can connect to a server side CouchDB. CouchDB might not have the same momentum as Mongo or Couchbase, but it's an actively supported product and does key/value with persistence to disk.
Riak. http://basho.com/riak/. Another NoSQL that scales and is a key/value store.
You can install and run a proof-of-concept on all of the above products in a few hours. I would recommend this for the following reasons:
A given database might scale and hit your points, but be unpleasant to use. Consider picking a database that feels fun! Sort of akin to picking Ruby/Python over Java because the syntax is nicer.
Your use case and domain will be fairly unique. Worth testing various products to see what fits best.
Each database has quirks and you won't find those until you actually try one. One might have quirks that are passable, one will have quirks that are a show stopper.
The benefit of trying all of them is that they all support schemaless data, so if you write JSON, you can use all of them! No need to create objects in your code for each database.
If you abstract the database correctly in code, swapping out data stores won't be that painful. In other words, your code will be happier if you make it easy to swap out data stores.
This is only an option for really simple CMSes, but it sounds like that's what you're building.
If your blog is super-simple as you describe and your main concern is very high traffic then the best option might be to avoid a database entirely and have your CMS generate static files instead. By doing this, you eliminate all your database concerns completely.
It's not the best option if you're doing anything dynamic or complex, but in this small use case it might fit the bill.

Accessing the GAE Datastore: Use JDO, JPA or the low-level API?

Any recommendations on how to best access the Google App Engine Datastore? Via JDO, JPA or the native API?
The obvious advantages of JDO/JPA are portability to other database engines, but apart from that, any reason not to work directly with the Datastore API?
I don't know much about JPA, but I went with JDO, and if you're new to it I can say that it has a pretty steep learning curve and a lot of extraneous stuff that doesn't apply to GAE. What you do win is owned relationships, which allow you to have classes with actual references to each other instead of just datastore IDs. There are also some useful things JDO does via annotations, such as the #Element(dependent = "true") annotation, which saves you quite a bit of work as it allows you to delete a parent object and JDO will delete all its children. In general, the GAE docs miss a lot of things that you need to know to use JDO effectively, so I would say that its crucial to read the datanucleus docs, and pay particular attention to fetch groups.
You can also find a large collection of succinct examples for JDO and JPA that address almost every conceivable scenario here.
Finally I would look at the Objectify and Twig, two apparently popular alternative frameworks, which were mentioned in a question I asked when I was also trying to make this decision.
On a side note, as for portability to other databases, I think worrying about portability on GAE is a bit misguided. As much as Google wants us to think that GAE code is portable, I think its a pipe dream. You will end up coding to target the particular mix of APIs that Google offers, a mix that you probably won't see anywhere else, and also coding around GAE's many limitations and idiosyncracies, so I would forget about portability as a factor for settling on a data-access API. In fact, if I could remake my decision on this matter, I think I would use a data-access framework that's built specifically for GAE, such as objectify.
The low-level Datastore API is not designed to be used directly, but rather to provide an API for other frameworks to interact with the datastore.
This package contains a low-level API to the datastore that is intended primarily for framework authors. Applications authors should consider using either the provided JDO or JPA interfaces to the datastore.
(source)
One such framework is Objectify, a much simpler interface for the datastore than JDO or JPA, and one that's designed with only the datastore in mind.
I guess it is a matter of taste. An ORM solution (JDO/JPA) is usually the more comfortable one. On the other hand the low-level API allows for full flexiblity, you are not contrained by any limitations of the ORM. Of course you need to write more code and you might want to write your own datastore abstraction layer. But this might become handy if you need to optimize certain things later on.
But of course you can start using JDO/JPA and if you recognize that you need more flexibility you can still refactor certain parts of your code to use the low-level functions. As tempy mentioned, internally references are saved as IDs (same for keys of course).
In general (in the SQL world) a lot of people say that by using the low-level stuff you learn more about your database and therefore get a better feeling for optimizations. There are lots of people who use ORMs but use it very inefficently because they think the ORM does all the work for them. Therefore they run into performance or maintenance issues.
In the end I think either solution is a proper choice if you are not sure. But you should really check out the docs available and read (blog) articles to learn about the best practices, whether you choose JDO/JPA or low-level.
Philip

Google App Engine - Using managed relations

I am experimenting with App engine. One of my stumbling blocks has been the support for managed relations or lack there off, this is further compounded by the lack of join support.
Without getting into details of the issues I have run into (which I will post under different topic), I would like to ask two things.
1. Has any one of you used managed relations in something substantial. If so if you can share some best practices that will help.
2. Is there any good comprehensive example(s) that you have come across which you can point me at.
Thanks in advance.
I think this answer might disappoint you, but before you develop on the app engine you should read it anyway, and confirm this in the docs.
No. No one on the app engine has used managed relations for anything 'substantial', simply because Bigtable is not built for managed relations. It is a sharded and sorted array, and as such is a very different kind of data structure than what you would normally use.
Now there are attempts to set up managed relationships - the GAE/Java team is pushing JDO features that come close to this, and there's more info on this blog, but this simply isn't the natural state of things on the app engine, and you'll very quickly run into problems if you decide to spend too much time wrapping yourself in a leaky abstraction.
Its a lot easier to actually look at what bigtable really is - there are a ton of videos on the google i/o pages for 2010 and 2009 that do a fantastic job of explaining that, and then figure out ways to map your problem according to the capabilities of the datastore. It may sound unreasonable, but think about it... the GAE is a tool that can do certain things exceedingly well, and if you can figure out your problem in terms of ideas like object stores, sets, merge joins, task queues, pre-computation and caching, then you can use this tool to kick ass.

Data Dictionary or ORM?

I am in the processes of replacing the framework for a fairly complex business web application. Our application runs on a LAMP platform and the new framework will be an extension of CodeIgniter. In my research for framework design I decided to look into ORM, I have never done ORM before and I wanted to know if it would be valuable for our application. Then I stumbled on a very interesting blog post entitled "Why I Do Not Use ORM." This blog seemed to confirm many of my worries about using ORM and it also presented a solution similar to what I was already planning.
By "data dictionary" I plan to use this definition from "The Database Programmer" blog:
The term "data dictionary" is used by many, including myself, to denote a separate set of tables that describes the application tables. The Data Dictionary contains such information as column names, types, and sizes, but also descriptive information such as titles, captions, primary keys, foreign keys, and hints to the user interface about how to display the field.
So in choosing a "data dictionary" over ORM I may be exhibiting confirmation bias, regardless here are my reasons for being weary of ORM:
I have never used ORM before, I don't know much about it.
This framework needs to be built rather quickly, my boss has little time and I need to produce a working application that will allow for a smooth upgrade to a more modern framework.
My boss already thinks that I am over engineering this framework (trust me, I am no where close to that) and is paranoid about the framework preventing us from being able to do things that we need to, and creating bugs that we can't solve in the required amount of time. So far I have done a poor job of convincing him that change is good, I am not a very effective salesman and while the other developers can help me the boss still needs a lot of assurance.
Our old framework is procedural, our code is PHP, and our developers know SQL very well. ORM would be a big change.
Our database has dozens of tables, many with hundreds of thousands of entries running on a fairly old server. In the past we have been burned by code that repeated polls the database in a loop instead of doing one query to pull all of the needed data at once. Avoiding this problem with hand coded SQL is rather straight forward. Ensuring that this always happens where necessary with ORM is a huge unknown to me and appears to be risky.
Regardless, the solution of the data dictionary seems very promising to me as this blog post "Using A Data Dictionary" seems to provide a lot of useful features and some that are requirements of the new framework. Here are my reasons for preferring the data dictionary solution:
Implementing access control rules on the table rows themselves would be invaluable.
Auto-generating database changes, documentation, and schema checking would also be useful.
One requirement of the framework is a generic data history/auditing feature that can be applied to any sub-feature within our application. A data dictionary or an equivalent is essentially required to provide such a feature. The history must have detailed information about the structure and data types within the database.
Our systems hold a wide variety of data types that would more properly addressed if they treated as formal types within the application. For one, HTML fragments (of which we have many in our data, they are required) need to be encoded as entities in some cases, decoded as HTML in others, parsed for links and images in some cases, and always validated for correctness. Then there are dates, measurements, currency, and various other fields that could benefit from having a clear definition in the code of how this data should be manipulated.
The data dictionary idea that I would like to implement would be series of objects in separate PHP files, and there will be plenty of OOP, however it will be used as in a manner very similar to the data dictionary concept presented in "The Database Programmer" blog. It would be the single source definition of the complete database schema for the entire framework.
Now my question is, am I overlooking the value of ORM or is this a case where a data dictionary is the right tool for the job?
I think your question would be more interesting if you were making an initial architectural decision rather than refactoring an existing application. I don't see a single assertion in your question that suggests a problem that designing in an ORM would address; but several it would create. If two major stakeholder groups (owner and other developers are more comfortable with a more conventional design, it seems to me that an ORM would be swimming upstream.
I can imagine the (possibly undeserved) approbation that would be associated with the ORM as soon as a query is slow or transaction locking problems start emerging. Not to mention the impact on the development schedule. Why create an unquantified risk factor?
Do you have a framework which supports building applications using a "Data Dictionary"? If so, give it a try, it might solve your problems. If you haven't, then there are lots of good and working ORM frameworks out there which have large communities, which come with source (so you can fix bugs yourself even if the "vendor" refuses to help you).
If you want to get a quick glance at a nice web based ORM framework, I suggest Django or TurboGears. They are based on Python which will be a nice change after using PHP. I usually prefer TurboGears but Django seems to be more smooth at the moment. Both are easy to set up and you should be able to build a prototype in a day or two. That will give you an idea of the odds.
PS: I also don't think ORM tools are TEH SOLUT10N. I use Hibernate or SQL Alchemy when it makes sense but I often roll my own simple mappers.
I think that you have made a very good analysis for you situation. You know why you choose the Data Dictionary approach. So go for it.
Later on you might reconsider. If so, then there should be not a problem to use the Data Dictionary and a ORM for new developments in parallel. Both technologies are not mutual exclusive.
If you don't like the idea of mixing different technologies: Stick to a solid OOP design and separate concerns between domain logic and data access cleanly, then switching to an ORM shouldn't be that painful or at least possible.
Good luck!

Resources