Can I use JDO to serialize entities into byte[]? - google-app-engine

In Google App Engine, I can use JDO to persist Java objects into the data store.
Can I also use JDO to turn the object into a byte[], so that I can put it into memcache or send it over HTTP?
Clarification: I want to serialize classes that I have already annotated for JDO persistence. Having to use a second serialization mechanism seems like needless duplication of effort, and it is also potentially tricky, since JDO/DataNucleus uses bytecode manipulation on its classes to provide features like lazy loading.

JDO persists objects to datastores. As part of that it may perform serialisation when a field is marked as "serialized", but the field is serialised when stored in the datastore and deserialised when retrieved from it. If you want to serialise something, why not just do it yourself ... why should you need JDO for that?
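If the class implements java.io.Serializable, doing the byte[] round-trip yourself is only a few lines of standard Java. A minimal sketch, assuming plain Java serialization works for your class (the Greeting class here is a hypothetical stand-in; a bytecode-enhanced JDO entity may need to be detached first, as discussed below):

```java
import java.io.*;

public class SerializeDemo {
    // Hypothetical stand-in for a JDO-annotated entity; it simply
    // implements Serializable so the standard mechanism can handle it.
    static class Greeting implements Serializable {
        private static final long serialVersionUID = 1L;
        String content;
        Greeting(String content) { this.content = content; }
    }

    // Serialize any Serializable object to a byte[] suitable for
    // memcache or for sending over HTTP.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    // Deserialize the byte[] back into an object.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = toBytes(new Greeting("hello"));
        Greeting back = (Greeting) fromBytes(bytes);
        System.out.println(back.content);  // prints "hello"
    }
}
```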

It appears you can mark JDO objects as detachable and implement Serializable; then, when you want to cache them, you just call detach() and cache as normal. I haven't tried this, but from the discussion groups it seems to work.
There's also some more general discussion of backing the JDO Level 2 cache with memcache. Whether that works is somewhat inconclusive, but it would be a nice solution. More about that: http://groups.google.com/group/google-appengine-java/browse_thread/thread/13cb942ceb97dc/3ab7518edf6a8bc6

Related

Google Cloud Datastore/Objectify: are there any drawbacks of using EmbeddedEntity over direct serialization in Java using @Serialize?

I am trying to store a map object in Java in Google Cloud Datastore. What are the disadvantages of treating it as an EmbeddedEntity compared to directly using @Serialize on the field?
We use an EmbeddedEntity-based approach extensively in a large application and it works just fine; I really can't name any disadvantage.
I would generally avoid Java serialization where possible; it can become really cumbersome if you later try to update the 'schema' of the objects you store.
Embedded entities can be indexed, and based on the index you can query on subproperties. If you exclude an embedded entity from indexing, all of its subproperties are excluded from indexing as well. To know more about embedded entities you can go through this document.
With serializable objects, the blob values are not indexed and hence cannot be used in query filters or sort orders. To know more about serializable objects you can go through this document.
So there is no disadvantage as such to embedded entities, and it is better to use them, as they provide a feature serialization lacks: the option of index-based querying.

manual serialization / deserialization of AppEngine Datastore objects

Is it possible to manually define the logic of the serialization used for AppEngine Datastore?
I am assuming Google is using reflection to do this in a generic way. This works, but it proves quite slow. I'd be willing to write (and maintain) quite some code to speed up the serialization / deserialization of datastore objects (I have large objects, and this consumes a significant share of the time).
The datastore uses Protocol Buffers internally, and there is no way around that, as it is the only way your application can communicate with the datastore.
(The implementation can be found in the SDK/google/appengine/datastore/entity_pb.py)
If you think (de)serialization is too slow in your case, you probably have two choices:
Move to a lower DB API. Next to the two well-documented ext.db and ext.ndb APIs, there is another at google.appengine.datastore. It lacks all the fancy model machinery and provides a simple (and hopefully fast) dictionary-like API. It will keep your datastore layout compatible with the other two DB APIs.
Serialize the object yourself and store it in a dummy entity consisting of just a text field. But you'll probably need to duplicate some data into your base entity, as you cannot filter or sort by data inside your self-serialized text.

Changing 'schema' with Google AppEngine and Objectify

I am exploring web development with Google AppEngine (Java). My application has a very basic data storage requirement that is well suited to AppEngine's 'map' like datastore.
The basic unit is one class whose member variables will be written to or read from the database per transaction (this is because it interacts with an Android app).
I am considering using Objectify for interfacing.
My questions are: what happens if I later change the size (number of variables) of my base class? I know AppEngine isn't typed, but will Objectify cause any problems if some variables are present for some keys and not for others?
This is addressed extensively in the manual:
http://code.google.com/p/objectify-appengine/wiki/IntroductionToObjectify#Migrating_Schemas
The short answer is that you can add and remove fields at will. In addition, there are facilities for more sophisticated transformations of data.
If you decide to move from the Objectify framework to the low-level API later, you won't have a problem. App Engine's datastore is typed, but not with all the Java types. I don't know whether you'd be able to get JDO or JPA to work without reading and re-writing all your data, but I think you probably would.
Objectify 4's method of storing a Map is pretty nice - it stores properties as something like "fieldname-mapkey".

Accessing the GAE Datastore: Use JDO, JPA or the low-level API?

Any recommendations on how to best access the Google App Engine Datastore? Via JDO, JPA or the native API?
The obvious advantages of JDO/JPA are portability to other database engines, but apart from that, any reason not to work directly with the Datastore API?
I don't know much about JPA, but I went with JDO, and if you're new to it I can say that it has a pretty steep learning curve and a lot of extraneous stuff that doesn't apply to GAE. What you do win is owned relationships, which allow you to have classes with actual references to each other instead of just datastore IDs. There are also some useful things JDO does via annotations, such as @Element(dependent = "true"), which saves you quite a bit of work: it lets you delete a parent object, and JDO will delete all its children. In general, the GAE docs miss a lot of things you need to know to use JDO effectively, so I would say it's crucial to read the DataNucleus docs, and pay particular attention to fetch groups.
You can also find a large collection of succinct examples for JDO and JPA that address almost every conceivable scenario here.
Finally, I would look at Objectify and Twig, two apparently popular alternative frameworks, which were mentioned in a question I asked when I was trying to make this same decision.
On a side note, as for portability to other databases, I think worrying about portability on GAE is a bit misguided. As much as Google wants us to think that GAE code is portable, I think it's a pipe dream. You will end up coding to target the particular mix of APIs that Google offers, a mix that you probably won't see anywhere else, and also coding around GAE's many limitations and idiosyncrasies, so I would forget about portability as a factor for settling on a data-access API. In fact, if I could remake my decision on this matter, I think I would use a data-access framework that's built specifically for GAE, such as Objectify.
The low-level Datastore API is not designed to be used directly, but rather to provide an API for other frameworks to interact with the datastore.
This package contains a low-level API to the datastore that is intended primarily for framework authors. Applications authors should consider using either the provided JDO or JPA interfaces to the datastore.
(source)
One such framework is Objectify, a much simpler interface for the datastore than JDO or JPA, and one that's designed with only the datastore in mind.
I guess it is a matter of taste. An ORM solution (JDO/JPA) is usually more comfortable. On the other hand, the low-level API allows full flexibility; you are not constrained by any limitations of the ORM. Of course you need to write more code, and you might want to write your own datastore abstraction layer. But that can come in handy if you need to optimize certain things later on.
But of course you can start with JDO/JPA, and if you find you need more flexibility you can still refactor certain parts of your code to use the low-level functions. As tempy mentioned, internally references are saved as IDs (the same goes for keys, of course).
In general (in the SQL world), a lot of people say that by using the low-level stuff you learn more about your database and therefore get a better feeling for optimizations. There are lots of people who use ORMs but use them very inefficiently because they think the ORM does all the work for them; as a result they run into performance or maintenance issues.
In the end I think either solution is a proper choice if you are not sure. But you should really check out the docs available and read (blog) articles to learn about the best practices, whether you choose JDO/JPA or low-level.
Philip

Google Web Toolkit (GWT) + Google App Engine (GAE) + Detached Data Persistence

I would like to develop a web-app requiring data persistence using GWT and GAE. As I understand it, my only (or at least by far the most convenient) option for data persistence is GAE's Datastore, using JDO or JPA annotated objects. I would also like to be able to send my objects back and forth client-server using GWT Remote Procedure Calls (RPC), therefore my objects must be able to "detach". However, GWT RPC serialization cannot handle detached JDO/JPA objects and it doesn't appear as though it will in the near future.
My question: what is the simplest and most direct solution to this? Being able to share the same objects client/server with server-side persistence would be extremely convenient.
EDIT
I should clarify that I still wish to use GWT RPC with GAE's Datastore. I am just looking for the best solution that would allow all these technologies to work together.
Try using http://gilead.sourceforge.net/
I've recently found Objectify, which is designed as a replacement for JDO. I don't have much experience with it yet, but it's simpler to use than JDO, seems more lightweight, and claims to get around the need for DTOs with GWT, though I haven't tried that particular feature yet.
Ray Cromwell has a temporary hack up. I've tried it, and it works.
It forces you to use transient instead of detachable entities, because GWT can't serialize a hidden Object[] used by DataNucleus. This means the objects you send to the client can't be inserted back into the datastore; you must retrieve the actual datastore object and copy all the persistent fields back into it. Ray's method uses reflection to iterate over the methods, retrieve the getBean() and setBean() methods, and apply the entity's setBean() with your transient GWT object's getBean().
You should strive to use JDO; JPA isn't much more than a wrapper class for now. To use this hack, you must have both getter and setter methods for every persistent field, using proper getBean and setBean syntax for every "bean" field. Well, almost proper, as it assumes all getters start with "get", whereas the default prefix for boolean fields is "is".
I've fixed this issue and posted a comment on Ray's blog, but it's awaiting approval and I'm not sure if he'll post it. Basically, I implemented a @GetterPrefix(prefix=MethodPrefix.IS) annotation in the org.datanucleus package to augment his work.
In case it doesn't get posted, and this is an issue, email x_AT_aiyx_DOT_info Re: @GetterPrefix for JDO and I'll send you the fix.
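For illustration, a rough sketch of what such reflective getter/setter copying looks like, including handling of the "is" prefix for booleans. All names here are hypothetical stand-ins, not Ray's actual code:

```java
import java.lang.reflect.Method;

public class BeanCopyDemo {
    // Hypothetical bean with both a "get" accessor and an "is" accessor.
    public static class Task {
        private String title;
        private boolean done;
        public String getTitle() { return title; }
        public void setTitle(String t) { title = t; }
        public boolean isDone() { return done; }
        public void setDone(boolean d) { done = d; }
    }

    // Copy every bean property from src to dst, accepting both "getX" and
    // "isX" accessors -- the gap the @GetterPrefix fix addresses.
    public static void copyProperties(Object src, Object dst) throws Exception {
        for (Method getter : src.getClass().getMethods()) {
            if (getter.getParameterCount() != 0) continue;
            String name = getter.getName();
            String prop;
            if (name.startsWith("get") && !name.equals("getClass")) {
                prop = name.substring(3);
            } else if (name.startsWith("is")) {
                prop = name.substring(2);
            } else {
                continue;
            }
            try {
                Method setter = dst.getClass().getMethod("set" + prop, getter.getReturnType());
                setter.invoke(dst, getter.invoke(src));
            } catch (NoSuchMethodException e) {
                // No matching setter; skip this property.
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Task detached = new Task();           // e.g. a transient object back from GWT
        detached.setTitle("write answer");
        detached.setDone(true);
        Task persistent = new Task();         // e.g. the freshly fetched datastore object
        copyProperties(detached, persistent);
        System.out.println(persistent.getTitle() + " " + persistent.isDone());
    }
}
```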
A while ago I wrote a post, Using an ORM or plain SQL?
This came up last year in a GWT application I was writing. Lots of translation from EclipseLink to presentation objects in the service implementation. If we were using iBATIS it would've been far simpler to create the appropriate objects with iBATIS and then pass them all the way up and down the stack. Some purists might argue this is Bad™. Maybe so (in theory) but I tell you what: it would've led to simpler code, a simpler stack and more productivity.
which basically matches your observation.
But of course that isn't an option with Google App Engine so you're pretty much stuck having a translation layer between client-side objects and your JPA entities.
JPA entities are quite rigid, so they're not really appropriate for sending back and forth to the client anyway. Typically you want little bits from several entities when doing this (thus ending up with some sort of presentation-layer value object). That is your path forward.
Try this. It is a module for serializing GAE core types and sending them to the GWT client.
You can consider using JSON. GWT has the necessary API to parse and generate JSON strings on the client side, and there are plenty of JSON APIs for the server side. I tried google-gson, which works well: it converts a JSON string to your POJO model and vice versa. Hope this helps you toward a decent solution for your requirement.
Currently, I use the DTO (Data Transfer Object) pattern. Not necessarily as clean, and plenty more boilerplate, but GAE still requires a fair amount of boilerplate at the moment. ;)
I have a Domain Object mapped (usually) one-to-one with a DTO. When a client needs Domain info, a DAO (Data Access Object) coughs up a DTO representation of the Domain Object and sends that across the wire. When a DTO comes back, I hand the DAO the DTO, which then updates all the appropriate Domain Objects.
Not as clean as being able to pass Domain Objects directly across the wire obviously but the limitations of GAE's JDO implementation and GWT's Serialization process means this is the cleanest way for me to handle this currently.
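A minimal sketch of that Domain Object / DTO split, with hypothetical class names (in a real app the domain class would carry the JDO annotations and the DTO would be GWT-serializable):

```java
import java.io.Serializable;

public class DtoDemo {
    // Server-side domain object (hypothetical; in a real app this would
    // carry JDO annotations and be enhanced by DataNucleus).
    static class Customer {
        Long id;
        String name;
        String internalNotes;  // server-only field, never sent to the client
    }

    // Transfer object: only the fields the client actually needs,
    // and nothing the GWT serializer can't handle.
    static class CustomerDTO implements Serializable {
        Long id;
        String name;
    }

    // The DAO "coughs up" a DTO representation of the domain object...
    static CustomerDTO toDto(Customer c) {
        CustomerDTO dto = new CustomerDTO();
        dto.id = c.id;
        dto.name = c.name;
        return dto;
    }

    // ...and applies a returning DTO back onto the freshly loaded domain object.
    static void applyDto(CustomerDTO dto, Customer c) {
        c.name = dto.name;
    }

    public static void main(String[] args) {
        Customer c = new Customer();
        c.id = 1L;
        c.name = "Alice";
        c.internalNotes = "vip";
        CustomerDTO dto = toDto(c);   // send dto across the wire
        dto.name = "Alice B.";        // client edits it
        applyDto(dto, c);             // merge it back server-side
        System.out.println(c.name + " / " + c.internalNotes);
    }
}
```

The boilerplate is the two mapping methods per entity; the payoff is that the wire format is decoupled from the persistence format.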
I believe Google's official answer for this is GWT 2.1 RequestFactory.
Given that you are using GWT and GAE, I'd suggest you stick to the official Google framework... I have a similar GWT / GAE based app and that's what I am doing.
By the way, setting up RequestFactory is a bit of a pain in the ass. The current Eclipse plug-in doesn't include all the jars, but I was able to find the help I needed on Stack Overflow.
I've been using Objectify as well, and I really like it. You still have to do some dancing around with pre/postLoad methods to translate e.g. Text to String and back.
Since GWT ultimately compiles to JavaScript, detached persistence would need one of a few services available. The best known are HTML5 storage and Gears (both use SQLite!). Of course, neither is widely deployed, so you'd have to convince your users to either use a modern browser or install a little-known plugin. Be sure to degrade to a usable subset if the user doesn't comply.
What about directly using the Datastore API to load/store POJO domain objects?
It should be comparable to the DTO approach, meaning e.g. that you have to handle all fields manually (unless you use tricks like reflection-based automation), while giving you more flexibility and full access to all Datastore features.
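As a sketch of that manual per-field handling, with a hypothetical property-map class standing in for the low-level Entity (the real class would be com.google.appengine.api.datastore.Entity):

```java
import java.util.HashMap;
import java.util.Map;

public class ManualMappingDemo {
    // Hypothetical stand-in for the low-level datastore Entity,
    // which is essentially a bag of named properties.
    static class FakeEntity {
        final Map<String, Object> props = new HashMap<>();
        void setProperty(String k, Object v) { props.put(k, v); }
        Object getProperty(String k) { return props.get(k); }
    }

    // Plain POJO domain object (hypothetical).
    static class Book {
        String title;
        long pages;
    }

    // Manual per-field mapping: verbose, but no reflection overhead
    // and full control over how each field is stored.
    static FakeEntity toEntity(Book b) {
        FakeEntity e = new FakeEntity();
        e.setProperty("title", b.title);
        e.setProperty("pages", b.pages);
        return e;
    }

    static Book fromEntity(FakeEntity e) {
        Book b = new Book();
        b.title = (String) e.getProperty("title");
        b.pages = (Long) e.getProperty("pages");
        return b;
    }

    public static void main(String[] args) {
        Book b = new Book();
        b.title = "GAE in Practice";
        b.pages = 320;
        Book back = fromEntity(toEntity(b));
        System.out.println(back.title + ", " + back.pages);
    }
}
```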