I am developing a RESTful web service with GAE. My technology stack is built around Jersey, Spring, and Objectify.
If you don't know, Objectify is ...
“Objectify is a Java data access API specifically designed for the Google App Engine datastore. It occupies a "middle ground"; easier to use and more transparent than JDO or JPA, but significantly more convenient than the Low-Level API. Objectify is designed to make novices immediately productive yet also expose the full power of the GAE datastore.”
https://code.google.com/p/objectify-appengine/
As of now I have used Objectify Keys to store relationships in my models. Like this ...
public class MyModel {
    @Id private Long id;
    private Key<MyOtherModel> myOtherModel;
    ...
Objectify keys provide additional power compared to plain Long IDs, but they can be created from a Long ID and MyOtherModel.class with the static method Key.create(...),
Key.create(MyOtherModel.class, id)
so I don't strictly have to store relationships as Objectify keys at the model level; I just thought it would be more consistent.
The problem is that I need to write a lot of additional code for XML adapters that convert the Objectify keys to Long IDs when I serialize my model objects to JSON, and back again when I deserialize them from JSON into Java objects.
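For a sense of that boilerplate, one such adapter might look roughly like this (a sketch assuming JAXB's XmlAdapter, which Jersey can use for JSON binding; the adapter class name is mine):

import javax.xml.bind.annotation.adapters.XmlAdapter;
import com.googlecode.objectify.Key;

// Hypothetical adapter: exposes a Key<MyOtherModel> as its Long id in JSON.
public class MyOtherModelKeyAdapter
        extends XmlAdapter<Long, Key<MyOtherModel>> {

    @Override
    public Long marshal(Key<MyOtherModel> key) {
        return key == null ? null : key.getId();
    }

    @Override
    public Key<MyOtherModel> unmarshal(Long id) {
        return id == null ? null : Key.create(MyOtherModel.class, id);
    }
}

The field in MyModel would then be annotated with @XmlJavaTypeAdapter(MyOtherModelKeyAdapter.class), and you need one such adapter per referenced kind, which is exactly the repetition being complained about here.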
I was thinking about using Long IDs instead and creating an Objectify Key in the DAO, when I need it. Also, this would remove any Objectify specific code from anything that wasn't a DAO.
I would like some perspective from a more experienced programmer. I have never built a piece of software of this size, several thousand lines of code that is.
Thanks a lot everyone.
I am an inexperienced datastore/Objectify developer too, so I'm just musing here.
I see your point that replacing the Key<> type in MyModel with a Long id would simplify things for you. I would note, though, that the Key<> object can contain a path (as well as a kind and an id). So if your data model becomes more complex and MyOtherModel is no longer a root kind, your ability to generate a Key<> from a Long id breaks down.
If you know that won't happen, or don't mind changing MyModel later, then I guess that isn't a problem.
For your serialization format I would suggest you use a String to hold your key or id. Your Long id can be converted to a string, and would have to be anyway for JSON (so there is no loss in efficiency), but that same string could later be used to hold the full Key too.
You can also store the reference as a long (or Long or String) and have a method getMyOtherModelKey() that returns a key by calling the static factory method. You can likewise have getMyOtherModelID() to just return the ID. This really works both ways, since you can have both methods whether you store a Key or just the ID.
The trick comes in if you use parents in any of your models. If you do, the ID alone is not enough to get the other model; you need the ID plus the IDs of all the parents (and grandparents if needed). This is where Keys are nice.
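A minimal sketch of that idea, assuming MyOtherModel is a root kind and using Objectify's Key.create factory (class and field names are illustrative):

import com.googlecode.objectify.Key;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;

@Entity
class MyOtherModel {
    @Id Long id;
}

@Entity
class MyModel {
    @Id private Long id;

    // Store only the raw ID; no Objectify types leak out of the model.
    private Long myOtherModelId;

    public Long getMyOtherModelId() {
        return myOtherModelId;
    }

    // Build the Key on demand; only valid while MyOtherModel is a root kind.
    public Key<MyOtherModel> getMyOtherModelKey() {
        return Key.create(MyOtherModel.class, myOtherModelId);
    }
}

The model then serializes naturally as a plain Long, and only the DAO (or the accessor above) ever touches Objectify types.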
Related
I'm at the beginning of my first "real" software project, and I'd like to start off right. The concept of DDD seems like a very clean approach that separates the various parts of the software, but I'm having trouble implementing it in practice.
My software is a measurement tracker and essentially stores lists of measurement data, each point consisting of a timestamp and a data value.
My Domain Models
class MeasurementDM {
    string Name { get; set; }
    List<MeasurementPointDM> MeasurementPoints { get; set; }
}
class MeasurementPointDM {
    DateTime Time { get; set; }
    double Value { get; set; }
}
My Persistence Models:
class MeasurementPM {
    string Id { get; set; }            // Primary key
    string Name { get; set; }          // Data from the domain model to store
}
class MeasurementPointPM {
    string Id { get; set; }            // Primary key
    string MeasurementId { get; set; } // Key of the parent measurement
}
I now have the following issues:
1) Because I want to keep my Domain Models pure, I don't want or need the Database Keys inside those classes. This is no problem when building my Domain Models from the Database, but I don't understand how to store them, as the Domain Model no longer knows the Database Id. Should I be including this in the Domain Model anyway? Should I create a Dictionary mapping Domain objects to Database Ids when I retrieve them from the Database?
2) The measurement points essentially have the same Id problem as the measurements themselves. Additionally, I'm not sure what the right way is to store the MeasurementPoints themselves. Above, each MeasurementPointPM knows which MeasurementPM it belongs to. When I query, I simply select MeasurementPoints based on their Measurement key. Is this a valid way to store such data? It seems like this will explode as more and more measurements are added. Would I be better off serializing my list of MeasurementPoints to a string and storing the whole list as an nvarchar? This would make adding and removing data points more difficult, as I'd always need to deserialize and reserialize the whole list.
I'm having difficulty finding a good example of DDD that handles these problems, and hopefully someone out there can help me out.
My software is a measurement tracker and essentially stores lists of measurement data, each point consisting of a timestamp and a data value.
You may want to have a careful think about whether you are describing a service or a database. If your primary use case is storing information that comes from somewhere else, then introducing a domain model into the mix may not make your life any better.
Domain models tend to be interesting when new information interacts with old information. So if all you have are data structures, it's going to be hard to discover a good model (because the critical element -- how the model entities change over time -- is missing).
That said....
I don't understand how to store them, as the Domain Model no longer knows the Database Id.
This isn't your fault. The literature sucks.
The most common answer is that people are allowing their models to be polluted with O/RM concerns. For instance, if you look at the Cargo entity from the Citerus sample application, you'll find these lines hidden at the bottom:
Cargo() {
    // Needed by Hibernate
}

// Auto-generated surrogate key
private Long id;
This is an indirect consequence of the fact that the "repository" pattern provides the illusion of an in-memory collection of objects that maintain their own state, when the reality under the covers is that you are copying values between memory and durable storage.
Which is to say, if you want a clean domain model, then you are going to need a separate in-memory representation for your stored data, and functions to translate back and forth between the two.
Put another way, what you are running into is a violation of the Single Responsibility Principle -- if you are using the same types to model your domain that you use to manage your persistence, the result is going to be a mix of the two concerns.
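As a rough sketch of that separation, in the same Java register as the Cargo example (all of these names are hypothetical, not taken from the sample application):

// Pure domain type: identity is a business identifier, not a database key.
public final class Cargo {
    private final String trackingId;

    public Cargo(String trackingId) {
        this.trackingId = trackingId;
    }

    public String trackingId() {
        return trackingId;
    }
}

// Persistence model: carries the surrogate key the domain never sees.
class CargoRecord {
    Long id;            // auto-generated surrogate key
    String trackingId;
}

// Translation lives at the boundary, typically inside the repository.
class CargoMapper {
    CargoRecord toRecord(Cargo cargo, Long surrogateId) {
        CargoRecord record = new CargoRecord();
        record.id = surrogateId;
        record.trackingId = cargo.trackingId();
        return record;
    }

    Cargo toDomain(CargoRecord record) {
        return new Cargo(record.trackingId);
    }
}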
So essentially you would say that some minimal pollution of the domain model, for example an Id, is standard practice.
Less strong; I would say that it is a common practice. Fundamentally, a lot of people, particularly in the early stages of a project, don't value having a boundary between their domain model and their persistence plumbing.
Could it make sense to have every Domain Model inherit from a base class or implement an interface that forces the creation of a unique Id?
It could. There are a lot of examples on the web where domain entities extend some generic Entity or Aggregate pattern.
The really interesting questions are
What are the immediate costs and benefits of doing that?
What are the deferred costs and benefits of doing that?
In particular, does that make things easier or harder to change?
This question refers to database design using App Engine and Objectify. I want to discuss the pros and cons of the approach of placing all (or, let's say, multiple) entities into a single "table".
Let's say I have a (very simplified) data model of two entities:
class User {
    @Index Long userId;
    String name;
}

class Message {
    @Index Long messageId;
    String message;
    private Ref<User> recipient;
}
At first glance, it makes no sense to put these into the same "table" as they are completely different.
But let's look at what happens when I want to search across all entities. Let's say I want to find and return users and messages, which satisfy some search criteria. In traditional database design I would either do two separate search requests, or else create a separate index "table" during writes where I repeat fields redundantly so that I can later retrieve items in a single search request.
Now let's look at the following design. Assume I would use a single entity, which stores everything. The datastore would then look like this:
Type    | userId | messageId | Name | Message
USER    | 123456 | empty     | Jeff | empty
MESSAGE | empty  | 789012    | Mark | This is text.
See where I want to go? I could now search for a Name and would find all Users AND Messages in a single request. I would even be able to add an index field, something like
@Index List index;
to the "common" entity and would not need to write data twice.
Given the behavior of the datastore that it never returns a record when searching for an indexed field which is empty, and combining this with partial indexes, I could also get the User OR Message by querying fields unique to a given Type.
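For instance, only USER records have userId populated, so a filter on userId can never match a MESSAGE record. A sketch against the Objectify query API (the combined entity class is hypothetical):

// Matches only rows whose userId property exists in the index,
// i.e. only the USER rows of the combined kind.
ofy().load().type(CombinedEntity.class).filter("userId", 123456L).list()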
The cost for storing long (non-normalized) records is not higher than storing individual records, as long as many fields are empty.
I see further advantages:
I could use the same "table" for auditing as well, as every record stored would form a "history" entry (as long as I don't allow updates, in which case I would need to handle this manually).
I can easily add new Types without extending the db schema.
When search results are returned over REST, I can return them in a single List, and the client looks at the Type.
There might be disadvantages as well, for example with caching, but maybe not. I can't see this at this point.
Anybody out there who has tried going down this route, or who can see serious drawbacks to this approach?
This is actually how the Google datastore works under the covers. All of your entities (and everyone else's entities) are stored in a single BigTable that looks roughly like this:
{yourappid}/{key}/{serialized blob of your entity data}
Indexes are stored in three BigTables shared across all applications. I try to explain this in a fair amount of detail in my answer to this question: efficient searching using appengine datastore ancestor paths
So to rephrase your question, is it better to have Google maintain the Kind or to maintain it yourself in your own property?
The short answer is that having Google maintain the Kind makes it harder to query across all Kinds but makes it easier to query within one Kind. Maintaining the pseudo-kind yourself makes it easier to query across all Kinds but makes it harder to query within one Kind.
When Google maintains the Kind as per normal use, you already understand the limitation: there is no way to filter on a property across all different kinds. On the other hand, using a single Kind with your own discriminator means you must add an extra filter() clause every time you query:
ofy().load().type(Anything.class).filter("discriminator", "User").filter("name >", "j")
Sometimes these multiple-filter queries can be satisfied with zigzag merges, but some can't. And even the ones that can be satisfied with zigzag aren't as efficient. In fact, this tickles the specific degenerate case of zigzags: low-cardinality properties like the discriminator.
Your best bet is to pick and choose your shared Kinds carefully. Objectify makes this easy for you with polymorphism: https://code.google.com/p/objectify-appengine/wiki/Entities#Polymorphism
A polymorphic type hierarchy shares a single Kind (the kind of the base @Entity); Objectify manages the discriminator property for you and ensures queries like ofy().load().type(Subclass.class) are converted to the correct filter operation under the covers.
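A minimal sketch of what that looks like, using the annotations described on the Objectify wiki (Animal and Cat are illustrative names):

import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import com.googlecode.objectify.annotation.Subclass;

@Entity
class Animal {
    @Id Long id;
    String name;
}

// Stored in the Animal Kind; Objectify manages the discriminator property.
@Subclass(index = true)
class Cat extends Animal {
    boolean longHair;
}

A query like ofy().load().type(Cat.class) then returns only Cat entities, even though both classes share the Animal Kind.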
I recommend using this feature sparingly.
One SERIOUS drawback to that will be indexes:
every query you do will need a separate index to be servable, and then ALL the writes you do will need to write to ALL of these index tables (for NO reason, in a good number of cases).
I can't think of other drawbacks at the moment, except the limit of a meg per entity (if you have a LOT of types with a LOT of values, you might run into this, as you end up having a gazillion columns).
Not to mention how big your ONE entity model would be, and how convoluted your code to "triage" your entity types could end up being.
Task: implement global, cross-entity-group blob sharing.
I need an ancestor group with either BlobInfo or a string representation of the BlobKey as parent of the BlobReference objects in order to have strong consistency. So I construct a virtual ancestor group with the blob key as parent of the referencing DB object ...
br = BlobReference(id=some_id, parent=ndb.Key("MyBlobKey", str(blob)))
br.put()
This works in the SDK so far, but I am concerned that this is way off the documented paths of App Engine.
My previous attempts failed to convert a blob key to a db key using ndb.Key.from_old_key(blobinfo.key()). It seems there is no legal way to get a "db/ndb" reference to the BlobInfo table (even though the BlobInfo class provides a db.Model-like interface). Am I missing something here?
Seems like your question is asking whether you can create some kind of "virtual ancestor group" by specifying a parent that doesn't exist. This is legitimate; it's mentioned in the docs that the parent doesn't actually need to exist.
https://developers.google.com/appengine/docs/python/datastore/entities#Python_Ancestor_paths
Alternatively, if your list of BlobReferences will be limited, it would probably be easier and less expensive to just store a list of them inside one entity. You can make the Key of that container entity the same as the BlobKey. Then fetching that entity by key and modifying it will let you work without eventual-consistency problems. It'll also be cheaper than querying and modifying indexed entities.
You sound confused by the various uses of the word "key" in different parts of the API. A blob key has nothing in common with an entity key. The good news is that str() of a BlobKey instance is a sane base64-encoded string that should be fine to use as the ID portion of a Key object. And you can go from that ID string to a BlobKey instance using the BlobKey constructor.
I have an entity with 2 properties: UserId (String) and RSSSubscriptions (String). Instances of this class will be stored in the App Engine datastore.
RSSSubscriptions should hold key-value pairs like "Site1: Feed1", "Site2: Feed2".
Since datatypes like HashMap are not persistable, I am forced to keep this data in String format. Currently I have stored it as a string in JSON array format. Say, "[{"Site1: Feed1"}, {"Site2: Feed2"}]".
My client will be an Android app, so I am supposed to parse this string as a JSON array on the client side. But I think it's a bad idea to build a JSON-formatted String and append it to the existing string each time the user adds a new subscription. Any better ideas?
You can use ndb's JsonProperty, which is supported for that particular reason. In my opinion it's a "hairy" solution to store JSON as a string and parse it back and forth; you have to be very careful to guarantee validity.
The correct answer depends on several factors, with the expected number of pairs being the most important. It is important to remember that there are significant costs associated with storing the pairs in entities accessed by query: there are numerous op costs for doing a query, and there will be significant CPU time. Compare this to using a single record keyed by user ID and storing the JSON inside a TextProperty: that is one small op cost, and the CPU time will likely be 10x less than a query.
Please consider these factors when deciding to go with the technically cleaner approach of querying entities. Myself, I would always use a serialized string inside a TextProperty for anything in the "thousands of pairs" volume, unless there was a very high rate of deletions (and even then it is likely the string approach would be better). Using a query is generally the last design choice for GAE, given its high resource costs.
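A sketch of that single-record approach on the Java side (using Objectify here is my own assumption, since the question does not name a persistence API; the class and field names are illustrative):

import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import com.googlecode.objectify.annotation.Unindex;

@Entity
class RssSubscriptions {
    @Id String userId;                  // one record per user, fetched by key
    @Unindex String subscriptionsJson;  // serialized pairs, never queried
}

A fetch by key, e.g. ofy().load().type(RssSubscriptions.class).id(userId).now(), avoids query op costs entirely, and the Android client parses the JSON string.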
I really like CodeIgniter's Active Record and how nicely it handles all the database queries I need.
But I've also been reading about ORMs like Doctrine. When I read Doctrine's documentation, it does not seem as clear to use as Active Record, and I can't see what makes it better (if it is).
What does Doctrine allow that is not possible with Active Record? Does Doctrine make the same job faster, easier, better? Or does it do things Active Record cannot do?
Best would be if people could post examples of tasks showing what we're talking about.
Thanks,
Matthew
First of all, which Doctrine are you talking about, 1 or 2?
There is a huge difference. The only thing the two have in common is that they are both full-fledged ORMs. Otherwise there really isn't any connection between the two.
Doctrine 1 is based on Active Record; Doctrine 2 is based on the Data Mapper pattern.
Both can do the same things, but there are some significant differences between the two.
Generally speaking, Data Mapper is less "developer-friendly" but should have better performance. Why? It is actually pretty simple: with Active Record, each entity knows everything "around" itself, its relations with other entities, etc. With Data Mapper, entities are dumb and lightweight; there is a central component (the EntityManager/UnitOfWork in Doctrine2) which handles all the relation mapping. So in terms of memory usage and performance, Data Mapper should be faster.
The Doctrine guys say that Doctrine2 is at least 50% faster than Doctrine1 (there are other differences too, not just the design pattern).
If you feel up for it, you can even implement ActiveRecords over the Doctrine2 data mapper. Look at this blog post. I'm using this approach just for the development phase, to keep as little code as possible. Once it gets into production, I will kill the additional ActiveRecords layer and roll back to the default data mapper of Doctrine2.
So the conclusion is that you can do everything with both, but in the same way you could say that you can do everything with raw SQL. If you are a beginner in the ORM world, I would suggest going with ActiveRecords, because it is simple and (usually) requires less code. On the other hand, if you are building a large, complex model, I think data mapper is the better option.
Maybe I got something wrong, but this is how I understood it.
As for the comparison between CodeIgniter's ActiveRecords and Doctrine (1 or 2), I can't really say, because I never used CodeIgniter. One thing I am sure of: Doctrine has a lot more features than CodeIgniter's default ORM. For example: result hydration, inheritance (single table, class table), prefetching, lazy loading, extra lazy loading, extensions, behaviours, optimization, proxies, datetime handling... It is a massive and full-fledged ORM with a lot of features, while my experience with any "default framework ORM" is that their main goal is to be as simple as possible, so a newbie can get the hang of it very easily. Doctrine is a mighty beast, and can surely do a lot of things in a more efficient and/or logically more correct way than the built-in CodeIgniter ORM. The downside is that it takes more time to learn and code, and it is a huge library with thousands of files, so just getting everything running adds some overhead compared to a lighter alternative.
Doctrine is a full-fledged ORM that implements the active record pattern. CodeIgniter's active record class is a query builder/database wrapper that is based on a "modified" version of the pattern.
Disclaimer: I have never used Doctrine. I will try my best to illustrate the differences between CodeIgniter's active record implementation and Doctrine, based on my understanding.
Using CodeIgniter's active record class, you might implement a model like this:
class User_model extends CI_Model
{
    public function get_user_by_username($username)
    {
        // Build query using active record methods
        $this->db->where('username', $username);
        $this->db->where('active', 1);

        // Execute query
        $query = $this->db->get('users');

        // Return results
        return $query->result();
    }

    // ...
}
You are basically building the query using the active record methods. It's easy to see how each method (where(), get(), etc) maps to raw SQL. The advantage to using the active record methods as opposed to just $this->db->query() is that CodeIgniter compiles each query based on the database driver you are using. Other than that, CodeIgniter's active record implementation doesn't really do much. Any queries you need, you'll need to create. I hope I've illustrated how the active record methods are similar to a query builder.
Note that the following sample code may be incorrect. Using Doctrine, you might have a model like this:
/** @Entity */
class User
{
    /** @Column(type="integer") */
    private $id;

    /** @Column(length=50) */
    private $username;

    // ...
}
Then to use the model and the associated active record functionality, you would do something like this:
// Instantiate object
$user = new User();
// Set properties
$user->username = 'some_username';
// Save object
$user->save();
// Access properties
echo $user->id;
This is just scratching the surface in terms of what Doctrine can do. You can set default values for properties or specify relationships between tables. Notice how I didn't write any SQL or build the query. I just set the properties of the object and then saved it. Doctrine takes care of the rest.
Note that Doctrine includes its own query builder, so in a way it does what CodeIgniter's active record does, and more.
Using Doctrine is similar to CakePHP's or Ruby on Rails' implementation of the active record pattern. You could take a look there for additional insight into the pattern. CakePHP's examples might be particularly easy to digest if you're coming from a CodeIgniter background.
To answer some of your other questions, I don't think there's anything that makes Doctrine better than the CodeIgniter active record methods. It may be more advanced, but like any other library, you want to pick the best tool for the job. If you are happy with CodeIgniter's active record methods and you see no need for an advanced ORM, then skip it.