Google App Engine: Datastore parent key confusion

Google App Engine: Datastore parent key confusion - database

I am currently learning more about the Google App Engine datastore, and I have some doubts regarding my understanding of the concept of defining a parent key. Now, here's the code for defining a parent key from the GAE documentation:
def guestbook_key(guestbook_name="default"):
"""Constructs a Datastore key for a Guestbook entity with guestbook_name."""
return ndb.Key('Guestbook', guestbook_name)
Note: this code is included in the source code of an application which accepts entries from a user and stores it in a datastore and displays them collectively on the homepage.
Now, this is what I understand from this code(please correct me if my understanding of this concept is not what it is supposed to be):
The 'guestbook_key' function defines a parent key, which we have named as 'default', for all the posts that the user submits into the datastore. So basically, all the posts that are submitted by the user are stored in an entity named 'Guestbook', and we define a key for it's parent(which is non-existent) named 'default'.
Please correct me wherever I went wrong with my understanding.

It really depends on how you use this key. Right now, it is just a name. If you put() it, you are putting a Guestbook type with the name "default".
However, if you are using it as a parent, then you may have code that looks like this:
post = Post(parent=guestbook_key())
post.comment = "This is a new post!"
post.put()
In this case, the new Post object will have the Guestbook object with the name "default" as a parent. Then, you could use an ancestor query to get all Posts for a given guestbook.
The reason you might choose to do this (rather than, for example, have a property on each post with the name of the guestbook) is that it guarantees strongly consistent results. This basically allows all requests to see a consistent view of a guestbook. If you didn't have strongly consistent results, you may see cases where a user writes a post but when they look at the guestbook it doesn't appear.
You are correct that you never actually need to create the Guestbook entity. When you define a parent key, it is actually creating a key where the parent key is a prefix of the child key.

Related

Domain driven design database validation in model layer

I'm creating a design for a Twitter application to practice DDD. My domain model looks like this:
The user and tweet are marked blue to indicate them being a aggregate root. Between the user and the tweet I want a bounded context, each will run in their respective microservice (auth and tweet).
To reference which user has created a tweet, but not run into a self-referencing loop, I have created the UserInfo object. The UserInfo object is created via events when a new user is created. It stores only the information the Tweet microservice will need of the user.
When I create a tweet I only provide the userid and relevant fields to the tweet, with that user id I want to be able to retrieve the UserInfo object, via id reference, to use it in the various child objects, such as Mentions and Poster.
The issue I run into is the persistance, at first glance I thought "Just provide the UserInfo object in the tweet constructor and it's done, all the child aggregates have access to it". But it's a bit harder on the Mention class, since the Mention will contain a dynamic username like so: "#anyuser". To validate if anyuser exists as a UserInfo object I need to query the database. However, I don't know who is mentioned before the tweet's content has been parsed, and that logic resides in the domain model itself and is called as a result of using the tweets constructor. Without this logic, no mentions are extracted so nothing can "yet" be validated.
If I cannot validate it before creating the tweet, because I need the extraction logic, and I cannot use the database repository inside the domain model layer, how can I validate the mentions properly?

Whenever an AR needs to reach out of it's own boundary to gather data there's two main solutions:
You pass in a service to the AR's method which allows it to perform the resolution. The service interface is defined in the domain, but most likely implemented in the infrastructure layer.
e.g. someAr.someMethod(args, someServiceImpl)
Note that if the data is required at construction time you may want to introduce a factory that takes a dependency on the service interface, performs the validation and returns an instance of the AR.
e.g.
tweetFactory = new TweetFactory(new SqlUserInfoLookupService(...));
tweet = tweetFactory.create(...);
You resolve the dependencies in the application layer first, then pass the required data. Note that the application layer could take a dependency onto a domain service in order to perform some reverse resolutions first.
e.g.
If the application layer would like to resolve the UserInfo for all mentions, but can't because it doesn't know how to parse mentions within the text it could always rely on a domain service or value object to perform that task first, then resolve the UserInfo dependencies and provide them to the Tweet AR. Be cautious here not to leak too much logic in the application layer though. If the orchestration logic becomes intertwined with business logic you may want to extract such use case processing logic in a domain service.
Finally, note that any data validated outside the boundary of an AR is always considered stale. The #xyz user could currently exist, but not exist anymore (e.g. deactivated) 1ms after the tweet was sent.

How to design a DataBase for well structured Java API Documentation?

I have a scenario to design a database for Java API Documentation, in which I have to present the information about every class and method in a given piece of code. For example, consider:
1. main()
2. {
3. String foo="test";
4. foo.substring(1,2);
5. }
Here, I have to show documentation for class String and method substring from Java docs (The classes/methods can be any valid class/method).
My Observations:
The classes may repeat in various packages, so they can't be unique. Same goes for methods.
The method name foo() can be :
1) The method of this class
2) Overrides the method of some parent class
3) Simply inherits the a method.
With this info, I have following tables:
1)
CREATE TABLE "JAVACLASSDESCRIPTION"
( "CLASSFULLNAME" VARCHAR2(400) NOT NULL ENABLE,
"CLASSNAME" VARCHAR2(400),
"CLASSDEFINATION" CLOB,
"CLASSDECLARATION" CLOB,
"INHERITEDCLASSES" CLOB,
CONSTRAINT "JAVACLASSDESCRIPTION_PK" PRIMARY KEY ("CLASSFULLNAME") ENABLE
) ;
INHERITEDCLASSES is a multi-valued attribute.I know it's a really poor thing, but I have reasons.
1) 1st check if the method is available in JAVAMETHODDESCRIPTION table (Either as the class method itself, or a override method ).
2) If not, it has to be a method which is inherited for some parent class. So we have to show the documentation of method of this parent class.To save multiple searching, the value INHERITEDCLASSES contains is as follows(for some random class):
java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
so that it's parent class is java.lang.Object followed by the list of methods, so that it's easy to match the method name.
2)
CREATE TABLE "JAVAMETHODDESCRIPTION"
( "CLASSFULLNAME" VARCHAR2(400) NOT NULL ENABLE,
"METHODNAME" VARCHAR2(400) NOT NULL ENABLE,
"METHODDECLARATION" VARCHAR2(400),
"METHODDEFINATION" CLOB
) ;ALTER TABLE "JAVAMETHODDESCRIPTION" ADD CONSTRAINT "JAVAMETHODDESCRIPTION_FK" FOREIGN KEY ("CLASSFULLNAME")
REFERENCES "JAVACLASSDESCRIPTION" ("CLASSFULLNAME") ON DELETE CASCADE ENABLE;
Sample output:
I know there are lot of design issues.How can I improve my database design?
EDIT:
About the multi-valued entry, if decomposed to another table may result into redundant entries.For eg. Object class is the super class for all.
Link for documentation page

hard to tell everything because a lot of things is unspecified, but:
a few things to consider: loaded java class is identified by unique full name (x.y.z.Myclass$Inner.class) and it's classloader. if you don't care about loaded class but only the source code (javadoc) then you can probably skip classloader.
javadoc can be at method, class, package and field level (each of them can be uniquely identified by signature)
if you want to support javadoc for inherited methods then you need to model multiple inheritance (javadoc can be inherited also from interfaces) and in your application traverse that tree top on request to display the javadoc. other option is to do the traversing during saving the content to db
another thing is versioning of library and jdk. don't know if you want to support different versions.

The best method for going about something like this, is to get to the 3rd normal form. You may not stay there, but at least by "seeing it" you learn a few things along the way about your system, the relationships, and such.
http://en.wikipedia.org/wiki/Third_normal_form
Once you get to that stage, then start looking at what you have, and ask yourself how it will behave with your specific situation. ie how are you accessing it .. is it going to perform poorly, would it help to denormalize things a little to help improve query performance, etc, etc.

When to maintain reference to key vs. reference to actual entity object after put operation.

When working with datastore entities in App Engine, people have noticed odd behavior after a put operation is performed on an entity if you choose to hold on to a reference of that entity.
For example, see this issue where repeated String properties mutated to _BaseValue after a put was performed.
In the ensuing discussion, in reference to a repeated String property, Guido van Rossum writes:
"I see. We should probably document that you're not supposed to hang
on to the list of items for too long; there are various forms of
undefined behavior around that."
I get the sense from this thread that it's not a good idea to maintain reference to an entity for too long after a put, as unexpected behavior might arise.
However, when I look at the GAE source code for the Model.get_or_insert() method, I see the following code (docstring removed):
#classmethod
def get_or_insert(cls, key_name, **kwds):
def txn():
entity = cls.get_by_key_name(key_name, parent=kwds.get('parent'))
if entity is None:
entity = cls(key_name=key_name, **kwds)
entity.put()
return entity
return run_in_transaction(txn)
Notice how a put operation is performed, and the entity is returned post-put. Just above, we saw an example of when this is not recommended.
Can anyone clarify when and when it is not ok to maintain a reference to an entity, post put? Guido seemed to hint that there are various scenarios when this could be a bad idea. Just curious if anyone has seen documentation on this (I cannot find any).
Thanks!

The problem described in the issue is not regarding entities, but rather lists obtained from its properties. You can hold a copy of entity as long as you like. It's just an object.
The issue is caused by some "magical" functionality provided by ndb. Let's take a look at the model definition
from google.appengine.ext.ndb import model
class MyModel(model.Model):
items = model.StringProperty(repeated=True)
What can we say about items property?
It looks like a class attribute, but metaclass logic of model.Model transforms it into an instance attribute.
What type are these instance attributes?
They can be accessed like a list of strings, but they are more complex objects having the logic required for storing and retrieving the data from datastore, validating etc.
This "magic" works well in most cases, but sometimes it doesn't. One of the problematic cases is when you get the reference to items from the instance and try to use it after put was called. Another case, mentioned by Guido, was to pass external list to initialize items property and then try to modify this property by manipulating the external list.
The thing to remember: Model properties in ndb try to behave like their basic types, but they are more complex objects. You can read more about their internals in Writing property subclasses

JDO, unowned relationship and persisting multiple objects at once

I would like to have an unowned relationship between two persistence capable classes, created at once. The relationship has to be unowned, because my app can’t really guarantee that the two instances I want to persist will stay in the same entity group for good. My relationship is a bidirectional one-to-many:
// in first class
#Persistent
private Set<Key> instancesOfSecondClass;
// in second class
#Persistent
private Key instanceOfFirstClass;
Then, in one servlet doPost() call, I need to persist one instance per these classes. I actually made a nice methods to maintain both sides of the relationship. First, the Key id (I use …
#PrimaryKey
#Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Key id;
… as primary keys everywhere) is added to the Set in first class, then a convenience method is called on the instance of second class to notify of this change and update the one Key property accordingly, comparing old values, null checks and everything. Indeed I don’t yet have any Key when the instance is fresh, so I need to persist them both first with pm.makePersistent() call, then do the assignment and wait for pm.close() to happen later.
But, what really happens, is pretty confusing. After running the servlet (whose sole purpose is to return serialized Keys of these two instances for use elsewhere), I go check the datastore viewer and see:
the instance of first class got persisted (actually have one more issue here throwing at me NullPointerException from org.datanucleus.store.mapped.mapping.PersistenceCapableMapping.postInsert(PersistenceCapableMapping.java:1039) at the very moment I call pm.makePersistent() on the instance with Set<Key>)
the instance of second class got persisted (so far so good)
the instance of second class has the Key reference to the first instance persisted, yay
the set of Keys on the first instance is… wait for it… empty
The local datastore shows just empty space, while the one online shows <null>, even though my constructor for the class (a one with arguments) creates a new instance of HashSet<T> for the relation.
Using Google App Engine for Java 1.6.4, happened in 1.6.3 too.
Spent a whole day trying to solve this, putting transactions in between, using two PersistenceManagers, different order of persisting calls, cross-group transactions enabling, nothing helped.
Generically, I would be happy to find a working way to create and update two instances in two separate entity groups with unowned relationship between them, no matter of the possible inconsistence (doesn’t worry me that much, based on the frequency of possible updates).
In the mean time, I found two more possible solutions, but haven’t tried them yet: 1. create not only two, but four PersistenceManagers (one to create first instance, second to create second instance, third to update one side of relationship, fourth to update the second side of relationship), or 2. detach the instances and make them persistent again after update. Not sure which way to go now.

v2 of the plugin provides "real" unowned relations with none of this Key hackery needed before. You can persist both sides in the same transaction if you have the multiple-Entity-Groups flag set, or you can persist them non-transactional.
Just look at the tests for ideas http://code.google.com/p/datanucleus-appengine/source/browse/trunk/tests/com/google/appengine/datanucleus/jdo/JDOUnownedOneToManyTest.java
I have no control over what goes in their docs, just what goes in their code ;-)

Google AppEngine JDO Persistence FK Arrays

I'm hoping someone's seen this. I've found no clues on Google.
I'm using Google AppEngine with JDO to persist my objects.
I have two objects, Parent and Child. Each Parent has n Child objects.
I initially stored the Child objects in an ArrayList data member in the Parent class.
I got the exception "java.lang.UnsupportedOperationException: FK Arrays not supported" when persisting the Parent object.
I put this down to my storing more than one Child key references, so changed it around so that the Child objects store key references to the Parent object instead. In this way, there is only one key reference per Child object instead of n key references per Parent object.
Yet the exception still gets thrown when persisting the Parent object. So I suspect I was mistaken about the probable cause of this exception.
Has anyone seen this exception or know what it means?

According to DataNucleus a lot of things are persisted by default... and they even had a complaint in their blog about the manual in the google app engine site, which said that you need to explicitly mark fields as #Persistent.

I figured out what was wrong.
It wasn't complaining about my ArrayList.
My Parent class had an array data member that I hadn't put an annotation on. Arrays are persisted by default in the absence of annotations.
I added the annotation #NotPersistent and this solved my problem.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight