AppEngine and JDO: fetch objects in unowned relationship - google-app-engine

Unowned one-to-many and many-to-many relationships are defined using Sets or Lists of Keys. Let's say I have an article object containing a set of keys of the article's labels. So how do I fetch the labels themselves, the objects matching the keys? How do I iterate over them? Naturally I could iterate over the keys and fetch the objects separately, but is this the only way? I tried to find the answer everywhere but I wasn't successful. The documentation describes defining unowned relationships in detail, but is silent about making queries on them...

You mean you have what Google was originally calling an "unowned relation" (where you have a Set<Key>)? Sadly that is not a relation at all IMHO, since there is no related object involved. With v2.x of their plugin you can have a real unowned relation (the example there is for 1-1, but you can equally have Set<B>) where everything looks as it should. I'd strongly advise you to use that (the keys are stored in the parent, so a single call is made to get the Set of related objects).

Related

When is it okay to have a relationship loop in my database?

Possible duplicate of Can I avoid a relation loop in my database design?, but I'd like to get a broader answer than for that specific design.
The goal in this case is to store automated testing data as it’s generated. A portion of the relationship diagram is shown below.
A variable number of tests may be run on each build, hence the direct one-to-many relationship between Builds and Sessions.
Each build is made of several hundred parts, and each part number may be used on several hundred builds, hence the many-to-many relationship between Builds and DT_Parts, associated through LT_HeaderParts.
If an assembly error is found during testing, a part or parts may be switched out and the unit retested. Instead of duplicating hundreds of part records on each retest, I implement PartsChangeLog to document any changes made after a given session.
PartsChangeLog uses DT_Parts as a dictionary to save memory by storing integers instead of the varchar(20) part_number.
LT_HeaderParts and PartsChangeLog both appear to have valid, non-redundant reasons for using DT_Parts, yet this setup creates a reference loop and poses the danger of creating a false many-to-many bridge from build_id to session_id that would yield incorrect relationships.
Is this an okay structure? Why or why not?
Trying to answer the actual title question "When is it okay to have a relationship loop in my database?".
One part of the answer is that it depends on the intended usage of the schema/diagram per se. Is it intended as a conceptual model, with the purpose of illustrating business concepts? Then basically you can highlight any relationship you like. By which I mean you can highlight anything in the form of a relationship if you think that relationship is of interest to the intended business audience. Or is it intended as a logical db schema?
In that case it mostly depends on the precise "semantics" of the relationships. If two relationships are saying things that are semantically distinct, then you can bet your ... that both will be relevant to the business being modeled and that you should be keeping both.
The simplest example of such a loop is a bill-of-materials structure. Such structures have a single "parts" entity, with a many-to-many relationship of "containment". This "containment" relationship gets instantiated as a "containment" entity with two relationships to the "parts" entity. Each of these two relationships has different semantics (one saying "the containing part must be a known part" and the other saying "the contained part must be a known part") and so they should definitely be kept both.
What you have is two sets of parts associated with a session: build parts (session -> build ->> part) and changed parts (session ->> partschangelog -> part). As the answer to the question linked by JJ32 explains, consistency is the main concern in these situations. In this case, I suspect the set of changed parts should be a subset of the build parts, but your schema doesn't enforce this.
One way of enforcing it is via controlled redundancy. If you include build_id in PartsChangeLog as a non-prime attribute (and modify the foreign key reference to Sessions accordingly), you can create two composite foreign key constraints referencing LT_HeaderParts: one on (build_id, part_added) and one on (build_id, part_removed).
This eliminates the possibility of associating inconsistent session_id and build_id via the many-to-many bridge; though if no parts were changed, there won't be such a bridge. That's understandable, our goal is not to replace the direct mapping between session_id and build_id, only to ensure consistency. The rest is up to the query developer.
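The controlled-redundancy idea above can be sketched with SQLite from Python. This is only an illustration under assumptions: the table names come from the question, but the column names part_id and change_id, the UNIQUE (session_id, build_id) constraint on Sessions, and the helper functions open_db and log_change are made up for the example and are not the asker's actual schema.

```python
import sqlite3

# Hypothetical schema: table names follow the question, column details are assumed.
SCHEMA = """
CREATE TABLE Builds (build_id INTEGER PRIMARY KEY);
CREATE TABLE DT_Parts (
    part_id INTEGER PRIMARY KEY,
    part_number VARCHAR(20) UNIQUE
);
CREATE TABLE LT_HeaderParts (
    build_id INTEGER NOT NULL REFERENCES Builds(build_id),
    part_id INTEGER NOT NULL REFERENCES DT_Parts(part_id),
    PRIMARY KEY (build_id, part_id)
);
CREATE TABLE Sessions (
    session_id INTEGER PRIMARY KEY,
    build_id INTEGER NOT NULL REFERENCES Builds(build_id),
    UNIQUE (session_id, build_id)  -- lets PartsChangeLog reference the pair
);
CREATE TABLE PartsChangeLog (
    change_id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL,
    build_id INTEGER NOT NULL,     -- the controlled redundancy
    part_added INTEGER NOT NULL,
    part_removed INTEGER NOT NULL,
    FOREIGN KEY (session_id, build_id)
        REFERENCES Sessions(session_id, build_id),
    FOREIGN KEY (build_id, part_added)
        REFERENCES LT_HeaderParts(build_id, part_id),
    FOREIGN KEY (build_id, part_removed)
        REFERENCES LT_HeaderParts(build_id, part_id)
);
"""


def open_db():
    """Create an in-memory database with foreign key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def log_change(conn, session_id, build_id, part_added, part_removed):
    """Insert a change row; return False if a foreign key rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO PartsChangeLog "
                "(session_id, build_id, part_added, part_removed) "
                "VALUES (?, ?, ?, ?)",
                (session_id, build_id, part_added, part_removed),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this layout a change row is rejected when its build_id does not match the session's build, or when either part is not listed for that build in LT_HeaderParts.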

Fetching entities from datastore where Entity.key.IN([keys...])

I'm trying to fetch a long list of entities, and those entities all refer to one of a few different related entities. It's explained in the comments, but basically many "items" refer to a few "Company"s. I don't want to have to make a separate query for each key in unique_keys (i.e. key.get()), so I thought the following would work, but it's returning an empty list. Pray tell, what am I doing wrong? Or is there a better way to accomplish this relationship of many items referencing a few, while minimizing calls to the db (I'm new to AppEngine Datastore).
Notice, this is in Python, using the ndb library offered by app engine.
# "items" is a list of entities that have a property "parent_company"
# parent_company is a string of the Company key
# I get a unique list of all Key strings and convert them to Keys
# I then query for where the Company Key is in my unique list
unique_keys = list(set([ndb.Key(Company, prop.parent_company) for prop in items]))
companies = Company.query(Company.key.IN(unique_keys)).fetch()
You definitely should use ndb.get_multi(unique_keys). It fetches the entities for all of those keys as a batch, rather than issuing one get per key.
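A minimal sketch of the deduplicate-then-batch pattern follows. So that it runs outside App Engine, the key constructor and the batch getter are passed in as parameters. The function name fetch_companies and both parameter names are hypothetical. On App Engine you would presumably pass lambda cid: ndb.Key(Company, cid) and ndb.get_multi, which returns None in the positions of keys that have no entity.

```python
def fetch_companies(items, make_key, get_multi):
    """Return {company_id: entity} using a single batched lookup.

    items     -- objects with a parent_company attribute (the company id)
    make_key  -- callable turning a company id into a datastore key
    get_multi -- callable taking a list of keys and returning entities
                 in the same order, with None for keys that do not exist
    """
    # Deduplicate first so each company is requested only once.
    unique_ids = sorted(set(item.parent_company for item in items))
    keys = [make_key(company_id) for company_id in unique_ids]
    entities = get_multi(keys)  # one batch call instead of one get per key
    return dict(
        (company_id, entity)
        for company_id, entity in zip(unique_ids, entities)
        if entity is not None
    )
```

Each item can then look up its company with companies[item.parent_company] without any further datastore calls.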

Issues understanding Google App Engine key

I'm looking at the GAE example for datastoring here, and among other things this confused me a bit.
def guestbook_key(guestbook_name=DEFAULT_GUESTBOOK_NAME):
    """Constructs a Datastore key for a Guestbook entity with guestbook_name."""
    return ndb.Key('Guestbook', guestbook_name)
I understand why we need the key, but why is 'Guestbook' necessary? Is it so you can query for all 'Guestbook' objects in the datastore? But if you need to search a datastore for a type of object, why isn't there a query(type(Greeting))? Considering that that is the ndb.Model that you are putting in?
Additionally, if you are feeling generous, why in creating the object you are storing, do you have to set parent?
greeting = Greeting(parent=guestbook_key(guestbook_name))
First: GAE Datastore is one big distributed database used by all GAE apps concurrently. To distinguish entities GAE uses system-wide keys. A key is composed of:
Your application name (implicitly set, not visible via API)
Namespace, set via Namespace API (if not set in code, then an empty namespace is used).
Kind of entity. This is just a string and has nothing to do with types at database level. Datastore is schema-less so there are no types. However, language based APIs (Java JDO/JPA/objectify, Python NDB) map this to classes/objects.
Parent keys (afaik, serialised inside key). This is used to establish entity groups (defining scope of transactions).
A particular entity identifier: name (string) or ID (long). They are unique within namespace and kind (and parent key if defined) - see this for more info on ID uniqueness.
See Key methods (java) to see what data is actually stored within the key.
Second: It seems that GAE Python API does not allow you to query Datastore without defining classes that map to entity kind (I don't use GAE Python, so I might be wrong). Java does have a low-level API that you can use without mapping to classes.
Third: You are not required to define a parent to an entity. Defining a parent is a way to define entity groups, which are important when using transactions. See ancestor paths and transactions.
That's what a key is: a path consisting of pairs of kind and ID. The key is what identifies what kind it is.
I don't understand your second question. You don't have to set a parent, but if you want to set one, you can only do it when creating the entity.
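To make the "path of (kind, ID) pairs" description concrete, here is a toy model in plain Python. It is not the ndb API: key_path, kind_of, parent_of and entity_group_root are hypothetical helpers, and the real key also carries the application and namespace mentioned in the first answer.

```python
def key_path(*flat, parent=()):
    """Build a key as a tuple of (kind, id) pairs, optionally under a parent."""
    if not flat or len(flat) % 2:
        raise ValueError("expected alternating kind, id values")
    pairs = tuple(zip(flat[0::2], flat[1::2]))
    return tuple(parent) + pairs


def kind_of(path):
    """The kind of an entity is the kind in the last pair of its key."""
    return path[-1][0]


def parent_of(path):
    """Everything but the last pair, or None for a root key."""
    return path[:-1] or None


def entity_group_root(path):
    """The first pair identifies the entity group the key belongs to."""
    return path[:1]


# Illustrative values only: a Greeting stored under a Guestbook parent.
guestbook = key_path("Guestbook", "default_guestbook")
greeting = key_path("Greeting", 42, parent=guestbook)
```

In this model the greeting's key is the guestbook's path plus one more pair, which is why the parent has to be known when the key is created and why all greetings under one guestbook share the same root.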

Database design rules to follow for a programmer

We are working on a mapping application that uses Google Maps API to display points on a map. All points are currently fetched from a MySQL database (holding some 5M + records). Currently all entities are stored in separate tables with attributes representing individual properties.
This presents following problems:
Every time there's a new property we have to make changes in the database, application code and the front-end. This is all fine but some properties have to be added for all entities so that's when it becomes a nightmare to go through 50+ different tables and add new properties.
There's no way to find all entities which share any given property e.g. no way to find all schools/colleges or universities that have a geography dept (without querying schools,uni's and colleges separately).
Removing a property is equally painful.
No standards for defining properties in individual tables. Same property can exist with different name or data type in another table.
No way to link or group points based on their properties (somehow related to point 2).
We are thinking of redesigning the whole database, but without a DBA's help and with no professional DB design experience we are really struggling.
Another problem we're facing with the new design is that there are lot of shared attributes/properties between entities.
For example:
An entity called "university" has 100+ attributes. Other entities (e.g. hospitals,banks,etc) share quite a few attributes with universities for example atm machines, parking, cafeteria etc etc.
We don't really want to have properties in separate tables [and then link them back to entities w/ foreign keys] as it will require us to add/remove them manually. Also, generalizing properties will result in groups containing 50+ attributes. Not all records (i.e. entities) require those properties.
So with keeping that in mind here's what we are thinking about the new design:
Have separate tables for each entity containing some basic info e.g. id,name,etc etc.
Have 2 tables attribute type and attribute to store properties information.
Link each entity (or a table if you like) to attribute using a many-to-many relation.
Store addresses in different table called addresses link entities via foreign keys.
We think this will allow us to be more flexible when adding, removing or querying on attributes.
This design, however, will result in an increased number of joins when fetching data, e.g. to display all "attributes" for a given university we might have a query with 20+ joins to fetch all related attributes in a single row.
We desperately need to know some opinions or possible flaws in this design approach.
Thanks for your time.
In trying to generalize your question without more specific examples, it's hard to truly critique your approach. If you'd like some more in depth analysis, try whipping up an ER diagram.
If your data model is changing so much that you're constantly adding/removing properties and many of these properties overlap, you might be better off using EAV.
Otherwise, if you want to maintain a relational approach but are finding a lot of overlap with properties, you can analyze the entities and look for abstractions that link to them.
Ex) My Db has Puppies, Kittens, and Walruses all with a hasFur and furColor attribute. Remove those attributes from the 3 tables and create a FurryAnimal table that links to each of those 3.
Of course, the simplest answer is to not touch the data model. Instead, create Views on the underlying tables that you can use to address (5), (4) and (2)
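For reference, the EAV idea mentioned above can be sketched with SQLite from Python. This is a simplified illustration, not a recommendation of a specific schema: it assumes a single entity table with a kind column instead of one table per entity, and every table, column and function name here is made up for the example.

```python
import sqlite3

# Hypothetical EAV layout: one row per (entity, attribute) pair.
SCHEMA = """
CREATE TABLE entity (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,
    name TEXT NOT NULL
);
CREATE TABLE attribute_type (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE entity_attribute (
    entity_id INTEGER NOT NULL REFERENCES entity(id),
    attribute_type_id INTEGER NOT NULL REFERENCES attribute_type(id),
    value TEXT,
    PRIMARY KEY (entity_id, attribute_type_id)
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def set_attribute(conn, entity_id, attr_name, value):
    """Attach an attribute to an entity, creating the attribute type if new."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO attribute_type (name) VALUES (?)",
            (attr_name,),
        )
        conn.execute(
            "INSERT OR REPLACE INTO entity_attribute "
            "(entity_id, attribute_type_id, value) "
            "SELECT ?, id, ? FROM attribute_type WHERE name = ?",
            (entity_id, value, attr_name),
        )


def entities_with(conn, attr_name):
    """List (kind, name) for every entity that has the attribute, any kind."""
    rows = conn.execute(
        "SELECT e.kind, e.name FROM entity e "
        "JOIN entity_attribute ea ON ea.entity_id = e.id "
        "JOIN attribute_type t ON t.id = ea.attribute_type_id "
        "WHERE t.name = ? ORDER BY e.kind, e.name",
        (attr_name,),
    )
    return rows.fetchall()
```

Adding a new property becomes an insert rather than a schema change, and finding everything that shares a property is one query. The cost is that values lose their column types and reading a full record means assembling many rows.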
1 cannot be an issue. There is one place where your objects are defined. Everything else is generated/derived from that. Just refactor your code until this is the case.
2 is solved by having a metamodel, where you describe which properties are where. This is probably needed for 1 too.
You might want to totally avoid the problem by programming this in Smalltalk with Seaside on a Gemstone object oriented database. Then you can just have objects with collections and don't need so many joins.

Use a ListProperty or custom tuple property in App Engine?

I'm developing an application with Google App Engine and stumbled across the following scenario, which can perhaps be described as "MVP-lite".
When modeling many-to-many relationships, the standard property to use is the ListProperty. Most likely, your list is comprised of the foreign keys of another model.
However, in most practical applications, you'll usually want at least one more detail when you get a list of keys - the object's name - so you can construct a nice hyperlink to that object. This requires looping through your list of keys and grabbing each object to use its "name" property.
Is this the best approach? Because "reads are cheap", is it okay to get each object even if I'm only using one property for now? Or should I use a special property like tipfy's JsonProperty to save a (key, name) "tuple" to avoid the extra gets?
Though datastore reads are comparatively cheaper than datastore writes, they can still add significant time to a request handler. Including the objects' names as well as their foreign keys sounds like a good use of denormalization (e.g., use two list properties to simulate a tuple - one contains the foreign keys and the other contains the corresponding names).
If you decide against this denormalization, then I suggest you batch fetch the entities which the foreign keys refer to (rather than getting them one by one) so that you can at least minimize the number of round trips you make to the datastore.
"When modeling one-to-many (or in some cases, many-to-many) relationships, the standard property to use is the ListProperty."
No, when modeling one-to-many relationships, the standard property to use is a ReferenceProperty, on the 'many' side. Then, you can use a query to retrieve all matching entities.
Returning to your original question: If you need more data, denormalize. Store a list of titles alongside the list of keys.
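The two-parallel-lists denormalization suggested in both answers might look roughly like this. It is a plain-Python stand-in, not an App Engine model: the class ArticleRecord and its methods are hypothetical, and in a real model label_keys and label_names would presumably be two list properties on the entity.

```python
class ArticleRecord:
    """Keeps label keys and label names in two lists with matching positions."""

    def __init__(self, title):
        self.title = title
        self.label_keys = []   # foreign keys of the related labels
        self.label_names = []  # denormalized copy of each label's name

    def add_label(self, key, name):
        """Append key and name together so the two lists stay aligned."""
        if key not in self.label_keys:
            self.label_keys.append(key)
            self.label_names.append(name)

    def rename_label(self, key, new_name):
        """Update the stored copy; needed whenever the label itself is renamed."""
        self.label_names[self.label_keys.index(key)] = new_name

    def label_links(self, url_prefix="/labels/"):
        """Return (url, name) pairs without fetching any label entity."""
        return [
            (url_prefix + str(key), name)
            for key, name in zip(self.label_keys, self.label_names)
        ]
```

The hyperlinks can be built from the article alone. The trade-off is that a label rename now has to be written to every article that carries a copy of the name.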
