AppEngine Datastore: Hierarchical queries - google-app-engine

If you're dealing with a hierarchy of records, where most of the keys have ancestors, will you have to create a chain of all of the keys before you can retrieve a leaf?
Example (in Go):
rootKey = datastore.NewKey(ctx, "EntityType", "", id1, nil)
secondGenKey = datastore.NewKey(ctx, "EntityType", "", id2, rootKey)
thirdGenKey = datastore.NewKey(ctx, "EntityType", "", id3, rootKey)
How do you get the record described by thirdGenKey without having to declare the keys for all of the levels of the hierarchy above it?

In order to get an individual entity, its key must be globally unique - this is enforced through each entity key being unique within its entity group. The ancestor path forms an intrinsic part of the entity key for this reason.
So, the only way to get a single entity with strong consistency is to specify its ancestor path. This can be done either with a get-by-key or with an ancestor query.
If you don't know the full ancestor path, your only option is to query on a property of the entity, but bear in mind that:
this may not be unique within your application
you will be subject to eventual consistency.
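To illustrate why the ancestor path matters, here is a minimal sketch in plain Go that does not use the App Engine SDK. The pathKey type, its String method, and the map-based store are hypothetical stand-ins invented for this example; they only model the idea that a key's identity is its full path, root first.

```go
package main

import (
	"fmt"
	"strings"
)

// pathKey is a hypothetical stand-in for a datastore key: a kind, a
// numeric ID, and an optional parent. It is NOT the SDK's datastore.Key.
type pathKey struct {
	kind   string
	id     int64
	parent *pathKey
}

// String renders the full ancestor path, root first.
func (k *pathKey) String() string {
	var parts []string
	for cur := k; cur != nil; cur = cur.parent {
		// Prepend so that the root ends up first.
		parts = append([]string{fmt.Sprintf("%s,%d", cur.kind, cur.id)}, parts...)
	}
	return "/" + strings.Join(parts, "/")
}

func main() {
	root := &pathKey{kind: "EntityType", id: 1}
	child := &pathKey{kind: "EntityType", id: 3, parent: root}
	orphan := &pathKey{kind: "EntityType", id: 3} // same kind and ID, no parent

	// Model a lookup by full path: the orphan key does not find the child.
	store := map[string]string{child.String(): "leaf record"}
	_, found := store[orphan.String()]
	fmt.Println(child, orphan, found)
}
```

The two keys share a kind and an ID but render to different paths, which is why a key built without its parent does not address the same entity.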

To supplement tx802's answer:
If you want to load an entity by key, you need its key. If that key has a parent, you must create the parent key first in order to create it. The parent key is part of the key, just like the numeric ID or the string name.
Looking from the implementation perspective: datastore.Key is a struct:
type Key struct {
	kind      string
	stringID  string
	intID     int64
	parent    *Key
	appID     string
	namespace string
}
In order to construct a Key which has a parent, you must construct the parent key too, recursively. If you're finding it too verbose to always create the key hierarchy, you can create a helper function for it.
For the sake of simplicity, let's assume all keys use the same entity name, and we only use numeric IDs. It may look like this:
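func createKey(ctx context.Context, entity string, ids ...int64) (k *datastore.Key) {
	// Chain the keys root first: each new key takes the previous one as its parent.
	// The IDs are int64 because datastore.NewKey expects an int64 numeric ID.
	for _, id := range ids {
		k = datastore.NewKey(ctx, entity, "", id, k)
	}
	return
}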
With this helper function, your example is reduced to this:
k2 := createKey(ctx, "EntityType", id1, id2)
k3 := createKey(ctx, "EntityType", id1, id3)
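The helper above assumes a single kind and numeric IDs. If the levels of the hierarchy use different kinds, one possible variation is to pass (kind, ID) pairs. The sketch below is only an illustration: the key and elem types and the buildKey function are hypothetical names, and key is a local stand-in for *datastore.Key so that the example runs without the App Engine SDK. With the real SDK, the loop body would call datastore.NewKey instead.

```go
package main

import "fmt"

// key is a hypothetical stand-in for *datastore.Key.
type key struct {
	kind   string
	intID  int64
	parent *key
}

// elem is one step of an ancestor path (hypothetical helper type).
type elem struct {
	Kind string
	ID   int64
}

// buildKey chains the path elements root first and returns the leaf key.
// It returns nil when no elements are given.
func buildKey(path ...elem) *key {
	var k *key
	for _, e := range path {
		k = &key{kind: e.Kind, intID: e.ID, parent: k}
	}
	return k
}

func main() {
	leaf := buildKey(elem{"Company", 1}, elem{"Employee", 2})
	fmt.Println(leaf.kind, leaf.intID, leaf.parent.kind, leaf.parent.intID)
}
```

The caller still has to know every level of the path; the helper only removes the repetition of nesting the constructor calls by hand.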

Related

Using proper relations for product and attribute

I am about to implement a database for simple ecommerce platform. I want to implement the following:
Each product belongs to one product category;
Each product category has its own attributes;
Each product has one value for each attribute of this products type.
What relations should I use to store this kind of information?
Here is the logical model -- the way I understood it; you should be able to tweak it.
From this you can derive the physical model and the SQL code. The word KEY here means UNIQUE NOT NULL and you may use them for primary keys. Should you choose to introduce integers as primary keys, make sure you keep these UNIQUE.
Note that everything should be NOT NULL, once you get to the SQL.
Category named (CAT) exists.
Category {CAT}
KEY {CAT}
Attribute named (ATR) exists.
Attribute {ATR}
KEY {ATR}
Category (CAT) has attribute (ATR).
Each category has more than one attribute; it is possible for the same attribute to belong to more than one category.
CategoryAttribute {CAT, ATR}
KEY {CAT, ATR}
Product named (PRD) belongs to category (CAT).
Each product belongs to exactly one category; each category may have more than one product.
ProductCategory {PRD, CAT}
KEY {PRD}
KEY {PRD, CAT} -- seems redundant here, but is
-- needed for the FK from the next table
FOREIGN KEY {CAT} REFERENCES Category {CAT}
Product (PRD) from category (CAT) has attribute (ATR) that belongs to that category.
For each attribute that belongs to a category, that attribute may belong to more than one product from that category.
ProductCategoryAttribute {PRD, CAT, ATR}
KEY {PRD, CAT, ATR}
FOREIGN KEY {PRD, CAT} REFERENCES ProductCategory {PRD, CAT}
FOREIGN KEY {CAT, ATR} REFERENCES CategoryAttribute {CAT, ATR}
I don't know what database platform you are using, but for small numbers of products, and for queries that do not depend on the value of the per-category attributes, I'd use the following strategy:
CREATE TABLE "Category" (
"id" INTEGER PRIMARY KEY AUTOINCREMENT
);
CREATE TABLE "Product" (
"id" INTEGER PRIMARY KEY AUTOINCREMENT,
"categoryId" INTEGER NOT NULL REFERENCES "Category" ("id"),
"attributes" TEXT NOT NULL
);
In this example, the categories are used mainly to enforce referential integrity and to provide a list of categories for navigation.
The attributes are stored inside the attributes column as JSON (most modern databases tend to support this natively).
If there are any attributes common to all types of products, we'd create specific columns in Product. For example, you could add creationDate, deletionDate, price, or whatnot.
This allows you to perform the typical Select * From Product Where id = #Id to get a specific product and Select * From Product Where categoryId = #CategoryId to get all products in a category.
A creationDate could be useful to sort the products by creation date and take the top N, if necessary, when filtering by category. However with small quantities like thousands of products you might as well get all products by category and do this in code.
Regarding the code aspect, products like Dapper have specific extensions that help you deal with these discriminated unions, but writing code to support them is fairly easy. Here's how. I'll write pseudo-C#, but I'm sure you can adapt it.
We have an abstract class taking care of the Product table rows
public abstract class ProductBase
{
    // only the fields in the Product table here
    public int CategoryId { get; set; }
    protected string Attributes { get; set; }
    // serialize extra fields to JSON in Attributes
    protected abstract void Prepare();
    // load the common fields from a data row
    // (an instance constructor: a static one cannot take parameters)
    protected ProductBase(DataRow dr)
    {
        CategoryId = Convert.ToInt32(dr["categoryId"]);
        Attributes = dr["attributes"] as string;
    }
    // save to DB
    public void Save()
    {
        Prepare();
        // save to SQL
    }
}
We also have specific classes per category which have the extra attributes and handle serialization and deserialization.
public class FooProduct: ProductBase
{
    public string Color { get; set; }
    protected override void Prepare()
    {
        Attributes = Json.Serialize(new { Color });
    }
    public FooProduct(DataRow dr): base(dr)
    {
        // we can only create foo products if the category is foo
        if (CategoryId != 23) throw new InvalidOperationException();
        var attr = Json.Deserialize(Attributes);
        Color = attr.Color;
    }
}
This idea works great as long as you don't need to get the "foo" products by Color. If you can afford to get all "foo" products and filter in code, great. If your database understands JSON and lets you query inside the attributes column, good, but it will get slow with large numbers of rows unless the server allows indexes to reference JSON-serialized values.
If all else fails, you'll need to create an index table which contains the color values and the ids of the products which have that color. This is relatively painful and you don't want to do it unless you need it (and you don't right now).
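For readers who prefer Go to the pseudo-C# above, the same idea (common columns plus per-category attributes stored as JSON, filtered in code) might look roughly like this. It is a sketch under assumptions: the ProductRow and FooAttrs types, the decodeFoo and filterByColor functions, and the "color" JSON field name are all hypothetical, and the category ID 23 is simply reused from the example above. Database access is left out; rows are plain structs.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ProductRow mirrors one row of the Product table (hypothetical type).
type ProductRow struct {
	ID         int
	CategoryID int
	Attributes string // per-category attributes as JSON text
}

// FooAttrs holds the attributes specific to the "foo" category (hypothetical).
type FooAttrs struct {
	Color string `json:"color"`
}

// fooCategoryID is the category ID used in the pseudo-C# example.
const fooCategoryID = 23

// decodeFoo parses the JSON attributes of a row, refusing other categories.
func decodeFoo(row ProductRow) (FooAttrs, error) {
	var attrs FooAttrs
	if row.CategoryID != fooCategoryID {
		return attrs, errors.New("not a foo product")
	}
	err := json.Unmarshal([]byte(row.Attributes), &attrs)
	return attrs, err
}

// filterByColor filters in code: it keeps the foo products whose decoded
// Color matches, and skips rows that are not foo products or fail to parse.
func filterByColor(rows []ProductRow, color string) []ProductRow {
	var out []ProductRow
	for _, r := range rows {
		if a, err := decodeFoo(r); err == nil && a.Color == color {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := []ProductRow{
		{ID: 1, CategoryID: 23, Attributes: `{"color":"red"}`},
		{ID: 2, CategoryID: 23, Attributes: `{"color":"blue"}`},
		{ID: 3, CategoryID: 7, Attributes: `{"size":"L"}`},
	}
	fmt.Println(len(filterByColor(rows, "red")))
}
```

Filtering this way reads every row of the category, which is the trade-off described above: fine for thousands of products, less so once the table grows.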

What's the KEY_RESERVED_PROPERTY equivalent for the Go API ? datastore

I need to check the existence of a key (i.e. a username). It seems that KEY_RESERVED_PROPERTY is a special key available in the Java API that you can use to achieve the best performance and strong consistency, so I'm wondering if there is any equivalent in Go.
Currently I'm considering using a query with the username as ancestor + KeysOnly().
If you look at the docs, KEY_RESERVED_PROPERTY is nothing but a property to refer to the key:
A reserved property name used to refer to the key of the entity. This string can be used for filtering and sorting by the entity key itself.
So this is nothing magical, you could do the same thing in Go with the __key__ property, as stated in the docs:
Key filters
To filter on the value of an entity's key, use the special property __key__:
q := datastore.NewQuery("Person").Filter("__key__ >", lastSeenKey)
I need to check the existence of a key (i.e. a username).
You can also do that by attempting to load the entity by key using the datastore.Get() function. A return value of ErrNoSuchEntity means no entity exists with the specified key:
if err := datastore.Get(c, key, dst); err == datastore.ErrNoSuchEntity {
// Key doesn't exist
}

Get entity by int Id in GAE in GO

I'm trying to get an entity's Key by its int ID (not the entity itself, but its Key). In the long run, I want this in order to find the entity's parent.
Data from DatastoreViewer:
Entity Kind
File
Entity Key
ag9kZXZ-dHJhc2hib3hhcHByIgsSBEZpbGUYgICAgICAwAoMCxIERmlsZRiAgICAgIDACww
ID
6473924464345088
Parent
ag9kZXZ-dHJhc2hib3hhcHByEQsSBEZpbGUYgICAgICAwAoM
File: id=5910974510923776
I do it like this:
k := datastore.NewKey(c, "File", "", 6473924464345088, nil)
currentDirQuery := datastore.NewQuery("File").Filter("__key__ =", k).KeysOnly()
keys, err := currentDirQuery.GetAll(c, nil)
The length of keys is 0. What am I doing wrong?
If you already have the key, why are you doing a keys-only query that matches the key? Why don't you just do a datastore.Get() with the key?
As for why your keys-only query is not working: you are not including the ancestor in the key you are constructing. The entity in your example has a parent (you showed its urlsafe key, ag9kZXZ-dHJhc2hib3hhcHByEQsSBEZpbGUYgICAgICAwAoM), so the key you build must include that parent.
Using python we can decode this key
> ndb.Key(urlsafe="ag9kZXZ-dHJhc2hib3hhcHByIgsSBEZpbGUYgICAgICAwAoMCxIERmlsZRiAgICAgIDACww")
Key('File', 5910974510923776, 'File', 6473924464345088, app='dev~trashboxapp')
Note that the parent key is Key('File', 5910974510923776).
You cannot perform a partial match on a child with the key you created for the query. You can only perform ancestor queries, which return the ancestor and all of its descendants irrespective of the depth of the hierarchy.
This also means a datastore.Get() will fail with the key you have created in your example code.
So construct your key so that it includes the ancestor; see the docs: https://developers.google.com/appengine/docs/go/datastore/entities#Go_Retrieving_an_entity
But to be honest, what you are doing is completely redundant unless it's just an exercise in understanding what's going on and you're trying to round-trip a key -> query -> key.

JPA search by Key without Knowing Parent Key

Ok so I have an application that uses GAE and consequently the datastore.
Say I have multiple companies A, B and C and I have within each company Employees X,Y and Z. The relationship between a company and employee will be OneToMany, with the company being the owner. This results in the Company Key being of the form
long id = 4504699138998272; // Random Example
Key CompanyKey = KeyFactory.createKey(Company.class.getSimpleName(), id);
and the employee key would be of the form
long id2 = 5630599045840896;
Key EmployeeKey = KeyFactory.createKey(CompanyKey,Employee.class.getSimpleName(),id2);
All is fine and well until the front end, during JSP rendering. Sometimes I need to generate a report or open an employee's profile, in which case the div containing their information gets an id as follows:
<div class="employeeInfo" id="<%=employee.getKey().getId()%>" > .....</div>
This div has an onclick/submit event that sends the modifications to the employee profile to a servlet via AJAX, at which point I have to specify the primary key of the employee (which I thought I could easily get from the div id), but it didn't work server-side.
The problem is that I know the kind (string) portion and the long ID portion of the employee's key, but not the parent key. To save time I tried this, and it didn't work:
Key key = KeyFactory.createKey(Employee.class.getSimpleName(), id);
Employee X = em.find(Employee.class,key);
X is always returned null.
I would really appreciate any idea of how to find or "query" entities by key without knowing their parent's key (as I would hate having to readjust the entity classes).
Thanks a lot!
An entity key and its parents cannot be separated. Together they form the ancestor path, a chain composed of entity kinds and IDs.
So, in your example ancestor paths will look like this:
CompanyKey: ("Company", 4504699138998272)
EmployeeKey: ("Company", 4504699138998272, "Employee", 5630599045840896)
A key composed only of ("Employee", 5630599045840896) is a completely different key from EmployeeKey, even though both keys end with the same values. Think of concatenating the elements into a single "string" and comparing the final values: they will never match.
One thing you can do is use encoded keys instead of their id values:
String encodedKey = KeyFactory.keyToString(EmployeeKey);
Key decodedKey = KeyFactory.stringToKey(encodedKey);
decodedKey.equals(EmployeeKey); // true
More about Ancestor Paths:
https://developers.google.com/appengine/docs/java/datastore/entities#Java_Ancestor_paths
KeyFactory Java doc:
https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/datastore/KeyFactory#keyToString(com.google.appengine.api.datastore.Key)

Which database can do this?

In a recent project, I need a database that does the following.
Each item is a key-value pair, and each key is a multi-dimensional string, so for example:
item 1:
key :['teacher','professor']
value: 'david'
item 2:
key :['staff', 'instructor', 'professor']
value: 'shawn'
So the keys do not all have the same length. I can run a query like
anyone with both ['teacher','staff'] as keys.
Also, I can easily add another item later, for example a key-value pair like this:
item 3:
key :['female', 'instructor', 'professor','programmer']
value: 'annie'
so the idea is that I can tag any array of keys to a value, and I can search by a subset of keys.
Since (judging on your comments) you don't need to enforce uniqueness, these are not actually "keys", and can be more appropriately thought of as "tags" whose primary purpose is to be searched on (not unlike StackOverflow.com tags).
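The typical way of implementing tags in a relational database uses three tables: TAG (TAG_ID, TAG_NAME), ITEM (ITEM_ID plus the item's other fields), and a junction table TAG_ITEM whose primary key is {TAG_ID, ITEM_ID}.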
Note the order of fields in the junction table TAG_ITEM primary key: since our goal is to find items of given tag (not tags of given item), the leading edge of the index "underneath" PK is TAG_ID. This facilitates efficient index range scan on given TAG_ID.
Cluster TAG_ITEM if your DBMS supports it.
You can then search for items with any of the given tags like this:
SELECT [DISTINCT] ITEM_ID
FROM
TAG
JOIN TAG_ITEM ON TAG.TAG_ID = TAG_ITEM.TAG_ID
WHERE
TAG_NAME = 'teacher'
OR TAG_NAME = 'professor'
And if you need any other fields from ITEM, you can:
SELECT * FROM ITEM WHERE ITEM_ID IN (<query above>)
You can search for items with all of the given tags like this:
SELECT ITEM_ID
FROM
TAG
JOIN TAG_ITEM ON TAG.TAG_ID = TAG_ITEM.TAG_ID
WHERE
TAG_NAME = 'teacher'
OR TAG_NAME = 'professor'
GROUP BY
ITEM_ID
HAVING
COUNT(*) = 2
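To make the logic of the GROUP BY / HAVING COUNT(*) query easier to follow, here is a small in-memory sketch in Go of the same counting idea. The tagItem type and the itemsWithAllTags function are hypothetical names, and the rows stand in for TAG joined with TAG_ITEM. Like the SQL, it assumes each (tag, item) pair appears at most once.

```go
package main

import (
	"fmt"
	"sort"
)

// tagItem is one row of TAG joined with TAG_ITEM (hypothetical type).
type tagItem struct {
	Tag    string
	ItemID int
}

// itemsWithAllTags counts, per item, how many of the wanted tags it has,
// and keeps the items whose count equals the number of distinct wanted tags.
// It assumes each (Tag, ItemID) pair occurs at most once in rows.
func itemsWithAllTags(rows []tagItem, wanted ...string) []int {
	want := make(map[string]bool, len(wanted))
	for _, t := range wanted {
		want[t] = true
	}
	counts := make(map[int]int)
	for _, r := range rows {
		if want[r.Tag] { // the WHERE ... OR ... filter
			counts[r.ItemID]++ // the GROUP BY ITEM_ID count
		}
	}
	var ids []int
	for id, n := range counts {
		if n == len(want) { // the HAVING COUNT(*) = n condition
			ids = append(ids, id)
		}
	}
	sort.Ints(ids) // map iteration order is random, so sort for stable output
	return ids
}

func main() {
	rows := []tagItem{
		{"teacher", 1}, {"professor", 1},
		{"staff", 2}, {"instructor", 2}, {"professor", 2},
	}
	fmt.Println(itemsWithAllTags(rows, "teacher", "professor"))
}
```

The count in the final condition has to match the number of tags being searched for, which is why the SQL above compares against 2 for a two-tag search.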
PostgreSQL can do something similar with its hstore data format: http://www.postgresql.org/docs/9.1/static/hstore.html
Or maybe you are looking for arrays: http://postgresguide.com/sexy/arrays.html
