Best practice to add the ID to a datastore entity? - google-app-engine

When creating an entity using an IncompleteKey so that each record is inherently unique, what is the best way to add the key back into the record at creation time, so it can be passed around in the struct?
For example, is something like this (untested code) a good idea, using transactions?
err = datastore.RunInTransaction(c, func(c appengine.Context) error {
	incompleteKey := datastore.NewIncompleteKey(c, ENTITY_TYPE, nil)
	entityKey, err := datastore.Put(c, incompleteKey, &MyStruct)
	if err != nil {
		return err
	}
	MyStruct.SelfID = entityKey.IntID()
	_, err = datastore.Put(c, entityKey, &MyStruct)
	return err
}, nil)
As a follow-up: I'm guessing this should almost never fail, since it will almost never operate on the same incomplete key?

You don't need to put MyStruct into the DB twice - that's unnecessary overhead. The key is stored as part of the entity and can be retrieved when needed.
There is a good example in the docs on how to store an entity and use its ID as a reference: https://cloud.google.com/appengine/docs/go/datastore/entities#Go_Ancestor_paths
When you want to get the keys for entities you can use this approach:
https://cloud.google.com/appengine/docs/go/datastore/queries#Go_Retrieving_results - (edited) notice in the example that keys and structs are populated in one operation.
If you query an entity by key you already know its ID.
So there is no need to have the ID as a separate property. If you want to pass it around with the entity for your business logic you can create a wrapper - either generalized using interface{} for the entity struct, or strongly typed (one per entity struct).
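For the strongly typed wrapper, here is a minimal sketch; the names StoredEntity and MyStruct are illustrative, not from the App Engine API, and in real code the ID would come from entityKey.IntID() after datastore.Put:

```go
package main

import "fmt"

// MyStruct is an example entity; only its properties live in the datastore.
type MyStruct struct {
	Name string
}

// StoredEntity pairs an entity with the int64 ID of its datastore key, so both
// can be passed around together without persisting the ID as a property.
type StoredEntity struct {
	ID     int64
	Entity *MyStruct
}

func main() {
	// In real code: e := StoredEntity{ID: entityKey.IntID(), Entity: &myStruct}
	e := StoredEntity{ID: 12345, Entity: &MyStruct{Name: "example"}}
	fmt.Printf("id=%d name=%s\n", e.ID, e.Entity.Name)
}
```
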

Related

Upsert a document with value from filter

I have a collection with the following structure in MongoDB:
{
	"userId": String,
	"refs": Set<String>
}
I need to update the collection with those documents. I want to add a new string to refs for the users matched by the $in filter.
But if the user does not exist I need to "upsert" him.
In code (golang) it looks like this:
filter := bson.M{
	"userId": bson.M{
		"$in": tokens, // tokens is []string
	},
}
update := bson.M{
	"$addToSet": bson.M{
		"refs": newReference,
	},
}
ctx, _ := newDbOperationContext()
_, err := driver.UpdateMany(ctx, filter, update)
So for existing users it works fine: the reference is added. But for users that don't exist, nothing happens.
I set opts in driver.UpdateMany(bson, bson, opts...) to options.UpdateOptions.SetUpsert(true), but as a result I got a document without userId:
{
"_id": ObjectId("..."),
"refs": ["new_reference"]
}
So my question is: how do I upsert the new values together with the userId field?
The scale is around 2*10^6 users to update, so I would like to do this with a batch request. Creating and updating them one by one is not an option here, I think.
Thanks for your support!
According to previous questions on SO like this one and this other one, it does not seem possible to perform multiple upserts using only the $in operator, because it will insert only a single document (the one matching the filter):
If no document matches the query criteria, db.collection.update() inserts a single document.
So, as mentioned by @Kartavya, the best option is to perform multiple write operations using BulkWrite.
For that you need to append one upsert operation (= one WriteModel) per userId in tokens, each filtering on that userId; all of them can use the same $addToSet update operation:
tokens := [...]string{"userId1", "userId3"}
newRef := "refXXXXXXX"

// all docs can use the same $addToSet update operation
updateOp := bson.D{{"$addToSet", bson.D{{"refs", newRef}}}}

// append one upsert per userId in tokens, filtering on that userId
upserts := []mongo.WriteModel{}
for _, t := range tokens {
	upserts = append(
		upserts,
		mongo.NewUpdateOneModel().SetFilter(bson.D{{"userId", t}}).SetUpdate(updateOp).SetUpsert(true))
}

opts := options.BulkWrite().SetOrdered(false)
res, err := col.BulkWrite(context.TODO(), upserts, opts)
if err != nil {
	log.Fatal(err)
}
fmt.Println(res)
Looking at your use case, I think the best solution will be the following :
Since you have a high scale and wish to make batch requests, it is best to use BulkWrite : The db.collection.bulkWrite() method provides the ability to perform bulk insert, update, and remove operations.
Example : https://godoc.org/go.mongodb.org/mongo-driver/mongo#example-Collection-BulkWrite
The example uses the UpdateOne model, but the UpdateMany model is supported as well, and both have a SetUpsert(true) function.
Now for the _id field: your upserted document should include an _id field if you want the new document to have that specific _id; otherwise MongoDB auto-generates one while inserting the document.
I think it will not be much of a pain to have an _id field in your documents, so that way your problem is solved.
Regarding the scale, I suggest using BulkWrite with the UpdateOne or UpdateMany models.
Hope this helps.
In the case of an upsert, if the document is not present then only the update part of the query is inserted into the database. That's why your output looks like that. You can see here.

Appengine Datastore query returns different result inside transaction

Hoping someone can help point out the issue in my code.
I have a query defined outside a transaction, and when it's executed, it correctly matches an existing record in the database.
However, the moment that query is executed inside a transaction, it fails to match the existing records in the database, despite the fact that they exist.
Here's the code, with output below:
// Query for URL to see if any already exist
existingRemoteURLQuery := datastore.NewQuery("RepoStats").
	Filter("RepoURL =", statsToSave.RepoURL).
	KeysOnly().Limit(1)
testKey, _ := existingRemoteURLQuery.GetAll(ctx, new(models.RepoStats))
if len(testKey) > 0 {
	log.Infof(ctx, "TEST Update existing record vice new key")
} else {
	log.Infof(ctx, "TEST No existing key found, use new key")
}

// Check if we already have a record with this remote URL
var key *datastore.Key
err := datastore.RunInTransaction(ctx, func(ctx context.Context) error {
	// This function's argument ctx shadows the ctx variable from the surrounding function.
	// The last parameter is ignored because it's a keys-only query.
	existingKeys, err := existingRemoteURLQuery.GetAll(ctx, new(models.RepoStats))
	if len(existingKeys) > 0 {
		log.Infof(ctx, "Update existing record vice new key")
		// use existing key
		key = existingKeys[0]
	} else {
		log.Infof(ctx, "No existing key found, use new key")
		key = datastore.NewIncompleteKey(ctx, "RepoStats", nil)
	}
	return err
}, nil)
As you can see in the output, the first query outside the transaction correctly matches the existing record. But inside the transaction, it doesn't recognize the existing record:
2018/08/28 11:50:47 INFO: TEST Update existing record vice new key
2018/08/28 11:50:47 INFO: No existing key found, use new key
Thanks for any help in advance
Updated
Dan's comment led to printing out the error message from the query inside the transaction:
if err != nil {
	log.Errorf(ctx, "Issue running in transaction: %v", err)
}
Which prints:
ERROR: Issue running in transaction: API error 1 (datastore_v3: BAD_REQUEST): Only ancestor queries are allowed inside transactions.
Converting a comment into an answer
Turns out this is the Go-specific behaviour when attempting to perform non-ancestor queries inside transactions (FWIW, in Python attempting to do so actually raises an exception).
Ancestor queries are the only queries allowed inside transactions. From What can be done in a transaction (not very explicit, though; IMHO it's implicit, as other queries could return entities not meeting the transaction restrictions):
All Cloud Datastore operations in a transaction must operate on
entities in the same entity group if the transaction is a single-group
transaction, or on entities in a maximum of twenty-five entity groups
if the transaction is a cross-group transaction. This includes
querying for entities by ancestor, retrieving entities by key,
updating entities, and deleting entities. Notice that each root entity
belongs to a separate entity group, so a single transaction cannot
create or operate on more than one root entity unless it is a
cross-group transaction.
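Based on that restriction, a minimal sketch of a fix, assuming the App Engine runtime and that all RepoStats entities can be stored under a common ancestor (parentKey here is illustrative, not from the question), is to make the query an ancestor query so it becomes legal inside the transaction:

```go
// Sketch only: assumes all RepoStats entities are created as children of parentKey.
err := datastore.RunInTransaction(ctx, func(ctx context.Context) error {
	q := datastore.NewQuery("RepoStats").
		Ancestor(parentKey). // ancestor queries are the only queries allowed in a transaction
		Filter("RepoURL =", statsToSave.RepoURL).
		KeysOnly().Limit(1)
	existingKeys, err := q.GetAll(ctx, nil)
	if err != nil {
		return err
	}
	var key *datastore.Key
	if len(existingKeys) > 0 {
		key = existingKeys[0] // update existing record
	} else {
		key = datastore.NewIncompleteKey(ctx, "RepoStats", parentKey)
	}
	_, err = datastore.Put(ctx, key, &statsToSave)
	return err
}, nil)
```

Note the trade-off this implies: putting all RepoStats entities in one entity group limits write throughput on that group, so whether this fits depends on the write rate.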

Google appengine queries fail with namespacing

I am introducing namespacing into my application, but I have run into an issue with one of my existing queries, which performs the following operation to determine whether an entity exists for a given key.
// c is of type context.Context
c, _ = appengine.Namespace(c, "name")
k := datastore.NewKey(c, "Kind", "", id, nil)
q := datastore.NewQuery("Kind").Filter("__key__ =", k).KeysOnly()
keys, err := q.GetAll(c, nil)
When this command is executed with a namespace applied to the context, it gives back the following error:
datastore_v3 API error 1: __key__ filter namespace is but query namespace is db
I could just use a Get query instead, but I don't need to actually retrieve the entity at all. Plus, keys-only queries are free!
Update
It seems that all queries are failing after I have introduced namespacing. The documentation doesn't mention any sort of special treatment for the indices:
https://cloud.google.com/appengine/docs/go/multitenancy/multitenancy
"By default, the datastore uses the current namespace for datastore requests. The API applies this current namespace to datastore.Key objects when they are created. Therefore, you need to be careful if an application stores Key objects in serialized forms, since the namespace is preserved in those serializations."
Using namespaces with the Datastore
https://cloud.google.com/appengine/docs/go/multitenancy/multitenancy#Go_Using_namespaces_with_the_Datastore
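A hedged sketch of the pattern the quoted documentation describes (the "name" namespace and Kind are illustrative): create the key from the same namespaced context you query with, and check the error from appengine.Namespace rather than discarding it, so the __key__ filter and the query never end up in different namespaces:

```go
// Sketch only (App Engine runtime assumed). A key inherits the namespace of
// the context it was created with, so build it from the namespaced context.
nsCtx, err := appengine.Namespace(c, "name")
if err != nil {
	// handle error instead of discarding it with _
}
k := datastore.NewKey(nsCtx, "Kind", "", id, nil)
q := datastore.NewQuery("Kind").Filter("__key__ =", k).KeysOnly()
keys, err := q.GetAll(nsCtx, nil)
```

Per the docs quoted above, a key that was serialized while a different (or empty) namespace was current keeps that namespace, which would match the mismatch shown in the error message.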

List all Entities of single Datastore Kind using GetMulti

Is there a way for me to use datastore's GetMulti, or another function built into the "appengine/datastore" package, to get all entities of a single kind?
For instance, I have a kind "Queue" with many entities that have two to three properties. Each entity has a unique stringID and what I'm trying to get is a slice or other comparable data type of each unique stringID.
The purpose of Queue is to store some metadata and the unique key names that I'll be looping over and performing a cron task on (e.g. keys "user1", "user2", and "user3" are stored as kind Queue, then - during cron - are looped over and interacted with).
Thanks.
I'm new to Google App Engine and I didn't read the documentation before diving in. Now that I actually read the docs, it looks like I'll be answering my own question. This can be accomplished via a simple query, looping over the Keys, and appending the StringID of each key to a slice of strings:
var queuedUsers []string
q := datastore.NewQuery("Queue").KeysOnly()
keys, err := q.GetAll(c, nil)
if err != nil {
	// handle error
}
for _, v := range keys {
	queuedUsers = append(queuedUsers, v.StringID())
}

How can I query for direct descendants only?

Let's say I have entities a, b and c all of the same type, and the situation is like this:
entity a is parent for entity b
entity b is parent for entity c
Now if I do the following query
query = ndb.Query(ancestor=a.key)
result = query.fetch()
The result will contain both b and c entities. Is there a way I can filter out c so that only entities that are direct descendants remain? Any way apart from me going through the results and removing them I mean.
The only way to do this is to modify your schema, adding a 'parent' KeyProperty that references an entity's direct parent, then filtering on that.
Actually, this is not supported at all. Nick's answer does work, but only if you can specify the entity kind in your query, which the OP did not:
"Kindless queries cannot include filters on properties. They can, however, filter by Entity Key by passing Entity.KEY_RESERVED_PROPERTY as the property name for the filter. Ascending sorts on Entity.KEY_RESERVED_PROPERTY are also supported."
This is a little late, however it will help anyone with the same problem.
The solution is to first do a keys-only query and take the subset of keys which are direct descendants.
With that subset of keys, you can batch get the desired entities.
I'm unfamiliar with python, so here's an example in go:
directDescKeys := make([]*datastore.Key, 0)
q := datastore.NewQuery("A").Ancestor(parentKey).KeysOnly()
it := q.Run(ctx)
for {
	key, err := it.Next(nil)
	if err == datastore.Done {
		break
	} else if err != nil {
		// handle error
		break
	}
	// keep only keys whose direct parent is parentKey
	if key.Parent().Equal(parentKey) {
		directDescKeys = append(directDescKeys, key)
	}
}
entities := make([]*A, len(directDescKeys))
if err := datastore.GetMulti(ctx, directDescKeys, entities); err != nil {
	// handle error
}
