How can I query for direct descendants only? - google-app-engine

Let's say I have entities a, b and c all of the same type, and the situation is like this:
entity a is parent for entity b
entity b is parent for entity c
Now if I do the following query
query = ndb.Query(ancestor=a.key)
result = query.fetch()
The result will contain both b and c entities. Is there a way I can filter out c so that only entities that are direct descendants remain? I mean any way other than going through the results and removing them myself.

The only way to do this is to modify your schema, adding a 'parent' KeyProperty that references an entity's direct parent, then filtering on that.
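As a rough illustration of this idea in Go (a minimal in-memory sketch, not the Datastore API): each entity stores its direct parent's key in a hypothetical Parent field, and filtering on that field keeps b but drops c. The Entity type and the directChildren helper are assumptions made up for this example; with the real Datastore, the equivalent equality filter on the stored parent property would go into the query itself.

```go
package main

import "fmt"

// Entity is a hypothetical stand-in for a stored entity.
// Parent holds the key of the direct parent ("" for a root entity).
type Entity struct {
	Key    string
	Parent string
}

// directChildren keeps only the entities whose stored Parent equals parentKey.
func directChildren(all []Entity, parentKey string) []Entity {
	var out []Entity
	for _, e := range all {
		if e.Parent == parentKey {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	// a is the parent of b, and b is the parent of c.
	descendants := []Entity{{Key: "b", Parent: "a"}, {Key: "c", Parent: "b"}}
	for _, e := range directChildren(descendants, "a") {
		fmt.Println(e.Key)
	}
}
```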

Actually, this is not supported at all. Nick's answer does work but only if you can specify the entity kind in your query which the OP did not specify:
"Kindless queries cannot include filters on properties. They can, however, filter by Entity Key by passing Entity.KEY_RESERVED_PROPERTY as the property name for the filter. Ascending sorts on Entity.KEY_RESERVED_PROPERTY are also supported."

This is a little late, however it will help anyone with the same problem.
The solution is to first do a keys-only query and take the subset of keys which are direct descendants.
With that subset of keys, you can batch get the desired entities.
I'm unfamiliar with Python, so here's an example in Go:
directDescKeys := make([]*datastore.Key, 0)
q := datastore.NewQuery("A").Ancestor(parentKey).KeysOnly()
for it := q.Run(ctx); ; {
	key, err := it.Next(nil)
	if err == datastore.Done {
		break
	} else if err != nil {
		// handle error (and stop iterating: key is nil here)
	}
	// keep only keys whose immediate parent is parentKey
	if key.Parent().Equal(parentKey) {
		directDescKeys = append(directDescKeys, key)
	}
}
// batch get the entities for the direct-descendant keys
entities := make([]*A, len(directDescKeys))
if err := datastore.GetMulti(ctx, directDescKeys, entities); err != nil {
	// handle error
}

Related

How to get all vector ids from Milvus2.0?

I used to use Milvus1.0, where I could get all IDs by using the get_collection_stats and list_id_in_segment APIs.
These days I am trying Milvus2.0, and I also want to get all IDs from it, but I can't find a way to do it.
Milvus v2.0.x supports queries using boolean expressions.
This can be used to return IDs by checking whether the primary key field is greater than or equal to zero (which assumes your IDs are non-negative).
Let's assume you are using this schema for your collection
(referencing https://github.com/milvus-io/pymilvus/blob/master/examples/hello_milvus.py as of 3/8/2022):
fields = [
    FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim)
]
schema = CollectionSchema(fields, "hello_milvus is the simplest demo to introduce the APIs")
hello_milvus = Collection("hello_milvus", schema, consistency_level="Strong")
Remember to insert something into your collection first... see the pymilvus example.
Here you want to query out all IDs (pk).
You cannot currently list the IDs specific to a segment, but this returns all IDs in a collection:
res = hello_milvus.query(
    expr = "pk >= 0",
    output_fields = ["pk", "embeddings"]
)
for x in res:
    print(x["pk"], x["embeddings"])
I think this is the only way to do it now, since they removed list_id_in_segment

Upsert a document with value from filter

I have a collection with following structure in MongoDB:
{
"userId": String,
"refs": Set<String>
}
I need to update the collection as follows: I want to add a new string to refs for the users that are in the $in filter.
But if a user does not exist, I need to "upsert" them.
In code (golang) it looks like this:
filter := bson.M{
	"userId": bson.M{
		"$in": tokens, // tokens is []string
	},
}
update := bson.M{
	"$addToSet": bson.M{
		"refs": newReference,
	},
}
ctx, _ := newDbOperationContext()
_, err := driver.UpdateMany(ctx, filter, update)
So, for existing users it works fine: the reference is added. But for users that do not exist, nothing happens.
I set opts in driver.UpdateMany(bson, bson, opts...) to options.UpdateOptions.SetUpsert(true), but as a result I got a document without userId:
{
"_id": ObjectId("..."),
"refs": ["new_reference"]
}
So my question is: how can I upsert the new values together with the userId field?
The scale is around 2*10^6 users to update, so I would like to do this with a batch request. Creating the users one by one and then updating them is not an option here, I think.
Thanks for your support!
According to previous questions on SO like this one and this other one, it does not seem possible to perform multiple upserts using only the $in operator, because it will insert only a single document (the one matching the filter):
If no document matches the query criteria, db.collection.update() inserts a single document.
So, as mentioned by @Kartavya, the best option is to perform multiple write operations using BulkWrite.
For that, you append one upsert operation (a WriteModel) per user in tokens, each with that user as its filter, and all of them can share the same $addToSet update operation:
tokens := [...]string{"userId1", "userId3"}
newRef := "refXXXXXXX"
// all docs can use the same $addToSet update operation
updateOp := bson.D{{"$addToSet", bson.D{{"refs", newRef}}}}
// we'll append one update for each userId in tokens as a filter
upserts := []mongo.WriteModel{}
for _, t := range tokens {
	upserts = append(
		upserts,
		mongo.NewUpdateOneModel().SetFilter(bson.D{{"userId", t}}).SetUpdate(updateOp).SetUpsert(true))
}
opts := options.BulkWrite().SetOrdered(false)
res, err := col.BulkWrite(context.TODO(), upserts, opts)
if err != nil {
	log.Fatal(err)
}
fmt.Println(res)
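With roughly 2*10^6 user IDs, it may be more practical to build and send the upserts in smaller batches rather than holding every write model in memory at once. The following is a minimal sketch of the batching step only; the chunk helper and the batch size are assumptions for illustration, and the BulkWrite call itself is left out so the example stays self-contained.

```go
package main

import "fmt"

// chunk splits ids into consecutive batches of at most size elements.
func chunk(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}

func main() {
	tokens := []string{"userId1", "userId2", "userId3", "userId4", "userId5"}
	// batchSize is an assumed value for illustration; tune it for your workload.
	const batchSize = 2
	for i, batch := range chunk(tokens, batchSize) {
		// In real code, build one upsert WriteModel per id in batch
		// and send them with a single BulkWrite call.
		fmt.Printf("batch %d: %v\n", i, batch)
	}
}
```

Each batch would then go through the same loop as above, producing one UpdateOne upsert model per user ID.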
Looking at your use case, I think the best solution is the following:
Since you have a high scale and wish to make batch requests, it is best to use BulkWrite: the db.collection.bulkWrite() method provides the ability to perform bulk insert, update, and remove operations.
Example: https://godoc.org/go.mongodb.org/mongo-driver/mongo#example-Collection-BulkWrite
The example uses the UpdateOne model, but the UpdateMany model is supported as well. It also has a SetUpsert(true) method.
Now for the _id field: if the document you upsert includes an _id field, the new document will have that _id; otherwise MongoDB auto-generates an _id when it inserts the document.
I think it will not be much of a pain to have an _id field in your documents, so that way your problem is solved.
Regarding the scale, I suggest using BulkWrite with UpdateOne or UpdateMany models.
Hope this helps.
In the case of an upsert, if the document is not present, only the update part of the query is inserted into the database. That's why your output looks like that. You can see here.

Dynamically query mongodb with golang

I'm trying to query my MongoDB database using Go (and the mgo library) with only one function, and the method I am currently using is:
err = c.Find(sel(items)).Sort("-createdAt").All(&result)
where items is a map whose keys are the names of the fields I am searching in the db, and whose values are what I want to search by,
and sel() is:
func sel(query map[string]string) bson.M {
	result := make(bson.M, len(query))
	result[ ] = "$in"
	for k, v := range query {
		result[k] = v
	}
	return result
}
Currently it returns all of the results where at least one of the fields matches the input map (a logical OR); however, I would like it to return the logical AND of these fields.
Does anyone have suggestions on how to modify the existing code, or a new way of efficiently querying the database?
Thank you
I don't know what this line is supposed to mean:
result[ ] = "$in"
It is a compile-time error.
But the elements of the query document (the conditions) are combined with a logical AND by default, so this is all it takes:
func sel(query map[string]string) bson.M {
	result := make(bson.M, len(query))
	for k, v := range query {
		result[k] = v
	}
	return result
}
If this gives you all the documents in the collection, then that means all the key-value pairs match all the documents. Experiment with simple filters to see that it works.
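To make the implicit AND concrete, here is a small in-memory sketch (it does not use mgo or talk to MongoDB): a document matches only if every key/value pair in the filter matches. The matchesAll helper and the sample documents are assumptions made up for the illustration.

```go
package main

import "fmt"

// matchesAll reports whether doc satisfies every key/value pair in filter,
// mirroring the implicit AND between the fields of a query document.
func matchesAll(doc, filter map[string]string) bool {
	for k, want := range filter {
		if got, ok := doc[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	docs := []map[string]string{
		{"name": "ann", "city": "Oslo"},
		{"name": "ann", "city": "Rome"},
		{"name": "bob", "city": "Oslo"},
	}
	filter := map[string]string{"name": "ann", "city": "Oslo"}
	for _, d := range docs {
		fmt.Println(d["name"], d["city"], matchesAll(d, filter))
	}
}
```

Note that an empty filter matches every document in this sketch, which is one way to end up with the whole collection.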
Also note that the mgo package also accepts a wide range of maps and structs, not just bson.M. Documentation of Collection.Find() has this to say about the allowed types:
The document may be a map or a struct value capable of being marshalled with bson. The map may be a generic one using interface{} for its key and/or values, such as bson.M, or it may be a properly typed map. Providing nil as the document is equivalent to providing an empty document such as bson.M{}.
So you can use your map which is of type map[string]string without converting it:
err = c.Find(items).Sort("-createdAt").All(&result)

Best practice to add the ID to a datastore entity?

When creating an entity using an IncompleteKey so that each record is inherently unique, what is the best way to add the key back into the record at the time of creation, so that it can be passed around in the structure?
For example, is something like this (untested code) a good idea, using Transactions?
err = datastore.RunInTransaction(c, func(c appengine.Context) error {
	incompleteKey := datastore.NewIncompleteKey(c, ENTITY_TYPE, nil)
	entityKey, err := datastore.Put(c, incompleteKey, &MyStruct)
	if err != nil {
		return err
	}
	MyStruct.SelfID = entityKey.IntID()
	_, err = datastore.Put(c, entityKey, &MyStruct)
	return err
}, nil)
As a followup- I'm guessing this should almost never fail since it will almost never operate over the same incompleteKey?
You don't need to put MyStruct into the DB twice; it's unnecessary overhead. The key is stored as part of the entity and can be retrieved when needed.
There is a good example in the docs of how to store an entity and use its ID as a reference: https://cloud.google.com/appengine/docs/go/datastore/entities#Go_Ancestor_paths
When you want to get the keys for entities, you can use this approach:
https://cloud.google.com/appengine/docs/go/datastore/queries#Go_Retrieving_results - (edited) notice in the example that keys and structs are populated in one operation.
If you query an entity by key, you already know its ID.
So there is no need to have the ID as a separate property. If you want to pass it around with the entity for your business logic, you can create a wrapper: either a generalized one using interface{} for the entity struct, or a strongly typed one (one per entity struct).
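A strongly typed wrapper could look roughly like the sketch below. MyStructWithID, withID and the Name field are hypothetical names chosen for this example, and the ID is a placeholder for whatever entityKey.IntID() returns after a single Put.

```go
package main

import "fmt"

// MyStruct stands in for the entity struct from the question.
type MyStruct struct {
	Name string
}

// MyStructWithID is a hypothetical strongly typed wrapper: the ID travels
// with the entity in application code but is not a stored property.
type MyStructWithID struct {
	ID     int64
	Entity MyStruct
}

// withID pairs an entity with the integer ID taken from its key,
// e.g. the value of entityKey.IntID() after a single Put.
func withID(id int64, e MyStruct) MyStructWithID {
	return MyStructWithID{ID: id, Entity: e}
}

func main() {
	// 42 is a placeholder for the ID the datastore would assign.
	w := withID(42, MyStruct{Name: "example"})
	fmt.Println(w.ID, w.Entity.Name)
}
```

This way the entity is written once, and the ID is attached only in memory for the business logic that needs it.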

Google appengine queries fail with namespacing

I am introducing namespacing into my application, but I have run into an issue with one of my existing queries that performs the following operation in order to determine whether or not an entity exists for the given key.
// c is of type context.Context
c, _ = appengine.Namespace(c, "name")
k := datastore.NewKey(c, "Kind", "", id, nil)
q := datastore.NewQuery("Kind").Filter("__key__ =", k).KeysOnly()
keys, err := q.GetAll(c, nil)
When this command is executed with a namespace applied to the context, it gives back the following error:
datastore_v3 API error 1: __key__ filter namespace is but query namespace is db
I could just use a Get query instead, but I don't need to actually retrieve the entity at all. Plus, keys-only queries are free!
Update
It seems that all queries are failing after I have introduced namespacing. The documentation doesn't mention any sort of special treatment for the indices:
https://cloud.google.com/appengine/docs/go/multitenancy/multitenancy
"By default, the datastore uses the current namespace for datastore requests. The API applies this current namespace to datastore.Key objects when they are created. Therefore, you need to be careful if an application stores Key objects in serialized forms, since the namespace is preserved in those serializations."
Using namespaces with the Datastore
https://cloud.google.com/appengine/docs/go/multitenancy/multitenancy#Go_Using_namespaces_with_the_Datastore
