How to get all vector ids from Milvus2.0? - database

I used to use Milvus 1.0, where I could get all IDs with the get_collection_stats and list_id_in_segment APIs.
I am now trying Milvus 2.0 and want to get all IDs there as well, but I can't find a way to do it.

Milvus v2.0.x supports queries using boolean expressions.
This can be used to return ids by checking whether the primary key field is greater than or equal to zero, which matches every entity as long as all ids are non-negative.
Let's assume you are using this schema for your collection.
Referencing https://github.com/milvus-io/pymilvus/blob/master/examples/hello_milvus.py as of 3/8/2022:
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

dim = 8  # example value only; use the dimension of your own embeddings
fields = [
    FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim)
]
schema = CollectionSchema(fields, "hello_milvus is the simplest demo to introduce the APIs")
hello_milvus = Collection("hello_milvus", schema, consistency_level="Strong")
Remember to insert something into your collection first (see the pymilvus example).
Here you want to query all ids (pk).
You cannot currently list the ids of a specific segment, but this returns all ids in the collection:
res = hello_milvus.query(
    expr="pk >= 0",
    output_fields=["pk", "embeddings"]
)
for x in res:
    print(x["pk"], x["embeddings"])
I think this is the only way to do it now, since list_id_in_segment was removed.
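If only the ids are needed, the query above can be trimmed to request just the primary key. Here is a minimal sketch of that idea, assuming the schema above and non-negative keys. The helper name all_primary_keys and the FakeCollection class are hypothetical: the latter only stands in for a real pymilvus Collection so the snippet runs without a server.

```python
def all_primary_keys(collection, pk_field="pk"):
    """Return every primary key in `collection`, assuming all keys are >= 0."""
    rows = collection.query(
        expr=f"{pk_field} >= 0",    # matches every entity with a non-negative key
        output_fields=[pk_field],   # ask for the key only, not the vectors
    )
    return [row[pk_field] for row in rows]


class FakeCollection:
    """Hypothetical stand-in so the sketch runs without a Milvus server."""

    def __init__(self, keys):
        self._keys = keys

    def query(self, expr, output_fields):
        # Mimics the list-of-dicts shape returned by Collection.query.
        return [{"pk": key} for key in self._keys]


ids = all_primary_keys(FakeCollection([0, 1, 2]))
print(ids)
```

With a real collection you would pass hello_milvus instead of the stand-in.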

Related

Django query bases on greater date

I want to know how this filter can be done efficiently with Django queries. Essentially, I have the following two classes:
class Act(models.Model):
    Date = models.DateTimeField()
    # 'Doc' is given as a string because the class is defined further down
    Doc = models.ForeignKey('Doc', on_delete=models.CASCADE)
    ...

class Doc(models.Model):
    ...
So one Doc can have several Acts, and for each Doc I want to get the Act with the latest Date. I'm only interested in the Act objects.
For example, if I have
act1 = (Date=2021-01-01, Doc=doc1)
act2 = (Date=2021-01-02, Doc=doc1)
act3 = (Date=2021-01-03, Doc=doc2)
act4 = (Date=2021-01-04, Doc=doc2)
act5 = (Date=2021-01-05, Doc=doc2)
I want to get [act2, act5] (the Act with Doc=doc1 with the latest Date and the Act with Doc=doc2 with the latest Date).
My only solution so far is to loop over the Docs.
Thank you so much
You can do this with one or two queries: the first retrieves the primary key of the latest Act per Doc, and the second then retrieves those Acts:
from django.db.models import OuterRef, Subquery

last_acts = Doc.objects.annotate(
    latest_act=Subquery(
        Act.objects.filter(
            Doc_id=OuterRef('pk')
        ).values('pk').order_by('-Date')[:1]
    )
).values('latest_act')
and then we can retrieve the corresponding Acts:
Act.objects.filter(pk__in=last_acts)
Depending on the database, it might be more efficient to first retrieve the primary keys and then make an extra query:
Act.objects.filter(pk__in=list(last_acts))
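To make clear what the subquery computes, here is a plain-Python sketch of the same "latest row per group" selection, run on the sample data from the question. It is only an illustration of the logic, not Django code, and the function name latest_act_per_doc is made up for this example.

```python
from datetime import date

# The sample rows from the question: (act name, Date, Doc).
ACTS = [
    ("act1", date(2021, 1, 1), "doc1"),
    ("act2", date(2021, 1, 2), "doc1"),
    ("act3", date(2021, 1, 3), "doc2"),
    ("act4", date(2021, 1, 4), "doc2"),
    ("act5", date(2021, 1, 5), "doc2"),
]


def latest_act_per_doc(acts):
    """Return the names of the acts with the latest date for each doc."""
    latest = {}  # doc -> (act name, date) of the newest act seen so far
    for name, when, doc in acts:
        if doc not in latest or when > latest[doc][1]:
            latest[doc] = (name, when)
    return sorted(name for name, _ in latest.values())


print(latest_act_per_doc(ACTS))
```

For the sample data this selects act2 and act5, which is what the two-step query returns from the database.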

OrientDB - find "orphaned" binary records

I have some images stored in the default cluster of my OrientDB database. I stored them using the code from the documentation for the case of multiple ORecordBytes (for large content): http://orientdb.com/docs/2.1/Binary-Data.html
So I have two types of objects in my default cluster: binary data records, and ODocuments whose 'data' field lists the binary data records that belong to them.
The RIDs of some of the ODocument records are used in other classes. The remaining records are orphaned, and I would like to be able to retrieve them.
My idea was to use
select from cluster:default where #rid not in (select myField from MyClass)
But the problem is that this also returns the binary data records, and I just want the records with the 'data' field.
In addition, I would prefer a cleaner query, because I don't think the "not in" clause is something that should be encouraged. Is there something like a JOIN that returns the records that are not joined to anything?
Can you help me, please?
To solve my problem I did the following. However, I don't know if it is the right (most optimized) way to do it:
I used the following SQL query:
SELECT rid FROM (FIND REFERENCES (SELECT FROM CLUSTER:default)) WHERE referredBy = []
In Java, I execute it using the OCommandSQL/OCommandRequest pair and get back an OrientDynaElementIterable. I iterate over it to get an OrientVertex, which contains another OrientVertex, from which I retrieve the RID of the orphan.
Now, here is some code in case it helps someone, assuming that you have an OrientGraphNoTx or an OrientGraph in the 'graph' variable :)
String cmd = "SELECT rid FROM (FIND REFERENCES (SELECT FROM CLUSTER:default)) WHERE referredBy = []";
List<String> orphanedRid = new ArrayList<String>();
OCommandRequest request = graph.command(new OCommandSQL(cmd));
OrientDynaElementIterable objects = request.execute();
Iterator<Object> iterator = objects.iterator();
while (iterator.hasNext()) {
    OrientVertex obj = (OrientVertex) iterator.next();
    OrientVertex orphan = obj.getProperty("rid");
    orphanedRid.add(orphan.getIdentity().toString());
}

Sitecore Solr Search by Multilist with Values from Another Multilist

I have a set of product items. Each product item has a multilist field that points to a set of product type items. When on a product page, I want to show a paged list of related items. These should be items that share a product type with the currently selected item. I'm running into some trouble because products can have multiple types. I need to split the type list on the current item and check that against the list of products in an expression. For some reason split and contains are throwing runtime exceptions and I can't really figure out why. I saw some things about the predicate builder being used for dynamic queries and I will try to use that with what I currently have but I'd like to know why this can't be done straight in the where clause.
Another issue I ran into is that the list of ids stored in Solr is being stripped of its '{', '}', and '-' characters.
If you are already on the product page, I assume you already have the product item, and that item should have a "ProductType" multilist field. You can use Sitecore.Data.Fields.MultilistField to avoid having to split the raw values yourself.
You can then use Sitecore's Predicate Builder to build out your search predicate; I assume you want to find all products that share at least one product type. You should adjust this search logic as needed. I am using the ObjectIndexerKey (see more here -> http://www.sitecore.net/Learn/Blogs/Technical-Blogs/Sitecore-7-Development-Team/Posts/2013/05/Sitecore-7-Predicate-Builder.aspx) to go after a named field, but you should build out a proper search model and actually define ProductTypes as a List<ID> or something similar. You may need to add other conditions to the search predicate as well, such as path or templateid, to limit your results. After that you can just execute the search and consume the results.
As far as Solr stripping the special characters, this is expected behavior based on the Analyzer used on the field. Sitecore and Solr will apply the proper query time analyzers to match things up so you shouldn't have to worry about formatting as long as the proper types are used.
// Start from False: OR-ing conditions onto True would match every item.
var pred = PredicateBuilder.False<SearchResultItem>();
Sitecore.Data.Fields.MultilistField multilistField = Sitecore.Context.Item.Fields["ProductTypes"];
if (multilistField != null)
{
    foreach (ID id in multilistField.TargetIDs)
    {
        pred = pred.Or(x => ((ID)x[(ObjectIndexerKey)"ProductType"]).Contains(id));
    }
}
ISearchIndex _searchIndex = ContentSearchManager.GetIndex("sitecore_master_index"); // change to proper search index
using (var context = _searchIndex.CreateSearchContext())
{
    // The queryable type must match the type the predicate was built for.
    var relatedProducts = context.GetQueryable<SearchResultItem>().Where(pred);
    foreach (var relatedProduct in relatedProducts)
    {
        // do something here with search results
    }
}
Just an improvement to @Matt Gartman's code.
The error ("ID doesn't contain a definition for Contains") keeps popping up because .Contains is not a method of the ID type. I recommend converting the value to a string, as below:
foreach (ID id in multilistField.TargetIDs)
{
    pred = pred.Or(x => (Convert.ToString((ID)x[(ObjectIndexerKey)"ProductType"]).Contains(id.ToString())));
}

Adding additional fields to a return from a query in ndb but not working

courses, next_cursor, more_results = (
    CourseTable.query()
    .order(CourseTable.name)
    .fetch_page(2,
                start_cursor=current_cursor,
                projection=['name', 'abbrev_name'])
)
course_list = format_list(courses)
logging.info(course_list)

def format_list(coursequery):
    full_list = []
    for x in coursequery:
        keyid = x.key.id()
        x.school_list = ICTable.school_nameabbrev_from_courseid(courseid=keyid)
        x.teacher_list = TCTable.teacherfilter_from_courseid(courseid=keyid)
        x.courseid = keyid
        full_list.append(x)
    return full_list
With the logic above, I run a projection query against the ndb Google Datastore and then post-process the results (in format_list) by adding extra fields retrieved from other entities. I append each updated entity to a list, but when I log the list with logging.info, I do NOT see the added fields (school_list and teacher_list), only the fields from the original projection query. Does anybody have any idea why? Thank you so much.
I think you should first convert the returned object into a dict and then add the additional fields.
You can use this to convert to a dict:
https://developers.google.com/appengine/docs/python/ndb/modelclass#Model_to_dict
A projection query returns entities intended to be read-only. An easy workaround would be to convert the data to a dict, as omair says. Try d = x._to_dict().
It actually is saving the data, but the str method on projections only prints the projected fields.
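A rough sketch of the dict approach suggested above. FakeCourse is a hypothetical stand-in for a projected ndb entity so that the snippet runs on its own, and the two lookup callables stand in for the ICTable/TCTable helpers from the question. With real entities you would read the id with x.key.id() and call to_dict() on the entity itself.

```python
class FakeCourse:
    """Hypothetical stand-in for an entity returned by a projection query."""

    def __init__(self, key_id, name, abbrev_name):
        self.key_id = key_id
        self.name = name
        self.abbrev_name = abbrev_name

    def to_dict(self):
        # Like ndb's Model.to_dict, returns only the stored properties.
        return {"name": self.name, "abbrev_name": self.abbrev_name}


def format_list(courses, schools_for, teachers_for):
    """Copy each course into a plain dict and attach the extra fields there."""
    full_list = []
    for course in courses:
        row = course.to_dict()
        row["courseid"] = course.key_id
        row["school_list"] = schools_for(course.key_id)
        row["teacher_list"] = teachers_for(course.key_id)
        full_list.append(row)
    return full_list


rows = format_list(
    [FakeCourse(1, "Algebra", "ALG")],
    schools_for=lambda course_id: ["School %d" % course_id],
    teachers_for=lambda course_id: [],
)
print(rows)
```

Because the rows are ordinary dicts, logging them shows the added keys as well as the projected ones.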

Google App Engine: IN filter and argument position

I have a table where one of the columns contains a list. I want to know if it is possible to select all rows where the list contains a specific element.
More concretely, I have a guests column containing a list of strings and I want to know if a specific guest string is part of this list. I would like to write a query like this:
q = TableName.gql('WHERE :g IN guests', g=guest)
It seems, however, that I can't put variables in this position. For instance, this query (where ownerid is a string and not a string list) is also disallowed:
q = TableName.gql('WHERE :g = ownerid', g=guest)
I seem to have to write it this way:
q = TableName.gql('WHERE ownerid = :g', g=guest)
Thus I have the following questions:
How can I construct a query that gets rows where a list-cell contains a specific member?
Are arguments for GQL queries restricted to the right-hand side of operators? What is the restriction?
I am using Google App Engine with Python 2.7. Thanks!
You have misunderstood what the IN operator is for. It is not for querying against a repeated field: you just use the normal = for that. IN is for querying against a list of values, e.g. guest IN [1, 2, 3, 4]. Your query should be:
q = TableName.gql('WHERE guests = :g', g=guest)
or better, since GQL doesn't give you anything that the standard DB syntax doesn't:
q = TableName.all().filter('guests =', guest)
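The difference between the two operators can be modelled in a few lines of plain Python. This is only an illustration of the matching rules described above, not datastore code, and both function names are made up for the example.

```python
def equality_filter_matches(stored_values, wanted):
    """Models 'guests = :g' on a list property: any stored element may match."""
    return any(value == wanted for value in stored_values)


def in_filter_matches(stored_value, candidates):
    """Models 'ownerid IN :candidates': the stored value may equal any candidate."""
    return any(stored_value == candidate for candidate in candidates)


row_guests = ["alice", "bob"]
print(equality_filter_matches(row_guests, "bob"))
print(in_filter_matches("carol", ["alice", "bob"]))
```

The first call is the case from the question (one guest checked against the stored list) and the second is what IN is meant for (one stored value checked against several candidates).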
