OrientDB - find "orphaned" binary records - file

I have some images stored in the default cluster in my OrientDB database. I stored them by implementing the code given by the documentation in the case of the use of multiple ORecordByte (for large content): http://orientdb.com/docs/2.1/Binary-Data.html
So, I have two types of object in my default cluster. Binary datas and ODocument whose field 'data' lists to the different record of binary datas.
Some of the ODocument records' RID are used in some other classes. But, the other records are orphanized and I would like to be able to retrieve them.
My idea was to use
select from cluster:default where #rid not in (select myField from MyClass)
But the problem is that I retrieve the other binary datas and I just want the record with the field 'data'.
In addition, I prefer to have a prettier request because I don't think the "not in" clause is really something that should be encouraged. Is there something like a JOIN which return records that are not joined to anything?
Can you help me please?

To resolve my problem, I did like that. However, I don't know if it is the right way (the more optimized one) to do it:
I used the following SQL request:
SELECT rid FROM (FIND REFERENCES (SELECT FROM CLUSTER:default)) WHERE referredBy = []
In Java, I execute it with the use of the couple OCommandSQL/OCommandRequest and I retrieve an OrientDynaElementIterable. I just iterate on this last one to retrieve an OrientVertex, contained in another OrientVertex, from where I retrieve the RID of the orpan.
Now, here is some code if it can help someone, assuming that you have an OrientGraphNoTx or an OrientGraph for the 'graph' variable :)
String cmd = "SELECT rid FROM (FIND REFERENCES (SELECT FROM CLUSTER:default)) WHERE referredBy = []";
List<String> orphanedRid = new ArrayList<String>();
OCommandRequest request = graph.command(new OCommandSQL(cmd));
OrientDynaElementIterable objects = request.execute();
Iterator<Object> iterator = objects.iterator();
while (iterator.hasNext()) {
OrientVertex obj = (OrientVertex) iterator.next();
OrientVertex orphan = obj.getProperty("rid");
orphanedRid.add(orphan.getIdentity().toString());
}

Related

How to get all vector ids from Milvus2.0?

I used to use Milvus1.0. And I can get all IDs from Milvus1.0 by using get_collection_stats and list_id_in_segment APIs.
These days I am trying Milvus2.0. And I also want to get all IDs from Milvus2.0. But I don't find any ways to do it.
milvus v2.0.x supports queries using boolean expressions.
This can be used to return ids by checking if the field is greater than zero.
Let's assume you are using this schema for your collection.
referencing: https://github.com/milvus-io/pymilvus/blob/master/examples/hello_milvus.py
as of 3/8/2022
fields = [
FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=False),
FieldSchema(name="random", dtype=DataType.DOUBLE),
FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim)
]
schema = CollectionSchema(fields, "hello_milvus is the simplest demo to introduce the APIs")
hello_milvus = Collection("hello_milvus", schema, consistency_level="Strong")
Remember to insert something into your collection first... see the pymilvus example.
Here you want to query out all ids (pk)
You cannot currently list ids specific to a segment, but this would return all ids in a collection.
res = hello_milvus.query(
expr = "pk >= 0",
output_fields = ["pk", "embeddings"]
)
for x in res:
print(x["pk"], x["embeddings"])
I think this is the only way to do it now, since they removed list_id_in_segment

What is a dynamic apex in salesforce?

I am in the learning stages of Salesforce Apex. I have read the topic of Dynamic Apex and was not able to understand the concept. Can someone explain it how to deal with it and in which scenarios it is best to use?
Thanks in advance.
Use case 1:
You are developing a page that reads the salesforce object meta data to display the object records to the user. You want to use the describe global methods but you dont know how to combine standard SOQL with the generic SObject type.
Standard SOQL eg
Person__c [] persons = [SELECT Id, Name, Age__c, Height__c FROM Person__c];
But the describe global metadata methods returns a SObject types.
Solution:
Use the describe global methods to get a list of objects, then further get all the fields on that object. Build a SELECT statement in a local string variable with all the fields then execute the query with Database.query().
string objectfullname = 'scenario__c';
Schema.SObjectType targetType = Schema.getGlobalDescribe().get('scenario__c');
if (targetType == null) {
system.debug('Type not found: '+objectFullname);
throw new TypeNotFoundException(objectFullName);
}
Schema.DescribeSObjectResult typedescription = targetType.getDescribe();
Map<String, schema.Sobjectfield> resultMap = typedescription.Fields.getMap();
string query = 'SELECT ' + string.join(new List<string>(resultMap.keySet()), ',') + ' FROM '+ objectfullname + ' LIMIT 100';
sobject [] records = Database.query(query);
Use Case 2
You want to loosely couple your code with custom objects in a beta managed packaged so that the managed packaged can be uninstalled and upgraded.
Solution
When you use the Database.query() method, your code is not compiled against the custom object so it can be re-installed without any need for commenting out code to remove the dependency.
Use Case 3
You have a trigger that copies records to another custom object after insert according to an dynamic field mapping schema. You can't code it in the standard way [SELECT ...] because you only know what object you are inserting to at run time.
Solution
Again, use describe global methods & Database.query to get the records and type information then you can insert into the target object like normal DML.
sobject newRecord = ...
for (integer i = 0; i < fieldCount; i++) {
newRecord.put(fields[i],values[i]);
}
insert newRecord;
If you are doing bulk inserts, like always, make sure that you dont put DML (insert, update) statements in a loop.

Is there any way to do a Insert or Update / Merge / Upsert in LLBLGen

I'd like to do an upmerge using LLBLGen without first fetching then saving the entity.
I already found the possibility to update without fetching the entity first, but then I have to know it is already there.
Updating entries would be about as often as inserting a new entry.
Is there a possibility to do this in one step?
Would it make sense to do it in one step?
Facts:
LLBLgen Pro 2.6
SQL Server 2008 R2
.NET 3.5 SP1
I know I'm a little late for this, but As I remember working with LLBLGenPro, it is totally possible and one of its beauties is everithing is possible!
I don't have my samples, but I'm pretty sure you there is a method named UpdateEntitiesDirectly that can be used like this:
// suppose we have Product and Order Entities
using (var daa = new DataAccessAdapter())
{
int numberOfUpdatedEntities =
daa.UpdateEntitiesDirectly(OrderFields.ProductId == 23 && OrderFields.Date > DateTime.Now.AddDays(-2));
}
When using LLBLGenPro we were able to do pretty everything that is possible with an ORM framework, it's just great!
It also has a method to do a batch delete called DeleteEntitiesDirectly that may be usefull in scenarios that you need to delete an etity and replace it with another one.
Hope this is helpful.
I think you can achieve what you're looking for by using EntityCollection. First fetch the entities you want to update by FetchEntityCollection method of DataAccessAdapter then, change anything you want in that collection, insert new entities to it and save it using DataAccessAdapter, SaveCollection method. this way existing entities would be updated and new ones would be inserted to the Database. For example in a product order senario in which you want to manipulate orders of a specified product then you can use something like this:
int productId = 23;
var orders = new EntityCollection<OrderEntity>();
using (DataAccessAdapter daa = new DataAccessAdapter())
{
daa.FetchEntityCollection(orders, new RelationPredicateBucket(OrderFields.ProductId == productId))
foreach(var order in orders)
{
order.State = 1;
}
OrderEntity newOrder = new OrderEntity();
newOrder.ProductId == productId;
newOrder.State = 0;
orders.Add(newOrder);
daa.SaveEntityCollection(orders);
}
As far as I know, this is not possible, and could not be possible.
If you were to just call adapter.Save(entity) on an entity that was not fetched, the framework would assume it was new. If you think about it, how could the framework know whether to emit an UPDATE or an INSERT statement? No matter what, something somewhere would have to query the database to see if the row exists.
It would not be too difficult to create something that did this more or less automatically for single entity (non-recursive) saves. The steps would be something like:
Create a new entity and set it's fields.
Attempt to fetch an entity of the same type using the PK or a unique constraint (there are other options as well, but none as uniform)
If the fetch fails, just save the new entity (INSERT)
If the fetch succeeds, map the fields of the created entity to the fields of the fetched entity.
Save the fetched entity (UPDATE).

Entity Framework efficient querying

Lets say I have a model, Article that has a large amount of columns and the database contains more than 100,000 rows. If I do something like var articles = db.Articles.ToList() it is retrieving the entire article model for each article in the database and holding it in memory right?
So if I am populating a table that only shows the date of the entry and it's title is there a way to only retrieve just these columns from the database using the entity framework, and would it be more efficient?
According to this,
There is a cost required to track returned objects in the object
context. Detecting changes to objects and ensuring that multiple
requests for the same logical entity return the same object instance
requires that objects be attached to an ObjectContext instance. If you
do not plan to make updates or deletes to objects and do not require
identity management , consider using the NoTracking merge options when
you execute queries.
it looks like I should use NoTracking since the data isn't being changed or deleted, only displayed. So my query now becomes var articles = db.Articles.AsNoTracking().ToList(). Are there other things I should do to make this more efficient?
Another question I have is that according to this answer, using .Contains(...) will cause a large performance drop when dealing with a large database. What is the recommended method to use to search through the entries in a large database?
It's called a projection and just translates into a SELECT column1, column2, ... in SQL:
var result = db.Articles
.Select(a => new
{
Date = a.Date,
Title = a.Title
})
.ToList();
Instead of a => new { ... } (creates a list of "anonymous" objects) you can also use a named helper class (or "view model"): a => new MyViewModel { ... } that contains only the selected properties (but you can't use a => new Article { ... } as an entity itself).
For such a projection you don't need AsNoTracking() because projected data are not tracked anyway, only full entity objects are tracked.
Instead of using Contains the more common way is to use Where like:
var date = DateTime.Now.AddYears(-1);
var result = db.Articles
.Where(a => date <= a.Date)
.Select(a => new
{
Date = a.Date,
Title = a.Title
})
.ToList();
This would select only the articles that are not older than a year. The Where is just translated into a SQL WHERE statement and the filter is performed in the database (which is as fast as the SQL query is, depending on table size and proper indexing, etc.). Only the result of this filter is loaded into memory.
Edit
Refering to your comment below:
Don't confuse IEnumerable<T>.Contains(T t) with string.Contains(string subString). The answer you have linked in your question talks about the first version of Contains. If you want to search for articles that have the string "keyword" in the text body you need the second Contains version:
string keyword = "Entity Framework";
var result = db.Articles
.Where(a => a.Body.Contains(keyword))
.Select(a => new
{
Date = a.Date,
Title = a.Title
})
.ToList();
This will translate into something like WHERE Body like N'%Entity Framework%' in SQL. The answer about the poor performance of Contains doesn't apply to this version of Contains at all.

Hitting the 2100 parameter limit (SQL Server) when using Contains()

from f in CUSTOMERS
where depts.Contains(f.DEPT_ID)
select f.NAME
depts is a list (IEnumerable<int>) of department ids
This query works fine until you pass a large list (say around 3000 dept ids) .. then I get this error:
The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100.
I changed my query to:
var dept_ids = string.Join(" ", depts.ToStringArray());
from f in CUSTOMERS
where dept_ids.IndexOf(Convert.ToString(f.DEPT_id)) != -1
select f.NAME
using IndexOf() fixed the error but made the query slow. Is there any other way to solve this? thanks so much.
My solution (Guids is a list of ids you would like to filter by):
List<MyTestEntity> result = new List<MyTestEntity>();
for(int i = 0; i < Math.Ceiling((double)Guids.Count / 2000); i++)
{
var nextGuids = Guids.Skip(i * 2000).Take(2000);
result.AddRange(db.Tests.Where(x => nextGuids.Contains(x.Id)));
}
this.DataContext = result;
Why not write the query in sql and attach your entity?
It's been awhile since I worked in Linq, but here goes:
IQuery q = Session.CreateQuery(#"
select *
from customerTable f
where f.DEPT_id in (" + string.Join(",", depts.ToStringArray()) + ")");
q.AttachEntity(CUSTOMER);
Of course, you will need to protect against injection, but that shouldn't be too hard.
You will want to check out the LINQKit project since within there somewhere is a technique for batching up such statements to solve this issue. I believe the idea is to use the PredicateBuilder to break the local collection into smaller chuncks but I haven't reviewed the solution in detail because I've instead been looking for a more natural way to handle this.
Unfortunately it appears from Microsoft's response to my suggestion to fix this behavior that there are no plans set to have this addressed for .NET Framework 4.0 or even subsequent service packs.
https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=475984
UPDATE:
I've opened up some discussion regarding whether this was going to be fixed for LINQ to SQL or the ADO.NET Entity Framework on the MSDN forums. Please see these posts for more information regarding these topics and to see the temporary workaround that I've come up with using XML and a SQL UDF.
I had similar problem, and I got two ways to fix it.
Intersect method
join on IDs
To get values that are NOT in list, I used Except method OR left join.
Update
EntityFramework 6.2 runs the following query successfully:
var employeeIDs = Enumerable.Range(3, 5000);
var orders =
from order in Orders
where employeeIDs.Contains((int)order.EmployeeID)
select order;
Your post was from a while ago, but perhaps someone will benefit from this. Entity Framework does a lot of query caching, every time you send in a different parameter count, that gets added to the cache. Using a "Contains" call will cause SQL to generate a clause like "WHERE x IN (#p1, #p2.... #pn)", and bloat the EF cache.
Recently I looked for a new way to handle this, and I found that you can create an entire table of data as a parameter. Here's how to do it:
First, you'll need to create a custom table type, so run this in SQL Server (in my case I called the custom type "TableId"):
CREATE TYPE [dbo].[TableId] AS TABLE(
Id[int] PRIMARY KEY
)
Then, in C#, you can create a DataTable and load it into a structured parameter that matches the type. You can add as many data rows as you want:
DataTable dt = new DataTable();
dt.Columns.Add("id", typeof(int));
This is an arbitrary list of IDs to search on. You can make the list as large as you want:
dt.Rows.Add(24262);
dt.Rows.Add(24267);
dt.Rows.Add(24264);
Create an SqlParameter using the custom table type and your data table:
SqlParameter tableParameter = new SqlParameter("#id", SqlDbType.Structured);
tableParameter.TypeName = "dbo.TableId";
tableParameter.Value = dt;
Then you can call a bit of SQL from your context that joins your existing table to the values from your table parameter. This will give you all records that match your ID list:
var items = context.Dailies.FromSqlRaw<Dailies>("SELECT * FROM dbo.Dailies d INNER JOIN #id id ON d.Daily_ID = id.id", tableParameter).AsNoTracking().ToList();
You could always partition your list of depts into smaller sets before you pass them as parameters to the IN statement generated by Linq. See here:
Divide a large IEnumerable into smaller IEnumerable of a fix amount of item

Resources