Order by best match of string column in EF Core - sql-server

I am implementing a search feature for the users in our API. The search queries our database for the first N users matching the search term. I want to order the users by "best match" so that the most relevant user comes first.
What I'd like to do is something like:
var users = await _dbContext.Users
.IncludeUserData()
.Where(u => u.Name.Contains(searchTerm))
.OrderBy(u => u.Name.IndexOf(searchTerm)) // <- this line is not possible
.ToListAsync();
That is, a user whose name contains the search term early should be ordered before a user whose name contains it late.
E.g.
Simon Carlsson should come before Carl Simonsson if the searchTerm is "Simon".
I am using SQL Server as the DB provider.
How would I achieve an order by query where users with names better matching the searchTerm are sorted higher up in the list?

After some more searching, I found this way of mapping functions from the DB provider:
[DbFunction("CHARINDEX", IsBuiltIn = true)]
public static long CHARINDEX(string substring, string str)
{
throw new NotImplementedException();
}
Put this in your DbContext class. It binds to the CHARINDEX function in SQL Server.
https://learn.microsoft.com/en-us/ef/core/querying/user-defined-function-mapping
Then use it to sort the query:
.OrderBy(u => DbContext.CHARINDEX(searchTerm, u.Name))
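The ordering idea itself can be tried outside EF Core. The following is a minimal sketch, assuming Python's sqlite3 module with SQLite's instr() standing in for SQL Server's CHARINDEX (note the swapped argument order: instr(string, substring)). The users table and the search_users helper are hypothetical names used only for illustration.

```python
import sqlite3

def search_users(conn, term, limit=10):
    """Return up to `limit` names containing `term`, earliest match first."""
    # instr(name, term) is the 1-based position of term in name, 0 if absent.
    rows = conn.execute(
        "SELECT name FROM users "
        "WHERE instr(name, ?) > 0 "
        "ORDER BY instr(name, ?), name "
        "LIMIT ?",
        (term, term, limit),
    ).fetchall()
    return [row[0] for row in rows]

# Hypothetical sample data matching the example in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("Carl Simonsson",), ("Simon Carlsson",), ("Anna Berg",)],
)
print(search_users(conn, "Simon"))
```

Sorting on the match position, with the name as a tie-breaker, keeps the result order stable when several names match at the same offset.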

Have you tried the LIKE operator?
You may find this useful
Entity framework EF.Functions.Like vs string.Contains

Related

Optimize IQueryable query to let EF generate a single SQL query instead of multiple. Child collection of an entity must contain a custom collection

The goal is a single query generated by EF that MSSQL executes in one go. The current implementation works correctly but is not optimal. To be more specific, the SQL Server Profiler logs show an additional exec sp_executesql query for each company to fetch its data (in the example below, its Products).
Say, we have selected product ids.
List<int> selectedProductIds = new List<int> { 1, 2, 3 };
We filter a collection of Companies to get only those companies that have ALL selected products.
We build the query dynamically, extending it as many times as needed thanks to the IQueryable interface.
Imagine x is of type Company and contains a collection of Products.
if (selectedProductIds.Count > 0)
{
query = query.Where(x => selectedProductIds.All(id => x.Products.Select(p => p.ProductId).Contains(id)));
}
Is there any way to rewrite the predicate using LINQ? I know I can make a dynamic SQL query myself anytime, but I am trying to understand what I miss in terms of EF/LINQ. Thanks!
The version of Entity Framework Core is 2.1.
UPD:
Company products are unique and never duplicated within a company entity. No need to make distinct.
Try the following query:
if (selectedProductIds.Count > 0)
{
query = query.Where(x => x.Products
.Where(p => selectedProductIds.Contains(p.ProductId))
.Count() == selectedProductIds.Count
);
}
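The counting trick in the answer corresponds to a GROUP BY ... HAVING COUNT query in SQL. Below is a minimal sketch using Python's sqlite3, assuming a hypothetical company_products(company_id, product_id) table in which each pair is unique, as stated in the update to the question. It is not the SQL that EF Core generates, only an illustration of why comparing counts is enough.

```python
import sqlite3

def companies_with_all_products(conn, selected_ids):
    """Return ids of companies that own every product in selected_ids."""
    ids = sorted(set(selected_ids))
    if not ids:
        # No selection means no filter, mirroring the `Count > 0` guard.
        sql = "SELECT DISTINCT company_id FROM company_products ORDER BY company_id"
        return [row[0] for row in conn.execute(sql)]
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        "SELECT company_id FROM company_products "
        f"WHERE product_id IN ({placeholders}) "
        "GROUP BY company_id "
        # Pairs are unique, so a full match has exactly len(ids) rows.
        "HAVING COUNT(*) = ? "
        "ORDER BY company_id"
    )
    return [row[0] for row in conn.execute(sql, (*ids, len(ids)))]

# Hypothetical sample data: company 2 lacks product 3.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company_products ("
    "company_id INTEGER, product_id INTEGER, "
    "PRIMARY KEY (company_id, product_id))"
)
conn.executemany(
    "INSERT INTO company_products VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3)],
)
print(companies_with_all_products(conn, [1, 2, 3]))
```

The comparison only holds when the selected ids are distinct, which is why the sketch deduplicates them first.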

Controlling NHibernate search query output regarding parameters

When you use NHibernate to "fetch" a mapped object, it outputs a SELECT query to the database. It outputs this using parameters; so if I query a list of cars based on tenant ID and name, I get:
select Name, Location from Car where tenantID=@p0 and Name=@p1
This has the nice benefit of our database creating (and caching) a query plan based on this query and the result, so when it is run again, the query is much faster as it can load the plan from the cache.
The problem with this is that we are a multi-tenant database, and almost all of our indexes are partition aligned. Our tenants have vastly different data sets; one tenant could have 5 cars, while another could have 50,000. And so because NHibernate does this, it has the net effect of our database creating and caching a plan for the FIRST tenant that runs it. This plan is likely not efficient for subsequent tenants who run the query.
What I WANT to do is force NHibernate NOT to parameterize certain parameters; namely, the tenant ID. So I'd want the query to read:
select Name, Location from Car where tenantID=55 and Name=@p0
I can't figure out how to do this in the HBM.XML mapping. How can I dictate to NHibernate how to use parameters? Or can I just turn parameters off altogether?
OK everyone, I figured it out.
The way I did it was overriding the SqlClientDriver with my own custom driver that looks like this:
using System.Data;
using System.Text.RegularExpressions;
using NHibernate.Driver;

public class CustomSqlClientDriver : SqlClientDriver
{
    // Matches the tenant ID predicate bound to the first parameter
    private static readonly Regex _tenantIDReplacer = new Regex(@"\.TenantID=(@p0)", RegexOptions.Compiled);

    public override void AdjustCommand(IDbCommand command)
    {
        var m = _tenantIDReplacer.Match(command.CommandText);
        if (!m.Success)
            return;
        // get the name of the first parameter
        var parameterName = m.Groups[1].Value;
        // find the parameter value
        var tenantID = (IDbDataParameter)command.Parameters[parameterName];
        var valueOfTenantID = tenantID.Value;
        // now replace the parameter with the actual tenant ID
        command.CommandText = _tenantIDReplacer.Replace(command.CommandText, ".TenantID=" + valueOfTenantID);
    }
}
I override the AdjustCommand method and use a Regex to replace the tenantID. This works; not sure if there's a better way, but I really didn't want to have to open up NHibernate and start messing with core code.
You'll have to register this custom driver in the connection.driver_class property of the SessionFactory upon initialization.
Hope this helps somebody!
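The rewrite step can be examined in isolation. This is a minimal sketch of the same regex substitution in Python, with no NHibernate involved; inline_tenant_id and the sample SQL are hypothetical, and the integer check is an added assumption (not part of the original driver) to keep arbitrary text out of the command.

```python
import re

# Matches ".TenantID=@p0"; the \b keeps "@p01" from matching.
_TENANT_PARAM = re.compile(r"\.TenantID=(@p0)\b")

def inline_tenant_id(command_text, parameters):
    """Replace the tenant ID parameter with its literal integer value."""
    match = _TENANT_PARAM.search(command_text)
    if match is None:
        return command_text  # nothing to rewrite
    value = parameters[match.group(1)]
    if not isinstance(value, int):
        # Assumption: tenant IDs are integers, so inlining them is safe.
        raise TypeError("tenant ID must be an integer")
    return _TENANT_PARAM.sub(".TenantID=" + str(value), command_text, count=1)

sql = "select c.Name, c.Location from Car c where c.TenantID=@p0 and c.Name=@p1"
print(inline_tenant_id(sql, {"@p0": 55, "@p1": "Volvo"}))
```

Only the tenant predicate becomes a literal, so the database can still cache one plan per tenant while the remaining parameters stay parameterized.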

Query by key in Datastore with Dart

I have a List<Key> for which I would like to retrieve the full data records, with additional filtering applied.
I can retrieve them via dbService.lookup(Project, keys) but lookup doesn't allow me to apply additional filtering.
This is essentially what I want to do:
dbService.query(Project)
..filter('__key__ IN', keys)
..filter('acl_read IN', roles)
..run();
but since __key__ is not supported in Google Cloud's Dart implementation, I cannot run this query.
I could do:
projects = dbService.lookup(keys);
projects.removeWhere((project) => !project.acl_read.any((key) => roles.contains(key)));
but this does not seem like the right way of achieving this.
So what's the right way of doing this?
There isn't a server-based method to do what you're looking to do, so your method of post-filtering on the client side is how you'd do it.
Alternatively, if you know that querying with your filter returns a smaller set of keys than what you have in the List, then run the filtered query first and find the intersection of the results and the List.
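Both strategies can be sketched in a few lines. The following Python sketch uses plain dictionaries in place of Datastore entities; the key and acl_read fields, the helper names, and the assumption that a lookup yields None for a missing key are all illustrative rather than taken from the Dart API.

```python
def filter_readable(entities, roles):
    """Post-filter looked-up entities: keep those readable by any role."""
    allowed = set(roles)
    return [
        entity
        for entity in entities
        # Assumption: a lookup yields None for keys that do not exist.
        if entity is not None and allowed.intersection(entity["acl_read"])
    ]

def intersect_by_key(query_results, keys):
    """Alternative: run the filtered query first, then keep wanted keys."""
    wanted = set(keys)
    return [entity for entity in query_results if entity["key"] in wanted]

# Hypothetical lookup result for keys p1, p0 (missing), p2, p3.
looked_up = [
    {"key": "p1", "acl_read": ["admin"]},
    None,
    {"key": "p2", "acl_read": ["guest"]},
    {"key": "p3", "acl_read": ["editor", "admin"]},
]
readable = filter_readable(looked_up, ["admin"])
print([entity["key"] for entity in readable])
```

Which of the two is cheaper depends on whether the key list or the filtered query result is smaller.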

Entity Framework efficient querying

Let's say I have a model, Article, that has a large number of columns, and the database contains more than 100,000 rows. If I do something like var articles = db.Articles.ToList(), it retrieves the entire article model for each article in the database and holds it in memory, right?
So if I am populating a table that only shows the date of the entry and its title, is there a way to retrieve just these columns from the database using Entity Framework, and would it be more efficient?
According to this,
There is a cost required to track returned objects in the object
context. Detecting changes to objects and ensuring that multiple
requests for the same logical entity return the same object instance
requires that objects be attached to an ObjectContext instance. If you
do not plan to make updates or deletes to objects and do not require
identity management, consider using the NoTracking merge options when
you execute queries.
it looks like I should use NoTracking since the data isn't being changed or deleted, only displayed. So my query now becomes var articles = db.Articles.AsNoTracking().ToList(). Are there other things I should do to make this more efficient?
Another question I have is that according to this answer, using .Contains(...) will cause a large performance drop when dealing with a large database. What is the recommended method to use to search through the entries in a large database?
It's called a projection and just translates into a SELECT column1, column2, ... in SQL:
var result = db.Articles
.Select(a => new
{
Date = a.Date,
Title = a.Title
})
.ToList();
Instead of a => new { ... } (creates a list of "anonymous" objects) you can also use a named helper class (or "view model"): a => new MyViewModel { ... } that contains only the selected properties (but you can't use a => new Article { ... } as an entity itself).
For such a projection you don't need AsNoTracking() because projected data are not tracked anyway, only full entity objects are tracked.
Instead of using Contains the more common way is to use Where like:
var date = DateTime.Now.AddYears(-1);
var result = db.Articles
.Where(a => date <= a.Date)
.Select(a => new
{
Date = a.Date,
Title = a.Title
})
.ToList();
This would select only the articles that are not older than a year. The Where is just translated into a SQL WHERE statement and the filter is performed in the database (which is as fast as the SQL query is, depending on table size and proper indexing, etc.). Only the result of this filter is loaded into memory.
Edit
Referring to your comment below:
Don't confuse IEnumerable<T>.Contains(T t) with string.Contains(string subString). The answer you have linked in your question talks about the first version of Contains. If you want to search for articles that have the string "keyword" in the text body you need the second Contains version:
string keyword = "Entity Framework";
var result = db.Articles
.Where(a => a.Body.Contains(keyword))
.Select(a => new
{
Date = a.Date,
Title = a.Title
})
.ToList();
This will translate into something like WHERE Body like N'%Entity Framework%' in SQL. The answer about the poor performance of Contains doesn't apply to this version of Contains at all.
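To see the substring search and the projection together at the SQL level, here is a minimal sketch using Python's sqlite3. The articles table and both helper functions are hypothetical, and the explicit escaping of % and _ is an assumption about what a hand-written LIKE query needs, not a description of what Entity Framework emits.

```python
import sqlite3

def contains_pattern(keyword, escape="\\"):
    """Build a LIKE pattern that matches `keyword` literally, anywhere."""
    escaped = (
        keyword.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )
    return "%" + escaped + "%"

def search_articles(conn, keyword):
    """Project only date and title for articles whose body has keyword."""
    rows = conn.execute(
        "SELECT date, title FROM articles "
        "WHERE body LIKE ? ESCAPE '\\' "
        "ORDER BY date",
        (contains_pattern(keyword),),
    )
    return rows.fetchall()

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (date TEXT, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        ("2020-01-01", "A", "Intro to Entity Framework basics"),
        ("2020-02-01", "B", "100% unrelated"),
        ("2020-03-01", "C", "1000 things about entity_framework"),
    ],
)
print(search_articles(conn, "Entity Framework"))
```

Only the two projected columns travel back to the client; the body column is read by the database for filtering but never loaded into memory on the application side.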

GQL query with "like" operator [duplicate]

Simple one really. In SQL, if I want to search a text field for a couple of characters, I can do:
SELECT blah FROM blah WHERE blah LIKE '%text%'
The documentation for App Engine makes no mention of how to achieve this, but surely it's a common enough problem?
BigTable, which is the database back end for App Engine, will scale to millions of records. Due to this, App Engine will not allow you to do any query that will result in a table scan, as performance would be dreadful for a well populated table.
In other words, every query must use an index. This is why you can only do =, > and < queries. (In fact you can also do != but the API does this using a combination of > and < queries.) This is also why the development environment monitors all the queries you do and automatically adds any missing indexes to your index.yaml file.
There is no way to index for a LIKE query so it's simply not available.
Have a watch of this Google IO session for a much better and more detailed explanation of this.
I'm facing the same problem, but I found something on the Google App Engine pages:
Tip: Query filters do not have an explicit way to match just part of a string value, but you can fake a prefix match using inequality filters:
db.GqlQuery("SELECT * FROM MyModel WHERE prop >= :1 AND prop < :2",
"abc",
u"abc" + u"\ufffd")
This matches every MyModel entity with a string property prop that begins with the characters abc. The unicode string u"\ufffd" represents the largest possible Unicode character. When the property values are sorted in an index, the values that fall in this range are all of the values that begin with the given prefix.
http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html
maybe this could do the trick ;)
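Why the two inequality filters select exactly the prefix matches is easiest to see on a sorted list, which is roughly how an index stores the values. The sketch below is plain Python with the standard bisect module and no Datastore involved; prefix_range and prefix_matches are hypothetical helper names.

```python
import bisect

def prefix_range(prefix):
    """Return the half-open range [low, high) covering the prefix."""
    return prefix, prefix + "\ufffd"

def prefix_matches(sorted_values, prefix):
    """Slice a sorted list the way an index range scan would."""
    low, high = prefix_range(prefix)
    start = bisect.bisect_left(sorted_values, low)   # first value >= low
    end = bisect.bisect_left(sorted_values, high)    # first value >= high
    return sorted_values[start:end]

values = sorted(["ab", "abc", "abcd", "abd", "abz", "b"])
print(prefix_matches(values, "abc"))
```

One caveat of this sketch: Python compares strings by code point, so a value whose next character lies above U+FFFD would fall outside the range.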
Although App Engine does not support LIKE queries, have a look at the properties ListProperty and StringListProperty. When an equality test is done on these properties, the test will actually be applied on all list members, e.g., list_property = value tests if the value appears anywhere in the list.
Sometimes this feature might be used as a workaround to the lack of LIKE queries. For instance, it makes it possible to do simple text search, as described on this post.
You need to use search service to perform full text search queries similar to SQL LIKE.
Gaelyk provides a domain-specific language for more user-friendly search queries. For example, the following snippet finds the first ten books, sorted from the latest, with a title containing fern
and a genre exactly matching thriller:
def documents = search.search {
select all from books
sort desc by published, SearchApiLimits.MINIMUM_DATE_VALUE
where title =~ 'fern'
and genre = 'thriller'
limit 10
}
Like is written as Groovy's match operator =~.
It supports functions such as distance(geopoint(lat, lon), location) as well.
App engine launched a general-purpose full text search service in version 1.7.0 that supports the datastore.
Details in the announcement.
More information on how to use this: https://cloud.google.com/appengine/training/fts_intro/lesson2
Have a look at Objectify here; it is like a Datastore access API. There is a FAQ entry on this question specifically; here is the answer:
How do I do a like query (LIKE "foo%")
You can do something like a startWith, or endWith if you reverse the order when stored and searched. You do a range query with the starting value you want, and a value just above the one you want.
String start = "foo";
... = ofy.query(MyEntity.class).filter("field >=", start).filter("field <", start + "\uFFFD");
Just follow here:
http://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/search/init.py#354
It works!
class Article(search.SearchableModel):
    text = db.TextProperty()
    ...
article = Article(text=...)
article.save()
To search the full text index, use the SearchableModel.all() method to get an
instance of SearchableModel.Query, which subclasses db.Query. Use its search()
method to provide a search query, in addition to any other filters or sort
orders, e.g.:
query = article.all().search('a search query').filter(...).order(...)
I tested this with the GAE Datastore low-level Java API, and it works perfectly:
Query q = new Query(Directorio.class.getSimpleName());
Filter filterNombreGreater = new FilterPredicate("nombre", FilterOperator.GREATER_THAN_OR_EQUAL, query);
Filter filterNombreLess = new FilterPredicate("nombre", FilterOperator.LESS_THAN, query+"\uFFFD");
Filter filterNombre = CompositeFilterOperator.and(filterNombreGreater, filterNombreLess);
q.setFilter(filterNombre);
In general, even though this is an old post, one way to emulate a 'LIKE' or 'ILIKE' is to gather all results from a '>=' query, then loop over the results in Python (or Java) for elements containing what you're looking for.
Let's say you want to filter users given a q='luigi'
users = []
qry = self.user_model.query(ndb.OR(self.user_model.name >= q.lower(), self.user_model.email >= q.lower(), self.user_model.username >= q.lower()))
for _qry in qry:
    if q.lower() in _qry.name.lower() or q.lower() in _qry.email.lower() or q.lower() in _qry.username.lower():
        users.append(_qry)
It is not possible to do a LIKE search on the App Engine datastore; however, creating an ArrayList would do the trick if you need to search for a word in a string.
@Index
public ArrayList<String> searchName;
and then to search in the index using objectify.
List<Profiles> list1 = ofy().load().type(Profiles.class).filter("searchName =",search).list();
and this will give you a list of all the items that contain the word you searched for.
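The indexed list has to be filled with the individual words when the entity is saved. Here is a minimal sketch of that step in Python, with dictionaries standing in for stored entities; search_terms, find_profiles, and the search_name field are hypothetical, and lowercasing is an added assumption to make the match case-insensitive.

```python
import re

def search_terms(text):
    """Split text into unique lowercase words for an indexed list."""
    return sorted(set(re.findall(r"\w+", text.lower())))

def find_profiles(profiles, word):
    """Equality on a list property: true if any member equals word."""
    needle = word.lower()
    return [p["name"] for p in profiles if needle in p["search_name"]]

# Hypothetical entities, with the word list built at save time.
profiles = [
    {"name": name, "search_name": search_terms(name)}
    for name in ["Simon Carlsson", "Carl Simonsson", "Anna Simon-Berg"]
]
print(find_profiles(profiles, "simon"))
```

This matches whole words only; "simon" does not find "Simonsson", which is the main difference from a true LIKE '%text%' search.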
If the LIKE '%text%' always compares to a word or a few (think permutations) and your data changes slowly (slowly means that it's not prohibitively expensive, both price-wise and performance-wise, to create and update indexes), then a Relation Index Entity (RIE) may be the answer.
Yes, you will have to build an additional datastore entity and populate it appropriately. Yes, there are some constraints that you will have to work around (one is the 5000 limit on the length of a list property in the GAE datastore). But the resulting searches are lightning fast.
For details see my RIE with Java and Objectify and RIE with Python posts.
"Like" is often used as a poor man's substitute for text search. For text search, it is possible to use Whoosh-AppEngine.
