I have this query and it refuses to use an index. I don't know if it's because of the "Expand" stage in the pipeline or something else, but I can't get it to use an index in this form, especially for the ORDER BY clause: the planner still gives me a "Sort" stage, and I'd like to avoid that.
The index is on the createdAt property.
PROFILE
MATCH (u:User {user_id: '61c84762da4e457d55656efa'})-[follows:FOLLOWS]->(following:User)-[relatedTo:POSTED|SHARED]->(everything)
WHERE relatedTo.createdAt > datetime("2000-02-12T15:42:10.866+00:00")
RETURN u, relatedTo, everything
ORDER BY relatedTo.createdAt DESC
Here is a picture of the planner
The only way it does what I want is if I remove everything prior to the last relationship, which obviously defeats the purpose of the query, but it was just for testing.
PROFILE
MATCH (following:User)-[relatedTo:POSTED|SHARED]->(everything)
WHERE relatedTo.createdAt > datetime("2000-02-12T15:42:10.866+00:00")
RETURN relatedTo, everything
ORDER BY relatedTo.createdAt DESC
Now it uses the index.
Any ideas on how to get it to use an index for both the match and the sort?
I'm not entirely clear on why you want to use an index here.
In your first query an index is used to find the :User node and then relationship pointers are followed to find the other nodes of interest. In Neo4j following relationship pointers is always faster than trying to use an index to find nodes (unlike a relational database). Typically, you only want to use an index to find your start nodes in a path, which is what your first query is doing.
If you really want the index search to start from a different part of the path, you could split the query into multiple parts using WITH.
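A minimal sketch of that idea, reusing the labels and properties from the question (whether this actually avoids the Sort stage, and whether the ordering survives the second MATCH, depends on the Neo4j version and the plan it chooses):
PROFILE
MATCH (following:User)-[relatedTo:POSTED|SHARED]->(everything)
WHERE relatedTo.createdAt > datetime("2000-02-12T15:42:10.866+00:00")
WITH following, relatedTo, everything
ORDER BY relatedTo.createdAt DESC
MATCH (u:User {user_id: '61c84762da4e457d55656efa'})-[follows:FOLLOWS]->(following)
RETURN u, relatedTo, everything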
So I understand that Neo4j 3.5 and above implements full-text search in Cypher queries via createNodeIndex(), e.g.:
CALL db.index.fulltext.createNodeIndex("myIndex", ["PersonNode"], ["name"])
where myIndex is an arbitrary name I make up for the index, PersonNode is the name of my node label, and name is one of the attributes of PersonNode that I want the full-text search performed on.
And to actually perform the search by name, I can do something like the following:
CALL db.index.fulltext.queryNodes("myIndex", "Charlie")
But now assume that PersonNode has a relationship of type PURCHASED_ITEM, which is connected to another node label ProductNode as follows:
PersonNode-[:PURCHASED_ITEM]->ProductNode
And assume further that ProductNode has an attribute called productTitle indicating the display title name for each product.
My question is: I would like to set up an index for this relationship (presumably using createRelationshipIndex()) and perform a full-text search by productTitle, returning a list of all PersonNode instances that purchased the given product. How can I do this?
Addendum: I understand that the above could be done by first getting a list of all ProductNode instances matching the given title, then performing a normal Cypher query to extract all related PersonNode instances. I also understand that for the above example, a normal Cypher query would be all that I need. But the reason I'm asking this question is that I eventually need to implement a single search bar that would allow the user to input any text, including possible misspellings, and have it perform a search through multiple attributes and/or relationships of PersonNode, with the results sorted by some kind of relevance score. In order to do this, I feel I need to first grasp exactly how relationship queries work in Neo4j.
Here is an example of how to create a full-text index for the productTitle property of PURCHASED_ITEM relationships:
CALL db.index.fulltext.createRelationshipIndex("myRelIndex", ["PURCHASED_ITEM"], ["productTitle"])
And here is a snippet showing the use of that index:
CALL db.index.fulltext.queryRelationships("myRelIndex", "Hula Hoop") YIELD relationship, score
...
productTitle is a property of ProductNode, not of the PURCHASED_ITEM relationship, so the relationship index above won't help here.
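Since the property lives on ProductNode, here is a hedged sketch of the node-index route described in the addendum (the index name myProductIndex is made up; newer Neo4j versions use CREATE FULLTEXT INDEX syntax instead of these procedures):
CALL db.index.fulltext.createNodeIndex("myProductIndex", ["ProductNode"], ["productTitle"])
Then, to find the purchasers:
CALL db.index.fulltext.queryNodes("myProductIndex", "Hula Hoop") YIELD node, score
MATCH (person:PersonNode)-[:PURCHASED_ITEM]->(node)
RETURN person, score
ORDER BY score DESC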
Currently, the RethinkDB API documentation says that the get_nearest command only works on a table. Of course I can filter the results afterwards, but that seems inefficient, plus it requires sorting all the items by distance when I only want to limit the result to a specific number of items.
Is there a way I'm overlooking to get the closest results from a filtered list in one query?
The reason it only works on a table is that it requires an index, and indexes only exist at the table level. Thinking about it a bit, that makes sense, because it is an expensive query.
However, if you have a filtered list, the best you can do is to use distance and order by its result.
Something like this will work:
r.db('db').table('table')
  .filter(function_to_filter)  // whatever predicate you are already filtering with
  .orderBy(function(doc) {
    // sort by distance from your reference point (an actual point, e.g. r.point(long, lat))
    return r.distance(your_point_to_compare, doc('point'));
  })
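If you only need the closest N documents, you can chain limit onto the same query; a short sketch, where the coordinates and the count are placeholders:
r.db('db').table('table')
  .filter(function_to_filter)
  .orderBy(function(doc) { return r.distance(r.point(-122.4, 37.7), doc('point')); })
  .limit(10)  // keep only the 10 closest matches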
I'd like to submit a query to SOLR/Lucene, plus a list of document IDs. From the query, I'd like the usual top-N scored results, but I'd also like to get the scores for the named documents... no matter how low they are.
Can anyone think of an easy/supported way to do this in a single index scan, where the scores for the 'added' (non-ranking/pinned-for-inclusion) docs are comparable/same-scaled as those for the top-N results? (Patching SOLR with specialized classes would be OK; I figure that's what I may have to do if there's no existing support.)
Or failing that, could it be simulated with a followup query, ideally in a way that the named-document scores could be scaled to be roughly comparable to the top-N for the reference query?
Alternatively -- and perhaps as good as or better for my intended use -- could I make a single request against a SOLR/Lucene index which includes M (with M=2 or more) distinct queries, and return the results that are in the top-N for any of the M queries, and for every result include its score against all M of the distinct queries?
(Even in my above formulation, the list of documents that I want scored along with a new query will typically have been the results from a prior query.)
Solutions or even just fragments of possible approaches appreciated!
I am not sure if I understand properly what you want to achieve, but wouldn't a simple
q: (somequery) OR id: (1 OR 2 OR 4)
be enough?
If you want both parts to be boosted on the same scale (I am not sure whether this isn't already the default behaviour of Solr), you would want to use dismax or edismax, and your query would change to something like:
q: (somequery)^10 OR id: (1 OR 2 OR 4)^10
You would then have both the elements defined by the IDs and the query results scored the same way.
To self-answer, reporting what I've found since posting...
One clumsy option is the explainOther parameter, which takes another query. (This query could be an OR list of interesting document IDs.) The response will then include a full scoring explanation for documents which match this other query. explainOther only has an effect when combined with the also-required debugQuery parameter.
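For example, something along these lines (the query and the IDs are placeholders):
…&q=somequery&debugQuery=true&explainOther=id:(1 OR 2 OR 4)&…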
All that debug/explain information is overkill for the need, but it may be useful, and the code paths that implement it might provide a guide to writing a hypothetical new, more narrowly focused 'scoreOther' option.
Another option would be to make use of a pseudo-field calculated using the query() function to report how any set of results scores on some other query or queries. So if, for example, the original document set was the top-N from query_A, and those are the exact documents that you also want to score against query_B, you would execute query_A again with a reporting field …&fl=bscore:query({!dismax v="query_B"})&…. Then each document's score against query_B would be included in the output (as bscore).
Finally, the result-grouping functionality can be used to collect both the top-N for one query and the scores of lesser documents intersecting with other queries in one go. For example, if querying for query_B and adding …&group=true&group.query=query_B&group.query=query_A&…, you'll get back groups that satisfy query_B (ranked by query_B), and that satisfy both query_B and query_A (but again ranked by query_B). This could be mixed with the pseudo-field above to get the scores by another query (like query_A) as well.
However, all groups will share the same sort order (from either the master query or something specified by a group.sort parameter), so it's not currently possible (Solr 4.0.0-beta) to get several top-N results according to different scorings, just the top-Ns according to one scoring, limited to certain groups. (There's a comment in the source code suggesting that alternate sorts per group may be envisioned as a future capability.)
Solr sort: I want to place a particular document first.
For example, the results are: 5, 2, 3, 1
I want 2 first, with the others sorted according to the normal rules:
2, 1, 3, 5
How do I do this?
I know of two ways you can try to tackle this using Solr.
The first is to use the QueryElevationComponent. This lets you define the top results at index time. As suggested in the documentation, this is good for placing sponsored results or popular documents at the top of the search results. The potential downside is that you have to be able to identify those documents at index time and not at query time.
The other approach is to boost the desired documents at query time using the bq parameter. To boost document 435, you would do something like this:
...&bq=id:435^10
Unfortunately, neither of these approaches give you absolute control over the order of the results.
The solution provided by Riking would certainly do the job if you don't mind processing the results after performing the search. Another approach you could consider is to add a field to your Solr schema that defines a display order or priority. You can then sort on that field to get the desired sort order.
If you are using Solr 3.1 or later, you can sort by a function query. The map function is useful for this.
sort=map(field_name,5,5,0) asc
In the above, field_name is the name of the field you want to sort by, 5 is the value you want pushed to the front, and 0 must be replaced with some number that you know is less than all other values of that field.
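Applied to the example above, and assuming the documents are identified by a numeric id field, that would look something like:
sort=map(id,2,2,0) asc
Document 2 is mapped to 0 and therefore sorts first, while the remaining documents keep their normal ascending order (1, 3, 5).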
Call the builtin sort() function, then shift the desired element to the front.
Pseudocode, in case you do not have a builtin method to shift it to the front:
tmp = desired;
int dIndex = array.indexOf(desired);
// shift everything before the desired element one slot to the right
for (i = dIndex - 1; i >= 0; i--)
{
    array[i + 1] = array[i];
}
// finally, put the desired element at the front
array[0] = tmp;
In case you use the standard query parser (not dismax), add "OR id:2^1000" to your query, like this:
q=(text:lalala AND author:Bob) OR id:2^1000
That will place the document with ID=2 at the top of the results.
Say I have an entity that looks a bit like this:
class MyEntity(db.Model):
keywords = db.StringListProperty()
sortProp = db.FloatProperty()
I have a filter that does a keyword search by doing this:
query = MyEntity.all()\
    .filter('keywords >=', unicode(kWord))\
    .filter('keywords <', unicode(kWord) + u"\ufffd")\
    .order('keywords')
Which works great. The issue I'm running into is that if I try to put an order on that using 'sortProp':
.order('sortProp')
ordering has no effect. I realize why - the documentation specifically says this is not possible and that sort order is ignored when using equality filters with a multi-valued property (from the Google docs):
One important caveat is queries with both an equality filter and a
sort order on a multi-valued property. In those queries, the sort
order is disregarded. For single-valued properties, this is a simple
optimization. Every result would have the same value for the property,
so the results do not need to be sorted further. However, multi-valued
properties may have additional values. Since the sort order is
disregarded, the query results may be returned in a different order
than if the sort order were applied. (Restoring the dropped sort order
would be expensive and require extra indices, and this use case is
rare, so the query planner leaves it off.)
My question is: does anyone know of a good workaround for this? Is there a better way to do a keyword search that circumvents this limitation? I'd really like to combine using keywords with ordering for other properties. The only solution I can think of is sorting the list after the query, but if I do that I lose the ability to offset into the query and I may not even get the results with the highest sort order if the data set is large.
Thanks for your tips!
Workaround 1:
Apply a stemming algorithm to your keywords; then you won't need to do the prefix-comparison lookup.
Workaround 2:
Store all unique keywords in a separate entity kind ("table"). From this kind, find the keywords which match your criteria, then run a query with keywords IN [kw1, kw2, ...]. Make sure that the number of matching keywords is not too big; for example, you can select only the first 10.
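A rough sketch of that idea in the db API, where the Keyword kind and the limit of 10 are made up for illustration:
class Keyword(db.Model):
    word = db.StringProperty()

# find stored keywords that start with what the user typed
matches = Keyword.all()\
    .filter('word >=', unicode(kWord))\
    .filter('word <', unicode(kWord) + u"\ufffd")\
    .fetch(10)
words = [m.word for m in matches]

# with only equality filters on 'keywords', ordering by sortProp should be honored
query = MyEntity.all()\
    .filter('keywords IN', words)\
    .order('-sortProp')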
Workaround 3:
Reorder the list of items on the application side.
Workaround 4:
Use IndexTank for full-text search, or apply for the "Trusted Tester Program" as mentioned by @proppy.
Instead of doing prefix matches, properly tokenize, stem and normalize your strings, and do equality comparisons on them.
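A minimal sketch of that approach, using NLTK's PorterStemmer purely as an example stemmer (any stemmer or normalizer would do), with the model and kWord taken from the question:
from nltk.stem.porter import PorterStemmer

_stemmer = PorterStemmer()

def normalize(text):
    # lowercase, split on whitespace, and stem each token
    return [_stemmer.stem(token) for token in text.lower().split()]

# at write time, store normalized tokens instead of raw keywords
MyEntity(keywords=normalize(u"Running Shoes"), sortProp=0.5).put()

# at query time, normalize the search term the same way and use an equality filter,
# which leaves you free to order by other properties
query = MyEntity.all()\
    .filter('keywords =', normalize(kWord)[0])\
    .order('-sortProp')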