I am working with two different search tools: DtSearch and Solr. I run a FULL_TEXT search on one indexed search term ("2008/12/02"), and unfortunately the two give different hits even though the data are the same. Another strange thing I notice is that Solr returns three DOC_IDs as hits while DtSearch returns five for the same search term.
I am now confused about date searching. How is this possible when the data are the same?
Do I need to apply some extra settings in the config files? Is there any way to get consistent output?
Thank you,
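The question does not say how each engine tokenizes dates, so the following is only a hypothetical illustration of one possible factor. If one engine splits "2008/12/02" on punctuation and the other keeps it as a single token, a phrase search for the same string can match different documents. Both tokenizers and the sample documents are made up for the example; they are not the actual DtSearch or Solr analyzers.

```python
import re

# Two hypothetical tokenization strategies; neither is claimed to be the
# exact behaviour of DtSearch or of any particular Solr analyzer.
def split_on_punctuation(text: str) -> list[str]:
    """Lowercase and split on any run of non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def split_on_whitespace(text: str) -> list[str]:
    """Lowercase and split on whitespace only, keeping '2008/12/02' whole."""
    return text.lower().split()

def phrase_matches(tokenize, query: str, document: str) -> bool:
    """True if the query's tokens occur as a contiguous run in the document."""
    q = tokenize(query)
    d = tokenize(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

# Made-up sample documents writing the same date in three ways.
docs = {
    1: "Invoice dated 2008/12/02",
    2: "Invoice dated 2008-12-02",
    3: "Invoice dated 2008 12 02",
}
query = "2008/12/02"
for name, tok in [("punctuation", split_on_punctuation), ("whitespace", split_on_whitespace)]:
    hits = [doc_id for doc_id, text in docs.items() if phrase_matches(tok, query, text)]
    print(name, hits)
```

Run as written, this prints `punctuation [1, 2, 3]` and `whitespace [1]`. In Solr, the analyzer chain configured for the field in the schema decides which kind of behaviour applies, so that is one place to compare against DtSearch's own settings.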
Related
This is a general question on which I would like to get some input from the search community, so I don't have a piece of code to share just yet.
Given a single document, the objective is to get a list of similar and/or identical documents indexed by Azure Search. Is that possible?
So, given document_id = 1, how do I get a list of the documents in the index that are most similar to it? Ideally the outcome would be a list of documents ordered by a match score from 0 to 100, where 100 (%) would be an identical match.
I am considering taking the content of the given document and submitting it as part of the search, but that doesn't seem very elegant. It is also error prone when constructing the query, and a document can be quite large.
Thank you in advance for any suggestions or comments.
You could try the preview feature "moreLikeThis": https://learn.microsoft.com/en-us/azure/search/search-more-like-this
I believe that's the closest Azure Search has to offer to what you want.
Edit 1: Be advised that this feature has limitations, such as no support for complex types. Make sure it meets your requirements before taking a production dependency on it.
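A minimal sketch of calling moreLikeThis through the REST API, assuming the query-parameter form described on the linked page (moreLikeThis=<document key>, optionally with searchFields). The service name, index name, field names and api-version are placeholders, and relative_scores is a hypothetical helper: Azure Search returns relevance scores, not percentages, so any 0-100 scale is something you would compute yourself, relative to the best match.

```python
from urllib.parse import urlencode

def more_like_this_url(service: str, index: str, doc_key: str,
                       search_fields: list[str], api_version: str) -> str:
    """Build a GET URL asking the index for documents similar to doc_key.

    service, index and api_version are placeholders; use a preview
    api-version that the moreLikeThis documentation lists as supported.
    The request also needs an 'api-key' header, which is not shown here.
    """
    params = {
        "moreLikeThis": doc_key,                  # key of the source document
        "searchFields": ",".join(search_fields),  # fields to compare on
        "api-version": api_version,
    }
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{urlencode(params)}"

def relative_scores(results: list[dict], key_field: str) -> list[tuple[str, float]]:
    """Hypothetical helper: rescale '@search.score' to 0-100 relative to the top hit.

    100 means "most similar among the returned documents", not "identical".
    """
    top = max((r["@search.score"] for r in results), default=0.0)
    if top <= 0:
        return []
    return [(r[key_field], 100.0 * r["@search.score"] / top) for r in results]
```

Because the scale is relative, the top result always gets 100 even when it is far from an identical copy; a true "identical match" test would need a separate comparison of the document contents.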
From reading the docs, the search +term +another_term should return the same documents as term AND another_term, but I'm getting different results. Someone suggested that one of the terms is actually acting as an OR, but I thought the query syntax was built into Solr.
Where in the Solr config would I check for this?
If you enable the debug flag in the admin UI when you run those two queries, it will show you what each one is translated to at the lowest level, after the query parser has processed it. You can then compare them and see if something is different.
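The same debug output is available outside the admin UI by adding debugQuery=on to the request. A minimal sketch, assuming a hypothetical core at http://localhost:8983/solr/mycore, that fetches the parsedquery entry from the debug section of the JSON response for each query so the two can be compared side by side:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical core URL; adjust host, port and core name.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"

def debug_url(q: str) -> str:
    """Select URL that returns no rows but includes the debug section."""
    params = {"q": q, "debugQuery": "on", "wt": "json", "rows": 0}
    return SOLR_SELECT + "?" + urlencode(params)

def parsed_query(response: dict) -> str:
    """Query as rewritten by the query parser, e.g. '+field:a +field:b'."""
    return response["debug"]["parsedquery"]

def compare_parsed(*queries: str) -> dict[str, str]:
    """Send each query to Solr and map it to its parsed form."""
    out = {}
    for q in queries:
        with urlopen(debug_url(q)) as resp:
            out[q] = parsed_query(json.load(resp))
    return out
```

Calling compare_parsed("+term +another_term", "term AND another_term") against a running Solr returns both parsed forms. In that output, a clause prefixed with + is required and a clause with no prefix is optional, so a missing + on one side is what OR-like behaviour looks like. For the standard query parser, the default operator is controlled by the q.op request parameter (or, in older versions, the defaultOperator setting in schema.xml).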
I'm quite new to MoreLikeThis search in Solr, but I find that one option is missing.
The wiki pages and Google (and Stack Overflow) searches say nothing about the format of the documents returned by an MLT search.
My aim is to get either all fields or at least a specified set of fields in the returned documents, but it seems that one has no influence over which fields are included in the similar documents.
Of course, one could run a query for each document in the moreLikeThis result to get those fields, but I don't like the idea of running multiple queries where a single one should really be sufficient.
I would really appreciate it if anybody knows a way to influence the result format of the documents.
Thanks.
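Solr separates the fields used to compute similarity (mlt.fl) from fl, the general parameter that lists which fields a response includes. A sketch of a single request that sets both, assuming a hypothetical core, a uniqueKey field named id, and hypothetical content/title fields. Whether fl also trims the documents in the moreLikeThis section is worth confirming against the actual response from your Solr version.

```python
from urllib.parse import urlencode

# Hypothetical core URL; field names below are also hypothetical.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"

def mlt_url(doc_id: str, similarity_fields: list[str],
            return_fields: list[str], count: int = 5) -> str:
    """Build one select request that also asks for similar documents."""
    params = {
        "q": f"id:{doc_id}",                   # the source document (assumes uniqueKey 'id')
        "mlt": "true",                         # enable the MoreLikeThis component
        "mlt.fl": ",".join(similarity_fields), # fields used to judge similarity
        "mlt.count": count,                    # number of similar documents per result
        "fl": ",".join(return_fields),         # fields to include in returned documents
    }
    return SOLR_SELECT + "?" + urlencode(params)
```

For example, mlt_url("1", ["content"], ["id", "title", "score"]) requests up to five documents similar to document 1, compared on content, and asks for id, title and score in the response.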
We are planning on using Solr to show the users the "n" most frequent terms from a field and we want to apply stemming so that similar terms get grouped.
Now, we need to show the terms to the users but the stemmed terms are not always human readable. Is there any way to get an example of the original terms that got stemmed so that those could be shown to the user?
The only solution we can think of is querying two different fields, one with stemming and one without, and then doing the matching ourselves. But we think that is going to be expensive (two queries) and may be error prone (the matching may produce errors).
Is there any other way to implement this on Solr? Thanks in advance.
Stemming is applied at both query time and index time, so I don't think there is an easy way to accomplish what you're trying to do. However, it may be possible, depending on the number of results in your database, to do this by employing a combination of faceting and highlighting. The highlighted term will be the entire matching term rather than the stemmed term (so, for example, the stemmed term might be "associ" but the highlighted terms will be "associated", "association", "associations", etc.). Perhaps what you could do is the following:
?q=keyword&facet=true&facet.field=myfield&facet.limit=20&hl=true&hl.fl=myfield&hl.fragsize=0&rows=10
Getting 10 rows and examining the highlighted results should at least give a sample of the "original" matching terms. By default, matches are highlighted with <em> </em> tags, but you can change this with hl.simple.pre and hl.simple.post; for example, &hl.simple.pre=[&hl.simple.post=] would wrap the matching terms in square brackets. hl.fragsize=0 returns the entire field along with highlighting.
Hope this helps. You can read more about highlighting parameters here:
http://wiki.apache.org/solr/HighlightingParameters
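A minimal sketch of the approach above, assuming a hypothetical field named myfield. It builds the request parameters with urlencode (so the [ and ] markers are escaped) and then collects the bracketed original terms from the highlighting section of the JSON response, which Solr keys by document id and then by field name. It assumes the field text itself contains no square brackets; if it does, pick other markers.

```python
import re
from collections import Counter
from urllib.parse import urlencode

def facet_highlight_params(keyword: str, field: str) -> str:
    """Query string for the facet + highlight request described above."""
    return urlencode({
        "q": keyword,
        "facet": "true",
        "facet.field": field,     # facet counts show the (stemmed) indexed terms
        "facet.limit": 20,
        "hl": "true",
        "hl.fl": field,
        "hl.fragsize": 0,         # highlight the whole field value
        "hl.simple.pre": "[",     # wrap matches in square brackets
        "hl.simple.post": "]",
        "rows": 10,
    })

# Text between a '[' and the next ']' is one highlighted original term.
MARKED = re.compile(r"\[([^\[\]]+)\]")

def original_terms(highlighting: dict, field: str) -> Counter:
    """Count the unstemmed terms Solr highlighted across all returned docs."""
    counts = Counter()
    for per_doc in highlighting.values():
        for snippet in per_doc.get(field, []):
            counts.update(m.lower() for m in MARKED.findall(snippet))
    return counts
```

The facet counts give the most frequent stemmed terms, while the Counter gives the surface forms seen in the sampled rows; the most common form can then be shown to users in place of a stem such as "associ". Pairing each form with its stem is still up to your application.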
I'm using Apache Solr and querying an index whose schema has a text field PostBody, an integer field Userid, and a Trie-based datetime field MostRecentActivityDate.
I'm attempting to apply query-time boosting to my select query so that more recent posts are boosted by some factor to help with scoring. I chose my values aiming for a timescale of days, rather than the timescale of years used in many online date-boosting examples.
The following two queries produce different results; the only difference between them is where the boosting "code" is placed (i.e., before or after the field conditionals themselves). In my testing I've also noticed that both produce results different from the query without the {} boosting code, so it's not as if the boost is simply being ignored in one case.
Is anyone able to explain why they would produce different results? Thanks!
{!boost%20b=recip(ms(NOW,MostRecentActivityDate),1.16e-7,1,1)} (PostBody:"timmy is great and that is a fact") AND !Userid=2
Vs.
(PostBody:"timmy is great and that is a fact") AND !Userid=2 {!boost%20b=recip(ms(NOW,MostRecentActivityDate),1.16e-7,1,1)}
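As a quick arithmetic check on the timescale (separate from the question of why the two queries differ): Solr's function-query documentation defines recip(x,m,a,b) as a/(m*x+b), and ms(NOW,field) returns an age in milliseconds. With a = b = 1 the boost halves when m*x = 1, i.e. at an age of 1/m. The sketch below evaluates that for the m = 1.16e-7 used above, and for an m chosen so the boost halves after one day.

```python
MS_PER_HOUR = 3_600_000
MS_PER_DAY = 24 * MS_PER_HOUR

def recip(x: float, m: float, a: float, b: float) -> float:
    """Solr's recip function: a / (m*x + b)."""
    return a / (m * x + b)

def half_life_ms(m: float) -> float:
    """With a = b = 1, recip(x, m, 1, 1) equals 0.5 when x = 1/m."""
    return 1.0 / m

# m = 1.16e-7: the boost halves after roughly 2.39 hours, not a day.
print(half_life_ms(1.16e-7) / MS_PER_HOUR)

# An m of 1/MS_PER_DAY (about 1.16e-8) halves the boost after one day.
m_one_day = 1.0 / MS_PER_DAY
print(recip(MS_PER_DAY, m_one_day, 1, 1))  # about 0.5
```

So if a scale of days is the goal, an m around 1.16e-8 rather than 1.16e-7 may be closer to what was intended.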
Since this will be very specific to your data, the best way to figure out what is happening is to turn on query debugging via the debugQuery=on parameter of your search. Here are two links that help explain the debug output:
Debugging Search Applications Relevance - Explanations
Why does id:archangel come before id:hawkgirl when querying for "wings"