I am currently using Azure Search to perform product searches on my website.
I have the following indexes:
A: Index with 55,000 documents
B: Index with 16 documents
All documents in index B were copied from index A.
When I run a simple search against the two indexes with the same parameters, the results are not what I expect.
Example:
Index A:
Query String: search=kfc
Result sorted by search.score descending:
ProductoName - search.score
KFC Product1 - 1.6514521
KFC Product2 - 1.5482594
Index B:
Query String: search=kfc
Result sorted by search.score descending:
ProductoName - search.score
KFC Product2 - 0.21555252
KFC Product1 - 0.13616839
I am surprised that the order of the results by search score changes, because the data is exactly the same; only the number of documents differs.
Does the number of documents affect how the search score is assigned? Could you point me to where I can read about it? I looked in the documentation but did not find anything.
Could you explain why the order of the products is affected if it is the same information? :(
Neither index has a scoring profile, and the information is exactly the same.
Your analysis is correct, scoring (and thus ranking) is indeed affected by the number of documents in the index. To compute scores we use some statistical characteristics of the data corpus, such as the frequency of each term across the entire corpus and within each document.
The article How full text search works in Azure Search explains this in great detail. In particular, the section on Scoring goes into how frequencies (term frequency, document frequency) are used.
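To make the effect concrete, here is a minimal sketch using a textbook BM25 formula. It is an assumption for illustration only and not necessarily the exact formula Azure Search applies; the term frequencies, document lengths, document frequency and average lengths are all invented numbers. It shows that corpus-wide statistics (here the document count and the average document length) can change not only the score values but also the relative order of the same two documents.

```python
import math

def bm25_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one document for a single-term query with a textbook BM25 formula."""
    # Inverse document frequency: depends on corpus size and on how many docs contain the term.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: depends on the average document length of the whole corpus.
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)

# Hypothetical per-document statistics for the term "kfc" (invented for illustration).
DOCS = {
    "KFC Product1": {"tf": 2, "doc_len": 20},
    "KFC Product2": {"tf": 1, "doc_len": 4},
}

def rank(n_docs, doc_freq, avg_doc_len):
    """Return (names sorted by descending score, score per name) for one corpus."""
    scores = {
        name: bm25_score(d["tf"], d["doc_len"], avg_doc_len, n_docs, doc_freq)
        for name, d in DOCS.items()
    }
    return sorted(scores, key=scores.get, reverse=True), scores

# Same two documents, two corpora with different (assumed) statistics.
large_order, large_scores = rank(n_docs=55000, doc_freq=2, avg_doc_len=40)
small_order, small_scores = rank(n_docs=16, doc_freq=2, avg_doc_len=5)

print("large index:", large_order, large_scores)
print("small index:", small_order, small_scores)
```

With these assumed numbers the two products swap places between the large and the small corpus, even though the documents themselves are identical.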
I have a secondary index that stores the search terms executed against a primary index used for searching documents. I want to query the secondary index and list the search terms in descending order of how often they were executed; for example, I want to find the top 10 most searched terms.
The secondary index stores data in this format
Search Term | Date ...<some more irrelevant fields>
term1 | 01-01-2018
term2 | 01-01-2018
term3 | 02-01-2018
term1 | 02-01-2018
term3 | 03-01-2018
I need something like the following, which I can then manipulate in Java. Any JSON from Solr that contains the search term and its frequency is fine.
Search Term, Frequency
term1, 2
term2, 1
term3, 2
I have looked at some articles that suggest the Term Vector Component, but they deal with counting how many times a specific term occurs within a document.
Can someone help me get the desired result?
Thanks
You can use faceting to tally how often a given token appears in a field.
&facet=true&facet.field=term&facet.sort=count
Faceting accepts many other parameters; for example, facet.sort controls whether the values are ordered by term (index) or by count.
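As a rough sketch of how the facet request above could be built and its result read from client code (shown in Python; the same steps translate directly to Java). The host, the core name `searchlog` and the field name `term` are assumptions for illustration. The parser relies on Solr's default JSON layout for field facets, a flat list that alternates value and count.

```python
from urllib.parse import urlencode

def build_facet_url(base_url, field, limit=10):
    """Build a Solr select URL that returns only facet counts for one field."""
    params = {
        "q": "*:*",            # match every logged search
        "rows": 0,             # no documents needed, only the facet counts
        "facet": "true",
        "facet.field": field,
        "facet.sort": "count",  # most frequent values first
        "facet.limit": limit,   # e.g. top 10
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"

def parse_facet_counts(response, field):
    """Turn Solr's flat [value, count, value, count, ...] list into (value, count) pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))

# Hypothetical core URL and a hand-written response shaped like Solr's JSON output.
url = build_facet_url("http://localhost:8983/solr/searchlog", "term")
sample_response = {
    "facet_counts": {"facet_fields": {"term": ["term1", 2, "term3", 2, "term2", 1]}}
}
top_terms = parse_facet_counts(sample_response, "term")

print(url)
print(top_terms)
```

If a search term can contain several words, this assumes the field is indexed without tokenization (for example a string type), so that each whole search term is counted as a single value.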
I'm generating facet counts for a multivalued field and sorting them by index in order to see them in alphabetical order. Given a particular facet prefix, I would like to jump to its place in the facet count list and show the facet counts surrounding it. For example, if my prefix is "wha" then I would want the following returned (four before and four after):
weld 1
welsh 5
west 4
wetland 1
whale 99
wheat 123
wheel 1
whey 9
There are millions of values in the field and so I can't just ask for them all. I need to be able to jump to that location or use some kind of filter on the facet counts themselves. I've tried using facet.offset, but I have to basically do a binary search in order to find the appropriate offset which is too slow.
I could probably get close enough if I could put in a range for a facet prefix. For example facet.prefix=[we TO wk] or even multiple prefixes like facet.prefix=we,wf,wg,wh,wi,wj,wk.
I'm currently using other non-Solr solutions to accomplish this, but I would like to use Solr 6.6 in order to take advantage of filter queries.
I am using SOLR with mongoDB in one of my projects for search. I must say, SOLR is very powerful.
Currently, I am looking for a way to give different weights to different keywords when the query contains multiple words.
E.g. if a user searches for black doll house,
the weight of black should be greater than that of doll, and the weight of doll should be greater than that of house:
black > doll > house
Is it possible to implement this in SOLR? If yes, how?
You can give a separate weight to each term in the standard lucene query syntax (searching in a field named text):
text:black^10 text:doll^5 text:house
This will give black ten times as much weight as house, and doll five times as much weight as house, but only half the weight of black. You'll have to tweak the weights to get the results you're looking for. If you want to use the regular text in the q= field with (e)dismax as the query parser, you can use bq to apply these boosts separately from the query itself.
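A minimal sketch of generating such a boosted query string on the client before sending it to Solr. The helper name `build_boosted_query` is hypothetical and the boost values are placeholders to be tuned; it also assumes the terms contain no characters that need escaping in the Lucene query syntax.

```python
def build_boosted_query(field, terms, boosts):
    """Join field:term^boost clauses, one per term, in the order given."""
    if len(terms) != len(boosts):
        raise ValueError("each term needs exactly one boost")
    # Assumption: terms are plain words; special characters would need escaping.
    return " ".join(f"{field}:{term}^{boost}" for term, boost in zip(terms, boosts))

# Earlier words in the user's input get larger boosts (values are illustrative).
user_input = "black doll house"
query = build_boosted_query("text", user_input.split(), [10, 5, 1])

print(query)
```

The resulting string can be passed as the q parameter, or as bq when the plain user input stays in q with (e)dismax.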
Did you try boosting the terms in the query? You can specify a different boost value for each term.
Example: if you transform your query to:
textfeild:black^6 textfeild:doll^5 textfeild:house^2
the top results will be the documents matching black, then those matching doll, then those matching house.
Each term's weight is multiplied by its boost value: here black by 6, doll by 5 and house by 2.
I work with Solr and can't fix a problem with result accuracy (q vs. bf, taking accents into account).
I have a Solr index with two indexed fields (this is simplified):
town, population
Félines, 100
Ferrand, 10000
When I query q=Fé&qf=town town_ascii&bf=population^2&defType=dismax
I'd like this order in my results: Félines > Ferrand.
When I query q=Fe&qf=town town_ascii&bf=population^2&defType=dismax, I'd like this order in my results: Ferrand > Félines.
The trouble is that Ferrand beats Félines every time because its population is bigger. How can I solve that? I couldn't find a way to use the query score inside bf to balance it against population.
You didn't post your schema.xml but I suppose you're using the ASCIIFoldingFilterFactory for the town_ascii field. It means that if you're indexing the word Félines the following are the indexed terms:
town: Félines
town_ascii: Felines
What you are asking for, then, is that a match on the town field counts for more than a match on town_ascii. You should change the qf parameter to something like qf=town^3 town_ascii to give more weight to the town field. You can then tune that boost against the population boost in bf until the balance between the text match and population is what you want.
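The effect of qf=town^3 town_ascii can be illustrated with a toy model. This is a sketch, not Solr's actual scoring: it assumes both fields are lowercased and support prefix matching (for example through edge n-grams), it folds accents with Python's unicodedata as a stand-in for ASCIIFoldingFilterFactory, it takes the best weighted field match the way dismax picks the highest-scoring field, and it leaves the bf population boost out entirely.

```python
import unicodedata

def nfc(text):
    """Normalize to composed form so 'é' compares consistently."""
    return unicodedata.normalize("NFC", text)

def fold_ascii(text):
    """Strip accents, roughly what an ASCII folding filter does (é -> e)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def toy_dismax_score(query, town, w_town=3.0, w_ascii=1.0):
    """Best weighted field match; assumes lowercased prefix matching on both fields."""
    town_match = nfc(town).lower().startswith(nfc(query).lower())
    ascii_match = fold_ascii(town).lower().startswith(fold_ascii(query).lower())
    return max(w_town if town_match else 0.0, w_ascii if ascii_match else 0.0)

def rank(query, towns):
    """Order towns by descending toy score for the given query."""
    return sorted(towns, key=lambda t: toy_dismax_score(query, t), reverse=True)

TOWNS = ["Félines", "Ferrand"]
order_accented = rank("Fé", TOWNS)  # accented query matches town only for Félines
order_plain = rank("Fe", TOWNS)     # plain query matches town only for Ferrand

print(order_accented)
print(order_plain)
```

In this model the accented query favors Félines and the unaccented query favors Ferrand, because only the exact-accent match earns the higher town weight. In the real index the population boost is added on top, so the town weight has to be large enough relative to it.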
I have product information in Solr, and each product belongs to a category. I would like to sort product search results by the facet count of their category. So if 100 products matching the criteria fall under the Electronics category and 50 under Books, I would like to sort (or boost) the results so that I see the 100 Electronics products first and then the 50 Books products.
Is this possible with one query?
Thanks.
I don't think this is possible; faceting does not influence search results.