Will duplicated documents impact search results?

Will duplicated documents impact search results? - azure-cognitive-search

Will duplicated documents impact search results?
For example, we have an index that we can have the same documents repeated and different by only one field.
Index: ChannelID, ProductID, ProductName and ProductDescription
We may have the same product on different ChannelIDs. So, if we have 100 ChannelIDs, we will have 100 times the same product (document) if this product is available is on all channels.
When doing a search, because of these repetition of documents (same product name, description), will it impact the results quality?
Thanks.

Depending on your search, similar documents would all show up in search results. For example, in your ‘100 different channel ids but same product’ example, if one searches by product description (assuming the same product gets the same description), all of the 100 documents of that product would either be returned if the search matched or none of them will.

Related

Solr Boost-Function on Sales

I am using Apache Solr 8 with products as documents. Each document includes sales within the last X days that I want to boost, as well as a title and other fields.
Say productA has been sold 5 times, I want to boost it with score+10; a productB has been sold 50 times, I want to boost the score by 30.
I tried to use a boostFunction that looks like (edismax query parser)
q=Coffee&qf=title&bf=if(lt(sales,5),10,if(lt(sales,50),30))
Solr now returns documents that have nothing to do with my "Coffee"-Query but just match the boostfunction. There are even results with score "0".
E.g.
Rank;Score;Sales;Title
1;58.53;55;Coffee big
2;38.11;50;Coffee
3;30;55;Tea
Any idea to get rid of those "only boost function"-matches?

Found the answer!
My Query-Fields actually included boostings like
&qf=title^2 longDescription^0 whatever^0...
Instead of excluding the results found in those 0-boosted fields, solr adds them and matches with - well score 0.
When I remove the 0-boostings, everything works as intended.

Solr: Fetching results with a minimum from each category

I am using solr 4.4.0. The search is performed on products, each of which has a category field. I want to retrieve top n products. But, if some category has less than m products among the top n, then I want to retrieve more products only for those categories.
Eg. I have 4 categories a, b, c, d. n=20 and m=5. Now lets say the top 20(=n) have following category distribution (a:6, b:4, c:6, d:4). Categories b and d have less than m(=5) products. So I would like to fetch one more product(with the next highest score) for both these categories.
Is there a way I can do this using solr

Did you try to solve this with FieldCollapsing?
You use group.field=category, and group.limit lets you set the size of each group. Then you need to be a bit careful on how the groups are sorted, I think it was by the first doc in the group...
But I guess you can achieve what you are looking for fairly easy.

Solr: Searching a term in multiple, indexed fields and returning top 'N' hits from each search field

I have two indexed fields in my Solr schema
Employee Name
Manager Name
Which are plain strings.
my Question is: Given a search term, I want to display top 5 suggested completions from Manager Names and the next 5 from Employee Names.
I can use copy fields, but sometimes I get all top 10 results from Employee Names.
I have a hunch that boosting can help me.. but could not figure out how?

Boost can't help you control the results and distribute 5 each in the top 10 results.
Probably you can check on Field Collapsing, where you can group per role (Manager and Name) and limit 5 results for the group.
So you would have 2 groups returned back to you with 5 results each.

Grouping results and keeping facet counts consistent

Using Solr 3.3
Key Store Item Name Description Category Price
=========================================================================
1 Store Name Xbox 360 Nice game machine Electronic Games 199.99
2 Store Name Xbox 360 Nice game machine Electronic Games 199.99
3 Store Name Xbox 360 Nice game machine Electronic Games 249.99
I have data similar to above table and loaded into Solr. Item Name,
description Category, Price are searchable.
Expected result
Facet Field
Category
Electronic(1)
Games(1)
**Store Name**
XBox 360 Nice game machine priced from 199.99 - 249.99
What will be the query parameters that I can send to Solr to receive results above, basically I wan to group it by Store, ItemName, Description and min max price
And I want to keep paging consistent with the main (StoreName). The paging should be based on the Store Name group. So if 20 stores were found. I should be able to correctly page.
Please suggest

If using Solr 4.0, the new "Grouping" (which replaces FieldCollapsing) fixes this issue when you add the parameter "group.facet=true".
So to group your fields you would have add the following parameters to your search request:
group=true // Enables grouping
group.facet=true // Facet counts to be number of groups instead of documents
group.field=Store // Groups results by the field "Store"
group.ngroups=true // Tells Solr to return the number of groups found
The number of groups found is what you would show to the user and use for paging, instead of the normal total count, which would be the total number of documents in the index.

Have you looked into field collapsing? It is new in Solr 3.3.
http://wiki.apache.org/solr/FieldCollapsing

What I did is I created another field that grouped the required fields in a single field and stored it, problem solved, so now I just group only on that field and I get the correct count.

Solr - Sort search result based on facet count

I have information of products in Solr and each product is under a category. I would like to sort product search result based on facet count on Category. So if there are 100 products matching criteria under Electronics category and 50 products under Books, I would like to sort the result(or boost) the way that I see first 100 electronics and then 50 books.
Is it possible with one query?
Thanks.

I don't think this is possible; faceting does not influence search results.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Will duplicated documents impact search results? - azure-cognitive-search

Related

Solr Boost-Function on Sales

Solr: Fetching results with a minimum from each category

Solr: Searching a term in multiple, indexed fields and returning top 'N' hits from each search field

Grouping results and keeping facet counts consistent

Solr - Sort search result based on facet count

Categories

Resources