Finding duplicate field values in Solr

Using Solr 4.3
I have a field "digest" in a Solr index, and I would like to execute a query that returns all the cases where there are duplicate values of digest. Can this be done?
For the records that have duplicate values, I would also like to return other fields, such as "url", which may not be duplicated.

You have two options, neither of them perfect.
You can use Grouping/Field Collapsing, which will group by digest and can return other fields, but does not let you exclude groups with only one element.
Or you can use Facets, which let you specify a minimum number of documents for a facet value (facet.mincount), but do not show you which documents match that value. You might still be able to get something useful by using Pivot (nested) facets.
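One way to combine the two options is a two-pass approach: first facet on digest with facet.mincount=2 to find the duplicated values, then filter on those values and group by digest to get url and other fields. The sketch below only builds request parameters and parses a response dict; it does not talk to Solr. It assumes the JSON response writer's default flat facet list and digest values (for example hex hashes) that need no escaping beyond quotes. The function names and the per-group cap are hypothetical.

```python
from urllib.parse import urlencode


def duplicate_facet_params(field="digest"):
    """Pass 1: facet on the field, keeping only values seen in 2+ documents."""
    return {
        "q": "*:*",
        "rows": 0,            # only the facet counts are needed, not documents
        "facet": "true",
        "facet.field": field,
        "facet.mincount": 2,  # drop values that occur only once
        "facet.limit": -1,    # return all duplicated values, not just the top 100
        "wt": "json",
    }


def duplicated_values(response, field="digest"):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return {flat[i]: flat[i + 1] for i in range(0, len(flat), 2)}


def documents_for_duplicates(values, field="digest", per_group=10):
    """Pass 2: fetch documents (with url) whose field holds a duplicated value."""
    # Quote each value; assumes values contain no backslashes (e.g. hex digests).
    clause = " OR ".join('"%s"' % v.replace('"', '\\"') for v in values)
    return {
        "q": "*:*",
        "fq": "%s:(%s)" % (field, clause),
        "fl": "%s,url" % field,
        "group": "true",
        "group.field": field,      # one group per duplicated digest
        "group.limit": per_group,  # documents returned per group (hypothetical cap)
        "wt": "json",
    }


if __name__ == "__main__":
    # Made-up pass-1 response, shaped like Solr's JSON facet output.
    sample = {"facet_counts": {"facet_fields": {"digest": ["abc", 3, "def", 2]}}}
    dups = duplicated_values(sample)
    print(urlencode(documents_for_duplicates(dups)))
```

With many duplicated digests, the second pass may need to be split into batches, since Solr caps the number of clauses in a boolean query (maxBooleanClauses, 1024 by default).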

Related

Nested document searches in Solr complex parentFilter syntax

We are adding nested documents to our Solr index. For this purpose, we've added a solr_record_type field to each record, but there will be an interval while we are updating the index during which the original documents will have no value in this field. We would like to treat all of the original documents as root documents.
In our Solr index, root documents have solr_record_type equal to 1, and the child types are represented by 2-4. So, to keep backwards compatibility with what queries currently return, I added this fq parameter:
-solr_record_type:[2 TO 4]
However, I am having trouble composing the parentFilter in the child transformer. For the fl parameter I've tried:
*,[child parentFilter="-solr_record_type:[2 TO 4]"]
This doesn't work: the _childDocuments_ section is then omitted from the results, and I don't know why. I need some way to specify that the parent filter is either "null or 1" or "anything but 2, 3, and 4". How can I do this?
I was unable to find a definitive reference for the parentFilter syntax, only very simple examples.
A negative query needs to be prefixed with the set it is going to remove documents from. Think of it as a set difference: if you only have the set of documents that should be removed, there is nothing to remove them from.
The regular query parser (and the edismax handler) automagically prepends the set of all documents, *:*, to a purely negative query for you, so it appears to work, until you start writing longer AND and OR statements involving negative queries, where you suddenly need to prefix *:* yourself.
The same applies to the parentFilter syntax: no set of all documents is automagically prefixed internally, so if you have a negative query, you have to add it yourself.
*,[child parentFilter="*:* -solr_record_type:[2 TO 4]"]
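The set arithmetic behind this answer can be modelled in a few lines. This is a toy illustration over a made-up index, not how Solr evaluates queries internally: a purely negative clause has no positive set to subtract from, while prefixing *:* starts from every document, including the not-yet-backfilled ones where solr_record_type has no value.

```python
# Toy index (hypothetical): doc id -> solr_record_type; None = field not set yet.
INDEX = {"p1": 1, "p2": None, "c1": 2, "c2": 3, "c3": 4}


def range_matches(lo, hi):
    """Like solr_record_type:[lo TO hi]; docs without a value never match."""
    return {doc for doc, t in INDEX.items() if t is not None and lo <= t <= hi}


def pure_negative(excluded):
    """'-solr_record_type:[2 TO 4]' alone: nothing positive to subtract from."""
    return set() - excluded


def all_minus(excluded):
    """'*:* -solr_record_type:[2 TO 4]': all documents minus the excluded ones."""
    return set(INDEX) - excluded


PARENT_FILTER = "*:* -solr_record_type:[2 TO 4]"
FL = '*,[child parentFilter="%s"]' % PARENT_FILTER

if __name__ == "__main__":
    children = range_matches(2, 4)
    print(sorted(pure_negative(children)))  # []
    print(sorted(all_minus(children)))      # ['p1', 'p2']
    print(FL)
```

Note that the toy model keeps p2, the document with no solr_record_type, which is the "null or 1" behaviour the question asks for.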

Hybris: Combine different solr facet under one

I have applied Solr facets to product properties.
For example, a product can be a Medicine (0/1), a Drug (0/1) or a Poison (0/1).
0 means NO, 1 means YES.
These are different features of a product, so they appear as different facets. Is it possible to display them under one facet instead, e.g. "Type", under which the three Solr facets "Medicine", "Drug" and "Poison" are displayed like this:
Type
-----
Medicine (50)
Drug (100)
Poison (75)
I'm not sure about Hybris, but you should be able to do this with facet queries: one facet query for each of your three conditions. In the UI, you can then organize the counts any way you want.
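A sketch of the facet-query approach at the plain Solr level (how Hybris builds its request is not covered here). It assumes the flags are indexed in fields named Medicine, Drug and Poison holding 0/1; those field names are hypothetical. Each facet.query uses the {!key=...} local parameter so its count comes back under a readable name, and the counts are then arranged under a single "Type" heading for display.

```python
from urllib.parse import urlencode

# Hypothetical 0/1 flag fields in the index.
TYPE_FLAGS = ["Medicine", "Drug", "Poison"]


def type_facet_params():
    """One facet.query per flag; {!key=...} labels each count in the response."""
    return {
        "q": "*:*",
        "facet": "true",
        "facet.query": ["{!key=%s}%s:1" % (flag, flag) for flag in TYPE_FLAGS],
        "wt": "json",
    }


def render_type_facet(response):
    """Group the facet-query counts under a single 'Type' heading."""
    counts = response["facet_counts"]["facet_queries"]
    lines = ["Type", "-----"]
    lines += ["%s (%d)" % (flag, counts.get(flag, 0)) for flag in TYPE_FLAGS]
    return "\n".join(lines)


if __name__ == "__main__":
    # doseq=True repeats facet.query once per list element.
    print(urlencode(type_facet_params(), doseq=True))
    # Made-up response using the counts from the question.
    sample = {"facet_counts": {"facet_queries": {"Medicine": 50, "Drug": 100, "Poison": 75}}}
    print(render_type_facet(sample))
```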
However, I am not sure why you can't just have a multi-valued category field that contains the values Medicine and/or Drug and/or Poison. Faceting on that field would then give you the breakdown directly. If your values do not arrive in that form, you can probably merge them into one field with copyField or with a custom Update Request Processor chain.
This is super easy. Just create an IndexedProperty "Type" with a new custom ValueProvider for it, and have the provider derive the values from the boolean flags (hard-code them if necessary). No need for anything more complex.
I tried the solutions posted here, but they did not fit my requirements. Instead, I changed the facet navigation tag files to bring all the classification attribute facets (Medicine, Drug, Poison) under a single facet (Type).
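The derivation that either a custom ValueProvider or an update-time merge would perform can be sketched independently of Hybris: map each product's 0/1 flags to a list of Type values, index that as a multi-valued field, and faceting on it counts each document once per value it holds. The function names and product dicts are hypothetical; a real ValueProvider would be written in Java against the Hybris API, which this sketch does not attempt to reproduce.

```python
from collections import Counter

FLAGS = ("Medicine", "Drug", "Poison")


def type_values(product):
    """Hypothetical provider logic: names of the flags that are set to 1."""
    return [name for name in FLAGS if product.get(name) == 1]


def simulate_type_facet(products):
    """Approximates facet counts on a multi-valued Type field:
    each product adds 1 to every Type value it holds."""
    counts = Counter()
    for product in products:
        counts.update(type_values(product))
    return counts


if __name__ == "__main__":
    products = [
        {"Medicine": 1, "Drug": 1, "Poison": 0},
        {"Medicine": 0, "Drug": 1, "Poison": 1},
        {"Medicine": 1},
    ]
    print(simulate_type_facet(products))
```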

What's the difference between these two Solr queries?

What's the difference between these two Solr queries:
NOT name:*
NOT name:[* TO *]
Both return some results, but I can't tell the difference between them.
Based on my reading of the Solr query syntax documentation:
NOT name:[* TO *]
removes all documents that have a value in name, whatever that value is, as described in this documentation: https://wiki.apache.org/solr/SolrQuerySyntax
NOT name:*
removes all documents matching the wildcard * on name.
NOT is a reserved keyword that removes the results matching whatever field and value follow it. The two queries can show you different results because, when you give NOT a range such as name:[* TO *], the results exclude whatever matches the rule you've specified.
Keep in mind that Solr queries follow their own wildcard and range syntax rules, which are not the same as regular expressions.

Select multiple values of same facet using IBM WCS v7 and Apache Solr

We use IBM WCS v7 with embedded Apache Solr. Solr is the search engine for our e-commerce application.
Following a recent requirement, we want to use multi-select facet functionality, where the user can check multiple facet values and the corresponding values are OR'ed in the search results.
For example, I want to check Color:RED, Color:BLUE and Color:BLACK in my default search results, so that the values are OR'ed in the displayed results.
We use the out-of-the-box SearchDisplayCmd for our search functionality, where the "metaData" parameter keeps the history of the facets applied, and the "facet" parameter applies a facet field. The "metaData" parameter encodes the multiple facets in base64, using a special delimiter to AND the different facet fields and restrict the search results, e.g.:
brand:"POLO" color:"RED" shape:"Oval"
I want to know whether there is such a delimiter, or any alternative, that lets me perform an OR operation on different values of the same facet attribute while still using the "metaData" parameter to keep a history of the applied facets.
Any help on this is highly appreciated, and other approaches to applying multiple values of the same facet attribute are also welcome.
Many thanks in advance.
Regards,
Jitendriya Dash
I recently worked on this (selecting multiple values of the same facet) and got it working.
Find where the request hits the tag. The expression builder I used, getCatalogNavigationView, comes out of the box (OOB). Make sure you use the appropriate searchProfile.
Pass the facet parameter like this:
<c:forEach var="facetSelect" items="${paramValues.facet}">
<wcf:param name="facet" value="${facetSelect}" />
</c:forEach>
However, with this method you will not be able to select values from any other attribute. If someone knows how to select values from both the same facet and different facets, please share.
Update the SELECTION column of the FACET table to 1 to mark the facetable attribute as multi-selectable.
In WCS 7+, to enable multi-select facet functionality, go to the FACET table and set the 'SELECTION' column value to 1 instead of 0.
If an attribute is to be made a multi-select facet, you can make the change from CMC: go to the attribute dictionary, select the attribute, and in its facetable properties check 'Allow multiple facet value'.
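For reference, this is roughly what multi-select faceting (OR within one facet) looks like at the raw Solr level, independent of how WCS encodes metaData: values of the same field are OR'ed inside one tagged fq, separate fq parameters are intersected (AND), and {!ex=...} on facet.field excludes a field's own filter so its other values keep their counts. This sketch only builds parameter strings; the function names are hypothetical, and whether WCS passes such parameters through to Solr is not confirmed in this thread.

```python
from urllib.parse import urlencode


def facet_filters(selected):
    """selected: field -> chosen values. Returns one tagged fq per field.

    Values of the same field are OR'ed; Solr ANDs separate fq parameters.
    """
    fqs = []
    for field, values in selected.items():
        ored = " OR ".join('"%s"' % value for value in values)
        fqs.append("{!tag=%s}%s:(%s)" % (field, field, ored))
    return fqs


def facet_fields(selected):
    """Exclude each field's own filter when counting that field's facet values."""
    return ["{!ex=%s}%s" % (field, field) for field in selected]


if __name__ == "__main__":
    selected = {"color": ["RED", "BLUE", "BLACK"], "brand": ["POLO"]}
    params = {
        "q": "*:*",
        "facet": "true",
        "fq": facet_filters(selected),
        "facet.field": facet_fields(selected),
    }
    print(urlencode(params, doseq=True))
```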

Solr group.facet=true will not return facet counts

As the title states, I can't get facet values or counts to return when using the group.facet=true parameter. group.truncate appears to return the correct values, but that's not what I'm looking for.
I started with the Solr 4 alpha, then the beta, and I'm now working on the nightly build from 9/5/2012.
I'm grouping by a single-value field. The fields I am faceting on are a mix of single- and multi-value fields. I've simplified my query here; MyFacetField represents a single-value field.
Here are the grouping parameters:
group.field=GroupField
group.ngroups=true
group.facet=true
group=true
Facet set up like this:
f.MyFacetField.facet.limit=-1
f.MyFacetField.facet.mincount=1
f.MyFacetField.facet.sort=false
facet.field=MyFacetField
facet=true
Match all documents:
q=*:*
Again, my problem is:
When I specify group.facet=true, I get the list of facet fields I specified in the request parameters, but with no values and no counts.
When I specify group.facet=false (or leave the parameter out), I get facet values and counts for the ungrouped result set, as expected.
According to the wiki, this feature is included in Solr 4.
It turns out that the issue was this parameter:
f.MyFacetField.facet.limit=-1
When the limit is set to -1 (all) and group.facet=true, Solr does not return facet values or their counts. I'm not sure whether this is intended behavior. It doesn't appear to affect group.truncate, or regular faceting with group.facet=false.
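Based on that finding, here is a sketch of the question's request with a large finite per-field limit in place of -1. The value 1000 is an arbitrary placeholder, and that a finite limit avoids the problem on a given build is an assumption to verify; set it at least as high as the number of distinct values you expect.

```python
def grouped_facet_params(limit=1000):
    """The question's parameters, with a finite facet.limit instead of -1."""
    if limit < 1:
        # -1 (unlimited) was the value that returned no facet values here.
        raise ValueError("use a positive facet.limit with group.facet=true")
    return {
        "q": "*:*",
        "group": "true",
        "group.field": "GroupField",
        "group.ngroups": "true",
        "group.facet": "true",
        "facet": "true",
        "facet.field": "MyFacetField",
        "f.MyFacetField.facet.limit": limit,
        "f.MyFacetField.facet.mincount": 1,
        "f.MyFacetField.facet.sort": "false",
    }
```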

Resources