Avoiding duplicate records in Solr server

In my Solr admin I am querying like this:
http:// :8080/solr/realestatecategory/select?q=%3A&fl=propcategory&wt=json&indent=true
It returns records like this:
{
"responseHeader":{
"status":0,
"QTime":4},
"response":{"numFound":4,"start":0,"docs":[
{
"propcategory":"Residential Property"},
{
"propcategory":"Residential Property"},
{
"propcategory":"Commercial Property"},
{
"propcategory":"Invest"}]
}}
I want to avoid duplicate records such as "Residential Property". How can I do that?
Please answer quickly, as I badly need help.

You would need to use grouping.
Use group=true along with group.field=propcategory.
For details have a look here.
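As a rough illustration of the grouping parameters above, here is a minimal Python sketch. The base URL passed in and the helper names build_grouping_query and group_values are assumptions made for the example; the response layout assumed is Solr's default grouped format.

```python
from urllib.parse import urlencode


def build_grouping_query(base_url: str, field: str) -> str:
    """Return a select URL that groups results by `field`.

    `base_url` is a hypothetical core URL, e.g. http://host:port/solr/corename.
    """
    params = {
        "q": "*:*",
        "fl": field,
        "wt": "json",
        "group": "true",        # turn on result grouping
        "group.field": field,   # one group per distinct field value
    }
    return f"{base_url}/select?{urlencode(params)}"


def group_values(response: dict, field: str) -> list:
    """Pull the distinct values out of a grouped response (default format)."""
    groups = response["grouped"][field]["groups"]
    return [group["groupValue"] for group in groups]
```

Each group carries its own doclist, so this returns one entry per distinct propcategory value rather than one per document.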

If you just want a unique list of the records, you can use Solr faceting as well.
It returns each distinct value along with its document count.
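A small sketch of reading such a facet result in Python, assuming the request was sent with facet=true&facet.field=propcategory&rows=0 and that the response uses Solr's default flat JSON layout (value, count, value, count, ...). The function name facet_counts is made up for the example.

```python
def facet_counts(response: dict, field: str) -> dict:
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    # Even positions hold the values, odd positions hold their counts.
    return dict(zip(flat[0::2], flat[1::2]))
```

The keys of the returned dict are the unique values; the counts show how many documents share each one.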

Related

Can I find documents based on duplicated fields?

I have a Solr server with data under this format:
{
id: 1,
text_1: "some_text1",
text_2: "some_text2",
},
{
id: 2,
text_1: "some_text1",
text_2: "some_text2",
}
I need to find documents like the ones I wrote above. Documents that have the same "text_1" and "text_2" values but different ids.
I've tried using facets, but I'm not sure they help. First, they only return a count of the duplicates, and I need the ids of these documents. Second, I'm not sure that faceting over multiple fields does what I want. I'm not sure that:
facet.field=text_1&facet.field=text_2 shows me a count of documents that have both those fields.
Thank you, I don't know much about Solr. Any help is greatly appreciated!
I think facets are your best bet to get this done, but as you noticed you will need to issue at least two queries: one to get the facets and another to fetch the actual documents that belong to each facet (i.e. the duplicates in your case).
To get facets over both fields combined, you'll need to use pivot faceting (https://lucene.apache.org/solr/guide/7_0/faceting.html#pivot-decision-tree-faceting). The syntax is facet=on&facet.pivot=field1,field2
Make sure the field that you use for facets is a string field and not a text field.
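The two-step approach could look roughly like the following Python sketch. It assumes the first request used facet.pivot=text_1,text_2 and that the pivot data arrives under facet_counts.facet_pivot in the usual nested form; the helper names duplicate_pairs and follow_up_query are hypothetical.

```python
def duplicate_pairs(response: dict, field_a: str, field_b: str) -> list:
    """Return (value_a, value_b) pairs shared by more than one document."""
    pivots = response["facet_counts"]["facet_pivot"][f"{field_a},{field_b}"]
    pairs = []
    for outer in pivots:
        # Each outer entry nests the second field's values under "pivot".
        for inner in outer.get("pivot", []):
            if inner["count"] > 1:
                pairs.append((outer["value"], inner["value"]))
    return pairs


def _quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def follow_up_query(field_a: str, value_a: str, field_b: str, value_b: str) -> str:
    """Build the q parameter that fetches the documents behind one pair."""
    return f"{field_a}:{_quote(value_a)} AND {field_b}:{_quote(value_b)}"
```

Running the follow-up query with fl=id would then return the ids of the documents in each duplicate set.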

Solr query by specific fields

I use the latest version of Solr.
I run a text query with an "OR" condition across different fields.
I want an indication of which field caused each document to be returned.
How can I do it?
Faceting could be an option here. In your Solr Query set "facet" to true. You also need to set "facet.field" to the fields that you are including in OR search criteria. The Solr Response will then show you how many results are returned for each of the search fields.
Here is the reference - https://cwiki.apache.org/confluence/display/solr/Faceting
-Amit
If you only want to specify which fields you get back in the response, then you're after the fl parameter. If you make a request like:
http://localhost:8983/solr/demo/query?
q=title_t:black
&fl=author,title
You're indicating that you only want to get back the author and title fields, something like:
{"response":{"numFound":2,"start":0,"docs":[
{
"title":"The Black Company",
"author":"Glen Cook"},
{
"title":"The Black Cauldron",
"author":"Lloyd Alexander"}]
}}

Solr facet with additional metadata

Is it possible to use additional metadata fields when using Solr facets? I would like to aggregate one attribute by counting its values and display the related group as an additional metadata field.
http://localhost:8983/solr/gitIndex/select?indent=on&q=*:*&rows=0&wt=json&
json.facet={
Repository_s: {
type: terms,
field: Repository_s,
limit: 10,
facet: {
x:"count()"
}
}
}
The result should look like this:
...
"facets":{
"count":1354013,
"<name of attribute>":{
"buckets":[{
"val":"<value of attribute>",
"count":173997,
"<metadata_field>":<value of metadata_field>},
...
A solution is to use facet pivots - it'll get you any values in a secondary field under each facet, and if the value is unique for the set of documents, it'll just be a single value.
The reference guide has the syntax for non-json facets.
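To illustrate the pivot idea, here is a hedged Python sketch that reshapes a pivot result into buckets similar to the layout asked for in the question. It assumes a request with facet.pivot=Repository_s,Group_s, where Group_s is a made-up name standing in for the metadata field, and the function name pivot_to_buckets is likewise hypothetical.

```python
def pivot_to_buckets(response: dict, facet_field: str, meta_field: str) -> list:
    """Reshape a two-level pivot result into val/count/metadata buckets."""
    key = f"{facet_field},{meta_field}"
    buckets = []
    for entry in response["facet_counts"]["facet_pivot"][key]:
        meta_values = [child["value"] for child in entry.get("pivot", [])]
        bucket = {"val": entry["value"], "count": entry["count"]}
        # A single metadata value is unwrapped; otherwise the list is kept.
        bucket[meta_field] = meta_values[0] if len(meta_values) == 1 else meta_values
        buckets.append(bucket)
    return buckets
```

When the metadata value is the same for every document under a facet value, the bucket ends up with one plain value, which matches the case described in the answer.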

Solr Faceting on Multiple Concatenated Fields

I need a way to get facets on two combined field names. To show you what I mean, take a look at the query as it is now:
{
"responseHeader":{
"status":0,
"QTime":16,
"params":{
"facet":"true",
"indent":"true",
"q":"productId:(1 OR 2 OR 3 OR 4)",
"facet.field":["productMetaType",
"productId"],
"rows":"10"}},
"response":{"numFound":4,"start":0,"docs":[
{
"productId":1,
"productMetaType":"PRIMARY_PHOTO",
"url":"1_PRIM.JPG"},
{
"productId":1,
"productMetaType":"OTHER_PHOTO",
"url":"1_1.JPG"},
{
"productId":1,
"productMetaType":"OTHER_PHOTO",
"url":"1_2.JPG"},
{
"productId":2,
"productMetaType":"OTHER_PHOTO",
"url":"2_1.JPG"}]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"productMetaType":[
"PRIMARY_PHOTO",1,
"OTHER_PHOTO",3],
"productId":[
"1",3,
"2",1]},
"facet_dates":{},
"facet_ranges":{}
}
}
I get two facet fields, productMetaType and productId. What I need to do is somehow combine those fields so I get data back something like this:
1_PRIMARY_PHOTO, 1,
1_OTHER_PHOTO, 2,
2_PRIMARY_PHOTO, 0,
2_OTHER_PHOTO, 1
Does the pivot functionality do this? Unfortunately, we're running Solr 3.1, so pivot isn't available, but if that is the only way to do this, I might have some ammo for upgrading.
The only other thing I could think of was somehow concatenating the field values. I am new to Solr and don't know what is possible. Any advice or assistance is appreciated. Thank you for your time.
Yes, pivot would do the trick, but as you observed, this feature is only available in Solr trunk.
Your idea to combine both fields would work too. Actually, if your fields have a limited number of values, the easiest and most flexible way to do this would be to use facet queries:
productId:1 AND productMetaType:PRIMARY_PHOTO
productId:2 AND productMetaType:OTHER_PHOTO
productId:1 AND productMetaType:OTHER_PHOTO
productId:2 AND productMetaType:PRIMARY_PHOTO
Otherwise, just create a new field in your Solr schema.xml with string type, recreate your index by adding your documents as previously, but with this new field (that you can generate as you wish, using '_' as a separator between the two field values would work perfectly).
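Both suggestions can be sketched in Python. The first helper generates one facet.query string per combination; the second adds a combined value to a document before it is indexed. The function names and the field name productIdMetaType are assumptions for illustration, and the new field would still need to be declared as a string field in schema.xml.

```python
from itertools import product


def facet_queries(product_ids, meta_types) -> list:
    """One facet.query string per (productId, productMetaType) combination."""
    return [
        f"productId:{pid} AND productMetaType:{meta}"
        for pid, meta in product(product_ids, meta_types)
    ]


def with_combined_field(doc: dict, field_name: str = "productIdMetaType") -> dict:
    """Return a copy of `doc` with the two values joined by '_'."""
    combined = f'{doc["productId"]}_{doc["productMetaType"]}'
    return {**doc, field_name: combined}
```

Facet queries also report combinations with zero matches, such as 2_PRIMARY_PHOTO in the example data, because every query is listed in the response with its count.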

Solr Faceting Restriction Over Two Fields

I'm trying to achieve the following in Solr via faceting.
I want to return all the MODEL fields where the MAKE field = 'FORD'
http://wa12-d17251.print.tradermedia.co.uk:8080/solr/select/?q=make:FORD&fq={!geofilt}&sfield=location&pt=51.5375,-0.1934&d=5&facet=true&facet.field=model&facet.query=make:FORD&rows=0
Ignore the geoLocation stuff.
What I get back is all of the other MODELS as well. I understand why: the two fields are not joined in any way.
How would I configure Solr to return only the Models where the Make is 'X'?
Any help appreciated.
Thanks
Ben
I want to return all the MODEL fields where the MAKE field = 'FORD'
I assume you meant "model values" instead of "model fields".
q=*:*&fq=make:FORD&facet=true&facet.field=model
If you're using Solr 4+, you can use pivot faceting (group facets by make), then select the elements under 'FORD'.
&facet=true&facet.pivot=make,model
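A minimal Python sketch of the filter-query approach follows. The facet.mincount=1 parameter is an addition not mentioned in the answer: without it, Solr's default minimum count of 0 can still list models from other makes with a count of zero. The function names are hypothetical.

```python
from urllib.parse import urlencode


def model_facet_params(make: str) -> str:
    """Query string that facets on model, restricted to a single make."""
    params = [
        ("q", "*:*"),
        ("fq", f"make:{make}"),    # restrict the document set to this make
        ("rows", "0"),             # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", "model"),
        ("facet.mincount", "1"),   # assumption: hide models with no matches
    ]
    return urlencode(params)


def models_from_response(response: dict) -> list:
    """List model values with a non-zero count from the flat facet list."""
    flat = response["facet_counts"]["facet_fields"]["model"]
    return [value for value, count in zip(flat[0::2], flat[1::2]) if count > 0]
```

The count filter in models_from_response acts as a fallback in case the request was sent without facet.mincount.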
