Solr Faceting on Multiple Concatenated Fields


I need a way to get facets on the combination of two fields. To show you what I mean, here is the query and its response as they are now:
{
"responseHeader":{
"status":0,
"QTime":16,
"params":{
"facet":"true",
"indent":"true",
"q":"productId:(1 OR 2 OR 3 OR 4)",
"facet.field":["productMetaType",
"productId"],
"rows":"10"}},
"response":{"numFound":4,"start":0,"docs":[
{
"productId":1,
"productMetaType":"PRIMARY_PHOTO",
"url":"1_PRIM.JPG"},
{
"productId":1,
"productMetaType":"OTHER_PHOTO",
"url":"1_1.JPG"},
{
"productId":1,
"productMetaType":"OTHER_PHOTO",
"url":"1_2.JPG"},
{
"productId":2,
"productMetaType":"OTHER_PHOTO",
"url":"2_1.JPG"}]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"productMetaType":[
"PRIMARY_PHOTO",1,
"OTHER_PHOTO",3],
"productId":[
"1",3,
"2",1]},
"facet_dates":{},
"facet_ranges":{}
}
}
I get two facet fields, productMetaType and productId. What I need to do is somehow combine those fields so I get data back something like this:
1_PRIMARY_PHOTO, 1,
1_OTHER_PHOTO, 2,
2_PRIMARY_PHOTO, 0,
2_OTHER_PHOTO, 1
Does the pivot functionality do this? Unfortunately, we're running Solr 3.1, so pivot isn't available, but if that is the only way to do this, I might have some ammo for upgrading.
The only other thing I could think of was somehow concatenating the two fields. I am new to Solr and don't know what is possible. Any advice or assistance is appreciated. Thank you for your time.

Yes, pivot faceting would do the trick, but as you observed, this feature is only available in Solr trunk.
Your idea to combine both fields would work too. Actually, if your fields have a limited number of values, the easiest and most flexible way to do this would be to use facet queries:
productId:1 AND productMetaType:PRIMARY_PHOTO
productId:2 AND productMetaType:OTHER_PHOTO
productId:1 AND productMetaType:OTHER_PHOTO
productId:2 AND productMetaType:PRIMARY_PHOTO
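Writing every combination by hand gets tedious as the number of values grows. Below is a minimal Python sketch, assuming the field names from the question, that generates one facet.query per (productId, productMetaType) pair and turns the returned counts into labels like 1_PRIMARY_PHOTO. It only builds the query string; the HTTP call to your core, and the helper names, are illustrative.

```python
from itertools import product
from urllib.parse import urlencode


def combination_facet_queries(product_ids, meta_types):
    """Build one facet.query string per (productId, productMetaType) pair."""
    return [
        f"productId:{pid} AND productMetaType:{mtype}"
        for pid, mtype in product(product_ids, meta_types)
    ]


def build_query_string(product_ids, meta_types):
    """Encode the select parameters; a list of tuples allows repeated facet.query keys."""
    id_clause = " OR ".join(str(pid) for pid in product_ids)
    params = [("q", f"productId:({id_clause})"), ("facet", "true"), ("rows", "0")]
    params += [("facet.query", fq) for fq in combination_facet_queries(product_ids, meta_types)]
    return urlencode(params)


def label_counts(facet_queries):
    """Rename 'productId:1 AND productMetaType:OTHER_PHOTO' keys to '1_OTHER_PHOTO'."""
    labels = {}
    for query, count in facet_queries.items():
        id_part, type_part = query.split(" AND ")
        pid = id_part.split(":", 1)[1]
        mtype = type_part.split(":", 1)[1]
        labels[f"{pid}_{mtype}"] = count
    return labels


print(build_query_string([1, 2], ["PRIMARY_PHOTO", "OTHER_PHOTO"]))
```

The counts come back under facet_counts.facet_queries, keyed by the exact query strings. Each facet query is reported with its count even when it is 0, so a combination such as 2_PRIMARY_PHOTO still shows up.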
Otherwise, create a new field of type string in your Solr schema.xml and rebuild your index by adding your documents as before, this time including the new field. You can generate its value however you like; joining the two field values with '_' as a separator would work perfectly.
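The concatenated field is computed on the client before the documents are sent to Solr. Here is a rough Python sketch using the documents from the question. The field name productId_metaType is hypothetical and would need a matching string field in schema.xml. Counting it locally with Counter mimics what facet.field=productId_metaType would report.

```python
from collections import Counter


def add_combined_field(doc, field_name="productId_metaType"):
    """Return a copy of doc with productId and productMetaType joined by '_'.

    'productId_metaType' is a hypothetical field name; declare it as a
    string field in schema.xml before reindexing.
    """
    combined = dict(doc)
    combined[field_name] = f"{doc['productId']}_{doc['productMetaType']}"
    return combined


docs = [
    {"productId": 1, "productMetaType": "PRIMARY_PHOTO", "url": "1_PRIM.JPG"},
    {"productId": 1, "productMetaType": "OTHER_PHOTO", "url": "1_1.JPG"},
    {"productId": 1, "productMetaType": "OTHER_PHOTO", "url": "1_2.JPG"},
    {"productId": 2, "productMetaType": "OTHER_PHOTO", "url": "2_1.JPG"},
]

indexed = [add_combined_field(d) for d in docs]

# Local stand-in for the facet counts Solr would return on the new field.
counts = Counter(d["productId_metaType"] for d in indexed)
print(counts)
```

Unlike facet queries, field faceting only lists values that actually occur in the index, so 2_PRIMARY_PHOTO does not appear at all here.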

Related

Can I find documents based on duplicated fields?

I have a Solr server with data in this format:
{
id: 1,
text_1: "some_text1",
text_2: "some_text2",
},
{
id: 2,
text_1: "some_text1",
text_2: "some_text2",
}
I need to find documents like the ones above: documents that have the same "text_1" and "text_2" values but different ids.
I've tried using facets, but I'm not sure they help. First, they only return a count of the duplicates, and I need the ids of these documents. Second, I'm not sure that faceting over multiple fields does what I want: I don't know whether facet.field=text_1&facet.field=text_2 gives me a count of documents that have both values in common.
Thank you, I don't know much about Solr. Any help is greatly appreciated!

I think facets are your best bet, but as you noticed, you will need to issue at least two queries: one to get the facets and another to fetch the actual documents that belong to each facet (i.e. the duplicates in your case).
To facet on both fields together, you'll need pivot faceting (https://lucene.apache.org/solr/guide/7_0/faceting.html#pivot-decision-tree-faceting). The syntax is facet=on&facet.pivot=field1,field2.
Make sure the field that you use for facets is a string field and not a text field.
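The two-query approach can be sketched in Python as follows. It reads the facet_pivot section of the first response, keeps (text_1, text_2) pairs shared by at least two documents, and builds a follow-up query per pair that asks only for the id field. The sample response and the helper names are hypothetical; the nested field/value/count/pivot layout is how Solr's JSON output represents pivot facets.

```python
def phrase(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def duplicate_filters(facet_pivot, pivot_key="text_1,text_2", min_count=2):
    """Return one follow-up query per (text_1, text_2) pair shared by min_count+ docs."""
    queries = []
    for outer in facet_pivot.get(pivot_key, []):
        for inner in outer.get("pivot", []):
            if inner["count"] >= min_count:
                queries.append(
                    f"{outer['field']}:{phrase(outer['value'])} AND "
                    f"{inner['field']}:{phrase(inner['value'])}"
                )
    return queries


# Hypothetical facet_pivot section from the first request
# (facet=on&facet.pivot=text_1,text_2&rows=0).
sample_pivot = {
    "text_1,text_2": [
        {"field": "text_1", "value": "some_text1", "count": 2,
         "pivot": [{"field": "text_2", "value": "some_text2", "count": 2}]},
        {"field": "text_1", "value": "unique_text", "count": 1,
         "pivot": [{"field": "text_2", "value": "other_text", "count": 1}]},
    ]
}

for q in duplicate_filters(sample_pivot):
    # Second request: fetch the ids of the documents that share this pair.
    print({"q": q, "fl": "id"})
```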

How to increase score threshold in solr

I'm running the following select query to find a restaurant in a certain area using Solr:
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"name:\"Sushi Hiro\"",
"pt":"51.048688,-114.0778858",
"d":"0.2",
"fl":"*,score",
"fq":"{!geofilt sfield=location}",
"rows":"10000000",
"wt":"json",
"debugQuery":"true"}},
"response":{"numFound":1,"start":0,"maxScore":11.842687,"docs":[
{ .... }
However, Solr only returns the most similar document and doesn't show me the rest. I want to get at least two more documents that are also similar to my query. How can I modify the score threshold to get more results?

There is no such thing as a "score threshold": the documents returned are those that match your query. Documents that aren't included don't match the terms you've given, and your query is what defines the requirements for a document to be included.
In your example I guess the issue is that you're asking for documents located within 200m of the position given (d=0.2), and there is only one document within range that can be included.
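To see why a small d leaves so few hits, here is a rough Python sketch of a radius filter: a geofilt-style filter keeps only documents within d kilometres of pt, regardless of score. The candidate restaurants and their coordinates are made up, and the Earth radius is an approximation; Solr's own distance code may differ slightly.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(point, candidates, d_km):
    """Keep only the candidates no farther than d_km from point."""
    lat, lon = point
    return [
        name
        for name, (clat, clon) in candidates.items()
        if haversine_km(lat, lon, clat, clon) <= d_km
    ]


pt = (51.048688, -114.0778858)  # the pt from the question
# Hypothetical restaurants: one a few dozen metres away, one several km away.
candidates = {"nearby": (51.0490, -114.0780), "across town": (51.0800, -114.1300)}

print(within(pt, candidates, 0.2))   # d=0.2 as in the question
print(within(pt, candidates, 10.0))  # a wider radius admits more documents
```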
If you want to sort (or boost) by the distance instead of limiting the results to those that are close by, take a look at spatial search and geodist.
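Here is a minimal Python sketch of the sort-instead-of-filter variant. It drops the geofilt fq and d, and sorts by geodist(), which reads the location field from sfield and the reference point from pt. The function name is hypothetical. Note that q still restricts results to documents named "Sushi Hiro", so you would relax q as well if you want other restaurants.

```python
from urllib.parse import urlencode


def nearest_first_params(q, lat, lon, sfield="location", rows=10):
    """Select params that sort by distance instead of filtering with geofilt."""
    return {
        "q": q,
        "sfield": sfield,         # geodist() takes the location field from sfield
        "pt": f"{lat},{lon}",     # ... and the reference point from pt
        "sort": "geodist() asc",  # closest first; no fq, so nothing is cut off by distance
        "fl": "*,score",
        "rows": rows,
        "wt": "json",
    }


params = nearest_first_params('name:"Sushi Hiro"', 51.048688, -114.0778858)
print(urlencode(params))
```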

Hybris: Combine different solr facet under one

I have applied Solr facets to product properties.
For example, a product can be a Medicine (0/1), a Drug (0/1), or a Poison (0/1).
0 means NO, 1 means YES.
These are separate features of a product, so they appear as separate facets. Is it possible to display them under a single facet instead, e.g. "Type", under which the three facets "Medicine", "Drug", and "Poison" would be displayed like this:
Type
-----
Medicine (50)
Drug (100)
Poison (75)

Not sure about Hybris, but you should be able to do this with facet queries: one facet query for each of your three conditions. In the UI, you can then organize the counts any way you want.
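The facet-query route in plain Solr terms, as a Python sketch. It assumes the three flags are indexed as hypothetical 0/1 fields named Medicine, Drug, and Poison. It builds one facet.query per condition and regroups the returned facet_queries counts under a single "Type" heading. This is not Hybris API code.

```python
from urllib.parse import urlencode

# Hypothetical field names: one 0/1 field per product feature.
TYPE_QUERIES = {"Medicine": "Medicine:1", "Drug": "Drug:1", "Poison": "Poison:1"}


def type_facet_query_string():
    """One facet.query per condition; rows=0 because only the counts are needed."""
    params = [("q", "*:*"), ("facet", "true"), ("rows", "0")]
    params += [("facet.query", fq) for fq in TYPE_QUERIES.values()]
    return urlencode(params)


def group_under_type(facet_queries):
    """Map the facet_queries section of the response back to display labels."""
    return {label: facet_queries.get(fq, 0) for label, fq in TYPE_QUERIES.items()}


def render(grouped):
    lines = ["Type", "-----"]
    lines += [f"{label} ({count})" for label, count in grouped.items()]
    return "\n".join(lines)


# Hypothetical counts, matching the numbers in the question.
sample = {"Medicine:1": 50, "Drug:1": 100, "Poison:1": 75}
print(render(group_under_type(sample)))
```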
However, I am not sure why you can't just have a multi-valued category field containing Medicine and/or Drug and/or Poison values. Faceting on that field would then give you the breakdown. If your values don't come in that form, you can probably merge them into one field with either copyField or a custom Update Request Processor chain.
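The merge itself is simple logic. Here it is sketched in Python, under the assumption that it runs before indexing (in your own indexer, or reimplemented in an Update Request Processor or a Hybris ValueProvider). The flag names set to 1 become the values of a hypothetical multi-valued type field, and the Counter stands in for faceting on that field.

```python
from collections import Counter

TYPE_FLAGS = ("Medicine", "Drug", "Poison")  # hypothetical 0/1 field names


def type_values(product):
    """Collect the flag names set to 1; these would go into a multi-valued 'type' field."""
    return [flag for flag in TYPE_FLAGS if product.get(flag) == 1]


def type_facet(products):
    """Count products per type value, as facet.field=type would."""
    counts = Counter()
    for product in products:
        counts.update(type_values(product))
    return counts


products = [
    {"Medicine": 1, "Drug": 1, "Poison": 0},
    {"Medicine": 0, "Drug": 1, "Poison": 1},
]
print(type_facet(products))
```

A product flagged as both Medicine and Drug is counted under each value, which is how faceting on a multi-valued field behaves.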

This is super easy. Just make an IndexedProperty "Type" and a new custom ValueProvider for it. Then derive its values from the boolean flags; just hard-code them if necessary. No need for anything more complex.

I tried the solutions posted here, but they did not fit my requirements. Instead, I made changes in the facet navigation tag files to bring all the classification attribute facets (Medicine, Drug, Poison) under a single facet (Type).

stats.field and stats.facet doesn't seem to be working right

I'm storing a series of records that contain a section_url field and a count field.
I'm trying to aggregate by section_url and sum the count field, so I'm querying with the following params:
"params":{
"indent":"true",
"stats.field":"count",
"stats":"true",
"q":"section_url:tv AND domain:[* TO *] AND date:\"2014-12-22T00:00:00Z\"",
"stats.facet":"section_url",
"wt":"json"}}
As you can see, I'm running stats on the count field and stats faceting on section_url.
Most of the time this works fine, but for some reason it misbehaves for certain values. For example, in one of my results all the section_url values are http://www.cb10.tv/. However, my stats faceting seems to treat section_url as two separate values, www.cb10 and tv, instead of http://www.cb10.tv/.
Any idea of what could be the problem?

It seems section_url is tokenized, since you get several tokens for each entry. Faceting is performed on the indexed tokens, which means that you end up with a count for each token in the indexed content, not for the content of the field as a whole.
Add a StrField (or a TextField with a KeywordTokenizer) and do a copyField to populate it (or change the existing field), and reindex your content. Use that field for generating the facet counts instead.
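The difference between faceting on a tokenized field and on a StrField or keyword field can be illustrated in Python. The regex split below is only a crude stand-in for an analyzing tokenizer, not a reproduction of Solr's, and the URLs are sample data. The same effect applies to stats.facet, since it also groups by indexed terms.

```python
import re
from collections import Counter


def crude_tokens(value):
    """Very rough stand-in for an analyzing tokenizer: split on non-word characters."""
    return [t for t in re.split(r"\W+", value.lower()) if t]


def facet_counts(values, analyze):
    """Count each distinct indexed term once per document."""
    counts = Counter()
    for value in values:
        counts.update(set(analyze(value)))
    return counts


urls = ["http://www.cb10.tv/", "http://www.cb10.tv/", "http://example.com/tv"]

tokenized = facet_counts(urls, crude_tokens)  # like a tokenized text field
keyword = facet_counts(urls, lambda v: [v])   # like a StrField: the whole value is one term

print(tokenized)
print(keyword)
```

In the tokenized counts, "tv" is counted for all three URLs because it is a token of each one. The keyword counts keep each URL intact.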

Avoiding duplicate records in Solr server

In my Solr admin, I am querying like this:
http:// :8080/solr/realestatecategory/select?q=%3A&fl=propcategory&wt=json&indent=true
It returns records like these:
{
"responseHeader":{
"status":0,
"QTime":4},
"response":{"numFound":4,"start":0,"docs":[
{
"propcategory":"Residential Property"},
{
"propcategory":"Residential Property"},
{
"propcategory":"Commercial Property"},
{
"propcategory":"Invest"}]
}}
I want to avoid duplicate records like "Residential Property". How can I do that?
Please answer quickly, as I badly need help.

You would need to use grouping.
Use group=true along with group.field=propcategory.
For details, see the Result Grouping section of the Solr documentation.
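Here is a small Python sketch of the grouping approach, assuming propcategory is a single-valued string field. It builds the request parameters and reads one value per group from the grouped response. The sample response is hypothetical but follows Solr's grouped JSON layout (grouped → field → groups → groupValue/doclist).

```python
def group_params(field="propcategory"):
    """Select params that return one group per distinct value of field."""
    return {"q": "*:*", "fl": field, "group": "true", "group.field": field, "wt": "json"}


def distinct_group_values(response, field="propcategory"):
    """Read one value per group from a grouped Solr JSON response."""
    groups = response["grouped"][field]["groups"]
    return [g["groupValue"] for g in groups]


# Hypothetical grouped response for the four documents in the question.
sample = {"grouped": {"propcategory": {"matches": 4, "groups": [
    {"groupValue": "Residential Property",
     "doclist": {"numFound": 2, "start": 0, "docs": [{"propcategory": "Residential Property"}]}},
    {"groupValue": "Commercial Property",
     "doclist": {"numFound": 1, "start": 0, "docs": [{"propcategory": "Commercial Property"}]}},
    {"groupValue": "Invest",
     "doclist": {"numFound": 1, "start": 0, "docs": [{"propcategory": "Invest"}]}},
]}}}

print(distinct_group_values(sample))
```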

If you just want a list of the unique values, you can use Solr faceting as well.
It will give you a list of the values along with their counts.
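Here is the faceting route as a Python sketch. rows=0 skips the documents themselves, and the parser pairs up Solr's flat JSON facet_fields list, which alternates value, count, value, count. The sample response and helper names are hypothetical.

```python
def facet_params(field="propcategory"):
    """rows=0: only the facet counts are needed, not the documents."""
    return {"q": "*:*", "rows": "0", "facet": "true", "facet.field": field,
            "facet.mincount": "1", "wt": "json"}


def facet_pairs(response, field="propcategory"):
    """Turn the alternating [value, count, value, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


# Hypothetical response for the four documents in the question.
sample = {"facet_counts": {"facet_fields": {"propcategory": [
    "Residential Property", 2, "Commercial Property", 1, "Invest", 1]}}}

for value, count in facet_pairs(sample):
    print(f"{value}: {count}")
```

As with grouping, this only gives clean values if propcategory is a string field rather than a tokenized text field.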
