solr faceted categories with count - solr

I am completely new to Solr and have to take over development of our new search engine because my colleague is no longer here.
My problem:
I want to get faceted (hierarchical) categories with item counts.
Example:
Search for 'Galaxy'
Items found: 123
Shown Categories:
Electronics (83)
Mobiles (60)
Tablets (23)
Smartphones (37)
.....
Books (40)
....
....
My category field (in Solr) for each article contains several category trees, separated by commas.
e.g.:
"categories_raw": "Electronics/Mobiles/Tablets,Books/MobilePhones"
A query sent to Solr with the following parameters returns facet_fields with item counts, but only counts for each item's own subcategory:
q=samsung&q.alt=samsung&...&facet=true&facet.field=categories_raw&facet.mincount=1
results (at the end of resulting JSON):
"facet_counts": {
"facet_queries": {},
"facet_fields": {
"categories_raw": [
"Electronics/Mobiles/Smartphones",
37,
"Books/MobilePhones",
20,
....
How can I get a count for each category, as in my example at the top?
Is it possible to have Solr break down the hierarchical category string in the field "categories_raw"? Did I miss something?
Hope someone could help ;) thx

Solr counts what you give it. Try using multivalued fields and putting the values you want to count in there. So:
Electronics/Mobiles/Tablets
Electronics/Mobiles
Electronics
Then, the facets will give you counts for each of those. To create those, you can either do them on client side or look at writing a custom Update Request Processor to do it when the entry is created (before indexing).
Solr also has some support for extracting hierarchical information from a structured path.
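The client-side variant of this can be sketched as follows. This is only a minimal illustration, assuming the comma and slash separators from the question; the function name `expand_category_paths` and the multivalued target field `categories_facet` are hypothetical and would need to match your own schema.

```python
def expand_category_paths(raw, tree_sep=",", level_sep="/"):
    """Return every ancestor prefix of every category tree in `raw`, without duplicates."""
    values = []
    seen = set()
    for tree in raw.split(tree_sep):
        # Split one tree such as "Electronics/Mobiles/Tablets" into its levels.
        parts = [p.strip() for p in tree.split(level_sep) if p.strip()]
        for depth in range(1, len(parts) + 1):
            prefix = level_sep.join(parts[:depth])
            if prefix not in seen:
                seen.add(prefix)
                values.append(prefix)
    return values


# Hypothetical document as it might look just before being sent to Solr.
doc = {"id": "1", "categories_raw": "Electronics/Mobiles/Tablets,Books/MobilePhones"}
doc["categories_facet"] = expand_category_paths(doc["categories_raw"])
print(doc["categories_facet"])
```

Faceting on the expanded field (facet.field=categories_facet in this sketch) would then count a document once for each level of each of its category trees.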

Related

How do I facet on two fields at the same time to get combination facets and their count?

Please help me fix the facet problem below. Thanks in advance. Let me know if any part of my explanation is unclear.
How do I facet on two or more fields (say 'project' and 'type', as discussed below) at the same time to get combination facets and their counts?
When you open URL, http://search-lucene.com/?q=facets you can see the facets on right hand side as 'Project','type','date','author' and their corresponding values with count in brackets.
For instance, let’s say you select 'solr(3366)' under 'Project' facet, still I can see other values under 'Project' facet like ElasticSearch etc. along with their respective count.
Project:
solr(3366) -- selected
ElasticSearch (1650)
Lucene (1255)
Lucene.Net (43)
Nutch (20)
PyLucene (17)
Mahout (16)
ManifoldCF (8)
Tika (4)
OpenRelevance (3)
Lucy (2)
type:
mail # user (2791)
issue (303)
mail # dev (134)
source code (82)
javadoc (37)
wiki (36)
web site (2)
Further, when I select 'mail # user (2791)' under the "type" section, I can still see the other values under "type" with their counts, and the counts of the values in the "Project" facet change accordingly.
project:
Solr (2784) -- selected
ElasticSearch (1056)
Lucene (237)
Lucene.Net (24)
Nutch (14)
Mahout (10)
ManifoldCF (4)
Lucy (2)
OpenRelevance (1)
type
mail # user (2791) -- selected
issue (303)
mail # dev (134)
source code (82)
javadoc (37)
wiki (36)
web site (2)
Observe how solr(3366) changed to Solr (2784) after 'mail # user' was selected, along with the counts of the other values of 'Project' (ElasticSearch etc.) and 'type' (issue, javadoc etc.).
I want to achieve the same behaviour. Can you please tell me whether the query below is heading in the right direction, and if it needs to be modified, what to change and how? An explanation of why would be a huge help.
localhost:8080/solr/collection1/select?q=facets&fq=Project%3A(%22solr%22)&fq=type%3A(%22mailhashuser%22)&facet=true&facet.mincount=1&facet.field=project&facet.field=type&wt=json&indent=true&defType=edismax&json.nl=map
If the above query is not heading in the right direction, please help me construct the right one. Thanks in advance.
Kind Regards,
Vamshi
Use Tagging and excluding Filters:
http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters
Example:
I have 2 Facets color and shape.
Color: Red Green yellow
Shape: Circle Square
So if I select "Red" then green and yellow should be present in facet list and all shapes as well.
The following query worked for me.
/select?q={!tag=dt1}color:red&fq={!tag=dt2}shape:*&facet=true&facet.field={!ex=dt1}color&facet.field={!ex=dt2}shape
Using the above solution I was able to resolve exactly the issue you describe in your question.
This related question may also help you resolve your issue:
Unable to use Tagging and excluding Filters
Let me know if it resolves your issue.
Regards,
Jayesh Bhoyar
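Applied to the 'project' and 'type' facets from the question, the tag-and-exclude idea could be assembled roughly like this. It is a sketch under assumptions: the helper `multi_select_params` is hypothetical, the tag names simply reuse the field names, the values are assumed to be indexed verbatim in string fields, and no escaping of quotes inside values is attempted.

```python
from urllib.parse import urlencode


def multi_select_params(query, selections, facet_fields):
    """Build Solr parameters where each selected field is filtered with a tagged fq
    and that tag is excluded again when the same field is faceted."""
    params = [("q", query), ("facet", "true"), ("facet.mincount", "1"), ("wt", "json")]
    for field, value in selections.items():
        # Tag the filter so it can be ignored for this field's own facet counts.
        params.append(("fq", '{!tag=%s}%s:"%s"' % (field, field, value)))
    for field in facet_fields:
        if field in selections:
            params.append(("facet.field", "{!ex=%s}%s" % (field, field)))
        else:
            params.append(("facet.field", field))
    return params


params = multi_select_params(
    "facets",
    {"project": "Solr", "type": "mail # user"},
    ["project", "type"],
)
print("/solr/collection1/select?" + urlencode(params))
```

With this shape, the 'project' counts are computed with the 'type' filter applied but the 'project' filter ignored, which is why the unselected project values stay visible with updated counts.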

Solr - How do I get the number of documents for each field containing the search term within that field in Solr?

Imagine an index like the following:
id partno name description
1 1000.001 Apple iPod iPod by Apple
2 1000.123 Apple iPhone The iPhone
When the user searches for "Apple" both documents would be returned. Now I'd like to give the user the possibility to narrow down the results by limiting the search to one or more fields that have documents containing the term "Apple" within those fields.
So, ideally, the user would see something like this in the filter section of the ui after his first query:
Filter by field
name (2)
description (1)
When the user applies the filter for field "description", only documents which contain the term "Apple" within the field "description" would be returned. So the result set of that second request would be the iPod document only. For that I'd use a query like ?q=Apple&qf=description (I'm using the Extended DisMax Query Parser)
How can I accomplish that with Solr?
I already experimented with faceting, grouping and highlighting components, but did not really come to a decent solution to this.
[Update]
Just to make that clear again: The main problem here is to get the information needed for displaying the "Filter by field" section. This includes the names of the fields and the hits per field. Sending a second request with one of those filters applied already works.
Solr simply doesn't do this out of the box. If you absolutely need it, I'd try the multiple-requests solution and benchmark it. Solr tends to be a lot faster than what people put in front of it, so a couple of extra requests might not be that big of a deal.
you could achieve this with two different search requests/queries:
name:apple -> 2 hits
description:apple -> 1 hit
EDIT:
You could also implement your own SearchComponent that executes multiple queries in the background and put it in the SearchHandler processing chain, so that you only need a single query from the frontend.
if you want the term to be searched over the same fields every time, you have 2 options not breaking the "single query" requirement:
1) copyField: at index time you group all the fields that should match together. With just one copyField your problem doesn't exist; if you need more than one, you're back in the same spot.
2) you could filter the query each time dynamically adding the "fq" parameter at the end
http://<your_url_and_stuff>/?q=Apple&fq=name:Apple ...
this works if you'll always be searching on the same two fields (or you can set them up before querying); otherwise you'll always need at least a second query
Since i said "you have 2 options" but you actually have 3 (and i rushed my answer), here's the third:
3) the dismax plugin described by them like this:
The DisMaxQParserPlugin is designed to process simple user entered phrases
(without heavy syntax) and search for the individual words across several fields
using different weighting (boosts) based on the significance of each field.
so, if you can use it, you may want to give it a look and start from the qf parameters (that is what the option number 2 wanted to be about, but i changed it in favor of fq... don't ask me why...)
SolrFaceting should solve your problem.
Have a look at the Examples.
This can be achieved with Solr faceting, but it's not neat. For example, I can issue this query:
/select?q=*:*&rows=0&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json
to find the number of documents containing donkey in the title and text fields. I may get this response:
{
"responseHeader":{"status":0,"QTime":1,"params":{"facet":"true","facet.query":["title:donkey","text:donkey"],"q":"*:*","wt":"json","rows":"0"}},
"response":{"numFound":3365840,"start":0,"docs":[]},
"facet_counts":{
"facet_queries":{
"title:donkey":127,
"text:donkey":4108
},
"facet_fields":{},
"facet_dates":{},
"facet_ranges":{}
}
}
Since you also want the documents back for the field-disjunctive query, something like the following works:
/select?q=donkey&defType=edismax&qf=text+title&rows=10&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json
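Turning the facet_queries part of such a response into the "Filter by field" list could look roughly like this. It is a sketch that assumes every facet.query was sent in the form field:term, as in the requests above; the helper name `hits_per_field` is hypothetical.

```python
import json


def hits_per_field(solr_response, term):
    """Map field name -> hit count, read from facet.query keys of the form 'field:term'."""
    counts = {}
    suffix = ":" + term
    for key, count in solr_response["facet_counts"]["facet_queries"].items():
        # Skip keys that do not belong to this term and fields without any hit.
        if key.endswith(suffix) and count > 0:
            counts[key[: -len(suffix)]] = count
    return counts


# Trimmed-down version of the response shown above.
response = json.loads(
    '{"facet_counts": {"facet_queries": {"title:donkey": 127, "text:donkey": 4108}}}'
)
for field, count in hits_per_field(response, "donkey").items():
    print(f"{field} ({count})")
```

Fields with a count of zero are dropped so that the filter section only offers fields that would actually return documents.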

How to return column that matched the query in Solr..?

I am using Apache Solr to search my database.
Suppose I have indexed 4 columns from one of my tables. I want only those columns that contain my query term to be returned in the response. Is that possible?
For example:
I have a table cars with columns: name, displayName, description, extra.
Now I make a query, something like:
localhost:8983/solr/select?q=maruti&wt=json
In some rows, only name may contain the word "maruti".
So, in return, I want only name (along with some other fixed fields like ID).
Similarly, if description contains this word, then only description should be returned, and not the other columns.
How can I achieve this?
You may be able to do this with Solr 4 and a custom transformer - my reading of the documentation would seem to indicate as much. But it would be quite a bit of work, I think. Ultimately you may have to write a front-end filter, but that would be difficult with complex queries.
Update:
Here's how to do this in Solr without custom transformers, etc. Enable highlighting for all four columns:
hl=on&hl.fl=name,displayName,description,extra
Solr will return a "highlighting" structure containing the key and the field(s) that match the query. You will also get highlighted snippets; whether you use them is up to you. See here for additional params: http://wiki.apache.org/solr/HighlightingParameters
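Reading the matching columns out of that structure on the client could be done roughly as below. This is a sketch, not a definitive implementation: the sample response is invented for illustration (only its "highlighting" shape, document key to field to list of snippets, follows Solr's JSON output), and `matched_fields` is a hypothetical helper name.

```python
def matched_fields(solr_response):
    """Map document key -> sorted list of fields that produced at least one highlight."""
    highlighting = solr_response.get("highlighting", {})
    return {
        doc_key: sorted(field for field, snippets in fields.items() if snippets)
        for doc_key, fields in highlighting.items()
    }


# Illustrative response fragment; real snippets depend on your data and hl.* settings.
response = {
    "highlighting": {
        "1": {"name": ["<em>Maruti</em> Swift"]},
        "2": {"description": ["A hatchback by <em>Maruti</em>"], "extra": []},
    }
}
print(matched_fields(response))
```

The resulting field lists can then be used to pick which stored values of each document to display alongside the fixed fields such as the ID.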

Solr complicated faceting

I have a problem with faceting. Imagine this situation: a product can be in more than one category. This is the usual faceting behaviour:
Category
Android (25)
iPhone (55)
other (25)
Now when I select "Android", I make a new query with "fq" => "category:Android" and get:
Category
Android
iPhone (15)
other (2)
But this means that there are 15 products that are in categories "Android" AND "iPhone". I would like something like this instead: ("Android" OR "iPhone")
Category
Android
iPhone (+5)
other (+1)
Meaning I will get 25 results by selecting "Android (25)" and another 5 by selecting "iPhone (+5)", so finally I will get 30 search results..
Does anyone know if this is possible with SOLR's faceting? Or perhaps with more than one query and calculate it manually?
Thanks for advice!
Try a new query with the negative of the selections, like "fq" => "-category:Android" - you should then get the facet counts you are looking for.
Depending on all the permutations you need, you probably want to look into query facets, which let you get counts for arbitrary queries. For instance, you can do facet.query=category:("Android" OR "iPhone") and get a count keyed on category:("Android" OR "iPhone"). You can do this for any number of queries you want counts for. So, in your case, you can probably get to a final solution with some combination of straight field facets and query facets.
Edit: Re-reading your question, you may also want to look into tagging and excluding parts of an extra fq, depending on how you are allowing your users to "select into" the choices. (The example in the docs is fairly close to your original setup, although I'm not sure the end behavior is exactly what you want.)
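The "(+5)" style labels from the question can be derived from such query-facet counts with simple arithmetic: the additional hits for a category are the count of the OR query minus the count of the current selection. A minimal sketch, using the numbers from the question and assuming one facet.query per unselected category has already been run; the helper name `additional_hits` is hypothetical.

```python
def additional_hits(selected_count, union_counts):
    """For each candidate category, return how many extra documents
    selecting it would add on top of the current selection."""
    return {cat: union - selected_count for cat, union in union_counts.items()}


# 25 documents match category:Android (the current selection).
selected_count = 25
# Assumed counts of facet.query=category:("Android" OR "<candidate>") per candidate.
union_counts = {"iPhone": 30, "other": 26}

for category, extra in additional_hits(selected_count, union_counts).items():
    print(f"{category} (+{extra})")
```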

SOLR: Is it possible to index multiple timestamp:value pairs per document?

Is it possible in solr to index key-value pairs for a single document, like:
Document ID: 100
2011-05-01,20
2011-08-23,200
2011-08-30,1000
Document ID: 200
2011-04-23,10
2011-04-24,100
and then querying for documents with a specific value aggregation in a specific time range, i.e. "give me documents with sum(value) > 0 between 2011-08-01 and 2011-09-01" would return the document with id 100 in the example data above.
Here is a post from the Solr User Mailing List where a couple of approaches for dealing with fields as key/value pairs are discussed.
1) encode the "id" and the "label" in the field value; facet on it;
require clients to know how to decode. This works really well for simple
things where the id=>label mappings don't ever change, and are
easy to encode (ie "01234:Chris Hostetter"). This is a horrible approach
when id=>label mappings do change with any frequency.
2) have a separate type of "metadata" document, one per "thing" that you
are faceting on containing fields for id and the label (and probably a
doc_type field so you can tell it apart from your main docs) then once
you've done your main query and gotten the results back faceted on id,
you can query for those ids to get the corresponding labels. this works
really well if the labels ever change (just reindex the corresponding
metadata document) and has the added bonus that you can store additional
metadata in each of those docs, and in many use cases for presenting an
initial "browse" interface, you can sometimes get away with a cheap
search for all metadata docs (or all metadata docs meeting a certain
criteria) instead of an expensive facet query across all of your main
documents.
