How to do a grouping (or range query) in Solr facet - solr

All:
[UPDATE] Thanks to Andrea's answer, I am currently using this approach:
If you also send facet.mincount=1, then the start bound is just an
indication, because Solr will return only those values (i.e. ranges)
that contain at least 1 document. So, for instance, you could
specify a very low value as the start.
Note: This has only been shown to work for my case, so use your judgement.
&facet=true&facet.range=createDate&facet.range.gap=%2B2DAY&facet.range.end=NOW&facet.range.start=NOW-1000DAY&facet.mincount=1
I am pretty new to Solr faceting. Suppose that I have some documents that have a createDate field:
"2015-03-23T17:59:00Z",
"2015-03-23T22:13:00Z",
"2015-03-17T20:48:00Z",
"2015-03-19T17:43:00Z",
"2015-03-19T21:58:00Z",
"2015-03-16T19:13:00Z",
"2015-03-16T22:26:00Z",
"2015-03-13T21:33:00Z",
"2015-03-13T21:39:00Z",
"2015-03-13T23:27:00Z",
"2015-03-16T16:46:00Z",
"2015-03-18T17:44:00Z",
"2015-03-18T18:10:00Z",
"2015-03-18T18:11:00Z"
.......
My questions are:
[1] How do I get the results ranged by DAY?
[2] How do I get facet results grouped by WEEK (or, let us say, 7 days)?
Thanks

This is called range faceting. I suggest you read the Solr wiki, but in the meantime:
facet.range=(field name)
facet.range.start=(start date or value)
facet.range.end=(end date or value)
facet.range.gap=(interval value or expression)
In your case, you should use +1DAY or +7DAYS as facet.range.gap.
There are other parameters as well, so it is worth having a look at the wiki.
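As a rough illustration of the parameters listed above, here is a minimal Python sketch that assembles a range-facet query string for daily and weekly buckets. The helper name `range_facet_params` and the particular start and end expressions are assumptions chosen for the example, not something taken from the thread; only the parameter names and the createDate field come from the question and answer.

```python
from urllib.parse import urlencode

def range_facet_params(field, start, end, gap):
    """Return a URL-encoded query string with Solr range-facet parameters."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),               # only the facet counts are of interest
        ("facet", "true"),
        ("facet.range", field),
        ("facet.range.start", start),
        ("facet.range.end", end),
        ("facet.range.gap", gap),    # urlencode turns "+" into "%2B"
        ("facet.mincount", "1"),     # skip empty buckets
    ]
    return urlencode(params)

# Hypothetical start/end values; adjust them to the data at hand.
daily = range_facet_params("createDate", "NOW/DAY-30DAYS", "NOW/DAY+1DAY", "+1DAY")
weekly = range_facet_params("createDate", "NOW/DAY-70DAYS", "NOW/DAY+1DAY", "+7DAYS")
print(daily)
print(weekly)
```

The resulting string can be appended to the select URL of the core being queried. Encoding the leading "+" of the gap matters, because an unencoded "+" in a URL is read as a space.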

Related

Solr facet on subset of documents

I have Solr documents that can have 3 possible states (state_s in {new, updated, lost}). These documents have a field named ip_s. These documents also have a field nlink_i that can be equal to 0.
What I want to know is how many new ip_s values I have, where I consider a new IP to be one that belongs to a document whose state_s="new" and that does not appear in any document with state_s = "updated" OR state_s = "lost".
Using Solr facet search I found a solution using the following query parameters:
q=state_s:"lost"+OR+state_s:"updated"
facet=true&facet.field=ip_s&facet.limit=-1
Basically, all ip in
"facet_fields":{
"ip_s":[
"105.25.12.114",1,
"105.25.15.114",1,
"114.28.65.76",0,
...]
with 0 occurrences (e.g. 114.28.65.76) are "new IPs".
Q1: Is there a better way to do this search? With the facet query described above, I still need to read the list of ip_s values and count all IPs with occurrence = 0.
Q2: If I want to do the same search (i.e. get the new IPs) but consider only documents where nlink_i>0, how can I do it? If I add a filter, fq=nlink_i:[1 TO *], all IPs appearing in documents with nlink_i=0 will also have their number of occurrences set to 0, so I cannot apply the solution described above to get the new IPs.
Q1: To avoid the 0 count facets, you can use facet.mincount=1.
Q2: I think the solution above should also answer Q2?
As an alternative to facets, you can use Solr's grouping functionality. The aggregation of values for your Q1 does not get much nicer, but at least Q2 works as well. It would look something like:
select?q=*:*&group=true&group.field=ip_s&group.sort=state_s asc&group.limit=1
For your programmatic aggregation logic to work, you would have to change the state_s value for new entries to something that sorts first in ascending order. Then you would count all groups whose first entry is a document in the "new" state. The same logic still works if you add an fq parameter to address Q2.
I found another solution using facet.pivot that works for Q1 and Q2:
http://localhost:8983/solr/collection1/query?q=nbLink_i:[1%20TO%20*]&updated&facet=true&facet.pivot=ip_s,state_s&facet.limit=-1&rows=0
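To make the client-side step of the facet.pivot approach concrete, here is a minimal Python sketch that picks out the IPs whose only state is "new". It assumes the pivot entries have already been read from the JSON response (under facet_counts, facet_pivot, "ip_s,state_s") and that the query did not filter on state_s. The function name `find_new_ips` and the sample counts are made up for illustration.

```python
def find_new_ips(pivot_entries):
    """Return the ip_s values whose only state_s value is "new"."""
    new_ips = []
    for entry in pivot_entries:
        # Each ip_s entry carries a nested "pivot" list, one item per state_s value.
        states = {child["value"] for child in entry.get("pivot", [])}
        if states == {"new"}:
            new_ips.append(entry["value"])
    return new_ips

# Made-up sample in the shape of a facet.pivot=ip_s,state_s result.
sample = [
    {"field": "ip_s", "value": "105.25.12.114", "count": 2,
     "pivot": [{"field": "state_s", "value": "new", "count": 1},
               {"field": "state_s", "value": "updated", "count": 1}]},
    {"field": "ip_s", "value": "114.28.65.76", "count": 1,
     "pivot": [{"field": "state_s", "value": "new", "count": 1}]},
]

print(find_new_ips(sample))
```

Because any fq (such as a filter on the link count) is applied before the pivot is computed, the same post-processing should apply to Q2 unchanged.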

How do I create a Solr query that returns results even if one field in my query has no matches?

Suppose I want to create a recommendation system to suggest people you should connect with based off of certain attributes that I know about you and attributes I have about other people that are stored in a Solr index. Is it possible to query the index with a list of attributes (along with boosts for each attribute) and have Solr return scored results even if some of my fields return no matches? The way that I understand that Solr works is that if one of your fields doesn't contain a match in any documents found in your index, you get zero results for the entire query (even if other fields in the query matched) - is that right? What I would hope is that I could query the index and get a list of results back in order of a score given based on how many (and which) fields matched to something, even if some fields have no matches, for example:
Say that there are 2 people documents stored in the index as follows (figuratively):
Person 1:
Industry: Manufacturing
City: Oakland
Person 2:
Industry: Manufacturing
City: San Jose
And say that I perform a pseudo-Solr query that basically says "Search for everyone whose industry is equal to manufacturing and whose city is equal to Oakland". What I would like is to receive both results back in the result set, even though one of the "Persons" does not reside in Oakland. I just want that person to come back as a result with a lower score than Person1. Is this possible? What might a solr query look like to handle this? Assume that I have many more than 2 attributes for each person (so saying that I can use "And" and "Or" in my solr query isn't really feasible.. or is it?) Thanks in advance for your helpful input! (PS I'm using Solr 3.6)
You mention using the AND operator, which is likely your problem.
The default behavior of Lucene, and Solr, query syntax is exactly what you are asking for. A query like:
industry:manufacturing city:oakland
will match either, with a scoring preference for documents that match both. See the Lucene query syntax documentation.
You can also use the bq (boost query) parameter, available with the DisMax and eDisMax query parsers, which does not affect matching and only affects the scores.
http://localhost:8983/solr/persons/select?q=industry:manufacturing&bq=City:Oakland^2
Play with the boosting factor at the end to get the right balance between the matching score and the boosting score.
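For the case with many attributes, the optional-clause query can be generated rather than written by hand. The Python sketch below is one possible way to do that. The helper name `optional_clause_query` and the boost values are assumptions, and values containing double quotes would need escaping that is not handled here.

```python
from urllib.parse import urlencode

def optional_clause_query(attributes, boosts=None):
    """Join field:"value" clauses with spaces so every clause is optional."""
    boosts = boosts or {}
    clauses = []
    for field, value in attributes.items():
        clause = f'{field}:"{value}"'
        if field in boosts:
            clause += f"^{boosts[field]}"   # per-clause boost
        clauses.append(clause)
    return " ".join(clauses)

q = optional_clause_query(
    {"industry": "Manufacturing", "city": "Oakland"},
    boosts={"city": 2},                      # hypothetical boost factor
)
# q.op=OR makes the default operator explicit; fl adds the score to the output.
params = urlencode({"q": q, "q.op": "OR", "fl": "*,score"})
print(q)
print(params)
```

With this query, both people from the example would be expected to match, with the Oakland document scoring higher because it satisfies both clauses.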

SolrNet query to get records between two fields' values

Hi guys, I have started working on a project that needs a Solr implementation for searching.
I am using the SolrNet library, and my question is:
I have two fields in the Solr index, Maxsal and Minsal, and I have a Currentsal parameter that contains a salary amount. What I want is to get all records that satisfy this condition:
currentsal< Maxsal && currentsal> Minsal
Take a look at Solr range queries. They should let you create a query like this:
minsal:[* TO PARAM] AND maxsal:[PARAM TO *]
For more information, look here: http://www.solrtutorial.com/solr-query-syntax.html
I never noticed that Query() takes a string parameter too.
So, using the range syntax from above:
Solr.Query("Minsal:[* TO " + parameter + "] AND Maxsal:[" + parameter + " TO *]")
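The query string itself does not depend on the client library. As a language-neutral illustration, here is a small Python sketch that builds the range query for a given salary. The function name `salary_range_query` is hypothetical, and the field names are taken from the question.

```python
def salary_range_query(current_salary, min_field="Minsal", max_field="Maxsal"):
    """Build a Solr query matching documents whose salary band contains the value."""
    # Square brackets make both bounds inclusive (Minsal <= value <= Maxsal).
    return (f"{min_field}:[* TO {current_salary}] "
            f"AND {max_field}:[{current_salary} TO *]")

print(salary_range_query(50000))
```

Note that square brackets are inclusive, so this matches Minsal <= value <= Maxsal rather than the strict inequalities in the question.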

Solr - How do I sort by geospatial distance and return the distance?

A bbox search on location alone returns accurate data, but if we add more search parameters, the returned distance score is wrong.
For example:
case 1:
http://devtsg.truckertools.com/solr-4.4.0/collection1/select?wt=json&rows=1&fl=*,score&sort=score asc&q={!bbox score=distance sfield=geo pt=33.3232,-83.383 d=150}
- It returns the correct distance for the store: "score":0.02656421
case 2:
But if I add another condition alongside the bbox, it returns the wrong distance score:
http://devtsg.truckertools.com/solr-4.4.0/collection1/select?wt=json&rows=1&fl=*,score&sort=score asc&q=({!bbox score=distance sfield=geo pt=33.3232,-83.383 d=150} AND *:*)
- This returns "score":0.7258905, which is wrong. It should be the same as in case 1.
case 3:
Just to make sure, I added a condition on the id of the store:
http://devtsg.truckertools.com/solr-4.4.0/collection1/select?wt=json&rows=1&fl=*,score&sort=score asc&q=({!bbox score=distance sfield=geo pt=33.3232,-83.383 d=150} AND id:9220)
- This one also returns the wrong distance: "score":9.05333
I don't understand what is going wrong here.
Thanks in advance.
Put each 'AND'ed part of your query into Solr filter queries ('fq' param), and leave 'q' for keyword search relevancy. In your field list ('fl' param) you can put a function query to return the distance: fl=*,dist:geodist(). Other params like 'pt' and 'sfield' are required. To sort, use sort=geodist() asc.
However, you can't use the geodist() function query with a spatial "RPT" field in versions of Solr prior to v4.5. I see you are using 4.4. If you need to sort on an RPT field (only needed if you have multiple locations) in Solr 4.2 through 4.4, then you have to approach this differently, and your attempt is close. I suggest always using the 'q' and 'fq' params as they are normally meant to be used (keywords and filters, respectively). Consider this echoParams output of my query to Solr:
"indent":"true",
"wt":"json",
"sort":"query({!bbox v='' filter=false score=distance}) asc",
"fl":"*,score,dist:query({!bbox v='' filter=false score=distance})",
"sfield":"geo",
"pt":"33.3232,-83.383",
"d":"150",
"q":"*:*",
"fq":"{!bbox}",
"fq":"id:9220"
Yeah, it's ugly. Again, as of Solr 4.5 you no longer have to resort to this.
By the way, the behavior you see is actually not a bug. You need to compose your query differently to get the results you want.
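For the simpler approach described above (geodist() in fl and sort, filters in fq), a request could be assembled roughly as follows. This Python sketch only builds the parameter list. The helper name `geodist_params` is an assumption, and the field name geo, the point and the radius are taken from the question.

```python
from urllib.parse import urlencode

def geodist_params(lat, lon, radius_km, extra_filters=()):
    """Build Solr params that filter by bbox and sort by geodist()."""
    params = [
        ("q", "*:*"),                 # keep q for keyword relevancy
        ("fq", "{!bbox}"),            # spatial filter, reads sfield/pt/d below
        ("sfield", "geo"),
        ("pt", f"{lat},{lon}"),
        ("d", str(radius_km)),
        ("fl", "*,dist:geodist()"),   # return the distance as a pseudo-field
        ("sort", "geodist() asc"),
        ("wt", "json"),
    ]
    # Every additional restriction goes into its own fq, not into q.
    params += [("fq", f) for f in extra_filters]
    return params

params = geodist_params(33.3232, -83.383, 150, ["id:9220"])
print(urlencode(params))
```

Keeping the extra conditions in fq means they do not contribute to the score, which is why the distance no longer gets mixed with other scoring factors.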

Solr 1.4 Date Facet Include

My goal is to get a document count for each month over the past year. Here is the faceted query I am using against Solr 1.4:
q=*:*
rows=0
facet=on
facet.date=myDateField
facet.date.start=NOW-11MONTH/MONTH
facet.date.end=NOW+1MONTH/MONTH
facet.date.gap=+1MONTH
The ranges this query produces are 2013-01-01T00:00:00Z to 2013-02-01T00:00:00Z, which is inclusive for the upper bound, meaning T00:00:00Z on the first of every month is being counted in 2 different ranges.
Solr 3.1 introduces the facet.date.include parameter that would solve my problem, except that upgrading right now is not an option. Is there a workaround for achieving the same functionality? I tried facet.date.gap=+1MONTH-1SECOND, which is close but not close enough. It produces something like this, where the end date is not correct:
2012-09-01T00:00:00Z
2012-09-30T23:59:59Z
2012-10-30T23:59:58Z
2012-11-30T23:59:57Z
2012-12-30T23:59:56Z
2013-01-30T23:59:55Z
2013-02-28T23:59:54Z
2013-03-28T23:59:53Z
2013-04-28T23:59:52Z
What you are asking can be done with facet queries instead of facet ranges.
Try something like this:
facet.query=myDateField:[NOW-11MONTH/MONTH TO NOW-10MONTH/MONTH]
facet.query=myDateField:[NOW-10MONTH/MONTH TO NOW-9MONTH/MONTH]
facet.query=myDateField:[NOW-9MONTH/MONTH TO NOW-8MONTH/MONTH] ...
and so on.
Now you have full control over every single facet, so you can do -1DAY in the last facet if you need to.
Have a look at the reference for date math syntax:
http://lucene.apache.org/solr/4_4_0/solr-core/org/apache/solr/util/DateMathParser.html
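Writing twelve facet.query parameters by hand is error-prone, so they can be generated. The Python sketch below is one way to do it, under the assumption that subtracting one millisecond from each upper bound (the `-1MILLI` default) is an acceptable way to keep the first instant of a month out of the previous bucket. That adjustment and the helper names are illustrative and not part of the original answer.

```python
def month_start(offset):
    """Date-math expression for the start of the month `offset` months from now."""
    if offset == 0:
        return "NOW/MONTH"
    return f"NOW{offset:+d}MONTH/MONTH"

def monthly_facet_queries(field, months=12, upper_adjust="-1MILLI"):
    """Return one facet.query string per month, oldest first."""
    return [
        f"{field}:[{month_start(o)} TO {month_start(o + 1)}{upper_adjust}]"
        for o in range(-(months - 1), 1)
    ]

queries = monthly_facet_queries("myDateField")
for fq in queries:
    print("facet.query=" + fq)
```

When these strings are placed in a URL, the "+" in the last bucket needs to be URL-encoded. Passing upper_adjust="" reproduces the plain inclusive ranges from the answer.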
