Solr 4 - query with both group.facet and group.func fails - solr

When doing a search like
http://localhost:8983/solr/select?group=true&group.func=product(fieldname1,fieldname2)&group.facet=true&facet=true&facet.field=fieldname3
an error is returned in response where facets are normally returned:
java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.solr.request.SimpleFacets.getGroupedCounts(SimpleFacets.java:358) ...
Any function triggers the error, not only product. The error does not occur if group.facet is omitted or if group.field is used instead of group.func, so it seems that the grouped-facet calculation expects the group.field parameter to be defined.
The question: is there another way to combine group.func with group.facet, an appropriate workaround, or a pointer to where in the Solr source to look into this?
This question is cross-posted from Solr Jira issue 3742. The issue relates to the Solr 4 beta, which was released very recently.

Related

Solr query to include documents with no indexed value for specific field

I am in the process of updating our system from Solr 4.1.0 to Solr 8.1.4. (Yes, I understand that is not the latest version available, but that is what has been approved for our system).
We regularly submit queries to find documents that "overlap" a time range. Let's say we have indexed fields "starttime_date" and "endtime_date". In case it matters, these fields were indexed as type TrieDateField in Solr 4.1.0, and in Solr 8.1.4 the fields are of type DatePointField.
Part of these "find overlapping documents" queries is to include any document that doesn't have an endtime_date value yet. So, the query would look like this:
(starttime_date:[* TO 2021-02-19T17:00:00.000Z] AND (endtime_date:[2021-02-19T15:00:00.000Z] OR (*:* NOT endtime_date:*)))
This should find all documents that started before 02/19/2021 at 17:00Z, and either haven't ended, or ended before 02/19/2021 at 15:00Z. I have it wrapped in parens here because this group of clauses is almost always "AND"ed with other clauses. Those other clauses are not what I am concerned about for this question.
This solution was built based on this answer to a similar question: https://stackoverflow.com/a/28859224/3586783
This solution worked in Solr 4.1.0, but doesn't appear to work in Solr 8.1.4. As soon as I add the OR (*:* NOT endtime_date:*) clause, it seems to match all documents. I have tried using -endtime_date:*, -endtime_date:[* TO *], !endtime_date:*, !endtime_date:[* TO *], and none of these have worked.
Is this something related to the change in field type (TrieDateField to DatePointField)? Our query syntax has not changed, but it appears that Solr is processing the query differently now.
Please let me know if more information is needed to understand the issue.

Syntax for Solr Query with Cassandra Datastax integration

I'm trying to use the DataStax Cassandra/Solr integration to run a facet query with both pivot facets and interval facets.
My query looks like this:
select * from data where solr_query='{"facet":{"limit":5,"pivot":"event_type,key","interval":"past_visits","f.past_visits.facet.interval.set":["{!key=visit_13_month}[NOW-13MONTH/MONTH,NOW]","{!key=visit_1_month}[NOW-1MONTH/DAY,NOW]"]},"q":"*:*"}'
The error that I get back suggests that the required parameter is not being set (but it is):
08:30:38.244 [New I/O worker #4] WARN c.d.driver.core.RequestHandler - /10.239.133.151:9042 replied with server error (Missing required parameter: f.past_visits.facet.interval.set (or default: facet.interval.set)), trying next host.
When I run an equivalent query directly against Solr (using query params), it works as expected:
/data/select?q=*:*&facet=true&facet.pivot=event_type,key&facet.limit=5&facet.interval=past_visits&f.past_visits.facet.interval.set=%7B!key=visit_13_month%7D[NOW-13MONTH/MONTH,NOW]&f.past_visits.facet.interval.set=%7B!key=visit_1_month%7D[NOW-1MONTH/DAY,NOW]
I'm trying to follow the Datastax documentation at this link:
http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/srch/srchJSON.html
There seems to be something wrong with the way that I am creating the JSON for the Datastax Solr query, but I do not see what I should change.
We don't currently support facet intervals via CQL. Anyway, you don't need to repeat "facet", so it should eventually be something like the following:
select * from data where solr_query='{"facet":{"limit":5,"pivot":"event_type,key","interval":"past_visits","f.past_visits.interval.set":["{!key=visit_13_month}[NOW-13MONTH/MONTH,NOW]","{!key=visit_1_month}[NOW-1MONTH/DAY,NOW]"]},"q":"*:*"}'
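One way to avoid hand-typed JSON slips (a missing colon, a stray bracket) is to build the solr_query value programmatically. The Python sketch below is only an illustration: the helper name build_solr_query is hypothetical, the table and field names come from the question, and because facet intervals are not supported via CQL the interval keys are included purely to show the structure.

```python
import json

def build_solr_query(q, facet):
    """Serialize a JSON search query for use in a CQL solr_query clause.

    Hypothetical helper: it only produces the JSON text, it does not
    talk to Cassandra or Solr.
    """
    # Compact separators keep the JSON free of extra whitespace.
    return json.dumps({"facet": facet, "q": q}, separators=(",", ":"))

facet = {
    "limit": 5,
    "pivot": "event_type,key",
    "interval": "past_visits",
    "f.past_visits.interval.set": [
        "{!key=visit_13_month}[NOW-13MONTH/MONTH,NOW]",
        "{!key=visit_1_month}[NOW-1MONTH/DAY,NOW]",
    ],
}

solr_query = build_solr_query("*:*", facet)

# A single quote inside a CQL string literal is escaped by doubling it;
# none occur in this example, but the replace keeps the sketch safe.
cql = "select * from data where solr_query='" + solr_query.replace("'", "''") + "'"
print(cql)
```

Parsing the generated string back with json.loads is a cheap check that the value is well-formed before it is sent to the server.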

Displaying information about SolR search result

I am struggling with a little problem where I have to display relevant information about the resultset returned from SolR but can't figure out how to calculate it without iterating the results (bad).
Basically I am storing my documents with a state field and while the search is supposed to return all documents, the UI has to show "Found 16 entities, 5 are in state A, 3 in state B and 8 in C".
At the moment I am using a rather brittle approach of running the query 3 times with additional scoping by type, but I'd rather get that information from the one query I am displaying. (There have been some edge cases where the numbers don't add up and since SolR can return facets I guess there has to be a way to use that functionality in this case)
I am using SolR 3.5 from Rails with the sunspot gem.
As you mention yourself, you can use facets for this by setting
facet=true&facet.field=state
I'm not familiar with the sunspot gem, but judging by its documentation you can request facets like this (assuming Entity is your searchable class):
search = Entity.search do
  facet :state
end
This should return the states of all entities returned by your query with the number of entities in this state. The Sunspot documentation tells me you can read these facets in the following way:
search.facet(:state).rows.each do |facet|
  puts "State #{facet.value} has #{facet.count} entities"
end
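For anyone reading the raw response rather than going through Sunspot, the facet=true&facet.field=state request returns the counts alongside the documents. The Python sketch below assumes the JSON response writer with its default flat layout, where each facet field is a list alternating value and count. The sample dictionary is hand-written for illustration and is not real Solr output.

```python
def facet_counts(response, field):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))

def summarize(response, field):
    """Build the UI summary line from one response, without iterating docs."""
    counts = facet_counts(response, field)
    total = response["response"]["numFound"]
    parts = ", ".join(f"{n} in state {v}" for v, n in counts.items())
    return f"Found {total} entities: {parts}"

# Hand-written sample shaped like a Solr JSON response (illustrative values).
sample = {
    "response": {"numFound": 16, "docs": []},
    "facet_counts": {"facet_fields": {"state": ["A", 5, "B", 3, "C", 8]}},
}

print(summarize(sample, "state"))
```

Because the facet counts are computed over the whole result set of the same query, they stay consistent with numFound even when only one page of documents is returned.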
Essentially there are three main sets of functions you can use to gather stats from Solr.
The first is faceting:
http://wiki.apache.org/solr/SimpleFacetParameters
There is also grouping (field collapsing):
https://wiki.apache.org/solr/FieldCollapsing
And the stats package:
https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
Note that the stats, facet and group components may eventually be replaced by the analytics package (known as OLAP), which is targeted for Solr 5.0.0:
https://issues.apache.org/jira/browse/SOLR-5302
Good luck.

Solr edismax. How to retrieve the fields that gave the query result

I'm querying multiple fields using the qf param, but I'm wondering how I can find out which field produced each match.
Example(not a real example):
q={!edismax qf='tag content'}("tablet")AND("pc")
Field values:
doc1:
tag: tablet
content: The test has failed. Use a pc instead.
doc2:
tag: tablet pc
content: The test has worked.
As a result both documents will be returned, because both have tablet and pc somewhere in their tag/content fields. Is it possible to know that doc2 had both hits in tag, while doc1 had one hit in tag and one in content? debugQuery doesn't seem to provide information about this.
I know I can increase the importance of a search field using the qf boost.
You can either:
1. use highlighting
2. ask for debug info with debugQuery=true and parse the scoring info to find out.
I think 1 is easier, but it imposes some constraints on your fields (they must be stored, for example).
OK, based on your response to my question:
Unfortunately, Solr currently has no way to return which fields matched the query as part of the default result docs. If the query is simple, looping over the returned stored fields is probably your best bet. Highlighting is an option too.
There are a couple of other options suggested here: http://grokbase.com/t/lucene/solr-user/117nkf36nq/determine-which-field-term-was-found
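The highlighting route can be sketched as follows. With hl=true&hl.fl=tag,content added to the request, the response carries a highlighting section keyed by document id and then by field, and a field appears there only if it produced a snippet. The Python below is a minimal illustration on a hand-written sample that mirrors doc1 and doc2 from the question. The helper name matched_fields is hypothetical, and the fields are assumed to be stored.

```python
def matched_fields(response):
    """Map each document id to the fields that produced highlight snippets."""
    return {
        doc_id: sorted(field for field, snippets in fields.items() if snippets)
        for doc_id, fields in response.get("highlighting", {}).items()
    }

# Hand-written sample shaped like Solr's "highlighting" section
# (illustrative, not captured from a real server).
sample = {
    "highlighting": {
        "doc1": {
            "tag": ["<em>tablet</em>"],
            "content": ["The test has failed. Use a <em>pc</em> instead."],
        },
        "doc2": {"tag": ["<em>tablet</em> <em>pc</em>"]},
    }
}

for doc_id, fields in matched_fields(sample).items():
    print(doc_id, "matched in", ", ".join(fields))
```

To know which term hit which field, the snippets themselves would still have to be inspected for the highlighted words.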

Solr get calculated distance while using dismax

I'm starting to think that what I want to do is not possible but thought I would give this a try.
I'm running Solr 3.5.
I currently have two types of search:
1. A basic spatial query which returns the calculated distance between two points in the score field.
Sample query from my Solr logs:
?fl=*,score&sort=score+asc&start=0&q={!func}geodist()&sfield=coordinates&pt=59.2363514,18.092783&version=2
2. A dismax query which allows free-text queries on a number of fields.
Sample query from my Solr logs:
mm=1&d=100.0&sfield=coordinates&qf=field1^5.0+fields2^3.0&defType=edismax&version=2&fl=*,score&start=1&q=monkeyhopper&pt=59.2363514,18.0927830000&fq={!geofilt}
I want to replace my first query with the dismax query, but I really need to get the calculated distance in the response. Yes, I can calculate the distance programmatically, but I would prefer not to, as Solr has already done it for me.
I still want to be able to sort my dismax query "by relevance", distance or any other field so the score given by my boosts could be interesting for sorting but I don't need it to be returned.
If I understood correctly, you want the result of a function in your Solr response. I guess the SOLR-2444 issue is what you're looking for: it allows pseudo-fields, functions etc. to be included in the fl parameter. The only problem is that it has been committed only on trunk, so it isn't available in the current Solr release, nor will it be in the coming 3.6 release. You have to wait for the 4 release, but I don't think that will take long. Maybe you can already start playing around with a snapshot of the last successful Jenkins build.
Pseudo-fields are now available in Solr 4+ which allow you to do just this.
http://localhost:8983/solr/collection1/browse?q=*:*&rows=1000&wt=xml&pt=37.763649,-122.24313&sfield=store&fl=dist:geodist()
For instance, this request allows me to return a field "dist" which contains the distance of each entry to the stated point.
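Combining this with the edismax query from the question could look roughly like the Python sketch below, which only builds the request URL. The base URL and the helper name geodist_query_url are assumptions for illustration. The parameter values are taken from the question, with the distance requested through the dist:geodist() pseudo-field and, optionally, used for sorting.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def geodist_query_url(base_url, text, qf, lat, lon, radius_km, sort_by_distance=False):
    """Build an edismax request that also returns the distance per document."""
    params = {
        "defType": "edismax",
        "q": text,
        "qf": qf,
        "sfield": "coordinates",   # spatial field name from the question
        "pt": f"{lat},{lon}",
        "d": radius_km,
        "fq": "{!geofilt}",        # restrict results to the given radius
        "fl": "*,dist:geodist()",  # pseudo-field holding the distance
    }
    if sort_by_distance:
        params["sort"] = "geodist() asc"
    # urlencode percent-encodes the values, so no manual escaping is needed.
    return base_url + "?" + urlencode(params)

url = geodist_query_url(
    "http://localhost:8983/solr/collection1/select",  # assumed endpoint
    "monkeyhopper", "field1^5.0 fields2^3.0",
    59.2363514, 18.092783, 100.0, sort_by_distance=True,
)
print(url)
# Decode the query string again to confirm what was encoded.
print(parse_qs(urlparse(url).query)["fl"])
```

Leaving sort_by_distance at False keeps the default relevance ordering, so the score from the boosts still drives the sort while dist is returned for display.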
