I am optimizing queries against a Solr backend and have a question.
When I query only for a facet, to count the expected results, I request 0 rows, because I only need the count.
Does it additionally help execution performance to explicitly request an empty field list? Or is this irrelevant to Solr's internal execution when 0 rows are returned anyway?
What does the maxScore stand for?
// query #1: no explicit field list
/select?q=*:*&rows=0&wt=json
// query #2: explicit empty field list
/select?q=*:*&rows=0&fl=[]&wt=json
actual results for #1:
"response": {
"numFound": 1000,
"start": 0,
"maxScore": 1,
"docs": []
}
actual results for #2:
"response": {
"numFound": 1000,
"start": 0,
"docs": []
}
Thank you very much for your answers and explanations.
I am testing Solr 9.0 with this tutorial:
https://solr.apache.org/guide/solr/latest/getting-started/tutorial-techproducts.html
I used this query:
http://localhost:8983/solr/techproducts/select?q=cat:electronics&fl=name
In the results displayed, it only gives a maxScore. How can I display the individual score for each result?
"response": {
"numFound": 12,
"start": 0,
"maxScore": 0.5244061,
"numFoundExact": true,
"docs": [
{
"name": "Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133"
},
{
"name": "Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300"
},
{
"name": "Belkin Mobile Power Cord for iPod w/ Dock"
},
You can read individual document scores as an additional field in the results via the fl (Field List) parameter.
The fl parameter limits the information included in a query response
to a specified list of fields. The fields must be either stored="true"
or docValues="true".
The field list can be specified as a space-separated or
comma-separated list of field names. The string score can be used to
indicate that the score of each document for the particular query
should be returned as a field. The wildcard character * selects all
stored fields in the document.
http://localhost:8983/solr/techproducts/select?q=cat:electronics&fl=name,score
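For anyone calling this from code rather than a browser, here is a minimal Python sketch of the same request. It only builds the URL and reads scores out of a response-shaped dict; the helper names (build_select_url, names_with_scores) are mine, and the sample scores are made up for illustration, not real Solr output.

```python
from urllib.parse import urlencode


def build_select_url(core_url, query, fields):
    """Build a /select URL whose fl parameter lists the requested fields."""
    params = {"q": query, "fl": ",".join(fields), "wt": "json"}
    return f"{core_url}/select?{urlencode(params)}"


def names_with_scores(response_json):
    """Pair each returned document's name with its per-document score."""
    docs = response_json["response"]["docs"]
    return [(doc.get("name"), doc.get("score")) for doc in docs]


# Adding the pseudo-field "score" to fl asks Solr for per-document scores.
url = build_select_url(
    "http://localhost:8983/solr/techproducts", "cat:electronics", ["name", "score"]
)
print(url)

# Hand-written sample in the shape of a Solr JSON response.
# The score values are illustrative, not real output.
sample = {
    "response": {
        "numFound": 2,
        "start": 0,
        "docs": [
            {"name": "Belkin Mobile Power Cord for iPod w/ Dock", "score": 0.5},
            {"name": "Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300", "score": 0.25},
        ],
    }
}
for name, score in names_with_scores(sample):
    print(f"{score:.2f}  {name}")
```

Documents come back without a score key when score is not listed in fl, which is why the helper uses doc.get rather than indexing directly.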
Is there any way, when searching products for one category, to get just the facets alone? In this call I want to avoid the results array. Example:
{
"limit": 20,
"offset": 0,
"count": 20,
"total": 24,
"results": [],
"facets":{
"variants.attributes.productStyle.en-GB": {
}
}
There can be 20 products, but I need only the facet results, to avoid a large amount of data coming to the service. If there is a query for this, it would be great.
Yes. To get no products returned, set the limit parameter to 0 in your API request.
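As a rough sketch in Python, the query string for such a facet-only call could be built like this. The limit parameter and the facet path come from the question; the parameter name "facet" and the helper names are assumptions on my part, so check them against the API you are actually calling.

```python
from urllib.parse import urlencode


def facet_only_params(facet_expression):
    """Request zero results but still ask for the given facet.

    "limit" is the paging parameter from the question; "facet" is an
    assumed parameter name and may differ in the real API.
    """
    return [("limit", 0), ("facet", facet_expression)]


def facets_from(response_json):
    """Return only the facets part of a response shaped like the example."""
    return response_json.get("facets", {})


query_string = urlencode(facet_only_params("variants.attributes.productStyle.en-GB"))
print(query_string)

# Response-shaped sample modelled on the example in the question.
sample = {
    "limit": 0,
    "offset": 0,
    "count": 0,
    "total": 24,
    "results": [],
    "facets": {"variants.attributes.productStyle.en-GB": {}},
}
print(list(facets_from(sample)))
```

With limit set to 0 the results array stays empty while total and facets are still populated, so the payload stays small.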
I am using the Solr Admin UI to build this query:
http://localhost:8983/solr/gencat.imagemetadata/select?q=id:"TH-1961-46483-10968-9"&wt=json&indent=true&facet=true&facet.field=externalid
It returns:
{
"response": {
"numFound": 1,
"start": 0,
"docs": [
{
"id": "TH-1961-46483-10968-9",
"externalid": "100700000_00024"
}
]
},
"facet_counts": {
"facet_queries": {},
"facet_fields": {
"externalid": [
"100700000_00024",
1,
"005471837_00001",
0,
"005471837_00002",
0,
"005471837_00003",
0,
"005471837_00099",
0,
....
]
}
}
}
My assumption was that it would only return facet counts for the one document it found (since I'm specifying the id I want). Instead, it returns a facet_counts structure with every externalid value indexed by Solr (granted, all but one entry is 0; the externalid count for the document matching the query is 1, as it ought to be). But I only want Solr facet counts for the documents in the search results, not everything. It slows down the query significantly.
Yes, I can set facet.mincount = 1 so that it only returns facet values that actually have counts, but under the covers it still appears to look at all of the documents, not just the queried result set. It currently takes 2 minutes to execute the query above on our 2+ billion items.
When I turn tracing on in cqlsh, I can see that it is processing all 2+ billion items. If it only counted over the result set, this query would be much, much faster.
externalid is defined like this in the schema file:
<field docValues="true" indexed="true" multiValued="false" name="externalid" stored="true" type="StrField"/>
What am I misunderstanding?
It is slowing down my query by having to go out and find all of the externalid values just to say they have a count of 0.
Is there a way to tell Solr faceting to only look at the docs found from the query?
I am on Solr 6 under DSE 6.0
You can set the facet method through the facet.method parameter. fc is the default, and this is the behavior you're looking for. Are you sure that DSE is actually using fc as the method by default? (The definition of fc is that it should only iterate over documents matching the query.)
fc
Calculates facet counts by iterating over documents that match the query and summing the terms that appear in each document.
This is currently implemented using an UnInvertedField cache if the field either is multi-valued or is tokenized (according to FieldType.isTokened()). Each document is looked up in the cache to see what terms/values it contains, and a tally is incremented for each value.
This method is excellent for situations where the number of indexed values for the field is high, but the number of values per document is low. For multi-valued fields, a hybrid approach is used that uses term filters from the filterCache for terms that match many documents. The letters fc stand for field cache.
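To rule out a different default, you could pass facet.method=fc explicitly together with facet.mincount=1. Below is a small Python sketch that builds those parameters for the query in the question and collapses Solr's flat [value, count, ...] facet list into a dict. The helper names are mine, and whether DSE honors facet.method is an assumption I have not verified.

```python
from urllib.parse import urlencode


def facet_params(doc_id, facet_field, method="fc", mincount=1):
    """Parameters for a facet request that names the facet method explicitly."""
    return [
        ("q", f'id:"{doc_id}"'),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.method", method),      # assumption: the server honors this
        ("facet.mincount", mincount),  # drop zero-count values server-side
    ]


def nonzero_counts(flat_list):
    """Turn a flat [value, count, value, count, ...] list into a dict,
    keeping only values whose count is greater than zero."""
    pairs = zip(flat_list[0::2], flat_list[1::2])
    return {value: count for value, count in pairs if count > 0}


print(urlencode(facet_params("TH-1961-46483-10968-9", "externalid")))

# Flat list copied from the start of the response in the question.
flat = ["100700000_00024", 1, "005471837_00001", 0, "005471837_00002", 0]
print(nonzero_counts(flat))
```

The client-side filter is only a fallback for trimming the response; it does nothing for server-side execution time, which is what the facet method affects.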
I am pretty new to Solr, so I don't know if what I'd like to achieve is actually feasible or not.
Currently, I am querying my Solr to retrieve the amount of results that match the conditions in several facet queries.
For example:
localhost:8082/solr/dict/select?q=*:*&rows=0&wt=json&indent=true&facet=true&facet.query=dict1:"#tiger#"&facet.query=dict1:"#lion#"
With this kind of query, I am getting the count of Solr docs containing "tiger" and the count of those containing "lion", in field "dict1":
{
"responseHeader": {
"status": 0,
"QTime": 239,
"params": {
"facet.query": [
"dict1:\"#tiger#\"",
"dict1:\"#lion#\""
],
"q": "*:*",
"indent": "true",
"rows": "0",
"wt": "json",
"facet": "true"
}
},
"response": {
"numFound": 37278987,
"start": 0,
"docs": [ ]
},
"facet_counts": {
"facet_queries": {
"dict1:\"#tiger#\"": 6,
"dict1:\"#lion#\"": 10
},
[...]
}
}
The thing is that I now also need to get some results for each facet, in addition to the count (for example, three results for "tiger" and three more for "lion").
I have read some similar questions ("Solr Facetting - Showing First 10 results and Other" and "SOLR - Querying Facets, return N results per Facet"), but none of their answers seems to work for me, maybe because I am doing the facets on all docs (q=*:*).
Any help will be welcome :)
As per the mailing list, what about simply using grouping?
solr/hotels/search?q=*%3A*&wt=json&indent=true&group=true&group.query=query1&group.query=query2&group.limit=3 [1]
Is this OK for you? This returns 2 groups (1 per query), each with its count and at most group.limit documents.
[1] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
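Applied to the two facet queries from the question, a Python sketch of this could look as follows. The helper names and the sample document ids are hypothetical, and the grouped/doclist layout used in the sample is my understanding of what Solr returns for group.query, so compare it with a real response before relying on it.

```python
from urllib.parse import urlencode


def grouping_params(queries, per_group_limit=3):
    """One group.query per condition, with at most per_group_limit docs each."""
    params = [
        ("q", "*:*"),
        ("wt", "json"),
        ("group", "true"),
        ("group.limit", per_group_limit),
    ]
    params += [("group.query", query) for query in queries]
    return params


def summarize_groups(response_json):
    """Map each group query to (match count, returned docs).

    Assumes each entry under "grouped" carries a "doclist" object.
    """
    grouped = response_json.get("grouped", {})
    return {
        query: (group["doclist"]["numFound"], group["doclist"]["docs"])
        for query, group in grouped.items()
    }


queries = ['dict1:"#tiger#"', 'dict1:"#lion#"']
print(urlencode(grouping_params(queries)))

# Hand-written sample; the counts match the question, the ids are made up.
sample = {
    "grouped": {
        'dict1:"#tiger#"': {
            "matches": 37278987,
            "doclist": {"numFound": 6, "start": 0,
                        "docs": [{"id": "t1"}, {"id": "t2"}, {"id": "t3"}]},
        },
        'dict1:"#lion#"': {
            "matches": 37278987,
            "doclist": {"numFound": 10, "start": 0,
                        "docs": [{"id": "l1"}, {"id": "l2"}, {"id": "l3"}]},
        },
    }
}
for query, (count, docs) in summarize_groups(sample).items():
    print(query, count, [doc["id"] for doc in docs])
```

This gives both the per-query count and a few sample documents in a single request, instead of one facet request plus one search per condition.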
Imagine I have the following facets:
Speakers: [Mike Thompson, Thomas Wilkinson, Sally Jeffers]
Venues: [Weill Thomas Medical Center, BB&R Medical Associates, KLR Thompson]
Solr seems to allow a &facet.prefix=Thom where I can get the facets that START with "Thom" and that will return "Speaker: Thomas Wilkinson" but no others.
How can I do the equivalent of &facet.substring=Thom, which would return Mike Thompson and Weill Thomas....
I tried &facet.query=Thom but that doesn't seem to work at all.
Thanks
It is not possible to be sure, as you did not provide your full query string, but it may be that the facet is not returning Weill Thomas in the facet results because you are only specifying facet.field=speakers in your query, and Weill Thomas is actually in the venues field. You would need a second facet.field=venues parameter in your search query to retrieve those.
Facet prefix is only used to filter results once the search is already done, so don't use that parameter for searching purposes. Check this question: SOLR facet search by prefix with results highlighting
Edit based on comment:
You don't necessarily need to filter the results returned by faceting after the fact; just make sure that only the facets you want match the original query. The facets that were not part of the search query will have 0 occurrences if you return all facets. You can then set facet.mincount=1 to only get facets that are found within the search results. Here's an example that I mocked up with test data:
q=*Thom*&rows=0&df=speakers&wt=json&indent=true&facet=true&facet.field=speakers&facet.field=venues&facet.mincount=1&json.nl=map
And the response from Solr:
"responseHeader": {
"status": 0,
"QTime": 3,
"params": {
"q": "*Thom*",
"df": "speakers",
"facet.field": [
"speakers",
"venues"
],
"json.nl": "map",
"indent": "true",
"facet.mincount": "1",
"rows": "0",
"wt": "json",
"facet": "true",
"_": "1431772681445"
}
},
"response": {
"numFound": 2,
"start": 0,
"docs": []
},
"facet_counts": {
"facet_queries": {},
"facet_fields": {
"speakers": {
"Mark Thomas": 1,
"Thomas Moore": 1
},
"venues": {
"Weill Thomas": 1
}
},
"facet_dates": {},
"facet_ranges": {},
"facet_intervals": {},
"facet_heatmaps": {}
}
Just wanted to point out a caveat of the proposed solution (which is basically to run the facet substring query as the main Solr query, so that the facet values are what you want). This won't work correctly for multi-valued fields. For example, if a document had three values for speaker, "Mark Thomas", "Fred Jones", and "John Doe", then the query 'q=*Thom*' would return "Fred Jones" and "John Doe" as facets in addition to "Mark Thomas", which is not the desired result. So for single-valued fields this solution could work, but for multi-valued fields you would probably have to write an intermediary web service that filters out the non-matches (like "Fred Jones" and "John Doe"). Solr should really add a facet.substring parameter that works like facet.prefix, but filters facet values by substring instead of by prefix.
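The filtering step for multi-valued fields could be as small as the Python sketch below. It assumes the facet counts arrive in the json.nl=map shape shown in the response above; the function name is mine, and matching case-insensitively is a choice I made, not something Solr does for you.

```python
def filter_facets_by_substring(facet_fields, needle):
    """Keep only facet values that contain needle (case-insensitive).

    facet_fields is the "facet_fields" object of a response requested
    with json.nl=map, i.e. {field: {value: count, ...}, ...}.
    """
    needle_lower = needle.lower()
    return {
        field: {
            value: count
            for value, count in counts.items()
            if needle_lower in value.lower()
        }
        for field, counts in facet_fields.items()
    }


# Facets as they might come back for a multi-valued speakers field:
# the co-speakers of "Mark Thomas" show up even though they do not match.
facet_fields = {
    "speakers": {"Mark Thomas": 1, "Fred Jones": 1, "John Doe": 1},
    "venues": {"Weill Thomas": 1},
}
print(filter_facets_by_substring(facet_fields, "Thom"))
```

Note that the counts are still those of the main query, so this only removes unwanted values; it does not recompute anything.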