Solr facet on subset of documents - solr

I have Solr documents that can have 3 possible states (state_s in {new, updated, lost}). These documents have a field named ip_s. These documents also have a field nlink_i that can be equal to 0.
What I want to know is: how many new ip_s I have. Where I consider a new ip is an ip that belong to a document whose state_s="new" that does not appear in any document with state_s = "updated" OR state_s = "lost" .
Using Solr facet search I found a solution using the following query parameters:
q=sate_s:"lost"+OR+sate_s:"updated"
facet=true&facet.field=ip_s&facet.limit=-1
Basically, all ip in
"facet_fields":{
"ip_s":[
"105.25.12.114",1,
"105.25.15.114",1,
"114.28.65.76",0,
...]
with 0 occurence (e.g. 114.28.65.76) are "new ips".
Q1: Is there a better way to do this search. Because using the facet query describe above I still need to read the list of ip_s and count all ip with occurence = 0.
Q2: If I want to do the same search, (i.e. get the new ip) but I want to consider only documents where nlink_i>0 how can I do?. If I add a filter : fq=nlink_i:[1 TO *] all ip appearing in documents with link_i=0 will also have their number of occurrence set to 0. So I cannot not apply the solution describe above to get new ip.

Q1: To avoid the 0 count facets, you can use facet.mincount=1.
Q2: I think the solution above should also answer Q2?

Alternatively to facets you can use Solr grouping functionality. The aggregation of values for your Q1 does not get much nicer, but at least Q2 works as well. It would look something like:
select?q=*:*&group=true&group.field=ip_s&group.sort=state_s asc&group.limit=1
In order for your programmatic aggregation logic to work, you would have to change your state_s value for new entries to something that appears first for ascending ordering. Then you would count all groups that contain a document with a "new-state-document" as first entry. The same logic still works if you add a fq parameter to address Q2.

I found another solution using facet.pivot that works for Q1 and Q2:
http://localhost:8983/solr/collection1/query?q=nbLink_i:[1%20TO%20*]&updated&facet=true&facet.pivot=ip_s,state_s&facet.limit=-1&rows=0

Related

Filter in Graphical Query parser Solr

I have a structure where I want to search my documents and filter/rank/set conditions on my parents. Example, a doc is a match because it contains my searched string, but also because its parent contains a certain value.
Using the graph parser and experimenting with the filter is the best way I have noticed doing this. I tried block join child parser first but it wouldn't do it for me.
The problem I am facing now is that I can't seem to get the filter to work in this way:
traversalFilter="(-field:x) OR (field2:y)"
Meaning, if field does not have value x it is ok, if field has value x and field2 has y its also ok. Other cases is filtered away.
But it won't work. Any help is appreciated!
Edit for more information:
I have set up a test core with all my fields stored in a text_general field. Default solrconfig. I have a simple chaining I'm using from parameter as document id. And a to field storing all ids of each documents children. And the graph parser works fine, its just this kind of filter that does not work for me.
I have documents with field with value a or b.
A query like this:
q=*
fq={!graph from=id to=to returnRoot=false traversalFilter="(field:b)" }id:0
This query filters away any document and its children that do not have b as value on field.
q=*
fq={!graph from=id to=to returnRoot=false traversalFilter="(-field:b)" }id:0
Should then work in the opposite. Filter away documents with b as value. But this does not work for some reason.
Edit:
from solrquerysyntax:
https://wiki.apache.org/solr/SolrQuerySyntax
Pure negative queries (all clauses prohibited) are allowed.- inStock:false finds all field values where inStock is not false
Which is why q=* fq=-(field:x) works fine, in returning all documents not containing value x in field.
So why can't I add the same filter in the graph traversal
EDIT3:
I have now started looking on the graph parser and have noticed that when filtering -(-field:x) is the same as +field:x. But +(-field:x) is not the same as -field:x and does not work.

How to tune Solr query with field priority?

I have to make query to Apache Solr, like this:
name:ABCXYZ AND address:Ame*
I find that the the query with just name:ABCXYZ receives a quick response, with only 1 result. However, the response time is much higher for the query above, which includes a second field, address.
How can I tune Solr, or my query, so that it prioritizes the search for each field? In my case, this would mean to search name before address.
"name:ABCXYZ" gives an answer straight away, because the query is very specific and yields one distinct result. Normally you would expect a query like "name:ABCXYZ AND address:Ame*" to find all entries where name is ABCXYZ, then further find entries where address starts with "Ame".
The thing about Solr is that it is possible to configure it in such a way that even though "name:ABCXYZ AND address:Ame*" only yields one result, solr will continue to search for other entries that doesn't match the entire query string, but matches part of string.
This means that perhaps your search is too "kind"?
We use a query parameter named "mm" (Minimum Match) which you can set on your search handler in solrconfig.xml. This parameter specifies the minimum number of clauses that the query must match. So if mm=1 for instance, your query will find entries where name is "ABCXYZ', but also adresses that start with "Ame". Maybe you should look into this mm parameter? It's possible to set mm=100% which should force your search handler to find exact matches I imagine.
Edit: "mm" is for DisMax Query Parsers by the way.
search name before address
name:ABCXYZ^10 OR address:Ame^6
assigns name a boost of 10, and address a boost of 7.
These boost factors make matches in name much more significant than matches in address

Lucene OR query not working

I am trying to query Solr with following requirement:
_ I would like to get all documents which not have a particular field
-exclusivity:[* TO *]
I would like to get all document which have this field and got the specific value
exclusivity:(None)
so when I am trying to query Solr 4 with:
fq=(-exclusivity:[* TO *]) OR exclusivity:(None)
I have only got results if the field exists in document and the value is None but results not contain results from first query !!
I cannot understand why it is not working
To explain your results, the query (-exclusivity:[* TO *]) will always get no results, because you haven't specified any result to retrieve. By default, Lucene doesn't retrieve any results, unless you tell it to get them. exclusivity:(None) isn't a limitation placed on the full result set, it is the key used to find the documents to retrieve. This differs from a database, which by default returns all records in a table, and allows you to limit the set.
(-exclusivity:[* TO *]) only specifies what NOT to get, but doesn't tell it to GET anything at all.
Solr has logic to handle Pure negative queries (I believe, in much the same way as below, by implicitly retrieving all documents first), but from what I gather, only as the top level query, and it does not handle queries like term1 OR -term2 documented here.
I believe with solr you should be able to use the query *:* to get all docs (though that would not be available in raw lucene), so you could use the query:
(*:* -exclusivity:[* TO *]) exclusivity:(None)
which would mean, get (all docs except those with a value in exclusivity) or docs where exclusivity = "None"
I have founded answer to this problem. I have made bad assumption how "-" works in solr.I though that
-exclusivity:[* TO *]
add everything without exclusivity field to the data set but it is not the case. The '-' could only exclude things from data set. BTW femtoRgon you are right but I am using it as fq (filter query) not as a master query I have forgotten to mention that.
So the solution is like
-exclusivity:([* TO *] AND -(None))
and full query looks like
/?q=*:*&fq=-exclusivity:([* TO *] AND -(None))
so that means I will get everything does not have field exclusivity or has this field and it is populated with value None.

Solr - How do I get the number of documents for each field containing the search term within that field in Solr?

Imagine an index like the following:
id partno name description
1 1000.001 Apple iPod iPod by Apple
2 1000.123 Apple iPhone The iPhone
When the user searches for "Apple" both documents would be returned. Now I'd like to give the user the possibility to narrow down the results by limiting the search to one or more fields that have documents containing the term "Apple" within those fields.
So, ideally, the user would see something like this in the filter section of the ui after his first query:
Filter by field
name (2)
description (1)
When the user applies the filter for field "description", only documents which contain the term "Apple" within the field "description" would be returned. So the result set of that second request would be the iPod document only. For that I'd use a query like ?q=Apple&qf=description (I'm using the Extended DisMax Query Parser)
How can I accomplish that with Solr?
I already experimented with faceting, grouping and highlighting components, but did not really come to a decent solution to this.
[Update]
Just to make that clear again: The main problem here is to get the information needed for displaying the "Filter by field" section. This includes the names of the fields and the hits per field. Sending a second request with one of those filters applied already works.
Solr just plain Doesn't Do This. If you absolutely need it, I'd try it the multiple requests solution and benchmark it -- solr tends to be a lot faster than what people put in front of it, so an couple few requests might not be that big of a deal.
you could achieve this with two different search requests/queries:
name:apple -> 2 hits
description:apple -> 1 hit
EDIT:
You also could implement your own SearchComponent that executes multiple queries in the background and put it in the SearchHandler processing chain so you only will need a single query in the frontend.
if you want the term to be searched over the same fields every time, you have 2 options not breaking the "single query" requirement:
1) copyField: you group at index time all the fields that should match togheter. With just one copyfield your problem doesn't exist, if you need more than one, you're at the same spot.
2) you could filter the query each time dynamically adding the "fq" parameter at the end
http://<your_url_and_stuff>/?q=Apple&fq=name:Apple ...
this works if you'll be searching always on the same two fields (or you can setup them before querying) otherwise you'll always need at least a second query
Since i said "you have 2 options" but you actually have 3 (and i rushed my answer), here's the third:
3) the dismax plugin described by them like this:
The DisMaxQParserPlugin is designed to process simple user entered phrases
(without heavy syntax) and search for the individual words across several fields
using different weighting (boosts) based on the significance of each field.
so, if you can use it, you may want to give it a look and start from the qf parameters (that is what the option number 2 wanted to be about, but i changed it in favor of fq... don't ask me why...)
SolrFaceting should solve your problem.
Have a look at the Examples.
This can be achieved with Solr faceting, but it's not neat. For example, I can issue this query:
/select?q=*:*&rows=0&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json
to find the number of documents containing donkey in the title and text fields. I may get this response:
{
"responseHeader":{"status":0,"QTime":1,"params":{"facet":"true","facet.query":["title:donkey","text:donkey"],"q":"*:*","wt":"json","rows":"0"}},
"response":{"numFound":3365840,"start":0,"docs":[]},
"facet_counts":{
"facet_queries":{
"title:donkey":127,
"text:donkey":4108
},
"facet_fields":{},
"facet_dates":{},
"facet_ranges":{}
}
}
Since you also want the documents back for the field-disjunctive query, something like the following works:
/select?q=donkey&defType=edismax&qf=text+titlle&rows=10&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json

solr sort,i want Specify a particular document at the first

solr sort,i want Specify a particular document at the first
for example:
Results :5,2,3,1
I want 2 at the first ,Other sorted in accordance with the rules
2,1,3,5
how to do this ?
I know of two ways you can try to tackle this using Solr.
The first is to use the QueryElevationComponent. This lets you define the top results at index time. As suggested in the documentation, this is good for placing sponsored results or popular documents at the top of the search results. The potential downside is that you have to be able to identify those documents at index time and not at query time.
The other approach is to boost the desired documents at query time using the bq parameter. To boost document 435, you would do something like this:
...&bq=id:435^10
Unfortunately, neither of these approaches give you absolute control over the order of the results.
The solution provided by Riking would certainly do the job if you don't mind processing the results after performing the search. Another approach you could consider is to add a field to your Solr schema that defines a display order or priority. You can then sort on that field to get the desired sort order.
If you are using Solr 3.1 or later, you can sort by a function query. The map function is useful for this.
sort=map(field_name,5,5,0) asc
In the above, field_name is the name of the field you want to sort by, 5 is the value you want to push to the front and 0 must be replaced with some number that you know is less than all other numbers.
Call the builtin sort() function, then shift the desired element to the front.
Pseudocode, in case you do not have a builtin method to shift it to the front:
tmp = desired;
int dIndex = array.indexOf(desired);
for(i=dIndex-1; i >= 0; i--)
{
array[i+1] = array[i]
}
In case you use standart query (not dismax) add "OR id:2^1000" to you query. Like this:
q=(text:lalala AND author:Bob) OR id:2^1000
that will place document with ID=2 at the top of results.

Resources