Solr query, count of different fields - solr

Can somebody give an example of a Solr query that returns the following counts for three fields (google, flickr, yahoo), counting only the documents where the field value is true?
google:20
flickr:10
yahoo:100
from the document like this:
{...., google:"123", flickr:"", yahoo:"8910", ....}
Thanks in advance.
Cs.

You could probably use the Terms Component to do that.
https://wiki.apache.org/solr/TermsComponent
The output is a list of the terms and their document frequency values.
Use qt=/terms to call the terms handler, and specify terms.fl once for each field:
terms.fl=google&terms.fl=flickr&terms.fl=yahoo
That should give you a count of the "true" term for each of the fields.
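As a rough sketch of the request described above, the snippet below builds the URL in Python. The host, port, and the collection name mycollection are placeholders, not taken from the question, and it assumes the terms handler is registered at the /terms path rather than being selected with qt=/terms.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port, and collection to your setup.
BASE_URL = "http://localhost:8983/solr/mycollection/terms"


def build_terms_url(fields, base_url=BASE_URL):
    """Return a Terms Component URL with one terms.fl parameter per field."""
    params = [("terms.fl", field) for field in fields]
    params.append(("wt", "json"))
    return base_url + "?" + urlencode(params)


url = build_terms_url(["google", "flickr", "yahoo"])
print(url)
```

Passing the parameters as a list of pairs keeps the repeated terms.fl key, which a plain dict would collapse into a single entry.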

Related

Sort or filter results by function query defined in a field

I have a Solr 6.2 instance running, and I'm exploring its advantages and limitations. One limitation I've run into seems to be that you can't sort or filter the data based off of a field function query.
.../solr/collection/select?q=*:*&fl=*,total:sum(v1,v2)&fq=total:[10 TO *]
Solr responds with an error stating that the total field does not exist. Indeed, the field is not defined in my schema because it's not a stored part of the dataset - it's calculated at query time. They call it a pseudo field. I haven't been able to find an example in the documentation or a solution online. So, is there a way around this?
.../solr/collection/select?q=*:*&fl=*,total:sum(v1,v2)&fq={!frange l=10} sum(v1,v2)
I have the very same problem as you.
I want to filter on the quotient of two fields.
I also tried to use [0.3 TO *] like you.
You can also set an upper bound for your range if you need one.
http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.6.pdf
"l" is for lower bound.
"u" is for upper bound.
fq={!frange l=0 u=2.2} sum(user_ranking,editor_ranking)
Maybe this works for you?
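A minimal sketch of how such a filter string could be assembled in Python. The helper name frange_filter is made up for illustration; only the {!frange l=... u=...} syntax comes from the answers above.

```python
from urllib.parse import urlencode


def frange_filter(function, lower=None, upper=None):
    """Build an {!frange} filter string for a function query.

    lower maps to the "l" local param, upper to "u"; either may be omitted.
    """
    parts = ["!frange"]
    if lower is not None:
        parts.append(f"l={lower}")
    if upper is not None:
        parts.append(f"u={upper}")
    return "{" + " ".join(parts) + "}" + function


fq = frange_filter("sum(user_ranking,editor_ranking)", lower=0, upper=2.2)
print(fq)

# URL-encode the filter together with the main query before sending it.
query_string = urlencode({"q": "*:*", "fq": fq})
print(query_string)
```

Encoding the parameters matters here because the braces, the exclamation mark, and the spaces in the local params are not safe to send raw in a URL.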
You can do this. Instead of total, try sum.
You can find more info here: https://wiki.apache.org/solr/FunctionQuery#What_is_a_Function.3F
An example from the Solr wiki:
Example Function Queries
To give you a better understanding of how function queries can be used in Solr, suppose an index stores the dimensions in meters x,y,z of some hypothetical boxes with arbitrary names stored in field boxname. Suppose we want to search for box matching name findbox but ranked according to volumes of boxes. The query parameters would be:
q=boxname:findbox _val_:"product(x,y,z)"
This query will rank the results based on volumes. In order to get the computed volume, you will need to request the score, which will contain the resultant volume:
&fl=*, score
Suppose that you also have a field storing the weight of the box as weight. To sort by the density of the box and return the value of the density in score, you would submit the following query:
http://localhost:8983/solr/collection_name/select?q=boxname:findbox _val_:"div(weight,product(x,y,z))"&fl=boxname x y z weight score
You can read more about it here: https://cwiki.apache.org/confluence/display/solr/Function+Queries
Try this
solr/collection/select?q=*:* _val_:"sum(v1,v2)"&fl=* score&fq={!frange l=10 }sum(v1,v2)

SOLR custom similarity for locations

I'd like to store in SOLR some items with addresses (City, State, ...) and I'd like to change how similarity is computed. The thing is that when comparing, for example, city, I'm only interested in whether the values are the same, not in whether the strings are similar. Is there a way to do that? Is it through a custom similarity?
If so, can somebody please point me to how it can be done in Solr 6.2?
Thank you very much.
If you're only interested if something matches exactly, use a StrField (a StrField is case sensitive, so the case has to match as well). As you're only getting exact matches, the scoring will be the same for all documents.
The only time you need to implement a custom similarity class is if you want to score documents in a different way than what the built in similarities (or function queries) allow.
Matching exactly would be a regular query: city:Frankfurt. As long as the field is a StrField, only documents with exactly Frankfurt in that field will be returned (and unless you've added an index time boost for one of them, they'll all score identically).
Also, if you're sorting by a field (such as city), any score calculation will be thrown out.

Solr facet on subset of documents

I have Solr documents that can have 3 possible states (state_s in {new, updated, lost}). These documents have a field named ip_s. These documents also have a field nlink_i that can be equal to 0.
What I want to know is how many new ip_s values I have. I consider an ip to be new if it belongs to a document whose state_s="new" and does not appear in any document with state_s = "updated" OR state_s = "lost".
Using Solr facet search I found a solution using the following query parameters:
q=state_s:"lost"+OR+state_s:"updated"
facet=true&facet.field=ip_s&facet.limit=-1
Basically, all ip in
"facet_fields":{
"ip_s":[
"105.25.12.114",1,
"105.25.15.114",1,
"114.28.65.76",0,
...]
with 0 occurrences (e.g. 114.28.65.76) are "new ips".
Q1: Is there a better way to do this search? With the facet query described above I still need to read the list of ip_s and count all ips with occurrence = 0.
Q2: If I want to do the same search (i.e. get the new ips) but consider only documents where nlink_i>0, how can I do that? If I add a filter fq=nlink_i:[1 TO *], all ips appearing in documents with nlink_i=0 will also have their number of occurrences set to 0, so I cannot apply the solution described above to get the new ips.
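The client-side counting step described in the question can be sketched as below. It assumes the default flat JSON layout of facet_fields shown above (value, count, value, count, ...); the function name is made up, and the sample list contains only the three entries quoted in the question.

```python
def zero_count_values(flat_facets):
    """Return the facet values whose count is 0.

    flat_facets is Solr's flat list: [value1, count1, value2, count2, ...].
    """
    values = flat_facets[0::2]   # every even position is a facet value
    counts = flat_facets[1::2]   # every odd position is its count
    return [value for value, count in zip(values, counts) if count == 0]


ip_facets = ["105.25.12.114", 1, "105.25.15.114", 1, "114.28.65.76", 0]
new_ips = zero_count_values(ip_facets)
print(len(new_ips), new_ips)
```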
Q1: To avoid the 0 count facets, you can use facet.mincount=1.
Q2: I think the solution above should also answer Q2?
Alternatively to facets you can use Solr grouping functionality. The aggregation of values for your Q1 does not get much nicer, but at least Q2 works as well. It would look something like:
select?q=*:*&group=true&group.field=ip_s&group.sort=state_s asc&group.limit=1
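In order for your programmatic aggregation logic to work, you would have to change your state_s value for new entries to something that sorts after the other states in ascending order (with the current values "lost" < "new" < "updated", so "new" is not last). Then you would count all groups whose first entry is a "new-state-document": because that state sorts last, a group can only start with it when the group contains no lost or updated document. The same logic still works if you add a fq parameter to address Q2.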
I found another solution using facet.pivot that works for Q1 and Q2:
http://localhost:8983/solr/collection1/query?q=nbLink_i:[1%20TO%20*]&updated&facet=true&facet.pivot=ip_s,state_s&facet.limit=-1&rows=0
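A rough sketch of how the pivot output could be post-processed on the client. It assumes the usual JSON shape of facet_pivot entries (a value, a count, and a nested pivot list for the second field); the function name and the sample entries are made up for illustration.

```python
def ips_only_in_state(pivot_entries, state="new"):
    """Return ip values whose nested state_s pivot contains only `state`."""
    result = []
    for entry in pivot_entries:
        # Collect the states that actually occur for this ip.
        states = {
            child["value"]
            for child in entry.get("pivot", [])
            if child["count"] > 0
        }
        if states == {state}:
            result.append(entry["value"])
    return result


# Made-up sample of response["facet_counts"]["facet_pivot"]["ip_s,state_s"].
sample_pivot = [
    {"field": "ip_s", "value": "105.25.12.114", "count": 2, "pivot": [
        {"field": "state_s", "value": "new", "count": 1},
        {"field": "state_s", "value": "lost", "count": 1},
    ]},
    {"field": "ip_s", "value": "114.28.65.76", "count": 1, "pivot": [
        {"field": "state_s", "value": "new", "count": 1},
    ]},
]

new_only = ips_only_in_state(sample_pivot)
print(len(new_only), new_only)
```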

Solr terms component complete field match

I am new to Solr.
I am working with the terms component to get the top terms from a field.
For example:
I have the field "Firm", which contains many firm names ending in "gmbh" and "ag".
But I need the terms for this field separated by its full content.
For Example: Mustermann gmbh, max gmbh, etc .....
I've tried many different fieldtypes in the schema.xml but nothing worked.
Thank you in advance.
Best regards,
Lorenzo :-)
You can use Facets in your request to get the "Top X of field Y"
E.g.
q=*:*&facet=true&facet.field=Firm&facet.limit=50&facet.mincount=1
When you use facet.limit you get the top X results.
Your field Firm in the schema.xml should not use a Tokenizer, because otherwise you would get "mustermann" and "gmbh" instead of "mustermann gmbh" (I think "string" is the standard field type without a Tokenizer).
Don't forget to reindex if you have to change field values.

Can I use Solr term component with filtering on non-term fields

http://localhost:8080/search/terms?terms.prefix=ab&terms.fl=text&terms.sort=count
I have the above terms query which works as I expect. Returns all the terms from the "text" field that have a certain prefix, sorted by count.
I want to return only the terms where another field "language" is "en" can I add such a filter to a terms query?
Unfortunately, you can't filter while accessing the indexed terms of a field through the TermsComponent. That's one of the limitations you face when you build auto-suggestions, for example. If that is your use case, one approach that does support filtering is based on a facet with the prefix parameter, as explained here.
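A minimal sketch of the facet-based alternative mentioned above, written in Python. The helper name and its defaults are hypothetical; the field names text and language and the prefix ab come from the question, and the parameters used (fq, facet.field, facet.prefix, facet.limit, facet.mincount) are standard faceting parameters.

```python
from urllib.parse import urlencode


def build_suggest_query(prefix, field="text", language="en", limit=10):
    """Build a query string that facets on `field`, restricted by a filter.

    The fq restricts the documents, facet.prefix restricts the returned terms.
    """
    params = [
        ("q", "*:*"),
        ("rows", 0),                      # only the facet counts are needed
        ("fq", f"language:{language}"),   # the filter TermsComponent cannot apply
        ("facet", "true"),
        ("facet.field", field),
        ("facet.prefix", prefix),
        ("facet.limit", limit),
        ("facet.mincount", 1),            # skip terms with no matching document
    ]
    return urlencode(params)


query_string = build_suggest_query("ab")
print(query_string)
```

The resulting string would be appended to the select handler URL of your collection; facet results are sorted by count by default, which matches terms.sort=count in the original request.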
