Solr terms component: complete field match

I am new to Solr.
I am working with the Terms Component to get the top terms from a field.
For example:
I have the field "Firm", which contains many kinds of firms whose names end in "gmbh" or "ag".
But I need the terms of this field to be its full content, not the individual words.
For example: Mustermann gmbh, max gmbh, etc.
I've tried many different field types in schema.xml, but nothing worked.
Thank you in advance.
Best regards,
Lorenzo :-)

You can use facets in your request to get the "top X values of field Y".
E.g.
q=*:*&facet=true&facet.field=Firm&facet.limit=50&facet.mincount=1
facet.limit caps how many values are returned. Because facet values are sorted by count by default, you get the top X values.
Your field Firm in schema.xml should not use a tokenizer, because otherwise you would get "mustermann" and "gmbh" instead of "mustermann gmbh". The standard "string" field type is not tokenized.
Don't forget to reindex after changing the field definition.
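A minimal sketch of the facet request above, using only the Python standard library. The core name "firms" and the sample response are made up for illustration, `rows=0` is an extra assumption (it skips returning documents), and the parser assumes Solr's default JSON layout for facet fields (`json.nl=flat`), a flat list of alternating values and counts.

```python
from urllib.parse import urlencode

def build_facet_params(field, limit=50, mincount=1):
    """Query parameters asking for the top `limit` values of `field` via faceting."""
    return {
        "q": "*:*",                  # match all documents
        "rows": 0,                   # only the facet counts are needed, not the documents
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,        # top N values (sorted by count by default)
        "facet.mincount": mincount,  # skip values that occur in no document
        "wt": "json",
    }

def top_values(response, field):
    """Turn the flat [value, count, value, count, ...] list into (value, count) pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))

# Hypothetical core name "firms"; adjust host and core to your setup.
url = "http://localhost:8983/solr/firms/select?" + urlencode(build_facet_params("Firm"))

# Made-up sample response, for illustration only.
sample = {"facet_counts": {"facet_fields": {"Firm": ["mustermann gmbh", 12, "max gmbh", 7]}}}
print(top_values(sample, "Firm"))  # [('mustermann gmbh', 12), ('max gmbh', 7)]
```

On a "string" field each facet value is the complete firm name, which is exactly what the question asks for.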

Related

SOLR: Search for a value in multiple fields

I am looking for a way to query for values in multiple fields. Basically, I am building a simple search engine where the user can type e.g. "Java How to XML JSON", and it will search for these values in three different fields: categories, tags and description.
I read on a blog that I should query all documents with q=*:* and then filter on those fields, for example fq=categories:java,xml,how,to,json description:java,xml,how,to,json tags:java,xml,how,to,json
This works :| But it seems wrong to just copy-paste the values like this.
Is there a correct way of doing this? I have been researching this for some time, but I haven't found a solution.
Any help is appreciated,
Thank you
You can use defType=edismax to get the extended DisMax query parser. It is meant to handle user-typed queries (i.e. what you'd type into a search box). You can then use qf (query fields) to tell edismax which fields you want to search, with an optional weight for each field:
q=Java How to XML JSON&defType=edismax&qf=categories^5 tags description
This searches for each term of "Java How to XML JSON" in all three fields, and hits in the categories field are weighted five times higher than hits in the other two fields.
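In a real request the spaces and the `^` boost marker must be URL-encoded. A small sketch with Python's `urlencode`; the core name "mycore" is a placeholder, and `wt=json` is an added assumption about the desired response format.

```python
from urllib.parse import urlencode

params = {
    "q": "Java How to XML JSON",              # raw user input, no field prefixes
    "defType": "edismax",                     # use the extended DisMax query parser
    "qf": "categories^5 tags description",    # fields to search; categories boosted 5x
    "wt": "json",
}

# urlencode turns spaces into "+" and "^" into "%5E", as required in a URL.
query_string = urlencode(params)
url = "http://localhost:8983/solr/mycore/select?" + query_string  # "mycore" is a placeholder
print(url)
```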

Solr query, count of different fields

Can somebody give an example of a Solr query that returns the following counts for three fields (google, flickr, yahoo), counting the documents where the field value is true?
google:20
flickr:10
yahoo:100
from documents like this:
{...., google:"123", flickr:"", yahoo:"8910", ....}
Thanks in advance.
Cs.
You could probably use the Terms Component to do that.
https://wiki.apache.org/solr/TermsComponent
The output is a list of the terms and their document frequency values.
Use qt=/terms to call the terms handler, and then specify terms.fl once for each field:
terms.fl=google&terms.fl=flickr&terms.fl=yahoo
That should give you the count of the "true" term for each of the fields.
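A sketch of building that request and reading the counts back. It assumes a `/terms` handler is registered on the core (as in Solr's example configs), uses the placeholder core "mycore", and parses a made-up response in the default flat JSON layout (term, count, term, count, ...). `terms=true` is included explicitly to switch the component on.

```python
from urllib.parse import urlencode

fields = ["google", "flickr", "yahoo"]

# A list of tuples lets terms.fl repeat once per field.
params = [("terms", "true")] + [("terms.fl", f) for f in fields] + [("wt", "json")]
url = "http://localhost:8983/solr/mycore/terms?" + urlencode(params)

def term_count(response, field, term="true"):
    """Document frequency of `term` in `field`, or 0 if the term is absent."""
    flat = response["terms"].get(field, [])
    counts = dict(zip(flat[0::2], flat[1::2]))
    return counts.get(term, 0)

# Made-up sample response, for illustration only.
sample = {"terms": {"google": ["true", 20, "false", 5],
                    "flickr": ["true", 10],
                    "yahoo": ["true", 100]}}
print({f: term_count(sample, f) for f in fields})  # {'google': 20, 'flickr': 10, 'yahoo': 100}
```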

Why does Dismax not work in simple query?

All:
I am pretty new to Solr. I uploaded some documents that contain "season" in the content field (stored but not indexed, copied to the text field) and in the title field (stored and indexed, copied to the text field).
When I use basic query without dismax like:
http://localhost:8983/solr/collection1/select?q=season&rows=5&wt=json&indent=true
It works very well and returns correct results. But when I want to boost the documents that have "season" in the content rather than in the title, I use dismax like this (I guess the way I use it is totally wrong, because the content field is not indexed, but I at least expect some results, even incorrect ones):
http://localhost:8983/solr/collection1/select?q=season&rows=5&wt=json&indent=true&defType=dismax&qf=content%5E100+title%5E1
No matching results are returned. Could anyone help me with this, or show me how to use dismax correctly?
Thanks
In your second query, you use the "content" field as a query field (with the highest boost), but earlier you wrote that this field is stored but not indexed. If a field is not indexed, you cannot search against it.
I faced the same problem and tracked it down to the schema definition: for dismax to work, the field type should be a text type and not string,
e.g. text_general, text_en_splitting or text_en.
It's because of the tokenizers used by these field types.
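One way to apply both answers (make "content" indexed and give it a tokenized type) is Solr's Schema API `replace-field` command. This sketch only builds the request; it assumes a managed (mutable) schema and reuses the core "collection1" from the question. With a classic schema.xml you would edit the field definition in the file instead. Either way, reindex afterwards.

```python
import json
import urllib.request

def replace_field_payload(name, field_type="text_general"):
    """Schema API command redefining `name` as an indexed, stored text field."""
    return {
        "replace-field": {
            "name": name,
            "type": field_type,  # a tokenized text type instead of "string"
            "indexed": True,     # a field must be indexed to be searchable via qf
            "stored": True,
        }
    }

def build_schema_request(core_url, payload):
    """Prepare (but do not send) a POST to the core's /schema endpoint."""
    return urllib.request.Request(
        core_url.rstrip("/") + "/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_schema_request("http://localhost:8983/solr/collection1",
                           replace_field_payload("content"))
# urllib.request.urlopen(req) would send it; reindex the documents afterwards.
```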

What are the advantages of the multiValued option in Solr?

What are the advantages of the multiValued field option in Solr?
I have a field with comma-separated keywords.
I can do two things:
make a non-multiValued text field
make a multiValued text field which contains each keyword
I can query in both cases. So what's the advantage of multiValued over non-multiValued?
Advantages of multiValued: you don't need to change the document design. If a document contains multiple values in one field, Solr/Lucene can handle that field as it is.
Another advantage: multiple values can describe a document more precisely (think of the tags of a blog post).
Advantages of non-multiValued: you can use certain features that require a single term (word) in a field, like spell checking. It's also a benefit for clustering (Carrot2) or grouping, which generally work better on non-multiValued fields.
Querying a multiValued field gives you exact keyword matches.
Example: doc1 has the keyword 'abc', and doc2 has the keyword 'abcd'. If you query by the keyword 'abc', only doc1 should match.
With the non-multiValued approach, both documents would match, because you would have to use LIKE-style (wildcard) syntax.
multiValued fields can be very handy. Let's say you have many fields and you want to search several of them, but not all. You can create a multiValued field that includes all the fields you want to search (e.g. populated via copyField) and search in that field.
For example, let's say some fields hold string values and others hold numbers, and you want to search all the string values in a document. You can create a multiValued field for all the string values and search in it.
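A sketch of the multiValued approach for the comma-separated keywords in the question: split each string into a JSON list before sending it to Solr's `/update` handler, which accepts a JSON array of documents. The field name "keywords" is hypothetical and must be declared with `multiValued="true"` in the schema. The last two lines are only a plain-Python analogy of the exact-term versus LIKE-style matching described above, not Solr code.

```python
import json

def to_solr_doc(doc_id, keywords_csv):
    """Split a comma-separated keyword string into a list for a multiValued field."""
    keywords = [k.strip() for k in keywords_csv.split(",") if k.strip()]
    return {"id": doc_id, "keywords": keywords}  # "keywords" is a hypothetical field

docs = [to_solr_doc("doc1", "abc, xyz"), to_solr_doc("doc2", "abcd")]

# JSON body for the /update handler: an array of documents.
body = json.dumps(docs)

# Analogy only: exact term match on the list vs. substring match on one joined string.
exact = [d["id"] for d in docs if "abc" in d["keywords"]]
substring = [d["id"] for d in docs if "abc" in ",".join(d["keywords"])]
print(exact, substring)  # ['doc1'] ['doc1', 'doc2']
```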

Solr Index appears to be valid - but returns no results

Solr newbie here.
I have created a Solr index and written a whole bunch of docs into it. I can see from the Solr admin page that the docs exist and that the schema is fine as well.
But when I perform a search using a test keyword I do not get any results back.
When I enter *:* into the query field (in the Solr admin page), I get all the results.
However, when I enter any other query (e.g. a term or phrase) I get no results.
I have verified that the field being queried is Indexed and contains the values I am searching for.
So I am confused what I am doing wrong.
Probably your <defaultSearchField> is not set up correctly.
Another possibility: your field is of type string instead of text. String fields, in contrast to text fields, are not analyzed, but stored and indexed verbatim.
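A rough illustration of that difference. The two functions are simplified, hypothetical stand-ins: a string field indexes the whole value as one term, while a text field's analyzer splits and lowercases it. Real Solr analyzer chains (tokenizers plus filters) do considerably more than this regex split.

```python
import re

def string_field_terms(value):
    """String-field-like: the whole value is indexed verbatim as a single term."""
    return [value]

def text_field_terms(value):
    """Crude stand-in for a text type: lowercase, then split on non-word characters."""
    return [t for t in re.split(r"\W+", value.lower()) if t]

def matches(query_term, indexed_terms):
    # A plain term query matches only if the term equals one of the indexed terms.
    return query_term in indexed_terms

title = "Winter Season Sale"
print(matches("season", string_field_terms(title)))  # False
print(matches("season", text_field_terms(title)))    # True
```

The same difference explains the first question: a string field keeps "mustermann gmbh" as one term, while a text field yields "mustermann" and "gmbh".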
I had the same issue with a new setup of Solr 8. The accepted answer is no longer valid, because the <defaultSearchField> configuration has been deprecated.
Since I found no answer to why Solr does not return results from any field even though the fields are indexed, I consulted the query documentation, where I found the DisMax query parser:
The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options enable users to influence the score based on rules specific to each use case (independent of user input).
In contrast, the documentation of the default Lucene parser only talks about searching one field. So I gave DisMax a try, and it worked very well!
Query example:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video
You can also specify exactly which fields to search, to prevent unwanted side effects. Multiple fields are separated by spaces, which are encoded as + in URLs:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features+text
Last but not least, give the fields a weight:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features^20.0+text^0.3
If you are using pysolr like I do, you can add those parameters to your search request like this:
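import pysolr

# Point the client at your core; "techproducts" matches the URLs above.
solr = pysolr.Solr('http://localhost:8983/solr/techproducts')
results = solr.search('search term', **{
    'defType': 'dismax',
    'qf': 'features text'
})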
In my case the problem was the format of the query. It seems that my setup, by default, was looking for an exact match on the entire value of the field. So, to get results when searching for "sit", I had to query *sit*, i.e. use wildcards to get the expected result.
With Solr 4, I solved this, as per Mauricio's answer, by setting type="text_en" on the field.
With Solr 6, use text_general.
