Solr: use both n-gram search and default search

I'm trying to create a corpus using Solr. I have a field named "content", and I need to index and search bigrams and trigrams. I also need to index and search it using the default searching.
How do I configure this?

You'll have to add the ShingleFilterFactory to your field definition, after the tokenization has been performed. You can configure the ShingleFilter to generate bigrams or trigrams.
There is no such thing as "default searching", but the bundled schema includes a field named text_general that might be a good match for regular search. You'll have two different fields, one for searching shingles (where you'd want to match the whole bigram / trigram, probably), and one for the "regular search".
You can add the same content to both fields by using a copyField directive, such as <copyField source="content" dest="content_ngrams" />. When querying with the dismax/edismax parsers, you can use qf to say which fields you want to query and to score matches in them differently (e.g. boosting a match in a bi/trigram). You could also query for a direct match with fieldname:value, depending on how you need to query the index.
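A minimal sketch of what this could look like in the schema, assuming the classic schema.xml layout. The type name text_shingles and the attribute values are illustrative choices, not something taken from the question. content_ngrams is the field name from the copyField example above, and text_general is the type from the bundled schema.

```xml
<!-- Fragment for the <schema> element. "text_shingles" is a made-up type name. -->
<fieldType name="text_shingles" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Tokenize first, then build shingles from the resulting tokens. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit word bigrams and trigrams; outputUnigrams="false" drops single words. -->
    <filter class="solr.ShingleFilterFactory"
            minShingleSize="2" maxShingleSize="3" outputUnigrams="false"/>
  </analyzer>
</fieldType>

<!-- Regular search field and its shingle twin, filled through copyField. -->
<field name="content" type="text_general" indexed="true" stored="true"/>
<field name="content_ngrams" type="text_shingles" indexed="true" stored="false"/>
<copyField source="content" dest="content_ngrams"/>
```

A query could then cover both fields with something like defType=edismax&qf=content content_ngrams^2, where the boost of 2 is only an arbitrary starting point.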

Related

Incorrect results for Solr search with multiple terms

Perhaps someone can enlighten me on how Solr matches terms. I have a string attribute named assignedBy, and I run a query against this attribute with the value "Aaron Mason" (no quotes). Solr returns more matches than I anticipated, because the term "Mason" also matches documents whose other fields contain the word "Mason". By turning on the debugging feature (from Solr admin), I see that Solr breaks the query down into two attribute queries: "aaron" for assignedBy and "mason" for the catch-all text (see below). Is this the correct behavior? How do I ensure that it only finds matches against the attribute I specify? Thanks.
"debug":{
"rawquerystring":"assignedBy:Aaron Mason",
"querystring":"assignedBy:Aaron Mason",
"parsedquery":"assignedBy:aaron _text_:mason",
"parsedquery_toString":"assignedBy:aaron _text_:mason",
Yes, this is the expected behavior. When you send q=assignedBy:Aaron Mason, the field prefix applies only to the term immediately after it. After parsing, and after the query analyzers from your schema have run, the query looks like
assignedBy:aaron _text_:mason
If you don't specify a field name, the query term is searched in the default field, which is set in solrconfig.xml. Look for <str name="df">text</str> under the /select handler; in your case it is probably _text_.
So Solr searches its index and returns the combined results: all documents whose assignedBy field contains the term "aaron" and all documents whose _text_ field contains the term "mason".
You have probably used copyField to copy other field values into that text field. Check your schema for it.
With the standard parser you can keep both terms on the same field by grouping them, assignedBy:(Aaron Mason), or by using a phrase, assignedBy:"Aaron Mason".
Alternatively, you can use dismax/edismax, where you specify the fields in which all of your terms are searched.
Example:
q=Aaron Mason&wt=json&debugQuery=on&defType=dismax&qf=assignedBy
This only finds matches against the field "assignedBy" specified in qf.
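If you would rather set this once as a handler default than pass it on every request, a sketch for solrconfig.xml could look like the following. The handler name /assigned is hypothetical; the parameters are the same ones used in the request above.

```xml
<!-- Hypothetical handler; the name "/assigned" is invented for this example. -->
<requestHandler name="/assigned" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- Every term in q is searched in assignedBy instead of the default field. -->
    <str name="qf">assignedBy</str>
  </lst>
</requestHandler>
```

A request such as /assigned?q=Aaron Mason&debugQuery=on should then show both terms parsed against assignedBy.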

How to create a solr query that searches by multiple keywords in all fields

I want to perform a solr query on all fields for multiple keywords. For example, I want to search for the word "dog" AND the word "cat".
So far, I've tried to do something like this:
q=dog cat
or something like:
q=dog,cat
However, I think my queries are actually doing an OR instead of an AND.
Your question is about two things: the default operator (AND/OR), and searching in "all fields".
For most parsers (e.g. the Standard Query Parser and the DisMax Query Parser) you can use the q.op parameter to change the default operator, or you can set defaultOperator in schema.xml or through the Schema API.
Be aware that this alone searches only the default field.
If you want to search in "all fields", you either have to copy all your fields into one field (and use it as the default field), or you have to list all your fields in the DisMax qf parameter.
The results will not be the same. In the second case your AND search must match in one of the fields (each with its own tokenizer); in the first case the terms may come from different source fields and still match, because in the end all terms are in the default field.
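Both options can be sketched in configuration. The field names title and body and the handler name /search-all are placeholders, and the catch-all field is assumed to be called text; adjust all of them to your own schema.

```xml
<!-- Option 1 (schema): copy every field into one catch-all field used as df. -->
<copyField source="*" dest="text"/>

<!-- Option 2 (solrconfig.xml): list the fields in qf and default to AND. -->
<requestHandler name="/search-all" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="q.op">AND</str>
    <str name="qf">title body</str>
  </lst>
</requestHandler>
```

With option 1 the request would still need the operator, for example q=dog cat&q.op=AND&df=text.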

Solr dynamicField not searched in query without field name

I'm experimenting with the Example database in Solr 4.10 and not understanding how dynamicFields work. The schema defines
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
If I add a new item with a new field name (say "example_s":"goober" in JSON format), a query like
?q=goober
returns no matches, while
?q=example_s:goober
will find the match. What am I missing?
I would like to see the SearchHandler definition from the solrconfig.xml file that you are using to execute the query above.
A SearchHandler generally defines the default query fields, i.e. the qf parameter.
Check that your dynamic field example_s is present in that query field list in solrconfig.xml; otherwise you can pass it in qf when sending the query to the search handler.
Hope this helps you resolve your problem.
If you are using the default schema, here's what's happening:
You are probably using the default endpoint (/select), so the search type and parameters come from its definition. That means the default (lucene) query parser is used and the field searched is text.
The text field is an aggregate and is populated by copyField instructions from other fields.
Your dynamic field definition for *_s lets you index text under any name ending in _s, such as example_s. The field is indexed (so you can search against it directly) and stored (so you can see it when you ask for all fields). It is not, however, searched as general text. Note that (unlike ElasticSearch) Solr strings have to be matched fully and exactly. If a string field holds multi-word text, there is barely any point in searching it by individual words. "goober" is a single word, so it is not a very good example for seeing the difference here.
The easiest solution for you is to add another copyField instruction:
<copyField source="*_s" dest="text"/>. Then all your *_s dynamic fields would also be searchable. Note, however, that the search analyzers will not be the ones from the *_s definition but the ones from the text field's definition, whose type is not string but text_general, defined elsewhere in the file.
As to Solr vs. ElasticSearch, they both err on the different sides of magic. Solr makes you configure the system and makes it very easy to see the exact current configuration. ElasticSearch hides all of the configuration, but you have to rediscover it the second you want to change away from the default behaviour. In the end, the result is probably similar and meets somewhere in the middle.

Apache Solr or Lucene proximity search on multiple fields

Is it possible in solr/lucene to search on different multivalued fields?
Imagine to have an XML fragment like this:
<normative>
<ref><aut>State</aut><num>70</num><year>2007</year><article>13</article></ref>
<ref><aut>TreasuryMinistry</aut><num>350</num><year>2011</year><article>21</article></ref>
</normative>
Is it possible to retrieve documents containing for instance:
num:70 AND year:2007
inside the same ref ?
i.e. this document should not be found for a query like
num:70 AND year:2011.
I could create catenated fields like
<ref cat='state-0070-2007-0013'/>
<ref cat='TreasuryMinistry-0350-2011-0021'/>
but the user must be able to find by every combination of fields, i.e.
num and year,
year and article,
num and article,
aut and num and year,
on the same ref!
I am not experienced with solr/lucene, so I fear that a wild card search like
cat:'*-0070-2007-*'
could perform poorly over our normative document corpus.
Is there a way to make a search based on relative position?
Something like using copyField to a multivalued field with a different positionIncrementGap?
This does not directly answer your proximity question, but can you treat each ref as a document of its own? If so, a search like 'num:70 AND year:2007' should work fine, assuming you create the 'num' and 'year' fields.
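A sketch of that idea in Solr's XML update format, with one Solr document per ref. The id values and the parent_id field are invented here so that a hit can be traced back to its enclosing normative document; the other field names follow the XML in the question.

```xml
<add>
  <!-- One Solr document per ref element; "norm1" is a made-up parent id. -->
  <doc>
    <field name="id">norm1-ref1</field>
    <field name="parent_id">norm1</field>
    <field name="aut">State</field>
    <field name="num">70</field>
    <field name="year">2007</field>
    <field name="article">13</field>
  </doc>
  <doc>
    <field name="id">norm1-ref2</field>
    <field name="parent_id">norm1</field>
    <field name="aut">TreasuryMinistry</field>
    <field name="num">350</field>
    <field name="year">2011</field>
    <field name="article">21</field>
  </doc>
</add>
```

With this layout, num:70 AND year:2007 matches norm1-ref1, while num:70 AND year:2011 matches neither document, because no single ref carries both values.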

Solr Ngram Synonyms Dismax

I have n-gram-indexed two fields (columns in the database), and a third one is my full-text field. My default field is the full-text field, and when querying I use the dismax handler and specify both n-grammed fields and the full-text field in it, each with its own boost value.
My problem: if I don't use dismax and just search the full-text field (i.e. the default field specified in the schema), synonyms work correctly, i.e. ca returns all results that contain california. If I use dismax, ca is also searched in the n-grammed fields, which returns partial matches of the word ca, and the synonym part is not applied at all.
I want synonyms to be used in every case. How should I go about it?
Make sure the SynonymFilterFactory filter is correctly configured in the query analyzer of your n-gram fields.
If it still doesn't work, the analysis interface in the Solr admin shows the output of each tokenizer and filter step, so you can check whether the synonym part works as expected.
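A rough sketch of such a field type, assuming the n-grams are produced by NGramFilterFactory at index time. The type name text_ngram, the gram sizes and the file name synonyms.txt are assumptions, and the file is assumed to contain a line such as "ca, california". Newer Solr releases favor SynonymGraphFilterFactory for query-time synonyms, so check which one your version provides.

```xml
<!-- "text_ngram" and the gram sizes are illustrative, not from the question. -->
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Expand synonyms at query time so that ca also searches for california. -->
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

Because the query analyzer does not n-gram its input, the expanded term california is matched as a whole against the indexed grams.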
