I have n-gram-indexed two fields (columns in the database), and a third field is my full-text field. The full-text field is my default search field. When querying, I use the dismax handler and list both the n-grammed fields and the full-text field in it, each with its own boost value.
My problem: if I don't use dismax and just search the full-text field (i.e. the default field specified in the schema), synonyms work correctly, i.e. ca returns all results that contain california. If I use dismax, ca is also searched in the n-grammed fields, which return partial matches of the word ca, and the synonym expansion is never applied.
I want synonyms to apply in every case, so how should I go about it?
Make sure the "SynonymFilterFactory" filter is configured in the query analyzer of your n-gram fields.
If it still doesn't work, the analysis page in the Solr admin UI shows the output of each tokenizer and filter step, so you can check whether the synonym step works as expected.
I am getting familiar with Apache Solr.
I created a simple document via the admin UI:
{
"company_name":["Rikotech inc"],
"id":"12345",
"full_title":["ft rikotech marinov"],
"_version_":1681062832169287680
}
The document can be fetched, but when I type rikotech in the standard query field (q), I get no results.
Both full_title and company_name are of type text_general.
I watched a YouTube tutorial, and the same search worked for the presenter ;|
What am I missing here?
Solr will not search all fields (under any configuration, really) without specifying the fields. However, the tutorial you watched probably had the default copyField rule enabled where everything is copied into a field named _text_, and then that field is configured as the default search field. This effectively means that everything is being copied into a specific field, and then that (single) field is being searched by default.
In your case it's probably better to use the edismax query parser (check the box in front of edismax in the user interface), and then give full_title company_name as the query fields (qf). That will allow you to adjust the weights between the fields as well. full_title company_name^5 will give 5x as much weight to any hits in company_name compared to those in full_title.
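To make the suggestion above concrete, here is a minimal Python sketch that only builds the request parameters for such an edismax query. The host, port, and core name (mycore) are assumptions for illustration, and the helper edismax_params is hypothetical, not part of any Solr client library.

```python
from urllib.parse import urlencode

def edismax_params(user_query, field_boosts):
    """Build request parameters for an edismax query over several fields."""
    # qf is a space-separated list of field names, each optionally
    # followed by ^boost (e.g. "full_title company_name^5").
    qf = " ".join(
        name if boost is None else f"{name}^{boost}"
        for name, boost in field_boosts.items()
    )
    return {"q": user_query, "defType": "edismax", "qf": qf}

params = edismax_params("rikotech", {"full_title": None, "company_name": 5})

# Assumed local core; adjust host, port, and core name to your setup.
url = "http://localhost:8983/solr/mycore/select?" + urlencode(params)
print(url)
```

urlencode takes care of escaping the space and the ^ character, so the qf value can be written the same way as in the admin UI.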
I found the problem.
The fields I wanted to search by default were being copied to some strange fields like full_title_str instead of text. This is the correct schema setting:
Perhaps someone can enlighten me on how Solr matches terms. I have a string attribute named assignedBy, and I query this attribute with the value "Aaron Mason" (no quotes). Solr returns more matches than I anticipated, because the term "Mason" also matches documents whose other fields contain the word "Mason". By turning on the debugging feature (from the Solr admin), I see that Solr breaks the query down into two attribute queries: "aaron" for assignedBy and "mason" for the catch-all text field (see below). Is this the correct behavior? How do I ensure that it only finds matches against the attribute I specify? Thanks.
"debug":{
"rawquerystring":"assignedBy:Aaron Mason",
"querystring":"assignedBy:Aaron Mason",
"parsedquery":"assignedBy:aaron _text_:mason",
"parsedquery_toString":"assignedBy:aaron _text_:mason",
Yes, that is the expected behavior. When you send q=assignedBy:Aaron Mason,
the query is parsed and analyzed by the query analyzers defined in your schema, and it ends up as
assignedBy:aaron and _text_:mason.
If you don't specify a field name, the query term is searched in the default field, which is set in solrconfig.xml; look for <str name="df">text</str> under the /select handler. In your case it is probably _text_.
So Solr searches its index and returns the combined results: all documents whose assignedBy field contains the term "aaron", plus all documents whose _text_ field contains the term "mason".
You have probably used copyField to copy some field values into that catch-all field; check your schema for it.
You can use dismax/edismax, where you specify the fields in which all of your terms are searched.
Example:
q=Aaron Mason&wt=json&debugQuery=on&defType=dismax&qf=assignedBy
This only finds matches in the field assignedBy, which is specified in qf.
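If you would rather stay with the standard query parser, the usual Lucene query syntax also lets you bind several terms to one field by grouping them in parentheses, or match them as a phrase with quotes. The two helpers below are hypothetical names for illustration; the sketch only builds the query strings and does not escape other special query characters.

```python
def field_terms(field, text):
    """Return field:(term1 term2 ...) so every term is searched in `field`."""
    return f"{field}:({' '.join(text.split())})"

def field_phrase(field, text):
    """Return field:"term1 term2" so the terms must match as a phrase."""
    # Escape backslashes and double quotes so the phrase stays well-formed.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'

print(field_terms("assignedBy", "Aaron Mason"))   # assignedBy:(Aaron Mason)
print(field_phrase("assignedBy", "Aaron Mason"))  # assignedBy:"Aaron Mason"
```

With the grouped form, whether both terms are required still depends on the default operator; the phrase form requires the terms to appear together.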
I want to perform a Solr query on all fields for multiple keywords. For example, I want to search for the word "dog" AND the word "cat".
So far, I've tried to do something like this:
q=dog cat
or something like:
q=dog,cat
However, I think my queries are actually doing an OR instead of an AND.
Your question is about the default operator (AND/OR) and you want to search in "all fields".
For most parsers (e.g. the Standard Query Parser and the DisMax Query Parser) you can use the q.op parameter to change the default operator, or you can set defaultOperator in schema.xml or through the Schema API.
Be aware that you will search only in the default field.
If you want to search in "all fields" you have to copy all your fields to one field (and use this as default field) or you have to list all your fields in the DisMax qf-parameter.
The results will not be identical: in the second case each term is analyzed by the individual field's own analyzer (tokenizer and filters), whereas in the first case all terms are matched against the single default field, regardless of which source field they originally came from.
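As a rough illustration of the q.op approach, this Python sketch builds the request parameters for an AND search. The helper name all_terms_params is hypothetical, and the default field _text_ is an assumption; use whatever catch-all field your schema defines.

```python
from urllib.parse import urlencode

def all_terms_params(keywords, default_field=None):
    """Build parameters for a query in which every keyword is required."""
    # q.op=AND makes whitespace-separated terms mandatory instead of optional.
    params = {"q": " ".join(keywords), "q.op": "AND"}
    if default_field is not None:
        # df names the field searched when the query gives no field prefix.
        params["df"] = default_field
    return params

params = all_terms_params(["dog", "cat"], default_field="_text_")
query_string = urlencode(params)
print(query_string)
```

Writing the operator explicitly in the query, as in q=dog AND cat, should give the same result for the standard query parser without changing the default.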
I'm trying to create a corpus using Solr. I have a field named "content", and I need to index and search bigrams and trigrams. I also need to index and search it with the regular (default) search.
How to configure these things?
You'll have to add the ShingleFilterFactory to your field definition, after the tokenization has been performed. You can configure the ShingleFilter to generate bigrams or trigrams.
There is no such thing as "default searching", but the bundled schema includes a field type named text_general that might be a good match for regular search. You'll have two different fields: one for searching shingles (where you'd probably want to match the whole bigram / trigram), and one for the "regular search".
You can add the same content to both fields by using a copyField directive, such as <copyField source="content" dest="content_ngrams" />. You can use qf when querying to say which field you want to query, or if you want to score the fields differently for matches (i.e. boosting a match in a bi/trigram). You could also query for a direct match with fieldname:value, depending on how you need to query the index.
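To show what the shingle field would contain, here is a small pure-Python sketch of word shingling. It is only an approximation of what a shingle filter produces, not Solr's implementation: the function name and sample sentence are made up, and unlike the ShingleFilter default (outputUnigrams=true) it does not emit the single words.

```python
def shingles(tokens, min_size=2, max_size=3, sep=" "):
    """Return word n-grams (shingles) of min_size..max_size tokens."""
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            # Only emit shingles that fit entirely inside the token list.
            if start + size <= len(tokens):
                out.append(sep.join(tokens[start:start + size]))
    return out

tokens = "please divide this sentence".split()
print(shingles(tokens))
# ['please divide', 'please divide this', 'divide this',
#  'divide this sentence', 'this sentence']
```

In the filter itself, the corresponding settings are minShingleSize and maxShingleSize, so a 2 to 3 range covers both bigrams and trigrams in one field.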
I am trying to set up the elevate handler in Solr 3.5.0, and I need the dismax equivalent of the query below, which applies different boost values to the same field depending on the match type (an exact match gets 200, whereas a wildcard match gets 100).
q=name:(foo*^100.0 OR foo^200.0)
This is one way to solve this problem.
Keep a text field that uses only WhitespaceTokenizer (and maybe LowerCaseFilter, depending on your case-sensitivity needs). Use this field for the exact match. Let's call this field name_ws.
Instead of running a wildcard query on name_ws, use a text-type copy field with EdgeNGramTokenizer in its analyzer chain, which will output tokens like:
food -> f, fo, foo, food
Let's call this field name_edge.
Then you can issue this dismax query:
q=foo&defType=dismax&qf=name_ws^200+name_edge^100
(Add debugQuery=on to verify if the scoring works the way you want.)
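For intuition about why name_edge can stand in for the wildcard query, this pure-Python sketch mimics the edge n-gram expansion shown above. It is an illustration only, not Solr's tokenizer, and the parameter names min_gram and max_gram are assumptions chosen for the example.

```python
def edge_ngrams(token, min_gram=1, max_gram=None):
    """Return the prefixes of `token` from min_gram to max_gram characters."""
    if max_gram is None:
        max_gram = len(token)
    # Never ask for a prefix longer than the token itself.
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]

print(edge_ngrams("food"))  # ['f', 'fo', 'foo', 'food']

# The query term "foo" equals one of the indexed prefixes of "food",
# so a plain term match behaves like the wildcard query foo*.
print("foo" in edge_ngrams("food"))  # True
```

A document whose name is exactly foo matches in both name_ws and name_edge, while food matches only in name_edge, which is what lets the two boosts separate exact from prefix matches.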