Store Solr analyzer result in separate field

I have a field type with multiple analyzers (Keepword, Synonym, ...).
How can I store the result of all the analyzers in a separate field?
Unfortunately, copyField is executed before the analyzers run...

You can't. The result of "all the analyzers" is what actually ends up in the field's index. You'll have to create separate field types that cut off the sequence of analyzers/filters at an earlier step, define a field for each type, then copyField into each field.
If you just want to watch what each step in the analysis process does, use the Admin interface and select Analysis. You can also access these results programmatically through the endpoint that the Admin interface uses:
http://localhost:8983/solr/corename/analysis/field?wt=json&analysis.showmatch=true&analysis.fieldvalue=foo&analysis.query=foo&analysis.fieldname=fieldname
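As a rough sketch, the URL above could be assembled in Python like this. The host, core name, and field name are the placeholders from the URL and are assumptions about your setup; the helper name analysis_url is made up for illustration.

```python
from urllib.parse import urlencode


def analysis_url(base_url, core, field_name, field_value, query=None):
    """Build a URL for the field analysis endpoint used by the Admin UI."""
    params = {
        "wt": "json",
        "analysis.showmatch": "true",
        "analysis.fieldname": field_name,
        "analysis.fieldvalue": field_value,
    }
    if query is not None:
        # Optional: also run the query-time analyzer on this text.
        params["analysis.query"] = query
    return f"{base_url}/{core}/analysis/field?{urlencode(params)}"


# Placeholder values; replace with your own host, core and field.
url = analysis_url("http://localhost:8983/solr", "corename",
                   "fieldname", "foo bar", query="foo")
print(url)
```

Fetching that URL from a running Solr instance should return the token stream after each analysis step as JSON, which is the same information the Analysis screen displays.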

Related

How can I view actually stored transformed Solr text field values?

When Solr returns a document, the field values match those that were passed to the Solr indexer.
However, especially for TextFields, Solr typically uses a modified value to which (depending on the definition in schema.xml) various filters have been applied, typically:
conversion to lower case
replacing of synonyms
removal of stopwords
application of stemming
One can see the result of the conversion for specific texts by using Solr Admin > Some core > Analysis. There is also a tool called Luke, and the LukeRequestHandler, but it seems they only let me view the values passed to Solr, not the transformed variant. One can also look at the index data on disk, but it seems to be stored in a binary format.
However, none of these seem to let me see the actual value as stored.
The reason for asking is that I've created a text field based on a certain filter chain which according to Solr Admin > Analysis transforms the text correctly. However when searching for a specific word in the transformed text it won't find it.

How to create a solr query that searches by multiple keywords in all fields

I want to perform a solr query on all fields for multiple keywords. For example, I want to search for the word "dog" AND the word "cat".
So far, I've tried to do something like this:
q=dog cat
or something like:
q=dog,cat
However, I think my queries are actually doing an OR instead of an AND.
Your question is about the default operator (AND/OR) and you want to search in "all fields".
For most parsers (e.g. the Standard Query Parser and the DisMax Query Parser) you can use the q.op parameter to change the default operator, or you can set defaultOperator in schema.xml or through the Schema API.
Be aware that you will search only in the default field.
If you want to search in "all fields" you have to copy all your fields to one field (and use this as default field) or you have to list all your fields in the DisMax qf-parameter.
The results will not be the same. In the second case, each term of your "AND" search must match in one of the listed fields (using that field's own tokenizer); in the first case, the terms could come from different source fields and still match, because in the end all terms are in the default field.
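A minimal sketch of the request parameters for the second approach (q.op=AND plus a DisMax qf list) might look like this in Python. The field names title and body are hypothetical, and the helper all_terms_query is only for illustration.

```python
from urllib.parse import urlencode, parse_qs


def all_terms_query(terms, fields=None):
    """Build a query string that requires every term (q.op=AND)."""
    params = {"q": " ".join(terms), "q.op": "AND"}
    if fields:
        # Search the listed fields via the DisMax parser's qf parameter.
        params["defType"] = "dismax"
        params["qf"] = " ".join(fields)
    return urlencode(params)


# "title" and "body" are made-up field names.
qs = all_terms_query(["dog", "cat"], fields=["title", "body"])
print(qs)
print(parse_qs(qs))
```

Without the fields argument, the same helper produces a plain q.op=AND query against the default field, which corresponds to the copy-everything-into-one-field approach.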

solr use both n-gram search and default search

I'm trying to create a corpus using Solr. I have a field named "content" and I need to index and search bigrams and trigrams. I also need to index and search using the default searching.
How do I configure these things?
You'll have to add the ShingleFilterFactory to your field definition, after the tokenization has been performed. You can configure the ShingleFilter to generate bigrams or trigrams.
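To make it concrete what kind of tokens a shingle filter adds, here is a small Python illustration of word bigrams and trigrams. It is a toy model only, not Solr's implementation: the function name and the sample sentence are made up, and the real filter can also emit the single words alongside the shingles, so its exact output differs from this.

```python
def shingles(tokens, min_size=2, max_size=3):
    """Return word n-grams of min_size..max_size tokens, joined by spaces."""
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


# Whitespace split stands in for the tokenizer in this sketch.
tokens = "please divide this sentence".split()
print(shingles(tokens))
# ['please divide', 'please divide this', 'divide this',
#  'divide this sentence', 'this sentence']
```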
There is no such thing as "default searching", but the bundled schema includes a field type named text_general that might be a good match for regular search. You'll have two different fields: one for searching shingles (where you'd probably want to match the whole bigram/trigram), and one for the "regular search".
You can add the same content to both fields by using a copyField directive, such as <copyField source="content" dest="content_ngrams" />. You can use qf when querying to say which field you want to query, or if you want to score the fields differently for matches (i.e. boosting a match in a bi/trigram). You could also query for a direct match with fieldname:value, depending on how you need to query the index.
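As a hedged sketch of the qf idea, the parameters below query both fields and boost matches in the shingle field. It assumes the eDisMax parser, a regular field named content, and the content_ngrams field from the copyField example; the boost value of 2 is arbitrary and the helper name is invented.

```python
from urllib.parse import urlencode


def boosted_params(user_query, field_boosts):
    """Build eDisMax parameters with per-field boosts in qf."""
    qf = " ".join(
        field if boost == 1 else f"{field}^{boost}"
        for field, boost in field_boosts.items()
    )
    return {"defType": "edismax", "q": user_query, "qf": qf}


# Boost bigram/trigram matches over plain term matches (values assumed).
params = boosted_params("machine learning",
                        {"content": 1, "content_ngrams": 2})
query_string = urlencode(params)
print(params["qf"])
print(query_string)
```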

Solr copyField mixed with RegexTransformer

Scenario:
In the database I have a field called Categories which is of type string and contains a number of pipe-delimited digits, such as 1|8|90|130|
What I want:
In Solr index, I want to have 2 fields:
Field Categories_pipe which would contain the exact string as in the DB, i.e. 1|8|90|130|
Field Categories which would be a multi-valued field of type INT containing values 1, 8, 90 and 130
For the latter, in the entity specification I can use a RegexTransformer and then specify the following field in data-config.xml:
<field column="Categories" name="Navigation" splitBy="\|"/> and then specify the field as multi-valued in schema.xml
What I do not know is how I can 'copy' the same field twice and perform regex splitting on only one copy. I know there is the copyField facility that can be defined in schema.xml; however, I can't find a way to transform the copied field because, from what I know (and I may be wrong here), transformers are only available in the entity specification.
As a workaround I can also send the same field twice from the entity query, but in reality the field Categories is a computed field (nested selects) which is somewhat expensive, so I would like to avoid it.
Any help is appreciated, thanks.
Instead of splitting it in data-config.xml, you could do that in your schema.xml. Here is what you could do:
Create a fieldType with the tokenizer PatternTokenizerFactory, which uses a regex to split on |.
FieldSplit: Create a multivalued field using this new fieldType; it will eventually hold 1, 8, 90, 130.
FieldOriginal: Create a string field (if you need no analysis on it) that preserves the original value 1|8|90|130|.
Now you can use copyField to populate FieldSplit and FieldOriginal, depending on your needs.
Check this Question, it is similar.
You can create two columns from the same data and treat them separately.
SELECT categories, categories as categories_pipe FROM category_table
Then you can split the "categories" column, but index the other one as-is.
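Whichever approach is used, the two values the index should end up with can be illustrated with a few lines of Python. This only mimics the split on | for clarity; it is not how Solr or the data import handler performs it, and the function name is made up.

```python
import re


def split_categories(raw):
    """Split a pipe-delimited string into ints, skipping empty parts."""
    return [int(part) for part in re.split(r"\|", raw) if part]


raw = "1|8|90|130|"
categories_pipe = raw               # kept exactly as in the database
categories = split_categories(raw)  # the multi-valued variant
print(categories_pipe)
print(categories)
# 1|8|90|130|
# [1, 8, 90, 130]
```

The trailing pipe produces an empty last part, which is why the sketch filters out empty strings before converting to int.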

Can Solr search key words precisely?

For example:
I want to search for "support" and have it return only the results containing "support", NOT results containing "supports" or any other relevant matches.
Is it possible to implement this?
Thanks.
Yes, if you search against an unanalyzed field type, matches are exact. In the default Solr schema the unanalyzed field type is named "string" (of class "solr.StrField").
EDIT: it depends on what you mean by "precisely". If your field value is "support desk" and your query is "support", should it match?
If your answer is yes, then you need a tokenized text field and should look into configuring stemming.
If your answer is no, i.e. the query must match the field value and nothing else, then you should use a string (i.e. unanalyzed) field type.
Furthermore, if your query is "supports" and the field value is "Supports", should it match?
If your answer is yes, then you should use a LowerCaseFilterFactory (you can't do this on a string field type, you'll have to switch to a text field type).
If your answer is no, then it's ok to use a string field type.
In summary, the Lucene/Solr text analysis pipeline is very configurable; take a look at the analyzer docs for a reference of all available options.
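The cases above can be mimicked with a toy model in Python. This is not Solr code: it assumes a string field behaves like whole-value equality, and a text field with whitespace tokenization plus lowercasing (and no stemming) behaves like a case-insensitive word lookup. Both function names are invented.

```python
def matches_string_field(field_value, query):
    """Unanalyzed field: the query must equal the whole value."""
    return field_value == query


def matches_lowercased_text_field(field_value, query):
    """Tokenized, lowercased field without stemming: match a whole word."""
    tokens = field_value.lower().split()
    return query.lower() in tokens


print(matches_string_field("support desk", "support"))            # False
print(matches_lowercased_text_field("support desk", "support"))   # True
print(matches_string_field("Supports", "supports"))               # False
print(matches_lowercased_text_field("Supports", "supports"))      # True
print(matches_lowercased_text_field("supports desk", "support"))  # False
```

The last line is the behaviour the question asks for: with no stemmer in the chain, "support" does not match "supports".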
What you are describing is called stemming. There is another almost identical question on Stack Overflow, check it out: Solr exact word search
You will need to re-index and disable stemming in your configuration. I don't believe it's possible to do that at query time, since what is stored in your index is the stemmed version of the word. In your case "support" is stored in the index even if "supports" is displayed.
This should get you started: How to configure stemming in Solr?
