Solr - Results that contain all terms, in any order - solr

In a SOLR install, when I search against a field with a multi-word search term I want SOLR to return documents that have all of the terms in the search, but they do not need to be in the exact order.
For example, if I search for title of Brown Chicken Brown Cow, I want to find all documents that contain all of the terms Brown, Chicken and Cow, irrespective of order in the title field. So, for example, the title "The chicken and the cow have brown poop" should match the query. AFAIK, this is how Google executes searches as well.
I have experimented with the following query formats:
1. Title:Brown AND Title:Chicken
2. Title:Brown AND Chicken
3. Title:Brown+Chicken
I am very confused by the results. In some instances, the first two queries return the same exact set of results. In other instances, the first version will return many results and the second version will return none. The third version seems to meet my needs, but I am confused by the different meaning of the queries.
All of my tests have been run against a field of type text_en.
<field name="Title" multiValued="false" type="text_en" indexed="true" stored="true"/>
So, what's the best SOLR query/set up for this type of search? Also, is there an easy way to make Solr.NET take a user entered search term and convert it to this type of format?
Also, will SOLR by default give documents that match the order of the search phrase a higher relevancy score? If not, what's the right levers to pull to make that happen?
Edit:
Some of my confusion was caused by searching against not default fields vs default fields. Knowing this, the only format that works consistently is the first format.

If I were you I would try to use:
Title:(Brown Chicken)
Brackets will make it equivalent to your query no 1. Quotation will force Solr to search for exact match, including space and order

Please try Title:"Brown Chicken" or use Dismax query parser to handle your queries.
The wiki for lucene query parser speaks (emphasis mine):
....Since text is the default field, the field indicator is not required.
Note: The field is only valid for the term that it directly precedes,
so the query
title:Do it right
Will only find "Do" in the title field. It will find "it" and "right"
in the default field (in this case the text field).
Do you have only the title field in your data model?
Please run debugQuery=on to explain your query to see how they are scored: see it in action https://stackoverflow.com/a/9262300/604511

Related

Boost SOLR Result Score based on search term and document type

I have a rule from my SMEs for SOLR Search relevancy. It goes like this.
When words "XX", "YY", or "ZZ" are in the User's search terms, heavily boost the document_type "MMMM" in the results. (But ONLY then, which means I can't weight the doc itself I think.)
I can imagine building a "Query Pre-Processor" that checks for the presence of the specified terms "XX", etc. and then plugs them into a pre-built query that heavily boosts document_type "MMMM".
That feels more than a little clunky to me. Doing this in code and handling a "union" situation where terms from two rules are in the search doesn't sound like something I'd like to maintain.
I'm wondering if there could be a way to leverage SOLR to do this? The first thing that comes to mind is to put those particular search terms "XX", etc.. into any document_type "MMMM" when pre-processing the data to go into SOLR.
Just tossing them into the document's text is probably not going to change the weighting all that much -- especially if the term is in other documents NOT part of that document_type -- and that suggests to my mind an "important_abbreviations" field on all documents and a "standard" practice of including a boost for that general field on all queries. I say that because I don't recall ever seeing a way to boost a particular field within a doc except in a query.
I'm wondering if anyone else out there has solved this problem and if so, how -- since both of these feel a little clunky to me.
Attempting One Possible Answer: Please feel free to critique, advise or warn.
(I'm aware that an "abbreviation" field feels a bit like synonyms, Please comment if you think synonyms would be a better way to approach this.)
Step 1: Make an "abbreviation" multivalued field in SOLR on all collection docs.
Step 2: Add "XX", "YY", "ZZ" to all documents of type "MMMM" when I build the solrInputDocument to send to SOLR.
Step 3: Boost the "abbreviation" field when adding the abbreviations in step 2 so that resulting xml looks like this:
<field name="abbreviation" boost="5.0">myXXAbbreviationGoesHere</field>
[Concern: Can I boost some fields of type "abbreviation" and not others? In other words, will SOLR respect/correctly calculate the field boost value if it's "2" on one document "5" on another and there is no boost on a 3rd document?]
Step 4: Do a copyField and drop "abbreviation" into the default "text" search field. [This probably looses me my field-specific weighting, yes? -- Thus 5 or 6 below.]
Step 5: OR - add a Request Handler that forces doing search on the abbreviations field directly on every incoming search. Not totally sure on this one, but I got the idea from this stackoverflow question: Solr - Boosting result if query is found in a special field
Step 6: OR - append the query text for searching "abbreviation" on every query entered in my UI - before submission to SOLR.
[In this case, I want to search the default field AND the "abbreviation" field with this single query. I assume that's possible, I just haven't tried to write the query yet. Comments gratefully accepted.]

Solr dynamicField not searched in query without field name

I'm experimenting with the Example database in Solr 4.10 and not understanding how dynamicFields work. The schema defines
dynamicField name="*_s" type="string" indexed="true" stored="true"
If I add a new item with a new field name (say "example_s":"goober" in JSON format), a query like
?q=goober
returns no matches, while
?q=example_s:goober
will find the match. What am I missing?
I would like to see the SearchHandler from solrconfig.xml file that you are using to execute the above mentioned query.
In SearchHandler we generally have Default Query Field i.e. qf parameter.
Check that your dynamic field example_s is present in that query field list of solrconfig file else you can pass it while sending query to search handler.
Hope this will help you in resolving your problem.
If you are using the default schema, here's what's happening:
You are probably using default end-point (/select), so you get the definition of search type and parameters from that. Which means, it is default (lucene) search and the field searched is text.
The text field is an aggregate and is populated by copyField instruction from other fields.
Your dynamic field definition for *_s allows you to index the text with any name ending in _s, such as example_s. It's indexed (so you could search against it directly) and stored (so you can see it when you ask for all fields). It will not however search it as a general text. Notice that (differently from ElasticSearch), Solr strings have to be matched fully and completely. If you have some multi-word text in it, there is barely any point searching it. "goober" is one word so it's not a very good example to understand the difference here.
The easiest solution for you is add another copyField instruction:
<copyField source="*_s" dest="text"/>, then all your *_s dynamic fields would also be searchable. But notice that the search analyzers will not be the ones for *_s definition, but the ones for the text field's definition, which is not string, but text_general, defined elsewhere in the file.
As to Solr vs. ElasticSearch, they both err on the different sides of magic. Solr makes you configure the system and makes it very easy to see the exact current configuration. ElasticSearch hides all of the configuration, but you have to rediscover it the second you want to change away from the default behaviour. In the end, the result is probably similar and meets somewhere in the middle.

Extending Solr Tutorial with custom fields/core

After standing up a basic jetty Solr example. I've tried to make my own core to represent the data my company will be seeing. I made a directory structure with conf and data directories and copied core.properties, schema.xml, and solrconfig.xml from the collection1 example.
I've editted core.properties to change the core name, and I've added 31 fields (most of type text_general, indexed, stored, not required or multivalued) to the schema.
I'm pretty sure I've set it up correctly as I can see my core in the admin page drop down and interact with it. The problem is, when I feed a document designed for the new fields, I cannot get a successful query for any of the values. I believe the data is fed as I got the same command line response:
"POSTing file incidents.xml...
1 file indexed. ....
COMMITting..."
I thought, the Indexing process took more time, but when I copy a field node out of an example doc (e.g <field name="name">Apple 60 GB iPod with Video Playback Black</field> from ipod_video.xml) into a copy of my file (incidents2.xml) searches on any of those strings instantly succeed.
The best example of my issue is both files have the field:
<field name="Brand" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="Brand">APPLE</field>
However, only the second document (with the aforementioned name field) is returned with a query for apple.
Thanks for reading this far; my questions are:
1) Is there a way to dump the analysis/tokenization phase of document ingestion? Either I don't understand it or the Analysis tab isn't designed for this. The debugQuery=true parameter gives relevance score data but no explanation of why a document was excluded.
2) Once I solve my overall issue, I we would like to have large text fields included in the index, can I wrap long form text in CDATA blocks in solr?
Thanks again.
To debug any query issues in Solr, there's a few useful things to check. You might also want to add the output of your analysis page and the field you're having issues with from your schema.xml to your question. It's also a good idea to have a smaller core to work with (use three or four fields just to get started and get it to work) when trying to debug any indexing issues.
Are the documents actually in the index? - Perform a search for : (q=*:*) to make sure that there are any documents present in the index. *:* is a shortcut that means "give me all documents regardless of value". If there are no documents returned, there is no content in the index, and any attempt to search it will give zero results.
Check the logs - Make sure that SolrLogging is set up, so you get any errors thrown in your log. That way you can see if there's anything in particular going wrong when the query or indexing is taking place, something which would result in the query never being performed or any documents being added to the index.
Use the Analysis page - If you have documents in the index, but they're not returned for the queries you're making, select the field you're querying at the analysis page and add both the value given when indexing (in the index column) and the value used when querying (in the query field). The page will then generate all the steps taken both when indexing and querying, and show you the token stream at each step. If the tokens match, they will be highlighted with a different background color, and depending on your setting, you might require all tokens present on the query side to be present on the indexing side (i.e. every token AND-ed together). Start with searching for a single token on the query side for that reason.
If you still doesn't have any hits, but have the documents in the index, be more specific. :-)
And yes, you can use CDATA.

How to query a specific document by id

From a previous query I already have the document ID (the uniqueKey in this schema is 'track_id') of the document I'm interested in.
Then I would like to query a sequence of words on that document while highlighting the match.
I can't seem to be able to combine the search parameters in a successful way (all my google searches return purple links :\ ), although I've already tried many combinations these past few days. I also know the field where the matches will be if that's any use in terms of improving match speed.
I'm guessing it should be something like this:
/select?q=track_id:{key_i_already_have} AND/&/{part_I_dont_know} word1 word2 word3
Currently, since I can't combine these two search parameters, I'm only querying the words and thus getting several results from several documents.
Thanks in advance.
From Solr 4 you can use the realtime get, which is much more faster than searching the index by id.
http://localhost:8983/solr/get?ids=id1,id2,id3
For index updates to be visible (searchable), some kind of commit must reopen a searcher to a new point-in-time view of the index. The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher. This is primarily useful when using Solr as a NoSQL data store and not just a search index.
You may try applying Filter Query for id. So it will filter your search query to that id, and then search in that document for all the keywords, and highlight them.
Your query will look like:
/select?fq=track_id:DOC_ID&q=word1 word2 word3
Just make sure your "id" field in schema.xml is defined of the type string to apply filter queries on it.
<field name="id" type="string" indexed="true" stored="true" required="true" />

Solr Index appears to be valid - but returns no results

Solr newbie here.
I have created a Solr index and write a whole bunch of docs into it. I can see
from the Solr admin page that the docs exist and the schema is fine as well.
But when I perform a search using a test keyword I do not get any results back.
On entering * : *
into the query (in Solr admin page) I get all the results.
However, when I enter any other query (e.g. a term or phrase) I get no results.
I have verified that the field being queried is Indexed and contains the values I am searching for.
So I am confused what I am doing wrong.
Probably you don't have a <defaultSearchField> correctly set up. See this question.
Another possibility: your field is of type string instead of text. String fields, in contrast to text fields, are not analyzed, but stored and indexed verbatim.
I had the same issue with a new setup of Solr 8. The accepted answer is not valid anymore, because the <defaultSearchField> configuration will be deprecated.
As I found no answer to why Solr does not return results from any fields despite being indexed, I consulted the query documentation. What I found is the DisMax query parser:
The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options enable users to influence the score based on rules specific to each use case (independent of user input).
In contrast, the default Lucene parser only speaks about searching one field. So I gave DisMax a try and it worked very well!
Query example:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video
You can also specify which fields to search exactly to prevent unwanted side effects. Multiple fields are separated by spaces which translate to + in URLs:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features+text
Last but not least, give the fields a weight:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features^20.0+text^0.3
If you are using pysolr like I do, you can add those parameters to your search request like this:
results = solr.search('search term', **{
'defType': 'dismax',
'qf': 'features text'
})
In my case the problem was the format of the query. It seems that my setup, by default, was looking and an exact match to the entire value of the field. So, in order to get results if I was searching for the sit I had to query *sit*, i.e. use wildcards to get the expected result.
With solr 4, I had to solve this as per Mauricio's answer by defining type="text_en" to the field.
With solr 6, use text_general.

Resources