Salesforce SOSL: matching fields exactly - salesforce

Afternoon All.
I am using a simple SOSL query like this
find {"2099"} in phone fields
and it is returning objects that match 12320995, as an example. Is there a way to set up the query to only return exact matches?

See http://www.salesforce.com/us/developer/docs/apexcode/Content/langCon_apex_SOQL.htm
Which has the example
List> searchList = [FIND 'map*' IN ALL FIELDS RETURNING Account (id, name), Contact, Opportunity, Lead];
In your case, specify the objects you want to return.
Specify the fields you want to search in.

According to the documentation, "Use quotation marks around search terms to find an exact phrase match."
The " " operator should return exact matches. I haven't tested it but if it does not work it could be a bug. One suggestion is that you scan through the results with your own code and filter out the exact matches.

Related

Solr OR query on a text field

How to perform a simple query on a text field with an OR condition? Something like name:ABC OR name:XYZ so the resulting set would contain only those docs where name is exactly "XYZ" or "ABC"
Dug tons of manuals, cannot figure this out.
I use Solr 5.5.0
Update: Upgraded to Solr 6.6.0, still cannot figure it out. Below are illustrations to demonstrate my issue:
This works:
This works too:
This still works:
But this does not! Omg why!?
There are many ways to perform OR query. Below I have listed some of them. You can select any of it.
[Simple Query]
q=name:(XYZ OR ABC)
[Lucene Query Parser]
q={!lucene q.op=OR df=name v="XYZ ABC"}
Your syntax is right, but what you're asking for isn't what text fields are made for. A text field is tokenized (split into multiple tokens), and each token is searched by itself. So if the text inserted is "ABC DEF GHI", it will be split into three separate tokens, namely "ABC", "DEF" and "GHI". So when you're searching field:ABC, you're really asking for any document that has the token "ABC" somewhere.
Since you want to perform an exact match, you want to query against a field that is defined as a string field, as this will keep the value verbatim (including casing, so the matching will be case sensitive). You can tell Solr to index the same content into multiple fields by adding a copyFile instruction, telling it to take the content submitted for field foo and also copying it into field bar, allowing you to perform both an exact match if needed and a more general search if necessary.
If you need to perform exact, but case insensitive, searches, you can use a KeywordTokenizer - the KeywordTokenizer does nothing, keeping the whole string as a single token, before allowing you to add filters to the analysis chain. By adding a LowercaseFilter you tell Solr to lowercase the string as well before storing it (or querying for it).
You can use the "Analysis" page under the Solr admin page to experiment and see how content for your field is being processed for each step.
After that querying as string_field:ABC OR string_field:XYZ should do what you want (or string_field:(ABC OR XYZ) or a few other ways to express the same.
A wacky workaround I've just come up with:

Multiple word search with Solr

I have a problem with Solr where I can't reconcile inexact search with multiple words.
Currently my Solr is configured thus: query=ctnt_val:*keyword* where ctnt_val is the field I'm searching and keyword the value I pass.
So if I type lon it will return all results with longer, London, ... which is what I want.
The problem is that if my query is several words long (e.g., Gotham City), it returns all results containing Gotham and all results containing City, instead of returning only all results containing Gotham City.
If I change the query as query=ctnt_val:"keyword" it works but then I lose the ability to do inexact searches (lon will no longer return London). If I do query=ctnt_val:*"keyword"* I get ALL results from my DB, which is clearly not what I want.
Any ideas?
The default search handler uses to have the OR as the default query operator.
You can check using q=ctnt_val:gotham\ city&q.op=AND which will set the query operator to AND and would make all the query terms as mandatory.
gotham\ city - \ is to escape the space from the query

What fieldtype to choose and how to look my query

The problem is this: I've got a column (named name)which consist of names for Example "Иван Кирилов Петров", "Нина Семова Мариножа" and so on.
So I want to make a query which will get all the names that has first name 'Иван' and last name 'Петров'; The second name doesn't matter so i will put * wildcard character.
Also there is a bigger problem: I should be able in a case if the user writes "Иван Кирилов Петров" to find this exact person
what I have tried :
I made the field text_ws type
and tested the following queries:
q=name:Иван*Петров
perfect - it finds what I want - all the names with first Иван and last Петров;
But then i want to find Иван Кирилов Петров i get no response because I want to make an exact search and my type should be string
How can I solve this!
Try adding autoGeneratePhraseQueries="true" flag on your text_ws type definition. And use debugQuery=true flag to see how it does the matches against the field. If the basic thing work, you can then look at pf3 flag in eDismax configuration to boost the query matches.
Solr also comes with dedicated Token Filters for Russian, but you probably don't care about that for the people's names.
I don't think you need a wild-card query. If you are only splitting on white-space during index time (text_ws) and you get complete first, last and/or middle names for query, you can do an AND query like
q=name:(Иван AND Петров)
or
q=name:(ИВАН AND МИНЧЕВ AND ПЕТРОВ)
Update: After your comment, I see that this will do a bag-of-words search and won't preserve the order. I guess you need to keep a string copy field of name, say name_str, which will give you more search options. For example, if there are 2 spaces in the query, meaning you get the first, middle and last names, then you can do an exact match on name_str like
q=name_str:"ИВАН%20МИНЧЕВ%20ПЕТРОВ"
If you are using Solr 4.0 and above, then regex query on the string field can help you. You can do
q=name_str:/ИВАН.*ПЕТРОВ/
will match anything that begins with ИВАН and ends with ПЕТРОВ.
or even
q=name_str:/Иван.*?Кирилов.*?Петров/
Unfortunately, there is no Solr wiki page on regex search yet, but you can google around.
You need to distinguish between the different types of queries you want to do and do different searches. Maybe give a check-box to your users asking if they want an exact match or not.

Different indexing and search strategies on same field without doubling index size?

For a phrase search, we want to bring up results only if there's an exact match (without ignoring stopwords). If it's a non-phrase search, we are fine displaying results even if the root form of the word matches etc.
We currently pass our data through standardTokenizer, StopFilter, PorterStemFilter and LowerCaseFilter. Due to this when user wants to search for "password management", search brings up results containing "password manager".
If I remove StemFilter, then I will not be able to match for the root form of the word for non-phrase queries. I was thinking if I should index the same data as part of two fields in document.
For the first field (to be used for phrase searches), following tokenizers/filters will be used:
StandardTokenizer, LowerCaseFilter
For the second field (Non-phrase searches)
StandardTokenizer, StopFilter, PorterStemFilter, LowerCaseFilter
Now, based on whether it's a phrase search or not, I need to rewrite user's query to search in the appropriate field.
Is this the right way to address this issue? Is there any other way to achieve this without doubling index size?
let's say user's query is
summary:"Furthermore, we should also fix this"
Internally this will be translated to
summary_field1:"Furthermore, we should also fix this"
If user's query is
summary:(Furthermore, we should also fix this)
Internally this will be translated to
+summary_field2:furthermor +summary_field2:we +summary_field2:should +summary_field2:also +summary_field2:fix
both summary_field1 and summary_field2 index the same data. summary_field1 passes through only StandardTokenizer and LowerCaseFilter, whereas summary_field2 passes through StandardTokenizer, StopFilter, PorterStemFilter and LowerCaseFilter.
Please let me know if I'm missing something here.
By defining two different fields you can search for exact matches.
By using boosts you can also bring results in one query. For example:
(firstField:"password management")^5 OR (secondField:"pasword management")^1

Solr Index appears to be valid - but returns no results

Solr newbie here.
I have created a Solr index and write a whole bunch of docs into it. I can see
from the Solr admin page that the docs exist and the schema is fine as well.
But when I perform a search using a test keyword I do not get any results back.
On entering * : *
into the query (in Solr admin page) I get all the results.
However, when I enter any other query (e.g. a term or phrase) I get no results.
I have verified that the field being queried is Indexed and contains the values I am searching for.
So I am confused what I am doing wrong.
Probably you don't have a <defaultSearchField> correctly set up. See this question.
Another possibility: your field is of type string instead of text. String fields, in contrast to text fields, are not analyzed, but stored and indexed verbatim.
I had the same issue with a new setup of Solr 8. The accepted answer is not valid anymore, because the <defaultSearchField> configuration will be deprecated.
As I found no answer to why Solr does not return results from any fields despite being indexed, I consulted the query documentation. What I found is the DisMax query parser:
The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options enable users to influence the score based on rules specific to each use case (independent of user input).
In contrast, the default Lucene parser only speaks about searching one field. So I gave DisMax a try and it worked very well!
Query example:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video
You can also specify which fields to search exactly to prevent unwanted side effects. Multiple fields are separated by spaces which translate to + in URLs:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features+text
Last but not least, give the fields a weight:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features^20.0+text^0.3
If you are using pysolr like I do, you can add those parameters to your search request like this:
results = solr.search('search term', **{
'defType': 'dismax',
'qf': 'features text'
})
In my case the problem was the format of the query. It seems that my setup, by default, was looking and an exact match to the entire value of the field. So, in order to get results if I was searching for the sit I had to query *sit*, i.e. use wildcards to get the expected result.
With solr 4, I had to solve this as per Mauricio's answer by defining type="text_en" to the field.
With solr 6, use text_general.

Resources