Cloudant Search: match a whole phrase using a full text index

I want to be able to match a whole phrase using a full text index, but I can't seem to work out how to do it. The Lucene Query Parser syntax states that:
A Phrase is a group of words surrounded by double quotes such as "hello dolly".
But when I specify the following selector, it returns all records with either "sign" or "design" in the name but I would expect it to return only those with "sign design".
POST https://foo.cloudant.com/remote/_find
{"selector":{"$text":"\"SIGN DESIGN\""}}
My index is defined as follows:
db.index({
name: 'subbies_text',
type: 'text',
index: {},
})
Alternatively, is it possible to do a substring match on a field in a JSON index?

You are using the index API to create the index, correct?
Would you please try creating this design document?
{ "_id": "_design/library",
"indexes": {
"subbies_text": {
"analyzer": {
"name": "standard"
},
"index": "function(doc) { index('XXX', doc.YYY); }"
}
}
}
(However, change "XXX" and "YYY" to your field name.)

If you know the maximum number of words to allow, you can build a searchable index with a map-reduce view. I think it is not ideal, but just for posterity:
You can emit() every consecutive pair of words that you see. So, for example, given the phrase "The quick brown fox" then you can emit ["the","quick"], ["quick","brown"], ["brown", "fox"]. I think this can be nice and simple, but it's really only appropriate for small amounts of data. The index will likely grow too large.
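The pair generation described above can be sketched in a few lines. This is only an illustration in Python of which keys would be emitted; in an actual view the map function would be written in JavaScript, and the helper name word_pairs is hypothetical.

```python
import re


def word_pairs(text):
    """Return every consecutive pair of lowercased words in text."""
    words = re.findall(r"\w+", text.lower())
    # zip(words, words[1:]) walks the list as a sliding window of width 2.
    return [[first, second] for first, second in zip(words, words[1:])]


pairs = word_pairs("The quick brown fox")
print(pairs)
```

A two-word phrase query then becomes a lookup of a single key such as ["quick", "brown"]. Longer phrases need either wider windows or several lookups, which is part of why the index grows quickly.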

If you want to use Cloudant Search, you should first create a search index, just as JasonSmith said. Then you can use this search index to run specific queries.
Suppose you have a document that has a "name:SIGNDESIN" field.
1. If you want to query for the whole value, you can query like this:
curl "https://<username>:<password>@<username>.cloudant.com/db/_design/<design_doc>/_search/<searchname>?q=name:SIGNDESIN" | jq .
2. If you want to query for a substring at the start of the value, you can query like this:
curl "https://<username>:<password>@<username>.cloudant.com/db/_design/<design_doc>/_search/<searchname>?q=name:SI*" | jq .
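When the query contains quotes, spaces, or wildcards, it is safer to URL-encode the q parameter than to paste it into the URL by hand. The following is a minimal sketch that only builds the URL; the account, database, design document, and index names are placeholders, and the phrase query name:"sign design" is an assumed example in the Lucene syntax quoted in the question.

```python
from urllib.parse import quote, urlencode


def search_url(account, db, design_doc, index_name, query):
    """Build a search URL with the Lucene query safely percent-encoded."""
    base = (
        f"https://{account}.cloudant.com/{db}"
        f"/_design/{design_doc}/_search/{index_name}"
    )
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f"{base}?{urlencode({'q': query}, quote_via=quote)}"


# All names below are hypothetical placeholders.
url = search_url("example", "db", "library", "subbies_text", 'name:"sign design"')
print(url)
```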

Related

Solr syntax for phrase query

I have a field with definition:
"replace-field": {
"name":"search_words",
"type":"lowercase",
"stored":true,
"indexed": true,
"multiValued": true
}
that contains sentences as an array (thus multiValued: true):
"id":500
"search_words":["How much oil should you pour into the engine",
"How important is engine oil?"]
How should I create a query that would return that document (with id = 500) when the user inputs the phrase "engine oil"?
With single-term queries I can use *engine* and it finds that document because engine is in the middle of the sentence, but I can't find a way to search for phrases within sentences. Is it even possible using Solr?
Solr supports phrase search; it is what Solr was actually designed for. Wildcard searches are not really the way you should use Solr by default. The field type should tell Solr how to process the text in the field so that you get hits when querying it in the regular way.
In this case the standard text_en would probably work fine, or a field definition with a StandardTokenizer and a LowerCaseFilter (and possibly a WordDelimiterGraphFilter to get rid of special characters).
The query would then be search_words:"engine oil".
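To see why a tokenized, lowercased field makes the phrase query work, here is a rough Python sketch of the idea. It is not Solr's implementation: the tokenizer is a simple regex stand-in, and the function names are hypothetical. A phrase matches when its tokens appear consecutively within a single value of the multi-valued field.

```python
import re


def tokenize(text):
    """Rough stand-in for a standard tokenizer plus lowercase filter."""
    return re.findall(r"\w+", text.lower())


def phrase_matches(phrase, values):
    """True if the phrase tokens occur consecutively inside any one value."""
    wanted = tokenize(phrase)
    if not wanted:
        return False
    width = len(wanted)
    for value in values:
        tokens = tokenize(value)
        for start in range(len(tokens) - width + 1):
            if tokens[start:start + width] == wanted:
                return True
    return False


search_words = [
    "How much oil should you pour into the engine",
    "How important is engine oil?",
]
print(phrase_matches("engine oil", search_words))
print(phrase_matches("oil engine", search_words))
```

The first call finds "engine oil" in the second sentence. The second call finds nothing, because word order matters in a phrase and values are checked one at a time.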

MongoDB OR with Regex not using compound index

Really at wits end here; I'm using the following query to search a collection with about 300K documents
query = { $or: [
{description: { $regex: ".*app.*"}},
{username: { $regex: ".*app.*"}},
]};
and simply putting that in a .find() function. It is tremendously slow. Like every single query takes at least 20 seconds.
I have tried individual indices on both username and description, and now have a compound index on {description: 1, username: 1}, but it does not seem to make a difference at all. If I check the MongoDB live metrics, it does not use the index at all.
Any pointers would be greatly appreciated.
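A regex that matches an arbitrary substring cannot use an index efficiently because, with no fixed prefix, the database has no idea where to start looking for the match and has to go over all the strings. Only a regex anchored to the start of the string (for example ^app) can be narrowed to a range of index keys.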
As a solution, you can hook your database up to something like Lucene, which specializes in such queries.
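The difference between a prefix match and a substring match can be illustrated with a sorted list standing in for an index. This is a simplified Python sketch, not how MongoDB stores its indexes, and the sample usernames are made up.

```python
from bisect import bisect_left

# A sorted list plays the role of an index on username (sample data).
usernames = sorted(["apple", "application", "happy", "mapper", "zebra"])


def prefix_search(sorted_keys, prefix):
    """Binary-search to the first candidate, then walk while the prefix holds."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order guarantees no later key can match
        matches.append(key)
    return matches


def substring_search(keys, needle):
    """No ordering helps here, so every key has to be examined."""
    return [key for key in keys if needle in key]


print(prefix_search(usernames, "app"))
print(substring_search(usernames, "app"))
```

The prefix search touches only the neighbouring keys that start with "app", while the substring search must visit every key, which is what happens on a 300K-document collection.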

How to filter an array in Azure Search

I have following Data in my Index,
{
"name" : "The 100",
"lists" : [
"2c8540ee-85df-4f1a-b35f-00124e1d3c4a;Bellamy",
"2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike",
"2c8540ee-85df-4f1a-b35f-00155c02e581;Clark"
]
}
I have to get all the documents whose lists contain Pike.
A query using any with the full string works, but I couldn't get a contains-style match to work.
$filter=lists/any(t: t eq '2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike')
However, I am not sure how to search with only Pike.
$filter=lists/any(t: t eq 'Pike')
I guess eq looks for a match on the full string. Is there any way to make this query work with the given data structure?
Currently the field lists has no searchable property only the filterable property.
The eq operator looks for exact, case-sensitive matches. That's why it doesn't match 'Pike'. You need to structure your index such that terms like 'Pike' can be easily found. You can accomplish this in one of two ways:
1. Separate the GUIDs from the names when you index documents. So instead of indexing "2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike" as a single string, you could index them as separate strings in the same array, or perhaps in two different collection fields (one for GUIDs and one for names) if you need to correlate them by position.
2. If the field is searchable, you can use the new search.ismatch function in your filter. Assuming the field is using the standard analyzer, full-text search will word-break on the semicolons, so you should be able to search just for "Pike" and get a match. The syntax would look like this: $filter=search.ismatch('Pike', 'lists') (If looking for "Pike" is all your filter does, you can just use the search and searchFields parameters to the Search API instead of $filter.) If the "lists" field is not already searchable, you will need to either add a new field and re-index the "lists" values, or re-create your index from scratch with the new field definition.
Update
There is a new approach to solve this type of problem that's available in API versions 2019-05-06 and above. You can now use complex types to represent structured data, including in collections. For the original example, you could structure the data like this:
{
"name" : "The 100",
"lists" : [
{ "id": "2c8540ee-85df-4f1a-b35f-00124e1d3c4a", "name": "Bellamy" },
{ "id": "2c8540ee-85df-4f1a-b35f-00155c40f11c", "name": "Pike" },
{ "id": "2c8540ee-85df-4f1a-b35f-00155c02e581", "name": "Clark" }
]
}
And then directly query for the name sub-field like this:
$filter=lists/any(l: l/name eq 'Pike')
See the Azure Search documentation on complex types for details.
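If the existing documents still store "GUID;name" strings, they need to be reshaped before re-indexing. Below is a small Python sketch of that transformation, run on the client side before upload; the function names are hypothetical, and has_name only mimics locally what the lists/any(l: l/name eq 'Pike') filter checks.

```python
def to_complex_lists(doc):
    """Split each 'GUID;name' string into an {'id', 'name'} object."""
    entries = []
    for item in doc["lists"]:
        guid, _, name = item.partition(";")
        entries.append({"id": guid, "name": name})
    # Return a copy so the original document is left untouched.
    return {**doc, "lists": entries}


def has_name(doc, name):
    """Local equivalent of an exact, case-sensitive match on the name sub-field."""
    return any(entry["name"] == name for entry in doc["lists"])


original = {
    "name": "The 100",
    "lists": [
        "2c8540ee-85df-4f1a-b35f-00124e1d3c4a;Bellamy",
        "2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike",
        "2c8540ee-85df-4f1a-b35f-00155c02e581;Clark",
    ],
}
reshaped = to_complex_lists(original)
print(reshaped["lists"][1])
print(has_name(reshaped, "Pike"))
```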

Queries with stopwords and searchMode=all return no results

I have a document with the words "dolor de cabeza" in its content, indexed with the Spanish analyzer. Searching for "dolor de cabeza" (with quotes) returns the document, but searching for dolor de cabeza (without quotes) returns nothing.
In fact, every stop word in the search query makes it return no documents when using queryType=Full and searchMode=All.
The problem with the quoted approach is that it only matches the exact sentence.
Is there any workaround? I think this is a bug.
Short version:
This happens when you issue a search query with searchMode=All against fields that use analyzers that process stopwords differently. Please make sure you scope your query only to fields analyzed with the same analyzer, using the searchFields search request parameter. Alternatively, you can set the same searchAnalyzer on all your searchable fields, so that stopwords are removed from your query in the same way for every field. To learn more about custom analyzers and how to set indexAnalyzer and searchAnalyzer independently, see the Azure Search documentation on custom analyzers.
Long version:
Let’s take an index with two fields where one is analyzed with English Lucene analyzer, and the other with standard (default) analyzer.
{
"fields":[
{
"name":"docId",
"type":"Edm.String",
"key":true,
"searchable":false
},
{
"name":"field1",
"type":"Edm.String",
"analyzer":"en.lucene"
},
{
"name":"field2",
"type":"Edm.String"
}
]
}
Let’s add these two documents:
{
"value":[
{
"docId":"1",
"field1":"Waiting for a bus",
"field2":"Exploring cosmos"
},
{
"docId":"2",
"field1":"Run to the hills",
"field2":"run for your life"
}
]
}
The following query doesn’t return any results: search=wait+for&searchMode=all
This is because the terms in the query are processed independently for each field in the index, by the analyzer defined for that field.
For field1 the query becomes search=wait (‘for’ is removed because it is a stop word).
For field2 it stays search=wait+for (the standard analyzer doesn’t remove stop words).
Only the first document matches ‘wait’ (in field1), but field2 of that document doesn’t match ‘for’, so there are no results. When you set searchMode=all, you tell the search engine that every query term must be matched at least once.
For comparison, another query with a stopword search=running+for&searchMode=all returns the second document as a result. Term ‘running’ matches in field1 (it’s stemmed) and ‘for’ matches in field2.
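The behaviour described above can be reproduced with a toy simulation. This Python sketch is an assumption-laden model, not Azure Search's implementation: the stopword list is a tiny subset, the stemmer only handles "-ing", and the rule "each surviving term must match in at least one field" is a simplified reading of searchMode=all.

```python
import re

STOPWORDS = {"a", "for", "the", "to"}  # tiny illustrative subset


def standard(text):
    """Toy standard analyzer: lowercase word tokens, stopwords kept."""
    return re.findall(r"\w+", text.lower())


def stem(word):
    """Toy stemmer: strip '-ing' and undouble a trailing letter."""
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if len(word) > 1 and word[-1] == word[-2]:
            word = word[:-1]
    return word


def english(text):
    """Toy English analyzer: standard tokens minus stopwords, stemmed."""
    return [stem(w) for w in standard(text) if w not in STOPWORDS]


ANALYZERS = {"field1": english, "field2": standard}
DOCS = [
    {"docId": "1", "field1": "Waiting for a bus", "field2": "Exploring cosmos"},
    {"docId": "2", "field1": "Run to the hills", "field2": "run for your life"},
]


def search_all(query):
    """Return docIds where every query term matches in at least one field."""
    hits = []
    for doc in DOCS:
        matched_all = True
        for term in query.split():
            # Analyze the term per field; a field whose analyzer drops it adds no clause.
            clauses = {f: a(term) for f, a in ANALYZERS.items() if a(term)}
            if not clauses:
                continue  # dropped everywhere, so it imposes no constraint
            if not any(t[0] in ANALYZERS[f](doc[f]) for f, t in clauses.items()):
                matched_all = False
                break
        if matched_all:
            hits.append(doc["docId"])
    return hits


print(search_all("wait for"))
print(search_all("running for"))
```

In this model "for" survives only in the field2 clause, so a document must contain "for" in field2 to match, which is exactly what excludes document 1 for the first query.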
To learn more about query processing in Azure Search, read "How full text search works in Azure Search".

mongo search for part of string match in words array

I want to find partial string matches among the elements of a MongoDB array field.
For example, my search string is:
"Hello world we are on mars"
My records' tags are:
words : ["hell", "bubu world"]
words : ["we are", "cookie"]
words : ["are nono mars", "w"]
I want to get back only record number 2, where one of the array elements is matched.
This may not be the exact answer you are looking for, but I have outlined my thoughts so that you can reconsider the requirement and the possible solutions.
You may not be able to achieve what you expect in a single Mongo query, because normally the stored field holds the longer text and the search string holds fewer words. Your requirement is the opposite.
One possible solution for a typical text search in MongoDB is to create a "text" index and use $text with $search in find.
https://docs.mongodb.com/manual/reference/operator/query/text/#op._S_text
Create a text index:
db.collectionname.createIndex({words : "text"})
Then query it:
db.collectionname.find( { $text: { $search: "Hello world we are on mars", $caseSensitive: true } } )
The result would be records 1 and 3.
You can also perform a phrase search by enclosing the phrase in escaped double quotes (\").
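Because the requirement is reversed (the stored tags are short and the search string is long), one alternative is to do the containment check in application code after fetching candidate records. The following is a Python sketch of that check only, under the assumption that a tag counts as matched when all of its words appear consecutively as whole words in the search string; the function names are hypothetical and nothing here is a MongoDB API.

```python
import re


def normalize(text):
    """Lowercase and collapse text to single-space-separated words."""
    return " ".join(re.findall(r"\w+", text.lower()))


def matching_records(search, records):
    """Return record numbers having a tag that occurs whole in the search string."""
    # Padding with spaces makes the substring test respect word boundaries.
    padded = f" {normalize(search)} "
    hits = []
    for number, words in records.items():
        tags = [normalize(word) for word in words]
        if any(tag and f" {tag} " in padded for tag in tags):
            hits.append(number)
    return hits


records = {
    1: ["hell", "bubu world"],
    2: ["we are", "cookie"],
    3: ["are nono mars", "w"],
}
print(matching_records("Hello world we are on mars", records))
```

With this rule "hell" does not match "Hello" and "w" does not match "world", so only record 2 is returned, as the question asks.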
