mongo search for part of string match in words array


I want to find partial string matches against the elements of an array field in MongoDB.
For example, my search string is:
"Hello world we are on mars"
and my records have these tags:
words : ["hell", "bubu world"]
words : ["we are", "cookie"]
words : ["are nono mars", "w"]
I want to get back only record number 2, where one of the array elements is matched.

This may not be the exact answer you are looking for, but I have outlined my thoughts so that you can reconsider the requirement and a possible solution.
You may need to rethink the design. You probably cannot achieve what you expect in a single Mongo query, because normally the stored attributes contain more text and the search string contains fewer words. Your requirement is the opposite.
One possible solution for typical text search in MongoDB is to create a "text" index and use "$text" with "$search" in find.
https://docs.mongodb.com/manual/reference/operator/query/text/#op._S_text
Create the text index:
db.collectionname.createIndex({words : "text"})
Then query the same collection:
db.collectionname.find( { $text: { $search: "Hello world we are on mars", $caseSensitive: true } } )
The result would be records 1 and 3.
You can also perform a phrase search by enclosing the phrase in escaped double quotes (\").
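If the match really has to run in the opposite direction (each stored tag tested against the longer search string), one option is to filter candidate records in application code. The following is a minimal Python sketch of that idea, not a MongoDB query: the `records` list and both helper functions are hypothetical stand-ins for documents already fetched from the collection, and it assumes a tag only counts when it appears as whole words.

```python
import re

# Hypothetical stand-in for documents already fetched from the collection.
records = [
    {"_id": 1, "words": ["hell", "bubu world"]},
    {"_id": 2, "words": ["we are", "cookie"]},
    {"_id": 3, "words": ["are nono mars", "w"]},
]


def tag_in_text(tag, text):
    """Return True if tag occurs in text as a whole-word phrase, ignoring case."""
    pattern = r"\b" + re.escape(tag) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matching_records(text, docs):
    """Keep the documents where at least one tag is found in text."""
    return [d for d in docs if any(tag_in_text(t, text) for t in d["words"])]


matches = matching_records("Hello world we are on mars", records)
print([d["_id"] for d in matches])
```

With the three sample records from the question, only record 2 survives, because "hell" and "w" are not whole words in the search string. This scans every candidate in the client, so it is only reasonable for small result sets.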

Related

OrderBy does not order correctly

I have the following query
var predicate = PredicateBuilder.True<SearchModel>();
predicate = predicate.And(x => (templateIDs.Contains(x.TemplateId))); // To match certain templates only
var products = searchContext.GetQueryable()
.Where(predicate)
.OrderBy(i => i.Title) //will order the items by display
.Take(pageSize) //pageSize is passed through and is an integer
.GetResults();
I never get the results ordered alphabetically.
If I do the following however, the results are ordered correctly.
var fullResults = searchResults.ToList().OrderBy(x=>x.Title).Take(pageSize); //pageSize is passed through and is an integer
Does anyone know why?
And if I do it the second way (.ToList().OrderBy().Take()), would it have any performance implications? I believe the results would be ordered and paged only after they are fetched from Solr.

Sorting in Solr can be a bit confusing. Historically, there have also been some issues around sorting in the ContentSearch provider for Solr, where orderby clauses were inverted, etc. Assuming you're on a fairly recent Sitecore version, these issues have been fixed as far as I know. To pinpoint where your problem is, you should look in the search.log and verify that the predicate builder/content search actually performs the query you expect.
Given that the above is OK, you're probably facing a common misunderstanding of how Solr works. I'd guess your Title field is indexed as a "text" field, such as title_t_en, and not as a string field. Text fields are tokenized and stemmed before being stored in Solr. Sorting will therefore be performed on the tokenized terms instead of the whole string. This means a text such as "The quick brown fox jumps over the lazy dog" will be stemmed to something like ["quick", "brown", "fox", "jump", "over", "lazy", "dog"], and an order by statement will sort it on "brown". This may cause the sorting to look very wrong, as the result may not be sorted on the first word in the string.
One way of solving this is to save a copy of your Title field into a title_s field. You can either do this with a copy statement in the solr schema or if you want to keep the schema untouched, you can make a computed field mapped as a string. Thereby you can perform a lingual text query on the stemmed text field and order the result on the string field.
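To see why the order can look wrong, here is a toy Python model of the effect described above. It is not Solr's implementation: the stop list, the sample titles, and the choice of the alphabetically smallest token as the sort key are all assumptions made to mirror the "brown" example.

```python
# Tiny illustrative stop list; Solr's real analyzers are configurable.
STOP_WORDS = {"the", "a"}


def tokens(title):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in title.lower().split() if w not in STOP_WORDS]


titles = ["The quick brown fox", "A lazy dog", "Zebra crossing"]

# Sorting on a token of the analyzed text (assumed: the smallest one).
by_token = sorted(titles, key=lambda t: min(tokens(t)))

# Sorting on the untouched string, as a string field such as title_s would.
by_string = sorted(titles, key=str.lower)

print(by_token)
print(by_string)
```

The first list orders the titles by "brown", "crossing", "dog", which has nothing to do with how the titles start; the second is the alphabetical order a reader expects.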

Tracking if phrase exists within a list of terms

I am having difficulty finding a formula to do exactly what I am looking for.
I have two lists, one containing search phrases like ("Sound bars for tv") and another list that contains individual terms like ("TV", "Sound", "bars").
My goal is to see if any of the search phrases match for each keyword within the individual term list.
So for "Sound bars for TV", I would need each of those words to be in the term list for it to come back as a TRUE. Also, and more complicated, if I have the search phrase "Soundbar" and "Sound Bar" these should both pass if both terms are in the list.
Any idea what the best way to approach this is?
I have tried the following unsuccessfully:
Individual terms = the list of terms like "TV", "Sound", "Bars"
Phrase = search phrases like "Sound bars for TV"
The goal would be to create a formula that says "Yes" every word in "Sound bars for TV" is within the Individual terms list.
=SUMPRODUCT(--ISNUMBER(SEARCH(individual terms,phrase)))=COUNTA(individual terms)
=IF(ISNUMBER(SEARCH(phrase,individual terms)), "Yes", "No")
=SUMPRODUCT(--ISNUMBER(SEARCH(individual terms,phrase)))>0

Let's pretend you have a data setup like this:
Column D was made into an Excel table (with Insert -> Table) and named tblTerms. This lets you add and remove terms from the list dynamically.
Now in cell B2 and copied down is this formula:
=SUMPRODUCT(--(COUNTIF(tblTerms[Search Terms],TRIM(MID(SUBSTITUTE(A2," ",REPT(" ",LEN(A2))),LEN(A2)*(ROW(A$1:INDEX(A:A,LEN(A2)-LEN(SUBSTITUTE(A2," ",""))+1))-1)+1,LEN(A2))))=0))=0
Note that you'll have to add "Soundbars" separately to the search terms list. There's not really any way for Excel to recognize individual words in a compound word, and attempting to do that would be extremely unwieldy, even with VBA.

This will parse the string into words, count how many of them appear in the term list, and compare that count to the number of words in the string. If the two numbers match, it returns "Yes":
=IF(SUMPRODUCT(COUNTIF(D:D,TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",999)),(ROW($XFD$1:INDEX($XFD:$XFD,LEN(A1)-LEN(SUBSTITUTE(A1," ",""))+1))-1)*999+1,999))))=LEN(A1)-LEN(SUBSTITUTE(A1," ",""))+1,"Yes","No")
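For readers who find the nested formulas hard to follow, the check they perform can be sketched in a few lines of Python. This is only an illustration of the logic (split the phrase on spaces, then require every word to be in the term list); the function name is hypothetical, and the comparison is made case-insensitive on the assumption that it should behave like COUNTIF.

```python
def all_words_in_terms(phrase, terms):
    """Return True if every space-separated word of phrase is in terms."""
    known = {t.strip().lower() for t in terms}
    words = phrase.lower().split()
    # An empty phrase is treated as "no match" rather than vacuously true.
    return bool(words) and all(w in known for w in words)


terms = ["TV", "Sound", "bars"]
print(all_words_in_terms("Sound bars for TV", terms))            # "for" is missing
print(all_words_in_terms("Sound bars for TV", terms + ["for"]))
```

Note that with the three terms from the question, "Sound bars for TV" fails because "for" is not in the list; it has to be added as a term (or filtered out beforehand) for the phrase to pass.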

Cloudant Search: match a whole phrase using a full text index

I want to be able to match a whole phrase using a full text index, but I can't seem to work out how to do it. The Lucene Query Parser syntax states that:
A Phrase is a group of words surrounded by double quotes such as "hello dolly".
But when I specify the following selector, it returns all records with either "sign" or "design" in the name but I would expect it to return only those with "sign design".
POST https://foo.cloudant.com/remote/_find
{"selector":{"$text":"\"SIGN DESIGN\""}}
My index is defined as follows:
db.index({
name: 'subbies_text',
type: 'text',
index: {},
})
Alternatively, is it possible to do a substring match on a field in json index?

You are using the index API to create the index, correct?
Would you please try creating this design document?
{
  "_id": "_design/library",
  "indexes": {
    "subbies_text": {
      "analyzer": {
        "name": "standard"
      },
      "index": "function(doc) { index('XXX', doc.YYY); }"
    }
  }
}
(However, change "XXX" and "YYY" to your field name.)

If you know how many maximum words to allow, you can make a searchable index with a map-reduce view. I think it is not ideal, but just for posterity:
You can emit() every consecutive pair of words that you see. So, for example, given the phrase "The quick brown fox" then you can emit ["the","quick"], ["quick","brown"], ["brown", "fox"]. I think this can be nice and simple, but it's really only appropriate for small amounts of data. The index will likely grow too large.
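The pairing step can be sketched as follows. A real Cloudant or CouchDB view map function would be written in JavaScript and call emit() for each pair; this Python version only illustrates which keys would be produced, and the function name is hypothetical.

```python
def word_pairs(phrase):
    """Return every consecutive pair of lowercased words in phrase."""
    words = phrase.lower().split()
    return [[a, b] for a, b in zip(words, words[1:])]


for pair in word_pairs("The quick brown fox"):
    print(pair)
```

A phrase of n words yields n - 1 keys, which is why the index grows quickly on larger data sets.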

If you want to use Cloudant Search, you should first create a search index, just as JasonSmith said. Then you can use this search index to run specific queries.
Suppose you have a document with a field "name" whose value is "SIGNDESIN".
1. If you want to query for the whole value, you can query like this:
curl 'https://<username:password>@<username>.cloudant.com/db/_design/<design_doc>/_search/<searchname>?q=name:SIGNDESIN' | jq .
2. If you want to query for a substring (here, a prefix), you can query like this:
curl 'https://<username:password>@<username>.cloudant.com/db/_design/<design_doc>/_search/<searchname>?q=name:SI*' | jq .

MongoDB - Search string for an element from an array of words

How do I check whether a string contains any of the words in an array? I know how to do it with one word, but how do I do it with an array?
I've tried the following with no luck:
db.getCollection('questions').find( {$text:{$search:{$in: ['chips', 'mars']} }} )
Any ideas would be appreciated.
Thanks

db.getCollection('questions').find( {$text:{$search:"chips mars" }} )
to find questions with either "chips" or "mars" or both.
db.getCollection('questions').find( {$text:{$search:"\"chips\" \"mars\"" }} )
to find questions with both "chips" and "mars".
The docs for text index search read:
$search is a string of terms that MongoDB parses and uses to query the text index. MongoDB performs a logical OR search of the terms unless specified as a phrase.
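Since the question starts from an array of words, the two search strings above can be built from that array. The sketch below is in Python and only does the string construction; the helper names are hypothetical, and it assumes the words themselves contain no double quotes.

```python
def any_of(words):
    """Build a $search string that matches documents with any of the words."""
    return " ".join(words)


def all_of(words):
    """Build a $search string that requires every word, each quoted as a phrase."""
    return " ".join('"{}"'.format(w) for w in words)


words = ["chips", "mars"]
print(any_of(words))
print(all_of(words))
```

The resulting string is what goes into the query, for example as the value of `$search` in a filter like `{"$text": {"$search": all_of(words)}}`.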

Lucene search for a filename, using WordDelimiterFilterFactory

I'm indexing some data, including filenames.
What I want is, for an indexed filename such as:
MySupercool123girlfriend.jpg
to be able to search for it with:
supercool
supercool123
123
girlfriend
jpg
At index time it's pretty easy to use WordDelimiterFilterFactory so that tokens are created, such as:
my
supercool
mysupercool
mysupercool123
supercool123
123
girlfriend
jpg
girlfriend.jpg
etc...
The problem is that at search time, I don't really know what I should do.
If I use WordDelimiterFilterFactory at search time, a search for MySupercool123girlfriend.jpg would also match toto.jpg, because in both cases a token jpg is created.
toto.jpg should not be in the result list at all, so returning both results with the appropriate one scored higher is not a solution for me.
Do you have any recommendations for indexing and searching filenames?

For your specific example, i.e. if the search is for MySupercool123girlfriend.jpg and you want it to return only documents that contain the entire string, you can keep a copyField, say named filename_str, whose fieldType is string. String matching ensures that you get an exact match. This could be a first-level "exact match" search.
However, I am guessing that you would want a search for 123girlfriend.jpg to return the document containing MySupercool123girlfriend.jpg. You can do a second-level search for this. Beginning with Solr 4.0, you can do a regex search like
q=filename_str:/.*123girlfriend.jpg/
(This regex query should also work for filename field itself, if you are using preserveOriginal=1 in WordDelimiterFilterFactory at index time.)
Else you can do a leading wild-card search, which works in earlier Solr versions too.
If you also want MySupercool.jpg to match MySupercool123girlfriend.jpg, then I guess you would have to manually do the work of WordDelimiterFilterFactory and construct a regex query like
q=filename_str:/.*My.*Supercool.*.jpg/
Another issue is that jpg is going to match a lot of documents, so you may want to split the filename and the extension and keep them as separate fields.
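Doing the delimiter work by hand could look roughly like the Python sketch below. It splits the query on case changes and digit runs, joins the parts with ".*", and handles the extension separately. The function names are hypothetical, Python's re module is used only to simulate the match (Lucene's regex syntax differs in places), and, unlike the queries above, the dot before the extension is escaped so that it matches a literal dot.

```python
import re


def split_filename(name):
    """Split 'MySupercool.jpg' into (['My', 'Supercool'], 'jpg')."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    # Lowercase runs (optionally led by one capital), capital runs, digit runs.
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+|\d+", stem)
    return parts, ext


def build_pattern(name):
    """Build a pattern such as '.*My.*Supercool.*\\.jpg' from a query filename."""
    parts, ext = split_filename(name)
    body = ".*" + ".*".join(re.escape(p) for p in parts) + ".*"
    return body + (r"\." + re.escape(ext) if ext else "")


pattern = build_pattern("MySupercool.jpg")
print(pattern)
# fullmatch is used on the assumption that the regex must match the whole term.
print(bool(re.fullmatch(pattern, "MySupercool123girlfriend.jpg")))
print(bool(re.fullmatch(pattern, "toto.jpg")))
```

The generated pattern would then be placed between the slashes of the filename_str query. Because the extension is required literally and the other parts must appear in order, toto.jpg no longer matches.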

Can you come up with a DisMax mm (minimum should match) parameter that is meaningful for your use case?
See http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
E.g.
mm=100% and "MySupercool123girlfriend.jpg" would match only filenames that have all ["my", "supercool", "123", "girlfriend", "jpg"] terms in them
You can also find a less strict expression that still gives relevant results. See http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/util/doc-files/min-should-match.html
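The effect of a percentage mm value can be illustrated with a toy Python model. This is not how Solr evaluates queries; it only counts how many query terms a document contains and compares that with the required number. The function name is hypothetical, and the sketch assumes the percentage is rounded down and that at least one term must always match.

```python
def satisfies_mm(query_terms, doc_terms, mm_percent):
    """Return True if enough distinct query terms occur in doc_terms."""
    query = set(query_terms)
    doc = set(doc_terms)
    matched = len(query & doc)
    # Assumed: round the percentage down, but never require fewer than one term.
    required = max(1, len(query) * mm_percent // 100)
    return matched >= required


query = ["my", "supercool", "123", "girlfriend", "jpg"]
print(satisfies_mm(query, query, 100))            # all five terms present
print(satisfies_mm(query, ["toto", "jpg"], 100))  # only "jpg" present
print(satisfies_mm(query, ["toto", "jpg"], 20))   # one of five is enough at 20%
```

At 100%, toto.jpg is rejected because it shares only the jpg token with the query; lowering the percentage trades that strictness for recall.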