Azure Search highlight not working if the sentence has a special character - azure-cognitive-search

Team,
I am using the sample Azure Search query below:
{azuresearchurl}/docs?api-version=2019-05-06&search=examples&highlight=title&$select=title
I am getting the response below:
{
  "#search.highlights": {
    "title": [
      ": From <em>Examples</em> of our Projects"
    ]
  },
  "title": "Can we execute? : From Examples of our Projects"
}
In the result above, the text before the '?' is not included in the highlight value.
How can I fix this? In the highlight field I need the same text that is available in the search result field.

The highlight feature only returns the fragment of text where the searched term is found. Those fragments are generated by finding sentence boundaries. In this case, the question mark marks the end of a different sentence, so the text before it is not included in the fragment.
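Since the service decides where a fragment starts, one option is to rebuild the full highlighted value on the client. The Java sketch below is a hypothetical workaround, not an Azure Search feature (the class and method names are made up for illustration): it strips the highlight tags from the fragment, finds that plain text inside the full "title" value, and splices the tagged fragment back in. It assumes the default <em> and </em> tags (which can be changed with the highlightPreTag and highlightPostTag parameters) and a single fragment that occurs verbatim in the field.

```java
public class HighlightMerge {

    // Hypothetical client-side workaround: splice the highlighted fragment back into
    // the full field value. Assumes the fragment, with the highlight tags removed,
    // occurs verbatim in the full value.
    static String mergeHighlight(String fullValue, String fragment,
                                 String preTag, String postTag) {
        String plain = fragment.replace(preTag, "").replace(postTag, "");
        int start = fullValue.indexOf(plain);
        if (start < 0) {
            return fragment; // not found verbatim: fall back to the fragment alone
        }
        return fullValue.substring(0, start) + fragment
                + fullValue.substring(start + plain.length());
    }

    public static void main(String[] args) {
        String title = "Can we execute? : From Examples of our Projects";
        String highlight = ": From <em>Examples</em> of our Projects";
        // Prints: Can we execute? : From <em>Examples</em> of our Projects
        System.out.println(mergeHighlight(title, highlight, "<em>", "</em>"));
    }
}
```

If a field returns several fragments, each one would need to be spliced in separately, and a field whose own text contains the tag strings would confuse this simple approach.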

Related

How to respect Solr conditions in order

I need to send a query to Solr with two conditions combined with OR, instead of sending the query twice:
{!complexphrase inOrder=true}title:"some tests*" || title:(some tests*)
...where the first condition should return the exact match. If there is none, the query should fall back to the OR and retrieve any result that contains at least one word of the search phrase. But when I run the query, I still get the results of the second (right-hand) condition first.
Here is my data:
{
"title": "some values"
},
{
"title": "data tests"
},
{
"title": "some tests"
}
The response I need is:
{
"title": "some tests"
},
{
"title": "data tests"
},
{
"title": "some values"
}
I already tried boosting, like so: {!complexphrase inOrder=true}title:"some tests*"^2 || title:(some tests*)^1 but it didn't work. I am NOT able to change the Solr configuration, since the software is already in production and not managed by me. I can't even sort by rating; in fact, I don't receive the best matches first. The Solr version is 7.3.1. Any help is appreciated, thanks in advance!
I solved it with a workaround. Instead of combining two conditions with OR, I applied a working boost on the title field using edismax.
In my Java application (SolrJ), I changed
SolrQuery q = new SolrQuery("*");
to
SolrQuery q = new SolrQuery("(" + query + "*)");   // query holds the user's search phrase
and added:
q.set("defType", "edismax");   // use the Extended DisMax query parser
q.set("qf", "title^100");      // search the title field with a boost of 100
This is no longer an exact query, but I now get the documents with better matches first, without changing any configuration! The equivalent request in the Solr web frontend is similar; the query should look like this:
http://localhost:8983/solr/mycollection/select?defType=edismax&q=(some%20test*)&qf=title^100
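When the request is built by hand instead of through SolrJ, the parameter values have to be URL-encoded. A minimal sketch using only the JDK (URLEncoder.encode with a Charset needs Java 10 or later); the class name is illustrative, and "mycollection" is the placeholder collection from the URL above. It sends the same three parameters as the SolrJ code: defType, q and qf.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class EdismaxUrl {

    // Builds a /select URL with the edismax parser, a trailing-wildcard query,
    // and a boost of 100 on the "title" field.
    static String buildUrl(String collectionUrl, String userQuery) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("defType", "edismax");
        params.put("q", "(" + userQuery + "*)");
        params.put("qf", "title^100");
        String queryString = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return collectionUrl + "/select?" + queryString;
    }

    public static void main(String[] args) {
        // Prints: http://localhost:8983/solr/mycollection/select?defType=edismax&q=%28some+test*%29&qf=title%5E100
        System.out.println(buildUrl("http://localhost:8983/solr/mycollection", "some test"));
    }
}
```

URLEncoder produces form encoding, so the space becomes "+", which Solr decodes back to a space in query-string parameters.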
Hope it helps someone

Logic Apps: Data not parsed on the second query inside "foreach" loop

Hi Logic Apps Experts,
I'd like to ask about some foreach loop behavior: whether it is expected, and whether there are any workarounds.
In this Logic App, the "Run query and list results" step searches the SecurityIncident table. Then, for each SecurityIncident record, the "Using IncidentId-Query Details of the Alert" step finds the corresponding SecurityAlert record.
For the first query, the data is parsed properly and each field can be used.
However, after the second query I can only use 'Body' and 'value' in later steps, and these contain unparsed values.
Questions:
Is this behavior expected?
Is there a better way to ensure the second query is parsed?
Any other advice for improvement is greatly appreciated.
Thank you!
The list of selectable fields is determined by the type/format required by the action's input box, so I think this behavior is expected.
If you want to get a parsed field from the query action, you can use an expression. I'm not sure what your query result body looks like, so here is just a sample for reference:
For example, if the query result shows like:
{
  "body": [
    {
      "TenantId": "111",
      "xxxx": "xxx"
    },
    {
      "TenantId": "222",
      "xxxx": "xxx"
    }
  ]
}
Then you can use the expression body('Run_query_and_list_results')[0]?['TenantId'] to get the value of the first TenantId. In short, use [index] to access an array element and ?['key'] to access a property of an object.
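To make the two access forms concrete, here is a Java analogue (not Logic Apps code; the class and helper names are hypothetical) of navigating the sample body: index(...) plays the role of [index], and optionalKey(...) plays the role of ?['key'], which yields null instead of failing when the property is missing. The loop at the end mirrors reading TenantId from each item in a foreach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BodyNavigation {

    // Analogue of [index]: pick one element of an array.
    static Object index(Object node, int i) {
        return ((List<?>) node).get(i);
    }

    // Analogue of ?['key']: read a property, yielding null instead of failing
    // when the node is not an object or the key is absent.
    static Object optionalKey(Object node, String key) {
        if (!(node instanceof Map)) {
            return null;
        }
        return ((Map<?, ?>) node).get(key);
    }

    public static void main(String[] args) {
        // Same shape as the sample body above.
        List<Map<String, String>> body = List.of(
                Map.of("TenantId", "111", "xxxx", "xxx"),
                Map.of("TenantId", "222", "xxxx", "xxx"));

        // Like body('Run_query_and_list_results')[0]?['TenantId']; prints 111
        System.out.println(optionalKey(index(body, 0), "TenantId"));
        // A missing key yields null rather than an error; prints null
        System.out.println(optionalKey(index(body, 0), "NoSuchKey"));

        // Like a foreach over the array, reading TenantId from each item
        List<Object> tenantIds = new ArrayList<>();
        for (Object item : body) {
            tenantIds.add(optionalKey(item, "TenantId"));
        }
        System.out.println(tenantIds); // prints [111, 222]
    }
}
```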

Characters to split the user-query in Vespa engine

We split the user query on ASCII spaces to create a weakAnd(...).
The user input "Watch【Docudrama】" does not contain any whitespace, but it throws an error.
Question: Which code points besides whitespace should be used to split the query?
YQL (fails):
select * from post where text contains "Watch【Docudrama】" limit 1;
YQL (works):
select * from post where weakAnd(text contains "Watch",text contains "【Docudrama】") limit 1;
Error message:
{
  "root": {
    "id": "toplevel",
    "relevance": 1,
    "fields": {
      "totalCount": 0
    },
    "errors": [
      {
        "code": 4,
        "summary": "Invalid query parameter",
        "source": "content",
        "message": "Can not add WORD_ALTERNATIVES text:[ Watch【Docudrama】(1.0) watch(0.7) ] to a segment phrase"
      }
    ]
  }
}
Are you sure you need to use WAND for this? Try setting the user query grammar to "any" (default is "all"), which will use the "OR" operator for user supplied terms. There is an example here: https://docs.vespa.ai/documentation/reference/query-language-reference.html#userinput
The process of splitting up the query is known as Tokenization. This is a complex and language dependent process, Vespa uses Apache OpenNLP to do this (and more): https://docs.vespa.ai/documentation/linguistics.html has more information and also references to the code which performs this operation.
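To illustrate why "Watch【Docudrama】" is not a single token even without whitespace, here is a deliberately crude Java approximation. It is an assumption for illustration, not Vespa's tokenizer: it keeps runs of letters and digits and treats every other code point, including the brackets U+3010 and U+3011, as a separator. Real tokenization also handles language detection, CJK segmentation, normalization and stemming, which this sketch ignores; that gap is why reimplementing it outside Vespa is not recommended.

```java
import java.util.ArrayList;
import java.util.List;

public class RoughTokenizer {

    // Crude approximation, NOT Vespa's tokenizer: keep maximal runs of letters and
    // digits, and treat every other code point (whitespace, punctuation such as
    // U+3010 and U+3011) as a separator.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        input.codePoints().forEach(cp -> {
            if (Character.isLetterOrDigit(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        });
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Watch【Docudrama】")); // prints [Watch, Docudrama]
    }
}
```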
If you really want to use WAND, instead of reimplementing the query parsing logic outside Vespa, I suggest you create a Java searcher which descends the query tree and modifies it by replacing the created AndItem with WeakAndItem. See https://docs.vespa.ai/documentation/searcher-development.html and the code example here: https://docs.vespa.ai/documentation/advanced-ranking.html

How to perform a full-text search in Vespa?

I am trying to do a full-text search on a field of some documents, and I'm looking for advice on how to do so. I first tried this type of request:
GET http://localhost:8080/search/?query=lord+of+the+rings
But it only returned the documents where the field was an exact match, containing nothing but the given string, so I tried the equivalent in YQL:
GET http://localhost:8080/search/?yql=SELECT * FROM site WHERE text CONTAINS "lord of the rings";
I got exactly the same results. While reading further in the documentation I came across the MATCHES operator, and it does give me the results I seem to be looking for, with this kind of request:
GET http://localhost:8080/search/?yql=SELECT * FROM site WHERE text MATCHES "lord of the rings";
However, for some requests of this type I get a timeout error, and I don't know why:
{
  "root": {
    "id": "toplevel",
    "relevance": 1,
    "fields": {
      "totalCount": 0
    },
    "errors": [
      {
        "code": 12,
        "summary": "Timed out",
        "source": "site",
        "message": "Timeout while waiting for sc0.num0"
      }
    ]
  }
}
I solved this by setting a timeout greater than the default:
GET http://localhost:8080/search/?yql=SELECT * FROM site WHERE text MATCHES "lord of the rings";&timeout=20000
My question is: am I doing full-text search the right way, and how could I improve it?
EDIT: Here is the corresponding search definition:
search site {
    document site {
        field text type string {
            stemming: none
            normalizing: none
            indexing: attribute
        }
        field title type string {
            stemming: none
            normalizing: none
            indexing: attribute
        }
    }
    fieldset default {
        fields: title, text
    }
    rank-profile post inherits default {
        rank-type text: about
        rank-type title: about
        first-phase {
            expression: nativeRank(title, text)
        }
    }
}
What does your search definition file look like? I suspect you have put your text content in an "attribute" field, which defaults to "word match" semantics. You probably want "text match" semantics which means you'll need to put your content in an "index" type field.
https://docs.vespa.ai/documentation/reference/search-definitions-reference.html#match
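The difference between the two semantics can be sketched with a simplified Java model (an illustration, not Vespa's implementation; the class and method names are hypothetical). Under "word" match the whole field value acts as one term, so the query must equal it, which is what the question observed. Under "text" match both the field and the query are tokenized and, with the default "all" grammar, every query token must occur somewhere in the field. Case-insensitive comparison and the letter/digit tokenizer are simplifying assumptions.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class MatchModes {

    // "word" match (attribute default): the whole value is one term, so the query
    // must equal it. Compared case-insensitively here as a simplification.
    static boolean wordMatch(String fieldValue, String query) {
        return fieldValue.trim().equalsIgnoreCase(query.trim());
    }

    // "text" match (index field) with grammar "all": every query token must occur
    // somewhere among the field's tokens.
    static boolean textMatchAll(String fieldValue, String query) {
        Set<String> fieldTokens = tokens(fieldValue);
        return fieldTokens.containsAll(tokens(query));
    }

    // Simplified tokenizer: lowercase, split on anything that is not a letter or digit.
    static Set<String> tokens(String s) {
        return Arrays.stream(s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        String doc = "The Lord of the Rings: The Fellowship of the Ring";
        System.out.println(wordMatch(doc, "lord of the rings"));    // prints false
        System.out.println(textMatchAll(doc, "lord of the rings")); // prints true
    }
}
```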
The "MATCHES" operator you are using interprets your input as a regular expression, which is powerful but slow, since the regular expression is applied to the attribute values of all documents (further optimizations along the lines of https://swtch.com/~rsc/regexp/regexp4.html are possible but not currently implemented).
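A toy Java model of why this gets slow as the corpus grows (hypothetical names; Vespa's actual regex evaluation and any optimizations are not represented, and this sketch is case-sensitive): with no term index to consult, the pattern must be evaluated against every stored value. It uses find(), so the pattern matches anywhere in the value, which mirrors the question's observation that MATCHES returns documents where the phrase appears inside longer text.

```java
import java.util.List;
import java.util.regex.Pattern;

public class RegexScan {

    // Toy model of a regex predicate with no index to consult: the pattern is
    // evaluated against every stored value, so cost grows with the document count.
    // find() makes the match unanchored (anywhere in the value).
    static int countMatches(List<String> storedValues, String regex) {
        Pattern pattern = Pattern.compile(regex);
        int hits = 0;
        for (String value : storedValues) { // full scan over all documents
            if (pattern.matcher(value).find()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> values = List.of(
                "lord of the rings",
                "a review of lord of the rings",
                "the hobbit");
        System.out.println(countMatches(values, "lord of the rings")); // prints 2
    }
}
```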

MongoDB Nested - Search TrackingValues

We need to perform a phrase-based search (like Google's quoted "" search) over a nested array of keywords, in order.
For instance, let us suppose the data looks like:
{
  Name: "question",
  body: [
    "We",
    "need",
    "to",
    "perform",
    "a",
    "search",
    "like",
    "google's"
  ]
}
Searching for "we search" should return no result, but the document should be returned when searching for any of the following: "we need", "to perform a search", "we", etc.
I need to tokenize the words for encryption, so storing them as a single string won't work for me here.
Is this possible at all?
Folks, I tried to solve this with MongoDB technical support. Apparently, there is no out-of-the-box solution.
I was able to "solve" it by keeping another field that concatenates all the tokenized, encrypted words into one string, and running a regular expression over it.
Not ideal, and it requires duplicating some data, but it works for our needs.
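The concatenated-field workaround can be sketched in Java as follows. The class name, the single-space separator and the lowercasing are assumptions for illustration: the separator must never occur inside a token (true for plain words and for typical encrypted encodings such as Base64), and with encrypted tokens any case normalization would have to happen before encryption. The phrase regex requires the tokens in order, adjacent, and on whole-token boundaries, so "we search" does not match while "we need" does; the same pattern string is what would be run against the concatenated field.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PhraseField {

    static final String SEP = " "; // assumed never to occur inside a token

    // Builds the extra concatenated field from the (possibly encrypted) tokens.
    static String joinTokens(List<String> tokens) {
        return tokens.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.joining(SEP));
    }

    // Phrase regex: tokens in order, adjacent, bounded by SEP or the string ends.
    static Pattern phrasePattern(List<String> phraseTokens) {
        String body = phraseTokens.stream()
                .map(t -> Pattern.quote(t.toLowerCase(Locale.ROOT)))
                .collect(Collectors.joining(SEP));
        return Pattern.compile("(?:^|" + SEP + ")" + body + "(?:" + SEP + "|$)");
    }

    static boolean containsPhrase(List<String> docTokens, List<String> phraseTokens) {
        return phrasePattern(phraseTokens).matcher(joinTokens(docTokens)).find();
    }

    public static void main(String[] args) {
        List<String> body = List.of("We", "need", "to", "perform", "a", "search", "like", "google's");
        System.out.println(containsPhrase(body, List.of("we", "need")));   // prints true
        System.out.println(containsPhrase(body, List.of("we", "search"))); // prints false
    }
}
```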
