How to respect Solr conditions in order

I need to send a query to Solr with two conditions combined with OR, instead of sending the query twice:
{!complexphrase inOrder=true}title:"some tests*" || title:(some tests*)
Here, the first condition should give the precise result; if nothing is found, the OR kicks in and retrieves any result that has at least one word of the search phrase. But when I launch the query, I still get the results of the right-hand condition first.
Here is my data:
{
"title": "some values"
},
{
"title": "data tests"
},
{
"title": "some tests"
}
The response I need is:
{
"title": "some tests"
},
{
"title": "data tests"
},
{
"title": "some values"
}
I already tried boosting, like so: {!complexphrase inOrder=true}title:"some tests*"^2 || title:(some tests*)^1, but it didn't work. I am NOT able to change the Solr configuration, since the software is already in production and not managed by me. I cannot even sort by rating; in fact, I don't receive the best occurrences first. The Solr version is 7.3.1. Any help is appreciated, thanks in advance!

I solved it with a workaround. Instead of using two OR conditions, I managed to apply a working boost on the title field using edismax.
What I had to change in my Java application was:
From
SolrQuery q = new SolrQuery("*");
To
SolrQuery q = new SolrQuery("(" + query + "*)");
and added:
q.set("defType", "edismax");
q.set("qf", "title^100");
Now I'm not running a precise query, but I am retrieving the documents with the best match first, without changing any configuration! The Solr frontend equivalent is similar; the query should look like this:
http://localhost:8983/solr/mycollection/select?defType=edismax&q=(some%20test*)&qf=title^100
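For completeness, here is a minimal, self-contained SolrJ sketch of the same change (the collection name and the user input are placeholders I made up for illustration):
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TitleBoostSearch {
    public static void main(String[] args) throws Exception {
        // Placeholder collection name and user input, just for illustration.
        SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build();

        String userInput = "some test";
        SolrQuery q = new SolrQuery("(" + userInput + "*)"); // trailing wildcard on the last term
        q.set("defType", "edismax");  // switch to the eDisMax query parser
        q.set("qf", "title^100");     // boost matches on the title field

        QueryResponse rsp = solr.query(q);
        rsp.getResults().forEach(doc -> System.out.println(doc.getFieldValue("title")));
        solr.close();
    }
}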
Hope it helps someone

Related

Logic Apps: Data not parsed on the second query inside "foreach" loop

Hi Logic Apps Experts,
I'd like to check some of the foreach loop behaviors with you, to find out whether this is expected and whether there are any workarounds for it.
The Logic App first runs a "Run query and list results" action that searches the SecurityIncident table. Then, for each SecurityIncident record, it looks up the corresponding SecurityAlert record in the "Using IncidentId-Query Details of the Alert" step.
For the first query, the data is parsed properly and each field can be used.
However, after the second query I can only use 'Body' and 'value' in the following steps, which contain unparsed values.
Questions:
Is this behavior expected?
Is there a better way to ensure the second query is parsed?
Any other advice on room for improvement is greatly appreciated.
Thank you!
The selection list is affected by the required type/format of the input box in the action, so I think the behavior is expected.
If you want to get a parsed field from the query action, you can use an expression. I'm not clear about the details of the query result body, so here I just provide a sample for your reference:
For example, if the query result shows like:
{
"body": [
{
"TenantId": "111",
"xxxx": "xxx"
},
{
"TenantId": "222",
"xxxx": "xxx"
}
]
}
Then you can use the expression body('Run_query_and_list_results')[0]?['TenantId'] to get the value of the first TenantId. In short, use [index] to index into an array and ?['key'] to access a map.

Characters to split the user-query in Vespa engine

We split the user query on ASCII spaces to create a weakAnd(...).
The user input "Watch【Docudrama】" does not contain whitespace, but it throws an error.
Question: which codepoints besides whitespace should be used to split the query?
YQL (fails):
select * from post where text contains "Watch【Docudrama】" limit 1;
YQL (works):
select * from post where weakAnd(text contains "Watch",text contains "【Docudrama】") limit 1;
Error message:
{
"root": {
"id": "toplevel",
"relevance": 1,
"fields": {
"totalCount": 0
},
"errors": [
{
"code": 4,
"summary": "Invalid query parameter",
"source": "content",
"message": "Can not add WORD_ALTERNATIVES text:[ Watch【Docudrama】(1.0) watch(0.7) ] to a segment phrase"
}
]
}
}
Are you sure you need to use WAND for this? Try setting the user query grammar to "any" (the default is "all"), which will use the "OR" operator for user-supplied terms. There is an example here: https://docs.vespa.ai/documentation/reference/query-language-reference.html#userinput
The process of splitting up the query is known as tokenization. This is a complex, language-dependent process; Vespa uses Apache OpenNLP to do this (and more): https://docs.vespa.ai/documentation/linguistics.html has more information as well as references to the code that performs this operation.
If you really want to use WAND, instead of reimplementing the query parsing logic outside Vespa, I suggest you create a Java searcher which descends the query tree and modifies it by replacing the created AndItem with WeakAndItem. See https://docs.vespa.ai/documentation/searcher-development.html and the code example here: https://docs.vespa.ai/documentation/advanced-ranking.html
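As a rough, untested sketch of what such a searcher could look like (the class name and the rewrite logic below are mine, not taken from the Vespa docs):
import com.yahoo.prelude.query.AndItem;
import com.yahoo.prelude.query.CompositeItem;
import com.yahoo.prelude.query.Item;
import com.yahoo.prelude.query.WeakAndItem;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

public class AndToWeakAndSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        Item root = query.getModel().getQueryTree().getRoot();
        query.getModel().getQueryTree().setRoot(rewrite(root));
        return execution.search(query);
    }

    // Recursively replace every AndItem with a WeakAndItem, keeping its children.
    private Item rewrite(Item item) {
        if (!(item instanceof CompositeItem))
            return item;
        CompositeItem composite = (CompositeItem) item;
        if (composite instanceof AndItem) {
            WeakAndItem weakAnd = new WeakAndItem();
            for (int i = 0; i < composite.getItemCount(); i++)
                weakAnd.addItem(rewrite(composite.getItem(i)));
            return weakAnd;
        }
        for (int i = 0; i < composite.getItemCount(); i++)
            composite.setItem(i, rewrite(composite.getItem(i)));
        return composite;
    }
}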

Is it possible to run two scripts instead of sending two queries in Elasticsearch?

I have a simple question.
I'm trying to append "tags" to an Elasticsearch array.
I find it hard to wrap my head around the scripting function in Elasticsearch, but I found two queries that do the job; however, both have to be called. That means PHP (in my case) has to send two POST requests to the Elasticsearch database, which I think might cause some problems in the long term.
I think it would be safer to send it all in just one query.
The queries below are meant to update the votes of a tag in an array called tags from 1 to 2. I'm sure there are better ways of doing this than what I have done.
Here are the queries I use:
I'm not sure why the votes: 2 parameter is there, but this query seems to do the job of deleting the tag. As I said, I find it hard to wrap my head around scripting in Elasticsearch.
POST /db2/links/1/_update
{
"script": {
"inline": "for(int i=0;i<ctx._source.tags.size();i++){if(ctx._source.tags[i].name==\"tagname\"){ctx._source.tags.remove(i)}}",
"params": {
"votes": 2
}
}
}
This query is for appending the tag to the tags array of the document:
POST /db2/links/1/_update
{
"script": {
"inline": "ctx._source.tags.add(params.appendtags)",
"params": {
"appendtags": {
"name": "tagname",
"votes": 2
}
}
}
}
Are there any ways to easily combine these two scripts together in one query?
You can use the bulk API to make multiple updates in a single call.
POST _bulk
{"update":{"_id":"1","_type":"links","_index":"db2","retry_on_conflict":3}}
{"script":{"inline":"for(int i=0;i<ctx._source.tags.size();i++){if(ctx._source.tags[i].name==\"tagname\"){ctx._source.tags.remove(i)}}","params":{"votes":2}}}
{"update":{"_id":"1","_type":"links","_index":"db2","retry_on_conflict":3}}
{"script":{"inline":"ctx._source.tags.add(params.appendtags)","params":{"appendtags":{"name":"tagname","votes":2}}}}

MongoDB Nested - Search TrackingValues

We need to perform a phrase-based search (like Google's quoted "" search) over a nested array of keywords, in order.
For instance, let us suppose the data looks like:
{
Name: "question",
body: [
"We",
"need",
"to",
"perform",
"a",
"search",
"like",
"google's"
]
}
Searching for "we search" should give no result, but the document should be returned when searching for any of the following: "we need", "to perform a search", "we", etc.
I do need to tokenize the words for encryption, so saving them as a single string would not work for me here…
Is that at all possible?
Folks, I tried to solve this with MongoDB's technical support. Apparently, there is no out-of-the-box solution.
I have been able to "solve" this by keeping another field that concatenates all the tokenized, encrypted words into one string, and running a regex expression over it.
Not ideal, and it requires duplicating some data, but it works for our needs.
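For illustration, here is a sketch of the search side of that workaround using the MongoDB Java driver; the database, collection, the bodyConcat field, and the encrypt() helper are assumptions standing in for whatever the application already uses:
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PhraseSearch {

    // Placeholder for the per-word encryption the application already applies.
    static String encrypt(String word) {
        return word;
    }

    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> docs =
                    client.getDatabase("mydb").getCollection("questions");

            // Encrypt each word of the phrase and join with the same separator
            // used when the concatenated bodyConcat field was written.
            String phrase = "we need";
            String needle = Arrays.stream(phrase.split("\\s+"))
                    .map(PhraseSearch::encrypt)
                    .collect(Collectors.joining(" "));

            // Quote the needle so the encrypted text is matched literally, not as regex syntax.
            docs.find(Filters.regex("bodyConcat", Pattern.quote(needle)))
                .forEach(doc -> System.out.println(doc.getString("Name")));
        }
    }
}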

How do I get all unique documents by a max field

I am working on a search feature for a Liferay 6.2 app, but I am struggling with how to get the latest articles.
For reasons, the client wants to track all versions of the Liferay Journal Articles in Solr. This means that every "version" gets stored as a separate document with an incrementing version field. For the purpose of the search, I need to grab the latest one.
For example, if I have a Journal Article like this in Solr:
[{
articleId:"123456",
title:"Sample Doc 1",
content:"abc 123 xyz",
version:"1.0"
},
{
articleId:"222111",
title:"Sample Doc 2",
content:"1111",
version:"1.0"
},
{
articleId:"222111",
title:"Sample Doc 2",
content:"2222",
version:"1.1"
},
{
articleId:"123456",
title:"Sample Doc 1",
content:"xxx xxx 1234556",
version:"1.1"
},
{
articleId:"222111",
title:"Sample Doc 2",
content:"3333",
version:"1.2"
}]
If I queried all documents, I would expect these results:
[{
articleId:"123456",
title:"Sample Doc 1",
content:"xxx xxx 1234556",
version:"1.1"
},
{
articleId:"222111",
title:"Sample Doc 2",
content:"3333",
version:"1.2"
}]
Notice that I retrieve only the document with the max version for each unique articleId.
The exact versions I am working with are:
Liferay 6.2.ee sp11 (with some patches)
Solr 4.10.4 under Tomcat 7.0.64
I tried googling for answers, but I am not sure what I am googling for here. I don't think facets are the answer, and grouping doesn't seem to return the results I need.
You can use grouping or a collapse filter for that. In my experience, the collapse filter is much faster than grouping. Here is how it could be used in your case:
fq={!collapse field=articleId max=version}
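In SolrJ terms this is just an extra filter query on top of whatever main query you already run; a small sketch (the collection name and main query are placeholders):
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocumentList;

public class LatestVersionSearch {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/liferay").build();

        SolrQuery q = new SolrQuery("title:sample");                  // the user's search
        q.addFilterQuery("{!collapse field=articleId max=version}");  // one doc per articleId, highest version

        SolrDocumentList latest = solr.query(q).getResults();
        latest.forEach(doc -> System.out.println(
                doc.getFieldValue("articleId") + " v" + doc.getFieldValue("version")));
        solr.close();
    }
}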
