Dynamic query fields in Solr / Macro substitution in solrconfig.xml - solr

We've got a multilingual search index with the "field-per-language" configuration with a lot of similar aliases in the search handler like this:
<str name="f.content_en.qf">Title_en^10 Text_en^1 ...</str>
<str name="f.content_de.qf">Title_de^10 Text_de^1 ...</str>
...
They are used in the q parameter:
<str name="q">{!edismax qf=$searchField pf=$searchField v=$searchText}</str>
The client knows which language should be used and calls Solr like this, e.g.: /solr/core/search?searchText=TEXT&searchField=content_en
That works fine, but the configuration contains a lot of similar stuff.
I'd like to optimize the config to something like this:
<str name="df">content</str>
<str name="f.content.qf">Title_${lang}^10 Text_${lang}^1...</str>
Then the client would need to provide the lang parameter only.
I tried to use concat function like this:
paramLang=en
searchFields=concat("Title", "_", "${paramLang}", " ", "Text", "_", "${paramLang}")
and use it as the qf:
q={!edismax qf=$searchFields v=$searchText}
But it seems the qf local param does not support Solr functions.
Is this possible with Solr at all?

Actually, parameter substitution / macro expansion works fine.
The issue was with those macros in solrconfig.xml: they conflict with Solr's system property substitution, which uses the same ${...} syntax and is applied when the configuration is loaded. As a result, Solr could not resolve the query parameter macros.
I could not find a proper way to escape the query parameter macros, so I used the following workaround:
<lst name="invariants">
<str name="defType">edismax</str>
<str name="searchFields">
Title_${lang:${lang}}^10
Text_${lang:${lang}}^1
...
<lst name="defaults">
<str name="q">*</str>
<str name="qf">${searchFields:${searchFields}}</str>
<str name="pf">${searchFields:${searchFields}}</str>
<str name="lang">en</str>
...
Query URL: /search?q=TEXT&lang=en
Update: the proper way to deal with variable substitution in solrconfig.xml is to escape the dollar sign by doubling it ($$):
<str name="searchFields">
Title_$${lang}^10
Text_$${lang}^1
...
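To make the two substitution passes easier to follow, here is a minimal Python sketch of the behaviour described above. It is a simplified model, not Solr's actual code: the function names load_time_substitute and request_time_expand are made up for illustration, and replacing a missing request parameter with an empty string is an assumption of this model.

```python
import re

# Matches an optional escaping '$' followed by ${name} or ${name:default}.
_MACRO = re.compile(r"(\$?)\$\{([^}:]+)(?::([^}]*))?\}")


def load_time_substitute(text, properties):
    """Simplified model of property substitution done when the config is loaded."""
    def repl(match):
        escaped, name, default = match.group(1), match.group(2), match.group(3)
        if escaped:
            # "$${...}" is kept as a literal "${...}" for the request-time pass.
            return match.group(0)[1:]
        if name in properties:
            return properties[name]
        if default is not None:
            return default
        # Models the config failing to load when nothing can be resolved.
        raise KeyError(f"no property or default for {name!r}")
    return _MACRO.sub(repl, text)


def request_time_expand(text, params):
    """Simplified model of macro expansion against the request parameters."""
    def repl(match):
        name, default = match.group(2), match.group(3)
        # Assumption of this sketch: a missing parameter becomes "".
        return params.get(name, default if default is not None else "")
    return _MACRO.sub(repl, text)


config_value = "Title_$${lang}^10 Text_$${lang}^1"
loaded = load_time_substitute(config_value, properties={})
expanded = request_time_expand(loaded, params={"lang": "en"})
print(loaded)
print(expanded)
```

In this model the escaped value survives loading as Title_${lang}^10 Text_${lang}^1 and is only filled in once the request supplies lang, which is the effect the $$ escape is meant to achieve.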
Update #2: do NOT define macros in the invariants or appends sections when using Solr Cloud! Otherwise, you'll get a weird exception, e.g.:
"undefined field: \"Text_$\"
or
"msg": "Error from server at null: org.apache.solr.search.SyntaxError: Query Field '${searchFields}' is not a valid field name"
P.S. wt=json as "invariant" is also NOT compatible with Solr Cloud, giving "unexpected" content-type error.
So many "surprises" :(

Related

How to highlight multiple words using different formatters in Solr?

I need to perform highlighting for multiple words into the same field but for each one using a specific formatter (prefix and postfix).
Let's say that I have the description field and for a document it has the value: Einstein always excelled at math and physics from a young age. How can I highlight math with one specific prefix-postfix pair AND ALSO physics with a different prefix-postfix pair? So, in the end I would like to obtain:
Einstein always excelled at <em class="hl-red">math</em> and <em class="hl-green">physics</em> from a young age
The reason is that in the frontend I have different CSS classes with background-color: red; for hl-red and background-color: green for hl-green for example.
However, I only managed to highlight multiple words in the same field with the same prefix-postfix pair everywhere, which is not what I want. In addition, I tried to add multiple HtmlFormatter entries in solrconfig.xml:
<highlighting>
..............
<formatter name="html" default="true" class="solr.highlight.HtmlFormatter">
<lst name="defaults">
<str name="hl.simple.pre"><![CDATA[<em>]]></str>
<str name="hl.simple.post"><![CDATA[</em>]]></str>
</lst>
<lst name="hl-red">
<str name="hl.simple.pre"><![CDATA[<em class="hl-red">]]></str>
<str name="hl.simple.post"><![CDATA[</em>]]></str>
</lst>
<lst name="hl-green">
<str name="hl.simple.pre"><![CDATA[<em class="hl-green">]]></str>
<str name="hl.simple.post"><![CDATA[</em>]]></str>
</lst>
</formatter>
..............
</highlighting>
but I got HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr: Unknown formatter: hl-green. Also, I didn't find a way to specify an array of prefixes either in the Solr Admin UI or in spring-data-solr, just a simple query like this:
SimpleHighlightQuery query = new SimpleHighlightQuery(Objects.requireNonNull(criteria));
HighlightOptions highlightOptions = new HighlightOptions()
.addFields(fields)
.setSimplePrefix(prefix)
.setSimplePostfix(postfix);
query.setHighlightOptions(highlightOptions);
query.setPageRequest(pageable);
return solrTemplate.queryForHighlightPage(MY_CORE, query, MyModel.class);
My assumption is that it is a limitation of the Solr itself.
I was thinking about writing a custom fragmentsBuilder, but I do not know whether that is the right approach or how to do it. As another workaround, I was thinking of executing one highlight query per word: run a query for the first word, store the result, then run another highlight query for the second word, store the result, and so on. But I don't think this is a good or elegant solution, because the second query will cause problems if the second word is "em", "class" or "red"/"green" (undesired nested highlighting will occur).
I am using spring-data-solr in a Spring Boot application and Solr 6.6.5 as an (HTTP) service.
Does anyone know how to solve this? Any advice or ideas will be much appreciated!

How to config Solr to exclude certain documents from search result on any wild card search

To exclude some documents from the search results, I can use NOT or the - (negation) operator to specify the ids in the query, like this:
select/?q=*:*&fq=-id:86+-id:338
But I want to preconfigure Solr so that certain documents never show up in the results of any search.
You can add a parameter list to a definition for a requestHandler that appends a fq statement to all requests. The example from the wiki does something similar:
<lst name="appends">
<!-- no matter what other fq are also used, always remove these two documents -->
<str name="fq">-id:(86 338)</str>
</lst>
This fq will then always be appended to the request made.
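If the list of excluded ids is maintained elsewhere, the fq value for the appends section could be generated rather than typed by hand. A small Python sketch, where build_exclusion_fq is a hypothetical helper name and the field name id comes from the example above:

```python
def build_exclusion_fq(field, ids):
    """Build a negative filter query such as -id:(86 338)."""
    ids = list(ids)
    if not ids:
        raise ValueError("at least one id is required")
    return "-{}:({})".format(field, " ".join(str(i) for i in ids))


fq_value = build_exclusion_fq("id", [86, 338])
print(fq_value)
```

The resulting string is what would go inside the <str name="fq"> element of the appends list.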
We can also use the following:
<lst name="appends">
<str name="excludeIds">86,338</str>
</lst>
I tried it, and it also gives the expected result.

How to set up SOLR parameter substitution in solrconfig.xml

This is my first question at stackoverflow so apologies in advance if I break any rules but I did study them and also made sure this isn't a duplicate question.
So, according to this http://yonik.com/solr-query-parameter-substitution/ one can set up a search handler in solrconfig in a way that the
request handler defaults, appends, and invariants configured for the
handler may reference request parameters
I have the following query which works just fine with curl
curl http://localhost:7997/solr/vb_popbio/select -d 'q=*:*&fq=bundle:pop_sample_phenotype AND phenotype_type_s:"insecticide%20resistance"
&rows=0&wt=json&json.nl=map&indent=true
&fq=phenotype_value_type_s:${PFIELD}&
&PGAP=5&PSTART=0&PEND=101&PFIELD="mortality rate"&
json.facet = {
pmean: "avg(phenotype_value_f)",
pperc: "percentile(phenotype_value_f,5,25,50,75,95)",
pmin: "min(phenotype_value_f)",
pmax: "max(phenotype_value_f)",
denplot : {
type : range,
field : phenotype_value_f,
gap : ${PGAP:0.1},
start: ${PSTART:0},
end: ${PEND:1}
}
}'
I have translated this query to a search handler configuration in solrconfig.xml so a user only has to provide the PFIELD, PGAP, PSTART and PEND parameters. Here's how the configuration for the handler looks
<!--A request handler to serve data for violin plots (limited to IR assays)-->
<requestHandler name="/irViolin" class="solr.SearchHandler">
<!-- default values for query parameters can be specified, these
will be overridden by parameters in the request
-->
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">0</int>
<str name="df">text</str>
<str name="wt">json</str>
<str name="json.nl">map</str>
<str name="json.facet">{
pmean: "avg(phenotype_value_f)",
pperc: "percentile(phenotype_value_f,5,25,50,75,95)",
pmin: "min(phenotype_value_f)",
pmax: "max(phenotype_value_f)",
denplot : {
type : range,
field : phenotype_value_f,
gap: ${PGAP:0.1},
start: ${PSTART:0},
end: ${PEND:1}
}
}
</str>
</lst>
<lst name="appends">
<str name="fq">bundle:pop_sample_phenotype</str>
<str name="fq">phenotype_type_s:"insecticide resistance"</str>
<str name="fq">has_geodata:true</str>
<str name="fq">phenotype_value_type_s:${PFIELD:"mortality rate"}</str>
</lst>
<lst name="invariants">
</lst>
</requestHandler>
Notice that I provided default values for all the parameters otherwise SOLR will fail to load the configuration. The problem is that using a query like this
curl http://localhost:7997/solr/vb_popbio/irViolin?q=*:*&
&PGAP=5&PSTART=0&PEND=101&PFIELD="mortality rate"
is not working. SOLR will read the request parameters fine (I can see them on the debug output) but will ignore them and use the default values in the configuration instead.
SOLR version is 5.2.1.
I tried moving the configuration parameters to either defaults, appends or invariants but nothing is working. After researching this for the past 2 days I'm almost ready to give up and just build the whole query on-the-fly instead.
Any help will be greatly appreciated.
Many thanks
This post is probably too old, but I arrived at this page through a search engine. A simple solution is to escape the dollar sign by doubling it ($$). After that, you should get the desired result.
Example:
<str name="json.facet">{
pmean: "avg(phenotype_value_f)",
pperc: "percentile(phenotype_value_f,5,25,50,75,95)",
pmin: "min(phenotype_value_f)",
pmax: "max(phenotype_value_f)",
denplot : {
type : range,
field : phenotype_value_f,
gap: $${PGAP:0.1},
start: $${PSTART:0},
end: $${PEND:1}
}
}
</str>
I'm not sure when the Config API came to Solr, but query parameter substitution does work when the handler is added to configoverlay.json:
{
"requestHandler": {
"/myHandler": {
"name": "/myHandler",
"class": "solr.SearchHandler",
"defaults": {
"fl": "id,name,color,size"
},
"invariants": {
"rows": 10
},
"appends": {
"json": "{filter:[\"color:${color:red}\",\"size:${size:M}\"]}"
}
}
}
}
Now you can pass the URL parameters &color=green&size=XXL to the /myHandler query.
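For completeness, a rough Python sketch of how a client might build such a request URL. The base URL and core name (mycore) are placeholders, and handler_url is a made-up helper, not part of any Solr client library:

```python
from urllib.parse import urlencode


def handler_url(base_url, handler, **params):
    """Join the core URL, the handler path and URL-encoded parameters."""
    return f"{base_url.rstrip('/')}{handler}?{urlencode(params)}"


# Hypothetical host and core; color and size feed the ${color:red} and
# ${size:M} macros from the handler definition above.
url = handler_url(
    "http://localhost:8983/solr/mycore",
    "/myHandler",
    q="*:*",
    color="green",
    size="XXL",
)
print(url)
```

Leaving out color or size should make the handler fall back to the defaults given in the macros (red and M).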

Solr adds unwanted MatchAllDocsQuery and I don't know why

In my company we have a test string, which we use to ensure escaping issues are handled correctly throughout our many components:
!"§$%&/()?ß><öä€ü\ÖÄÄÜ#'
When I add a document to Solr with that title, all is well.
I now try to query that document using the same string, but with all special query parameters escaped (see here for details):
!\"§$%&\/\(\)\?ß><öä€ü\\ÖÄÄÜ#'
Surprisingly, all documents in my index match that query!
I can see in the debug output (see below), that Solr adds a MatchAllDocsQuery after my actual query. That's why all documents match, but the big question is:
Why does Solr add that match-all query? It doesn't make any sense to me.
Funnily enough, when I remove one of the escaping backslashes (e.g. the very first one before the double-quote), the query works like a charm and only finds my one expected document. For whatever reason, Solr then does not add that match_all query anymore.
!"§$%&\/\(\)\?ß><öä€ü\\ÖÄÄÜ#'
Any ideas???
Debug info:
"rawquerystring": "!\\\"§$%&\\/\\(\\)\\?ß><öä€ü\\\\ÖÄÄÜ#'",
"querystring": "!\\\"§$%&\\/\\(\\)\\?ß><öä€ü\\\\ÖÄÄÜ#'",
"parsedquery": "(+(-DisjunctionMaxQuery((((de_all:ss de_all:oa de_all:u de_all:oaau)~4) | ((en_all:ß en_all:öä en_all:ü en_all:öääü)~4) | string_all:\"§$%&/()?ss><oa€u\\oaau#')) +MatchAllDocsQuery(*:*)))/no_coord",
"parsedquery_toString": "+(-(((de_all:ss de_all:oa de_all:u de_all:oaau)~4) | ((en_all:ß en_all:öä en_all:ü en_all:öääü)~4) | string_all:\"§$%&/()?ss><oa€u\\oaau#') +*:*)"
Request handler:
<requestHandler name="/custom" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">none</str>
<str name="wt">json</str>
<str name="defType">edismax</str>
<str name="qf">de_all^1 en_all^1 string_all^1</str>
<str name="fl">id,score</str>
<str name="indent">false</str>
</lst>
</requestHandler>
If you need any other info, please let me know!
Ahh, stupid mistake: I forgot to escape the leading '!', which makes this a query with a single negated phrase. AFAIK those are handled internally with a match all query.
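A helper that escapes every special character in one place avoids this kind of slip. The Python sketch below is loosely modeled on the character set that SolrJ's ClientUtils.escapeQueryChars handles; the function name and the exact set used here are assumptions of this example, so check them against the query parser you actually use.

```python
# Characters treated as query syntax in this sketch (an assumption, see above).
_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(text):
    """Prefix each special or whitespace character with a backslash."""
    escaped = []
    for ch in text:
        if ch in _SPECIAL_CHARS or ch.isspace():
            escaped.append("\\")
        escaped.append(ch)
    return "".join(escaped)


escaped_title = escape_query_chars('!"§$%&/()?')
print(escaped_title)
```

Note that the leading ! is escaped along with everything else, so the string is no longer parsed as a negated clause.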

OR query in Solr not working

My Solr server seems to work only with AND, not with OR, e.g.
/solr/select?q=(marsden AND emma)&qt=yodl_handler works, but
/solr/select?q=(marsden OR mackey)&qt=yodl_handler doesn't.
Each individual query returns results, e.g.
/solr/select?q=marsden&qt=yodl_handler returns 2 results
/solr/select?q=mackey&qt=yodl_handler returns 3 results
Any suggestions are appreciated!
Here is the definition of yodl_handler:
<requestHandler name="yodl_handler" class="solr.DisMaxRequestHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<float name="tie">0.01</float>
<str name="qf">
dc.title^1 dc3.title^1 dc.type^1 vra2.logical.creator^1 vra2.image.agent.name^1 vra2.image.agent.role^1 vra2.image.agent.role^1 vra2.image.location.name^1 vra2.image.rights.rightsholder^1 vra2.image.source.refid^1 vra2.image.title^1 vra2.image.worktype^1 vra2.work.agent.attribution^1 vra2.work.agent.name^1 vra2.work.agent.role^1 vra2.work.culturalContext^1 vra2.work.description^1 vra2.work.location-type^1 vra2.logical.location.place^1 vra2.logical.location.name^1 vra2.work.location.refid^1 vra2.work.material^1 vra2.work.rights.rightsHolder^1 vra2.work.StylePeriod^1 vra2.work.subject.term.name^1 vra2.work.subject.term.place^1 vra2.work.subject.term.keyword^1 vra2.work.technique^1 vra2.work.title^1 vra2.work.worktype^1 iris2.instrument.instrumentType^1 iris2.instrument.primaryInstrumentType^1 iris2.instrument.secondaryInstrumentType^1 iris2.instrument.instrumentType.alltypes^1 iris2.instrument.author^1 iris2.references.allauthors^1 iris2.instrument.researchArea^1 iris2.instrument.typeOfFile^1 iris2.instrument.software^1 iris2.instrument.dataType^1 iris2.instrument.linguisticTarget^1 iris2.instrument.sourceLanguage^1 iris2.instrument.funder^1 iris2.instrument.licence^1 iris2.participants.participantType^1 iris2.participants.firstLanguage^1 iris2.participants.targetLanguage^1 iris2.participants.gender^1 iris2.participants.proficiencyLearner^1 iris2.participants.proficiencyStudentsTaught^1 iris2.participants.yearsOfTeachingExperience^1 iris2.participants.domainOfUse^1 iris2.references.publicationType^1 iris2.references.author^1 iris2.references.author.lastnames^1 iris2.references.booktitle^1 iris2.references.journal^1 iris2.references.publicationDate^1 iris2.references.publicationLatestDate^1 iris2.references.publisher^1 iris2.references.placeOfPublication^1 iris2.references.editor^1 iris2.references.conferenceName^1
</str>
<int name="ps">100</int>
<str name="q.alt">*:*</str>
<str name="hl.fl">text features name</str>
<str name="f.name.hl.fragsize">0</str>
<str name="f.name.hl.alternateField">name</str>
<str name="f.text.hl.fragmenter">regex</str>
</lst>
</requestHandler>
The simple answer is that the Solr DisMax query parser does not support boolean logic in queries. The query with "AND" only appears to work, probably as a side effect of the way your fields are indexed (stop words?) or of what appears in the underlying data.
You can get a better idea what's happening under the hood if you send the debugQuery parameter, e.g.:
/solr/select?q=(marsden AND emma)&qt=yodl_handler&debugQuery=true
There's further documentation on the Solr wiki about the Dismax parser:
The Dismax query parser supports an extremely simplified subset of the Lucene QueryParser syntax. Quotes can be used to group phrases, and +/- can be used to denote mandatory and optional clauses ... but all other Lucene query parser special characters are escaped to simplify the user experience. (see DisMaxQParserPlugin)
The good news is, if you're using a modern version of Solr (3.1+), you have access to the new ExtendedDisMax parser, which DOES support boolean queries.
