In a Solr implementation, I am trying to do conditional highlighting that depends on fields other than the one being searched.
I want the matching "content" field to be highlighted only if Solr indicates that this field may be exposed for that document.
Given a Solr index populated with:
[{ "firstname":"Roman",
"content": "A quick response is the best",
"access":"" },
{ "firstname":"Roman",
"content": "Responsive is important",
"access":"contentAuthorized" }
]
I would like to get both documents in the response, with the "content" field highlighted only for the one that has "access":"contentAuthorized", so I am executing the query:
q:(firstname:r* OR (+tags:contentAuthorized AND +content:r*))
The expected answer would be:
...
{
{
"firstname":"Roman"
},
{
"firstname":"Roman"
}
},
"highlighting":{
"0f278cb5-7150-42f9-8dca-81bfa68a9c6e":{
"firstname":["<em>Roman</em>"]
},
"105c6464-0350-4873-9936-b46c39c88647":{
"firstname":["<em>Roman</em>"],
"content":["<em>Responsive</em> is important"]
}
}
But I actually get:
...
{
{
"firstname":"Roman"
},
{
"firstname":"Roman"
}
},
"highlighting":{
"0f278cb5-7150-42f9-8dca-81bfa68a9c6e":{
"firstname":["<em>Roman</em>"],
"content":["A quick <em>response</em> is the best"]
},
"105c6464-0350-4873-9936-b46c39c88647":{
"firstname":["<em>Roman</em>"],
"content":["<em>Responsive</em> is important"]
}
}
So I get "content" in the highlighting of a document for which (+tags:contentAuthorized AND +content:r*) is false.
Does anyone have an idea of how I could do conditional highlighting with Solr?
Thank you for reading this and for taking your time to think about it :D
If you want highlighting to be applied on certain fields only, then you need to set the query parameter hl.fl to those fields. In your case hl.fl=content. You should then set hl.requireFieldMatch=true.
Refer to Solr Highlighting documentation:
By default, false, all query terms will be highlighted for each field to be highlighted (hl.fl) no matter what fields the parsed query refer to. If set to true, only query terms aligning with the field being highlighted will in turn be highlighted.
For further info on how to use the query parameters: https://solr.apache.org/guide/8_6/highlighting.html
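The parameters named in this answer can be assembled into a request URL. The following is a minimal sketch only: the host, the core name `mycore`, and the helper `build_highlight_query` are assumptions for illustration, and it does not verify that these parameters alone produce the per-document behaviour the question asks for.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, query, fields):
    """Build a Solr select URL that restricts highlighting to `fields`."""
    params = {
        "q": query,
        "hl": "true",                    # turn highlighting on
        "hl.fl": ",".join(fields),       # fields to highlight
        "hl.requireFieldMatch": "true",  # only highlight terms queried on that field
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; the query string is the one from the question.
url = build_highlight_query(
    "http://localhost:8983/solr/mycore/select",
    "firstname:r* OR (+tags:contentAuthorized AND +content:r*)",
    ["firstname", "content"],
)
print(url)
```

Using urlencode keeps the parentheses, plus signs, and spaces in the query from being misread by the server.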
I have an array field named IP in my documents, which contains more than 10000 IPs as elements.
For example:
IP:["192.168.a:A","192.168.a:B","192.168.a:C","192.168.A:b"...........]
I ran a search query with some filters and got results, but the response is very large because of the field above.
Now I want to fetch only N IPs from the array, say 10; the order doesn't matter.
How do I do that?
Update:
Apart from the IP field there are other fields as well, and I apply the filters to those fields, not to IP. I want the whole document that satisfies the filters; I just want to limit the number of elements in the IP field. (Let me know if there is any way other than using a script.)
This kind of request could solve your problem:
GET ips/_search
{
"query": {
"match_all": {}
},
"script_fields": {
"truncate_ip": {
"script": {
"source": """
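// Copy at most 10 entries so documents with fewer IPs do not fail.
def ips = params['_source']['IP'];
int n = (int) Math.min(10, ips.size());
String[] trunc_ip = new String[n];
for (int i = 0; i < n; ++i) {
trunc_ip[i] = ips[i];
}
return trunc_ip;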
"""
}
}
}
}
You can use script_fields to generate a new field from existing fields in Elasticsearch. Details are added as comments.
GET indexName/_search
{
"_source": {
"excludes": "ips" //<======= Exclude from source the IP field (change the name based on your document)
},
"query": {
"match_all": {} // <========== Define relevant filters
},
"script_fields": {
"limited_ips": { // <========= add a new scripted field
"script": {
"source": "params['_source'].ips.stream().limit(2).collect(Collectors.toList())" // <==== Replace 2 with the number of IPs you want in the result.
}
}
}
}
Note:
If you remove _source, only the scripted field will be part of the result.
Apart from how the field value is accessed, the rest of the syntax is Java. Change it as it suits you.
Except for analyzed text fields, use doc['fieldName'] to access the field within the script; it is faster. See the excerpt below from the Elasticsearch docs:
By far the fastest most efficient way to access a field value from a
script is to use the doc['field_name'] syntax, which retrieves the
field value from doc values. Doc values are a columnar field value
store, enabled by default on all fields except for analyzed text
fields
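The request above can also be assembled on the client so that the field name and the limit are not hard-coded. This is a minimal sketch under assumptions: the helper `build_truncated_ip_search` and the output field name `limited_ips` are illustrative, and the Painless expression is the one from this answer with the field name and limit substituted in.

```python
import json


def build_truncated_ip_search(query, field="IP", limit=10):
    """Return a search body that hides `field` and exposes its first `limit` items."""
    # Same Painless expression as in the answer, parameterised here.
    source = (
        f"params['_source']['{field}'].stream()"
        f".limit({int(limit)}).collect(Collectors.toList())"
    )
    return {
        "_source": {"excludes": [field]},  # drop the full array from the response
        "query": query,                    # put the real filters here
        "script_fields": {"limited_ips": {"script": {"source": source}}},
    }


body = build_truncated_ip_search({"match_all": {}}, field="IP", limit=10)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request against the index.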
By default Elasticsearch returns only 10 matching results, so I am not sure what your search query is or what exactly you want to restrict:
the number of elements in a single IP field, or
the number of IP fields matching your search results.
Please clarify the above and share your search query so that we can help further.
I'm trying to perform faceting based on a dynamic value. Basically I want identical behavior to the def function, but that doesn't seem to be available with faceting.
Consider these two "products":
{
"id":"product1",
"defaultPrice":19.99,
"overridePrice":14.99
},
{
"id":"product2",
"defaultPrice":49.99
}
I want to add that overridePrice is just an example. The actual field is a dynamic value that will depend on what context a search is performed in, and there may be many overridden prices, so I can't just derive price at index time.
For the response, I'm doing something like this for fl:
fl=price:def(overridePrice, defaultPrice) and using the same def function to perform sorting on price. This works fine.
So now I want to apply the same logic to facets. I've tried using json.facet, which seemed like it would work:
json.facet={
price: "def(overridePrice, defaultPrice)"
}
I've tried other variations as well, such as using field:def(overridePrice, defaultPrice) as well as field:price, but def doesn't seem to be an available function for faceting, and the price derived field is not available when faceting.
So the question: how can I perform faceting based on a default field, as I'm doing for fl and sorting? Will this require a custom aggregation function, or is there a clever way to do it without one? I would much prefer to do this with built-in Solr functionality.
I was able to do a hacky solution based on a tip in another question.
We can use two facets with a query to filter documents depending on if a field exists.
Example:
{
price_override: {
type: query,
q: "overridePrice:[* TO *]",
facet: {
price_override:{
type:terms,
field: overridePrice
}
}
},
price_standard: {
type: query,
q: "-overridePrice:[* TO *] AND defaultPrice:[* TO *]",
facet: {
price_standard: {
type: terms,
field: defaultPrice
}
}
}
}
Explanation:
price_override: {
type: query,
q: "overridePrice:[* TO *]"
This range query only selects documents that have an overridePrice field.
price_standard: {
type: query,
q: "-overridePrice:[* TO *] AND defaultPrice:[* TO *]"
The -overridePrice:[* TO *] clause excludes documents that have an overridePrice field, and defaultPrice:[* TO *] keeps only documents that have a defaultPrice field.
And the facet response:
"facets":{
"count":2,
"price_override":{
"count":1,
"price_override":{
"buckets":[{
"val":14.99,
"count":1}]}},
"price_standard":{
"count":1,
"price_standard":{
"buckets":[{
"val":49.99,
"count":1}]}}}
This does require manually grouping price_override and price_standard into a single facet group, but the results are as expected. This could also pretty easily be tweaked into a ranged query, which is my use case.
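The manual grouping step could be done on the client along these lines. This is a minimal sketch, assuming the response shape shown above; the helper name `merge_price_facets` is illustrative and not part of Solr.

```python
from collections import Counter


def merge_price_facets(facets, groups=("price_override", "price_standard")):
    """Combine the buckets of several query facets into one price facet."""
    merged = Counter()
    for name in groups:
        # Each query facet nests a terms facet with the same name.
        inner = facets.get(name, {}).get(name, {})
        for bucket in inner.get("buckets", []):
            merged[bucket["val"]] += bucket["count"]
    return [{"val": val, "count": count} for val, count in sorted(merged.items())]


# The "facets" section of the response shown above.
facets = {
    "count": 2,
    "price_override": {
        "count": 1,
        "price_override": {"buckets": [{"val": 14.99, "count": 1}]},
    },
    "price_standard": {
        "count": 1,
        "price_standard": {"buckets": [{"val": 49.99, "count": 1}]},
    },
}
combined = merge_price_facets(facets)
print(combined)
```

Counts for a price that appears in both groups are summed, so a product priced by override and another priced by default at the same value end up in one bucket.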
I have these fields in Solr
"IsFair": "True",
"IsHeight" : "True",
"IsFat" : "false",
"IsManly" : "False"
But when filtering data, I want them displayed as Fair, Height, Fat, Manly in a single field on the front end.
Something like a filter named "Appearance Type" that offers "Fair", "Height", "Fat", and "Manly" as values. Someone suggested that I use a Hybrid Filter, but I didn't understand how to achieve this.
I think the best approach is to create a multivalued field appearance_type of type String and generate the facet on that field; later, when applying the filter, you can use the same field.
So your example document will have,
{
"id":"doc1",
"appearance_type":["fair","height"]
}
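The multivalued field can be derived from the boolean flags before indexing. This is a minimal sketch, assuming the flags arrive as the strings shown in the question; the mapping `FLAG_FIELDS` and the helper `appearance_types` are illustrative names.

```python
# Maps each boolean flag field to the label stored in appearance_type.
FLAG_FIELDS = {
    "IsFair": "fair",
    "IsHeight": "height",
    "IsFat": "fat",
    "IsManly": "manly",
}


def appearance_types(doc):
    """Return the labels whose flag is true, ignoring case and whitespace."""
    return [
        label
        for field, label in FLAG_FIELDS.items()
        if str(doc.get(field, "")).strip().lower() == "true"
    ]


flags = {"IsFair": "True", "IsHeight": "True", "IsFat": "false", "IsManly": "False"}
indexed = {"id": "doc1", "appearance_type": appearance_types(flags)}
print(indexed)
```

With the flags from the question this yields the example document above, and a filter on one value would then target appearance_type directly.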
I am trying to do a full-text search on a field of some documents, and I am looking for advice on how to do so. I first tried this type of request:
GET http://localhost:8080/search/?query=lord+of+the+rings
But it only returned the documents where the field was an exact match and contained nothing other than the given string, so I tried the equivalent in YQL:
GET http://localhost:8080/search/?yql=SELECT * FROM site WHERE text CONTAINS "lord of the rings";
And I got exactly the same results. But on reading the documentation further I came across the MATCHES operator, and it does give me the results I seem to be looking for, with this kind of request:
GET http://localhost:8080/search/?yql=SELECT * FROM site WHERE text MATCHES "lord of the rings";
However, for reasons I don't understand, some requests of this type failed with a timeout error like this:
{
"root": {
"id": "toplevel",
"relevance": 1,
"fields": {
"totalCount": 0
},
"errors": [
{
"code": 12,
"summary": "Timed out",
"source": "site",
"message": "Timeout while waiting for sc0.num0"
}
]
}
}
So I worked around this issue by setting a timeout greater than the default:
GET http://localhost:8080/search/?yql=SELECT * FROM site WHERE text MATCHES "lord of the rings";&timeout=20000
My question is: am I doing full-text search the right way, and how could I improve it?
EDIT: Here is the corresponding search definition:
search site {
document site {
field text type string {
stemming: none
normalizing: none
indexing: attribute
}
field title type string {
stemming: none
normalizing: none
indexing: attribute
}
}
fieldset default {
fields: title, text
}
rank-profile post inherits default {
rank-type text: about
rank-type title: about
first-phase {
expression: nativeRank(title, text)
}
}
}
What does your search definition file look like? I suspect you have put your text content in an "attribute" field, which defaults to "word match" semantics. You probably want "text match" semantics which means you'll need to put your content in an "index" type field.
https://docs.vespa.ai/documentation/reference/search-definitions-reference.html#match
The "MATCHES" operator you are using interprets your input as a regular expression, which is powerful, but slow as it applies the regular expression on all attributes (further optimizations to something like https://swtch.com/~rsc/regexp/regexp4.html are possible but not currently implemented).
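The request URLs in the question contain raw spaces and quotes, which are easy to get wrong when building them by hand. This is a minimal sketch of building the CONTAINS request with proper URL encoding; the helper name `build_search_url` is illustrative, the endpoint and YQL are copied from the question, and how the term is actually matched still depends on the field's indexing and match settings discussed above.

```python
from urllib.parse import urlencode


def build_search_url(base_url, phrase, timeout=None):
    """Build a search URL whose YQL uses CONTAINS on the `text` field."""
    # Escape backslashes and double quotes so the phrase stays one YQL string.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    params = {"yql": f'SELECT * FROM site WHERE text CONTAINS "{escaped}";'}
    if timeout is not None:
        # Value passed through as given; the question used timeout=20000.
        params["timeout"] = str(timeout)
    return f"{base_url}?{urlencode(params)}"


search_url = build_search_url("http://localhost:8080/search/", "lord of the rings")
print(search_url)
```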
I am using Solr for search and have implemented highlighting for my search results.
When my search string is ring, it highlights ring, but when the search string is "gold ring" it also highlights only gold, whereas I want the whole phrase gold ring highlighted.
For highlighting I use the description field, which I get back as:
highlighting ={
"8252": {
"text": [
" and <em>gold</em><em>Ring</em> design was finely crafted in Japan."
]
},
"8142": {
"text": [
"This <em>elegant</em> <em>Ring</em> has an Akoya cultured pearl with a band of bezel-set round diamonds making"
]
}
};
Now I am parsing it as:
$.each(newresult.response.docs, function(i,item){
var word = highlight[item["UID_PK"]];
var result="";
var j=0;
for (j = 0; j < item.text.length; j++)
{
result = result+item.text[j]+"<br>";
}
result=result.replace(word,'<em>' + word + '</em>');
});
How should I parse it so that I get gold ring highlighted as a whole?
Have you passed
hl.highlightMultiTerm=true
in the query parameters? This parameter tells Solr to highlight multi-term queries (wildcard, prefix, fuzzy, and range). Its default depends on the Solr version (older releases default to false), so set it explicitly by passing it as a query parameter.
For more details, please see the Solr documentation.
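Separately from the parameter above, adjacent highlight tags in the returned snippets can be merged on the client. This is a minimal sketch of that post-processing alternative, not something the answer proposes; the helper name `merge_adjacent_highlights` is illustrative, and it assumes the default <em> tags shown in the question.

```python
import re

# A closing tag followed (after optional whitespace) by an opening tag.
_ADJACENT_EM = re.compile(r"</em>(\s*)<em>")


def merge_adjacent_highlights(snippet):
    """Join neighbouring <em>…</em> runs into one, keeping the whitespace."""
    return _ADJACENT_EM.sub(r"\1", snippet)


# Snippets copied from the highlighting response in the question.
snippets = [
    " and <em>gold</em><em>Ring</em> design was finely crafted in Japan.",
    "This <em>elegant</em> <em>Ring</em> has an Akoya cultured pearl",
]
merged = [merge_adjacent_highlights(s) for s in snippets]
for line in merged:
    print(line)
```

Only tags separated by nothing but whitespace are merged, so highlights in different parts of a snippet stay separate.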