How to get text matching percentage in vespa? - vespa

Is there any functionality that lets us search for a text and get a matching percentage, that is, how much of the text was matched, or a string distance between the searched text and the result text?

See textSimilarity(name).queryCoverage and textSimilarity(name).fieldCoverage, for example.
All available ranking features in Vespa are listed at https://docs.vespa.ai/documentation/reference/rank-features.html
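To give a feel for what the two coverage features express, here is a rough sketch in Python. It is not Vespa's implementation: the whitespace tokenization, the unweighted terms and the function name coverage are all assumptions made for illustration, and Vespa's own features may produce different values.

```python
from typing import Tuple


def coverage(query: str, field: str) -> Tuple[float, float]:
    """Return (query_coverage, field_coverage) for whitespace-tokenized text.

    Simplified illustration only; Vespa's textSimilarity features are
    computed differently (for example, they can take term weights into account).
    """
    query_terms = query.lower().split()
    field_terms = field.lower().split()
    if not query_terms or not field_terms:
        return 0.0, 0.0
    query_set = set(query_terms)
    field_set = set(field_terms)
    # Fraction of query terms that occur somewhere in the field.
    matched_query = sum(1 for term in query_terms if term in field_set)
    # Fraction of field tokens that are also query terms.
    matched_field = sum(1 for term in field_terms if term in query_set)
    return matched_query / len(query_terms), matched_field / len(field_terms)


q_cov, f_cov = coverage("red rose", "a red rose in bloom")
```

In this example every query term is found in the field, while only two of the five field tokens are query terms, so the query coverage is high and the field coverage is lower.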

Related

Hybris: Solr facet truncate Japanese words

I found a similar issue here: solr facet search truncate words
When I use a Solr facet on the manufacturer name of products, the actual manufacturer name is something like "化学商品", but the Solr navigation area shows it as two options, "化学" and "商品", which means it is being stemmed. For English manufacturer names it works fine.
I cannot use fieldType string. I am using text.
How do I avoid this for Japanese characters, so that only the full manufacturer name is shown? I also tried the tokenizer class CJKTokenizerFactory, but it didn't work.
Any help is greatly appreciated!
You cannot use text for facets. If you want to both search and facet on the manufacturer name, index this information twice, once as a string field and once as a text field, and use each representation in the appropriate place.
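A minimal sketch of how the two representations might be used in one request, written in Python. The field names manufacturer_text and manufacturer_str are hypothetical; they assume the schema indexes the manufacturer name once as an analyzed text field and once as an unanalyzed string field. The q, facet and facet.field parameters are standard Solr request parameters.

```python
from urllib.parse import urlencode


def facet_query(search_terms: str) -> str:
    """Build a Solr query string that searches the text copy and facets on the string copy."""
    params = [
        # Search against the analyzed (tokenized) field. Hypothetical field name.
        ("q", f"manufacturer_text:({search_terms})"),
        ("facet", "true"),
        # Facet on the unanalyzed string field so whole names are returned.
        ("facet.field", "manufacturer_str"),
        ("rows", "10"),
    ]
    return urlencode(params)


query_string = facet_query("化学商品")
```

Because the facet is computed on the string field, the facet values come back as the full manufacturer names rather than as individual tokens.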

Multiple Full Text Search SQL Queries Merged and Scored (Ranked Search Results)

I have a bunch of articles in one table that I'd like to query for search results. Using Full Text Search I can return a list of items that have the search keywords "near" each other.
Full text search does not seem to allow thesaurus (FORMSOF) with the NEAR delimiter.
What I'd like to do, in SQL, is create a query, or a number of queries, which search the same data, in different ways, and return a score (or RANK if using Full Text Search), then I would like to merge these results so there are no duplicates, and total up the ranks/scores, so that I can ORDER BY those scores.
Add in that I would also like to search a separate link table of "tags" that the documents have been assigned, and also assign extra score for those with corresponding tags.
What is the best practice way of fulfilling these requirements?
Full-text search can run a search like ('"word*" near "another*"') in a CONTAINSTABLE statement. The asterisk makes it match any words starting with 'word' and 'another' that are near each other, with ranking.
On the other side you can launch FORMSOF(Thesaurus, word) AND FORMSOF(Thesaurus, another) search with CONTAINSTABLE statement.
Then MERGE the results and use ORDER BY to sort by both given RANKs.
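The merge-and-total step can be sketched outside the database as well. The Python below is only an illustration of the idea: the result sets are assumed to be mappings from article key to RANK as returned by each CONTAINSTABLE query, and the tag_bonus value and the decision to include articles that match only by tag are assumptions, not something the question specifies.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def merge_ranked(
    result_sets: Iterable[Dict[int, int]],
    tagged_ids: Iterable[int] = (),
    tag_bonus: int = 50,
) -> List[Tuple[int, int]]:
    """Sum ranks per article key across result sets and add a bonus for tag matches."""
    totals: Dict[int, int] = defaultdict(int)
    for results in result_sets:
        for key, rank in results.items():
            totals[key] += rank  # duplicates collapse into one total per key
    for key in tagged_ids:
        totals[key] += tag_bonus  # assumed fixed bonus per article with a matching tag
    # Highest total first; ties broken by key for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


near_results = {1: 40, 2: 25}        # e.g. from the NEAR query
thesaurus_results = {2: 30, 3: 10}   # e.g. from the FORMSOF query
merged = merge_ranked([near_results, thesaurus_results], tagged_ids=[3])
```

Each article appears once in the merged list, with its ranks from the individual queries added together.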

Solr docs must match one field

I have two fields:
text field: all important fields such as category, product name and brand are copied into it.
attributes field: all attributes are copied into this field.
I have a single search query, e.g. "50 mm diameter drill".
I want to search for this string in both fields. I am assuming that this will match all products that have drill in the text field.
I want to narrow down the result if any attributes match any of 50 mm diameter.
If nothing matches in the attributes field, I want to return all documents that match the text field.
Edit: I don't want any docs that don't match the text field.
I only want that, if the search matches the attributes field and docs are found, we return only those docs.
If none are found, we return all docs that match the text field.
This is getting a bit tricky and a lot of things depend on your field processing requirements.
You will need to use a combination of field weighting, to rank the attributes field higher, and the edismax minimum match parameter (mm).
Minimum match allows you to configure how many terms in the query must be hit in order for it to display results. This helps weed out documents that only hit on one term in one field.
Lastly, if you really want to have your own logic in here, you can prepend field with + to make it mandatory. For example +attributes:drill will only return items that have drill in the attributes field.
Whether "drill" will match depends on how your fields are processed, but probably, yes. The easiest way to do this is to not limit by "if not matched here, do this ..", but to score matches in the attributes field higher. You can do this by using qf (if using (e)dismax) together with their weights, such as attributes^20 text which will score any match in attributes 20 times more than a match in text. Any search matching documents with the correct term in attributes will then be scored higher than those just matching in text.
You can also do something similar in the q parameter, where you can weight each term separately: text:drill OR attributes:drill^20.
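A minimal sketch of the weighted request in Python. The qf weights follow the attributes^20 text example from the answer; the mm value of "1" and the function name are assumptions chosen for illustration and would need tuning for real data. defType, q, qf and mm are standard (e)dismax parameters.

```python
from typing import Dict
from urllib.parse import urlencode


def edismax_params(user_query: str, attributes_boost: int = 20, min_match: str = "1") -> Dict[str, str]:
    """Build edismax parameters that score matches in attributes above matches in text."""
    return {
        "defType": "edismax",
        "q": user_query,
        # A match in attributes counts attributes_boost times as much as one in text.
        "qf": f"attributes^{attributes_boost} text",
        # Minimum number of query terms that must match (assumed value).
        "mm": min_match,
    }


params = edismax_params("50 mm diameter drill")
query_string = urlencode(params)
```

With this setup nothing is filtered out by the attributes field; documents that also match there simply sort to the top.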

Solr Custom Boosting if a specific field matches the query

We are trying to implement a very interesting search logic with custom boosting and I am wondering if Solr can support this.
We have the following fields in our index:
Name
Description
Keywords (array)
Each keyword will have an amount (int value) paired with it.
A search is run across the Name, Description and Keywords fields. If a keyword matches the search text, the corresponding document must be boosted based on the amount of the matching keyword only.
I've read about Solr DisMax, and it can only boost a field by a fixed amount.
My scenario is to boost the result by X amount based on the matching keywords only.
Thanks in advance
The only viable solution I see to this problem (assuming, of course, that you do NOT know the number of keywords in advance) is to run the query as a filter query (to skip the scoring stage), fetch all matching documents (a bit problematic), and then sort them on your side, using the matched terms to build a Java Comparator.
Problems may arise when you get a particularly large number of documents, but you could probably sidestep this issue with pagination.
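The client-side sort can be sketched as follows. This uses Python rather than a Java Comparator, and the document shape (a dict with a "keywords" mapping from keyword to amount) is a hypothetical structure chosen for the example; the documents are assumed to have been fetched from Solr already.

```python
from typing import Any, Dict, List


def sort_by_keyword_amount(docs: List[Dict[str, Any]], query: str) -> List[Dict[str, Any]]:
    """Order documents by the summed amounts of the keywords that match the query terms."""
    terms = set(query.lower().split())

    def boost(doc: Dict[str, Any]) -> int:
        # Only keywords that appear in the query contribute their amount.
        return sum(
            amount
            for keyword, amount in doc["keywords"].items()
            if keyword.lower() in terms
        )

    return sorted(docs, key=boost, reverse=True)


docs = [
    {"id": "a", "keywords": {"drill": 10}},
    {"id": "b", "keywords": {"drill": 100, "steel": 5}},
    {"id": "c", "keywords": {"saw": 50}},
]
ordered_ids = [doc["id"] for doc in sort_by_keyword_amount(docs, "steel drill")]
```

Document "c" has a large amount on "saw", but since that keyword is not in the query it contributes nothing to the ordering.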
If you don't have too many different amounts, you could try this at index time:
Store the keywords in different fields (dynamic fields -> boost-*) based on their amount:
boost-1 = keyword1,keyword4,keyword6
boost-10 = keyword2
boost-100 = keyword5
You can then search across all your boost fields with edismax and boost every dynamic field by its amount in your (e)dismax configuration (boost-1^1, boost-10^10, boost-100^100).
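A small sketch of the index-time grouping and the matching qf string, in Python. The helper names to_boost_fields and build_qf are hypothetical, and the input is assumed to be a mapping from keyword to its amount; the boost-* field naming follows the example in this answer.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def to_boost_fields(keyword_amounts: Dict[str, int]) -> Dict[str, List[str]]:
    """Group keywords into dynamic fields named after their amount (boost-<amount>)."""
    fields: Dict[str, List[str]] = defaultdict(list)
    for keyword, amount in keyword_amounts.items():
        fields[f"boost-{amount}"].append(keyword)
    return dict(fields)


def build_qf(amounts: Iterable[int]) -> str:
    """Build an (e)dismax qf value that weights each boost field by its amount."""
    return " ".join(f"boost-{amount}^{amount}" for amount in sorted(set(amounts)))


doc_fields = to_boost_fields(
    {"keyword1": 1, "keyword2": 10, "keyword4": 1, "keyword5": 100}
)
qf = build_qf([1, 10, 100])
```

The grouped fields would be added to the document at index time, and the qf value would be passed with the edismax query.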

Haystack/Solr boosting results if the query is found in a specific field

We're having issues with non relevant results being returned as the highest results in our search and we're trying to improve that behavior, but not really sure how.
We have a SearchIndex with about a dozen fields. The document=True field is a template-backed field into which we have placed the majority of the content. Some of the content in there is much less relevant than the rest, even if it's still useful.
To give a concrete example: if a user searches for "red rose", we want to return red roses as the top results. It's fine, even good, if lower results are just roses or just red, or are even described as being "rose red" in color.
The issue is that our document=True field has a ton of items that are described as being "rose red". Worse, the actual red roses don't have "red" and "rose" particularly close to each other, as those values come from disparate fields. As a result, the top few hundred results are completely irrelevant.
What we would like to do is either:
A. Search the primary document, then search each of our other fields and boost (but not hard filter) accordingly. If the term "rose" appears in one of the item's names and "red" appears as one of its attribute values, then that result should have a higher score. In theory this gives us the optimal results sorted by relevancy.
B. Search all fields at once and boost if the value is any of the "boosted" fields.
It seems like using field boost should be the answer, but we can't figure out how to express it since filtering based on a field is a harsh exclude and we want it to only impact the relevance scoring.
The result of both of these is effectively the same. We just can't figure out how to do either of them with Haystack. Or if we'd have to fall back to raw queries how to write a solr query that accomplishes this.
I can give you some pointers, although I did not fully understand the exact use case.
You can look at the Solr edismax query parser to configure:
Fields you want to search on, mainly to select the results
Variable boost on fields for relevancy, to determine the importance of each field
Variable boost for different word combinations, e.g. single words, phrase match, shingle match with slop, to determine relevancy
Additional boost on other fields
This will help you filter the results and order them according to the field and word-combination matches.
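As a rough illustration of these pointers, the Python below builds an edismax parameter set for a raw Solr request. The field names (name, attributes, text) and all weights are hypothetical and only stand in for the fields of the real index; qf, pf, pf2 and ps are standard edismax parameters.

```python
from typing import Dict


def relevance_params(user_query: str) -> Dict[str, str]:
    """Build edismax parameters that boost by field and by phrase proximity."""
    return {
        "defType": "edismax",
        "q": user_query,
        # Fields to search, with per-field weights (selects and scores results).
        "qf": "name^5 attributes^3 text",
        # Extra boost when the whole query appears as a phrase in these fields.
        "pf": "name^10 text^2",
        # Extra boost for matching word pairs (bigram shingles).
        "pf2": "name^4",
        # Phrase slop: how far apart phrase terms may be and still count for pf.
        "ps": "2",
    }


params = relevance_params("red rose")
```

None of these parameters exclude documents on their own; they only change how matching documents are scored, which is the "boost but not hard filter" behavior asked for.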
