Problem using semantic rules with terms containing spaces

We are following https://docs.vespa.ai/documentation/reference/semantic-rules.html
When we write the rule like this:
i7 +> ?"iphone 7";
it works.
However, when we write it like this:
iphone 7 +> ?i7;
it does not work when we search.
Do you have any advice on writing a term that contains a space?

Nothing special is needed to handle spaces, since rules match multiple consecutive terms by default. If you want to match with gaps, you need to add "...", e.g. "iphone ... 7". So
iphone 7 +> ?i7;
will rewrite a query containing the consecutive terms 'iphone' and '7' to
"OR (AND iphone 7) i7", i.e. it will match documents containing either "iphone 7" or "i7".
You can see this by adding &tracelevel=3 to the query URL.
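To make the difference between default (consecutive) matching and a "..." condition concrete, here is a toy Python model. It is not Vespa's implementation, and the helper names are hypothetical. The trace URL assumes a container listening on the default port 8080 and the /search/ endpoint with the query and tracelevel parameters.

```python
from urllib.parse import urlencode


def matches_consecutive(terms, condition):
    """Default rule matching: `condition` must appear as an unbroken run in `terms`."""
    n = len(condition)
    return any(terms[i:i + n] == condition for i in range(len(terms) - n + 1))


def matches_with_gaps(terms, condition):
    """Matching for a condition like `iphone ... 7`: same order, gaps allowed."""
    remaining = iter(terms)
    # `c in remaining` consumes the iterator up to the first hit, so order is enforced.
    return all(c in remaining for c in condition)


def trace_url(host, query, level=3):
    """Build a query URL asking Vespa to include its query-rewrite trace (hypothetical helper)."""
    return f"{host}/search/?" + urlencode({"query": query, "tracelevel": level})


if __name__ == "__main__":
    rule = ["iphone", "7"]
    print(matches_consecutive(["iphone", "7"], rule))          # True
    print(matches_consecutive(["iphone", "plus", "7"], rule))  # False
    print(matches_with_gaps(["iphone", "plus", "7"], rule))    # True
    print(trace_url("http://localhost:8080", "iphone 7"))
    # http://localhost:8080/search/?query=iphone+7&tracelevel=3
```

In the trace output you would then look for the rewritten tree described in the answer, OR (AND iphone 7) i7.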

Is there an issue with UTF-8 terms that are not English?
We are using Vespa in Taiwan, with Traditional Chinese terms.
We found that iphone 7 +> ?EnglishTerm works,
but iphone 7 +> ?ChineseTerm does not work for us.

Related

Solr: how to get the top keywords without common words

I am using Solr and I would like to get the top 10 keywords of my whole dataset, without the common words (like "I", "go", "the"...).
I used "facet.excludeTerms", but there are too many common words to list them all in the query.
For now, I use these facet parameters in my query:
http://localhost:8983/solr/<my_core>/select?facet=true&facet.field=content&facet.limit=10&facet.minCount=1&facet.excludeTerms=I,go,the&q=content:(%2A)&rows=1
My dataset can contain data in many different languages (English, French, Spanish...), so I can't use the OpenNLPTokenizer in my schema: it is language-specific, and I don't know in advance which language will be inserted.
I'm also trying something with tf-idf, but nothing works so far.
http://localhost:8983/solr/<my_core>/select?fl=idf(content,'covid')&idf(content,'and')&tf(content,'covid')&tf(content,'covid')&q=*:*&fl=score&debugQuery=true
I don't understand the meaning of idf:
"covid" gets 5.2 -> interesting word - OK
"and" gets 7.3 -> common word - KO
There is no really big difference between the two values, so how can I use them?
And all tf values are 0 :(.
Any ideas?
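A sketch of two things touched on in this question, under stated assumptions. First, the classic TF-IDF idf as I understand Lucene's ClassicSimilarity computes it in recent versions, 1 + ln((docCount + 1) / (docFreq + 1)): rare terms get high values, and a term present in almost every document approaches 1, so under this formula "and" should score lower than "covid" and a higher value for "and" is worth double-checking. Second, a possible client-side workaround (an assumption, not something proposed in this thread): over-fetch facet terms with a larger facet.limit and drop common words using one stopword set per language. The stopword sets are tiny hypothetical placeholders, and the sketch assumes Solr's default JSON layout for facet_fields, a flat list alternating term and count.

```python
import math


def classic_idf(doc_freq, doc_count):
    """Classic Lucene-style idf: 1 + ln((docCount + 1) / (docFreq + 1))."""
    return math.log((doc_count + 1) / (doc_freq + 1)) + 1.0


def top_keywords(flat_facets, stopwords, n=10):
    """Drop stopwords from a flat Solr facet list [term, count, term, count, ...]
    and keep the n most frequent remaining terms."""
    pairs = zip(flat_facets[0::2], flat_facets[1::2])
    kept = [(term, count) for term, count in pairs if term.lower() not in stopwords]
    return sorted(kept, key=lambda tc: tc[1], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical per-language stopword sets (English, French, Spanish); real lists are much longer.
    stopwords = {"the", "and", "i", "go"} | {"le", "et"} | {"el", "y"}
    facets = ["the", 90, "covid", 40, "and", 35, "vaccine", 20, "et", 15]
    print(top_keywords(facets, stopwords))  # [('covid', 40), ('vaccine', 20)]
    print(classic_idf(999, 1000))  # common term: close to 1
    print(classic_idf(5, 1000))    # rare term: much higher
```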

Solr search query where a word must occur within a certain distance of another word

I want to write a Solr query for something like the following:
apple w/5 pear - apple must occur within 5 words of pear
How can I achieve this? Is there any regex for it?

This can be done by enclosing the terms in quotation marks and using a tilde to indicate proximity:
q="apple pear"~5
If only the quotation marks are present, that is the same as ~0, i.e. the terms must be next to each other.
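For a two-term phrase, my understanding of Lucene's sloppy phrase matching is that the slop needed equals how far the second term sits from the position right after the first one, which makes the reversed order cost 2. The sketch below models only that two-term case; the helper names are hypothetical.

```python
def phrase_query(first, second, slop):
    """Build the q value for a proximity search, e.g. "apple pear"~5."""
    return f'"{first} {second}"~{slop}'


def slop_needed(pos_first, pos_second):
    """Minimum slop for the two-term phrase "first second" to match,
    given the token positions of each term in the document."""
    # The phrase expects `second` exactly one position after `first`.
    return abs((pos_second - 1) - pos_first)


if __name__ == "__main__":
    print(phrase_query("apple", "pear", 5))  # "apple pear"~5
    print(slop_needed(0, 1))  # 0: adjacent, same as the plain quoted phrase
    print(slop_needed(0, 4))  # 3: "apple a b c pear"
    print(slop_needed(1, 0))  # 2: "pear apple", reversed order
```

In this model ~5 allows up to five words between apple and pear in the given order, but fewer when the order is reversed, so it only approximates a symmetric "w/5".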

What is the difference between city:Athens and city:*Athens* in the q box of the Solr Admin panel?

What is the difference between city:Athens and city:*Athens* in the q box of the Solr Admin panel? Why are the fetched results different (the asterisk query returns more results)? Is one a superset of the other, or are they entirely different sets?

The asterisk version returns a superset of the other, i.e. it brings back all results containing the string athens.
For example, city:*Athens* will return results like heathens, preathens, and athens is good,
while city:Athens will return results like athens is good.
Using wildcards can sometimes cause problems; this link explains how Solr handles wildcards: http://lucidworks.com/blog/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
And this is one of the common problems when using wildcards:
Solr: Using a wildcard on a string with whitespace
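A toy model of the difference, assuming city is a tokenized, lowercased text field and treating each indexed term as a plain string: the term query matches only the term athens, while the wildcard matches every indexed term that contains athens. The term list is made up for illustration, and lowercasing the wildcard pattern mimics only one possible behavior (the linked article covers the version differences).

```python
from fnmatch import fnmatchcase


def term_query(index_terms, term):
    """city:Athens on an analyzed field: the query term is lowercased and matched exactly."""
    return {t for t in index_terms if t == term.lower()}


def wildcard_query(index_terms, pattern):
    """city:*Athens*: the pattern is compared against every indexed term.
    Whether the pattern gets lowercased depends on the Solr version and field type."""
    return {t for t in index_terms if fnmatchcase(t, pattern.lower())}


if __name__ == "__main__":
    terms = ["athens", "heathens", "preathens", "is", "good", "sparta"]
    exact = term_query(terms, "Athens")
    wild = wildcard_query(terms, "*Athens*")
    print(sorted(exact))  # ['athens']
    print(sorted(wild))   # ['athens', 'heathens', 'preathens']
    print(exact <= wild)  # True: the wildcard result is a superset
```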

In Solr, expanding multi-word synonyms and term positions

I have a synonym file, used at index time, that contains this equivalence:
uc, university of california
I then looked at how indexing "uc berkeley" would look on analysis.jsp. I was surprised:
org.apache.solr.analysis.SynonymFilterFactory {synonyms=companysyns.txt, expand=true, ignoreCase=true, luceneMatchVersion=LUCENE_36}
position      1            2          3
term text     university   berkeley   california
              uc           of
type          SYNONYM      word       SYNONYM
              SYNONYM      SYNONYM
startOffset   0            3          3
              0            3
endOffset     2            11         11
              2            11
Note that "berkeley" appears in between "university" and "california". This has meant that, when I search for "university of california berkeley", I don't get a match. But "university berkeley california" works!
How can I make sure "university of california berkeley" works properly?
Thanks!
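The positions in the analysis output explain this. The sketch below copies the tokens and positions from that output for "uc berkeley" and runs a simple exact-phrase check, where each next word must sit at the next position. It is an illustration of the position logic, not Lucene's PhraseQuery code.

```python
from collections import defaultdict

# (term, position) pairs as shown in the analysis output for "uc berkeley".
TOKENS = [("university", 1), ("uc", 1), ("berkeley", 2), ("of", 2), ("california", 3)]


def phrase_matches(tokens, phrase):
    """True if the words of `phrase` occur at consecutive positions."""
    positions = defaultdict(set)
    for term, pos in tokens:
        positions[term].add(pos)
    words = phrase.split()
    return any(
        all(start + i in positions[w] for i, w in enumerate(words))
        for start in positions[words[0]]
    )


if __name__ == "__main__":
    print(phrase_matches(TOKENS, "university of california berkeley"))  # False
    print(phrase_matches(TOKENS, "university berkeley california"))     # True
    print(phrase_matches(TOKENS, "uc berkeley"))                        # True
```

Because "of" and "berkeley" share position 2 and "california" sits at position 3, no token is left at position 4 for "berkeley", which is exactly the mismatch described in the question.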

I am facing a similar problem where the highlighted response has the wrong words highlighted. I am using Solr 3.6.
In my use case I have synonyms configured at index time, with expand=true.
For example if I have the following in the synonyms.txt,
dns, domain name system
and I index something like "A sample dns entry that works". When I search for "name" (without quotes), the highlighted response is "A sample dns entry that works" with the word "entry" highlighted.
A search for "system" likewise highlights the wrong word in "A sample dns entry that works".
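Highlighting works from character offsets rather than positions. Assuming the stacked synonym tokens borrow the offsets of whichever original token shares their position (as "of" and "california" borrow berkeley's offsets 3-11 in the analysis output for "uc berkeley"), a search for name would highlight the characters of entry. The token reconstruction below is a hypothetical model of that behavior, not actual analyzer output, and the helper names are made up.

```python
TEXT = "A sample dns entry that works"


def tokens_with_offsets(text):
    """Whitespace tokens as (term, position, start, end), positions starting at 1."""
    out, start = [], 0
    for pos, word in enumerate(text.split(), start=1):
        start = text.index(word, start)
        out.append((word.lower(), pos, start, start + len(word)))
        start += len(word)
    return out


def expand_synonym(tokens, source, expansion):
    """Stack each expansion word on consecutive positions starting at `source`,
    copying the offsets of the original token at that position (assumed behavior).
    Past the last token, reuse the last token's offsets."""
    by_pos = {pos: (s, e) for _, pos, s, e in tokens}
    result = list(tokens)
    for term, pos, _, _ in tokens:
        if term == source:
            for i, word in enumerate(expansion):
                s, e = by_pos.get(pos + i, by_pos[max(by_pos)])
                result.append((word, pos + i, s, e))
    return result


def highlight(text, tokens, query):
    """Return the slices of `text` covered by tokens equal to `query`."""
    return [text[s:e] for term, _, s, e in tokens if term == query]


if __name__ == "__main__":
    toks = expand_synonym(tokens_with_offsets(TEXT), "dns", ["domain", "name", "system"])
    print(highlight(TEXT, toks, "name"))  # ['entry']
```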

This looks like a known problem. A fix has been mentioned (setting luceneMatchVersion to LUCENE_33); I'm not sure whether it works for you. Let's hope they fix it soon.

Solr complicated faceting

I have problems with faceting. Imagine this situation: a product can be in more than one category. This is the usual faceting behavior:
Category
Android (25)
iPhone (55)
other (25)
Now when I select "Android", I make a new query with "fq" => "category:Android", and I get:
Category
Android
iPhone (15)
other (2)
But this means that there are 15 products that are in both the "Android" AND "iPhone" categories. I would like something like this: ("Android" OR "iPhone")
Category
Android
iPhone (+5)
other (+1)
Meaning I would get 25 results by selecting "Android (25)" and another 5 by selecting "iPhone (+5)", so in total I would get 30 search results.
Does anyone know if this is possible with Solr's faceting? Or perhaps with more than one query, calculating it manually?
Thanks for advice!

Try a new query with the negative of the selections, like "fq" => "-category:Android" - you should then get the facet counts you are looking for.
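What the negated filter computes can be checked on a toy data set: count categories only among products that match none of the selected categories. The products and the helper name below are made up for illustration.

```python
from collections import Counter


def incremental_counts(products, selected):
    """Facet counts over products matching none of the selected categories,
    i.e. the "+N" each further selection would add (what fq=-category:... yields)."""
    counts = Counter()
    for cats in products.values():
        if cats & selected:
            continue  # already in the result set, so another selection cannot add it again
        counts.update(cats)
    return dict(counts)


if __name__ == "__main__":
    products = {
        "p1": {"Android"},
        "p2": {"Android", "iPhone"},
        "p3": {"iPhone"},
        "p4": {"other"},
        "p5": {"Android", "other"},
        "p6": {"iPhone"},
    }
    print(incremental_counts(products, {"Android"}))  # iPhone: +2, other: +1
```

Here "Android" alone matches 3 products and the increment for "iPhone" is 2, which agrees with the 5 products matching "Android" OR "iPhone".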

Depending on all the permutations you need, you probably want to look into query facets, which let you get counts for arbitrary queries. For instance, you can do facet.query=category:("Android" OR "iPhone") and get a count of results keyed on category:("Android" OR "iPhone"). And you can do this for any number of queries you want counts for. So, in your case, you can probably get to a final solution with some combination of plain field facets and query facets.
Edit: Re-reading your question, you may also want to look into tagging and excluding parts of an extra fq, depending on how you let your users "select into" the choices. (The example in the docs is fairly close to your original setup, although I'm not sure the end behavior is exactly what you want.)
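Combining the tag/exclude approach with query facets can be sketched as a list of request parameters. The tag name sel and the helper functions are hypothetical; the {!tag=...} and {!ex=...} local-params syntax is Solr's. The field facet ignores the tagged filter, so it keeps the original counts, and each facet.query counts the union of the current selection plus one more category, again ignoring the filter. Subtracting the current numFound from a facet.query count should then give the "+N" value for that category.

```python
from urllib.parse import urlencode


def or_clause(values):
    """Quote each value and join with OR, e.g. "Android" OR "iPhone"."""
    return " OR ".join(f'"{v}"' for v in values)


def facet_params(selected, candidates, field="category", tag="sel"):
    """Filter on the selection, facet on the field without that filter,
    and add one union query facet per candidate category."""
    params = [
        ("q", "*:*"),
        ("fq", f"{{!tag={tag}}}{field}:({or_clause(selected)})"),
        ("facet", "true"),
        ("facet.field", f"{{!ex={tag}}}{field}"),
    ]
    for c in candidates:
        union = or_clause([*selected, c])
        params.append(("facet.query", f"{{!ex={tag}}}{field}:({union})"))
    return params


if __name__ == "__main__":
    params = facet_params(["Android"], ["iPhone", "other"])
    for name, value in params:
        print(name, "=", value)
    print(urlencode(params))  # query string to append to .../select?
```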