How to penalize a document for a particular value in Solr?

I am trying to implement a job search in Solr.
What I want is to boost the title and keyword fields,
and also to negatively boost documents whose location is "Anywhere".
For example:
I searched for "Perl" and location "Mumbai".
The result must contain all resumes with Perl in their title or keywords and location "Mumbai" or "Anywhere",
but resumes with the "Anywhere" value must come last.
I made the following query:
((((perl)) AND ( (perl) ttl:(perl)^5 kw:(perl)^2) )
AND (( pref:(Mumbai) (pref:Anywhere)^0.000000001)) )
But it is not giving the proper results.
Please suggest.

One way to fake a "negative boost" is to give a large boost to everything that does not match. You can do something like this with your query (not tested, so experiment with it):
((((perl)) AND ( (perl) ttl:(perl)^5 kw:(perl)^2) )
AND (( pref:(Mumbai) (*:* -pref:Anywhere)^999 )) )
Here is more about it: http://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_a_negative_.28or_very_low.29_boost_to_documents_that_match_a_query.3F
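If you are using the dismax or edismax query parser, the same trick can also be expressed as a boost query instead of being embedded in q. A minimal sketch reusing the pref field and weight from above (untested, so adjust to your request handler):

bq=(*:* -pref:Anywhere)^999

Every document whose pref is not "Anywhere" receives the large boost, so the "Anywhere" resumes sink to the bottom of the results without being filtered out.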

How to get regular expressions working in the filter clause in Azure Cognitive Search?

I can't seem to get the filter clause to retrieve documents from my index using a regex clause. The schema for my index is straightforward: I have a single field which is both searchable and filterable, of type Edm.String, called someId (it normally contains a hash value of something), with sample values like:
someId
k6l7k2oj
k6l55iq8
k6l61ff8
...
I need to be able to extract all values from this field that start with k6 and end with 8. So, based on the documentation, I am using this in my POST request body:
{
    "filter": "search.ismatch('/^k6[?]*d$/','someId','simple','all')",
    "select": "someId",
    "count": "true"
}
and it comes up with nothing.
On the other hand, if I simplify and say I only need data where someId starts with k6, I seem to get some success if I just use a wildcard,
like this:
{
    "filter": "search.ismatch('k6l*','someId','simple','all')",
    "select": "someId",
    "count": "true"
}
I do get what I am looking for. The question is: why does the regex not work with search.ismatch()? What am I missing?
...
Regex is part of the full Lucene syntax; it is not available in the simple syntax. Try changing the third parameter of search.ismatch to 'full'.
Also, did you mean to use search.ismatch or search.ismatchscoring? The latter is functionally equivalent to using the top-level search, searchFields, queryType, and searchMode parameters. The former does not count matches towards relevance scoring.
It also seems your regex does not do what you intend. I tested it against your sample data and it does not match. Try this regex instead:
^k6.{5}8$
It matches a lowercase k6 at the start of the string, followed by any 5 characters, and finally an 8.
Complete example
{ "filter": "search.ismatch('^k6.{5}8$','someId','full','all')", "select":"someId", "count":"true" }
Thanks to Dan and Bruce.
This exact expression worked for me (note that it wraps the pattern in forward slashes, Lucene's regex delimiters, and drops the ^ and $ anchors):
{
    "filter": "search.ismatch('/k6.{5}8/','someId','full','all')",
    "select": "someId",
    "count": "true"
}
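For completeness, here is how that filter body plugs into a full query request against the REST API. A minimal sketch; the service name, index name, API version, and key are placeholders to substitute with your own:

POST https://<service-name>.search.windows.net/indexes/<index-name>/docs/search?api-version=2020-06-30
Content-Type: application/json
api-key: <query-key>

{
    "search": "*",
    "filter": "search.ismatch('/k6.{5}8/','someId','full','all')",
    "select": "someId",
    "count": "true"
}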

Entity Framework complex search function

I'm using Entity Framework with a SQL Express database, and now I have to make a search function that finds users based on a value typed in a textbox, where the end user can type in anything they want (like Google).
What is the best way to create a search function for this? The input should search all columns.
So, for example, I have 4 columns: firstname, lastname, address, emailaddress.
When someone types foo into the search box, all columns need to be searched for everything that contains foo.
So I thought I just could do something like
context.Users.Where(u =>
    u.Firstname.Contains("foo") ||
    u.Lastname.Contains("foo") ||
    u.Address.Contains("foo") ||
    u.EmailAddress.Contains("foo")
);
But... the end user may also type in foo bar, and then the space in the search value becomes an AND requirement: all columns should be searched, and, for example, firstname might match foo while lastname matches bar.
I think this is too complex for a LINQ query?
Maybe I should create a search index and combine all columns into the search index like:
[userId] [indexedValue], where indexedValue is [firstname + " " + lastname + " " + address + " " + emailaddress].
Then first split the search value based on spaces and then search for columns that have all words in the search value. Is that a good approach?
The first step with any project is managing expectations. Find the minimum viable solution for the business's need and develop that; expand on it as the business value is proven. Providing a really flexible and intelligent-feeling search capability would of course make the business happy, but it can often fail to do what they expect, or fail to perform to the standard they need, where a simpler solution would do what they need, be simpler to develop, and execute faster.
If this represents the minimum viable solution and you want to "and" conditions based on spaces:
public IQueryable<User> SearchUser(string criteria)
{
    if (string.IsNullOrEmpty(criteria))
        return new List<User>().AsQueryable();

    var criteriaValues = criteria.Split(' ');
    var query = context.Users.AsQueryable();
    // Each pass narrows the query: every word must match at least one column.
    foreach (var value in criteriaValues)
    {
        query = query.Where(u =>
            u.Firstname.Contains(value)
            || u.Lastname.Contains(value)
            || u.Address.Contains(value)
            || u.EmailAddress.Contains(value));
    }
    return query;
}
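A quick usage sketch of the method above; the ordering and paging are illustrative assumptions, not part of the original answer:

// "foo bar" => users where every word matches at least one column
var results = SearchUser("foo bar")
    .OrderBy(u => u.Lastname)
    .Take(50)
    .ToList();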
The trouble with trying to index the combined values is that there is no guarantee, for a value like "foo bar", that "foo" represents a first name and "bar" a last name, or that "foo" represents a complete rather than a partial value. You'd also want to consider stripping out commas and other punctuation, as someone might type "smith, john".
When it comes to searching, it might pay to perform a bit more of a pattern match to detect what the user is probably searching for. For instance, a single word like "smith" might first search for an exact match on first name or last name and display results; if there were no matches, then perform a Contains search. If the value contains 2 words, try a first & last name match, assuming "first last" vs. "last, first". If the value has an "@" symbol, default to an e-mail address search; if it starts with a number, an address search. Each detected search option could have a first-pass search (expecting more exact values), then a broader 2nd-pass search if the first comes back empty. There could even be 3rd and 4th pass searches with progressively broader checks. When results are presented, a "more results..." button could trigger the 2nd, 3rd, 4th, etc. pass searches if the returned results didn't include what the user was expecting. A rough sketch of this idea follows.
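Here is that dispatch idea as a first-pass sketch, reusing the User columns from above. The rules, thresholds, and the FirstPassSearch name are illustrative assumptions, not a prescribed design, and criteria is assumed already validated as non-empty:

private IQueryable<User> FirstPassSearch(string criteria)
{
    criteria = criteria.Trim();
    if (criteria.Contains("@")) // looks like an e-mail address
        return context.Users.Where(u => u.EmailAddress == criteria);
    if (char.IsDigit(criteria[0])) // looks like a street address
        return context.Users.Where(u => u.Address.Contains(criteria));
    var parts = criteria.Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
    if (parts.Length == 2) // "first last" or "last, first"
        return context.Users.Where(u =>
            (u.Firstname == parts[0] && u.Lastname == parts[1])
            || (u.Lastname == parts[0] && u.Firstname == parts[1]));
    // Single word: exact name match; a broader Contains search would be the 2nd pass.
    return context.Users.Where(u => u.Firstname == criteria || u.Lastname == criteria);
}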
The idea when it comes to searching: try to perform the most typical, narrow search expected, and allow the user to broaden the search if they so desire. The goal is to "hit" the most relevant results early, helping mold how users enter their criteria, and then to tune based on user feedback, rather than to write queries that return as many possible hits as possible. Help users find what they are looking for on the first page of results. Either way, building a useful search will add complexity or require new 3rd-party libraries, so first determine whether that capability is really required.

LIKE query on elements of flat jsonb array

I have a Postgres table posts with a column of type jsonb which is basically a flat array of tags.
What I need to do is somehow run a LIKE query on that tags column's elements, so that I can find posts which have a tag beginning with some partial string.
Is such a thing possible in Postgres? I constantly find super complex examples and no one ever describes such a basic and simple scenario.
My current code works fine for checking whether there are posts having specific tags:
select * from posts where tags @> '"TAG"'
and I'm looking for a way of running something along the lines of
select * from posts where tags @> '"%TAG%"'
SELECT *
FROM   posts p
WHERE  EXISTS (
   SELECT FROM jsonb_array_elements_text(p.tags) tag
   WHERE  tag LIKE '%TAG%'
   );
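A quick way to sanity-check this, assuming a minimal table shaped like the one described in the question (the column names and sample tags are illustrative):

CREATE TABLE posts (id serial PRIMARY KEY, tags jsonb);
INSERT INTO posts (tags) VALUES
  ('["postgres", "jsonb"]'),
  ('["solr", "search"]');

SELECT *
FROM   posts p
WHERE  EXISTS (
   SELECT FROM jsonb_array_elements_text(p.tags) tag
   WHERE  tag LIKE 'json%'   -- matches the first row only
   );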
Related, with explanation:
Search a JSON array for an object containing a value matching a pattern
Or simpler with the @? operator since Postgres 12 implemented SQL/JSON:
SELECT *
       -- optional to show the matching item:
       -- , jsonb_path_query_first(tags, '$[*] ? (@ like_regex "^tag" flag "i")')
FROM   posts
WHERE  tags @? '$[*] ? (@ like_regex "TAG")';
The operator @? is just a wrapper around the function jsonb_path_exists(). So this is equivalent:
...
WHERE  jsonb_path_exists(tags, '$[*] ? (@ like_regex "TAG")');
Neither has index support. (It may be added for the @? operator later, but it's not there in pg 13, yet.) So those queries are slow for big tables. A normalized design, like Laurenz already suggested, would be superior - with a trigram index:
PostgreSQL LIKE query performance variations
For just prefix matching (LIKE 'TAG%', no leading wildcard), you could make it work with a full text index:
CREATE INDEX posts_tags_fts_gin_idx ON posts USING GIN (to_tsvector('simple', tags));
And a matching query:
SELECT *
FROM   posts p
WHERE  to_tsvector('simple', tags) @@ 'TAG:*'::tsquery;
Or use the english dictionary instead of simple (or whatever fits your case) if you want stemming for natural English language.
to_tsvector(json(b)) requires Postgres 10 or later.
Related:
Get partial match from GIN indexed TSVECTOR column
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

prolog avoiding duplicate predicates

I was wondering whether it is possible to test whether a predicate already exists (with the same information), so as to prevent the user from inputting the same information again.
I have already managed to do it for a single predicate:
:- dynamic(test/2).

test(a,b).

top(X,Y) :-
    (   test(X,Y),
        write('Yes'), !
    ;   write('No'), !
    ).
This version works just fine, returning 'Yes' if the information already exists and 'No' if it doesn't.
I was wondering whether it would be possible to do this for multiple predicates, not just for test/2.
I have tried to replace the predicate test with a variable Pred, but unfortunately I get a syntax error when I try to compile it.
Here is my attempt:
main(Pred,X,Y) :-
    (   Pred(X,Y),
        write('Yes'), !
    ;   write('No'), !
    ).
Is it even possible to do something like this and if it is how would it be possible?
Btw I am using GNU Prolog if it helps.
Thank you very much for your help :D !!
You want call/N, to call a goal with extra arguments, evaluated at runtime. In your case it would be call(Pred,X,Y), i.e. call/3:
main(Pred,X,Y) :-
    (   call(Pred,X,Y),
        write('Yes'), !
    ;   write('No'), !
    ).
Do note that Pred/2 must resolve to an actual predicate at runtime, and you will need to build a different rule for each number of arguments.
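For example, with the test/2 fact from the question loaded, a GNU Prolog session would look something like this (the toplevel's own success report is omitted for brevity):

| ?- main(test, a, b).
Yes
| ?- main(test, c, d).
No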
@Tomas-By's answer, using (=..)/2, lets you create a single rule with a list of args, but with the same caveats regarding predicates existing, albeit with an extra line:
main(Pred, L) :-    % where L is a list of args [X,Y|...]
    Term =.. [Pred | L],
    (   call(Term),
        write('Yes'), !
    ;   write('No'), !
    ).
And, as pointed out in the comments by @lurker, in either instance, using (->)/2:
(call(Pred,X,Y) -> write('Yes') ; write('No'))
or
(Term -> write('Yes') ; write('No'))
may be preferable, as the destruction of choice points is limited to the if-then-else structure.
There is an operator, (=..)/2, for constructing terms, as in:
Term =.. [Op,V1,V2]
I am not sure if that is available in GNU Prolog.
Using SICStus:
xxx(1,2).

check(Pred,X,Y) :-
    Term =.. [Pred,X,Y],
    (   Term ->
        write('Yes')
    ;   write('No')
    ).
and after loading the file:
| ?- check(xxx,1,2).
Yes
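Tying this back to the original goal of rejecting duplicate user input: a minimal sketch of a guard around assertz/1, assuming dynamic facts like test/2 from the question (add_fact/3 is a hypothetical helper name):

add_fact(Pred, X, Y) :-
    Term =.. [Pred, X, Y],
    (   call(Term)
    ->  write('Already known.'), nl   % duplicate: do not assert again
    ;   assertz(Term),                % new information: store it
        write('Added.'), nl
    ).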

SOLR (3.1+) - Multiple Spatial Queries with OR in Same Request

Is it possible to conduct multiple spatial queries within the same Solr (3.1+) request?
We currently need to allow users to search for inventory at a location of their choice via a frontend search form, but we also want to add another spatial search behind the scenes so the results will include more inventory. The two searches would combine into a Venn-diagram type of result.
Edit 10.4.2011
Example construct: q=*:*&fq={!geofilt}&sfield=Location&(ClientId:"client1"&pt=40.68063802521456,-74.00390625&d=80.4672)%20OR%20_query_:(ClientId:"client2"&pt=36.1146460,-115.1728160&d=80.4672)
The above construct does not work, but hopefully demonstrates what I am trying to accomplish.
This is old, but it doesn't seem like it ever got a full answer. I had the same issue and found that this syntax works:
q=*:*&fq=(({!geofilt sfield=Location pt=40.68063802521456,-74.00390625 d=80.4672} AND ClientId:"client1") OR ({!geofilt sfield=Location pt=36.1146460,-115.1728160 d=80.4672} AND ClientId:"client2"))
It looks like you want to run N queries in one request, in order to get one result set per query?
If so, Field Collapsing ( http://wiki.apache.org/solr/FieldCollapsing ) is what you are looking for. Unfortunately, Field Collapsing is only available from 3.3.
Depending on your needs, maybe counted results from different faceted searches could also be useful.
What if you moved your second location query into an additional filter query, like below:
q=*:*&fq={!geofilt}&sfield=Location&(ClientId:"client1"&pt=40.68063802521456,-74.00390625&d=80.4672)&fq={!geofilt}&sfield=Location&(ClientId:"client2"&pt=36.1146460,-115.1728160&d=80.4672)
Will that provide the results that you are looking for? It might end up being too limiting, but I thought it was worth trying.
You might also try:
q=*:*&fq={!geofilt}&sfield=Location&((ClientId:"client1"&pt=40.68063802521456,-74.00390625&d=80.4672)%20OR%20(ClientId:"client2"&pt=36.1146460,-115.1728160&d=80.4672))
